Alternative definitions

Something that happens very often in lecture courses is that you are presented with a definition, and soon after it you are told that a certain property is equivalent to that definition. This equivalence means that in principle one could have chosen the property as the “definition” and the definition as an equivalent property. To put that differently, suppose you are developing a piece of theory and have some word you want to define. To pick an imaginary example, suppose you have a notion of a set being “abundant”. Suppose that a set is defined to be abundant if it has property P, and that property P is equivalent to property Q. There may well not be much to choose between the following pair of alternatives. On the one hand you can say, “Definition: A set is abundant if it has property P,” and follow that with, “Proposition: A set is abundant if and only if it has property Q,” while on the other you can say, “Definition: A set is abundant if it has property Q,” and follow that with, “Proposition: A set is abundant if and only if it has property P.”

That is a simple observation, and one that you would have been likely to make for yourself, or to have had pointed out to you at some stage. But it has a very important practical consequence, which I can sum up as a slogan.

• You should treat equivalent properties as alternative definitions.
• Rather than say in detail what I mean, I am going to discuss several examples. Some of them you will meet in Group Theory and Numbers and Sets. The others I will discuss in less detail, since you will not meet them until later. But they make good examples — perhaps you will find it useful to reread this post at some point in the future.

1. Bijections.

I have already touched on this example. A bijection is defined to be a function that is both an injection and a surjection. It may therefore seem obvious that if you are asked to prove that some function $f$ is a bijection, then you should set about proving first that it is an injection and then that it is a surjection. However, that is not always true. One of the basic results about bijections is this.

Proposition. A function $f:A\to B$ is a bijection if and only if it has an inverse.

Once we’ve proved that proposition, we are allowed to treat “has an inverse” as an alternative definition of “is a bijection”. Let me give a very simple instance of a proof where “has an inverse” is a more convenient definition to use. Suppose you are given two sets $A$ and $B$ and asked to prove the easy fact that the function $\phi:A\times B\to B\times A$ that takes $(a,b)$ to $(b,a)$ is a bijection. If you work directly from the injection-surjection definition, your proof will look something like this.

Proof 1. First let us show that $\phi$ is an injection. If $(b,a)=(b',a')$ then (by the main property of ordered pairs) it follows that $b=b'$ and $a=a'$, and hence that $(a,b)=(a',b')$ (again by the main property of ordered pairs).

Now let us show that $\phi$ is a surjection. Let $(b,a)\in B\times A$. Then $\phi(a,b)=(b,a)$.

Since $\phi$ is an injection and a surjection, it follows that it is a bijection, as was wanted. $\square$

Contrast that with the following proof.

Proof 2. Define $\psi:B\times A\to A\times B$ by $\psi(b,a)=(a,b)$. Then $\psi$ is an inverse for $\phi$. Therefore, $\phi$ is a bijection. $\square$

Moral: if you need to prove that a function is a bijection, consider whether you can just write down an inverse.

More general moral: if you are dealing with bijections, bear in mind that “has an inverse” can be treated as a useful alternative definition, even if “officially” it is an equivalent property.

There is one exception to the second moral. If you are asked to prove that a function is a bijection if and only if it has an inverse, then it is not legitimate to interpret “is a bijection” as “has an inverse” and then argue that the statement is a tautology.

This touches on a question that many beginning mathematicians have asked: “What am I allowed to assume?” Many cases of this question are covered by the following principle.

• If a definition is followed by some basic equivalent properties, then at some point you need to prove the equivalence. Once you have proved it, from that point on you can assume it. If you are doing an exercise, it should be clear whether you are being tested on the equivalence itself or on something that is “beyond” the equivalence.
• A slightly more subtle situation occurs with the following example. Suppose you are asked to prove that the function $e^x$ is a bijection from the real numbers to the positive real numbers. Following my advice, you might say, “$\log(x)$ is an inverse!” and leave it at that. That might be a legitimate argument, but only if your definition of $\log(x)$ is not “the inverse of $e^x$“. And if you define $\log(x)$ in a different way — for example as $\int_1^xt^{-1}dt$ — then it may take you quite a bit of work to prove that what you have defined really does invert the function $e^x$. (It can be done, however. You might like to see whether you can use that integral definition of $\log(x)$ to prove that $\log(xy)=\log(x)+\log(y)$, and then use that property to prove that $\log(x)$ inverts some exponential function, and finally a bit of calculus to prove that the derivative of that exponential function at 0 is 1. However, it is easier — though less illuminating — to prove that $e^x$ is an injection and a surjection. The former is true because $e^x$ is a strictly increasing function, and the latter is a consequence of the intermediate value theorem, which is discussed later in the post.)

2. Invertible matrices.

An $n\times n$ matrix $A$ is defined to be invertible if there is an $n\times n$ matrix $B$ such that $AB=BA=I_n$, where $I_n$ is the $n\times n$ identity matrix. In a typical linear algebra course, the following statements are all shown to be equivalent to the statement that $A$ is invertible. (I’ll take the matrix to have real-number entries.)

(i) The only solution to the equation $Ax=0$ is $x=0$. (Here, $0$ is denoting the column vector of height $n$ that consists entirely of zeros.)

(ii) For every column vector $b$ of height $n$ the equation $Ax=b$ has a solution.

(iii) The row-rank of $A$ is $n$.

(iv) The column-rank of $A$ is $n$.

(v) The determinant of $A$ is non-zero.

A matrix that is not invertible is called singular. If you want to decide whether a matrix is invertible or singular, you can choose any one of the properties (i)-(v) and see whether it holds. And if you know that a matrix has one of the five properties (i)-(v), then you are free to use any of the other properties. For example, if in the particular problem you are working on it happens to be easy to show that the determinant of $A$ is zero, you could slip over to (i) and say, “Let us choose $x\ne 0$ such that $Ax=0$.”

In this case, it is probably best to think of invertibility and singularity as clusters of related properties rather than single properties. At any rate, you shouldn’t be wedded to any particular property as “the main” one.

3. Highest common factors.

A factor of a positive integer $n$ is a positive integer $m$ such that $m|n$. A common factor of two positive integers $m$ and $n$ is a positive integer $d$ such that $d|m$ and $d|n$. The highest common factor of two positive integers $m$ and $n$ is … well … the biggest of all the common factors. The phrase “highest common factor” is so suggestive that the temptation to think of the definition I have just given as the primary one is extremely strong. And yet for many problems it is not at all the most convenient definition to use.

What other equivalent definition is there? Well, it won’t be presented to you as an equivalent definition: more likely, it will be presented as part of the proof of a very important fact about highest common factors, sometimes known as Bézout’s theorem. It is the following result.

Theorem. Let $x$ and $y$ be two positive integers, and let $d$ be the highest common factor of $x$ and $y$. Then there exist integers $h$ and $k$ such that $hx+ky=d$.

How is this number $d$ identified? It is the smallest positive integer that can be written in the form $hx+ky$. Once you have proved that the smallest positive integer $d$ that can be written in the form $hx+ky$ is indeed a factor of both $x$ and $y$, you immediately obtain as a consequence that every integer of the form $hx+ky$ is a multiple of $d$. You also immediately obtain as a consequence that every multiple of $d$ can be written in the form $hx+ky$.

That gives us two alternative definitions of the highest common factor of $x$ and $y$.

(i) The highest common factor of $x$ and $y$ is the smallest positive integer of the form $hx+ky$ such that $h$ and $k$ are integers.

(ii) The highest common factor of $x$ and $y$ is the positive integer $d$ such that the set of integers of the form $hx+ky$ coincides with the set of all multiples of $d$.

Now let’s see how we can put these alternative definitions to use. Suppose that we are asked to prove the following simple fact. I’ll use the standard notation $(m,n)$ to stand for the highest common factor of $m$ and $n$.

Exercise. Let $m$ and $n$ be positive integers. Suppose that $a|mn$ and that $(a,m)=1$. Prove that $a|n$.

I think the majority of first-year undergraduates look at this statement and think something like this: “Every prime factor of $a$ goes into $mn$, but it can’t go into $m$ because $(a,m)=1$, so it must go into $n$.” As it stands, that argument isn’t quite correct, because it ignores the case where a prime goes into $mn$ more than once. But it can be turned into an ugly proof if you want.

What is ugly about it? Two things. First, it uses the fundamental theorem of arithmetic (that every positive integer has a unique factorization into primes), when it could get away with a more basic tool, namely Bézout’s theorem. Secondly, writing out proofs of this kind properly involves writing expressions like $p_1^{a_1}\dots p_r^{a_r}$, which is a pain.

But if you agree with my aesthetic sensibilities (which probably a fair percentage of you won’t, but I hope you’ll eventually come to change your minds), then there remains the question of how exactly you use Bézout’s theorem to prove a statement like this. And here is where alternative definitions come into play. In this case, they don’t really give you anything that you don’t get from the general instruction to use Bézout’s theorem, but they do at least keep you focused on that goal.

The statement that $(a,m)=1$ becomes, if we are thinking about highest common factors in a Bézout’s-theorem kind of way, the statement that we can write 1 as $ha+km$. We’re also given that $a|mn$, and we might like to note that that implies that $a$ goes into any number of the form $ra+smn$. Since we want to show that $a|n$, it makes sense to try to write $n$ in the form $ra+smn$. We know that $1=ha+km$. It follows that $n=han+kmn$, which is indeed of the desired form.

That proof is perhaps too similar to the proof that if a prime $p$ divides a product $ab$ then $p|a$ or $p|b$. So let’s try another one. It’s the statement that if $(m,n)=1$ and two numbers x and y are congruent mod $m$ and congruent mod $n$, then they must be congruent mod $mn$. (If you haven’t met it yet, two numbers are congruent mod $m$ if they differ by a multiple of $m$.)

Just before I start, let me point out that the condition $(m,n)=1$ is necessary. For example, if $m=8$ and $n=12$, then 5 and 29 are congruent mod $m$ and congruent mod $n$ but not congruent mod $mn$.

I am thinking of the statement $(m,n)=1$ as telling me that I can write $hm+kn=1$ for some integers $h$ and $k$. That is what it means to me. (Note that it is more helpful because it tells me that something exists, namely $h$ and $k$, rather than that something doesn’t exist, namely a non-trivial common factor.) What else am I given? I’m given that $x-y=am$ and $x-y=bn$ for some $a$ and $b$. I now want to prove that $x-y=cmn$ for some integer $c$.

Let us write down the three equations we have and the equation we want.

$hm+kn=1$
$x-y=am$
$x-y=bn$
—————-
$x-y=cmn$

We don’t really care about $x$ and $y$, so a fairly obvious thing to do is write $am=bn$ and change our target to that of proving that $am$ and $bn$ are both multiples of $mn$. But if we want to prove that $am=cmn$, we are trying to prove that $a=cn$. That is, we want to prove that $n|a$ (or we could if we wanted prove that $m|b$).

So after a tiny bit of rearranging, the problem is this.

$hm+kn=1$
$am=bn$
————–
$n|a$

How do we use Bézout’s theorem to show that things divide other things? Well, in the previous proof we took a statement of the form $hx+ky=1$ and multiplied it by the number we wanted the other number to go into. Let’s do that here. We want $n$ to go into $a$, so let’s multiply the first equation by $a$ and hope for the best. We get this.

$hma+kna=a$

We’ll be done if we can show that $n$ goes into the left-hand side. Does it? Well, it certainly goes into the second term, but what about the first?

Hang on, we haven’t used all the information yet. What else did we know? Oh yes, that $am=bn$. That tells us that $hma=hbn$, so the first term is divisible by $n$ too.

Here’s an exercise that’s well worth trying. See if you can prove that the lowest common multiple of $m$ and $n$ is $mn/(m,n)$, and see if you can do it without the help of the fundamental theorem of arithmetic. More precisely, your task is to prove that $mn/(m,n)$ is a multiple of both $m$ and $n$, and that every number that’s a multiple of both $m$ and $n$ is a multiple of $mn/(m,n)$.

I think it is highly likely that many people reading this will be far from convinced that using Bézout’s theorem is better than simply writing out prime factorizations. Let me try to explain again why I prefer it (and am not alone in this view). It’s partly a distaste for using results that are “more advanced” than what you are trying to prove. An extreme example is the result that if $p$ is a prime that divides $ab$ then either $p|a$ or $p|b$. That result is (usually) used to prove the fundamental theorem or arithmetic, so you shouldn’t use the fundamental theorem of arithmetic to prove it. Since the other results are of a similar flavour to that one, it seems somehow more appropriate to use similar techniques.

Another potential advantage is that there are algebraic structures called rings that are somewhat like the integers (in that you can add and multiply elements together but you can’t necessarily divide them) in which the fundamental theorem of arithmetic does not hold. I’m not enough of an algebraist (or algebraic number theorist) to have examples at my fingertips, but I am almost sure that there are results that can be generalized from the integers to more general rings, but only if you use a Bézout-type proof. If anyone can supply me with an example, I will be very grateful. (The basic point, however, is that in some rings unique factorization doesn’t hold. In such rings, it is no longer clear that any two elements $x$ and $y$ must have a common factor that’s a multiple of all other common factors. But we can still look at the set of numbers of the form $ax+by$, which forms an object called an ideal. An example of a ring in which unique factorization fails is the set of all numbers of the form $m+n\sqrt{-5}$ where $m$ and $n$ are integers. In this ring the number 6 can be factorized as $2\times 3$ and also as $(1+\sqrt{-5})(1-\sqrt{-5})$. In this ring, 6 and $2(1+\sqrt{-5})$ have both 2 and $1+\sqrt{-5}$ as common factors, but there isn’t — I’m pretty sure — a common factor that’s a multiple of both 2 and $1+\sqrt{-5}$.)

4. Normal subgroups.

If you haven’t met normal subgroups yet, you will do soon. Here is the usual definition.

Definition. Let $G$ be a group and let $H$ be a subgroup of $G$. We say that $H$ is normal if $ghg^{-1}\in H$ whenever $g\in G$ and $h\in H$.

If you are given a subgroup $H$ and asked to prove that it is normal, then the obvious thing to do is look at an arbitrary group element of the form $ghg^{-1}$ and show that it belongs to $H$. But that is often not the simplest way and even more often not the way that gives you the greatest insight into why $H$ is normal.

What is the alternative definition here? Well, a very important fact about normal subgroups (I would call it the most important fact myself) is this.

Proposition. A subgroup $H$ of a group $G$ is normal if and only if there exists a group $K$ and a homomorphism $\phi:G\to K$ such that $H$ is the kernel of $\phi$.

If you don’t know what homomorphisms and kernels are, then there are two things you could do at this point. One is to come back and read this when you’ve been shown them in lectures. Another is to look them up on Wikipedia — they aren’t that difficult. If you decide to skip this section and never come back to it, then please at least go away with this message in mind: it is often better to think of normal subgroups as kernels of homomorphisms rather than as subgroups that satisfy the condition in the original definition.

Let me give a simple example of a problem where thinking of normal subgroups this way is helpful. Recall that the dihedral group $D_{2n}$ is the symmetry group of a regular $n$-gon. (Some people write $D_n$ for this but I think $D_{2n}$ is the Cambridge standard.) This splits up into $n$ rotations and $n$ reflections. The rotations form a subgroup and that subgroup is normal. Why?

I won’t bother to give the argument that works directly from the definition. Instead, I want to show that the group of rotations is the kernel of some homomorphism. That is, I want to find a group $K$ and a homomorphism $\phi:D_{2n}\to K$ such that $\phi(g)=e$ if and only if $g$ is a rotation.

There are many ways to think about this. Probably the best is to use the concept of a group action, but since you haven’t got to that yet I will avoid it. (However, later on I plan a post on group actions and I’ll come back to this point.) Instead, let me simply define a homomorphism from $D_{2n}$ to the 2-element group, which I’ll think of as $\{-1,1\}$ under multiplication, by sending all rotations to $1$ and all reflections to $-1$. Basically, a transformation maps to 1 if you don’t have to turn your polygon over and to -1 if you do. (If you want, you can think of this homomorphism as the determinant of the linear map that does the transformation.)

What this simple example illustrates is that normal subgroups are subgroups that “leave something alone”. In this case, what the rotations leave alone is the way up that the polygon is. If you can find a proof of this kind, then you “feel the normality” in a way that you don’t if all you’ve done some calculation that happens to show that $ghg^{-1}$ belongs to $H$. And sometimes it is much simpler. For example, the set of $n\times n$ real matrices of determinant 1 is a normal subgroup of the set of all non-singular $n\times n$ real matrices. I have seen supervisees prove this directly using the multiplicative property of the determinant (that det$(AB)$=det$(A)$det$(B)$) without noticing that that very same property shows that the determinant is a homomorphism to the non-zero real numbers and that the subgroup in question is the kernel of that homomorphism.

Another nice example is the subgroup of the alternating group $A_4$ that consists of the identity and the three transposition pairs $(12)(34), (13)(24)$ and $(14)(23)$. It isn’t too hard to check that these form a subgroup, but why is it normal?

One answer is that conjugating a permutation always gives you a permutation of the same cycle type. Since we have included all transposition pairs, this subgroup must be normal. But that answer doesn’t really give me any feel for what is special about the subgroup. Here is another argument. We can identify $A_4$ with the group of rotations of a regular tetrahedron. (For example, if we label the places where a vertex can go by the numbers 1, 2, 3 and 4, then the 3-cycle $(123)$ is the 120-degree rotation that fixes the vertex in place 4 and sends the vertex in place 1 to the vertex in place 2, the vertex in place 2 to the vertex in place 3, and the vertex in place 3 to the vertex in place 1.)

Now for each pair of opposite edges of the tetrahedron, we can draw a line joining the two midpoints. This gives us three lines that go through the centre. If we label the places where these lines can be with the numbers 1, 2 and 3, then with any rotation of the tetrahedron we can look at what it does to the lines in those places and define a corresponding permutation of the set $\{1,2,3\}$. If, for instance, it sends the line in place 1 to the line in place 2 and vice versa then we associate with it the permutation $(12)$.

It is not hard to check that this gives us a homomorphism from $A_4$ to $S_3$. The kernel of this homomorphism is the set of permutations in $A_4$ that correspond to rotations that send each of the three lines to itself (but possibly rotating it through 180 degrees about the centre of the tetrahedron). And if you think about it a bit, you will see that the rotations that do that are the identity and the ones that take one of those three lines and rotate about it by 180 degrees. And those three rotations correspond to transposition pairs.

This second argument is longer and — at least the way I have phrased it (trying to avoid talking about group actions) — harder to understand. But it explains the normality of that subgroup in a way that means something and that can be visualized, as opposed to the other argument, which just does a calculation that mysteriously works.

5. Continuous functions.

You don’t meet continuous functions until next term. However, I think that if you read this section and skim over what you don’t understand, you will still get the point of what I am saying. And when you have come across continuous functions, then you may feel like coming back and having another look at this section.

The basic definition of a continuous function from $\mathbb{R}$ to $\mathbb{R}$ is this.

• A function $f:\mathbb{R}\to\mathbb{R}$ is continuous if for every $\epsilon>0$ and every $x\in\mathbb{R}$ there exists $\delta>0$ such that for every $y\in\mathbb{R}$, if $|x-y|<\delta$ then $|f(x)-f(y)|<\epsilon$.
• Fairly soon after that definition has been presented, it is customary to present the following result.

• A function $f:\mathbb{R}\to\mathbb{R}$ is continuous if and only if for every sequence $(a_n)$ that converges to a limit $a$ the sequence $(f(a_n))$ converges to $f(a)$.
• Since that is an equivalence, it can be used as an alternative definition. What are the signs that it is a more suitable definition? A rather simple and obvious one is that the proof so far should already have mentioned a convergent sequence, or, slightly less obviously, that you might like to bring in a convergent sequence later.

For example, there is a very useful tool in real analysis called the Bolzano-Weierstrass theorem. It says, “Every sequence in a closed bounded interval has a convergent subsequence.” If you’re planning to use that theorem, or have already used it, then the sequences definition of continuity is likely to be more convenient than the original definition.

Here is a second example. A well-known theorem of real analysis called the intermediate value theorem says that if $f$ is a continuous function and $f(a)<0$ and $f(b)>0$ then there is some $c$ between $a$ and $b$ such that $f(c)=0$. One proof of this result starts like this. We’ll define two sequences $a_0,a_1,a_2,\dots$ and $b_0,b_1,b_2,\dots$ as follows. [Already we have sequences, so already we should be thinking about using the sequence definition of continuity.] We start with $a_0=a$ and $b_0=b$. If $f((a_0+b_0)/2)\leq 0$ then define that to be $a_1$ and let $b_1=b_0$, while if $f((a_0+b_0)/2)>0$ then define that to be $b_1$ and let $a_1=a_0$. Continue this process, each time replacing one of $a_n$ and $b_n$ by the average and leaving the other one unchanged, and doing it in such a way that at each stage $f(a_n)\leq 0$ and $f(b_n)>0$.

A basic axiom for the real numbers tells us that $a_n$ and $b_n$ both converge, and simple results can be used to show that they converge to the same limit (something that might seem obvious, but it needs a proof). That limit is our $c$, and it remains to prove that $f(c)=0$.

It is here that the sequence definition of continuity comes into play. Since $a_n$ converges to $c$, we know that $f(a_n)$ converges to $f(c)$. Since $f(a_n)$ is always non-positive, so is $f(c)$. (This is another simple principle from real analysis.) Similarly, $f(b_n)$ is always non-negative, which implies that $f(c)$ is non-negative. The only number that is non-negative and non-positive is 0, so we are done.

It is perfectly possible to show that $f(c)=0$ using the original definition of continuity, but the proof is longer and repeats some of the steps that are used to prove that the sequences definition follows from the original definition. Once one has made the effort to prove the sequences definition, one might as well use it.

The sequences definition is by no means the end of the story. For example, another basic result about continuous functions (that applies in a much more general context) is this.

• A function $f:\mathbb{R}\to\mathbb{R}$ is continuous if and only if $f^{-1}(A)$ is open for every open set $A$.
• And that has an equivalent formulation that is sometimes more convenient to use.

• A function $f:\mathbb{R}\to\mathbb{R}$ is continuous if and only if $f^{-1}(A)$ is closed for every closed set $A$.
• And sometimes there are alternative definitions that work in particular contexts. For example, there is a mathematical concept known as a normed space. By far the most useful definition of continuity in the theory of normed spaces is this — but it applies only to certain sorts of functions.

• Let $X$ and $Y$ be normed spaces. A linear map $T:X\to Y$ is continuous if and only if there exists a constant $C$ such that $\|Tx\|\leq C\|x\|$ for every $x\in X$.
• Again, it doesn’t matter if you haven’t the faintest idea what that means. The point is that it looks different from the usual definition, but it is equivalent to it in this particular context and is much easier to use.

6. Differentiable functions.

I am not going to give a list of alternative definitions of differentiability. Instead, this section is here to give me an excuse to direct you towards a famous essay of William Thurston that includes, near the beginning, a discussion of the numerous ways that mathematicians have of thinking about differentiation.

6 Responses to “Alternative definitions”

1. David Roberts Says:

A non-UFD: Z[\sqrt{-5}], in which 6 = 2 x 3 = (1 + i\sqrt{5}) x (1 – \sqrt{5})

see

http://en.wikipedia.org/wiki/Unique_factorisation_domain#Non-examples

for more.

• Christian Says:

On the other hand, Bezout’s Lemma does not hold in that domain either; an example of a ring in which unique factorization fails but Bezout’s Lemma is true seems to be the ring of all algebraic integers. (I.e. all algebraic numbers solving a monic polynomial with integer coefficients)

2. Chris Purcell Says:

“the set of all numbers of the form a + b sqrt (-5)”

I find this _much_ easier to parse with the irrationality spelled out explicitly with an “i”. I literally missed the “-” five times, twice even after reading the more (to me) natural form on the Wikipedia page and thinking you’d made a mistake.

3. Maths lover Says:

Wow, this is too advanced for me. Great entry. I am currently building up my maths to this level and interest

4. Groups and Group Actions: Lecture 2 | Theorem of the week Says:

[…] Gowers (coincidentally another Fields medallist) on what definitions are.  I also recommend his blog post about alternative definitions, which links nicely with my comment in the lecture today when we […]

5. Groups and Group Actions: Lecture 3 | Theorem of the week Says:

[…] you might enjoy reading the musings of Tim Gowers on what definitions are.  I also recommend his blog post about alternative definitions, which links nicely with my comment in the lecture today when we […]