Definitions

By now you will have seen several definitions in lectures. Many of them will be written in the form

Definition. A blah is …

That is, the definition is displayed and the word being defined is in italics (or underlined if somebody is writing by hand). Sometimes, one doesn’t bother with the display, and simply says, during a discussion, “We define a blah to be …”

What is likely to have been emphasized less is that there are several different kinds of definition. In this post I’d like to enumerate some of them and give examples. It’s very much worth being aware, each time you meet a definition, what kind it is.

Before I start, I should make clear that there is some overlap between some of the categories of definition below.

1. Mere abbreviations.

Some definitions are little more than convenient abbreviations. For instance, it is annoying to keep writing $\{x:f(x)\in A\}$ so instead we write $f^{-1}(A)$. And having decided that the ratio of the circumference of a circle to its diameter is important (or not, as the case may be), we prefer to write $\pi$ rather than something like “half the circumference of a unit circle”.

A slightly different example (because what is being defined is an adjective) is saying “even” instead of “divisible by 2” and saying “odd” instead of “not divisible by 2”.

I don’t find it all that easy to think of examples of “mere” abbreviations, because almost all definitions do something more. “Equivalence relation” perhaps counts, since “$R$ is an equivalence relation” is short(ish) for “$R$ is reflexive, symmetric and transitive”.

However, that example shows that even mere abbreviations aren’t completely “mere”, since the fact that one bothers to come up with the abbreviation is a signal that the concept is worth naming. (This is similar to real-life examples such as UK, CIA, DPMMS, “bike”, “mobile”, etc.)

2. Definitions that replace entire sentences by single words or short phrases.

A positive integer $n$ not equal to 1 is prime if its only factors are 1 and $n$. Why do I think of this as a slightly different kind of definition from the “mere” abbreviations?

Consider the following two sentences.

• Every even number apart from 2 is a sum of two primes.
• Every number divisible by 2 apart from 2 is a sum of two primes.

Although I had to put “divisible by 2” after “number”, basically all I did was replace “even” by “divisible by 2”. That is the sense in which I am calling “even” a “mere” abbreviation of “divisible by 2”.

What if I also wanted to do without the word “primes”? I would have to write something much more convoluted like this.

• Every number divisible by 2 apart from 2 is a sum of two positive integers, each of which is not equal to 1 and has no factors apart from 1 and itself.

Here are a few more definitions of a similar kind.

• A subset $A$ of $\mathbb{R}$ is bounded above if there exists $C$ such that $a\leq C$ for every $a\in A$.
• A sequence $(a_n)$ of real numbers is convergent if there exists $a$ such that for every $\epsilon>0$ there exists $N\in\mathbb{N}$ such that for every $n\geq N$, $|a_n-a|<\epsilon$.
• A function $f:A\to B$ is an injection if $x=y$ whenever $f(x)=f(y)$.
• A subgroup $H$ of a group $G$ is normal if $ghg^{-1}\in H$ whenever $g\in G$ and $h\in H$.

If I want to replace the word being defined by what it means, I don’t just stick some slightly longer phrase where the word was: I end up rewriting the whole sentence. For instance, if I want to explain what it means to say that every left coset of a normal subgroup is equal to a right coset, I have to say something like, “Let $H$ be a subgroup of $G$ and suppose that $ghg^{-1}\in H$ whenever $g\in G$ and $h\in H$. Then every left coset of $H$ is equal to a right coset of $H$.”
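To see the definition of a normal subgroup acting as a test that can actually be run, here is a small sketch in the symmetric group on three symbols. It is only an illustration: permutations are stored as tuples of the images of $0,1,2$, and the helper names `compose`, `inverse`, `is_normal` and `left_cosets_are_right_cosets` are hypothetical. For one normal subgroup and one non-normal subgroup, it checks the defining condition and whether every left coset equals some right coset.

```python
from itertools import permutations

def compose(g, h):
    """Return g∘h, i.e. apply h first and then g."""
    return tuple(g[h[i]] for i in range(len(h)))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def is_normal(H, G):
    """The definition: g h g^{-1} lies in H whenever g is in G and h is in H."""
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)

def left_cosets_are_right_cosets(H, G):
    """Check that every left coset gH is equal to some right coset Hg'."""
    right = {frozenset(compose(h, g) for h in H) for g in G}
    return all(frozenset(compose(g, h) for h in H) in right for g in G)

G = set(permutations(range(3)))          # the symmetric group S_3
identity = (0, 1, 2)
A3 = {identity, (1, 2, 0), (2, 0, 1)}    # the rotations: a normal subgroup
T = {identity, (1, 0, 2)}                # generated by swapping 0 and 1: not normal

for name, H in [("A3", A3), ("T", T)]:
    print(name, is_normal(H, G), left_cosets_are_right_cosets(H, G))
```

Checking one small group proves nothing in general, of course, but it shows how the one-line definition does the work of the longer sentence.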

3. Strange new definitions of concepts you thought were already defined.

A few concepts that you may have seen “defined” in this sense are positive integers, integers, rational numbers, real numbers, complex numbers (not to mention how to add and multiply all these kinds of numbers), ordered pairs, functions, relations, binary operations, and sequences.

Let me list how those are defined. (Apologies in advance if I get any of them wrong.)

• A positive integer is a non-empty finite set $n$ with the following two properties: its elements are totally ordered by the relation $\in$, and every element of $n$ is also a subset of $n$.
• Define an equivalence relation $\sim$ on ordered pairs of positive integers by $(a,b)\sim(c,d)$ if $a+d=b+c$. An integer is an equivalence class of $\sim$.
• Define an equivalence relation $\sim$ on ordered pairs $(p,q)$ of integers with $q\ne 0$ by $(p,q)\sim(r,s)$ if $ps=qr$. A rational number is an equivalence class of $\sim$.
• Define an equivalence relation $\sim$ on Cauchy sequences of rationals by saying that $(a_n)\sim(b_n)$ if the sequence $a_1,b_1,a_2,b_2,\dots$ is Cauchy. A real number is an equivalence class of $\sim$.
• A complex number is an ordered pair of real numbers.

The definitions so far are of little value until one defines algebraic operations. I won’t give all those, but here’s one.

• Given two complex numbers $(a,b)$ and $(c,d)$ we define their product $(a,b)(c,d)$ to be $(ac-bd,ad+bc)$.
• Let $x$ and $y$ be two sets. The ordered pair $(x,y)$ is the set $\{x,\{x,y\}\}$.
• A function from a set $A$ to a set $B$ is a subset $F\subset A\times B$ with the property that for every $x\in A$ there is exactly one $y\in B$ such that $(x,y)\in F$.
• A relation on a set $A$ is a subset of $A\times A$.
• A binary operation on a set $A$ is a function from $A\times A$ to $A$.
• A sequence of real numbers is a function from $\mathbb{N}$ to $\mathbb{R}$.
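The pair constructions in the list above translate fairly directly into code. The sketch below is an illustration under stated assumptions: ordinary Python integers and numbers stand in for the set-theoretic positive integers and reals, the function names are hypothetical, and the integer addition and multiplication rules are the standard ones rather than anything given in the list. Since equivalence classes are infinite, it works with representative pairs and checks equivalence between them.

```python
# Integers: a pair (a, b) of positive integers stands for "a - b".
def int_equiv(x, y):
    (a, b), (c, d) = x, y
    return a + d == b + c                     # the relation (a,b) ~ (c,d) from the list

def int_add(x, y):
    (a, b), (c, d) = x, y
    return (a + c, b + d)                     # (a - b) + (c - d) = (a + c) - (b + d)

def int_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c + b * d, a * d + b * c)     # (a - b)(c - d) = (ac + bd) - (ad + bc)

# Rationals: a pair (p, q) of integers with q != 0 stands for "p / q".
def rat_equiv(x, y):
    (p, q), (r, s) = x, y
    return p * s == q * r                     # the relation (p,q) ~ (r,s) from the list

# Complex numbers: a pair (a, b) of reals stands for "a + bi".
def complex_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)     # the product from the list

three, also_three, minus_two = (5, 2), (6, 3), (1, 3)
print(int_equiv(three, also_three))           # two representatives of the same class
print(int_mul(three, minus_two), int_mul(also_three, minus_two))
print(rat_equiv((1, 2), (3, 6)))
print(complex_mul((0, 1), (0, 1)))            # "i times i"
```

The two products come out as $(11,17)$ and $(15,21)$: different pairs, but both in the class that represents $-6$. That is the sense in which an operation on representatives has to be checked to give a well-defined operation on equivalence classes.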

It is customary to present definitions of this kind as though they were getting to the essence of the concept being defined. Based on examples like “is less than” or “is a factor of” or “is congruent mod 7 to”, you might have thought that a relation on a set $A$ was a potential relationship between pairs of elements of $A$ that holds for some pairs and not for others, but actually, you are told, it’s just a subset of $A\times A$.

When you are presented with one of these “but actually” definitions, you should go through the following process.

(i) Understand how your intuitive understanding of the concept being defined relates to the formal definition you are presented with.

(ii) Continue to use the intuitive understanding, turning to the formal definition if you are ever in danger of getting muddled, or if you want to make general statements about the concept in question.

(iii) Think about what properties of the intuitive concept the formal definition is trying to capture.

Just to clarify what I mean by (ii), suppose you were asked a simple question like, “How many relations are there on a set $A$ of size $n$?” If you think of a relation as a way of relating elements of the set, then this could seem a difficult question. If you think of it as a subset of $A\times A$, then you will see instantly (I hope) that the answer is $2^{n^2}$.
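For very small sets the count can be confirmed by brute force. The sketch below (the name `all_relations` is just for illustration) takes the subset-of-$A\times A$ point of view literally: for $A=\{0,\dots,n-1\}$ it lists every subset of $A\times A$, one relation per subset, and compares how many it finds with $2^{n^2}$.

```python
from itertools import combinations

def all_relations(n):
    """Every relation on {0, ..., n-1}, each represented as a subset of A x A."""
    A = range(n)
    pairs = [(x, y) for x in A for y in A]          # the set A x A
    relations = set()
    for k in range(len(pairs) + 1):
        for subset in combinations(pairs, k):
            relations.add(frozenset(subset))        # one relation per subset
    return relations

for n in range(4):
    print(n, len(all_relations(n)), 2 ** (n * n))
# prints:
# 0 1 1
# 1 2 2
# 2 16 16
# 3 512 512
```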

Let’s try to do (i), again using relations as our example. How does the subset-of-$A\times A$ definition correspond to the relating-things definition? Well, if I have some way of relating things, it will be expressed as a sentence with gaps into which you insert two elements $x$ and $y$, both of which can range over all of $A$. For example, if $A$ is the set of all positive integers, we might have $*\leq 2*$ or $*\not | *$ as a relation on $A$. Once I have two elements $x$ and $y$ I can use the relation to form sentences such as $x\leq 2y$ and $x\not | y$.

To convert something like that into a subset of $A\times A$ I just take the set of all ordered pairs $(x,y)$ such that $x$ is related to $y$ in the way stated.

In the other direction, if I’m given a subset $X$ of $A\times A$, I can define a relating-things relation $R$ by setting $xRy$ if and only if $(x,y)\in X$. So every method of relating pairs of elements of $A$ gives rise to a subset of $A\times A$ and vice versa.
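The two translations just described can be written as a pair of functions, one in each direction. In this sketch a “way of relating” is modelled as a yes/no test on pairs, and, since an infinite set cannot be enumerated, the positive integers are replaced by $\{1,\dots,6\}$; both choices, and the function names, are assumptions of the illustration. Going from a relation to a subset and back (or the other way round) loses nothing.

```python
def relation_to_subset(relates, A):
    """From a 'way of relating' (a yes/no test on pairs) to a subset of A x A."""
    return frozenset((x, y) for x in A for y in A if relates(x, y))

def subset_to_relation(X):
    """From a subset X of A x A to a relation R with x R y iff (x, y) is in X."""
    return lambda x, y: (x, y) in X

A = range(1, 7)  # a finite stand-in for the positive integers

def at_most_double(x, y):
    return x <= 2 * y          # the relation  * <= 2*

def does_not_divide(x, y):
    return y % x != 0          # the relation  * does not divide *

for relates in (at_most_double, does_not_divide):
    X = relation_to_subset(relates, A)
    R = subset_to_relation(X)
    # Going there and back loses nothing, in either direction.
    assert all(R(x, y) == relates(x, y) for x in A for y in A)
    assert relation_to_subset(R, A) == X
```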

The exercise I have just carried out for relations tends not to be hard to carry out for other concepts. To give one other example, a real sequence $(a_n)$ gives us the function $f:\mathbb{N}\to\mathbb{R}$ defined by $f(n)=a_n$, and a function $f:\mathbb{N}\to\mathbb{R}$ gives us the real sequence $(a_n)$ defined by $a_n=f(n)$.

What do I mean by (iii)? This is easier to say for some examples than it is for others. A particularly easy case is ordered pairs. What is the point of defining the ordered pair $(x,y)$ as the set $\{x,\{x,y\}\}$? It’s that, officially at least, mathematicians don’t like having too many primitive concepts — that is, concepts that can’t be defined in terms of lower-level concepts — so they try to build everything up from sets.

So far so good, but what makes us choose that funny set to count as “the ordered pair” $(x,y)$? Well, what is the main thing we care about when we deal with ordered pairs? It’s that $(x,y)=(z,w)$ if and only if $x=z$ and $y=w$. One can show that the sets $\{x,\{x,y\}\}$ and $\{z,\{z,w\}\}$ are equal if and only if $x=z$ and $y=w$ (for this particular set the proof is less simple than it looks, since one needs the axiom of foundation to rule out the possibility that $x=\{z,w\}$ and $z=\{x,y\}$), so this set-theoretic construction gives us a way of defining a set-theoretic object that has the key property we want of ordered pairs.

One consequence of the fact that it’s really the properties we are interested in rather than the objects themselves is that we can “define” the same concept in more than one way. For example, I could define the ordered pair $(x,y)$ to be the set $\{\{x\},\{x,y\}\}$ instead. That would again have the required property.
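A finite check is no proof, but it does make the key property tangible. In the sketch below, Python's `frozenset` stands in for a set, which is an assumption of the illustration, and the function names are hypothetical. A `frozenset` can never end up containing itself, directly or through a chain of memberships, which mirrors the axiom of foundation: that axiom is what the $\{x,\{x,y\}\}$ definition needs. The unordered pair $\{x,y\}$ is included as a deliberately bad candidate, to show that the test can fail.

```python
from itertools import product

def pair_from_post(x, y):
    """(x, y) := {x, {x, y}}, the first set-theoretic definition above."""
    return frozenset({x, frozenset({x, y})})

def pair_kuratowski(x, y):
    """(x, y) := {{x}, {x, y}}, the alternative definition."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def unordered_pair(x, y):
    """{x, y}: a deliberately bad candidate, since it forgets the order."""
    return frozenset({x, y})

def has_pair_property(make_pair, universe):
    """Check (x, y) == (z, w) exactly when x == z and y == w, for all choices."""
    return all(
        (make_pair(x, y) == make_pair(z, w)) == (x == z and y == w)
        for x, y, z, w in product(universe, repeat=4)
    )

# A few small sets built up from the empty set.
empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
universe = [empty, one, two, frozenset({one})]

for make_pair in (pair_from_post, pair_kuratowski, unordered_pair):
    print(make_pair.__name__, has_pair_property(make_pair, universe))
```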

Real numbers can be “defined” in several ways. I mentioned the Cauchy-sequences definition above, but another well-known one is the notion of a Dedekind cut. We define a real number to be a partition of the rational numbers into two non-empty sets $A$ and $B$ such that every element of $A$ is less than every element of $B$ and $A$ has no largest element.

How does this correspond to what you might think of as a real number $t$? Well, given your number $t$, you can define $A$ to be the set of all rationals less than $t$ and $B$ to be the set of all rationals greater than or equal to $t$. In the other direction, given two sets $A$ and $B$ with that property, you can calculate the decimal expansion of a real number by using the following procedure. Start with the biggest integer that belongs to $A$. Let’s say it is 3. Now take the biggest multiple of 0.1 that belongs to $A$. Let’s say that is $3.1$. Then take the biggest multiple of $0.01$. Let’s say that is $3.14$. Continuing in this way, we build up the decimal expansion of a number $t$ that is at least as big as every number in $A$ and no bigger than any number in $B$.
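The digit-by-digit procedure can be carried out mechanically with exact rational arithmetic. In this sketch (using Python's `fractions.Fraction`; the function name and the `start` parameter are assumptions of the illustration), the set $A$ is described by a yes/no membership test, and the caller supplies an integer already known to lie in $A$. Because every element of $A$ is below every element of $B$, the set $A$ is closed downwards, so one can climb in steps of $1$, then $0.1$, then $0.01$, and so on, until the next step would leave $A$. Since $B$ is non-empty, each climb stops.

```python
from fractions import Fraction

def cut_to_decimal(in_A, places, start):
    """Follow the digit-by-digit procedure for a Dedekind cut (A, B).

    in_A  -- a yes/no test for membership of a rational in the lower set A
    start -- an integer already known to lie in A (supplied by the caller)
    Returns the biggest multiple of 10**-places that belongs to A.
    """
    value = Fraction(start)
    assert in_A(value), "start must belong to A"
    step = Fraction(1)
    for _ in range(places + 1):
        # A is closed downwards, so climb in steps of `step`
        # until the next step would leave A.
        while in_A(value + step):
            value += step
        step /= 10
    return value

def in_sqrt2_cut(q):
    """Lower set of the cut for the square root of 2."""
    return q < 0 or q * q < 2

print(float(cut_to_decimal(in_sqrt2_cut, 6, 1)))
print(cut_to_decimal(lambda q: q < 3, 3, 0))   # the cut for 3
```

For the cut of a rational number such as 3, the procedure produces $2.999\dots$ rather than $3.000\dots$, because with the convention above the number 3 itself lies in $B$, not $A$.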

What properties do we want real numbers to have? The answer is that we want them to have the kinds of arithmetic properties we expect (things like $x(y+z)=xy+xz$) and to have the property that every increasing sequence that is bounded above converges to a limit. If you don’t know what that means, it doesn’t matter too much here. What matters is that it is an axiom on which the theory of real analysis, which you will be doing next term, is built. A few such properties turn out to imply all the other statements we want to make about real numbers, and Dedekind cuts show that if we’ve got the rationals and we’ve got some set theory, then we can build objects with those properties without introducing any new primitive objects. (The rationals themselves are defined in terms of the integers, which are defined in terms of the natural numbers, which are defined in terms of sets.)

4. Calculation definitions.

I commented above that one can define ordered pairs or real numbers, or many other mathematical concepts, in several different ways. By that I meant really different: an equivalence class of Cauchy sequences of rational numbers is not the same thing as a Dedekind cut, but either will serve as a construction-definition of a real number.

There is another kind of non-uniqueness that frequently occurs when we want to define a number or function. For example, $\pi$ can be defined as $4(1-1/3+1/5-1/7+\dots)$, or it can be defined as the area of a unit circle (which itself can be defined using an integral). It is far from obvious that these two definitions result in the same number, but a bit of theory shows that they do.
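Neither definition is hard to approximate numerically, which gives some evidence (though not a proof) that they agree. In the sketch below, the function names are hypothetical, the series is simply truncated, and the area of the unit circle is estimated as four times the area under $y=\sqrt{1-x^2}$ for $0\leq x\leq 1$ using the midpoint rule; these are choices made for the illustration, not anything in the text.

```python
import math

def pi_from_series(terms):
    """4(1 - 1/3 + 1/5 - 1/7 + ...), truncated after `terms` terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def pi_from_area(strips):
    """Area of the unit circle, as 4 times the area under sqrt(1 - x^2) on [0, 1].

    Uses the midpoint rule with `strips` strips of equal width.
    """
    h = 1 / strips
    return 4 * h * sum(math.sqrt(1 - ((i + 0.5) * h) ** 2) for i in range(strips))

for n in (10, 1_000, 100_000):
    print(n, pi_from_series(n), pi_from_area(n))
```

The series converges slowly: since it alternates with decreasing terms, the error after $n$ terms is at most $4/(2n+1)$.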

An example of a function that can be defined in more than one way is $e^x$. Here are four ways of defining it.

(1) Let $e$ be the number $\frac 1{0!}+\frac 1{1!}+\frac 1{2!}+\frac 1{3!}+\dots$

For every positive integer $n$ define $e^n$ to be $e\cdot e^{n-1}$, with $e^0$ defined to be 1.

For every pair of positive integers $p,q$ define $e^{p/q}$ to be the $q$th root of $e^p$, and define $e^{-p/q}$ to be $1/e^{p/q}$.

Finally, given a real number $x$, let $(p_n/q_n)$ be a sequence of rational numbers converging to $x$ and define $e^x$ to be the limit of the numbers $e^{p_n/q_n}$.

(2) Define $e^x$ to be $1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\dots$

(3) Define $e^x$ to be the unique solution of the differential equation $f'(x)=f(x)$ such that $f(0)=1$.

(4) Define $e^x$ to be the limit as $n\to\infty$ of $(1+x/n)^n$.
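All four definitions can be turned into rough numerical recipes. The sketch below is only an approximation of each: series are truncated, Definition (1) is evaluated only at a positive rational $p/q$ (the limiting step for irrational $x$ is left out), Definition (3) is approximated by Euler's method (a standard numerical technique not mentioned in the text), and the $q$th root is found by bisection so as not to rely on a library power function that might itself use $\exp$. All function names are hypothetical.

```python
import math

def e_from_series(terms=20):
    """Definition (1), first step: e = 1/0! + 1/1! + 1/2! + ..."""
    return sum(1 / math.factorial(k) for k in range(terms))

def qth_root(y, q, iterations=200):
    """Positive q-th root of y > 0 by bisection (avoids calling exp or log)."""
    lo, hi = 0.0, max(1.0, y)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** q < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exp_def1(p, q):
    """Definition (1) at a positive rational p/q: the q-th root of e^p."""
    e_to_p = 1.0
    for _ in range(p):              # e^n = e * e^(n-1), starting from e^0 = 1
        e_to_p *= e_from_series()
    return qth_root(e_to_p, q)

def exp_def2(x, terms=40):
    """Definition (2): the power series 1 + x + x^2/2! + ..., truncated."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

def exp_def3(x, steps):
    """Definition (3), approximated by Euler's method for f' = f, f(0) = 1."""
    f, h = 1.0, x / steps
    for _ in range(steps):
        f += h * f                  # f(t + h) is roughly f(t) + h f'(t) = f(t) + h f(t)
    return f

def exp_def4(x, n):
    """Definition (4): (1 + x/n)^n for a large n."""
    return (1 + x / n) ** n

x = 1.5
print(exp_def1(3, 2), exp_def2(x), exp_def3(x, 100_000), exp_def4(x, 1_000_000))
```

In exact arithmetic each Euler step multiplies $f$ by $1+x/n$, so $n$ steps of Euler's method for (3) produce exactly the number $(1+x/n)^n$ from (4); in floating point the two agree to many digits.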

Now it might seem that $e^x$ is a pre-existing function, and that these definitions are just ways of calculating $e^x$. But that isn’t really the point of these definitions. The point is that it is incredibly useful to us to have a continuous function $f:\mathbb{R}\to\mathbb{R}$, not identically zero, with the property that $f(x+y)=f(x)f(y)$ for every $x$ and $y$. It is an exercise to prove that if such a function exists, then it is determined by a single parameter (such as its value at 1, or the value of its derivative at 0). The above definitions are four routes to proving that it exists. You will probably be given Definition (2) as the “official” definition of $e^x$ (though I myself prefer to use Definition (4)). Whichever definition you choose, one of the first things you do is prove that $f(x+y)$ is always equal to $f(x)f(y)$. That pins your function down to one of the form $a^x$ for some positive real number $a$. To get the right function, you need to impose one more condition, which can be done in many ways: perhaps the simplest is to insist that $f'(0)=1$.

In general, when you are presented with a calculation-definition, for example of some new function, I strongly recommend that you pay close attention to the basic properties that your lecturer goes on to prove. Very often these determine the function uniquely and are what you use in practice when you are proving further things about the function.

With the exponential function, it is profitable to think of $f(x+y)=f(x)f(y)$ and $f'(0)=1$ as “axioms for $e^x$”. As an indication of how that is a useful point of view, let’s imagine that we have been given the power-series definition of $e^x$ and are now faced with the task of proving that the derivative of $e^x$ is $e^x$. We can of course differentiate term by term, but then we need to know that that’s allowed. (It is, but it takes a little bit of work to prove it.) Another way of proceeding is first to show that $e^{x+y}=e^xe^y$ and that the derivative at 0 is 1. Then the derivative at $x$, if it exists, is $\lim_{h\to 0}(e^{x+h}-e^x)/h$. This fraction equals $e^x(e^h-1)/h$, so we see straight away that the derivative will be $e^x$ times the derivative at 0.
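The two facts this argument relies on can be spot-checked numerically from the power series alone. In the sketch below the series is truncated, `exp_series` and `difference_quotient` are hypothetical helper names, and floating-point agreement at a few points is evidence rather than proof. It compares $e^{x+y}$ with $e^xe^y$, watches $(e^h-1)/h$ approach 1, and compares the difference quotient at $x$ with $e^x$ times the difference quotient at 0.

```python
import math

def exp_series(x, terms=40):
    """Power-series definition of e^x, truncated after `terms` terms."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

def difference_quotient(x, h):
    """(e^(x+h) - e^x) / h, computed directly from the series."""
    return (exp_series(x + h) - exp_series(x)) / h

x, y = 0.7, -1.3
# "Turns addition into multiplication"
print(exp_series(x + y), exp_series(x) * exp_series(y))

# Derivative at 0: (e^h - 1)/h should approach 1 as h shrinks
for h in (1e-2, 1e-4, 1e-6):
    print(h, (exp_series(h) - 1) / h)

# Derivative at x: the quotient is e^x times the quotient at 0
h = 1e-6
print(difference_quotient(x, h), exp_series(x) * (exp_series(h) - 1) / h)
```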

That second proof probably ends up involving a similar amount of work to the first, but it has a big advantage, which is that it shows that the properties “differentiates to itself” and “turns addition into multiplication” are closely related. They aren’t just two properties that a function given by a funny formula happens to have.

Calculation-definitions are different from the construction-definitions discussed in the previous section, since what is calculated is the same thing for each definition. For instance, although definitions (1)-(4) above are different, they all define the same object — a certain function from $\mathbb{R}$ to $\mathbb{R}$. By contrast, as already mentioned, different ways of defining ordered pairs or real numbers give you distinct mathematical objects (that nevertheless have the same important properties).

As with construction-definitions, calculation-definitions are sometimes helpful for more than just the basic properties of what is being defined. For example, if I want to prove that $e^1$ is irrational, then I will certainly avail myself of the power-series definition. Note also that once we know there is only one continuous function $f$ such that $f(x+y)=f(x)f(y)$ for every $x$ and $y$ and $f'(0)=1$, any new definition that we can show to be continuous and to have those properties must give the same function as the other ones. In other words, we don’t have to prove that by means of some laborious calculation.

I haven’t posted for a while now, so I’m going to post this, even though I think that there may be entire classes of definitions that I have not mentioned. However, the main thing I want to say about definitions — that some definitions don’t look like definitions at all — is so important that I am going to devote a separate post (the next one) to it.

To summarize, some definitions are mere abbreviations. The main purpose of some definitions is to pick out certain properties (e.g., saying that a triangle is equilateral if its three sides have the same length). Some definitions are constructions of mathematical objects that may look a little bizarre but are designed to have properties that enable them to model our pre-existing intuitive concepts. And some are ways of specifying numbers or functions that are again of interest mainly for their properties.

13 Responses to “Definitions”

1. Lior Silberman Says:

Two corrections: The definition of “a sequence of real numbers” has $\mathbb{R}$ as the domain of the function instead of $\mathbb{Z}$.

More seriously, there is more than one function $f\colon \mathbb{R}\to\mathbb{R}$ so that $f(x+y)=f(x)f(y)$ since if $f$ is such a function then so is $x\mapsto f(ax)$ for any $a \in \mathbb{R}$.

Oops, that was careless of me. I’ve reworded things and I hope eliminated all the false statements.

2. Veky Says:

Also, the function f(x)=0 satisfies the equation f(x+y)=f(x)f(y). It’s funny that you consider that to be a very important property of the exponential function, since you don’t even have to define e for its definition (2^x will do fine).

I really think the most important property we want to get is f’=f (and f(0)=1 for normalization). It seems more complicated than the addition|->multiplication equation, but it is really the property that distinguishes exp from all other exponential functions.

• gowers Says:

I’m not sure I agree with you. To begin with, $2^x$ is most easily defined to be $\exp(x\log 2)$. And secondly, I would say that the difference between an exponential function and a non-exponential function is more important than the difference between two different exponential functions.

3. Gaurav Tiwari Says:

I’m more confused about the mathematical notations of definition than ‘the definitions’ of definition. When I was in first year, I used $\equiv$ or ${}^{\Delta} {}_{=}$ or ${}^{\mathrm{def}} {}_{=}$ or only $=$ to define things (I don’t know how to write them correctly using LaTeX), like $A = \{x^2 : x\ \mathrm{is\ prime} \}$. Now I’m using $:=$ type symbols. I don’t understand what $:$ means before $=$, or what it means in $\{ x : P(x)\}$ before $P(x)$.

• Nirakar Neo Says:

A := B, means that the left hand side is defined by the stuff on the right hand side. For example, I may write

A := {x : x is prime}

which means that the symbol ‘A’ is defined as the set in the above manner. Of course the colon appearing inside the braces stands for “such that”. I hope that clears it up.

4. Andrew Stacey Says:

At the start of section 2, I see:

“A positive integer n not equal to 1 is prime if its only factors are 1 and 0^0”

Thanks — I’ve now swapped round the dollar sign and the full stop.

5. Peter Smith Says:

Actually, defining an ordered pair (x, y) as {x, {x, y}} makes showing (x, y) = (z, w) => x = z and y = w rather non-simple! [As some aggrieved first-year philosophers found when a typo in a colleague’s exercise sheet asked them to prove it …]

6. Terence Tao Says:

One quirk about mathematical definitions, particularly when one starts working at higher levels of abstraction and is trying to introduce several new concepts simultaneously, is that one often has to make a “preliminary” definition of one concept, before moving on to the “right” definition later. For instance, suppose one wanted to define the notions of “(abstract) vector” and “(abstract) vector space” separately. The most natural definitions of these two concepts are:

Definition 1. A vector is an element of a vector space.

Definition 2. A vector space is a collection of vectors obeying the following axioms ….

However, one cannot use both of these definitions, as they depend circularly on each other. Usually, what happens in practice is that one begins with an inferior version of Definition 2, and then cleans it up later, like so:

Definition 2′. A vector space is a set obeying the following axioms …

Definition 1. Elements of a vector space are known as vectors.

Remark 2. Thus, one can view a vector space as a collection of vectors obeying the following axioms …

In this particular case, there is no great distinction between the preliminary definition and the final definition, but in some cases the difference can be quite perceptible. For instance, the “right” definition of a smooth manifold is “a manifold which is locally diffeomorphic to an open subset of Euclidean space”, but one cannot adopt this definition initially, because in order to define “diffeomorphism” one needs to define smooth manifolds (or smooth structures) first. So instead we have to settle for the preliminary definition involving equivalence classes of atlases of smoothly compatible coordinate charts, even though we never use this definition again once we are able to state the “right” one.

• Terence Tao Says:

A pedantic correction: in Definition 2′, a vector space is not merely a set V, but is a set V equipped with some additional structures (a zero element 0, an addition map $+: V \times V \to V$, and a scalar multiplication map $\cdot: k \times V \to V$, where k is the underlying field.) (This point is usually glossed over in informal presentations of “natural” definitions, Definition 1 and Definition 2, as one often simply assumes that the various vector operations are externally provided and do not require definition.)

7. Veky Says:

“To begin with, 2^x is most easily defined to be exp(x ln 2)”
… where exp is defined as what? If you use (1) as your definition, 2 can be used same as e… even easier, since it is more easily defined.🙂 Of course, if you use (2), then it is easier to define than 2^x, but that is just the (Taylor expansion of) the condition f’=f. And in (3), that condition is even explicitly stated. (4) is the most interesting approach, but it also hides the condition f’=f.

“And secondly, I would say that the difference between an exponential function and a non-exponential function is more important than the difference between two different exponential functions.”
Of course. But I thought you set out to define _the_ exponential function, not _an_ exponential function. If you just want some exponential function, then f(x+y)=f(x)f(y) is of course the most important property. But in that story, the number e is really not very important.

8. » Liberal-artsy people Gyre&Gimble Says:

[…] more about the role of definitions, check out the abmath article and also Timothy Gowers’ post on definitions (one of a series of excellent posts on working with abstract […]

9. Tim S Says:

On type 3: computer scientists make the helpful distinction between specification (how you want what you’re defining, say real numbers, to behave) and implementation (how the real numbers are actually made, that is, what sets they are). It’s important that real numbers can be implemented as sets, and it’s interesting to see the various different ways that it can be done, but what’s important about real numbers is the specification of how they behave. (Benacerraf’s question of whether 3 belongs to 5 is a warning to us not to confuse the two: it does on von Neumann’s construction of the naturals as sets, but not on Zermelo’s.)