# Naïve set theory

Unless you have taken the Discrete Mathematics course here at NYU or studied the equivalent material elsewhere, you probably would not have had an occasion to study the set-theoretic formalism that serves as a foundation for much of modern mathematics.

This isn’t to claim that you know nothing about set theory. The two prerequisite courses, linear algebra and vector calculus, cover more than enough examples of sets and functions. You have an idea of what sets are; you know how to take intersections, unions, differences, and complements of them; you know the basic properties of functions.

Our goal here is to round out your set-theoretic vocabulary, so that you can explore the world of modern mathematics with ease. The approach we take in this post is the so-called naïve set theory. Think of this as a math analogue to the Writing the Essay course: we are aiming for a working knowledge of the language, so we can go ahead and study the Great Works.

Indeed, our focus is on the language of set theory and its usage. The study of the inner workings of the language, while interesting in its own right, is not our concern here. In particular, we shall freely take certain facts for granted, instead of deriving them from first principles.

## 1. Sets

A set is a well-defined collection of mathematical objects, which we do not attempt to define. In other words, if we fix a set $X$, every mathematical object $x$ should either be an element of $X$ or not. We write $x \in X$ to denote that $x$ is an element of $X$; $x \notin X$ means $x$ is not an element of $X$. We take for granted that a lot of sets exist. For example, we assume that the set of integers

exists without getting into a discussion about what the symbol “1” means. We also assume the existence of the empty set $\varnothing$, which is the unique set that has no element. Often, we call upon an unspecified index set $I$, which is an arbitrary set with the desired “size”. Typically, we begin by assuming the existence of a set $\mathcal{X} = \{X_\alpha : \alpha \in I\}$ of sets $X_\alpha$ indexed by $I$, viz., for each $\alpha \in I$, we suppose that a set with the name $X_\alpha$ exists. This is a generalization of indexing by natural numbers: indeed, if $I = \mathbb{N} = \{0,1,2,3,\ldots\}$, then

Example 1.1. For each number $\alpha$ in the set of real numbers $\mathbb{R}$, we let $X_\alpha$ be the open interval $(\alpha-1,\alpha+1)$. The collection $\{X_\alpha : \alpha \in \mathbb{R}\}$ is a set of subsets of $\mathbb{R}$, indexed by the real numbers. $\square$

We write $E \subseteq X$ to denote that $E$ is a subset of $X$, i.e., $x \in E$ implies $x \in X$. For example, $\mathbb{N} \subseteq \mathbb{Z}$. Observe that that the two sets $X$ and $Y$ are equal if and only if $X \subseteq Y$ and $X \supseteq Y$. Given a set $X$, we often use the set-builder notation to define a subset of $X$. For example,

is the subset of $\mathbb{Z}$ consisting precisely of the even integers.   We do not attempt to discuss Russell’s paradox here. Given a set $X$, we define the powerset of $X$ to be the set

For example,

We assume that the power set of an arbitrary set exists. Given two sets $X$ and $Y$, we define the union

the intersection

and the set difference

We say that $X$ and $Y$ are djsoint if $X \cap Y = \varnothing$. We assume that $X \cup Y$, $X \cap Y$, and $X \smallsetminus Y$ exist regardless of our choice of $X$ and $Y$. The union operation and the intersection operation can be generalized to multiple sets. If $X_1,\ldots,X_n$ are sets, then we define the union

and the intersection

The following exercise is strongly recommended for those without much experience with logical quantifiers:

Exercise 1.2. Show that

Formulate a similar statement for the intersection operation and prove it. Formulate similar statements for $n$ sets $X_1,\ldots,X_n$ and prove them.

Even more generally, we can take an arbitrary collection of sets $\{X_\alpha : \alpha \in I\}$, labelled by an index set $I$, and define the union

and the intersection

As above, we assume that unions and intersections of sets exist regardless of our choice of sets. We say that $\{X_\alpha : \alpha \in I\}$ is pairwise disjoint in case $X_\alpha \cap X_\beta = \varnothing$ whenever $\alpha \neq \beta$.

## 2. Relations and Functions

How do we say that an element of a set $X$ and an element of another set $Y$ are related by some mathematical rule? To answer this question, we take for granted the existence of ordered pairs: namely, if $X$ and $Y$ are sets, and if $x \in X$ and $y \in Y$, then the mathematical object $(x,y)$ is well-defined and exists. The cartesian product of $X$ and $Y$ is defined to be the set of all ordered pairs:

A binary relation from $X$ to $Y$ is a subset of the cartesian product $X \times Y$.

Example 2.1. If $R \subseteq X \times Y$ and if $(x,y) \in R$, then we say that * $x$ is $R$-related to $y$* and write $xRy$. For example, we let $X = Y = \mathbb{R}$ and define the less-than-or-equal-to relation

In this case, a real number $x$ is $R$-related to another real number $y$ if and only if $x \leq y$ with the usual ordering. $\square$

Example 2.2. Another example is the inclusion relation: we take a set $X$ and define

In this case, a subset $Y$ of $X$ is $R'$-related to another subset $Z$ of $X$ if and only if $Y \subseteq Z$. $\square$

Example 2.3. Fix a positive integer $k$. We define the mod $k$ relation $\sim_k$ on $\mathbb{Z}$ by declaring that $n \sim_k m$ if and only if $n - m$ is an integer multiple of $k$; in other words, $n \sim m$ in case $n$ and $m$ have the same remainder when divided by $k$. $\sim_k$ is then a relation. $\square$

A binary relation $F \subseteq X \times Y$ is said to be a function if

• for each $x \in X$, there exists an $y \in Y$ such that $xFy$;
• $xFy$ and $xFy'$ imply that $y = y'$.

This is the formalization of the intuitive definition that a function is a rule that assigns a unique value to each element of a set. We write $F:X \to Y$ to denote that $F$ is a function from $X$ to $Y$. Given a function $F:X \to Y$, we write $F(x) = y$ to denote that $x$ is $F$-related to $y$. $X$ is the domain of the function $F:X \to Y$, and $Y$ is the codomain. The subset

of the codomain of $F$ is called the image of the function $F$. Given a subset $E$ of $X$, the set

is said to be the image of $E$ under $F$. Similarly, given a subset $G$ of $Y$, the set

is said to be the preimage of $G$ under $F$. We say that a function $F:X \to Y$ is

• injective if, for all $x,y \in X$, $f(x) = f(y)$ implies that $x = y$;
• surjective if $\mathrm{im} \, F = Y$;
• bijective if $F$ is injective and surjective.

Bijective functions are invertible, in the following sense:

Proposition 2.4. If $F:X \to Y$ is a bijective function, then the relation $R \subseteq Y \times X$ defined by declaring $yRx$ if and only if $f(x) = y$ is a function, called an inverse function of $F$. If $R$ and $R'$ are two inverse functions of $F$, then $R = R'$. In light of this uniqueness property, we write $F^{-1}:Y \to X$ to denote the unique inverse function of $F$.

Proof. Let $R$ be an inverse function of $F$. For each $y \in Y$, the surjectivity of $F$ furnishes an $x \in X$ such that $yRx$. Moreover, if $yRx$ and $yRx'$, then $f(x) = y = f(x')$, whence the injectivity of $F$ implies that $x = x'$. It follows that $R$ is a function. If $R'$ is another inverse function of $F$, then

whence $R = R'$, as was to be shown. $\square$

Example 2.5. The function $f_1:\mathbb{R} \to \mathbb{R}$ defined by setting $f_1(x) = x^2$ is neither injective nor surjective. The function $f_2:[0,\infty) \to \mathbb{R}$ defined by the same formula $f_2(x) = x^2$ is injective but not surjective. The function $f_3:[0,\infty) \to [0,\infty)$ defined by the same formula $f_3(x) = x^2$ is bijective, and its inverse is given by the formula $f_3^{-1}(x) = \sqrt{x}$. $\square$

Given three sets $X$, $Y$, and $Z$ and two functions $f:X \to Y$ and $g:Y \to Z$, we define the function composition of $f$ and $g$ to be the function $g \circ f:X \to Z$ defined by setting $(g \circ f)(x) = g(f(x))$ for each $x \in X$. The following exercise is recommended for those without much experience dealing with injectivity and surjectivity in the context of function compositions.

Exercise 2.6. Let $f:X \to Y$ and $g:Y \to Z$ be functions. Verify the following statements:

(1) $g \circ f$ is a function.

(2) If $f$ and $g$ are injective, then so is $g \circ f$.

(3) If $f$ and $g$ are surjective, then so is $g \circ f$.

(4) If $f$ and $g$ are bijective, then so is $g \circ f$.

(5) If $g \circ f$ is bijective, then $f$ is injective and $g$ is surjective.

Let $E \subseteq X$. The restriction of $f:X \to Y$ onto $E$, denoted by $f\vert_E$, is the composition $f \circ \iota_E:E \to Y$, where $\iota_E:E \to Y$ is the canonical injection map $\iota_E \to X$ defined by the formula $\iota_E(x) = x$. By the above exercise, the restriction of an injective function is injective. Let $G \subseteq Y$  Given a fixed $y_0 \in Y$, we can define a projection map $p:Y \to G$ by setting

We now assume that $\mathrm{im} \, f \subseteq G$. The codomain restriction of $f:X \to Y$ by $G$ is the composition $p \circ f:X \to G$. We note that the codomain restriction is invariant of the choice of the point $y_0 \in Y$, the only points of $Y$ that are mapped to $y_0$ are outside of $\mathrm{im} \, f$.

## 3. Equivalence Relations and Quotient Sets

Often, there is a need for categorization via declaring different objects to be “the same”. For example, instead of posting “Happy Birthday, Harry!” on a residence hall bulletin board on September 4, only to take it down the next day and post “Happy Birthday, Elisha!” on the same board, a resident assistant may choose to post “Happy Birthday to All September Birthday People: Harry, Elisha, Tish, and Joe!”

Similarly, it is impractical to create an analog watch that records the exact amount of time passed since the time of initial operation, and so “12 hours from now”, “24 hours from now”, and “36 hours from now” are represented in the same way.

The process of equating different objects is done rigorously by introducing an equivalence relation, which is a binary relation $\sim \, \subseteq X \times X$ on a set $X$ that satisfies the following three properties hold:

• reflexivity. $x \sim x$ for all $x \in X$;
• symmetry. $x \sim y$ implies that $y \sim x$;
• transitivity. $x \sim y$ and $y \sim z$ imply that $x \sim z$.

Exercise 3.1.  Check that the mod $k$ relation $\sim_k$ defined in Example 2.3 is an equivalence relation.

The single most important property of an equivalence relation is that it determines a partition.

Definition 3.2. A partition of a nonempty set $X$ is a set $P$ of nonempty subsets of $X$ such that

and that $E,E' \in P$ implies either $E = E'$ or $E \cap E' = \varnothing$.

Proposition 3.3. Let $X$ be a nonempty set. Let $\sim$ is an equivalence relation on $X$. For each $x \in X$, we define the equivalence class of $x$ to be the set $[x] = \{ y \in X : x \sim y\}.$ The collection $\{[x]: x \in X\}$ then forms a partition of $X$

Proof. Fix $x,y \in X$. By reflexivity, the equivalence classes $[x]$ and $[y]$ are nonempty. Suppose that there exists $z \in [x] \cap [y]$. Whenever $x' \in [x]$ and $y' \in [y]$, it follows from symmetry and transitivity that $x' \sim x \sim z \sim y \sim y'$, whence $y' \in [x]$ and $x' \in [y]$. It follows that $[x] \subseteq [y]$ and $[y] \subseteq [x]$, whence $[x] = [y]$. $\square$

The converse is also true:

Proposition 3.4. Let $X$ be a nonempty set and $I$ an index set. Suppose that $P = \{E_\alpha : \alpha \in I\}$ is a partition of $X$. The relation $\sim$ on $X$ defined by declaring $x \sim y$ if and only if there exists an index $\alpha \in I$ such that $x,y \in E_\alpha$ is an equivalence relation.

Proof. Fix arbitrary $x,y,z \in X$. Since $P$ is a partition, there exists an $\alpha \in I$ such that $x \in E_\alpha$. Observe that $x \in E_\alpha$ implies $x \in E_\alpha$; since $x$ was arbitrary, $\sim$ is reflexive. If $x,y \in E_\alpha$, then $y,x \in E_\alpha$, and so $\sim$ is symmetric. Lastly, if $x,y \in E_\alpha$ and $y,z \in E_\alpha$, then $x,z \in E_\alpha$, and so $\sim$ is transitive. $\square$

Why do we care about the relationship between partitions and equivalence relations? Equivalence relations are a natural way of declaring different objects to equal each other, as the next example shows.

Example 3.5. Let us now specialize Exercise 3.1 to the $k=12$ case to discuss the so-called clock arithmetic. The equivalence relation $\sim_{12}$, defined in Example 2.3, defines twelve equivalence classes: $[1],[2],\ldots,[11],[12]$. By partitioning $\mathbb{Z}$ into twelve pieces, we can represent an arbitrary hour $n$ by the numbers $1,\ldots,12$, as we would expect from the behaviors of a typical clock. $\square$

The collection of all equivalence classes is therefore the set of all “representatives”, after declaring the equality of elements as we desire. We give this collection a name.

Definition 3.6. Let $X$ be a nonempty set and $\sim$ an equivalence relation on $X$. The quotient set $X/\sim$, called $X$ modulo $\sim$, is the set of all equivalence classes induced by $\sim$.

Example 3.7. Let $\sim_k$ be the mod $k$ equivalence relation defined in Example 2.3. We write $\mathbb{Z}/k\mathbb{Z}$ to denote the quotient set $\mathbb{Z}/\sim_k$ and call it the set of integers modulo $k$. $\square$ We observe that every element of $X$ has a representative in the quotient set $X/\sim$.

Proposition 3.8. Let $X$ be a nonempty set and $\sim$ an equivalence relation on $X$. The canonical surjection map $p:X \to X/\sim$ defined by setting $p(x) = [x]$ is a well-defined surjective function.

Remark. The term “well-defined” means that the relation $f$ is actually a function.

Proof. We first note that the value of $p$ at a fixed element $x \in X$ is unique by the definition of $p$. Moreover, for each $x \in X$, the value $p(x)=[x]$ exists as an element of $X/\sim$. Therefore, $p$ is a well-defined function. To show that $p$ is surjective, we fix an arbitrary set $E \in X/\sim$. By the definition of $X/\sim$, we can find an $x \in X$ such that $E = [x]$, whence $p(x) = [x] = E$. Since $E$ was arbitrary, the desired result follows. $\square$

## 4. Cartesian Products

With the notion of functions at hand, we can now generalize the concept of ordered pairs to an arbitrary number of coordinates.

The key feature of ordered pairs is that there is a designated output for each coordinate. This is to say that we can think of ordered pairs as functions on the two-element set $\{\mathrm{first},\mathrm{second}\}$: in case of $(x,y)$, the value corresponding to “first” is $x$, and the value corresponding to “second” is $y$.

The cartesian product $X \times Y$ can then be thought of as the collection of all functions $f$ on $\{\mathrm{first},\mathrm{second}\}$  such that $f(\mathrm{first}) \in X$ and $f(\mathrm{second}) \in Y$. We are thus led to the following notion:

Definition 4.1. Let $\{X_\alpha : \alpha \in I\}$ be a collection of sets indexed by $I$. The cartesian product of the sets in the collection $\{\{X_\alpha : \alpha \in I\}$ is

By specializing this definition to one set $X$, we obtain the $I$-fold cartesian product of $X$

An $X$-valued $I$-tuple is an element of $X^I$.

Note that the usual notion of $n$-tuples from linear algebra and vector calculus coincides with the above definition, with $I = \{1,2,\ldots,n\}$.

It is not entirely clear whether the cartesian product $\prod_{\alpha \in I} X_\alpha$ of nonempty sets $X_\alpha$ is nonempty. This issue is resolved in the affirmative if we take for granted the following intuitive statement:

Axiom 4.2 (Axiom of choice). Every indexed family of nonempty sets $\mathcal{X} = \{X_\alpha : \alpha \in I\}$ admits a function $c:\mathcal{X} \to \bigcup_{\alpha \in I} X_\alpha$ such that $c(X_\alpha) \in X_\alpha$. In other words, the choice function $c$ picks out one element from each set $X_\alpha$.

The relation $\varphi:I \to \mathcal{X}$ that sends $\alpha \in I$ to $X_\alpha \in \mathcal{X}$ is a function, whence the composition $c \circ \varphi$ is an element of $\prod_{\alpha \in I} X_\alpha$. It follows that the cartesian product of nonempty sets is nonempty. We shall study more consequences of the axiom of choice in the subsequent sections.

Note that we have not defined ordered tuples, as we do not yet know what it means to compare the size of two elements of an index set $I$. We shall take up this matter in Section 6.

## 5. Cardinality

In real life, we count the number of objects in a collection $X$ by exhibiting a one-to-one correspondence between $X$ and a subset of the natural numbers: one, two, three, four, and so on. This motivates the following definition:

Definition 5.1. A set $X$ is finite if, for some $N \in \mathbb{N}$, there exists a bijection $% $; if not, $X$ is said to be infinite$X$ is said to be /1 in case $X$ is finite or countably infinite; otherwise, $X$ is said to be uncountable.

Example 5.2. $\mathbb{Z}$ is countably infinite. Indeed, the function

is a bijection from $\mathbb{Z}$ to $\mathbb{N}$. $\square$

Example 5.3. $\mathbb{N} \times \mathbb{N}$  is countably infinite. Here’s a bijection; see Example 6.17 for a rigorous proof in a more general case:

More generally, if $X$ and $Y$ are countably infinite sets, then there exist bujections $f:X \to \mathbb{N}$ and $g: Y \to \mathbb{N}$, so that the map $h(x,y) = (f(x),g(y))$ is a bijection from $X \times Y$ to $\mathbb{N} \times \mathbb{N}$. Since the composition of two bijective functions is bijective, we conclude that $X \times Y$ is countable. $\square$

We can now show that the set $\mathbb{Q}$ of rational numbers is countably infinite. The function $q:\mathbb{Q} \to \mathbb{Z} \times \mathbb{Z}$ given by the formula $q(\frac{n}{m}) = (n,m)$ is injective. Since $\mathbb{Z}$ is countably infinite, the cartesian product $\mathbb{Z} \times \mathbb{Z}$ is countably infinite, and we have a bijective $z: \mathbb{Z} \times \mathbb{Z} \to \mathbb{N}$.

The function $z \circ q:\mathbb{Q} \to \mathbb{N}$ is now a composition of two injective functions, whence $z \circ q$ is injective.

What we have shown, intuitively, is that the size of the set $\mathbb{Q}$ is at most that of $\mathbb{N}$. Since $\mathbb{N}$ is a subset of $\mathbb{Q}$, the size of $\mathbb{N}$ cannot exceed that of $\mathbb{Q}$. We therefore should be able to conclude that the size of $\mathbb{Q}$ is precisely that of $\mathbb{N}$.

A formal argument is as follows. $\mathrm{im} \, (z \circ q)$ is easily seen to be an infinite set, whence the claim that there exists a bijection from $\mathbb{Q}$ to $\mathbb{N}$ follows from the lemma below:

Lemma 5.4. If $Z$ is an infinite subset of $\mathbb{N}$, then there exists a bijection $\varphi:\mathbb{N} \to Z$.

It therefore suffices to prove the lemma.

To this end, we first show that every subset $E$ of $\mathbb{N}$ has the unique minimal element. If $E$ is finite, we can verify this claim by comparing each and every element of $E$. If $E$ is infinite, then we can fix $N \in E$ and consider $E' = \{n \in E : n \leq N\}$. Since every element of $E \smallsetminus E'$ is larger than $N$, a minimal element of $E$, if it exists, must be in $E'$. Now, $E'$ is finite, and so we already know how to find its minimal element.

To show that the minimal element of $E$ is unique, we suppose that $m,m' \in E$ are two minimal elements of $E$. By minimality, we have $m \leq m'$ and $m' \leq m$, whence $m = m'$. Let us now construct the bijection $\varphi:\mathbb{N} \to Z$. We declare $\varphi(0)$ to be the minimal element of $Z$. Then, for each $n \geq 1$, we define $\varphi(n)$ to be the minimal element of $\{\varphi(0),\ldots,\varphi(n-1)\}$. If $\{\varphi(0),\ldots,\varphi(N-1)\} = Z$ for some $N \in \mathbb{N}$, then $Z$ is finite, contradicting the assumption that $Z$ is infinite.

The inductive process therefore continues ad infinitum, and $\varphi$ is well-defined on all of $\mathbb{N}$. By construction, $\varphi$ is injective. It remains to show that $\varphi$ is surjective. Given $N \in Z$, the set $\{n \in Z : k \leq N\}$ is finite of size, say, $M$. It then follows that $\varphi(n) = N$ for some $k \leq M$. This completes the proof of the lemma. $\square$

Exercise 5.5. Show that the cartesian product $\prod_{k=0}^n X_k$ of countably infinite sets $X_0,\ldots,X_n$ is countably infinite. To see this, we let $f_k:X_k \to \mathbb{N}$ be a bijection and define a function $g: \prod_{k=0}^n X_k \to \mathbb{N}$ by setting

where $p_0,\ldots,p_n$ are distinct prime numbers. Show that $g$ is an injection, and apply Lemma 5.4 to derive the desired result.

Example 5.6. The set $[0,1]$ of all real numbers between 0 and 1 including end points is uncountable. To see this, we suppose for a contradiction that $f:[0,1] \to \mathbb{N}$ is a bijective function. For each $n \in \mathbb{N}$, $f^{-1}(n)$ is a real number between 0 and 1, it admits a decimal representation of the form $0.x_1x_2x_3x_4 \cdot s$. Let us make a list:

Since $f$ is a bijection, the above list must contain every real number in $[0,1]$. We shall now exhibit a number in $[0,1]$ that does not belong to the above list: since $f$ was an arbitrary bijection, it follows that no bijection from $[0,1)$ to $\mathbb{N}$ can exist. To this end, we pick each $y_n \in \{0,1,2,3,\ldots,9\}$ so that $y_n \neq x_n^n$. The resulting decimal number

is a number in $[0,1]$.

Now, for each $n \in \mathbb{N}$, we see that $y$ cannot equal $f^{-1}(n)$, as the stipulation $y_n \neq x_n^n$ implies that the decimal representations of $y$ and $f^{-1}(n)$ are different. $y$ is therefore a number in $[0,1]$ that does not belong to the above list. This is the so-called diagonal argument.

Similar arguments show that the open interval $(0,1)$ and the half-open intervals $[0,1)$ and $(0,1]$ are uncountable. Given fixed real numbers $a$ and $b$ such that $% $, the function $g(x) = \frac{x-a}{b-a}$ is a bijection from the closed interval $[a,b]$ to $[0,1]$. If there exists a bijection $h:[a,b] \to \mathbb{N}$, then the composition $h \circ g^{-1}:[0,1] \to \mathbb{N}$ is bijective as well, contradicting the uncountability of $[0,1]$. We conclude that $[a,b]$ is uncountable.

Similar arguments show that the open interval $(a,b)$ and the half-open intervals $[a,b)$ and $(a,b]$ are uncountable.

Finally, we observe that a set $X$ that contains $[0,1]$ as a subset is uncountable. We suppose for a contradiction that we can find a bijection $b:X \to \mathbb{N}$, whence the restriction $b\vert _{[0,1]}:[0,1] \to \mathbb{N}$ is injective. What this implies, intuitively, is that the size of the set $[0,1]$ is at most $\mathbb{N}$. Since we have already shown that $[0,1]$ is uncountable, we should be able to derive a contradiction from this.

Formally, $\mathrm{im} b\vert _{[0,1]}$ is an infinite subset of $\mathbb{N}$, whence the desired contradiction follows from Lemma 5.4. It now follows at once that $\mathbb{R}$, the set $\mathbb{C}$ of complex numbers, and the set $\mathbb{H}$ of quaternions are uncountable. $\square$

The intuition that if the size of $X$ is at most that of $Y$, and if the size of $Y$ is at most that of $X$, then $X$ and $Y$ must be of the same size is quite attractive; we now seek to formalize this idea.

Definition 5.7. Let $X$ and $Y$ be sets. $X$ and $Y$ are said to be equinumerous if there exists a bijection $f:X \to Y$. We write $\vert X \vert = \vert Y \vert$ to denote that $X$ and $Y$ are equinumerous. In case $f$ is not a bijection but only an injection, we say that the cardinality of $X$ is at most the cardinality of $Y$. and write $\vert X \vert \leq \vert Y \vert$.

We remark that countably infinite is precisely being equinumerous with $\mathbb{N}$. We also observe the following:

Proposition 5.8 (Partition principle). $\vert X \vert \leq \vert Y \vert$ if and only if there exists a surjection $g:Y \to X$.

Proof. If there exsits an injection $f:X \to Y$, then the codomain restriction $f':X \to \mathrm{im} \, f$ of $f$ is a bijection. We now define a surjection $g:Y \to X$ as follows: fix $x_0 \in X$ and set

Conversely, if $g:Y \to X$ is a surjection, then the set $P=\{g^{-1}(\{x\}) : x \in X\}$ is a partition of $Y$.

To see this, we first note that every element of $Y$ must be mapped to an element of $X$, whence $\bigcup_{x \in X} g^{-1}(\{x\}) = X$. Moreover, $g$, as a function, can have only one value at each point, whence the preimage sets must be pairwise disjoint. Now, we define a function $f:X \to Y$ by defining $f(x)$ to be some element of the preimage $g^{-1}(\{x\})$. Since $P$ is a partition, $f$ must be injective. $\square$

Remark. In the proof, we have tacitly appealed to the axiom of choice (Axiom 4.2). Indeed, we used the choice function to choose an element from each set in the partition $\{g^{-1}(\{x\}) : x \in X\}$. The intuitive principle that we have alluded to above is formally established below:

Theorem 5.9 (Cantor–Bernstein). If $\vert X \vert \leq \vert Y \vert$ and $\vert Y \vert \leq \vert X \vert$, then $\vert X \vert = \vert Y \vert$.

Proof. We first show that if $A_1 \subseteq B \subseteq A$ and if $\vert A_1 \vert = \vert A \vert$, then $\vert A_1 \vert = \vert B \vert = \vert A \vert$. Let $\varphi:A \to A_1$ be a bijection. We define three collections of sets $\{A_n : n \in \mathbb{N}\}$, $\{B_n : n \in \mathbb{N}\}$, and $\{C_n : n \in \mathbb{N}\}$ as follows:

1. Let $A_0 = A$ and $B_0 = B$.
2. For each $n \geq 2$, we define $A_n = \varphi(A_{n-1})$.
3. For each $n \geq 1$, we define $B_n = \varphi(B_{n-1})$.
4. For each $n \in \mathbb{N}$, we define $C_n = A_n \smallsetminus B_n$.
5. Let $\displaystyle C = \bigcup_{n=0}^\infty C_n$.

By definition, $A_0 \supseteq A_1$. Since $A_2 = \varphi(A_1)$ must be a subset of $\mathrm{im} \, \varphi = A_1$, we see that $A_1 \supseteq A_2$. We now fix $N \geq 3$ and assume inductively that $A_{n-1} \supseteq A_n$ for all $% $. Since $A_{N-2} \supseteq A_{N-1}$, we see that

It follows that $A_{k-1} \supseteq A_k$ for all $k \in \mathbb{N}$. A similar argument shows that $B_{k-1} \supseteq B_k$ for all $k \in \mathbb{N}$. Since $A_0 \supseteq B_0$, we see that $A_1 = \varphi(A_0) \supseteq \varphi(B_0) = B_1$. Repeating this process, we conclude that $A_k \supseteq B_k$ for all $k \in \mathbb{N}$.

Here is a pictoral description of $A_k$, $B_k$, and $C$:

We now claim that $\varphi(C_k) = C_{k+1}$ for all $k \in \mathbb{N}$.

To see this, we fix $k \in \mathbb{N}$. Pick an arbitrary $x \in C_k$. Since $C_k \subseteq A_k$, we see that $\varphi(x) \in \varphi(C_k) \subseteq \varphi(A_k) = A_{k+1}$. Moreover, $x \notin B_k$, and so the injectivity of $\varphi$ implies that $\varphi(x) \notin \varphi(B_k) = B_{k+1}$. It follows that $x \in A_{k+1} \smallsetminus B_{k+1} = C_{k+1}$. Since $x$ was arbitrary, we conclude that $\varphi(C_k) \subseteq C_{k+1}$.

To show that $C_{k+1} = \varphi(C_k)$, we fix an arbitrary $y \in C_{k+1}$. Since $y \in C_{k+1} \subseteq A_{k+1}$, the relation $\varphi(A_k) = A_{k+1}$ implies that we can find an $x \in A_k$ such that $\varphi(x) = y$.

Now, if $x \in B_k$, then $\varphi(x) \in \varphi(B_k) = B_{k+1}$, which would, in turn, imply that $\varphi(x) \notin C_{k+1}$. This is evidently absurd, and so $x \notin B_k$. We conclude that $x \in C_k$. Since $y$ was arbitrary, the desired result follows.

The claim is now established.

An important consequence of the claim is that $\varphi(C) = \bigcup_{k=1}^\infty C_k$. We set $D = A \smallsetminus C$ and define

$f\vert_C$ is precisely the restriction $\varphi\vert_C$ of the injective function $\varphi$, and so it is injective. Similarly, $f\vert_D$ is the restriction of the identity function, and so it is injective. It follows that $f$ is injective. Moreover,

It follows that $f$ is a bijection from $A$ onto $B$, whence $\vert A \vert = \vert B \vert$, as was to be shown.

Let us now return to the general statement of the theorem. We assume that the two sets $X$ and $Y$ satisfy $\vert X \vert \leq \vert Y \vert$ and $\vert Y \vert \leq \vert X \vert$. This implies the existence of two injective functions $i:X \to Y$ and $j:Y \to X$. This, in particular, implies that the composition $j \circ i: X \to X$ is an injective function, and so $\vert X \vert = \vert \mathrm{im}\, (j \circ i) \vert$.

Now, we observe that $\mathrm{im} \, (j \circ i) = \mathrm{im} \, j\vert_{\mathrm{im} i} \subseteq \mathrm{im} \, j$. Since $\mathrm{im} \, j \subseteq X$, we conclude from what we have proved above that $\vert X \vert = \vert \mathrm{im} \, j \vert = \vert \mathrm{im} \, (j \circ i) \vert$. By injectivity of $j$, we see that $\vert \mathrm{im} \, j \vert = \vert Y \vert$, and thus $\vert X \vert = \vert Y \vert$, as was to be shown. $\square$

Many applications of the Cantor–Bernstein theorem require the axiom of choice. We study the cardinality of the $X$-fold cartesian product $X^X$ in Example 6.17.

## 6. Orders

Aside from the usual ordering relation $\leq$ on sets of numbers such as $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$, the inclusion relation $\subseteq$ and the at-most-the-same-cardinality relation defined in Section 5 also compare “sizes” of two objects. Let us now abstract the key properties of these less-than-or-equal-to relations.

Definition 6.1. Let $X$ be a nonempty set. A partial order on $X$ is a relation $\preceq$ on $X$ that satisfies the following properties:

(1) reflexivity. $x \preceq x$ for all $x \in X$;

(2) antisymmetry.if $x \preceq y$ and $y \preceq x$, then $x = y$;

(3) transitivity. if $x \preceq y$ and $y \preceq z$, then $x \preceq z$.

A strict partial order on $X$ is a relation $\prec$ on $X$ that satisfies the following properties:

(1’) irreflexivity. $x \not\prec x$ for all $x \in X$;

(3) transitivity. if $x \prec y$ and $y \prec z$, then $x \prec z$.

A (strict) partial order $\preceq$ is a (strict) linear order, or a (strict) total order, if $\preceq$ satisfies the following additional property:

(4) trichotomy. if $x,y \in X$, then one of the following three must hold: $x \preceq y$, $x = y$, or $x \succeq y$.

A (strict) linear order $\preceq$ is a (strict) well-order if $\preceq$ satisfies the following additional property:

(5) well-ordering property. if $E$ is a nonempty subset of $X$, then there exists an element $m \in E$ such that $m \preceq x$ for all $x \in E$.

Example 6.2. It is easy to see that the usual ordering relations $\leq$ on $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$, respectively, are linear orders. We have shown in the proof of Lemma 5.4 that $\mathbb{N}$ is well-ordered. Since the set $\{-n : n \in \mathbb{N}\}$ does not contain a minimal element, none of the other sets of numbers we have just listed is well-ordered. $\square$

Example 6.3. The is-a-subset-of relation $\subseteq$ on the powerset $\mathcal{P}(X)$ of any set $X$ is a partial order. (Please carry out a proof of this fact if you do not see it immediately!) If $X$ has two or more elements, however, then $\subseteq$ is not a linear order on $\mathcal{P}(X)$. To see this, we fix two distinct elements $x_0$ and $x_1$ of $X$; $\{x\}$ and $\{y\}$ are incomparable, in the sense that neither is a subset of the other. $\square$

Example 6.4. The at-most-the-same-cardinality relation on the powerset $\mathcal{P}(X)$ of any set $X$ is a partial order. Reflexivity is a consequence of the existence of the identity function $\mathrm{id} \, (x) = x$. Antisymmetry is a consequence of the Cantor–Bernstein theorem. Transitivity follows from the fact that the composition of two injective functions is injective. $\square$

The last example is worth a second look. It is not entirely obvious whether the relation should be a linear order. If the two sets $E,G \in \mathcal{P}(X)$ can be well, ordered, then, intuitively, we can match up the elements of $E$ and $G$ from the smallest to the largest until the elements of one of the sets runs out. This leads us to formulate the following result:

Theorem 6.5 (Well-ordering principle). For each set $X$, there exists a well-order $\preceq$ on $X$.

Note also that the process of matching up the elements of $E$ and $G$ might require infinitely many steps. Since the usual principle of mathematical induction is a finitary method, we need an extension of the principle that applies to infinite processes as well. For this, we ought to be able to count each step of these processes, which require an infinitary generalization of the natural numbers that admits its own version of the induction principle.

Definition 6.8. An ordinal number is a set $\alpha$ that satisfies the following properties:

(1) transitivity. every element of $\alpha$ is a subset of $\alpha$;

(2) strict well-ordering.  the is-an-element-of relation $\in$ is a strict well-order on $\alpha$.

We define the ordering relation $\prec$ by declaring $% $ if and only if $\alpha \in \beta$.

We see at once that $\varnothing$ is an ordinal number. To fit the natural numbers into the framework of ordinal numbers, we declare $0$ to be the empty set and carry out a set-theoretic construction of the natural numbers as follows:

• $1 = \{0\} = \{\varnothing\}$;
• $2 = \{0,1\} = \{\varnothing,\{\varnothing\}\}$;
• $3 = \{0,1,2\} = \{\varnothing,\{\varnothing\},\{\varnothing,\{\varnothing\}\}$;
• $\ldots$;
• $n = \{0,1,\ldots,n-1\}$.

With this definition, we see that $\in$ coincides with the usual order relation on $\mathbb{N}$. Moreover $n = \{k : k \prec n\}$ and $n+1 = n \cup \{n\}$ for each natural number $n$.

The set of natural numbers $\mathbb{N}$ is itself an ordinal number: transitivity is evident from the construction of the natural numbers, and strict well-ordering was already shown in the proof of Lemma 5.4. We write $\omega$ to denote the first infinite ordinal $\mathbb{N}$.

Note that $n \prec \omega$ for all $n \in \omega$. In fact, $\omega = \bigcup_{n \in \omega} n$. We therefore see that there are two kinds of ordinal numbers:

Definition 6.9. For each ordinal number $\alpha$, we define the successor of $\alpha$ to be the set $\alpha + 1 = \alpha \cup \{\alpha\}$. An ordinal number $\alpha$ is a successor ordinal if there exists an ordinal number $\beta$ such that $\alpha = \beta + 1$; if not, then $\alpha$ is said to be a limit ordinal.

The natural numbers other than $0$ are successor ordinals; $0$ and $\omega$ are limit ordinals. Since the union of two sets always exist, the successor of an ordinal always exists. Furthermore, the union

exists and is a limit ordinal, which we denote by $2\omega$. Similarly, we can construct the limit ordinal $k \omega$ for each $k \in \omega$. The analogous union

exists and is a limit ordinal, which we denote by $\omega^2$. We shall assume that this process can be continued indefinitely:

Assumption 6.10. We assume that many limit ordinals exist. Indeed, we assume that the following strictly well-ordered sequence of ordinals exists: \begin{align*} & 1 \prec 2 \prec \cdots \prec n \prec n+1 \prec \cdots \prec \omega \prec \omega+1 \prec \cdots \\ \prec \, & \omega + n \prec \omega + n + 1 \prec \cdots \prec 2 \omega \prec \cdots \prec n \omega \prec \cdots \\ \prec \, & \omega^2 \prec \cdots \prec \omega^3 \prec \cdots \prec \omega^\omega \prec \cdots \prec \omega^{\omega^\omega} \prec \cdots \\ \prec \, & \omega_1 = \omega^{ \omega^{ \omega^{ \cdot^{ \cdot^{ \cdot } } } } } \prec \cdots \prec \omega_2 = \omega_1^{ \omega_1^{ \omega_1^{ \cdot^{ \cdot^{ \cdot } } } } } \prec \cdots \prec \omega_n \prec \cdots \\ \prec \, & \omega_\omega \prec \cdots \prec \omega_{\omega_\omega} \prec \cdots \prec \omega_{\omega_\omega} \prec \cdots \end{align*}

Let us establish the principle of transfinite induction that was alluded to above.

Theorem 6.11 (Principle of transfinite induction). Let $\mathcal{C}$ be a collection of ordinal numbers. If $\mathcal{C}$ satisfies the following properties, then $\mathcal{C}$ contains all ordinal numbers:

(1) $0 \in \mathcal{C}$;

(2) if $\alpha \in \mathcal{C}$, then $\alpha + 1 \in \mathcal{C}$;

(3) if $\alpha$ is a nonzero limit ordinal and $\beta \in \mathcal{C}$ for all ordinals $\beta \prec \alpha$, then $\alpha \in \mathcal{C}$.

Remark. The collection that we have referred to in the statement above is actually a class, which, intuitively, is a collection of sets that may be too large to be a set. We will not attempt to formalize the notion of a class.

Proof. Suppose for a contradiction that $\mathcal{C}$ does not contain all ordinal numbers.

Since the ordinals are well-ordered, we can find the least ordinal $\alpha$ such that $\alpha \notin \mathcal{C}$. If $\alpha$ is a successor ordinal, then we can find $\beta \in \alpha$ such that $\beta + 1 = \alpha$.

Since $\beta \prec \alpha$, the ordinal $\beta$ must be in $\mathcal{C}$, whence by (2) $\alpha$ must be in $\mathcal{C}$ as well. If $\alpha$ is a limit ordinal, then (3) and the least-ordinal property of $\alpha$ implies that $\alpha$ must be in $\mathcal{C}$.

Both cases reach a contradiction, and so we conclude that $\mathcal{C}$ must contain all ordinal numbers. $\square$

We are now ready to prove the well-ordering principle.

Proof of the well-ordering principle. We use transfinite induction to label each element of the set $X$ by a unique ordinal, so that $X$ can be well-ordered by the well-ordering of the ordinal numbers. By the axiom of choice (Axiom 4.2), there exists a choice function $c$ on the set $\mathcal{P}(X) \smallsetminus \{\varnothing\}$ of nonempty subsets of $X$. We set $a_0 = c(X)$ and define $a_\alpha$ inductively for each ordinal $\alpha$ by setting

we terminate the process when $X \smallsetminus \{a_\beta : \beta \prec \alpha\}$ is empty.

We now let $\kappa$ be the least ordinal number such that $X = \{a_\beta : \beta \prec \kappa\}$. The set $\{a_\beta : \beta \prec \kappa\}$ consists precisely of the elments of $X$, labeled by ordinal numbers up to $\kappa$. We define an order on $X$ by declaring $a_{\alpha} \leq a_{\beta}$ if and only if $\alpha \prec \beta$ or $\alpha = \beta$. It follows at once that $\leq$ is a well-order on $X$. $\square$

Recall that a usual pplication of the finitary induction principle takes the form of demonstrating that the construction of a desired mathematical object can be done one step at a time. If the construction cannot be completed in finitely many steps, then the principle of transfinite induction assures that the process will be completed, provided that there are no obstructions on the way—that is, the method of construction behaves well at the limit ordinal stages.

We shall now capture this idea in a form that is particularly convenient to use. To this end, we need to introduce a few terms:

Definition 6.12. Let $X$ be a nonempty set, $\preceq$ be a partial order on $X$, and $E$ a subset of $X$. An upper bound of $E$ in $X$ is an element $u \in X$ such that $u \succeq x$ for all $x \in E$. A maximal element of $E$ is an element $m \in E$ such that no $x \in E$ satisfies the relation $x \succ m$.

Definition 6.13. Let $X$ be a nonempty set and let $\preceq$ be a partial order on $X$. A collection $E = \{x_\alpha : \alpha \in I\}$ of elements of $X$ is said to be a chain if $\preceq$ is a linear order on $E$, viz., every pair of elements of $E$ satisfies the trichotomy principle:

We are now ready to state and prove the ever-famous Zorn’s lemma:

Theorem 6.14 (Zorn’s lemma). Let $X$ be a nonempty set with a partial order $\preceq$. If every chain in $X$ has an upper bound in $X$, then $X$ has a maximal element.

Proof. We use transfinite induction to construct a chain in $X$ that contains a maximal element of $X$.

By the axiom of choice (Axiom 4.2), there exists a choice function $c$ on the set $\mathcal{P}(X) \smallsetminus \{\varnothing\}$ of nonempty subsets of $X$. We let $a_0 = c(X)$ and define $a_\alpha$ inductively for each ordinal $\alpha$ by declaring $a_\alpha$ to be an element of $X$ such that $a_\alpha \succ a_\beta$ for each ordinal $\beta \prec \alpha$; we terminate the process when there is no such element $a_\alpha$.

To check that this process continues until the point of termination, we must check the limit ordinal stages. But indeed, if $\alpha$ is a limit ordinal, then $% $ is a chain in $X$, whence by assumption we can find an upper bound $a_\alpha$. We can now find an ordinal $\kappa$ such that no $x \in X$ satisfies the relation $x \succ a_\kappa$. It follows that $a_\kappa$ is a maximal element of $X$. $\square$

Here is an archetypal Zorn-type argument in the context of linear algebra; proofs of this form are humorously referred to as Zornification arguments.

Example 6.15. We prove that every vector space has a basis. In fact, we establish a more general statement:

Let $V$ be a vector space over a field $F$. Every set of linearly independent vectors in $V$ is contained in a basis of $V$.

Fix a set $A$ of linearly independent vectors in $V$. If $A$ is a maximal linearly-independent set, viz., every superset $B$ of $A$ is linearly dependent in $V$, then $A$ is a basis of $V$. Indeed, if there exists a vector $v \in V$ such that no finite linear combination in $A$ equals $v$, then $A \cup \{v\}$ is a linearly independent superset of $A$, contradicting the maximality condition. We therefore assume that $A$ is non-maximal.

Let $(\mathcal{A},\preceq)$ be the collection of all linearly independent supersets of $A$ in $V$, ordered by the set-inclusion relation: $B \preceq C$ if and only if $B \subseteq C$. If $\{B_\alpha : \alpha \in I\}$ is a chain in $\mathcal{A}$, then $\bigcup_{\alpha \in I} B_\alpha$ is an upper bound of $\{B_\alpha : \alpha \in I\}$ in $\mathcal{A}$.

It now follows from Zorn’s lemma that there exists a maximal element $B$ of $\mathcal{A}$, which, by construction, is a maximal linearly-independent subset of $V$. $\square$

Transfinite induction arguments are often useful in comparing cardinalities of sets as well. We present two examples that makes use of the Cantor–Bernstein theorem (Theorem 5.9).

Example 6.16. Let $D$ and $E$ be infinite sets such that $D \cap E = \varnothing$. We show that $\vert D \cup E \vert$ equals $\vert D \vert$ if $\vert D \vert \geq \vert E \vert$, and $\vert E \vert$ if $\vert E \vert \geq \vert D \vert$.

We assume without loss of generality that $\vert E \vert \geq \vert D\vert$ and define a collection $\mathcal{A}$ of pairwise-disjoint collections $\{A_\alpha : \alpha \in I\}$ of subsets of $E$ such that $\vert A_\alpha \vert = \vert \mathbb{N} \vert$ for all $\alpha \in I$. We define a partial order $\preceq$ on $\mathcal{A}$ by declaring that $\{A_\alpha : \alpha \in I\} \preceq \{B_\beta : \beta \in J\}$ if and only if $I \subseteq J$ and $A_\alpha \subseteq B_\alpha$ for all $\alpha \in I$.

Similarly as in Example 6.15, each chain in $\mathcal{A}$ admits an upper bound, whence Zorn’s lemma furnishes a maximal collection $\{A_\alpha : \alpha \in I_*\}$ in $\mathcal{A}$.

We claim that $\bigcup_{\alpha \in I_*} A_\alpha = E$. If not, then $F = E \smallsetminus (\bigcup_{\alpha \in I_*} A_\alpha)$ is nonempty.

We note that $F$ must be finite. If $F$ is infinite, then $\vert F \vert \geq \vert \mathbb{N} \vert$, and so there exists a countably infinite subset $F'$ of $F$. Since $\{F\} \cup \{A_\alpha : \alpha \in I_*\}$ is a pairwise-disjoint collection that contains $\{A_\alpha : \alpha \in I_*\}$, the maximality condition is violated.

$F$ is finite, and so $\vert F \cup A_\alpha \vert = \vert A_\alpha \vert = \vert \mathbb{N} \vert$ for all $\alpha \in I_*$. Fixing $\alpha_0 \in I_*$ and defining

we see that $\{A_\alpha' : \alpha \in I_*\} \succeq \{A_\alpha : \alpha \in I_*\}$. The maximality condition is once again violated, and so we must conclude that $\bigcup_{\alpha \in I_*} A_\alpha = E$.

For each $\alpha \in I_*$, we find a bijection $b_\alpha:\mathbb{N} \to A_\alpha$. We can now construction a bijection $f:\mathbb{N} \times I_* \to E$ by setting $f(n,\alpha) = b_\alpha(n)$. That $f$ is a bijection follows from the fact that $\{A_\alpha : \alpha \in I_*\}$ is a partition of $E$.

We now let $\mathscr{E}$ and $\mathscr{O}$ be the sets of even and odd natural numbers, respectively, and let $e:\mathscr{E} \to \mathbb{N}$ and $o:\mathscr{O} \to \mathbb{N}$ be bijections. By setting $g(n,\alpha) = f(e(n),\alpha)$ and $h(m,\beta) =f(o(m),\beta)$, we obtain two bijections $g:\mathscr{E} \times I_* \to E$ and $h:\mathscr{O} \times I_* \to E$.

We now let $d:D \to E$ be an injection given by the relation $\vert D \vert \leq \vert E \vert$. We have constructed the following sequences of injections:

We define $h:D \cup E \to \mathbb{N} \times I_*$ by setting

$h$ is well-defined since $D \cap E = \varnothing$. The disjointness condition also implies that $h$ is injective, since both $h| _D$ and $h\vert_E$ are injective. It now follows that the composite function $f \circ h:D \cup E \to E$ is injective, and so $\vert D \cup E \vert \leq \vert E \vert$.

The reverse relation $\vert E \vert \leq \vert D \cup E \vert$ is obvious, whence we conclude from the Cantor–Bernstein theorem that $\vert D \cup E \vert = \vert E \vert$. $\square$

Example 6.17. Let $X$ be an infinite set. We generalize Example 5.3 and show that $\vert X \times X \vert = \vert X \vert$. Let us define a collection

This collection is nonempty. Since $X$ is an infinite set, we see that $\vert X \vert \geq \vert \mathbb{N} \vert$, whence there is a countably infinite subset $D_0$ of $X$. We have seen in Example 5.3 that $\vert D_0 \times D_0 \vert = \vert D_0 \vert$, and so $\mathcal{A}$ is nonempty.

Let us now define a partial order on $\mathcal{A}$ by declaring $(a,D) \preceq (a',D')$ if and only if $D \subseteq D'$ and $a = a'\vert_D$, viz., if $a'$ is an extension of $a$. if  If $\{(a_\alpha,D_\alpha) : \alpha \in I\}$ is a chain in $\mathcal{A}$, then the ordered pair $(a,D_\infty)$ defined by setting $D_\infty = \bigcup_{\alpha \in I} D_\alpha$ and

is an upper bound of the chain. Note that the extension property in the definition of $\preceq$ guarantees that $f$ is well-defined. We now apply Zorn’s lemma to construct a maximal ordered pair $(A,D)$ in $\mathcal{A}$.

It suffices to check that $D = X$. We suppose for a contradiction that $D \neq X$. If $\vert D \vert = \vert X \vert$, then we can find a bijection $b:X \to D$, whence we can define an extension $A':X \times X \to X$ of $A$ by setting $A'(x,y) = (b^{-1} \circ A)(b(x),b(y))$. As $A'$ is a bijection, the maximality condition on $(A,D)$ is violated.

We therefore assume that $% $.

Let $E = X \smallsetminus D$. We must have that $\vert E \vert = \vert X \vert$. Indeed, if $% $, then, by Example 6.16, the set $X = D \cup E$ is equinumerous to either $D$ or $E$, which is absurd.

Let us now fix a subset $E'$ of $E$ such that $\vert E' \vert = \vert D \vert$. Since $D$ and $E$ are disjoint, we see that $E' \times E'$, $E' \times D$, and $D \times E'$ are pairwise disjoint. It now follows from Example 6.16 that

is equinumerous to $E' \times E'$. Recalling that $\vert E' \vert = \vert D \vert$ and $\vert D \vert = \vert D \times D \vert$, we conclude that $\vert G \vert = \vert E' \vert$. Let $g:E' \to G$ be a corresponding bijection.

We now define a bijection $h:D \cup E' \to (D \times D) \cup G$ by setting

$h$ is well-defined, as $D \cap E' = \varnothing$. The disjointness condition also implies that $h$ is a bijection, as $h\vert_D:D \to D \times D$ and $h\vert_{E'}:E' \to G$ are bijections.

Since $(D \times D) \cup G = (D \cup E) \times (D \cup E)$, it follows that $(h, D \cup E) \succeq (A,D)$. contradicting the maximality of $(A,D)$. We therefore conclude that $D = X$, and so $\vert X \times X \vert = \vert X \vert$.

As an application of the equinumerosity statement, we now show that $\left\vert X^X\right \vert = \left\vert \{0,1\}^X\right \vert$. By fixing two distinct elements $x_0,x_1 \in X$, we can construct an injection $\phi:\{0,1\}^X \to X^X$ by setting

Our goal is to construct an injection in the opposite direction and apply the Cantor–Bernstein theorem. To this end, we shall show that

$\left\vert X^X\right\vert \leq \left\vert \mathcal{P}(X)^X \right\vert = \left\vert (\{0,1\}^X)^X \right\vert = \left\vert \{0,1\}^{X \times X} \right\vert = \left\vert \{0,1\}^X \right\vert$.

To this end, we first construct a bijection $\varphi:\mathcal{P}(X) \to \{0,1\}^X$ by defining, for each $E \subseteq X$, the value of $\varphi(E)$ to be the function $f_E:X \to \{0,1\}$ defined by the formula

$\varphi$ is a bijection: given $f_E \in X \in \{0,1\}^X$, we can recover the set $E$ by considering the preimage $f^{-1}_E(\{1\})$.

Fix $x_* \in E$; the function $\Psi:X^X \to \mathcal{P}(X)^X$ that is defined by the formula $\Psi(f)(x) = \{f(x)\}$ is an injection. Therefore, $\left\vert X^X\right\vert \leq \left\vert \mathcal{P}(X)^X \right\vert$.

The function $\psi:\mathcal{P}(X)^X \to (\{0,1\}^X)^X$ that sends $f:X \to \mathcal{P}(X)$ to the composite function $\varphi \circ f:X \to \{0,1\}^X$ is a bijection. Indeed, $f = \varphi^{-1} \circ \psi(f)$. Therefore, $\left\vert \mathcal{P}(X)^X \right\vert = \left\vert (\{0,1\}^X)^X \right\vert$.

Now, we construct a bijection $\Gamma:(\{0,1\}^X)^X \to \{0,1\}^{X \times X}$ as follows. An element $f$ of $(\{0,1\}^X)^X$ is a function that sends $x \in X$ to a function $f(x):X \to \{0,1\}$, so that $f(x)(y) \in \{0,1\}$ for all $x,y \in X$. It is then natural to set $\Gamma(f)(x,y) = f(x)(y)$ for each ordered pair $(x,y) \in X \times X$. We note that $\Gamma$ is a bijection: given $g \in \{0,1\}^{X \times X}$, we can recover a $\{0,1\}^X$-valued function $f$ on $X$ by setting $f(x)(y) = g(x,y)$. It follows that $\left\vert (\{0,1\}^X)^X \right\vert = \left\vert \{0,1\}^{X \times X} \right\vert$.

To prove the relation $\left\vert\{0,1\}^{X \times X}\right\vert = \left\vert \{0,1\}^X \right\vert$, we take the bijection $A:X \times X \to X$ that we have constructed above. The function $\Lambda:\{0,1\}^{X \times X} \to \{0,1\}^X$ that sends $f:X \times X \to \{0,1\}$ to the composite function $f \circ A^{-1}:X \to \{0,1\}$ is a bijection, as $f = (f \circ A^{-1}) \circ A$. We have thus produced the following sequence of injections:

The composite function $\Lambda \circ \Gamma \circ \psi \circ \Psi$ is an injection from $X^X$ to $\{0,1\}^X$, and so $\vert X^X \vert \leq \vert \{0,1\}^X \vert$. Since we have already shown that $\vert \{0,1\}^X \vert \leq \vert X^X \vert$, it follows from the Cantor–Bernstein theorem that $\vert \{0,1\}^X \vert = \vert X^X \vert$, as was to be shown. Since we have also shown that $\vert \{0,1\}^X \vert = \vert \mathcal{P}(X) \vert$, we now know that the relation $\vert \mathcal{P}(X) \vert = \vert X^X \vert$ holds as well. $\square$

Exercise 6.18. Modify the above proof to show that $\vert X^Y \vert = \vert\mathcal{P}(Y)\vert$, provided that $\vert Y \vert \geq \vert\mathbb{N}\vert$ and $\vert\{0,1\}\vert \leq \vert X \vert \leq \vert \mathcal{P}(Y) \vert$.

## 7. Additional Remarks and Further Results

7.1. The approach to set theory in this post is naïve in the sense that several assumptions are made in order to bypass the technical details and to streamline the exposition. All assumptions made in this post are provable facts in the standard framework of axiomatic set theory;  an excellent exposition of axiomatic set theory at the undegraduate level can be found in the textbook of Hrbacek and Jech.

7.2. We cheated our way through the theory of ordinals by simply assuming the existence of many, many ordinals. In particular, we have completely bypassed the recursion theorem, which validates certain recursive constructions that we have tacitly made use of. See Capter 6 of Hrbacek and Jech.

7.3. We also did not cover the theory of cardinal numbers at all. Since the equinumerosity relation is an equivalence relation, we can declare one set from each “equivalence class” to be the cardinal number representing the size of the sets in the equivalence class. Alternatively, we can define certain ordinal numbers to be cardinal numbers: see the wikipedia article on aleph numbers. An important topic in the theory of cardinal numbers is cardinal arithmetic, which we have sampled in Example 6.16 and Example 6.17. See Chapters 7 and 9 of Hrbacek and Jech.

7.4. The diagonal argument in Example 5.6 can be generalized to prove Cantor’s theorem: $% $ for every set $X$. This, in particular, shows that there are many, many cardinal numbers. The question of whether there is a set $Y$ such that $% $ leads us to the continuum hypothesis, which is a foundational problem in the theory of set-theoretic independence. My blog post on the Kirby-Paris Hydra game provides an introduction to the independence issues in set theory; see Section 6 of the post for further references.

7.5. The two main consequences of the axiom of choice in the post—the well-ordering principle and Zorn’s lemma—are provably equivalent to the axiom of choice. The proof that either of the two results implies the axiom of choice can be found in Chapter 8 of Hrbacek and Jech. On the other hand, it is not known whether the partition principle (Proposition 5.8) implies the axiom of choice: see the discussion on Math.SE.

7.6. We omitted an important property of the canonical surjection map $p:X \to X/\sim$ with respect to an equivalence relation $\sim$. The universal property of quotient maps states that if $f:X \to Y$ is a function such that $f(x) = f(x')$ whenever $x \sim x'$, then there exists a unique injective function $\bar{f}:X/\sim \to Y$ such that $f = \bar{f} \circ p$. Variants of this property appear in the context of groups, rings, vector spaces, fields, and other algebraic objects. My blog post on commutative diagrams contains the statement of the universal property of quotient maps in the context of group theory.

Thanks to Colin Beeh, John Ryan, and LDH for corrections!

Tags:

Categories:

Updated: