# Basic Sorting Algorithms

What does it mean to **sort** a list of objects? Most objects of numeric type can be ordered by the usual ordering of numbers, which certainly qualifies as sorting. Non-numeric objects can also be ordered in various ways. For example, strings can be ordered by the lexicographical order, which compares one letter at a time using the usual ordering of letters in the alphabet.

It is, then, reasonable to assert that we can talk about sorting on any class of objects that comes with the binary comparison operators <, <=, >, >=, ==, and !=. (Formally, such a class is called a totally ordered set.) While there are some sorting algorithms that do not make use of the ordering on a class of objects, most standard sorting algorithms are, indeed, **comparison-based**.
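For instance, Python's built-in comparison operators already furnish such a total order on strings, comparing one character at a time:

```python
# Lexicographic comparisons on strings, one character at a time.
print("apple" < "banana")   # True: 'a' precedes 'b'
print("apple" < "apples")   # True: a prefix precedes its extensions
print("Zoo" < "apple")      # True: uppercase letters precede lowercase in Unicode
```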

# Quadratic-time Sorting Algorithms

We begin with an analysis of a few sorting algorithms, which are simple but inefficient.

## 1. Bubble Sort

The first algorithm we consider is **bubble sort**. The bubble sort algorithm sorts a list by swapping two adjacent terms as needed.

```
def bubble_sort(L):
    n = len(L)
    while True:
        swapped = False
        for i in range(1, n):
            if L[i-1] > L[i]:
                L[i-1], L[i] = L[i], L[i-1]
                swapped = True
        if not swapped:
            break
    return L
```

### 1.1. Correctness Proof

To see that the above algorithm is correct, we fix a list L of length $n$ and assume that

$$x_0 \le x_1 \le \cdots \le x_{n-1}$$

are the elements of L, sorted. We shall show that, after $k$ iterations of the `while` loop, we have $L[i] = x_i$ for all $i \ge n-k$. It then follows that L is sorted after at most $n$ iterations of the `while` loop, and one more iteration results in the termination of the algorithm.

We prove the assertion via mathematical induction.

We assume for now that $k = 1$. If $L[i-1] > L[i]$ for some $i$, then $L[i-1]$ and $L[i]$ are swapped. In particular, once the `for` loop reaches the largest element of L, the same logic makes a swap at each subsequent index, carrying the largest element to the end and resulting in $L[n-1] = x_{n-1}$ at the end of the first iteration of the `while` loop.

We now assume inductively that, for a fixed $k \ge 1$, the first $k$ iterations of the `while` loop yielded $L[i] = x_i$ for all $i \ge n-k$. During the $(k+1)$st iteration of the `while` loop, the argument from the base case, applied to the sublist L[:n-k], shows that the maximum of L[:n-k] is carried to index $n-k-1$, so that $L[n-k-1] = x_{n-k-1}$ at the end of the iteration. Since $x_{n-k-1} \le x_{n-k} \le L[n-k]$, no further swaps are made past index $n-k-1$. It follows that $L[i] = x_i$ for all $i \ge n-k-1$ at the end of the $(k+1)$st iteration of the `while` loop.

The desired result now follows from mathematical induction.
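To make the invariant concrete, here is a single pass of the `while` loop body, written as a hypothetical stand-alone helper; each pass carries the maximum of the unsorted prefix to its final position:

```python
def one_pass(L):
    """One pass of bubble sort: returns True if any swap was made."""
    swapped = False
    for i in range(1, len(L)):
        if L[i - 1] > L[i]:
            L[i - 1], L[i] = L[i], L[i - 1]
            swapped = True
    return swapped

L = [3, 1, 4, 2]
one_pass(L)
print(L)  # [1, 3, 2, 4] -- the maximum, 4, has bubbled to the end
one_pass(L)
print(L)  # [1, 2, 3, 4]
```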

### 1.2. Time Complexity Analysis

We have shown above that the algorithm terminates after at most $n+1$ iterations of the `while` loop. During the $k$th iteration of the `while` loop, the `for` loop performs $n-1$ comparisons, and swaps are made only within the sublist L[:n-k+1]. Therefore, at most $(n-1) + c(n-k+1)$ steps of computation are executed during the $k$th iteration of the `while` loop, where $c$ is the number of steps of computation required to execute the swapping `if` block. Let's say $c = 3$, as swapping typically takes three steps of computation: save the value of L[i-1] in an auxiliary variable t; set the value of L[i-1] to be the value of L[i]; set the value of L[i] to be the value of t.

It follows that at most

$$\sum_{k=1}^{n+1} \bigl( (n-1) + c(n-k+1) \bigr) = (n+1)(n-1) + \frac{cn(n+1)}{2} = O(n^2)$$

steps of computation are executed during the full execution of `bubble_sort()`.

To see that $O(n^2)$ is the optimal upper bound, we consider the list

$$L = [n, n-1, \ldots, 2, 1],$$

the list of integers from $1$ to $n$ in reverse order. In this case, the conditional L[i-1] > L[i] is true at every iteration, whence the upper bound calculated above is achieved.

To see that the algorithm does not always run in $\Theta(n^2)$ time, we consider the list

$$L = [1, 2, \ldots, n],$$

the list of integers from $1$ to $n$, ordered. In this case, no swap is made, and so the `while` loop terminates after one iteration. Since the `for` loop runs from $1$ to $n-1$, we conclude that the algorithm terminates in $O(n)$ time for this list.

### 1.3. Space Complexity Analysis

All that is needed is an auxiliary variable to facilitate the swapping, and so the space complexity is bounded above by $O(1)$.

### 1.4. In-Place Algorithm

We say that an algorithm is **in place** if it can be executed using minimal additional storage beyond saving the input data. The definition of *minimal* is context-dependent, but an algorithm with a space complexity of $O(1)$ should always be considered in-place. This means, in particular, that bubble sort is in-place.

## 2. Selection Sort

We now consider **selection sort**, which allows for non-adjacent swaps.

```
def selection_sort(L):
    n = len(L)
    for i in range(n):
        min_index = i
        for j in range(i, n):
            if L[j] < L[min_index]:
                min_index = j
        L[i], L[min_index] = L[min_index], L[i]
    return L
```

### 2.1. Correctness Proof

To see that the above algorithm is correct, we fix a list L of length $n$ and assume that

$$x_0 \le x_1 \le \cdots \le x_{n-1}$$

are the elements of L, sorted. We shall show that, after $k$ iterations of the outer `for` loop, we have $L[i] = x_i$ for all $i < k$. It then follows that L is sorted after the termination of the outer `for` loop.

We first note that the iteration of the outer `for` loop with index $i$ searches for the minimum value among $L[i], L[i+1], \ldots, L[n-1]$ and places it at L[i], while leaving L[:i] unchanged. Therefore, the first iteration of the outer `for` loop yields the identity

$$L[0] = x_0,$$

which remains true thereafter.

We now fix $k \ge 1$ and assume inductively that the first $k$ iterations of the outer `for` loop yielded the identity

$$L[i] = x_i$$

for each $i < k$. This, in particular, implies that

$$\min(L[k], L[k+1], \ldots, L[n-1]) = x_k.$$

We then see that the $(k+1)$st iteration of the outer `for` loop yields the identity $L[k] = x_k$, while leaving L[:k] unchanged. The desired result now follows from mathematical induction.

### 2.2. Time Complexity Analysis

During the iteration of the outer `for` loop with index $i$, the conditional

`L[j] < L[min_index]`

is checked $n-i$ times. Therefore, the algorithm goes through at least

$$\sum_{i=0}^{n-1} (n-i) = \frac{n(n+1)}{2} = \Omega(n^2)$$

steps of computation.

On the other hand, `min_index` is altered at most $n-i-1$ times during the iteration of the outer `for` loop with index $i$. Moreover, the swapping step

`L[i], L[min_index] = L[min_index], L[i]`

adds three steps of computation to each iteration of the outer `for` loop. It follows that `selection_sort()` goes through at most

$$\sum_{i=0}^{n-1} \bigl( 2(n-i) + 3 \bigr) = n(n+1) + 3n = O(n^2)$$

steps of computation.

We conclude that the algorithm runs in $\Theta(n^2)$ time.

### 2.3. Space Complexity Analysis

We need two auxiliary variables: `min_index`, and another to facilitate swapping. It follows that the space complexity is bounded above by $O(1)$.

## 3. Insertion Sort

Unlike bubble sort and selection sort, **insertion sort** makes use of insertion, which is an O(n) operation. Despite this drawback, insertion sort tends to be more efficient than bubble sort or selection sort in practice. Moreover, insertion sort also satisfies a number of nice properties: adaptive, stable, in-place, online. We shall discuss these properties in due course.

```
def insertion_sort(L):
    n = len(L)
    for i in range(1, n):
        x = L[i]
        j = i - 1
        while (j >= 0) and (L[j] > x):
            L[j+1] = L[j]
            j = j - 1
        L[j+1] = x
    return L
```

### 3.1. Online Algorithm

We say that an algorithm is online if the algorithm does not require all information about the input before it executes. In other words, an online algorithm can process the input data *serially*, i.e., as it comes in.

Why is insertion sort an online algorithm? During the $k$th iteration of the `for` loop, we see that the loop reads and modifies only the sublist `L[:k+1]`. Therefore, to execute `insertion_sort()`, it suffices to know one extra term of `L` at each iteration of the `for` loop.
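As an illustration (the helper name `online_insert()` is ours, not part of the algorithm above), insertion sort can consume its input serially, maintaining a sorted prefix at all times:

```python
def online_insert(sorted_prefix, x):
    """Insert a newly arrived term x into an already-sorted list."""
    sorted_prefix.append(x)
    j = len(sorted_prefix) - 2
    # shift larger terms to the right, exactly as in insertion_sort()
    while j >= 0 and sorted_prefix[j] > x:
        sorted_prefix[j + 1] = sorted_prefix[j]
        j -= 1
    sorted_prefix[j + 1] = x
    return sorted_prefix

stream = [5, 2, 8, 1]
L = []
for x in stream:  # terms arrive one at a time
    online_insert(L, x)
print(L)  # [1, 2, 5, 8]
```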

### 3.2. Correctness Proof

To see that the above algorithm is correct, we fix a list L. We shall show that the string of inequalities

$$L[0] \le L[1] \le \cdots \le L[k]$$

holds after $k$ iterations of the `for` loop. It follows that L is sorted after the `for` loop is terminated.

During the first iteration of the `for` loop, $x = L[1]$ and $j = 0$. If $L[0] > L[1]$, then the `while` conditional holds, and the `while` loop swaps $L[0]$ and $L[1]$. If $L[0] \le L[1]$, then the `while` conditional fails, and the `while` loop is bypassed. In either case, we end up with $L[0] \le L[1]$ at the end of the first iteration.

We now fix $k \ge 1$ and assume inductively that

$$L[0] \le L[1] \le \cdots \le L[k]$$

holds after $k$ iterations of the `for` loop. During the $(k+1)$st iteration of the `for` loop, with $x = L[k+1]$, the `while` loop goes through the ordered sublist L[:k+1] in reverse order and shifts each term to the right until an index $j$ such that

$$L[j] \le x$$

is found. At this point, $x$ is inserted in between $L[j]$ and the terms that were shifted, and the `for` loop moves onto its next iteration. It now follows from the inductive hypothesis that

$$L[0] \le L[1] \le \cdots \le L[k+1]$$

at the end of the $(k+1)$st iteration of the `for` loop. The desired result now follows from mathematical induction.

### 3.3. Stable Sorting Algorithm

We say that a sorting algorithm is **stable** if every pair of terms with the same value remains in the same order after the sorting process terminates. Insertion sort is stable, as each term is moved to the left without crossing any term with the same value.

Stability is a desirable property when a data set needs to be sorted in accordance with more than one ordering. For example, if we are to sort a list of names, in the alphabetical order, by last name and then by first name, we can run a stable sort on first names and then run another stable sort on last names. Running an unstable sort twice might mix up the order established by the first pass.
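For instance, Python's built-in `sorted()` is guaranteed to be stable, so the two-pass scheme can be sketched on some hypothetical name data as follows:

```python
# Hypothetical (first name, last name) records.
people = [("John", "Smith"), ("Ada", "Lovelace"),
          ("Jane", "Smith"), ("Alan", "Turing")]

# Stable sort on the secondary key (first name) first...
by_first = sorted(people, key=lambda p: p[0])
# ...then a stable sort on the primary key (last name).
by_last_then_first = sorted(by_first, key=lambda p: p[1])

print(by_last_then_first)
# [('Ada', 'Lovelace'), ('Jane', 'Smith'), ('John', 'Smith'), ('Alan', 'Turing')]
```

The two Smiths end up ordered by first name precisely because the second sort never reorders terms that compare equal on last name.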

### 3.4. Time Complexity Analysis

During the $k$th iteration of the `for` loop, the `while` loop runs through at most $k$ iterations. It follows that `insertion_sort()` goes through at most

$$c \sum_{k=1}^{n-1} k = \frac{cn(n-1)}{2} = O(n^2)$$

steps of computation, where $c$ is the number of steps required for each shift.

To see that $O(n^2)$ is the optimal upper bound, we consider the list

$$L = [n, n-1, \ldots, 2, 1],$$

the list of integers from $1$ to $n$ in reverse order. During each iteration of the `for` loop, the `while` loop must run through the maximum number of iterations to perform insertion at the right spot. It follows that the $\Theta(n^2)$ runtime is achieved in sorting L.

To see that the algorithm does not always run in $\Theta(n^2)$ time, we consider the list

$$L = [1, 2, \ldots, n],$$

the list of integers from $1$ to $n$, ordered. In this case, the `while` conditional is never satisfied, and no insertion is performed. Therefore, each iteration of the `for` loop contributes a constant number of steps of computation, and so `insertion_sort()` sorts L in $O(n)$ time.

### 3.5. Runtime Comparison: Bubble Sort, Selection Sort, and Insertion Sort

Even though all three sorting algorithms have the same optimal upper bound of $O(n^2)$, they have different running times in practice.

```
import random
L1 = random.sample(range(100000),1000)
L2 = random.sample(range(100000),1000)
L3 = random.sample(range(100000),1000)
```

```
%%timeit
bubble_sort(L1[:])
bubble_sort(L2[:])
bubble_sort(L3[:])
```

```
1 loop, best of 3: 226 ms per loop
```

```
%%timeit
selection_sort(L1[:])
selection_sort(L2[:])
selection_sort(L3[:])
```

```
10 loops, best of 3: 61.5 ms per loop
```

```
%%timeit
insertion_sort(L1[:])
insertion_sort(L2[:])
insertion_sort(L3[:])
```

```
10 loops, best of 3: 65.5 ms per loop
```

As you can see above, both selection sort and insertion sort perform far better than bubble sort. This is to be expected, as our computations show bubble sort requires approximately 5 times as many steps of computation as selection sort or insertion sort. The definition of a *step of computation* we use here is vague, and our computations do not reflect the precise amount of time it takes to run the algorithms. Nevertheless, we are able to gather useful approximations from them.

### 3.6. Adaptive Sorting Algorithm

Between selection sort and insertion sort, which performs better in general?

Intuitively, the answer should be insertion sort: selection sort always runs in $\Theta(n^2)$ time, whereas insertion sort can run even in $O(n)$ time, depending on the input.

The underlying reason for this disparity is that selection sort does not take into account the order already present in the input data, whereas insertion sort does. Evidently, selection sort does not care how ordered the input data is, as its running time is always $\Theta(n^2)$.

To see that insertion sort is sensitive to the order present in the input data, we define an **inversion** in a list $L$ to be a pair of indices $(i, j)$ such that $i < j$ but $L[i] > L[j]$. We can think of the total number of inversions as a measure of disorder in $L$. If $L$ is ordered, then there are no inversions in $L$. If the terms of $L$ are arranged in reverse order, then there are $\binom{n}{2} = \frac{n(n-1)}{2}$ inversions in $L$, where $n$ is the length of the list $L$.

Note that, during the $k$th iteration of the `for` loop, the `while` conditional `(j >= 0) and (L[j] > x)` is satisfied once for each inversion of the form $(j, k)$. Therefore, the number of steps of computation `insertion_sort()` performs is

$$O(n + I(L)),$$

where $I(L)$ denotes the total number of inversions in the list $L$.

We say that a sorting algorithm is **adaptive** if the algorithm can make use of the order already present in the input data to speed up the sorting process. The number of inversions is not the only measure of order in a list: see Estivill-Castro/Wood, “A survey of adaptive sorting algorithms” for details.
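The claims about ordered and reversed lists can be checked with a brute-force inversion counter (a throwaway sketch, not part of insertion sort itself):

```python
def count_inversions(L):
    """Count pairs (i, j) with i < j and L[i] > L[j] by brute force, in O(n^2) time."""
    n = len(L)
    return sum(1 for i in range(n) for j in range(i + 1, n) if L[i] > L[j])

print(count_inversions([1, 2, 3, 4]))  # 0: an ordered list has no inversions
print(count_inversions([4, 3, 2, 1]))  # 6: a reversed list has n(n-1)/2 of them
print(count_inversions([1, 3, 2, 4]))  # 1: a single out-of-order pair
```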

### 3.7. Space Complexity Analysis

We keep two auxiliary variables: `x` and `j`. It follows that the space complexity is bounded above by $O(1)$, and so insertion sort is an in-place algorithm.

# Sorting Algorithms

Note that sorting a list $L$ of $n$ terms entails choosing one out of the $n!$ possible arrangements of the terms. By choosing two terms $L[i]$ and $L[j]$ of $L$ and comparing them, we can eliminate half of the possibilities. Indeed, if $L[i] < L[j]$, then no arrangement of the terms with $L[j]$ placed before $L[i]$ can be the sorted list; if $L[i] > L[j]$, then no arrangement of the terms with $L[i]$ placed before $L[j]$ can be the sorted list. Repeatedly applying this line of argument shows that we need to perform at least $\log_2 n!$ many comparisons to find the sorted list. Stirling's approximation

$$n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n$$

now implies that we need to perform at least

$$\log_2 n! = \Omega(n \log n)$$

many comparisons to find the sorted list.

Below, we shall study two sorting algorithms with the runtime of $O(n \log n)$. We shall also study an algorithm that, despite its $\Theta(n^2)$ worst-case runtime, often runs faster than the two $O(n \log n)$ sorting algorithms.

## 4. Merge Sort

Unlike the sorting algorithms we have considered so far, merge sort approaches sorting from a recursive angle. If a list consists of two sorted sublists, then it is not too difficult to *merge* them into an ordered list. In light of this, we can recursively divide a list into sublists, order them, and then merge them back to obtain the ordered list. This algorithm is spelled out below:

```
import math

def merge_sort(L):
    n = len(L)
    if n <= 1:  # also guards against the empty list
        return L
    else:
        midpoint = math.floor(n/2)
        left, right = L[:midpoint], L[midpoint:]
        return merge(merge_sort(left), merge_sort(right))

def merge(left, right):
    l, m = len(left), len(right)
    i, j = 0, 0
    merged = []
    while i < l and j < m:
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    if i < l:
        merged += left[i:]
    else:
        merged += right[j:]
    return merged
```

### 4.1. Correctness Proof

Note that `merge_sort()` returns a list without recursively calling itself only if the input list is of length at most 1. In other words, `merge_sort(L)` splits the list into sublists of length 1, and then calls `merge()` to merge them back. We shall show that, given two sorted lists `left` and `right`, the list `merge(left, right)` is sorted as well. Since all lists of length 1 are trivially sorted, this result implies the correctness of `merge_sort()`.

Assume that `left` and `right` are sorted lists. The `while` loop in `merge()` checks the smallest remaining terms of both `left` and `right` and moves the smaller one into `merged`. This is continued until we run out of terms of either `left` or `right`. Evidently, `merged` is a sorted list.

From there, whatever remains of `left` or `right` is appended to `merged`. By construction, the remaining terms are larger than all the terms in `merged`. Therefore, appending the remaining terms, which are sorted by assumption, to the sorted list `merged` results in a sorted list. This completes the proof.

### 4.2. Time Complexity Analysis

Let $T(n)$ denote the worst-case runtime of `merge_sort()` on a list of length $n$. Observe that

$$T(n) \le 2\,T(n/2) + M(n) + c,$$

where $M(n)$ is the number of steps of computation it takes to execute `merge()` on two lists whose lengths sum to $n$, and $c$ is a constant. The upper bound comes from the following facts:

- `merge_sort()` contains 7 lines of code.
- `merge_sort()` on a list of length $n$ calls `merge_sort()` on lists of length $n/2$ twice.
- `merge_sort()` calls `merge()` on lists of length $n/2$ once.

Now, `merge()` merely iterates through `left` and `right`, so `merge()` runs in $O(n)$ time. Therefore, we have the inequality

$$T(n) \le 2\,T(n/2) + O(n).$$

Applying the inequality repeatedly, we obtain the following:

$$T(n) \le 2\,T(n/2) + O(n) \le 4\,T(n/4) + O(n) + O(n) \le \cdots \le 2^{\log_2 n}\,T(1) + O(n)\log_2 n = O(n \log n).$$

Since $\Omega(n \log n)$ is the universal lower bound for comparison-based sorting algorithms, we conclude that the time complexity of `merge_sort()` is $\Theta(n \log n)$.

### 4.3. Is Merge Sort Always Fast?

Since merge sort always runs in $\Theta(n \log n)$ time, it is not an adaptive sorting algorithm. The runtime of insertion sort can be as low as $O(n)$, and so we expect insertion sort to run faster than merge sort for lists that are largely sorted.

```
%%timeit
insertion_sort(list(range(100000)))
```

```
100 loops, best of 3: 12.6 ms per loop
```

```
%%timeit
merge_sort(list(range(100000)))
```

```
10 loops, best of 3: 187 ms per loop
```

Moreover, the quadratic-time sorting algorithms can run faster than merge sort on small arrays.

```
%%timeit
insertion_sort(list(range(10,0,-1)))
```

```
100000 loops, best of 3: 4.79 µs per loop
```

```
%%timeit
selection_sort(list(range(10,0,-1)))
```

```
100000 loops, best of 3: 5.41 µs per loop
```

```
%%timeit
merge_sort(list(range(10,0,-1)))
```

```
100000 loops, best of 3: 9.98 µs per loop
```

This is because asymptotic analysis is useful only for **large** $n$.

What do we mean by this?

Given two positive constants $c_1$ and $c_2$, we observe that

$$\lim_{n \to \infty} \frac{c_1 n \ln n}{c_2 n^2} = \lim_{n \to \infty} \frac{c_1 \ln n}{c_2 n} = 0,$$

and so there exists a number $N$ such that

$$c_1 n \ln n < c_2 n^2$$

whenever $n \ge N$. This is what we mean when we say $n^2$ asymptotically dominates $n \ln n$.

On the other hand, the function $n \mapsto \frac{n}{\ln n}$ attains its minimum value $e$ at $n = e$ and is an increasing function on $[e, \infty)$. (Take this for granted if you don't know differential calculus.) Therefore, given any integer $N_0 \ge 3$, so long as

$$\frac{c_1}{c_2} > \frac{N_0}{\ln N_0},$$

we have

$$\frac{n}{\ln n} < \frac{c_1}{c_2}, \quad \text{i.e.,} \quad c_2 n^2 < c_1 n \ln n \quad \text{whenever } 3 \le n \le N_0.$$

In other words, if the constant $c_1$ is sufficiently larger than $c_2$, it is possible for an $O(n^2)$ algorithm to run faster than an $O(n \log n)$ algorithm for a small $n$.
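A toy numerical check, with hypothetical constants $c_1 = 8$ and $c_2 = 1$ hidden in the two bounds, locates such a crossover:

```python
import math

c1, c2 = 8.0, 1.0  # hypothetical constants hidden by the big-O bounds
for n in [2, 16, 64, 256]:
    faster = "n log n" if c1 * n * math.log2(n) < c2 * n ** 2 else "n^2"
    print(n, faster)
# For n = 2 and n = 16 the "quadratic" cost is smaller;
# by n = 64 the n log n cost has taken over.
```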

### 4.4. Space Complexity Analysis

`merge()` builds an auxiliary list `merged`, which can be as long as the sum of the lengths of the two sublists `left` and `right`. Beyond this, there are only a few auxiliary variables, and so we conclude that merge sort requires $O(n)$ memory space to run.

It is possible to construct an in-place variant of merge sort, but the currently existing algorithms are rather complicated. See Denham Coates-Evelyn, "In-Place Merging Algorithms" for details.

## 5. Heapsort

We now address the space complexity problem of merge sort by introducing an in-place $O(n \log n)$-time sorting algorithm, commonly known as **heapsort**.

### 5.1. Graphs and Trees

In order to describe heapsort, we need to introduce a few mathematical concepts.

A **graph** consists of a set of **vertices** (also known as **nodes**) and a set of **edges** between pairs of vertices. We say that a graph is **connected** if, for each pair of vertices $v$ and $w$, there exists a sequence of edges

$$\{v, v_1\}, \{v_1, v_2\}, \ldots, \{v_{k-1}, w\}$$

that connects $v$ and $w$.

A **cycle** is a sequence of edges that starts and ends at the same vertex.

We say that a graph is a **tree** if it is connected and contains no cycles. By picking a vertex of a tree to be the **root**, we can create a **rooted tree**.

A rooted tree $T$ can be oriented by thinking of the root as the *top* vertex of the graph, with branches going downwards. Specifically, we measure the distance from the root to a vertex by counting the number of edges it takes to get from the root to that vertex. This number is unique, as $T$ is connected and contains no cycles. We define the **depth** of the rooted tree $T$ to be the maximal distance from the root to a vertex of $T$.

Given this orientation, we consider a pair of vertices connected by an edge. The vertex closer to the root is called a **parent**, and the vertex further away from the root is called a **child**. Two children of the same parent are referred to as **siblings**, and vertices with no children are called **leaves**.

A **binary tree** is a rooted tree such that every vertex has up to two children. A **complete binary tree** is a binary tree such that every parent has exactly two children, and all of the leaves are of the same distance from the root. An **almost complete binary tree** is a generalization of a complete binary tree: it is a binary tree in which every level is full, except possibly the last, whose leaves are flushed to the left. Another way to think of an almost complete binary tree is that it is a complete binary tree with some of the rightmost leaves removed.

### 5.2. Heaps

A **min-heap** is an almost complete binary tree with a value at each node such that $p \le c$ whenever the node carrying the value $p$ is a parent of the node carrying the value $c$. Similarly, a **max-heap** is an almost complete binary tree with a value at each node such that $p \ge c$ whenever the node carrying $p$ is a parent of the node carrying $c$.

Note that the root node contains the minimum value in a min-heap, and the maximum value in a max-heap. Therefore, if we can build a heap efficiently from a list, then we would have an efficient sorting algorithm.

Before we dig in, let us discuss an implementation of a heap as a list. Let the root node have index 0, let its children have indices 1 and 2, let their children have indices 3, 4, 5, 6, and so on. By this construction, the children of a node at index $i$ are located at index $2i+1$ and index $2i+2$. Moreover, the parent of a node at index $i$ is located at index $\lfloor (i-1)/2 \rfloor$.

We shall now refer to each node on a binary tree by its index. We call the child node with the smaller index the **left child node** and the child node with the larger index the **right child node**.

Let us first consider a small binary tree consisting of three nodes: $[1, 2, 3]$. This tree fails to be a max-heap, as the root value $1$ is smaller than the value $3$ at its right child node. In order to turn $[1, 2, 3]$ into a max-heap, we can swap the root with its right child node, resulting in $[3, 2, 1]$.

A similar approach works for building a heap from a larger almost complete binary tree. We let $T$ be a binary tree with a value attached at each node. Fix a node $i$ and assume that the subtrees formed by taking $2i+1$ and $2i+2$ as the root nodes, respectively, are heaps. For the sake of simplicity, we assume that they are max-heaps.

If the subtree formed by taking $i$ as the root node is not a max-heap, then we push down the node $i$ until a max-heap is formed. By "push down", we mean swapping a parent node with the larger of its two child nodes. This process terminates in finitely many steps and results in a heap.

Since every binary tree consisting of three nodes can be rearranged into a heap, it follows from mathematical induction on the depth of almost complete binary trees that the above process turns any almost complete binary tree into a heap.

We can apply the same process to devise a method of removing a node from a heap while maintaining the heap property: swap the node to be removed with a leaf, remove the leaf, and push down the swapped node to rearrange the rest into a heap.
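The index arithmetic above can be packaged into three one-line helpers (the function names are ours, for illustration):

```python
def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2  # floor((i - 1) / 2)

# The children of node 1 sit at indices 3 and 4, and both point back to 1.
print(left_child(1), right_child(1))  # 3 4
print(parent(3), parent(4))           # 1 1
```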

### 5.3. Heapsort

Once we have a heap, it is not hard to form a sorted list consisting of the values of the nodes. We start by building a max-heap. The root node contains the maximum, so we swap the root node with the last element of the list.

The list without the last element is an almost complete binary tree with $n-1$ nodes. The subtrees rooted at the left child node and the right child node of the root node, respectively, are heaps, but the whole tree may not be a heap. We therefore push down the root node until we obtain a heap. The root node now contains the maximum of the remaining elements, so we again swap the root node with the last element of the shortened list. Repeating this process, we obtain a sorted list.

Let us now formalize the above discussion:

```
import math

def heap_sort(L):
    build_max_heap(L)
    n = len(L)
    for i in range(n-1, 0, -1):
        L[0], L[i] = L[i], L[0]
        push_down(L, 0, i)  # restore the heap property on L[:i]
    return L

def build_max_heap(L):
    n = len(L)
    for i in range(math.floor((n+1)/2), -1, -1):
        push_down(L, i, n)

def push_down(L, i, size):
    # sift the node at index i down within the heap L[:size];
    # note that passing a slice such as L[:i] would operate on a copy
    # and leave L unchanged, so we pass the heap size instead
    left, right = 2*i + 1, 2*i + 2
    if left < size and L[left] > L[i]:
        max_index = left
    else:
        max_index = i
    if right < size and L[right] > L[max_index]:
        max_index = right
    if max_index != i:
        L[i], L[max_index] = L[max_index], L[i]
        push_down(L, max_index, size)
```

### 5.4. Time Complexity Analysis

We begin by analyzing the time complexity of `push_down()`. We let $m$ be the number of nodes on the subtree rooted at index $i$, and we write $P(m)$ to denote the worst-case runtime of `push_down()` on this subtree. $P(m)$ consists of a constant time to check the `if`-`else` conditions, and a possible recursive call on the subtree rooted at a child node of $i$. If the subtree rooted at `max_index` has all of its leaves and the subtree rooted at the other child node of `i` has no leaves, then, by the binary property, the number of nodes on the former subtree is twice as much as the number of nodes on the latter subtree. It follows that the subtree rooted at `max_index` contains at most $2m/3$ nodes, whence

$$P(m) \le P(2m/3) + O(1).$$

Applying the inequality repeatedly, we obtain the following:

$$P(m) \le P\bigl( (2/3)^2 m \bigr) + O(1) + O(1) \le \cdots \le O(\log m).$$

It follows that `push_down()` runs in $O(\log n)$ time.

Since `build_max_heap()` calls `push_down()` $O(n)$ times, it runs in $O(n \log n)$ time. `heap_sort()` calls `build_max_heap()` once and `push_down()` $n-1$ times, and so it runs in $O(n \log n)$ time as well. By the universal lower bound on comparison-based sorting algorithms, we conclude that heapsort runs in $\Theta(n \log n)$ time.

### 5.5. Space Complexity Analysis

The functions defined above operate on the same list without making use of an auxiliary list. It follows that the space complexity of heapsort is $O(1)$. Therefore, heapsort addresses merge sort's lack of the in-place property.

### 5.6. Heapsort, Compared to Merge Sort

In practice, heapsort tends to be slower than merge sort.

```
import random
L1 = random.sample(range(100000),1000)
L2 = random.sample(range(100000),1000)
L3 = random.sample(range(100000),1000)
```

```
%%timeit
merge_sort(L1)
merge_sort(L2)
merge_sort(L3)
```

```
100 loops, best of 3: 5.83 ms per loop
```

```
%%timeit
heap_sort(L1)
heap_sort(L2)
heap_sort(L3)
```

```
100 loops, best of 3: 12.7 ms per loop
```

As is the case with bubble sort, selection sort, and insertion sort, algorithms with the same big-O bound can differ in running time, based on the constants the big-O bounds are hiding away.

## 6. Quicksort

Let us now discuss quicksort, an in-place sorting algorithm that, for the most part, runs faster than merge sort or insertion sort. The algorithm chooses a **pivot** and divides the input list into the sublist of elements no larger than the pivot and the sublist of elements larger than the pivot. The algorithm is then recursively applied to the two sublists, resulting in a sorted list.

The choice of pivot severely affects the efficiency of quicksort. Indeed, if the partition process results in a sublist just as big as the original list and an empty sublist, there is no reduction in the complexity of the problem. In what follows, we shall study three methods of choosing a pivot and discuss their effectiveness.

In all variants, the sorting operations are performed directly on the input list, and thus the resulting sorting algorithms are in-place algorithms.

### 6.1. Quicksort with a Fixed Pivot

Quicksort with a fixed pivot is simple: each time, choose a pivot to be the element at a fixed location of the list. It is typical to choose either the first element or the last element of the list. Following CLRS, we shall choose the last element as the pivot.

```
def quick_sort_fixed_pivot(L):
    n = len(L)
    sort_fixed_pivot(L, 0, n-1)
    return L

def sort_fixed_pivot(L, p, r):
    if p < r:
        pivot_index = partition_fixed_pivot(L, p, r)
        sort_fixed_pivot(L, p, pivot_index - 1)
        sort_fixed_pivot(L, pivot_index + 1, r)

def partition_fixed_pivot(L, p, r):
    pivot = L[r]
    i = p - 1
    for j in range(p, r):
        if L[j] <= pivot:
            i = i + 1
            L[i], L[j] = L[j], L[i]
    i = i + 1
    L[i], L[r] = L[r], L[i]
    return i
```

`sort_fixed_pivot()` on a sublist `L[p:r+1]` chooses `L[r]` to be the pivot, via `partition_fixed_pivot()`. `L[p:r+1]` is divided into three sublists: the list of elements smaller than or equal to the pivot, the singleton list containing the pivot, and the list of elements larger than the pivot. `sort_fixed_pivot()` then recurses on the first and the third sublists, sorting the sublists.

If `L` is already sorted, then `partition_fixed_pivot(L, p, r)` always returns `r`. Therefore, `sort_fixed_pivot()` splits `L[p:r+1]` into `L[p:r]`, `[L[r]]`, and `[]`. Since `sort_fixed_pivot()` can only reduce the size of the list to be sorted by 1, it must go through $n$ recursive calls to sort an already-sorted list of size $n$. Specifically, at the $k$th call, `partition_fixed_pivot(L, 0, n-k)` is executed, and the recursion continues on the sublist `L[:n-k]`.

It therefore suffices to examine the time complexity of `partition_fixed_pivot(L, 0, n-k)`. Since every instance of the `if` statement is executed, the runtime of `partition_fixed_pivot(L, 0, n-k)` is bounded below by $c(n-k)$ for some constant $c$. It follows that the runtime of `quick_sort_fixed_pivot()` on a sorted list of length `n` is bounded below by

$$\sum_{k=1}^{n} c(n-k) = \frac{cn(n-1)}{2} = \Omega(n^2).$$

### 6.2. Quicksort with a Randomized Pivot

```
import random

def quick_sort_randomized_pivot(L):
    n = len(L)
    sort_randomized_pivot(L, 0, n-1)
    return L

def sort_randomized_pivot(L, p, r):
    if p < r:
        pivot_index = partition_randomized_pivot(L, p, r)
        sort_randomized_pivot(L, p, pivot_index - 1)
        sort_randomized_pivot(L, pivot_index + 1, r)

def partition_randomized_pivot(L, p, r):
    pivot = random.randint(p, r)
    L[pivot], L[r] = L[r], L[pivot]
    return partition_fixed_pivot(L, p, r)
```

Observe that `quick_sort_randomized_pivot()` performs better than `merge_sort()`, whereas `quick_sort_fixed_pivot()` performs worse than `merge_sort()`.

```
import random
L1 = random.sample(range(100000),100)
L2 = random.sample(range(100000),100)
L3 = random.sample(range(100000),100)
```

```
%%timeit
merge_sort(L1)
merge_sort(L2)
merge_sort(L3)
```

```
1000 loops, best of 3: 451 µs per loop
```

```
%%timeit
quick_sort_fixed_pivot(L1)
quick_sort_fixed_pivot(L2)
quick_sort_fixed_pivot(L3)
```

```
1000 loops, best of 3: 1.44 ms per loop
```

```
%%timeit
quick_sort_randomized_pivot(L1)
quick_sort_randomized_pivot(L2)
quick_sort_randomized_pivot(L3)
```

```
1000 loops, best of 3: 439 µs per loop
```

Surely, there is a small but definite chance that the random selection of pivots results in always choosing the last element of the sublist, thus reducing `quick_sort_randomized_pivot()` to `quick_sort_fixed_pivot()`. Therefore, the worst-case time complexity should still be $O(n^2)$. How do we make sense of the improved performance, then?

Recall that the bottleneck in `quick_sort_fixed_pivot()` is `partition_fixed_pivot()`. In fact, the `for` loop in `partition_fixed_pivot()` is the single biggest contributor to the time complexity of both `quick_sort_fixed_pivot()` and `quick_sort_randomized_pivot()`. This implies that the number of times the comparison statement `if L[j] <= pivot` is executed is a good indicator of the time complexity of quicksort.

We fix a list L of length $n$ and assume that

$$x_1 < x_2 < \cdots < x_n$$

are the elements of L, sorted; for simplicity, we assume that the elements are distinct. We let $X$ denote the random variable outputting the total number of comparisons `if L[j] <= pivot` performed in the course of an execution of `quick_sort_randomized_pivot(L)`. For each pair $i < j$, let $X_{ij}$ denote the indicator random variable

$$X_{ij} = \begin{cases} 1 & \text{if } x_i \text{ is compared with } x_j, \\ 0 & \text{otherwise,} \end{cases}$$

so that

$$E[X] = E\left[ \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij} \right],$$

where $E$ denotes the expected value of a random variable. Since $E$ is linear, we see that

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P(X_{ij} = 1),$$

where $P$ denotes the probability of an event.

We let $Z_{ij}$ denote the set $\{x_i, x_{i+1}, \ldots, x_j\}$. Observe that $X_{ij} = 1$ precisely when $x_i$ or $x_j$ is the first pivot chosen, out of all the elements of $Z_{ij}$, in the course of an execution of `quick_sort_randomized_pivot(L)`. Since each choice of pivot is independent of one another, we have that

$$P(X_{ij} = 1) = \frac{2}{j - i + 1}.$$

It follows that

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \le \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} = 2(n-1)H_n,$$

where $H_n$ denotes the $n$th harmonic number. Since the harmonic numbers are asymptotically bounded by the logarithmic function, we conclude that

$$E[X] = O(n \log n).$$

### 6.3. Quicksort with a Median-of-Medians Pivot

While `quick_sort_randomized_pivot()` performs quite well "on average" (and in practice!), it still has the worst-case time complexity of $O(n^2)$. With carefully chosen pivots, however, we can guarantee an $O(n \log n)$ performance in all cases. The method of pivot selection, however, is rather complicated, and it is often better to avoid the selection overhead and simply opt for randomized selection. Suffice it to say that the median-of-medians strategy is largely of theoretical importance. (Though some may disagree: see Kurosawa, "Quicksort with median of medians is considered practical".)

We have witnessed that a pivot selection strategy that divides a list into sublists of uneven sizes tends to perform badly. Therefore, an efficient search for the median element of a list would lead to an efficient choice of pivots. The Blum–Floyd–Pratt–Rivest–Tarjan selection algorithm, commonly known as the **median-of-medians selection algorithm**, can be used to find a median in $O(n)$ time. In fact, the median-of-medians selection algorithm can be used to find the $k$th smallest element of the list in $O(n)$ time.

6.3.1. Outline of Median-of-Medians Selection Algorithm. The outline of the algorithm, taken from CLRS, is as follows; here, we assume that the list $L$ consists of $n$ distinct elements and that we search for the $k$th smallest element.

- Given an input list of $n$ elements, we divide the list into $\lceil n/5 \rceil$ sublists of 5 elements, with at most one sublist possibly containing fewer than 5 elements.
- We find the median of each of the $\lceil n/5 \rceil$ sublists by sorting them. Insertion sort is a good choice here, since it is fast for small lists.
- We take the list of the medians found in 1-2 and apply the selection algorithm recursively to find the median of medians $x$. If there are an even number of medians, we take the lower one.
- We rearrange $L$ by putting the terms smaller than $x$ on the front, $x$ in the middle, and the terms larger than $x$ in the back.
- Let $i$ be the rank of the pivot in the rearranged list, so that $x$ is the $i$th smallest element of $L$. If $i = k$, then we have found the $k$th smallest element. If not, we apply the selection process recursively. Specifically, if $k < i$, then we apply the selection algorithm on `L[:i]` to find the $k$th smallest element. If $k > i$, then we apply the selection algorithm on `L[i:]` to find the $(k-i)$th smallest element.

In short, the algorithm either selects the correct output or recurses down to a smaller sublist. Since the $k$th smallest element of a small, bounded-size list can be found in constant time by sorting, the algorithm always terminates with the correct output.
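
Steps 1–3 of the outline can be sketched in isolation. Here `mom_pivot` is a hypothetical helper name introduced for illustration; the partition and recursion of Steps 4–5 are omitted and appear only in the full implementation later in this section.

```python
def mom_pivot(L):
    """Steps 1-3 only: return the median of the 5-element group medians.

    Illustrative sketch; assumes distinct elements. For small lists we
    return the lower median of the list directly.
    """
    if len(L) <= 5:
        return sorted(L)[(len(L) - 1) // 2]
    medians = []
    for i in range(0, len(L), 5):
        group = sorted(L[i:i+5])                 # Step 2: sort each group of <= 5
        medians.append(group[(len(group) - 1) // 2])
    return mom_pivot(medians)                    # Step 3: recurse on the medians

print(mom_pivot(list(range(25))))  # → 12, the median of the group medians 2, 7, 12, 17, 22
```

The value returned is guaranteed to be neither too small nor too large, which is exactly the property the complexity analysis below exploits.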

6.3.2. Overview of Time Complexity Analysis. To analyze the time complexity of the median-of-medians selection algorithm, we note that at least half of the $\lceil n/5 \rceil$ medians found in Step 2 are no smaller than the median of medians $x$. This implies that at least half of the sublists contain at least 3 terms that are greater than $x$, except for the one sublist that may not have 5 elements, and the sublist that contains $x$. It follows that the number of elements of $L$ greater than $x$ is at least

$$3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6.$$

Similarly, at least $\frac{3n}{10} - 6$ elements are less than $x$, whence, in the worst-case scenario, Step 5 calls the selection algorithm recursively on at most $\frac{7n}{10} + 6$ elements.

6.3.3. Recurrence Relation. Let $T(n)$ denote the worst-case running time of the selection algorithm on a list of size $n$. Step 1 takes $O(n)$ time. Step 2 executes insertion sort on $\lceil n/5 \rceil$ many small lists, and so it takes $O(n)$ time. Step 3 takes $T(\lceil n/5 \rceil)$ time. Step 4 takes $O(n)$ time, as it is a simple variation of the partition algorithm from quicksort. Step 5 takes at most $T\left(\frac{7n}{10} + 6\right)$ time, as discussed above. It follows that

$$T(n) \le T\left(\left\lceil \frac{n}{5} \right\rceil\right) + T\left(\frac{7n}{10} + 6\right) + O(n).$$

Let us fix a constant $a > 0$ such that

$$T(n) \le T\left(\left\lceil \frac{n}{5} \right\rceil\right) + T\left(\frac{7n}{10} + 6\right) + an$$

for all $n$.

6.3.4. Constant Computation. Our goal is to find a constant $c$ such that $T(n) \le cn$ for all large enough $n$. How large should $c$ be? If $T(m) \le cm$ were true for all $m < n$ for some fixed integer $n$, we would have the estimate

$$T(n) \le c\left\lceil \frac{n}{5} \right\rceil + c\left(\frac{7n}{10} + 6\right) + an \le \frac{cn}{5} + c + \frac{7cn}{10} + 6c + an = \frac{9cn}{10} + 7c + an.$$

In order to have a tight choice of $c$, we expect to have the upper bound

$$\frac{9cn}{10} + 7c + an \le cn,$$

which can hold if and only if

$$\frac{cn}{10} - 7c - an \ge 0.$$

Rearranging the above inequality, we obtain

$$c \ge 10a \cdot \frac{n}{n - 70}.$$

Now, if we assume that $n \ge 140$, then $\frac{n}{n-70} \le 2$, and so the above inequality holds whenever

$$c \ge 20a.$$

It follows that $c \ge 20a$ and $n \ge 140$, together with the assumption $T(m) \le cm$ for all $m < n$, imply

$$T(n) \le \frac{9cn}{10} + 7c + an \le cn.$$

6.3.5. Proof of Linear Time Complexity. Let us now pick the constant $a$ from above and set $c = 20a$. By choosing a larger $c$ if necessary, we can assume that

$$T(n) \le T\left(\left\lceil \frac{n}{5} \right\rceil\right) + T\left(\frac{7n}{10} + 6\right) + an$$

for all $n$, and that

$$T(n) \le cn$$

for all $n < 140$.

If

$$n \ge 140,$$

then $\lceil n/5 \rceil < n$ and $\frac{7n}{10} + 6 < n$, and so the two recursive terms refer only to smaller list sizes. By our choice of $c$, we conclude that if $T(m) \le cm$ for all $m < n$, then

$$T(n) \le \frac{9cn}{10} + 7c + an \le cn$$

for all $n \ge 140$.

In general, then, $T(m) \le cm$ for all $m < n$ implies $T(n) \le cn$. Since

$$T(n) \le cn$$

holds for all $n < 140$, we conclude by mathematical induction that

$$T(n) \le cn$$

for all $n$. It follows that $T(n) = O(n)$.

6.3.6. An Implementation of Quicksort with Median-of-Medians Pivot. Having established the linear-time complexity of the median-of-medians selection algorithm, we proceed to construct a variant of quicksort with the worst-case time complexity of $O(n \log n)$.

```
import math

def quick_sort_median_of_medians_pivot(L):
    n = len(L)
    sort_median_of_medians_pivot(L, 0, n-1)
    return L

def sort_median_of_medians_pivot(L, p, r):
    if p < r:
        pivot_index = partition_median_of_medians_pivot(L, p, r)
        sort_median_of_medians_pivot(L, p, pivot_index-1)
        sort_median_of_medians_pivot(L, pivot_index+1, r)

def partition_median_of_medians_pivot(L, p, r):
    mm = median_of_medians(L[p:r+1], math.floor((r+1-p)/2))
    initial_pivot_index = L[p:r+1].index(mm) + p
    L[r], L[initial_pivot_index] = L[initial_pivot_index], L[r]
    pivot = L[r]
    i = p-1
    for j in range(p, r):
        if L[j] <= pivot:
            i = i+1
            L[i], L[j] = L[j], L[i]
    i = i+1
    L[i], L[r] = L[r], L[i]
    return i

# find the kth smallest element of L
def median_of_medians(L, k):
    n = len(L)
    # To reduce computational complexity, we compute the median directly
    # if the length of the list is small enough.
    if n < 10:
        return sorted(L)[k-1]
    # Divide L into sublists of length 5, sort them, and extract the medians.
    # Note: the sorted slices must be assigned back into L, since a bare
    # L[a:b].sort() would only sort a temporary copy of the slice.
    medians = []
    for i in range(math.floor(n/5)):
        L[5*i: 5*i+5] = sorted(L[5*i: 5*i+5])
        medians.append(L[5*i+2])
    if n % 5 != 0:
        L[5*math.floor(n/5):] = sorted(L[5*math.floor(n/5):])
        medians.append(L[5*math.floor(n/5) + math.floor((n % 5)/2)])
    # Find recursively the median of medians
    mm = median_of_medians(medians, math.floor(len(medians)/2))
    # Partition L with mm as the pivot
    mm_index = L.index(mm)
    L[n-1], L[mm_index] = L[mm_index], L[n-1]
    i = -1
    for j in range(n-1):
        if L[j] <= mm:
            i = i+1
            L[i], L[j] = L[j], L[i]
    i = i+1
    L[i], L[n-1] = L[n-1], L[i]  # After this, L[i] = mm.
    i = i+1  # Now mm is the ith smallest element of L
    # Determine whether mm is at the correct spot. If not, recurse.
    if i == k:
        return mm
    elif i > k:
        return median_of_medians(L[:i], k)
    else:  # i < k
        return median_of_medians(L[i:], k-i)
```

In the above implementation, we have sacrificed a little bit of performance for the sake of clarity. `partition_median_of_medians_pivot` spends $O(r-p)$ time going through `L[p:r+1]`, looking for the exact location of the median of medians. Moreover, `median_of_medians` itself spends extra time going through `L`, looking for the exact location of the median of medians. These overheads could have been prevented by having `median_of_medians` return the ordered pair `(med, med_index)` of the median *and* its index in the original list. The above implementation is simpler, however, and the overhead does not change the asymptotic runtime of `partition_median_of_medians_pivot`.

6.3.7. Time Complexity of Quicksort with Median-of-Medians Pivot. Let us now show that the worst-case runtime complexity of `quick_sort_median_of_medians_pivot` is $O(n \log n)$. Let $T(n)$ denote the worst-case runtime of `quick_sort_median_of_medians_pivot` on a list of length $n$. Since the pivot is always the median, and since pivot selection and partitioning take $O(n)$ time, we see that

$$T(n) \le 2T(n/2) + O(n).$$

It now follows that

$$T(n) = O(n \log n),$$

as was to be shown.
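
For completeness, the recurrence $T(n) \le 2T(n/2) + cn$, for a constant $c$ and $n$ a power of $2$, can be unrolled explicitly:

$$T(n) \le 2T(n/2) + cn \le 4T(n/4) + 2cn \le \cdots \le 2^{\log_2 n}\, T(1) + cn \log_2 n = O(n \log n),$$

since each of the $\log_2 n$ levels of the recursion contributes at most $cn$ work.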

Despite the improved worst-case time complexity, `quick_sort_median_of_medians_pivot` often performs worse than `quick_sort_randomized_pivot`. This is due to the large constant hidden away by the big-O notation, as we have seen in the derivation of the time complexity of `median_of_medians`.

```
import random
L1 = random.sample(range(1000000),10000)
L2 = random.sample(range(1000000),10000)
L3 = random.sample(range(1000000),10000)
```

```
%%timeit
quick_sort_randomized_pivot(L1)
quick_sort_randomized_pivot(L2)
quick_sort_randomized_pivot(L3)
```

```
10 loops, best of 3: 65.8 ms per loop
```

```
%%timeit
quick_sort_median_of_medians_pivot(L1)
quick_sort_median_of_medians_pivot(L2)
quick_sort_median_of_medians_pivot(L3)
```

```
10 loops, best of 3: 174 ms per loop
```

# Non-Comparison Sorting Algorithms

In designing a sorting algorithm, it is possible to beat the information-theoretic lower bound of $\Omega(n \log n)$ if we forgo comparisons. In what follows, we discuss non-comparison sorting algorithms that can run in linear time under certain circumstances.

## 7. Counting Sort

Instead of comparing the elements of a list, **counting sort** determines, for each element $x$, the number of elements less than $x$ and inserts $x$ directly into the appropriate position in the output list.

Let `L` be a list of $n$ nonnegative integers. Let $M$ be the maximal element of `L`, and create a list `Aux` of length $M+1$, initially populated by zeros. For each $0 \le i \le n-1$, we set

`Aux[L[i]] = Aux[L[i]] + 1`,

so that `Aux[j]`, for each $0 \le j \le M$, contains the number of elements of `L` equal to $j$. Now, we set

`Aux[j] = Aux[j] + Aux[j-1]`

for each $1 \le j \le M$, in increasing order, so that `Aux[j]` contains the number of elements of `L` no larger than $j$.

It now suffices to go through `L` in reverse order and build a new, sorted list `Sorted`, inserting each element `L[i]` at index `Aux[L[i]] - 1` and decrementing `Aux[L[i]]` as we go.

```
def counting_sort(L):
    M = max(L)
    n = len(L)
    Aux = [0 for _ in range(M+1)]
    for i in range(n):
        Aux[L[i]] = Aux[L[i]] + 1
    for j in range(1, M+1):
        Aux[j] = Aux[j] + Aux[j-1]
    Sorted = [0 for _ in range(n)]
    for i in range(n-1,-1,-1):
        Sorted[Aux[L[i]]-1] = L[i]
        Aux[L[i]] = Aux[L[i]] - 1
    return Sorted
```

In the worst-case scenario, the first `for` loop takes $O(n)$ time, the second `for` loop takes $O(M)$ time, and the third `for` loop takes $O(n)$ time. Therefore, `counting_sort` has the worst-case time complexity of $O(n + M)$. If $M = O(n)$, then `counting_sort` runs in $O(n)$ time. Similarly, the space complexity of `counting_sort` is $O(n + M)$, as it constructs a list of length $M+1$ and another list of length $n$.

```
%%timeit
counting_sort(list(range(10000,0,-1)))
```

```
100 loops, best of 3: 3.4 ms per loop
```

```
%%timeit
quick_sort_randomized_pivot(list(range(10000,0,-1)))
```

```
10 loops, best of 3: 21.9 ms per loop
```

```
%%timeit
merge_sort(list(range(10000,0,-1)))
```

```
100 loops, best of 3: 17.6 ms per loop
```

As we can see, counting sort significantly outperforms merge sort and quicksort when the values of the input list are contained within a small interval (with respect to the length of the list).

We conclude this section by remarking that `counting_sort` is stable. If $i < j$ and `L[i] == L[j]`, then the third `for` loop processes `L[j]` before `L[i]` and decrements `Aux[L[j]]` in between, so that `L[i]` is inserted at a strictly smaller index of `Sorted` than `L[j]`. Equal elements therefore appear in `Sorted` in the same relative order as in `L`, as was to be shown.
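
Stability is easiest to see with tagged records. The sketch below applies the same three-loop structure to `(key, tag)` pairs; `counting_sort_by_key` is an illustrative adaptation, not part of the implementation above.

```python
def counting_sort_by_key(pairs):
    """Stable counting sort of (key, tag) pairs by small nonnegative integer key.

    Illustrative adaptation of counting_sort: the same count / prefix-sum /
    backward-placement loops, carrying a payload alongside each key.
    """
    M = max(k for k, _ in pairs)
    n = len(pairs)
    aux = [0] * (M + 1)
    for k, _ in pairs:                # count occurrences of each key
        aux[k] += 1
    for j in range(1, M + 1):         # prefix sums: elements with key <= j
        aux[j] += aux[j - 1]
    out = [None] * n
    for i in range(n - 1, -1, -1):    # backward pass preserves relative order
        k = pairs[i][0]
        out[aux[k] - 1] = pairs[i]
        aux[k] -= 1
    return out

data = [(3, 'a'), (1, 'b'), (3, 'c'), (1, 'd')]
print(counting_sort_by_key(data))
# → [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')] -- ties keep their input order
```

Note that `'b'` still precedes `'d'`, and `'a'` still precedes `'c'`, exactly as in the input. This stability is what makes counting sort usable as the per-digit subroutine of radix sort in the next section.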

## 8. Radix Sort

**Radix sort** refers to a family of digit-based sorting algorithms, whose origins predate the invention of computers. Following Sedgewick, we consider sequences of fixed-length characters, not just numbers.

### 8.1. Least-Significant-Digit-first Radix Sort

The simplest example of radix sort is the **LSD-first radix sort**, which goes through digits from right to left. It requires all items to be sorted to have the same number of characters. For the implementation below, we assume that the input list `L` contains strings of equal length.

```
def radix_LSD_sort(L):
    str_len = len(L[0])
    for index in range(str_len-1,-1,-1):
        sorted_indices = modified_counting_sort(L, index)
        L = [L[i] for i in sorted_indices]
    return L

def modified_counting_sort(L, index):
    n = len(L)
    # ord() returns the usual numerical representation of a character.
    chars = [ord(w[index]) for w in L]  # just the characters.
    indices = [ind for ind in range(n)]  # just the indices
    M = max(chars)
    Aux = [0 for _ in range(M+1)]
    for i in range(n):
        Aux[chars[i]] = Aux[chars[i]] + 1
    for j in range(1, M+1):
        Aux[j] = Aux[j] + Aux[j-1]
    Sorted = [0 for _ in range(n)]
    for i in range(n-1,-1,-1):
        Sorted[Aux[chars[i]]-1] = indices[i]
        Aux[chars[i]] = Aux[chars[i]] - 1
    return Sorted
```

Since counting sort runs in $O(n + M)$ time, LSD-first radix sort runs in $O(w(n + M))$ time, where $w$ denotes the length of each string, and $M$ denotes the maximum character code encountered at any index. If $w$ and $M$ are relatively small compared to $n$, then LSD-first radix sort runs in linear time. Observe below that LSD-first radix sort runs comparably fast to comparison sorting algorithms, or even faster.
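
The LSD principle itself can be isolated from the counting-sort machinery: any stable sort applied to one character position at a time, from right to left, yields a fully sorted list. A minimal sketch (illustration only) using Python's built-in `sorted`, which is stable:

```python
def lsd_sort(words):
    """LSD radix sort via repeated stable sorts, one character position
    at a time, from the rightmost position to the leftmost.

    Assumes all words have the same length. Each pass is stable, so ties
    on the current position preserve the order established by earlier
    (less significant) passes -- exactly the invariant LSD sort relies on.
    """
    w = len(words[0])
    for index in range(w - 1, -1, -1):
        words = sorted(words, key=lambda s: s[index])
    return words

print(lsd_sort(["bca", "abc", "cab", "bac"]))  # → ['abc', 'bac', 'bca', 'cab']
```

Replacing `sorted` with the counting-sort pass of `modified_counting_sort` recovers the linear-time behavior discussed above; the built-in comparison sort is used here only to keep the sketch short.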

```
numbers1 = random.sample(range(100000,1000000),60000)
numbers2 = random.sample(range(100000,1000000),60000)
numbers3 = random.sample(range(100000,1000000),60000)
strings1 = [str(x) for x in numbers1]
strings2 = [str(x) for x in numbers2]
strings3 = [str(x) for x in numbers3]
```

```
%%timeit
quick_sort_randomized_pivot(numbers1)
quick_sort_randomized_pivot(numbers2)
quick_sort_randomized_pivot(numbers3)
```

```
1 loop, best of 3: 473 ms per loop
```

```
%%timeit
merge_sort(numbers1)
merge_sort(numbers2)
merge_sort(numbers3)
```

```
1 loop, best of 3: 360 ms per loop
```

```
%%timeit
radix_LSD_sort(strings1)
radix_LSD_sort(strings2)
radix_LSD_sort(strings3)
```

```
1 loop, best of 3: 459 ms per loop
```

We could also build counting sort directly into LSD-first radix sort, which results in a slight performance gain. Slight, but enough to beat merge sort and quicksort in many cases!

```
def radix_LSD_sort_builtin_counting_sort(L):
    str_len = len(L[0])
    n = len(L)
    for index in range(str_len-1,-1,-1):
        chars = [ord(w[index]) for w in L]
        M = max(chars)
        Aux = [0 for _ in range(M+1)]
        for i in range(n):
            Aux[chars[i]] = Aux[chars[i]] + 1
        for j in range(1, M+1):
            Aux[j] = Aux[j] + Aux[j-1]
        Sorted = [None for _ in range(n)]
        for i in range(n-1,-1,-1):
            Sorted[Aux[chars[i]]-1] = L[i]
            Aux[chars[i]] = Aux[chars[i]] - 1
        L = Sorted
    return L
```

```
numbers1 = random.sample(range(100000,1000000),50000)
numbers2 = random.sample(range(100000,1000000),50000)
numbers3 = random.sample(range(100000,1000000),50000)
strings1 = [str(x) for x in numbers1]
strings2 = [str(x) for x in numbers2]
strings3 = [str(x) for x in numbers3]
```

```
%%timeit
radix_LSD_sort_builtin_counting_sort(strings1)
radix_LSD_sort_builtin_counting_sort(strings2)
radix_LSD_sort_builtin_counting_sort(strings3)
```

```
1 loop, best of 3: 278 ms per loop
```

```
%%timeit
radix_LSD_sort(strings1)
radix_LSD_sort(strings2)
radix_LSD_sort(strings3)
```

```
1 loop, best of 3: 388 ms per loop
```

```
%%timeit
merge_sort(numbers1)
merge_sort(numbers2)
merge_sort(numbers3)
```

```
1 loop, best of 3: 439 ms per loop
```

```
%%timeit
quick_sort_randomized_pivot(numbers1)
quick_sort_randomized_pivot(numbers2)
quick_sort_randomized_pivot(numbers3)
```

```
1 loop, best of 3: 403 ms per loop
```

### 8.2. Most-Significant-Digit-first Radix Sort

Unlike the LSD-first radix sort, the **MSD-first radix sort** goes through digits from left to right. The algorithm divides the input list into sublists according to the first character of each item and applies itself recursively to each sublist. This has the advantage that the strings need not all have the same length. For the implementation below, we assume that the input list `L` contains strings.

```
def radix_MSD_sort(L):
    n = len(L)
    msd(L, 0, n, 0)
    return L

def msd(L, low, high, index):
    # Make sure range(low, high) is nontrivial
    if high <= low+1:
        return None
    # Collect numerical representations of characters at the given index.
    # +1 is there so we can add a 0, instead of a negative number, in case
    # there is no character at the given index.
    chars = []
    blank_count = 0
    for i in range(low, high):
        try:
            chars.append(ord(L[i][index])+1)
        except IndexError:
            blank_count = blank_count + 1
            chars.append(0)
    # Make sure there are characters to be sorted
    if sum(chars) == 0:
        return None
    M = max(chars)
    # Begin counting sort on characters at the given index
    Aux = [0 for _ in range(M+1)]
    for i in range(high-low):
        Aux[chars[i]] = Aux[chars[i]] + 1
    for j in range(1, M+1):
        Aux[j] = Aux[j] + Aux[j-1]
    Sorted = [None for _ in range(high-low)]
    for i in range(high-low-1,-1,-1):
        Sorted[Aux[chars[i]]-1] = L[i + low]
        Aux[chars[i]] = Aux[chars[i]] - 1
    L[low: high] = Sorted
    # End counting sort
    new_low = low + blank_count
    for i in range(new_low+1, high):
        # Recurse onto sublists with the same character at the given index.
        if L[i][index] != L[i-1][index]:
            msd(L, new_low, i, index+1)
            new_low = i
    if high > new_low+1:
        msd(L, new_low, high, index+1)
```

Similar to the LSD-first radix sort, the MSD-first radix sort consists of counting-sort passes whose total work per character index is $O(n + M)$, and the recursion descends through at most $w$ character indices, so the worst-case runtime is bounded above by $O(w(n + M))$. Unlike the LSD-first radix sort, however, the MSD-first radix sort does not necessarily examine every single digit. For example, to establish the order of `['a', 'baaaaaaaaaaa']`, there is no need to compare anything beyond the leftmost digit. In such cases, the MSD-first radix sort can run significantly faster than the LSD-first radix sort.

```
some_chars = [
    '!', '@', '#', '$', '%', '^', '&', '*', '(', ')',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
    'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z']
strings = [x + ('a' * 500) for x in some_chars]
```

```
%%timeit
radix_MSD_sort(strings)
```

```
1000 loops, best of 3: 365 µs per loop
```

```
%%timeit
radix_LSD_sort(strings)
```

```
100 loops, best of 3: 10.1 ms per loop
```