# Basic Graph Algorithms (Draft)

## 1. Basic Definitions

A graph $G(V,E)$ consists of a set of vertices $V = \{v_i\}_i$ and a set of edges $E = \{e_k\}_k$. An edge connects two vertices; the two vertices connected by an edge are said to be adjacent. The size of the graph is the cardinality of the vertex set $V$. We shall assume that every graph in this post is finite, i.e., $\vert V \vert < \infty$.

A weighted graph is a graph equipped with an additional weight function $\mathfrak{w}:E \to \mathbb{R}$ that assigns a number to each edge. We sometimes say unweighted to refer to a graph without a weight function, for emphasis. When $\mathfrak{w}$ is nonnegative on all edges, we also call $\mathfrak{w}$ the length function.

Mathematically, we could model edges as sets of vertices or as ordered pairs of vertices. The former ignores the ordering, resulting in an undirected graph. In the latter model, $(v_i,v_j)$ and $(v_j,v_i)$ are distinct edges, whence the graph is directed. We shall simply call an undirected graph a graph, and a directed graph a digraph.

If we wish to allow multiple edges with the same endpoints (with the same direction), we must take $E$ to be a bag instead of a set. We shall not concern ourselves with such foundational details and abuse notations at will.

The adjacency list representation of a graph stores a graph as a hash table: each key represents a vertex, and its associated value is a list of all vertices adjacent to the vertex in question. Digraphs can be stored as a hash table in a similar way: keys are starting points, and values are lists of endpoints. Here is an example of a graph, implemented in Python:

```python
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}
```
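By way of illustration, a digraph can be stored the same way, with each key listing only the endpoints of the edges leaving that vertex. The digraph below is a made-up example, not a directed version of the graph G above:

```python
# A small example digraph: each value lists the endpoints of the
# edges *leaving* the key vertex.
D = {
    'v1': ['v2', 'v4'],
    'v2': ['v3'],
    'v3': ['v5'],
    'v4': ['v5'],
    'v5': [],  # 'v5' has no outgoing edges
}
```

Note that the presence of the edge from 'v1' to 'v2' does not imply an edge from 'v2' to 'v1'.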


The degree $d(v)$ of a vertex $v$ on a graph is the number of edges connected to the vertex in question. On a digraph, we must differentiate between the in-degree $d_I(v)$ and the out-degree $d_O(v)$. Observe that

$$\sum_{v \in V} d(v) = 2 \vert E \vert$$

on a graph $G(V,E)$, whence the number of odd-degree vertices must be even. Similarly,

$$\sum_{v \in V} d_I(v) = \sum_{v \in V} d_O(v) = \vert E \vert$$

whenever $G(V,E)$ is a digraph.
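As a sanity check, we can verify the degree-sum identity on the example graph from Section 1; the helper name below is ours, not standard:

```python
# The example graph from Section 1.
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}

def degree(G, v):
    # In the adjacency-list representation, d(v) is simply the
    # length of the list stored under v.
    return len(G[v])

# Every edge contributes to exactly two adjacency lists, so the sum
# of the degrees is twice the number of edges.
degree_sum = sum(degree(G, v) for v in G)
```

Here the graph has five edges, so the degree sum is 10, which is even, as the identity demands.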

## 2. Shortest Paths

A path between two vertices $v$ and $w$ on a graph is a sequence of edges

$$\{v, u_1\}, \{u_1, u_2\}, \ldots, \{u_{n-1}, w\},$$

where consecutive edges share an endpoint; the number $n$ of edges is the length of the path. A path on a digraph is defined analogously, with each edge oriented from the preceding vertex to the following one. A path that starts and ends at the same vertex is a circuit; a circuit of length one is a loop.
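To make the definition concrete, here is a sketch that checks whether a sequence of vertices traces out a path in the adjacency-list representation; the function name is ours:

```python
def is_path(G, vertices):
    # A sequence of vertices traces a path precisely when every
    # consecutive pair of vertices is adjacent in G.
    return all(w in G[v] for v, w in zip(vertices, vertices[1:]))

# The example graph from Section 1.
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}
```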

The connected component of a vertex $v$ is the set of all vertices connected to $v$ via paths. A graph is said to be connected if every pair of vertices are connected by at least one path, viz., the connected component of an arbitrary vertex is the entire graph.
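A simple traversal computes the connected component of a vertex. The sketch below works over the adjacency-list representation from Section 1, and the function name is ours:

```python
def connected_component(G, v):
    # Grow the component by absorbing neighbors of already-reached
    # vertices until no new vertex appears.
    component = {v}
    frontier = [v]
    while frontier:
        u = frontier.pop()
        for w in G[u]:
            if w not in component:
                component.add(w)
                frontier.append(w)
    return component

# The example graph from Section 1 is connected, so the component
# of any vertex is the whole vertex set.
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}
```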

How do we find a path between two vertices? Better yet, how do we find the shortest path between two vertices? It depends on the kind of graph we are interested in.

### 2.1. Breadth-First Search

Suppose $G(V,E)$ is an undirected, unweighted, connected graph. The breadth-first search (BFS) algorithm (Moore, 1959) finds the distance from a fixed vertex $v$ to all the other vertices, where the distance between two vertices is understood to mean the length of the shortest sequence of edges connecting them.

Starting at the source vertex, the BFS algorithm examines all adjacent vertices, assigns each of them distance 1, and puts them in a queue. The algorithm then repeatedly removes a vertex $u$ from the queue, examines the vertices adjacent to $u$, and assigns the distance $\operatorname{distance}(u) + 1$ to those that have not already been visited, which are then put in the queue. The algorithm terminates when no vertex is left in the queue.

```
 1: def bfs_distance(G, source):
 2:     visited = {source}  # a set, so membership tests take O(1) time
 3:     distance = {source: 0}  # distances are given as nonnegative integers
 4:     queue = [source]
 5:     while queue:  # while the queue is nonempty
 6:         u = queue.pop(0)  # use collections.deque for an O(1) pop
 7:         for v in G[u]:
 8:             if v not in visited:
 9:                 visited.add(v)
10:                 distance[v] = distance[u] + 1
11:                 queue.append(v)
12:     return distance
```
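Running the algorithm on the example graph from Section 1 illustrates the output. The snippet below restates bfs_distance without line numbers, storing visited as a set and the queue as a collections.deque so that every operation is constant-time:

```python
from collections import deque

# The example graph from Section 1.
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}

def bfs_distance(G, source):
    visited = {source}
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()  # O(1), unlike list.pop(0)
        for v in G[u]:
            if v not in visited:
                visited.add(v)
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance
```

From 'v1', the neighbors 'v2' and 'v4' receive distance 1, and 'v3' and 'v5' receive distance 2.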


We observe that each line of the above algorithm takes constant time to execute, provided that visited supports constant-time membership tests (a set rather than a list) and that removing the front of the queue takes constant time (a collections.deque rather than pop(0) on a list, which costs time linear in the length of the queue). Line 8 guarantees that no vertex enters the queue more than once, so line 6 is executed at most $\vert V \vert$ times, contributing $O(\vert V \vert)$ time complexity. Line 7 iterates over the adjacency list of u, and since each vertex appears as u at most once, lines 8 - 11 are repeated $O(\vert E \vert)$ times in total. We conclude that the time complexity of bfs_distance is $O(\vert V \vert + \vert E \vert)$.

As for the space complexity, we observe that visited must eventually store all vertices, so the space complexity is bounded below by $\Omega(\vert V \vert)$. On the other hand, each of visited, distance, and queue stores no more than $O(\vert V \vert)$ items, and so the space complexity is bounded above by $O(\vert V \vert)$. It follows that the space complexity of BFS is $\Theta(\vert V \vert)$.

Since BFS computes distances by tracing out the shortest path one edge at a time, a minor modification yields an algorithm that returns a shortest path from a fixed vertex $v$ to each of the other vertices.

```
 1: def bfs_path(G, source):
 2:     visited = {source}  # a set, so membership tests take O(1) time
 3:     path = {source: [source]}  # paths are given as lists of vertices
 4:     queue = [source]
 5:     while queue:
 6:         u = queue.pop(0)  # use collections.deque for an O(1) pop
 7:         for v in G[u]:
 8:             if v not in visited:
 9:                 visited.add(v)
10:                 path[v] = path[u] + [v]
11:                 queue.append(v)
12:     return path
```
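As before, the snippet below restates bfs_path without line numbers, using a set and a collections.deque, and runs it on the example graph from Section 1:

```python
from collections import deque

# The example graph from Section 1.
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}

def bfs_path(G, source):
    visited = {source}
    path = {source: [source]}  # each path includes both endpoints
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v not in visited:
                visited.add(v)
                path[v] = path[u] + [v]
                queue.append(v)
    return path
```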


bfs_path is slower than bfs_distance, as constructing the list

path[v] = path[u] + [v]

takes $O(\vert \texttt{path[u]} \vert)$ time. A shortest path visits no vertex twice, so $\vert \texttt{path[u]} \vert$ can be as large as $\vert V \vert$; since line 10 is executed at most once per vertex, the time complexity of bfs_path is bounded by $O(\vert V \vert^2 + \vert E \vert)$. This increase in time complexity can be remedied if we use a linked list, instead of an array, to save the paths: prepending a new head to a shared-structure linked list takes constant time, whence the time complexity remains $O(\vert V \vert + \vert E \vert)$.

The space complexity increases to $O(\vert V \vert^2)$, as path stores up to $\vert V \vert$ lists, each of which can contain up to $\vert V \vert$ vertices. Explicitly storing every path cannot do better, but the paths need not be stored explicitly: the shortest paths out of a fixed source form a tree, so it suffices to record each vertex's predecessor on its shortest path, which brings the space back down to $O(\vert V \vert)$.
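One way to realize this idea is to record, for each vertex, only its predecessor on a shortest path from the source, and to reconstruct full paths on demand. The function names below are ours:

```python
from collections import deque

# The example graph from Section 1.
G = {
    'v1': ['v2', 'v4'],
    'v2': ['v1', 'v3'],
    'v3': ['v2', 'v5'],
    'v4': ['v1', 'v5'],
    'v5': ['v3', 'v4'],
}

def bfs_parent(G, source):
    # One predecessor per vertex: O(|V|) space in total.  The dict
    # doubles as the visited set.
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def reconstruct_path(parent, target):
    # Walk predecessor pointers back to the source, then reverse.
    path = []
    v = target
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]
```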

### 2.2. Dijkstra’s Algorithm

We now assume that $G(V,E)$ is a weighted digraph with a nonnegative weight function on the edges. If the weight function takes strictly positive integer values, BFS does generalize with minor modification. Indeed, any edge of weight $w$ larger than 1 can be replaced with $w$ unit-weight edges joined at $w-1$ intermediate vertices. Nevertheless, this method can blow up the size of the input graph: the number of added vertices grows with the total edge weight, which is exponential in the number of bits needed to write down the weights. Moreover, it is not clear how to adapt BFS for graphs with zero-weight edges.
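The subdivision trick can be sketched as follows for positive integer weights. The representation (a dict mapping each vertex to a dict of weighted out-edges) and the function names are our assumptions for this sketch:

```python
from collections import deque
from itertools import count

def subdivide(G):
    # G maps u -> {v: w} with positive integer weights w.  Replace
    # each edge of weight w with a chain of w unit edges running
    # through w - 1 fresh intermediate vertices.
    fresh = count()
    H = {u: [] for u in G}
    for u in G:
        for v, w in G[u].items():
            prev = u
            for _ in range(w - 1):
                mid = ('aux', next(fresh))
                H[mid] = []
                H[prev].append(mid)
                prev = mid
            H[prev].append(v)
    return H

def bfs_distance(G, source):
    # Unweighted BFS, as in the previous section.
    visited = {source}
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v not in visited:
                visited.add(v)
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance

# A weighted digraph: the direct edge a -> b has weight 3, but the
# detour through c has total weight 2.
W = {'a': {'b': 3, 'c': 1}, 'b': {}, 'c': {'b': 1}}
```

Distances in the subdivided graph, restricted to the original vertices, are exactly the weighted distances in W.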

A reasonable alternative is Dijkstra’s algorithm (Dijkstra, 1959).
