Basic Graph Algorithms (Draft)

1. Basic Definitions

A graph consists of a set of vertices and a set of edges . An edge connects two vertices; the two vertices connected by an edge are said to be adjacent. The size of the graph is the cardinality of the vertex set . We shall assume that every graph in this post is finite, i.e., .

A weighted graph is a graph equipped with an additional weight function that assigns a number to each edge. We sometimes say unweighted to refer to a graph without a weight function, for emphasis. When is nonnegative on all edges, we also call the length function.

Mathematically, we could model edges by sets of vertices or ordered pairs of vertices. The former ignores the ordering, resulting in an unordered graph. In the latter model, and are distinct edges, whence the graph is directed. We shall call an unordered graph a graph, and an ordered graph a digraph.

If we wish to allow multiple edges with the same endpoints (with the same direction), we must take to be a bag instead of a set. We shall not concern ourselves with such foundational details and abuse notations at will.

The incidence lists representation of a graph stores a graph as a hash table, with each key representing a vertex and its associated value a list of all vertices adjacent to the vertex in question. Diagraphs can be stored as a hash table in a similar way: keys are starting points, and values are lists of end points. Here is an example of a graph, implemented in Python:

G = {
     'v1' : ['v2', 'v4'],
     'v2' : ['v1', 'v3'],
     'v3' : ['v2', 'v5'],
     'v4' : ['v1', 'v5'],
     'v5' : ['v3', 'v4']
    }

The degree of a vertex on a graph is the number of edges connected to the vertex in question. On a digraph, we must differentiate between in-degrees and out-degrees . Observe that

on a graph , whence the number of odd-degree vertices must be even. Similarly,

whenever is a digraph.

2. Shortest Paths

A path between two vertices and on a graph is a sequence of edges

A path on a digraph is defined analogously. A path that starts and ends at the same vertex is a circuit; a circuit of length one is a loop.

The connected component of a vertex is the set of all vertices connected to via paths. A graph is said to be connected if every pair of vertices are connected by at least one path, viz., the connected component of an arbitrary vertex is the entire graph.

How do we find a path between two vertices? Better yet, how do we find the shortest path between two vertices? It depends on the kind of graph we are interested in.

Suppose is an undirected, unweighted, connected graph. The breadth-first search (BFS) algorithm (Moore, 1959) finds the distance from a fixed vertex to all the other vertices. The distance is understood to mean the length of the shortest sequence of edges between the two vertices

Starting at the source vertex, the BFS algorithm examines all adjacent vertices, assigns them the distance of 1, and puts them in the queue. Until the queue empties out, the algorithm examines the adjacent vertices of each vertex in the queue and assigns the distance to those vertices that have not already been visited, which are then put in the queue. The algorithm terminates when there is no more vertex left in the queue.

 1: def bfs_distance(G, source):
 2:     visited = [source]
 3:     distance = {source: 0}  # distances are given as nonnegative integers.
 4:     queue = [source]
 5:     while queue:  # while queue is nonempty
 6:         u = queue.pop(0)
 7:         for v in G[u]:
 8:             if v not in visited:
 9:                 visited.append(v)
10:                 distance[v] = distance[u] + 1
11:                 queue.append(v)
12:     return distance

(source code)

We observe that each line of the above algorithm takes constant time to execute. Since BFS goes through every vertex, every vertex is popped from the queue at least once, which adds time complexity, via repetition of line 6. Line 7 goes through every edge of a vertex. Since every vertex appears as u at least once, it follows that lines 8 - 11 are repeated times. Line 8 guarantees that no vertex enters the queue more than once, and so we conclude that the time complexity of bfs is .

As for the space complexity, we observe that visited must eventually store all vertices. Therefore, the space complexity is bounded below by . On the other hand, each of visited, distnace, queue stores no more than items each, and so the space complexity must be bounded above by . It follows that the space complexity of BFS is .

Since BFS computes distances by tracing out the shortest path one edge at a time, a minor modification yields an algorithm that returns the shortest path from a fixed vertex to all the other vertices.

 1: def bfs_path(G, source):
 2:     visited = [source]
 3:     path = {source: [source]}  # paths are given as lists.
 4:     queue = [source]
 5:     while queue:
 6:         u = queue.pop(0)
 7:         for v in G[u]:
 8:             if v not in visited:
 9:                 visited.append(v)
10:                 path[v] = path[u] + [v]
11:                 queue.append(v)
12:     return path

(source code)

bfs_path is slower than bfs_distance, as constructing the list

path[v] = path[u] + [v]

takes time. can be as large as , and so the time complexity of bfs_path is bounded by . This increase in time complexity can be remedied if we use a linked list, instead of an array, to save the paths. In this case, path[v] = path[u] + [v] only takes constant time, whence the time complexity remains the same.

The space complexity increases to , as each list stored in path can be as large as . Using a different data structure does not remedy this issue, as storing a path requires storing multiple vertices.

2.2. Dijkstra’s Algorithm

We now assume that is a weighted digraph with a nonnegative weight function on the edges. If the weight function is strictly positive, BFS does generalize with minor modification to digraphs with a positive weight function on the edges. Indeed, any edge of weight larger than 1 can be replaced with edges with intermediate vertices. Nevertheless, this method increases the size of the input graph exponentially, which we would generally like to avoid. Moreover, it is not clear how to adapt BFS for graphs with zero-weight edges.

A reasonable alternative is Dijkstra’s algorithm (Dijkstra, 1959),