Graphs

A graph consists of a set of vertices and a set of edges. Vertices are the nodes that make up the graph. Edges, or arcs, connect the vertices.

There are two types of graphs:

Directed Graph
Undirected Graph

A directed graph, or digraph, is a graph in which the edges connect vertices in a specific direction.

A directed graph with six vertices {1, 2, 3, 4, 5, 6} and ten directed edges {1 to 2, 1 to 4, 1 to 5, 2 to 3, 2 to 4, 3 to itself, 4 to 2, 4 to 3, 6 to 2, 6 to 3} can be pictured as:

A tree is a special case of a digraph. It is characterized by the fact that:

one of its vertices, the root, has no incoming edge
every other vertex can be reached from the root by a unique path, that is, by following one and only one sequence of consecutive edges

A digraph differs from a tree in that:

it need not have a root node
there may be several (or no) paths from one vertex to another
it is processed differently. Take insertion for example. To add a node to a tree, a link field must be added to a parent node. For a directed graph, a node may be inserted, but there need not be an arc to or from it; or an edge can be inserted between two existing nodes.

Some definitions that are associated with graphs:

Two vertices are said to be adjacent if there is an edge connecting them.

A path in a graph is a sequence of vertices and edges.

The length of a path is the number of edges on a path, which is always equal to the number of vertices - 1.

A loop is a path made up of an edge that goes from a vertex back to itself.

A simple path is a path where all of the vertices are distinct, with the exception of the first and last.

A cycle is a path of at least length 1 where the first and last nodes are the same. The edges must be distinct for undirected graphs.

A digraph is acyclic if it has no cycles.

A digraph is said to be strongly connected is there is a path from every vertex to every other vertex.

A complete graph is a graph in which there is an edge between every pair of vertices.

Representation

There are several ways of representing a graph. One of the most common is to use an adjacency matrix. To construct the matrix:

number the vertices of the digraph 1, 2, ..., n
construct a matrix that is n x n
For each entry in row i and column j, insert a 1 if there is an edge from vertex i to vertex j; otherwise insert a 0

A directed graph and adjacency matrix:

An undirected graph and adjacency matrix

The space requirement for an adjacency matrix is V², where V is the number of vertices.

One drawback to this type of representation is that it is often sparse, that is, it has a lot of zero entries, and thus considerable space is wasted.

To alleviate this problem, a graph can also be represented as an adjacency list. For this type of representation, an array is used to hold the data for each vertex in the graph. For each vertex, there is also a pointer to a linked list of all vertices that are adjacent to the vertex.

A directed graph and an adjacency list:

The space requirement for an adjacency list is E+V, where E is the number of edges and V is the number of vertices.

Topological Sort

A topological sort is an ordering of vertices in a directed acyclic graph, such that if there is a path from v_i to v_j, the v_i appears before v_j in the ordering. This is not possible if the graph has a cycle.

Algorithm:

Find a vertex with no incoming edges
Process it and then remove it from the graph
Repeat

v1 v2 v5 v4 v3 v7 v6

v1 v2 v5 v4 v7 v3 v6

are both possible topological orderings.

The indegree of vertex v is the number of edges coming into v.

Assuming that the indegree for each vertex is stored and that the graph is stored as an adjacency list, the following algorithm can be derived:

for( counter from 0 through the number of vertices )
  v = find a new vertex with indegree of 0
  
  if v is not a vertex
    error
  endif
  
  assign a topological number to v -- use the counter
  
  for each vertex w that is adjacent to v
    decrement w's indegree by 1
  endfor
endfor

The finding of the vertex with indegree of 0 is accomplished by scanning the vertices looking for a vertex with indegree of 0 and no topological number. If no such vertex exists, then the graph contains a cycle.

Finding the vertex with indegree of 0 has a running time of V. That call is made V times, therefore the algorithm has a running time of V².

The algorithm can be improved by simply storing all of the vertices with indegree of 0 on a queue. The new algorithm resembles:

for each vertex v
  if v has indegree of 0
    push v onto a queue
  endif
endfor

Initialize a counter to 0

while the queue is not empty
  pop the queue and assign to v
  increment the counter
  assign the counter as the topological number for v
  
  for each vertex w that is adjacent to v
    decrement w's indegree by 1
    if w's indegree is 0
      push w onto the queue
    endif
  endfor
endwhile

The algorithm now has a running time E+V since the for loop is executed at most once per edge, the queue operations are executed at most once per vertex, and the initialization is at most the size of the graph.

Shortest Path Algorithms

There are many ways to find the shortest path from one vertex to another.

the first assumes that the graph is weighted, which means that each edge has a cost to traverse it. The cost of a path is the sum of the edge costs, this is known as the weighted path length.

Single-source shortest path problem:

Given a weighted graph and a vertex s, find the shortest path weighted path from s to every other vertex in the graph.

4 possible algorithms to find the shortest path from one vertex to all other vertices:

Unweighted shortest path
Dijkstra's algorithm
Graphs with negative edge costs
Acyclic graphs

Unweighted Shortest Path Algorithm

Choose a starting vertex and assign the value 0 to it.

Look for all vertices that are a distance 1 away from the start and mark them as such.

Repeat the process until all of the nodes are marked.

This type of processing is known as a breadth-first search, since it is processing vertices in layers. This is equivalent to a level-order traversal for trees.

To implement this, keep track of three pieces of information:

The distance from start vertex to other vertices
The vertex that preceded it in the path
Whether or not the vertex has been processed

start vertex distance is 0

for dist of 0 through number of vertices
  for each vertex v
    if v has not been processed and v's distance is equal to dist
      indicate that v has been processed
      
      for each vertex w that is adjacent to v
        if w's distance is not known 
          set w's distance to dist plus 1
          set w's preceding vertex to v
        endif
      endfor
    endif
  endfor
endfor

The algorithm has a running time of V² because of the nested for loops, but it can be improved by using a queue. The vertices will be pushed on the queue in the order of increasing distance from the start.

push the start vertex onto queue
set start vertex distance is 0

while the queue is not empty
  v = pop the queue
  indicate that v has been processed
      
  for each vertex w that is adjacent to v
    if w's distance is not known 
      set w's distance to dist plus 1
      set w's preceding vertex to v
      push w onto the queue
    endif
  endfor
endwhile

The algorithm now has a running time of E + V.

Using the graph from above results in the following:

After popping v3:

After popping v1:

After popping v6:

After popping v2:

After popping v4:

After popping v5:

After popping v7:

Dijkstra's Algorithm

With this algorithm, the graph is weighted and it uses the same three principles as the previous algorithm:

the distance is now the cost
the previous vertex that is stored is the one that causes a change in the distance
whether or not the vertex has been processed

It proceeds in stages. At each stage, a vertex is selected that has the smallest cost among all unprocessed vertices and mark the vertex as processed (ie. that the shortest path has been found).

The start vertex is v1. It is assigned a cost of 0 and marked to indicate it has been processed:

v2 and v4 are adjacent so they are adjusted accordingly:

v4 is processed next since it has the smaller cost. It is marked as known and vertices v3, v5, v6, and v7 are adjusted accordingly:

v2 is processed next. It is marked as known and vertices v4 and v5 are adjusted accordingly. v4 is left alone since it has already been processed. v5 is left alone because it already has a minimal cost:

v5 is processed next since it has the smallest cost. It is marked as known and vertex v7 is the adjacent vertex. It's left alone because the cost is already minimal:

v3 is processed next since it has the smallest cost. It is marked as known and vertex v6 is the adjacent vertex. It's adjusted with the smaller cost:

v7 is next. v6 is again the adjacent and it's updated with the smaller cost:

v6 is next. There are no adjacent vertices so the processing is done:

This algorithm is known as a greedy algorithm, because it solves the problem in stages by doing what appears to be best at each stage.

Graphs with Negative Edge Costs

With this algorithm, the graph is still weighted but it is now possible to have an edge with a negative weight. The addition of the negative costs causes Dijkstra's algorithm to fail because once a vertex is processed, it is never processed again. But with a negative cost, it's possible that a path back to a vertex that costs less can be found.

To fix the problem: the weighted and unweighted algorithms will be combined. This drastically increases the running time!

To do this, the vertices will no longer be marked as processed.

Start by pushing the start vertex onto a queue
Pop a vertex v
Find each vertex w that is adjacent to v such that w's cost is greater than v's cost + edge cost. If that is the case, update w's cost and push it onto the queue if it's not already there.
repeat from step 2 until the queue is empty

Acyclic Graphs

Dijkstra's algorithm can be improved if the graph is known to be acyclic. The improvement is made by changing the order that the vertices are processed. It will now be done in a topological order.

This works because once a vertex is selected its cost cannot be lowered since it will no longer have any incoming edges from unprocessed vertices.