CSC152 2006S, Class 52: Finding the Shortest Path in a Graph

Admin:
* Questions on the exam will happen at the end of class (or after class).
* As we have a visitor, I will use the EBoard to record my notes on the class.
* I will do my best to get you preliminary grades by next Wednesday.
* EC for attending any one of Ben's four baseball games

Overview:
* Some Definitions (Path, The Problem).
* A Simplified Version: Shortest Paths in Unweighted Graphs.
* An Example.
* The Solution.
* Extracting the Path.
* Weighted Graphs: Dijkstra's Algorithm.
* Asymptotic Analysis.

/What do you know?/

* Graphs have vertices and edges.  Vertices are 'stopping points'
  and edges provide connections.  [Good definition from LH]
* Edges may be directed

/An Example/
            +----------------+
            v                |
    v0 --> v1 ----> v4 ----+ |
    |       ^        ^     v |
    |       |        |    v3-+
    |       |        |     ^
    +----> v2 ----> v5 ----+

/Definitions/

* Definition: A path is a sequence of edges that take us from one vertex to another.
* The cost of the path is
  * The number of edges, in an unweighted graph
  * The sum of the weights of edges, in a weighted graph
* The problem: Shortest path
  * The cost of the path is less than any other path
  * We find *a* shortest path, since there may be multiple such paths

/Strategy/

We actually solve a harder problem: Find a shortest path from a vertex to *every* other vertex

Build a table of three columns:

Vertex | Distance/Cost from v0 | Predecessor
--------------------------------------------

We can fill in the distance to v0, since we start at v0.

   v0  |       0               |    NONE

For all neighbors, we can compute the distance to those values

   v1  |       1               |      0     
   v2  |       1               |      0     

[Suggestion from class: vertex 3 is 3 because we can count the edges.
Problem: Computers can't just count the edges.  We need to instead think about how the computer will do the work without knowledge.]

In this case, although v3 comes next, we can't fill it in immediately.
Instead, we fill in the neighbors of v1.

   v4  |       2               |      1

We then fill in the neighbors of v2

   v5  |       2               |      2

Finally, we fill in a neighbor of v4

   v3  |       3               |      4

Putting it together:

Vertex | Distance/Cost from v0 | Predecessor
--------------------------------------------
   v0  |       0               |    NONE
   v1  |       1               |      0     
   v2  |       1               |      0     
   v3  |       3               |      4
   v4  |       2               |      1
   v5  |       2               |      2

[Student comment: Since there's an edge from 3 to 1, might
we need to recheck that value?  Answer: Let's consider the
pseudocode.]

/Pseuocode/

Initialize
  for all vertices, v, 
    dist[v] = infinity
    pred[v] = undefined
Main Loop
  for all possible costs
    for all v where dist[v] = cost
      for all w adjacent to v
        if dist[w] = infinity then
          dist[w] = dist[v] + 1
          pred[w] = v

/Java Code (inserted by Sam)/

  // for all vertices, v, 
  for (int v = 0; v < n; v++) {
    // dist[v] = infinity
    dist[v] = Integer.MAX_VALUE;
    // pred[v] = undefined
    pred[v] = NO_SUCH_VERTEX
  }
  // for all possible costs
  for (int cost = 0; cost < n) {
    // for all v where dist[v] = cost
    for (int v = 0; v < n; v++) {
      if (dist[v] == cost) {
        // for all neighbors
        for (int w) in graph.getSuccessors(v) {
          if (dist[w] = Integer.MAX_VALUE)
        }
      }
    }
  }

/How do we read off a path from the table?/

* For example, what is the shortest path to 3?
* Go backwards:
  * To get to 3, you need to have come from 4
  * To get to 4, you need to have come from 1
  * To get to 1, you need to have come from 0
  * Therefore, the path is 0 -> 1 -> 4 -> 3

/A Revised Example/

            +----------------+
        4   v                |
    v0 --> v1 ----> v4 ----+ |6
    |       ^   5    ^  1  v |
    |1     2|        |    v3-+
    |       |   4    |  3  ^
    +----> v2 ----> v5 ----+
      
Assumptions
* No negative weight edges
* Once again, the strategy is to 

Goal: Given a directed, weighted, graph where edges have
non-negative edge weights, find the shortest path from
vertex 0 to vertex i for all i.  

["Shortest" means smallest sum of edges.]


Vertex | Distance/Cost from v0 | Predecessor
--------------------------------------------
   v0  |       0               |    NONE
What's the shortest path to v2?  It seems to be 1.

   v2  |       1               |      0     

Is the shortest path to v1 the edge from v0?  NO!

A problem: When we choose a vertex, we don't want to
process it too early because, if we update it, we will
then need to update its neighbors (and all of their
neighbors and so on and so forth).  Hence, we mark
a vertex when we are CONFIDENT that we know the shortest
path.  Add a column "known"

Vertex | Known | Distance/Cost from v0 | Predecessor
--------------------------------------------
   v0  |       |     0                 |    NONE
   v1  |       |     0                 |    NONE
   v2  |       |     0                 |    NONE
   v3  |       |     0                 |    NONE
   v4  |       |     0                 |    NONE
   v5  |       |     0                 |    NONE

How do we know when the shortest distance is known?
* If there's only one predecessor (one not thought of; a good idea, but not workable for all nodes)
* Try every path to that node (workable, but *EXPENSIVE*)

Hmmm .... what can we do?  Let's try working the algorithm

   v0  |   X   |     0                 |    NONE

We now have tentative costs for two vertices, the neighbors of 0

   v1  |       |     4                 |    0
   v2  |       |     2                 |    0

For vertices with tentative costs, we have a known cost and
a number of other possible, non-explored paths.  Can we be
sure of any of them?

Consider the vertex with a least tentative cost.  Any other
path to that vertex will have to go through one of the other
tentative vertices.  Since the costs to get to those vertices
are all larger, using them won't help.

[Hmmm ... should I ask the students to finish the example on
their own?]

Sketch algorithm: We repeatedly select a vertex to "process".
When we process a vertex, we mark it and update the costs to
all of the neighbors

Initilization
    for all v, 
      known[v] = false;
      dist[v] = infinity
      pred[v] = undef
    dist[0] = 0
Update
  for i = 0 to n-1
    v = "find vertex with lowest tentative cost"
    known[v] = true
    for all neighbors, w, of v
      if dist[w] > dist[v] + weight(v,w)
        dist[w] = dist[v] + weight(v,w)
        pred[w] = v

[Sam decides not to record walk-through so that he can put
something similar on the final exam.  Hint hint.]

/Asymptotic Analysis/

Initialization is O(|V|), |V| is "number of items in the set V"

How long does it take to find the vertex with lowest tentative
cost?  O(|V|).  
We do so O(|V|) times.
Total cost: O(|V|^2)

How long does it take to look at adjacent neighbors?  For a
particular vertex, it is potentially as bad as O(|V|).  That
means that identifying adjacent neighbors would be O(|V|^2).

However, we follow each edge exactly once.  Hence O(|E|).