CSC152 2006S, Class 53: Pause for Breath

Admin:
* Exam due.  Ohmig-d, Tony has a stapler.
* The week to come: Tomorrow: Exploration of Scheme knowledge; An overview of CS.  Wednesday: Discussion of Exam 3 and grades.  Friday: Course Evaluation and Wrapup.
* You will have homework for Friday: A course-specific evaluation form.
* Although the final is scheduled for Friday of finals week, you may take it earlier by obtaining and filling out one of those cool "Change in Finals Time" forms.
* My upper-level course is reading Alan Perlis's "Epigrams on Programming", available on the Web at http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html.  Some thought you might enjoy reading it.
* Math/CS table today.

Overview:
* Shortest Path, Revisited: Some Optimizations.
* Sorting, concluded: Evaluating Quicksort.
* Expressing Graph Algorithms in Code: Shortest Path.

Shortest path algorithm: Dijsktra

* Problem?
Given a graph, find the shortest path between two vertex,
start and finish.
* What does it mean to be shortest: 
  * The fewest edges in an unweighted graph.
  * The smallest sum of edge weights in a weighted graph.
  * Called "shortest" because a weight may represent distance

* Algorithm?
  * Solve a more general problem: Find the shortest path from start to every other vertex
  * Keep a table, indexed by vertex number, with
    * Cost Known: A Boolean
    * Length of shortest path found so far
    * Predecessor
  * Strategy: Repeatedly process vertices
    * When we process a vertex
      * Mark it (set Cost Known to true)
      * Pick every successor of the vertex
      * Update the estimated cost of reaching that successor, if the cost of going through this vertex is smaller
  * When processing vertices, which do we pick to process?
    * The smallest known cost so far.
    * Why is that safe?  
      * Because you were told so.  Believe your teachers.  We never lie.
      * Well, you wouldn't want to pick one with an unknown cost, because that makes no sense.  Take my word for it.
      * Aw crud, just believe me, it makes sense to me.  Get inside my head and you'll understand.  (Next week on The Magic School Bus - Everyone shrinks down and goes inside Emily's head to understand.)
      * Every other path will need to go through the other vertices, and it already costs more to go through those vertices, or through one of the unknown vertices, and paths to those vertices will either go through this node or one of the known vertices.


Running time analysis
* Let n be the number of nodes; m be the number of edges
* n times, we find the smallest unprocessed vertex
  * If we check all of them, it costs O(n) to find that vertex
  * If we keep the vertices in a heap by cost, it's only O(log_2(n)) to find that smallest
    * Slight problem: Heaps aren't designed to allow you to update the "key" assigned

But there's more ... What about updating the table?
* Different analysis technique:
* Traditionally
  * N times, we update all neighbors
  * There are at most N neighbors
  * O(n^2)
* This time:
  * Don't care how, but every edge processed exactly once, so instead of O(n^2), we say O(m)
* The "look at the big picture" analysis technique is a powerful one.  Like all powerful things, it should be used with caution.