EBoard 26: More on MSTs
Warning! You are probably being recorded (and transcribed).
Approximate overview
- Administrative stuff
- Review (mostly)
- Prim’s algorithm, revisited
- Running time: Prim’s
- Kruskal’s algorithm, revisited
- Running time: Kruskal’s
- Detour: Helper algorithms/Data Structures for Kruskal’s
- Running time: Kruskal’s (revisited)
- Preparing for Proofs of Correctness: Cuts in a graph
- Proving Prim’s correct
- Proving Kruskal’s correct
- A review of the heap data structure (if time)
- Another MST algorithm (in the unlikely case we still have time)
Administrative stuff
- The other Professor Rebelsky asked me to share a meme about Tornado
Watches vs Tornado Warnings
- There was a question as to whether you have to implement heaps when you
implement Prim’s and whether you have to implement Kruskal’s supporting
data structure when you implement Kruskal’s.
- The answer is No.
- However, you should use a separate data structure so that you
can swap in a better implementation.
- We’ll talk more about Project 4, in which you implement Prim’s and
Kruskal’s, on Monday.
- I realize that there are a lot of resources for you to look at after
class (e.g., eboard, Autry’s notes, texbook), which might lead you
to skip note-taking. Remember that experiments suggest that taking
notes helps you learn!
- Hypothesis: Having to summarize activates brain cells
- Hypothesis: It’s harder to say “Oh, I understand that” when you
write it down / summarize.
- You can also take notes while reading through resources.
- Please read your email about the SEPC and consider applying to join
the SEPC. We have a lot of openings on the SEPC.
- The SEPC serves as the liaison between the faculty and students.
- The SEPC represents students in faculty reviews
- The SEPC represents students in faculty hiring
- The SEPC is responsible for the social life of the department
- The SEPC gets to deal with SGA (no, not the guy from Oklahoma)
Upcoming events
- Monday, 2026-04-06, 7:00 p.m.ish, 3819, Mentor Session
- Tuesday, 2026-04-07, Noon, CS Table (TBD)
- Wednesday Extra, 2026-04-08, 4:15–5:45 p.m.: CS Poster Session
- Thursday Extra, 2026-04-09, 4:15–5:45 p.m.: CS Poster Session
- Thursday, 2026-04-09, 7:00 p.m., Mentor Session
Upcoming deadlines
- Monday, 2026-04-06: Read about MSTs (CLRS 22)
- Monday, 2026-04-06: Project 3 due (does not match Autry’s syllabus)
- Friday, 2026-04-17: Problem set 4 due
- Friday, 2026-04-17: Project 4 due
Policy/administrative/assignment questions
Who votes for SEPC
Traditionally, only students.
Prim’s Algorithm
Unmark all vertices
Choose a vertex and mark it
Repeat (|V| - 1) times
Find the shortest edge from a marked vertex to an unmarked vertex
Add that edge to the MST
Mark the unmarked vertex
With a priority queue
Unmark all vertices
Create a new priority queue, pq
Choose a vertex and mark it
Add all of its edges to the priority queue
Repeat until we've marked very vertex
e = pg.get() // Gets the shortest edge from a marked vertex
if e connects a marked vertex, m, to an unmarked vertex, u
Add e to the MST
Mark u
For each edge from u to an unmarked vertex
Add that edge to pq
Running time of Prim’s
- Marking a vertex is: O(1), which we can achieve with a flag on the vertex.
Or by using a hash table.
- Adding an edge to the MST is: We’ll use a list of edges, O(1)
- Finding/removing the smallest edge is: Depends on your implementation
of the priority queue.
- If we use a sorted list, O(1)
- If we use a heap, O(logn)
- Adding a new edge is:
- If we use a sorted list, O(n), where n is the length of the list
- If we use a heap, O(logn)
SO the body of our loop is O(logn) if we use a heap.
How many times might we repeat the loop?
- We might have to check every edge once.
| So our running time is $$O( |
E |
\times log |
E |
)$$ |
- We may have to put every edge into the priority queue and grab every
edge from the priority queue.
- Although the first version has a loop over the number of vertices,
the time within the loop is harder to predict, so we don’t use
that version.
(CLRS may do a slightly tighter bound.)
Kruskal’s Algorithm
Sort the edges from smallest to largest
Repeat until we've created the MST
Grab the next smallest edge (from u to w)
If adding the edge to the MST does not create a cycle
Add it to the MST
Analyzing Kruskal’s Algorithm
What are the key operations?
-
| Sorting all the edges: $$O( |
E |
log |
E |
)$$ |
- Grab the next smallest edge : \(O(1)\)
- Determine if adding an edge creates a cycle: It depends on how we
we check for cycles. \(O(cyclecheck)\)
How many times might we run the loop?
\[O(|E|)\]
So our running time is …
\[O(|E|log|E| + |E| \times cyclecheck)\]
How would you check for a cycle?
- Use the marking idea from Prim’s. Nope.
- Keep track of the number of edges from each vertex. Nope.
- Use a “Union Find”
- Topologically sort the graph; if you can’t, there’s a cycle.
Topological sort can be expensive.
Detour: Data structures/algorithms for Kruskal’s
Rethink the “are we creating a cycle” to
- Keep track of all the connected components in the graph (each of
those is “set”)
- When you explore an edge, if both vertices are in the same set,
adding the edge creates a loop. If both vertices are not in the
same set, you can safely add the edge.
- We want a data structure that supports
- Create a set
- Check if two values are in the same set (Find)
- Merge two sets (Union)
- One technique: Hash table. Hash each vertex to its own value. When
we merge to sets, we pick a new set identifier and re-hash all of
the vertices in the two sets. \(O(|V|)\) to do the union,
expected \(O(1)\) to do the find.
-
| This would give us an overall running time of $$O( |
E |
log |
E |
+ |
E |
|
V |
)$$ |
We can do even better.
- Key idea: Upside-down trees, in which the path from any vertex name
in the tree to the root gives you the identifier for the set its
in.
- When you union two trees, you make the parent of the smaller tree
the root of the larger tree
- With a bit of work, you can show that the depth of the tree can’t be
larger than \(log|V|\).
-
| This would give us an overall running time of $$O( |
E |
log |
E |
+ |
E |
log |
V |
)$$ |
| = $$O( |
E |
log |
E |
)$$ |
|
|
|
|
-
| Proof that an improved union/find algorithm is $$O(log^star |
V |
)$$ |
- log-star is “the number of times you have to take the logarithm
of the value to get to 1. Grows much slower than log.
- Proof that the amortized time for the improved union/find is
“the inverse Ackermann function” (grows really really slowly)
Analyzing Kruskal’s algorithm, revisited
With this new data structure, our running time is \(O(|E|log|E|)\); that
is, the sorting dominates!
Preparing for Proofs: Cuts in a graph with a partial MST
It turns out that the proofs of the correctness for Prim’s depend upon
some additional infrastructure.
We will define a cut in a graph with a partially completed MST.
- Let set \(S \subset V\) be a set of vertices
- In our algorithms, it will contain all the edges in the partially
completed MST.
- The set \(T = V - S\) contains all vertices not in S.
- This partitioning is called a cut of the graph.
- We’ll do a few diagrams on the board. Hopefully, I’ll remember to
turn off screen sharing during that time.
- We will be considering the edges that cross the cut.
- Idea: In building an MST algorithm, we want to create repeated cuts
and choose the smallest edge that crosses the cut. (Assuming that
no edges in the MST cross the cut.)
Theorem: If \(e\) is the smallest edge that connects a vertex in
\(S\) to a vertex in \(T\) (“crosses the cut””), then \(e\) is in
some MST of \(G\).
Theorem (restated): Given a weighted, undirected graph, \(G = (V,E,w)\),
a cut \(S \subset V\) and \(T = V - S\), and an edge \(e\) that is the
lowest weight edge that crosses the cut, \(e\) is in some MST of \(G\).
What technique might you use to prove this?
- Contradiction
- Contrapositive
- Indicution
- Construction
Proof:
A proof by contradiction.
Suppose \(e\) is not in any MST of \(G\).
Let \(M\) be an MST of \(G\). \(M\) must contain an edge, \(m\), that
crosses the cut between \(S\) and \(T\).
Now, consider set edge set \(M \union { e }\). This set contains a loop.
Removing \(m\) from that set will create a new MST.
What is the weight of \(M \union { e } - { m }\). Since \(e\) is the
lowest weight edge that crosses the cut,
\(W(M \union { e } - { m }) \le W(M)\).
But that’s an MST that contains \(e\). That violates our assumption.
Our assumption must be wrong.
Therefore, \(e\) is in some MST of \(G\).
QED.
Note: Another way to think of why \(M \union { e } - { m }\) is a
spanning tree. We can think of \(M\) as having three parts: The
edges in \(S\), which we’ll call \(M_S\) the edges in \(T\), which
we’ll call $M_T$$, and …
Proof of Prim’s
Corollary: Prim’s Algorithm is correct
A proof by construction
Proof of Kruskal’s
Corollary: Kruskal’s Algorithm is correct
A proof by construction
Our other MST algorithm
While more than \(|V|-1\) edges remain, remove the highest-weight
edge that does not disconnect the graph.
Why don’t we use this approach?
- Perhaps it’s not efficient.
- Perhaps it’s not correct.
- Perhaps it’s harder to prove correct.
Let’s think about efficiency
Looking at up to \(O|E|\) edges. Checking if a graph is disconnected does
not have an efficient solution.
Heaps (not covered)