*Original version by John Stone on October 24, 1995. Some modifications by
Samuel A. Rebelsky in September 1997.*

- Introduction
- Psuedocode
- Example: Bin packing
- Bin packing of "continuous" objects (greedy method works)
- Bin packing of "discrete" object (greedy method fails)
- Example: Minimum spanning tree

Many of the problems that computer programs are designed to solve have involve optimization or finding a "best" solution. That is, some operation can be performed in a number of different ways, all meeting some basic constraints, and the problem is to find the way that is optimal, in the sense that it maximizes the value of some desirable quantity or minimizes the value of an undesirable one.

A *greedy algorithm* is a method for finding an optimal solution to
some problem involving a large, homogeneous data structure (such as an
array, a tree, or a graph) by starting from an optimal solution to some
component or small part of the data structure and extending it, by
considering additional components of the data structure one by one, to an
optimal global solution. A greedy algorithm assumes that a local optimimum
is part of the global optimimum.

The principal advantage of greedy algorithms is that they are usually straightforward, easy to understand and easy to code. Their principal disadvantage is that for many problems there is no greedy algorithm. More precisely, in many cases there is no guarantee that making locally optimal improvements in a locally optimal solution yields the optimal global solution.

In an imperative language, like C, one might write:

solution = initialize_solution(data); while (!completed(solution, data)) { component = select(data, solution); solution = extend_optimally(component, solution); } return solution;

In an object-oriented language, like Java, one might express it more like:

public Solution greedilySolve(Data data, Evaluator eval) { // The current solution (or partial solution) Solution solution = new Solution(data); // The parts of the input data that are currently unused Data unused; // The next best piece of datum to extend Datum component; // Keep going until our solution is finished while (solution.incomplete()) { // Get the parts of the data not yet used in the current // solution. unused = solution.unused(); // Find the "best" one to add to the current solution component = eval.best(unused,solution); // Extend the solution solution.extend(component); } // while return solution; } // greedilySolve

Let's consider two closely related problems, only one of which can be effectively solved by a greedy algorithm. We imagine a thief who has broken into a warehouse, surrounded by merchandise of various sorts and equipped with a flexible swag-bag, or knapsack, in which to carry loot. We idealize the problem by assuming that knapsack has a fixed volume but can be stretched into any form that encloses that volume. We assume that the unit price, the unit volume, and the number of units of each kind of merchandise that the warehouse contains are all known. The thief, naturally, wants to carry off goods of the greatest possible value; how should the theif pack a bag in order to achieve this goal?

If the warehouse contains merchandise that is continuous rather than discrete -- stuff, such as cloth or gold dust, rather than things, such as clock radios and steel ingots -- there is a greedy algorithm that can easily be proven to give the correct answer: Divide the unit price of each kind of stuff by its unit volume to obtain its ``value density.'' Load up the knapsack with the stuff that has the highest value density until either the knapsack is completely full or there is no more of that stuff. Repeat with stuffs of successively lower value densities until either the knapsack is full or the warehouse is empty.

If the warehouse winds up empty, it's obvious that the thief has optimized her take: There's nothing left to steal! And if, instead, the knapsack is full, it's again obvious that the solution is optimal -- replacing anything in the knapsack with anything outside of it would leave one with an equal volume of stuff of equal or lower value density, and hence with the same or less total value in the knapsack.

The discrete form of the knapsack problem is much harder, because the volume of the knapsack may not be an integer multiple of the unit volume of the item with the highest value density. The greedy policy of taking as many as possible of those items may leave an empty space in the bag, too small to accommodate any of the items left in the warehouse, but large enough that the overall value density is well below that of other items. Is it worthwhile to steal items of lesser value density that happen to fit better into the knapsack? No simple answer fits all cases.

Another classical greedy algorithm finds a minimum-weight span for a
weighted graph. A *graph* is a collection of labelled, unstructured
objects called *vertices* (you may want to think of them as points)
some pairs of which are connected by *edges*. (In diagrams edges
are typically represented as segments of lines or curves, but the exact
shape of the edge is irrelevant when you're considering the graph -- all
that is important is which vertices are connected to one another.) In a
*weighted* graph, each edge has a positive real number associated
with it; when a graph is used to model some real-world network, such as
cities connected by highways or pipelines running between pumping stations,
an edge's weight typically represents the cost of constructing, traversing,
or consuming whatever it is that the edge represents.

A *span* for a given graph is a subset *E'*of the edges of the
graph such that you can move from any vertex of the graph to any other by
following edges that are in *E'*. Not every graph has a span (for
example, a graph with four vertices and only two edges cannot have a span),
but lots of graphs have many spans. The optimization problem is to
identify a span in which the sum of the weights of the edges in the span
is as small as possible.

A greedy algorithm solves this problem as follows: Initialize *E'* to
be the empty set of edges. Choose any vertex of the graph, and add to
*E'* whichever of the edges connecting the chosen vertex to any other
vertex has the least weight. Now think of the two vertices that are
connected by that edge as the ones that have already been ``spanned,'' and
find the least-weight edge that connects a spanned vertex to an unspanned
one. Add it to *E'*, and mark the previously unspanned vertex at its
far end as spanned. Repeat until all the vertices are spanned (or until
there are no edges connecting the spanned and unspanned vertices, in which
case no span exists).

It's not obvious that this method always leads to a span that has the least
possible total weight, but in fact one can prove that it does, as follows:
Suppose that the minimum-weight span *E** that has a lesser total
weight than the *E'* constructed by this algorithm. *E'* must
include at least one edge that is not in *E** (otherwise *E'*
would be a subset of *E**, and the total weight of *E** would
have to be greater than that of *E'*). Walk through the algorithm
until it is about to add to *E'* some edge *e'* that is not in
*E**. At that point, there must be some other edge *e**
connecting the spanned and the unspanned vertices, one that is in *E**
(otherwise *E** would not be a span, because you would not be able to
move from any of the spanned vertices to any of the unspanned ones or vice
versa).

Consider the result of replacing *e** with *e'* in *E'*.
The result of this replacement is still a span, since any path from one
vertex to another in the graph that uses *e** can instead move from
the spanned end of *e** to the spanned end of *e'*, and from
there to the unspanned end of *e'*, and from there to the unspanned
end of *e**. But the weight of *e'* is less than or equal to
that of *e**. It cannot be less, since *E** is by hypothesis the
minimum-weight span. So *e'* and *e** have the same weight, and
it makes no difference which one you include in the span.

Repeating this argument for every edge that is in *E'* but not in
*E**, one finds that every such difference involves the substitution
of one edge for another of equal weight. But then the total weight of
*E** is the same as that of *E'*, contrary to the hypothesis that
is was less. So *E'* is itself a minimum-weight span.

History of modifications:

- 24 September 97 [rebelsky]: Added OOP psuedocode. Removed pronouns from discussion of thief. Added section headers. Reorganized some discussions.

**Disclaimer** Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text written by Samuel A. Rebelsky.

Source text last modified Wed Oct 1 08:14:13 1997.

This page generated on Wed Oct 1 08:14:32 1997 by SamR's Site Suite.

Contact our webmaster at rebelsky@math.grin.edu