Algorithms and OOD (CSC 207 2014F) : Readings
Primary: [Front Door] [Schedule] - [Academic Honesty] [Disabilities] [Email] - [Learning Outcomes] [FAQ] [Teaching & Learning] [Grading] [Rubric] - [Calendar]
Current: [Assignment] [EBoard] [Lab] [Outline] [Reading]
Sections: [Assignments] [EBoards] [Examples] [Handouts] [Labs] [Outlines] [Readings]
Reference: [Student-Curated Resources] [Java 8 API] [Java 8 Tutorials] [Code Conventions]
Related Courses: [CSC 152 2006S (Rebelsky)] [CSC 207 2014S (Rebelsky)] [CSC 207 2014F (Walker)] [CSC 207 2011S (Weinman)]
Misc: [Submit Questions] - [SamR] [Glimmer Labs] [CS@Grinnell] [Grinnell] - [Issue Tracker (Course)] [Issue Tracker (Textbook)]
Summary: We consider techniques for assessing algorithms, focusing primarily on ways to describe the relationship between expected running time and input size.
Prerequisites: Functions, Loops, Conditionals, Recursion, Math.
As computer scientists develop algorithms, they may find that they come up with different algorithms that solve the same problem. How do you choose which algorithm is best? Once the obvious issues are resolved (e.g., are all the algorithms correct?), the most appropriate criterion to choose is which algorithm makes the best use of system resources (e.g., is fastest or uses the last memory).
In this reading, we consider notations for expressing the efficiency of an algorithm and techniques for using those notations. We primarily focus on a technique called "Big O" analysis, which is a form of asymptotic analysis.
In addition to expressing the efficiency of algorithms for comparative purposes, these approaches can also spur us to come up with more efficient algorithms, even after we've already designed some algorithms.
Let's start by looking at different solutions we might come up with for
computing xn for double
value x and non-negative integer n.
You've likely seen a variety of solutions.
Here's one typical solution, using a for loop.
public static double pow01(double x, int n)
{
int result = 1;
for (int i = 0; i < n; i++)
{
result = result * x;
} // for
return result;
} // pow01 (double, int)
Here's a similar solution, using recursion.
public static double pow02(double x, int n)
{
if (n == 0)
{
return 1;
} // if n is 0
else
{
return x * pow02(x, n-1);
} // if n is nonzero
} // pow02(double, int)
Of course, many of us are taught that when we use recursion, we should also think about using tail recursion, which smart compilers implement with fewer system resources. And, as long as we're using the husk and kernel that are often required for tail recursion, we might as well add a little error checking, too.
public static double pow03(double x, int n)
{
if (n < 0)
{
throw new RuntimeException("Exponent must be non-negative");
} // inappropriate exponent
return pow03kernel(1, x, n);
} // pow03(double, int)
private static double pow03kernel(double tmp, double x, int n)
{
if (n == 0)
{
return tmp;
} // if (n == 0)
else
{
return pow03kernel(tmp*x, x, n-1);
} // if (n > 0)
} // pow03kernel(double, double, int)
Are there other things we might think about as we write this code?
Well, we often use recursion for divide and conquer strategies. So
we might see if we can come up with such a strategy. It doesn't really
make sense to divide x by 2, so what happens if we divide
n by 2 (at least if n is even)? In particular,
how does computing x(n/2)
help us compute xn? Well,
the latter is the square of the former. So, we can now write yet another
version.
public static double pow04(double x, int n)
{
// x^0 = 1
if (n == 0)
{
return 1;
} // if (n == 0)
// Even exponent: x^(2k) = (x^k)^2
else if ((n % 2) == 0)
{
double tmp = pow04(x, n/2);
return tmp*tmp;
} // even exponent
// Odd exponent: x^n = x * x^(n-1)
else
{
return x * pow04(x-1);
} // odd exponent
} // pow04(double, int)
We could also write a similar iterative algorithm, but we'll leave that as a topic for the future.
Are there other approaches? Well, before there were computers, people used tables to do this kind of computation. How did you keep the number of tables down, given that there are a huge number of possible exponents? You take advantage of other rules you know. For example, we know that log(xn) = nlogx and that if a = b, then ea = e/b. Hence, we can write
public static double pow05(double x, int n)
{
return Math.exp(n * Math.log(x));
} // pow05(double, int)
If we didn't have a computer, we could still compute this relatively accurately by looking up the log (or marking it on a slide rule), doing the multiplication, and then looking up the exponent in a table (or using the slide rule backwards).
Which of these is best? It depends on which criteria you want to use to evaluate what it means to be “better”. And there are certainly many criteria to use. Obviously, the first criterion we should use is correctness, as there is little point in using an incorrect algorithm. But there are also others.
pow03 is the only one that explicitly throws an error
for a negative exponent; pow01 and pow05
will give you some value - 1, in the first case, the correct answer,
in the second; pow02 and pow04 will likely
recurse forever (or close to forever), although that issue can be
resolved in either
pow01 is probably best, but some people might find the
other versions equally clear.
In most instances, the most important criterion for programmers is the speed of the algorithm. (There have, however, been some recent papers that we should increasingly pay attention to tolerance to system errors; e.g., what sorting algorithm does best when the comparator fails one out of five times?) So, we need a way to measure running time.
So, how do we measure the running time of an algorithm? One experimental approach is to implement the algorithm run the same algorithm on a variety of sizes of inputs and find a function that best models the experimental data. A problem with that approach is that the same algorithm may take a very different amount of time depending on characteristics of the input other than the size. For example, bubble sort is really fast on an already-sorted list, but really slow on a randomized list. This approach also means that we have to spend time implementing the algorithm. To handle the first concern, we often do a worst-case analysis. We identify inputs that are likely to make the algorithm run slowly, and use those for our measures. (It is also possible to do average-case analysis or expected-case analysis.)
Interestingly, even if we implement the algorithm and give it worst case inputs, the information we get in one set of runs may not provide us with reliable information for other sets of runs. Why? Because different architectures may implement the same operation differently, because different architectures may treat sequences of operations differently, and because other factors often affect the “wall clock” running time, such as other programs running simultaneously. Hence, we often provide an abstract count of the number of operations. Doing such a count also means that we can work directly with pseudocode, rather than having to implement our program.
If we don't know how long an operation takes on a particular machine, and the same operation may take different times on different machines, we might as well treat all “constant time” operations as taking the same amount of time. That is, we will treat multiplying, comparing, and adding two integers as taking the same amount of time, since each is likely to be a single computer operation. However, we will not assume that finding the length of a string in C is also only one step, since we know that you have to traverse a string to find its length, which means that length is not a constant time operation.
Our discussion of measuring running time has led us down a path toward an approach that most computer scientists use to assess the efficiency of an algorithm, a technique that is typically called asymptotic analysis. In the asymptotic analysis of the running time of algorithms, one identifies appropriate models of the curves that relate input size to running time. In the asymptotic analysis of the memory use of algorithms, one identifies similar curves for running time. The curve is “asymptotic” in that it models the actual curve well for large enough inputs and for the majority of inputs.
There are many models of asymptotic behavior. The one we use most frequently is Big-O analysis, and it usually measures upper bounds on the running time of an algorithm.
In thinking about the shape of the curve, I find it most helpful to think about what happens when you double the size of the input.
pow05 is an O(1) algorithm.
pow04 is an O(logn) algorithm -
if we double the size of n, we have to divide
it by 2 (one step) to reduce the problem to the previous problem,
and the square the result of that problem (one step) to produce the
new result. So doubling the size required two additional steps.
pow01 above.
As you might expect, we prefer constant-time algorithms to logarithmic, logarithmic algorithms to linear, linear to quadratic, and quadratic to exponential. And, as you might expect, there are also a wide variety of other functions.
We've given an informal definition of Big-O notation. There is also a more formal definition in terms of the relationships between functions.
f(n) is in O(g(n)) if and only iff there exists numbers n0 and d such that for all but finitely many n > n0, |f(n)| <= |d*g(n)|.
Note that O(g(n)) is a set. It represents the set of all functions which are bounded above by a curve with the shape of g(n).
What do all of these parts mean? The n0 is a way to say “for all sufficiently large values of n”. For example, g(n) and f(n) might behave differently for small values of n; we care only about the behavior after both settle down.
The d is there to let us focus on the shape of the curve, rather than on the exact curve. As you may recall from above, we've chosen to avoid thinking about the precise differences between constant-time operations. So, we'd like to treat an algorithm that requires f(n) = 2*n multiplications as being equivalent to an algorithm that requires g(n) = 10*n additions.
We rarely identify the n0 and d precisely. Rather, they exist to help us to prove different characteristics of the notation. (You will probably prove these in an upper-level algorithms course. For now, we'll just assume they are true unless you want formal proofs.)
Now that we've provided a formal grounding for Big-O analysis, how do we actually do it in practice? Do we have to do formal proofs? No. We tend to informally “count” the steps in an algorithm and add them up.
Those are the main techniques for analyzing iterative algorithms. We may find a few more for a few exceptional cases, but these should get you by.
While we can use many of the strategies above when analyzing recursive algorithms, we do encounter one significant problem: Part of the analysis requires that when we call an algorithm, we determine the running time of that algorithm. If that call is recursive, the strategy doesn't work.
So, what do we do? We model everything other than the recursive call and then write a “recurrence” relation which describes the running time of an algorithm in terms of itself. We also tend to use conditionals as different parts of the relation, rather than using the worst case scenario above.
For example, suppose that our algorithm does a constant amount of work (which we will call a) in the base case (an input of size 1) and in the recursive case it decreases the size of the parameter by 1 and does a constant amount of other work (which we will call b). We would model that as
Similarly, if the algorithm does a constant amount of work in the base case and in the recursive case it halves the size of the input by doing a linear number of steps, we might model that as
You can learn formal techniques for solving such sets of equations in mid-level courses in mathematics. In this course, we will instead do some informal analysis, looking for patterns.
There are two approaches that are often successful for identifying patterns in the relations. You can start with the base case and work your way up, figuring out the value of f(n) for every increasing values of n. You can also start with the recursive case and work your way down, looking for patterns there. Let's try that for each of our pairs of equations.
Let's try the first one bottom up.
At this point, you've probably identified a pattern. We see that f(n) = (n-1)*b + a.
What happens if we try to work our way top down?
At this point, you may have also identified a pattern. We see that f(n) = k*b + f(n-k). We're not quite done, since this is still recursive. However, if we allow k to be n-1, we get
Fortunately, we've come up with the same conclusion.
Now that we've completed our asymptotic analysis, we might then try an experimental analysis. That is, we might run the algorithm on a variety of inputs and graph the results to see if the curves match. We can measure the algorithm using wall-clock time (less good), or we can augment our program to count the number of steps (better, but more work).
If the experimental and abstract results match, we can be fairly confident in our analysis. If they don't, we'll need to reflect as to why there is a difference and try again. It may be that we've analyzed the algorithm incorrectly. It may be that we've implemented the algorithm incorrectly. It may be that our data are outliers of some sort. It may be that our data are “average cases” and we will need to figure out what the bad cases will be. But in any case, experimental analysis helps us think more carefully about our more abstract analysis.