Experiments in Java

Session A3: Analyzing Algorithms

You may have noted that it is possible to develop a number of algorithms to solve the same problem. For example, in a previous laboratory session we developed a number of different methods to compute Fibonacci numbers and to do exponentiation. How do we choose between a number of competing algorithms for the same problem? One way is to compare the efficiency of the algorithms. How do we quantify the efficiency of an algorithm or program? We might consider their use of memory, their running time, or their use of other resource. In this laboratory session, you will consider concrete ways of determining the running time of functions.

Timing algorithms

How do you determine the running time of an algorithm? You might try to measure the time it takes using a clock. Unfortunately, many algorithms run so quickly that they will finish before the second hand ticks. Humans are also notoriously inaccurate at noticing when programs start and stop. Fortunately, Java includes methods for determining the current time to a high precision.

In particular, the method System.currentTimeMillis() returns the current time in milliseconds since 00:00:00 January 1, 1970. Hence, to time an algorithm, you can simply determine the time immediately before the algorithm begins and the time immediately after the algorithm completes. By comparing the two times, you can determine how long the algorithm took. For example,

    long start_time; // The time the algorithm started
    long end_time;   // The time the algorithm ended
    start_time = System.currentTimeMillis();
    end_time = System.currentTimeMillis();
    out.println("The algorithm took " + 
                (end_time-start_time) + 
                " milliseconds.");

(Note that long is a form of int that permits a larger range of integers.)

In Experiment A3.1 you will use wall-clock timing to consider the amount of time it takes to compute Fibonacci numbers. In Experiment A3.2 you will use use wall-clock timing to investigate the exponentiation algorithms.

Problems with wall-clock timing

A number of disadvantages to using wall-clock timing exist, some of which you will explore in Experiment A3.1. First, timing is highly dependent on the machine being used. The same program can run much faster or much slower depending on the machine it is running on. In fact, the same program may run at different speeds on the same computer. In particular, a program is likely to run much more slowly on a computer that is doing a number of other tasks than on a computer that is just running that program.

It is often the case that programmers who time their programs end up timing things that they don't intend. For example, if you are timing an algorithm and doing input and output, you should time only the algorithm and not the input and output. This is particularly important because input and output can take significantly longer than many algorithms. In addition, things are often going on behind the scenes that you have no control over. For example, Java often spends longer on the first invocation of a method than on subsequent invocations.

Does this mean that you shouldn't use wall-clock timing? Certainly not. It is a valuable tool in your toolbox. If you are careful about your timing (e.g., you do your best to time only what you want to time, and you run all of your timing tests on the same computer with the same load), then it can give you useful results. As you develop larger programs, you may find it useful to use wall-clock timing to identify parts of your program that need to be improved.

Counting steps

Instead of measuring wall-clock time, we can augment our methods to count the number of steps they do. Obviously, counting every line executed in a program is difficult. Therefore, you will typically count one step for each function call and one step for each repetition of a loop. How do you do that counting? You may recall from the lab on recursion that we should

For example, the following class provides a Fibonacci method whose steps are counted.

 * A measured Fibonacci function.
 * @author Samuel A. Rebelsky
 * @version 1.0 of September 1998
public class MeasuredFibonacci {

  // +--------+---------------------------------------------------
  // | Fields |
  // +--------+

  /** The number of steps used to compute the nth Fibonacci number. */
  protected int steps;

  // +---------+--------------------------------------------------
  // | Methods |
  // +---------+

   * Reset the count of steps to 0.
  public void resetCounter() {
    steps = 0;
  } // resetCounter()

   * Determine the number of steps used since the last call to
   * resetCounter().
  public int getSteps() {
    return steps;
  } // getSteps()

   * Compute the nth Fibonacci number.
  public int fib(int n) {
    // Base case: n is 1 or 2
    if ((n == 1) || (n == 2)) {
      return 1;
    } // base case
    // Recursive case
    else {
      return fib(n-1) + fib(n-2);
    } // recursive case
  } // fib(int)
} // class MeasuredFibonacci

In Experiment A3.3 you will use count the steps to compute Fibonacci numbers. In Experiment A3.4 you will count the steps used in the two exponentiation algorithms.

Problems with counting

Are there also problems with measuring the execution time of algorithms by counting steps? Certainly. First of all, this mechanism requires you to augment your code with lines that increment the counter. This is often much more difficult than simply using a timer. It also requires you to have access to the source code of the methods you wish to measure.

This mechanism can also lead to problems if we use the same counter for different methods in that we may end up counting steps that we do not intend to count or if we fail to use the same counter. For an example of the latter, consider a method that has one step, a call to another method. If we don't count the steps used in the sub-method, our analysis will be quite inaccurate.

In addition, this type of measurement is easily affected by slight variations in the input. For example, consider what happens if we call the recursive exponentiation function with exponents of 16 and 15. Recall that we developed an efficient recursive exponentiation function in which the exponent in the recursive call is either one less than the exponent (if the exponent is odd) or half of the exponent (if the exponent is even). The number 16 is even, so we divide by two, leaving 8 (one steps). The number 8 is even, so we divide by two, leaving 3 (two steps). 4 is even, so we divide by two, leaving 2 (three steps). The number 2 is even, so we divide by two, leaving 1 (four steps). The number 1 is odd, so we subtract 1 from 1, leaving 0 (five steps). 0 is the base case (six steps). The number 15 is odd, so we subtract 1, leaving 14 (one step). The number 14 is even, so we divide by two, leaving 7 (two steps). The number 7 is odd, so we subtract 1, leaving 6 (three steps). The number 6 is even, so we divide by two, leaving 3 (four steps). The number 3 is odd so we subtract 1, leaving 2 (five steps). The number 2 is even, so we divide by two, leaving 1 (six steps). The number 1 is odd, so we subtract 1 from 1, leaving 0 (seven steps). 0 is the base case (eight steps). We did more steps for a smaller exponent!

Once again, this is an imperfect method. Nonetheless, it is a useful one, provided you understand its drawbacks.

Is there a better method for analyzing algorithms? Yes, you can measure them abstractly. However, we will not consider such analysis in depth in this session.

Binary search, revisited

We will conclude our study of algorithm analysis by returning to two key searching methods: sequential search and binary search. For our purposes, a search algorithm will determine the position of an integer in an array or inform us if the integer is not in the array. As you may recall, we've included such methods as part of the IntSeq class.

You may recall that our sequential search algorithm looked at the elements of the array, one-by-one, until it found the element or exhausted the elements in the array. In the former case, it returned the index of the matching item. In the latter case, it returned -1 to indicate an error. In the worst case, sequential search looks at every element of the array.

Can we do better? Certainly, if we know particular facts about the array. For example, if we know the smallest element in the array and we discover that the value we're looking for is smaller than that element, then we can report that the value is not there in one step. However, this is clearly an uncommon case.

We can also do better if the array is sorted. Then we can use the divide-and-conquer technique of binary search. In binary search, we look at the middle element. If the value we're looking for is less than the middle element, we throw away the upper half of the array. If the value we're looking for is greater than the middle element, we ``throw away'' the lower half of the array. You should be able to figure out what to do if the value we're looking for is equal to the middle element.

In Experiment A3.5 you will consider the number of steps the various searching algorithms take.

Finding groups of smallest entries

Suppose you were instead asked to find the two smallest entries in a sequence. One question you might ask would be ``How should I return two values?'' So that we need not concern ourselves with that question, let us instead try to move the two smallest entries to the first two positions of the array.

One approach would be to look through the sequence to find the smallest entry and move it to the front of the sequence, then look through all but the first element of the modified sequence for the next smallest element. Using the indexOfSmallest method described above, we might phrase this as

   * Put the two smallest elements of the sequence at the beginning
   * of the sequence.  The sequence must have at least two elements.
  public void twoSmallest() {
    // Swap the initial element with the smallest
    swap(0, indexOfSmallest(0, this.length()-1));
    // Swap the next element with the smallest remaining
    swap(1, indexOfSmallest(1, this.length()-1));
  } // twoSmallest()

As you might guess, swap(i,j) swaps the elements at positions i and j in the sequence.

Now, how might we put the five smallest elements in a sequence of 50 elements at the front of that sequence? One approach would be to comb through the sequence to find the smallest entry and move it to the front of the sequence. Next, you could comb through the 49 entries following this newly positioned entry to find the next smallest entry and move it to the position following the smallest entry. By repeating this process three more times, each time finding the smallest entry remaining in the sequence and placing it just behind the entry found in the previous pass, you will have placed the five smallest entries at the beginning of the sequence in increasing order of size.

When turning this narrative into code, it is appropriate to use a loop (since the five pieces are quite similar). For example,

   * Put the five smallest elements of the array at the beginning of
   * the array (naive method).  The sequence should have at least
   * five elements.
  public void fiveSmallest() {
    int i;
    // For each index i from 0 to 4,
    for (i = 0; i < 5; ++i) {    
      // Swap the smallest element in [i .. last] with the ith element.
      swap(i, indexOfSmallest(i, this.length()-1));
    } // for
  } // fiveSmallest()

What we have accomplished is a partial sorting of the sequence by selecting the smallest entries.

Our task now is to analyze the efficiency of this approach. This we do in terms of the number of times two entries in the sequence are compared. To find and position the smallest entry in the sequence requires 49 comparisons, to process the next smallest entry requires 48, and so on. Thus, the total number of comparisons to find the five smallest entries is

49 + 48 + 47 + 46 + 45 = 235

In general, applying this selection method to find the K smallest entries in a sequence of N entries requires

(1/2)(2*N*K - K2 - K)

comparisons. Can you derive this formula? Thus, to find the 10 smallest entries in a sequence of 10,000 entries requires 99,945 comparisons. We might also say that this is an O(nk) algorithm.

Can we do better? Recall that when we found the smallest element in a sequence, we began with a guess of the smallest and then refined that guess by looking at the remaining elements. We can do the same thing to find the five smallest elements. Initially, we'll assume that the first five elements are the five smallest elements. Sort the first five entries in the sequence by any method. Then consider the sixth entry. Compare it to the fifth entry in the sequence, which is now the largest of the first five entries. If the sixth entry is larger, pass over it because it is not one of the five smallest entries. If, however, the sixth entry is smaller than the fifth, compare it with the fourth, third, and so on, inserting it among the first five entries so that the first five entries in the list remain the smallest entries found so far in increasing order. Repeat this process for the entries in positions 7, 8, ..., 50.

Note that this process creates a partially sorted list by inserting the smallest entries into the beginning of the list. To see why the insertion strategy is superior to the selection strategy, let us compare the two approaches when searching for the 10 smallest entries within a list of 1000 entries. We suppose that our insertion strategy has reached the halfway point. The 10 smallest items in the first 500 have been found and we are about to consider the entry in position 501. If the original list was randomly scrambled, it is unlikely that this entry will be less than the tenth entry, and only one comparison is required to discover this. The same is true for all the entries in positions 501 through 1000. However, if one of these entries does belong among the top 10, then this will be discovered with one comparison, and at most nine more comparisons will be required to position it properly. This is much more efficient than our selection strategy, in which each of the last 500 entries is involved in 10 comparisons.

In Experiment A3.6 you will revisit the problem of finding the smallest element in a sequence. In Experiment A3.7, you will investigate these two algorithms for computing collections of smaller elements.

Copyright (c) 1998 Samuel A. Rebelsky. All rights reserved.

Source text last modified Tue Oct 26 13:07:51 1999.

This page generated on Tue Oct 26 15:38:39 1999 by Siteweaver.

Contact our webmaster at rebelsky@math.grin.edu