Algorithms and OOD (CSC 207 2014F) : Readings
Primary: [Front Door] [Schedule] - [Academic Honesty] [Disabilities] [Email] - [Learning Outcomes] [FAQ] [Teaching & Learning] [Grading] [Rubric] - [Calendar]
Current: [Assignment] [EBoard] [Lab] [Outline] [Reading]
Sections: [Assignments] [EBoards] [Examples] [Handouts] [Labs] [Outlines] [Readings]
Reference: [Student-Curated Resources] [Java 8 API] [Java 8 Tutorials] [Code Conventions]
Related Courses: [CSC 152 2006S (Rebelsky)] [CSC 207 2014S (Rebelsky)] [CSC 207 2014F (Walker)] [CSC 207 2011S (Weinman)]
Misc: [Submit Questions] - [SamR] [Glimmer Labs] [CS@Grinnell] [Grinnell] - [Issue Tracker (Course)] [Issue Tracker (Textbook)]
Summary: We consider the standard problem of searching an array for a value or values. That is, we consider algorithms that, given a target value, a collection of values, and a notion of equality, find the position of the target value. We explore two common searching algorithms, linear search and binary search. We also consider ways in which object-oriented design (and its instantiation in the Java programming language) may lead us to think about searching in slightly different ways.
One of the most common tasks computer programs do is search - look through a collection of information for something matching particular criteria. Sometimes the search is huge and illformed, as in Google searches. Sometimes the search is small and precise.
There are a host of ways to organize information to make searching faster and easier. We'll consider some of those mechanisms this semester. For now, we'll start with one of the simplest phrasings of the search problem: given a value to search for and an array of values to search in, find the index of the value in that array.
The simplest searching algorithm is often called linear search, and involves looking at each element in turn until we find one that matches.
For each index, i, in values
If values[i] is equal to val
return i;
If the value was never found, signal that problem
How long does linear search take? Well, it depends. If the thing we're searching for appears near the beginning of the array, it takes only a few steps. If it appears at the end of the array, we'll need to look at almost all of the elements of the array. And, if it doesn't appear at all, we need to look at every element.
Since, in the worst case, we look at every element, our loop will run O(n) times. Each repetition of the loop requires a constant number of steps (or so we hope - comparison should be fast), so the algorithm takes O(n) time.
Is there anything special about linear search in Java? A few things. Let's start with the “obvious” implementation.
/** * Search values for an object equal to value. * * @param value * A value to search for * @param values * An array of values to search * @return index * The index of a value equal to value * @pre * Elements of values are all comparable to value using equals. * @post * value.equals(values[index]) * @throws Exception * If the value is incomparable to an element of values. * @throws NotFound * If no equal value appears. */ public static int search(Object value, Object values[]) throws NotFound, Exception { for (int i = 0; i < values.length; i++) { if (value.equals(values[i])) return i; } // for throw new NotFound(value + " does not appear"); } // search
Experienced functional and object-oriented programmers might note
that an important element seems to be implicit in this procedure.
That is, we use the equals
method of the
value to compare values. However, it turns out that when we're
searching an array, we may want to search using different criteria.
For example, if we're searching for a person, we might want to
match by last name, id number, or even some physical characteristics.
Hence, rather than just relying on the equals
,
we really should be passing in an object that knows how to compare
things for equality. (A functional programmer would just pass
in a function, but we need objects in Java.) We might call such
a thing an EqualityComparator
and define it with an
interface something like the following.
public interface EqualityComparator { /** * Determine if x and y are equal using some comparison metric. */ public boolean equals(Object x, Object y) throws IncomparableException; } // interface EqualityComparator
Once we've done so, we would rewrite our linear search algorithm as follows.
/** * Search values for an object equal to value. * * @param value * A value to search for * @param values * An array of values to search * @param EqualityComparator * The equality metric * @return index * The index of a value equal to value * @pre * Elements of values are all comparable to value using equals. * @post * value.equals(values[index]) * @throws IncomparableException * If the value is incomparable to an element of values. * @throws NotFound * If no equal value appears. */ public static int search(Object value, Object values[], EqualityComparator eq) throws NotFound, IncomparableException { for (int i = 0; i < values.length; i++) { if (eq.equals(value,values[i])) return i; } // for throw new NotFound(value + " does not appear"); } // search
If we want to compare people by last name, we might write something like the following.
public class ComparePersonByLastName implements EqualityComparator { public boolean equals(Object x, Object y) { return (x instanceof Person) && (y instanceof Person) && ((Person) x).getLastName.equals((Person) y).getLastname(); } // equals(Object, Object) } // ComparePersonByLastName
Then, to search by last name, we might do something like
Utils.search(new Person("Smith", ""), people, new ComparePersonByLastName());
Of course, it's a bit silly to have to make a new comparison object each time we search. Later on, we'll learn the “Singleton” design pattern to help with this issue. As an alternative, we'll also ll learn how to write such one-shot classes “on the fly”, without bothering to name them. (Yes, we have anonymous classes in object-oriented programming, just as we have anonymous functions in functional programming.)
You may be wondering why we're building an object to search for and a comparator to search with. Others have asked the same thing, Another alternative is to do away with the value altogether and simply pass in a predicate that returns true (representing “yes, this item is acceptable”) or false (representing “no, this item is not acceptable”).
public interface Predicate { /** * Test if val meets this predicate. */ public boolean test(Object val); } // interface Predicate
Now, our linear search might look something like the following. (We've ellided most of the documentation, since it's not much different.)
/** * Search values for a value for which pred holds. */ public static int search(Predicate pred, Object values[]) throws NotFound, Exception { for (int i = 0; i < values.length; i++) { if (pred.test(values[i])) return i; } // for throw new NotFound(value + " does not appear"); } // search
And we can build a predicate to search for a last name of Smith using something like the following.
public class IsSmith implements Predicate { public boolean test(Object o) { return (o instanceof Person) && ((Person) o).getLastName().equals("Smith"); } // test(Object) } // class IsSmith
Isn't writing general code fun?
But wait! Run-time type tests using
instanceof
make many Java programmers nervours. After all,
why use a language with compile-time type checking if you still need
to do run-time type checking? Wouldn't it be better to say that
our comparator only works with Person
objects? In early
versions of Java, it was not possible to specify the particular types
you worked with - you had to do run-time type checking. But newer
versions of Java introduced a concept known as
“generics” which allow you to say
that you work with a particular type. In essence, you give a
parameter to your interface that specifies a type, and you can then
use that type in your other specifications. Here's an example. We'll
consider the general strategy in more depth in the near future.
Let's start with the interface.
public interface Predicate<T> { /** * Test if val meets this predicate. */ public boolean test(T val); } // interface Predicate<T>
You'll note that there are two changes. We've added the
<T>
after the interface name. The thing in
the angle brackets is a type variable. When we
implement the interface, we can specify what type we use. You'll
note that we use that type variable in the specification of the
test
method. In effect, when you put
in a class name for T
, the Java compiler
substitutes it into the header. So, the implicit interface
Test<Person>
declares the method
public boolean test(Person val);
We can now rewrite our predicate as follows
public class IsSmith implements Predicate<Person> { public boolean test(Person val) { return val.getLastName().equals("Smith"); } // test(Person) } // class IsSmith
Isn't that much nicer?
Of course, we must also rewrite our search method to accept generics.
Note that we put the type parameter immediately after the
static
keyword.
/** * Search values for a value for which pred holds. */ public static <T> int search(Predicate<T> pred, T values[]) throws NotFound, Exception { for (int i = 0; i < values.length; i++) { if (pred.test(values[i])) return i; } // for throw new NotFound(value + " does not appear"); } // search
Some folks worry that linear search is biased toward elements that appear at the start of the array and hence prefer to search arrays “randomly”. Here's a simple version of a random search algorithm.
Repeat
i = a random number in the range 0 .. n-1
If values[i] is equal to val
Return i;
Until we've examined all of the values in the array
If the value was not found, signal an error
Of course, this implementation has two significant problems. First, it can be hard to tell if we've examined all of the values in the array. (I'm sure that you can figure out some strategies.) Second, we will often look at some values more than once, which is inefficient. What's the solution? Instead of randomly generate an index, we start with the list of all possible indices and randomly permute it. We can then just look at the elements in that order.
The binary search algorithm is a standard divide-and-conquer algorithm. It requires that the input be sorted. At the highest level, binary search does the following
While elements remain
Look at the middle element
If it matches our target, return the index of the middle element
If it is smaller than our target, throw away the bottom half
If it is larger than our target, throw away the top half
If no elements remain, report failure
But how do we “throw away” elements in an array? Traditionally, we keep track of the lower and upper bounds of the portion of the array of interest. So, initially, the lower bound is 0 and the upper bound is length - 1.
While lb <= ub
mid = midpoint(lb, ub);
If values[mid] equals value
return mid;
Else if values[mid] < value
lb = mid + 1;
Else
ub = mid - 1;
Report Failure
We can also express this algorithm recursively.
binarySearch(value, values, lb, ub)
If (lb > ub)
Report Failure
mid = midpoint(lb, ub);
If values[mid] equals value
return mid;
Else if values[mid] < value
return binarySearch(value, values, mid+1, ub);
Else
return binarySearch(value, values, lb, mid-1);
Why prefer one over the other? I find the recursive version easier to analyze. However, unless your compiler optimizes tail calls, the iterative version is likely to be slightly faster.
As noted above, the recursive version is probably a little easier to analyze.
What's the running time for input of size n? Well, we do a constant number of tests (no more than three, I believe) and we also have to compute the midpoint of two values. And then we probably have to recurse on about half of the input (maybe exactly half, maybe slightly less.) That gives the recurrence relation
How can we solve this relation? Let's try a few values.
Do you see a pattern yet? It looks like time(2^{k}) = k*c + b. (You can prove this via induction, if you'd like.) Since k grows as the log of 2^{k}, it looks like binary search has running in O(log_{2}n).
But wait! We've only analyzed powers of 2. Does that matter? Probably not. The upper bound on the running time is clearly increasingly monotonically. So, the number of steps, for any n is bounded by the number of steps for a power of two greater than n. (You can finish this proof, too.)
In implementing binary search, one important question is how we compare
individual elements. In linear search, we had to compare only for
equality, which allowed us to use the built-in equals
method (or a few alternates). In binary search, we need to know if
the value we're looking at should appear before or after a non-equal
value. (We also still need to know if it's equal.) Fortunately,
Java provides an interface for just that purpose,
java.util.Comparator
. You should read the documentation
for that class. However, here's a quick snippet of what it might look
like.
public interface Comparator<T> { /** * Compare o1 and and o2 for order. Return a negative number of o1 * comes before o2, zero if the two are equal, and a postive number * if o1 comes after o2. */ public int compare(T o1, To2); } // Comparator<T>
We'll leave it as an exercise for you to write binary search.
Portions of this reading are based on materials prepared for Grinnell College's CSC 151 by Janet Davis, Samuel A. Rebelsky, John David Stone, Henry Walker, and Jerod Weinman.