EBoard 18: Analyzing algorithms
Warning: This class is being recorded. At least I think it is.
Approximate overview
- Administrivia
- Questions
- Analyzing algorithms
- Formalizing the notion
- Some practice
Administrivia
- Rant from Sam
- Don’t forget, we do have individual tutors for CSC-207, too.
Upcoming Token activities
Academic
- CS Extras, Thursday, 4:15 p.m., 3821, Declaring a CS major
- Hopefully, our Chair will produce a summary.
- If not, we may also spend a few minutes at the start of the next
class summarizing what was learned.
- Pom and Micah can also give you some info on this stuff.
Cultural
Peer
Wellness
Misc
- Thursday is homecoming parade in Grinnell at 5:30 in Central Park.
Other good things (no tokens)
Upcoming work
- Thursday: MP4
- Friday: MP4 post-assessment
- Friday: CLRS 2.3, CLRS 3. Skimming is okay.
Questions
Analyzing algorithms
Our goal as programmers / computer scientists is to build algorithms
that (help us) solve problems. We also build data structures to help
with that.
We generally want to know things about our algorithms, so that we can,
for example, decide which algorithm is best for our particular problem.
We normally think about three things.
- How fast is it? (Based on the size of the input.)
- How much memory does it use? (Based on the size of the input.)
- Is it correct? (Or which inputs does it work for?)
- It passes all of our tests.
- We write (in)formal proofs that our algorithms are correct.
- As you will learn, checking that an arbitrary program is correct is
not computable.
We will be analyzing “speed” and “memory use”.
- Two techniques
- Run a lot of examples and try to fit a curve to them.
- Model the running time of an algorithm and focus on that.
- We’re going to do the latter.
- When modeling, we can be very detailed.
- We will avoid too many details.
- As inputs get large, the shape of the curve matters more than other
details.
- If algorithm 1 takes 1/1000 * n^2 steps
- And algorithm 2 takes n steps
- Where n is a number that represents the size of the input
- The 1/1000 * n^2 algorithm wins on small inputs, but eventually loses. Badly.
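To see where the two step counts cross, here is a minimal sketch. It only evaluates the two formulas from the bullets above; the class and method names are made up for illustration, and nothing here is a measured running time.

```java
// Hypothetical step-count models for the two algorithms; not real timings.
class CrossoverDemo {
    // Algorithm 1: (1/1000) * n^2 steps.
    static double quadraticSteps(long n) {
        return n * (double) n / 1000.0;
    }

    // Algorithm 2: n steps.
    static double linearSteps(long n) {
        return n;
    }

    // Smallest n >= 1 at which algorithm 1 needs strictly more steps.
    static long firstLoss() {
        long n = 1;
        while (quadraticSteps(n) <= linearSteps(n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("Algorithm 1 first loses at n = " + firstLoss());
        for (long n : new long[] {10, 1000, 100_000, 10_000_000}) {
            System.out.println("n = " + n
                + ": alg1 = " + quadraticSteps(n)
                + ", alg2 = " + linearSteps(n));
        }
    }
}
```

The two models tie at n = 1000. Past that point the quadratic one falls behind by a factor of n/1000, which is the "Badly" part.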
We use a few notations to talk about shape
- Theta(f(n)) - functions that take (approximately) some constant times
f(n) for sufficiently large input.
- O(f(n)) - functions that are bounded above by some constant times f(n)
for sufficiently large input.
We generally consider only a few classes of functions.
- O(1) - Those bounded above by a constant (constant-time functions).
- O(logn) - Log n: Those that increase their running time by a constant each
time you double the size of the input.
- O(n) - Linear: Those whose running time depends linearly on the size of the input
- O(nlogn) - NlogN
- O(n^2) - Quadratic
- O(n^3) …
- O(2^n) - Exponential algorithms
- O(n!)
When designing algorithms, we usually care only about these asymptotic
bounds.
When choosing what algorithm to use for a program we’re building, we
usually fall back onto what someone has given us in a library.
Or … we think carefully about the input and choose one best suited
to the input.
Some constant-time algorithms/functions?
- Almost every basic operation we treat as constant time.
- Including car, cdr, array references.
Some linear-time algorithms/functions
- Get the nth element of a linked list
- Shift elements in an array (n is the size of the array)
Figuring out the running time of an iterative algorithm
- Normal process:
- Look at each loop (inside out).
- Count the number of times the loop runs.
- Count the number of steps in the loop.
- Multiply the two.
for (i = 0; i < n; i++) {
  pen.println(lst.valueAt(i));
}
- The loop runs n times
- lst.valueAt takes i time (where lst is a linked list)
- pen.println takes 1 time
- So, we multiply _ times (_ + __)
Two approaches:
- Assume that lst.valueAt is always O(n).
- O(n * (n + 1)) = O(n^2 + n) = O(n^2)
- Note that we can throw away lower-order terms in big-O notation.
- Use a slightly different approach, don’t multiply n, but instead
add up all the steps.
- 1 + 2 + 3 + 4 + … + n/2 + … + (n-3) + (n-2) + (n-1) + n
- That sum is n*(n+1)/2 which is Theta(n^2) (also O(n^2)).
- Yay! Both give us the same answer.
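As a sanity check on the "add up all the steps" approach, here is a small sketch that counts how many nodes get visited. The tiny linked list and the counter are hypothetical stand-ins for lst, and it assumes valueAt(i) walks from the head and touches i+1 nodes.

```java
class ValueAtCostDemo {
    // Minimal singly linked list node; a hypothetical stand-in for lst.
    static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Running count of nodes touched by valueAt.
    static long nodesVisited = 0;

    // Build the list 0, 1, ..., n-1 and return its head.
    static Node build(int n) {
        Node head = null;
        for (int v = n - 1; v >= 0; v--) {
            Node node = new Node(v);
            node.next = head;
            head = node;
        }
        return head;
    }

    // Walk from the head to index i, counting each node we touch.
    static int valueAt(Node head, int i) {
        Node current = head;
        nodesVisited++;
        for (int k = 0; k < i; k++) {
            current = current.next;
            nodesVisited++;
        }
        return current.value;
    }

    // Total nodes visited when we call valueAt(i) for i = 0 .. n-1.
    static long totalVisits(int n) {
        Node head = build(n);
        nodesVisited = 0;
        for (int i = 0; i < n; i++) {
            valueAt(head, i);
        }
        return nodesVisited;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long formula = (long) n * (n + 1) / 2;
            System.out.println("n = " + n + ": visited " + totalVisits(n)
                + ", n*(n+1)/2 = " + formula);
        }
    }
}
```

Under that assumption the count matches n*(n+1)/2 exactly, so multiplying n by O(n) overestimates by only a constant factor of about 2.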
Binary search, conceptually
- Input: Sorted array of values, a value we’re looking for
- Output: Either the index of the value, if it’s in the array
or -1 if it’s not in the array.
while (there are elements left to process) {
  look at the middle element (using a comparator)
  if the middle element is equal to the sought value,
    return its index
  if the middle element is larger than the sought value,
    throw away everything larger (including the middle element)
  otherwise the middle element is smaller than the sought value,
    throw away everything smaller (including the middle element)
}
- “Look at the middle element” - constant time
- Find the index - constant time
- Look at that location - constant time
- Return its index - constant time
- Comparing the middle element to the sought value - Constant time
- Throw away half the array
- Depends on how we do it.
- If we copy to a new array, linear
- If we keep track of the lower-bound and upper-bound of the
portion of interest, it’s constant time
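Here is one way the bounds-tracking version might look in Java. It is a sketch, not the version we will write in class: it uses an int array and direct comparison instead of a comparator to stay short, and the names lo, hi, and mid are my own.

```java
class BinarySearchSketch {
    // Returns an index of target in the sorted array, or -1 if it is absent.
    static int binarySearch(int[] values, int target) {
        int lo = 0;                  // inclusive lower bound of the portion of interest
        int hi = values.length - 1;  // inclusive upper bound of the portion of interest
        while (lo <= hi) {           // there are elements left to process
            int mid = lo + (hi - lo) / 2;  // avoids overflow in (lo + hi) / 2
            if (values[mid] == target) {
                return mid;
            } else if (values[mid] > target) {
                hi = mid - 1;        // throw away the middle and everything larger
            } else {
                lo = mid + 1;        // throw away the middle and everything smaller
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9};
        System.out.println(binarySearch(sorted, 7));  // prints 3
        System.out.println(binarySearch(sorted, 4));  // prints -1
    }
}
```

Each pass through the loop does a constant amount of work, because "throwing away" half the array is just one assignment to lo or hi.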
How many times does this loop run? (in terms of n)
Consider the example in which n is 1000; you can approximate.
- 1000 -> 500 -> 250 -> 125 -> 63 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
- 11 steps.
Consider the example in which n is 2000; you can approximate.
- 12 steps (just one more step and we’re down to 1000)
Consider the example in which n is 4000; you can approximate.
- 13 steps (just one more step).
Algorithms in which you add a constant number of steps when you double
the size of the input are … logn
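The halving counts above can be reproduced with a few lines of code. This sketch follows the approximation used in the examples (halve, round up, and count every size from n down to 1); the method name is hypothetical, and it does not model exactly how many elements a real binary search discards.

```java
class HalvingSteps {
    // Number of sizes in the sequence n -> ceil(n/2) -> ... -> 1.
    static int steps(int n) {
        int count = 1;        // count the starting size
        while (n > 1) {
            n = (n + 1) / 2;  // halve, rounding up (as in 125 -> 63)
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000, 4000}) {
            System.out.println("n = " + n + ": " + steps(n) + " steps");
        }
    }
}
```

Doubling n adds exactly one to the count, which is the signature of logarithmic growth.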
A function f(n) is in O(g(n)) iff there exist values n0 and c, such that
for all n > n0, f(n) <= c*g(n).
- The <= is “f(n) is bounded above by”
- The c is “ignoring constant multipliers”
- The for all n > n0 represents for sufficiently large inputs
A function f(n) is in Theta(g(n)) iff there exist values n0, b, and c, such that
for all n > n0, b*g(n) <= f(n) <= c*g(n).
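To make the definition concrete, here is a small numeric check using f(n) = n*(n+1)/2 (the step count from the valueAt loop) against g(n) = n^2. The helper boundedAbove is hypothetical, and checking finitely many n can refute a proposed bound but can never prove one; the proof still has to be done by hand.

```java
import java.util.function.LongToDoubleFunction;

class BigOCheck {
    // True if f(n) <= c * g(n) for every n with n0 < n <= limit.
    static boolean boundedAbove(LongToDoubleFunction f, LongToDoubleFunction g,
                                double c, long n0, long limit) {
        for (long n = n0 + 1; n <= limit; n++) {
            if (f.applyAsDouble(n) > c * g.applyAsDouble(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LongToDoubleFunction f = n -> n * (n + 1) / 2.0;
        LongToDoubleFunction square = n -> (double) n * n;
        LongToDoubleFunction linear = n -> (double) n;

        // Candidate witnesses c = 1, n0 = 0 for "f is in O(n^2)".
        System.out.println(boundedAbove(f, square, 1.0, 0, 100_000));
        // Even c = 100 does not rescue "f is in O(n)": it breaks at n = 200.
        System.out.println(boundedAbove(f, linear, 100.0, 0, 100_000));
    }
}
```

For the Theta bound, b = 1/2 works as the lower constant, since n^2/2 <= n*(n+1)/2 for every n >= 0.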
Some exercises
What is result in the following?
result = 0;
for (i = 1; i < n; i = i*2) {
  result += i; // Models a case in which the stuff inside the loop takes O(i)
}
print result;
Hypotheses about result
- O(1) - nope, I see an increase
- O(logn) - nope, I don’t see a constant increase when doubling.
- O(nlogn) - ???
Analysis
- Suppose n is 10
- 1 + 2 + 4 + 8 = 15
- Suppose n is 20
- 1 + 2 + 4 + 8 + 16 = 31
- Suppose n is 40
- 1 + 2 + 4 + 8 + 16 + 32 = 63
- Suppose n is 80
- 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127
- Suppose n is 100
- 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127
Is this O(1), O(logn), O(n), O(n log n), …?
It’s O(n), linear
1 + 2 + 4 + 8 + 16 + … + 2^k = 2^(k+1) - 1 = 2*2^k - 1
- k=0: 1 (2)
- k=1: 3 (4)
- k=2: 7 (8)
- k=3: 15 (16)
- k=4: 31 (32)
- k=5: 63
- k=6: 127
- k=7: 255
- k=8: 511
- k=9: 1023
- k=10: 2047
- k=11: 4095
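The doubling loop is easy to run directly. This sketch reads the loop body as accumulating i into result and prints the total next to n; the class and method names are made up for illustration.

```java
class DoublingLoopDemo {
    // Sum of the powers of two that are strictly less than n.
    static long result(long n) {
        long result = 0;
        for (long i = 1; i < n; i = i * 2) {
            result += i;  // models loop-body work proportional to i
        }
        return result;
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 20, 40, 80, 100, 1_000_000}) {
            System.out.println("n = " + n + ": result = " + result(n)
                + ", result/n = " + (double) result(n) / n);
        }
    }
}
```

Because the largest power of two in the sum is less than n, the total 2*2^k - 1 stays below 2n. It is also at least n - 1, so the ratio result/n stays between roughly 1 and 2. That is why the answer is linear and not n log n.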
result = 0;
for (i = 1; i < n; i = i+1) {
  result += i; // Models a case in which the stuff inside the loop takes O(i)
}
print result;
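For comparison, here is the same kind of sketch for the loop that steps by one. It assumes the loop runs for i = 1 up to n-1, as written above, so the total should be 1 + 2 + … + (n-1) = n*(n-1)/2. The names are hypothetical.

```java
class LinearStepLoopDemo {
    // Sum of the integers 1 .. n-1, computed the slow way.
    static long result(long n) {
        long result = 0;
        for (long i = 1; i < n; i = i + 1) {
            result += i;  // models loop-body work proportional to i
        }
        return result;
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 1000}) {
            System.out.println("n = " + n + ": result = " + result(n)
                + ", n*(n-1)/2 = " + n * (n - 1) / 2);
        }
    }
}
```

Only the step changed, from doubling i to incrementing it, and that moves the total from linear to quadratic.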
1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + … + 1/2^k = 2 - 1/2^k, which is a little less than 2.