CSC153 2004S, Class 21: Algorithm Analysis (2)
Admin:
* HW4 due. Make sure your name is at the top!
* Exam 1 to be returned tomorrow (I hope).
* Exam 2 to be distributed next week (probably Wednesday).
* Read "Searching"
* Today's outline may have extra stuff (particularly stuff we did Monday).
* Distributed: Mr. Stone's book. (Optional reading.)
* Cool panel tomorrow with FREE LUNCH!
Overview:
* Review: Big-O notation and its benefits.
* An example: Comparing two running times.
* Analyzing recursive times
* Recurrence relations
* Samples
/Big-O Notation/
* To Mathematicians: A way to compare functions
* To Computer Scientists: A way to approximate running times
* Approximate definition:
g(n) is in O(f(n)) iff
there exist n0, c>0 s.t.
    for all n > n0, |g(n)| <= c*|f(n)|
* "n0" : For big enough input
* "c" : We don't care about constant multipliers, particularly since they differ from machine to machine
* "absolute value": Mathematicians like to talk about functions that may return negative numbers
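The definition above can be spot-checked numerically. This is an illustrative sketch (the function `witnesses_hold`, the example functions, and the witnesses c = 4, n0 = 5 are my own choices, not from the notes):

```python
# Sketch: numerically spot-check a big-O witness pair (c, n0) over a finite range.
# A finite check cannot prove the "for all n", but it can catch a bad witness.

def witnesses_hold(g, f, c, n0, limit=10_000):
    """Check |g(n)| <= c*|f(n)| for all n0 < n <= limit."""
    return all(abs(g(n)) <= c * abs(f(n)) for n in range(n0 + 1, limit + 1))

# Claim: 3n + 5 is in O(n), witnessed by c = 4 and n0 = 5
# (for n > 5, 3n + 5 <= 4n).
print(witnesses_hold(lambda n: 3 * n + 5, lambda n: n, c=4, n0=5))  # True
```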
/Why Big-O Notation May Be Useful/
* Experimental results:
* Algorithm A:
* Input size 100: 1 second
* Input size 200: 4 seconds
* Input size 300: 9 seconds
* Algorithm B:
* Input size 100: 100 seconds
* Input size 200: 200 seconds
* Input size 300: 300 seconds
* Which is faster for really large inputs? Suppose that careful analysis of the code reveals that
* Algorithm A's running time is approximately (n/100)^2
* Algorithm B's running time is approximately n
* (Other curves fit these data points too; given experimental error, A could also be linear.)
* As Erik notes, for a particularly big input, B will be faster. Claim: 10,000 is the turning point
* (10,000/100)^2 = 100^2 = 10,000, matching B's 10,000
* For bigger inputs, B's advantage grows
* (20,000/100)^2 = 200^2 = 40,000 vs. 20,000 for B
* Conclusion: The constant multiplier really doesn't matter. The shape of the curve matters.
* Therefore, we can analyze primarily in terms of the shape of the curve.
n is in O((n/100)^2)
(n/100)^2 is not in O(n)
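The turning point claimed above can be found by brute force. A minimal sketch, using the two running-time formulas from the analysis:

```python
# Sketch: locate where algorithm A, with running time (n/100)**2,
# overtakes algorithm B, with running time n.

def time_a(n):
    return (n / 100) ** 2

def time_b(n):
    return n

# Scan upward for the first input where A is strictly slower than B.
crossover = next(n for n in range(1, 100_001) if time_a(n) > time_b(n))
print(crossover)  # 10001: A loses for every input past the 10,000 turning point
```

At n = 20,000 the gap is already (20,000/100)^2 = 40,000 steps for A versus 20,000 for B, as computed above.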
For iterative algorithms: Analysis is comparatively easy
* Count the number of individual steps
* Count the number of repetitions
* Multiply the two
* Throw away constant multipliers
Sometimes this feels "wrong" (overly pessimistic)
procedure selection-sort(array-of-ints numbers)
  for i = 0 to numbers.length - 1
    swap(numbers[i], indexOfSmallest(numbers, i))
procedure indexOfSmallest(array-of-ints numbers, int start)
  guess = start;
  for j = start + 1 to numbers.length - 1
    if numbers[j] < numbers[guess]
      guess = j;
  return guess;
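The pseudocode above translates directly into runnable Python; this sketch uses `<` in indexOfSmallest, since we want the index of the smallest remaining element:

```python
# A runnable Python version of the selection-sort pseudocode above.

def index_of_smallest(numbers, start):
    """Return the index of the smallest element in numbers[start:]."""
    guess = start
    for j in range(start + 1, len(numbers)):
        if numbers[j] < numbers[guess]:
            guess = j
    return guess

def selection_sort(numbers):
    """Sort numbers in place by repeatedly swapping the smallest
    remaining element into position i."""
    for i in range(len(numbers)):
        small = index_of_smallest(numbers, i)
        numbers[i], numbers[small] = numbers[small], numbers[i]
    return numbers

print(selection_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```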
Claim: n repetitions of indexOfSmallest, each of which costs n
O(n^2)
But ... the second time you call indexOfSmallest, it scans only n-1 elements
  the third time, n-2
  the fourth time, n-3
  after a while, 5, 4, 3, 2, 1
Alternate analysis
n + n-1 + n-2 + n-3 + ... 3 + 2 + 1 = n(n+1)/2
The story of Gauss
K: n + n-1 + n-2 + n-3 + ... 3 + 2 + 1
K: 1 + 2 + 3 + 4 + ... + n-2 + n-1 + n
--------------------------------------------
2*K = n+1 + n+1 + n+1 .... + n+1 (n such sums)
2*K = n*(n+1)
K = n*(n+1)/2
n*(n+1)/2 is in O(n^2)
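Gauss's closed form is easy to spot-check against the literal sum:

```python
# Spot-check Gauss's trick: 1 + 2 + ... + n = n*(n+1)/2.
for n in (1, 10, 100, 1000):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("closed form matches the literal sum")
```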
/But what about recursive procedures?/
* Sometimes: Count the number of recursive calls; Count the cost per recursive call; Multiply! Throw away constants.
* Fibonacci
(define (fib n)
(if (< n 2) 1
(+ (fib (- n 1)) (fib (- n 2)))))
* How many recursive calls do you do? (It's surprisingly hard to count)
(Slightly less than 2^n)
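One way to see the "slightly less than 2^n" claim is to instrument the computation. A sketch (the helper `fib_calls` is my own; it mirrors the Scheme definition above while also counting calls):

```python
# Sketch: count the calls the naive fib above actually makes, versus 2**n.

def fib_calls(n):
    """Return (fib(n), number of calls made), mirroring the Scheme fib."""
    if n < 2:
        return 1, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, 1 + calls_a + calls_b

for n in (5, 10, 15, 20):
    value, calls = fib_calls(n)
    print(n, calls, 2 ** n)   # calls grows exponentially but stays below 2**n
```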
Why is the Fibonacci sequence useful?
* The ratio of the nth to the (n-1)st Fibonacci number approaches the golden ratio, (1+sqrt(5))/2
Observation: You can approximate the nth Fibonacci number by computing ((1+sqrt(5))/2)^n
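Sharpening that observation gives Binet's formula: dividing the power of the golden ratio by sqrt(5) and rounding yields the Fibonacci number exactly. A sketch; note that with this class's convention fib(0) = fib(1) = 1, our fib(n) is the standard F(n+1), hence the n+1 exponent:

```python
import math

# Sketch: refine the golden-ratio approximation into Binet's formula.
PHI = (1 + math.sqrt(5)) / 2   # the golden ratio, about 1.618

def fib(n):
    """The naive recursive definition, as in the Scheme code above."""
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

def fib_approx(n):
    """Binet's formula, shifted by one for this class's fib convention."""
    return round(PHI ** (n + 1) / math.sqrt(5))

for n in range(15):
    assert fib(n) == fib_approx(n)
print("Binet's formula matches the recursive definition")
```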
/Alternate Strategy: Define Functions Recursively and Analyze/
time(n) = c + time(n-1)
* Goal: Eliminate the recursion
time(n) = c + c + time(n-2)
time(n) = c + c + c + time(n-3)
time(n) = k*c + time(n-k)
Assume the base case takes 0 steps: time(0) = 0
When k = n, time(n) = n*c, which is in O(n)
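The unfolding can be checked by evaluating the recurrence literally (a sketch; the base case time(0) = 0 follows the assumption above):

```python
# Spot-check the unfolding: time(n) = c + time(n-1), time(0) = 0,
# should come out to exactly n*c.

def time(n, c):
    return 0 if n == 0 else c + time(n - 1, c)

for n in (1, 10, 50):
    assert time(n, c=3) == n * 3
print("time(n) = n*c, which is in O(n)")
```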
/Sample Recurrence Relations and Their Solutions/
* time(n) = n + time(n-1)
time(n) = n + n-1 + time(n-2)
time(n) = n + n-1 + n-2 + time(n-3)
time(n) = n + (n-1) + ... + 2 + 1 = n*(n+1)/2 (Gauss again), which is in O(n^2)
time(n) = c + 2*time(n/2)
time(n) = c + 2c + 4*time(n/4)
time(n) = c + 2c + 4c + 8*time(n/8)
time(n) = c + 2c + 4c + 8c + 16*time(n/16)
time(n) = c*(1 + 2 + ... + 2^(k-1)) + 2^k*time(n/2^k)
When k = log_2(n)
Assume time(1) = 0
time(n) = c*(1 + 2 + 4 + ... + 2^(k-1)) = c*(2^k - 1) = c*(n-1), which is in O(n)
time(n) = c + time(n/2)
time(n) = c + c + time(n/4)
time(n) = c + c + c + time(n/8)
time(n) = k*c + time(n/2^k)
when k = log_2(n)
time(n) = c*log_2(n) + time(1) in O(log_2(n))
time(n) is in O(log_2(n))
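The halving recurrence checks out the same way (a sketch, again restricted to powers of two with time(1) = 0):

```python
import math

# Spot-check: time(n) = c + time(n/2), time(1) = 0, gives c*log_2(n)
# when n is a power of two.

def time(n, c):
    return 0 if n == 1 else c + time(n // 2, c)

for k in range(1, 11):
    n = 2 ** k
    assert time(n, c=4) == 4 * k == 4 * math.log2(n)
print("time(n) = c*log_2(n), which is in O(log n)")
```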
* time(n) = c*n + time(n-1)
* time(n) = c * time(n-1)
* time(n) = c * time(n/2)
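The closed forms of these last three are left for the reader, but their shapes can be previewed by unfolding them numerically. A sketch; the base cases (time(0) = 0 or 1, time(1) = 1) and the constants c are illustrative choices:

```python
# Sketch: unfold the three remaining recurrences to see their growth shapes.

def t1(n, c=1):                      # time(n) = c*n + time(n-1), time(0) = 0
    return 0 if n == 0 else c * n + t1(n - 1, c)

def t2(n, c=2):                      # time(n) = c * time(n-1), time(0) = 1
    return 1 if n == 0 else c * t2(n - 1, c)

def t3(n, c=2):                      # time(n) = c * time(n/2), time(1) = 1
    return 1 if n == 1 else c * t3(n // 2, c)

for n in (1, 4, 16, 64):
    print(n, t1(n), t2(n), t3(n))    # quadratic, exponential, polynomial growth
```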