EBoard 18: Analyzing algorithms

Warning This class is being recorded. At least I think it is.

Approximate overview

Administrivia
Questions
Analyzing algorithms
Formalizing the notion
Some practice

Administrivia

Rant from Sam
Don’t forget, we do have individual tutors for CSC-207, too.

Upcoming Token activities

Academic

CS Extras, Thursday, 4:15 p.m, 3821, Declaring a CS major
- Hopefully, our Chair will produce a summary.
- If not, we may also spend a few minutes at the start of the next class summarizing what was learned.
- Pom and Micah can also give you some info on this stuff.

Cultural

Peer

Soccer Saturday at 1:30

Wellness

Misc

Thursday is homecoming parade in Grinnell at 5:30 in Central Park.

Other good things (no tokens)

Upcoming work

Thursday: MP4
Friday: MP4 post-assessment
Friday: CLRS 2.3, CLRS 3. Skimming is okay.

Questions

Analyzing algorithms

Our goal as programmers / computer scientists is to build algorithms that (help us) solve problems. We also build data structures to help with that.

We generally want to know things about our algorithms, so that we can, for example, decide which algorithm is best for our particular problem.

We normally think about N things.

How fast is it? (Based on the size of the input.)
How much memory does it use? (Based on the size of the input.)
Is it correct? (Or which inputs does it work for?)
- It passes all of our tests.
- We write (in)formal proofs that our algorithms are correct.
- As you will learn, checking that an arbitrary proof is correct is not computable.

We will be analyzing “speed” and “memory use”.

Two techniques
- Run a lot of examples and try to fit a curve to them.
- Model the running time of an algorithm and focus on that.
We’re going to do the latter.
When modeling, we can be very detailed.
We will avoid too many details.
As inputs get large, the shape of the curve matters more than other details.
If algorithm 1 takes 1/1000 * n^2 steps
And algorithm 2 takes n steps
Where n is a number that represents the size of the input
The 1/1000 * n^2 wins on small inputs, but eventually looses. Badly.

We use a few notations to talk about shape

Theta(f(n)) - functions that take (approximately) some constant times f(n) for sufficiently large input.
O(f(n)) - functions that are bounded above by some constant times f(n) for sufficiently large input.

We generally consider only a few classes of functions.

O(1) - Those bounded above by a constant (constant-time functions).
O(logn) - Log n: Those that increase their running time by a constant each time you double the size of the input.
O(n) - Linear: Those who running time depends linearly on the size of the input
O(nlogn) - NlogN
O(n^2) - Quadratic
O(n^3) …
O(2^n) - Exponential algorithms
O(n!)

When designing algorithms, we usually care only about these asymptotic bounds.

When choosing what algorithm to use for a program we’re building, we usually fall back onto what someone has given us in a library.

Or … we think carefully about the input and choose one best suited to the input.

Some constant-time algorithms/functions?

Almost every basic operation we treat as constant time.
Including car, cdr, array references.

Some linear-time algorithms/functions

Get the nth element of a linked list
Shift elements in an array (n is the size of the array)

Figuring out the running time of an iterative algorithm

Normal process:
- Look at each loop (inside out).
- Count the number of times the loop runs.
- Count the number of steps in the loop.
- Multiply the two.

for (i = 0; i < n; i++) {
  pen.println(lst.valueAt(i));
}

The loop runs n times
lst.valueAt takes i time (where lst is a linked list)
pen.println takes 1 time
So, we multiply _ times (_ + __)

Two approaches:

Assume that lst.valueAt is always O(n).
- O(n * (n + 1)) = O(n^2 + n) = O(n^2)
- Note that we can throw away lower-order terms in big-O notation.
Use a slightly different approach, don’t multiply n, but instead add up all the steps.
- 1 + 2 + 3 + 4 + … n/2 + …. n-3 + n-2 + n-1 + n
- That sum is n*(n+1)/2 which is Theta(n^2) (also O(n^2)).
Yay! Both give us the same answer.

Binary search, conceptually

Input: Sorted array of values, a value we’re looking for
Output: Either the index of the value, if it’s in the array or -1 if it’s not in the array.

while (there are elements left to process) {
  look at the middle element (using a comparator)
  if the middle element is equal to the sought value, 
    return its index
  if the middle element is larger than the sought value, 
    throw away everything larger (including the middle element)
  otherwise the middle element is smaller than the sourght value,
    throw away everything smaller (including the middle element)
}

“Look at the middle element” - constant time
- Find the index - constant time
- Look at that location - constant time
Return its index - constant time
Comparing the middle element to the sought value - Constant time
Throw away half the array
- Depends on we do it.
- If we copy to a new array, linear
- If we keep track of the lower-bound and upper-bound of the portion of interest, it’s constant time

How many times does this loop run? (in terms of n)

Consider the example in which n is 1000; you can approximate.

1000 -> 500 -> 250 -> 125 -> 63 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
11 steps.

Consider the example in which n is 2000; you can approximate.

12 steps (just one more step and we’re down to 1000)

Consider the example in which n is 4000; you can approximate.

13 steps (just one more step).

Algorithms in which you add a constant number of steps when you double the size of the input are … logn

Formalizing the notion

A function f(n) is in O(g(n)) iff there exist values n0 and c, such that for all n > n0, f(n) <= c*g(n).

The <= is “f(n) is bounded above by”
The c is “ignoring constant multipliers”
The for all n > n0 represents for sufficiently large inputs

A function f(n) is in Theta(g(n)) iff there exist values n0, b, and c, such that for all n > n0, bg(n) <= f(n) <= cg(n).

Some exercises

What is result in the following?

result = 0;
for (i = 1; i < n; i = i*2) {
  result + i;   // Models a case in which the stuff inside the loop takes O(i)
}
print result;

Hypotheses about result

O(1) - nope, I see an increase
O(logn) - nope, I don’t see a constant increase when doubling.
O(nlogn) - ???

Analysis

Suppose n is 10
- 1 + 2 + 4 + 8 = 15
Suppose n is 20
- 1 + 2 + 4 + 8 + 16 = 31
Suppose n is 40
- 1 + 2 + 4 + 8 + 16 + 32 = 63
Suppose n is 80
- 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127
Suppose n is 100
- 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127

Is this O(1), O(logn), O(n), O(n log n), …?

It’s O(n), linear

1 + 2 + 4 + 8 + 16 + … + 2^k = 2^(k+1)-1 = 2*2^k

k=0: 1 (2)
k=1: 3 (4)
k=2: 7 (8)
k=3: 15 (16)
k=4: 31 (32)
k=5: 63
k=6: 127
k=7: 255
k=8: 511
k=9: 1032
k=10: 2063
k=11: 4127

result = 0;
for (i = 1; i < n; i = i+1) {
  result += i; // Models a case in which the stuff inside the loop takes O(i)
}
print result;

1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + …. + 1/2^k = something a little less than 2.

Unless specified otherwise elsewhere on this page, this work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

This website was built using Jekyll, Twitter Bootstrap, and the Bootswatch Cosmo Theme.