CSC 207.02 2019S, Class 17: Analyzing recursive algorithms
Overview
- Preliminaries
- Notes and news
- Upcoming work
- Extra credit
- Questions
- Quiz
- Big-O, revisited
- Iterative analysis, revisited
- Recurrence relations
- Approaches to recurrence relations
Preliminaries
News / Etc.
- Welcome to any prospective students we have. Thank you for bringing
warmer weather with you.
- I’m back! I hope that you had a good time without me. I apologize
for the inconsistency in communication.
- I brought you conference swag. (One of each item per person.)
- Blake says “Be proud that you are able to think technically and talk
  about it.”
Upcoming work
- Assignment 5 due Tuesday night.
- Exam 1 to be distributed in concrete form tonight. Sorry for the delay
in getting it out.
- Prologue due Thursday night
- Exam due the following Thursday.
- Reading for Wednesday:
Anonymous functions
(to be posted tonight)
- Lab writeup: [None]
- March 8-10 (7:30 7:30 2:00), Twelfth Night. Box office opens today
at noon.
- Grinnell Singers March 10 at 2pm.
- 30 Minutes of Mindfulness at SHACS every Monday 4:15-4:45
- Any organized exercise. (See previous eboards for a list.)
- 60 minutes of some solitary self-care activities that are unrelated to
academics or work. Your email reflection must explain how
the activity contributed to your wellness.
- 60 minutes of some shared self-care activity with friends. Your email
reflection must explain how the activity contributed to your wellness.
Other good things
- Environmental talk tonight at 7:30 in Noyce 2021. Sounds really cool.
(Appropriate for this weather.)
Questions
What’s the problem with the linear average algorithm?
Potential overflow!
When will we get assignments back?
Soon, I hope, except for the evil assignment (which you all got
25 on).
Sam will push on the graders. (Or maybe Sam will push on Sam.)
Quiz
Joy and fun, maybe.
Big-O analysis, revisited
What is Big O and why do we use it?
- A way to analyze programs in terms of how long they take
(or how much memory they use)
- A way to classify functions (e.g., linear, exponential)
E.g. A function in O(n) is “linear”.
- We write a function that models how long our algorithm takes
(how much memory it uses)
- We might want to compare our model to actual experiments.
- Big-Oh notation is used to provide upper bounds on functions
- Using a formally defined mechanism
- That lets us describe the overall “shape” of the bound of a function.
Formal definition
- f(n) is in O(g(n)) iff there exist c > 0 and n0 > 0 s.t. for all n > n0,
  f(n) <= c*g(n).
- <= indicates the upper bound
- n > n0, “for sufficiently large n”. Compare 10000000*n vs n^2/100.
  For small n, the quadratic is actually smaller (so you might prefer
  that algorithm), but once n is big enough, n^2/100 dominates.
- c > 0, “we don’t care about constant multipliers; we care primarily
  about the overall shape”
- Most of our analyses do not carefully distinguish between the various
  costs of “constant time” operations (e.g., addition, multiplication,
  and function calls are all “1 unit”)
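To make the definition concrete, here is a quick numeric sketch (the class, function, and method names are made up for illustration) that checks f(n) <= c*g(n) for the example from class, f(n) = 10000000*n and g(n) = n^2/100, using c = 1 and n0 = 10^9:

```java
// Sketch: checking the Big-O definition numerically for
// f(n) = 10000000*n and g(n) = n^2/100.
// With c = 1 and n0 = 10^9, f(n) <= c*g(n) for all n >= n0.
public class BigODemo {
    static long f(long n) { return 10000000L * n; }

    static long g(long n) { return n * n / 100; }

    // Does f(n) <= c*g(n) hold at this particular n?
    static boolean boundHolds(long c, long n) {
        return f(n) <= c * g(n);
    }

    public static void main(String[] args) {
        long n0 = 1_000_000_000L;
        // Below the crossover, the "worse-shaped" function is smaller.
        System.out.println(boundHolds(1, 1000));   // false: f(n) is bigger
        System.out.println(boundHolds(1, n0));     // true: the crossover point
        System.out.println(boundHolds(1, 2 * n0)); // true: g dominates from here on
    }
}
```

Spot-checking a few values like this doesn't prove the bound, but it shows where the crossover sits and why n0 appears in the definition.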
The formal definition of Big O lets us prove a variety of important
properties of the notation.
- If f(n) is in O(g(n)) and g(n) is in O(h(n)), f(n) is in O(h(n))
- 5*n^2 is in O(n^2); it is also in O(n^44)
- We try to pick the tightest bound possible.
- How do we prove this?
- Set theory.
- Given c and n0 for the first claim and d and n1 for the second,
  come up with constants C and N0 for the third. E.g., C = c*d and
  N0 = max(n0, n1)
- For n > N0, f(n) <= c*g(n) and g(n) <= d*h(n), so c*g(n) <= c*d*h(n)
- Then f(n) <= C*h(n) by transitivity of <=.
- If f(n) is in O(g(n)) then f(n)+g(n) is in O(g(n))
- You can throw away lower-order terms.
- E.g., 5n+n^2 is in O(n^2)
- c*f(n) is in O(f(n))
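The "throw away lower-order terms" property can be checked the same way. A small sketch (names are made up for illustration) confirming that 5n + n^2 <= 2n^2 once n >= 5, i.e., c = 2 and n0 = 5 witness that 5n + n^2 is in O(n^2):

```java
// Sketch: 5n + n^2 is in O(n^2), witnessed by c = 2 and n0 = 5.
public class LowerOrderTerms {
    static long f(long n) { return 5 * n + n * n; } // 5n + n^2

    static long bound(long n) { return 2 * n * n; } // c*g(n) with c = 2

    public static void main(String[] args) {
        // For every n >= 5, 5n <= n^2, so 5n + n^2 <= 2n^2.
        for (long n = 5; n <= 1_000_000; n *= 10) {
            System.out.println(n + ": " + (f(n) <= bound(n)));
        }
    }
}
```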
Iterative analysis, revisited
Normal techniques for bounding algorithms.
- Take a structure, have a rule for bounding that structure.
Bound on a sequence of steps is the sum of the bounds of the individual
steps.
a[0] = 1 // 1 step
a[0] = largest(a) // n steps
Bound on a for loop E.g.,
- Count the number of times the loop executes
- Count the cost of the body of the loop
- Multiply the two
selection_sort(int[] a) {
  for (int i = 0; i < a.length; i++) {
    swap(a, i, index_of_smallest(a, i));
  }
}
// Find the location of the smallest element in the array,
// looking starting at start.
int index_of_smallest(int[] a, int start) {
  ...
}
- This loop executes n times
- In the body, we compare (hidden) [1], increment (hidden) [1],
swap [1], and compute the index of the smallest [n]
- The running time of this algorithm is n(3+n) = 3n + n^2, which is in O(n^2)
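The sketch above can be filled in as runnable Java (the class name and the decision to pass the array to swap are my own; the structure follows the board):

```java
import java.util.Arrays;

public class SelectionSort {
    // Find the index of the smallest element in a, looking starting
    // at start. Cost: about (a.length - start) comparisons, i.e., O(n).
    static int indexOfSmallest(int[] a, int start) {
        int smallest = start;
        for (int i = start + 1; i < a.length; i++) {
            if (a[i] < a[smallest]) {
                smallest = i;
            }
        }
        return smallest;
    }

    // The loop runs n times and each body does O(n) work in
    // indexOfSmallest, so the whole sort is O(n^2).
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int s = indexOfSmallest(a, i);
            int tmp = a[i];
            a[i] = a[s];
            a[s] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 7 };
        selectionSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 7, 9]
    }
}
```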
What is the cost of a conditional?
if (test) {
  consequent;
} else {
  alternate;
}
if (test) {
  x = x+1;
} else {
  for (int i = 0; i < number_of_atoms_in_the_universe; i++) {
  }
}
Cost is cost of test + max of cost of consequent and alternative
You can do a lot of analysis like this, but sometimes it’s helpful
to unroll your loops.
for (int i = n; i > 1; i = i/2) {
  a[i] = smallest(a, i); // smallest looks at positions 0 ... i
} // for
- Number of repetitions: log_2(n) (also written as log(n) or logn)
- Cost per repetition: n
- Product: O(nlogn)
Unroll the loop
- First iteration: n
- Second iteration: n/2
- Third iteration: n/4
- Fourth iteration: n/8
- Kth iteration: n/2^(k-1)
What is n + n/2 + n/4 + n/8 + ... + n/2^k (or what’s a bound on it?)
- k=0: n
- k=1: n + n/2 = 3n/2
- k=2: n + n/2 + n/4 = 7n/4
- k=3: n + n/2 + n/4 + n/8 = 15n/8
- k=4: n + n/2 + n/4 + n/8 + n/16 = 31n/16
- k=5: n + n/2 + n/4 + n/8 + n/16 + n/32 = 63n/32
- General: 2n - n/2^k, approaches 2n
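The pattern above is easy to check by just computing the sum (class and method names here are made up for illustration):

```java
public class HalvingSum {
    // Sum n + n/2 + n/4 + ... + n/2^k, using doubles so the halves
    // aren't truncated by integer division.
    static double halvingSum(double n, int k) {
        double total = 0;
        for (int i = 0; i <= k; i++) {
            total += n / Math.pow(2, i);
        }
        return total;
    }

    public static void main(String[] args) {
        double n = 1024;
        for (int k = 0; k <= 10; k++) {
            System.out.println("k=" + k + ": " + halvingSum(n, k));
        }
        // The totals march toward 2n = 2048 but never reach it:
        // each total is 2n - n/2^k.
    }
}
```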
Notes
- We just concluded it’s in O(n)
- We previously concluded it’s in O(nlogn)
- The O(n) is a better (tighter) bound.
Recurrence relations
As computer scientists, we often write recursive algorithms.
merge_sort(A) {
  if (A.length <= 1) {
    // Do nothing, it's sorted
  } else {
    split array into two new subarrays A1 and A2
    A1 = merge_sort(A1)
    A2 = merge_sort(A2)
    combine them back together
  }
}
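One way to turn that sketch into runnable Java (the class name and the use of Arrays.copyOfRange for the split are my own choices; the structure follows the pseudocode):

```java
import java.util.Arrays;

public class MergeSort {
    // Split, recurse on the two halves, then merge the sorted halves.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) {
            return a; // already sorted
        }
        int mid = a.length / 2;
        int[] a1 = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] a2 = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        return merge(a1, a2);
    }

    // Combine two sorted arrays into one sorted array: O(n) work.
    static int[] merge(int[] a1, int[] a2) {
        int[] result = new int[a1.length + a2.length];
        int i = 0, j = 0, k = 0;
        while (i < a1.length && j < a2.length) {
            result[k++] = (a1[i] <= a2[j]) ? a1[i++] : a2[j++];
        }
        while (i < a1.length) { result[k++] = a1[i++]; }
        while (j < a2.length) { result[k++] = a2[j++]; }
        return result;
    }

    public static void main(String[] args) {
        int[] a = { 8, 3, 5, 1, 9, 2 };
        System.out.println(Arrays.toString(mergeSort(a))); // [1, 2, 3, 5, 8, 9]
    }
}
```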
To analyze this algorithm, we’ll invent a function, T(n), that represents
the running time
- T(n) = 1 (for test) + n (to split) + T(n/2) + T(n/2) + n (to merge)
- T(n) = 1 + 2n + 2(T(n/2))
- T(1) = 1
- T(0) = 1
How do we figure out what functions this rule indicates? How do we find
the closed form of this recursive formulation?
- T(n) = 1 + 2n + 2(T(n/2))
Approaches to recurrence relations
Let’s try a slightly simpler one. We’ll try repeated expansion
- T(n) = n + 2*T(n/2)
- T(n/2) = n/2 + 2*T(n/4)
- T(n/4) = n/4 + 2*T(n/8)
Following the steps
- T(n) = n + 2*T(n/2)
- T(n) = n + 2(n/2 + 2*T(n/4))
- T(n) = n + n + 4*T(n/4)
- T(n) = 2n + 4*T(n/4)
- T(n) = 2n + 4(n/4 + 2T(n/8))
- T(n) = 2n + n + 8T(n/8) = 3n + 8T(n/8)
- T(n) = kn + 2^k*T(n/2^k)
- When k is logn, 2^k is n. T(n) = nlogn + n*T(1) = nlogn + n, which is
  in O(nlogn)
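The expansion can be double-checked by computing the recurrence directly and comparing it against the closed form, at least for powers of two (class and method names are made up for illustration):

```java
public class Recurrence {
    // The simpler recurrence from above: T(n) = n + 2*T(n/2), T(1) = 1,
    // computed directly (for n a power of two).
    static long t(long n) {
        if (n <= 1) {
            return 1;
        }
        return n + 2 * t(n / 2);
    }

    // The closed form from repeated expansion: n*log2(n) + n.
    static long closedForm(long n) {
        long log = 0;
        for (long m = n; m > 1; m /= 2) { log++; } // floor(log2(n))
        return n * log + n;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 1024; n *= 2) {
            System.out.println("T(" + n + ") = " + t(n)
                + ", n*log(n) + n = " + closedForm(n));
        }
        // The two columns agree, supporting the O(nlogn) conclusion.
    }
}
```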