EBoard 19: Analyzing recursive algorithms

Warning! This class is being recorded and transcribed. At least I think it is.

Approximate overview

  • Administrivia
  • About MP5
  • Our GitHub testing repo
  • Summary of yesterday’s majors session
  • Questions
  • Advantages of big-O notation
  • Some more practice analyzing iterative algorithms
  • Analyzing recursive algorithms
  • More practice

Administrivia

  • Happy Friday

Upcoming Token activities

Academic

  • Mentor Session, Sunday, 4pm.

Cultural

Peer

  • Soccer Saturday

Wellness

  • CS picnic, today, 4pm, Natatorium

Misc

  • Second-year spotlight event, today, 4:00-6:30 p.m. in the Kernel or Husk Atrium.
    • Stop by and visit OCS and tell them Sam sent you.
  • Fireman’s breakfast, Sunday 6am-noon at the firehouse. Free will donation.
    • I will reimburse you up to $10.

Other good things (no tokens)

  • Volleyball vs. Lawrence Saturday at 1pm (Senior Day).
  • Play this weekend: Everybody. Friday and Saturday at 7:30 p.m. in Flanagan; Sunday at 2:00, also in Flanagan. Get tickets at the box office starting at noon on Thursday (maybe before). The box office is in Bucksbaum, near the courtyard.

Upcoming work

  • Tonight: MP4 post-assessment
  • Sunday night: MP5 pre-assessment
  • Monday: Readings on lambda in the Java Tutorial and Priority Queues
  • Thursday: MP5

Friday PSA

  • You are awesome.
  • I’d like you to be well.
  • Think about the legality of what you do.
  • Choose what is appropriate for you.
  • Consent is essential.

About MP5

  • We are writing the back end for an AAC (augmentative and alternative communication device), which maps presses on buttons to spoken text.
  • You get to see what other people’s assignments look like.

Our testing repo

Having a common repo ended up being a good example of what goes wrong when you have a common repo (and novice GitHub users). We’re going to talk through some of them.

Sam: Turn off the recording.

Why do random files get added?

Because someone typed something like git add . or git add *, rather than the better git add FILENAME.

The other commands sometimes add extra cruft to the repo.

Important note: The .gitignore file tells git what not to add when you do something careless like git add . (which way too many guides tell you to do).
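For instance, a minimal .gitignore for a Java project might look like this (the entries are illustrative, not the course's official list):

# Compiled classes and build output
*.class
target/
build/

# Editor and OS cruft
*~
.DS_Store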

Sam: Turn on the recording.

Some other notes:

  • Once you’re working with real code and other people, NEVER commit code that doesn’t run.
  • Be careful about putting cruft in the repo.
  • Sam needs to teach you about branching.

Summary of yesterday’s majors session

  • What classes are being offered in the future? See the diagram.
  • How do I get an advisor? We’ll have an advisor session in the spring. You will attend, hear about our advising philosophy (Sam’s is “Make fun of your advisees”), fill out a form giving preferences, and cross your fingers.
  • Your advisor is usually one of the CS faculty (or Sarah Dahlby Albright, or Henry Walker, or Liz Rodrigues). Sam will ask Charlie (our department chair, not a tuna with good taste).
  • If I’ve declared another major will I get closed out of classes this spring? Potentially. I haven’t seen the priority list for things like 211 and 324 lately, but it’s something like “CS seniors who need this for graduation”; “CS seniors who need some course for graduation”; “CS third-years who still need the course”; “Undeclared second-years”.
    • Concentrations should not affect this.
  • Do I have to make a new four-year plan if I declare a second major? Every major declaration form needs a four-year plan. If you saved it and it works, you don’t need to make a new one. If you didn’t save it, you must make a new one or hope your advisor has a copy.

Questions

Advantages of big-O notation

Big-O and Big-Theta are useful notations because they simplify our descriptions in many ways.

For example, O(n^2 + n) = O(n^2). You can throw away lower-order terms.

In fact, O(n^2 + 1000*n*lg(n)) = O(n^2).

In addition, we can throw away constant multipliers. O(100n^2) = O(n^2).

HOWEVER, we have to be careful about when we throw them away; it’s usually after we’ve developed a formula.

O(f(n)) is a SET of functions (all the functions that are bounded above by some multiple of f(n) when n is sufficiently large). So it is appropriate to say g(n) IS IN O(f(n)) not g(n) is O(f(n)).
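As a concrete check straight from the definition (the witnesses c and n_0 below are my choices, not from class), here is why 100n^2 is in O(n^2):

\[
g(n) \in O(f(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ g(n) \le c \cdot f(n) \ \text{for all}\ n \ge n_0
\]

\[
100n^2 \le 100 \cdot n^2 \ \text{for all}\ n \ge 1, \ \text{so}\ c = 100,\ n_0 = 1 \ \text{witness that}\ 100n^2 \in O(n^2)
\]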

Some more practice

Our goal in assessing/analyzing iterative algorithms is “count the steps”.

  • Sometimes we count the number of steps within a loop and multiply by the number of times the loop runs.
  • Sometimes we count the number of steps in each iteration (which varies) and add them all up.
result = 0;
for (i = 1; i < n; i = i*2) {
  result = result + i;   // Models a case in which the stuff inside the loop takes O(i)
}
print result;
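Here's a minimal, runnable C version of that loop (my own transcription; the name sum_doubling is mine), which you can use to watch result stay below 2n:

#include <stdio.h>

// Sum the powers of two below n (the first loop above).
long sum_doubling(long n) {
  long result = 0;
  for (long i = 1; i < n; i = i * 2) {
    result = result + i;   // adding i models O(i) work per iteration
  }
  return result;
}

int main(void) {
  // The sum 1 + 2 + 4 + ... stays below 2n, so result grows linearly with n.
  for (long n = 10; n <= 100000; n = n * 10) {
    printf("n = %6ld  result = %6ld\n", n, sum_doubling(n));
  }
  return 0;
}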

For this one, since we do a different amount of work (have a different i) each time through, we use the second technique: 1 + 2 + 4 + 8 + … The powers of 2 below n sum to less than 2n, so the total work is in O(n).

result = 0;
for (i = 1; i < n; i = i*2) {
  result = result + 1;   // Models a case in which the stuff inside the loop takes O(1)
}
print result;

This time, I do the same amount of work every time (add the same value to result each time), so I can multiply the number of times the loop runs by the work each time through.

Hypothesis one: The loop runs n/2 times, so we should end up with a number like n/2 for result.

Watch Sam fail at writing C code.

We’ll trace f(100):

  • i = 1, result becomes 1
  • i = 2, result becomes 2
  • i = 4, result becomes 3
  • i = 8, result becomes 4
  • i = 16, result becomes 5
  • i = 32, result becomes 6
  • i = 64, result becomes 7
  • i = 128; stop (128 ≥ 100, so the loop test fails)

This function is logarithmic: result counts how many doublings it takes to reach n, about log2(n). (For n = 100, that’s 7, since 2^7 = 128 is the first power of 2 past 100.)

If you see something doubling or halving each time through, it’s likely logarithmic (or nlogn).
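And here's the counting version in C (a minimal sketch of what Sam live-coded; I've wrapped it in a function named f to match the trace above):

#include <stdio.h>

// Count how many times i doubles before reaching n (the second loop above).
int f(int n) {
  int result = 0;
  for (int i = 1; i < n; i = i * 2) {
    result = result + 1;   // O(1) work per iteration
  }
  return result;
}

int main(void) {
  printf("f(100) = %d\n", f(100));    // 7, matching the trace above
  printf("f(1000) = %d\n", f(1000));  // 10, since 2^9 = 512 < 1000 <= 2^10
  return 0;
}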

result = 0;
for (i = 1; i < n; i = i*2) {
  for (j = 0; j < n; j++) {
    result = result + 1;   // Models a case in which the stuff inside the loop takes O(1)
  }
}
print result;

Suppose n is 100. What should we see?

Hypothesis: n log n. (For n = 100, that’s about 100 * 7 = 700.)

Analysis:

  • How long does the j loop take? O(n) (each pass adds n to result).
  • The outer loop runs O(log2n) times.
  • We multiply the number of times the outer loop runs by the cost of the stuff inside the loop: O(n*log2n).
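A quick C check of that analysis (my own harness, not from class): for n = 100 the outer loop runs 7 times, so result should be 7 * 100 = 700.

#include <stdio.h>

// The nested loop above: the outer loop doubles i, the inner loop runs n times.
long nested(long n) {
  long result = 0;
  for (long i = 1; i < n; i = i * 2) {
    for (long j = 0; j < n; j++) {
      result = result + 1;   // O(1) work; each pass of the inner loop adds n
    }
  }
  return result;
}

int main(void) {
  printf("nested(100) = %ld\n", nested(100));   // expect 7 * 100 = 700
  return 0;
}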

Analyzing recursive algorithms

Merge sort (in arrays)! (One of Sam’s favorite recursive algorithm examples.)

  • If the array is small enough (0 or 1 elements), it’s already sorted.
  • Otherwise
    • Sort each half with mergesort (trusting the magic recursion fairy)
    • Combine the elements back together into a sorted array
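Here's one way that might look in C (a sketch of my own, following the outline above; the names msort and merge are mine). The merge step visits every element once, which is where the n in the recurrence below comes from.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Merge the sorted halves a[lo..mid) and a[mid..hi) using scratch space.
static void merge(int *a, int *scratch, int lo, int mid, int hi) {
  int i = lo, j = mid, k = lo;
  while (i < mid && j < hi)
    scratch[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
  while (i < mid) scratch[k++] = a[i++];
  while (j < hi)  scratch[k++] = a[j++];
  memcpy(a + lo, scratch + lo, (size_t)(hi - lo) * sizeof(int));
}

// Sort a[lo..hi) with merge sort.
static void msort(int *a, int *scratch, int lo, int hi) {
  if (hi - lo <= 1) return;        // 0 or 1 elements: already sorted
  int mid = lo + (hi - lo) / 2;
  msort(a, scratch, lo, mid);      // sort each half (trusting the recursion fairy)
  msort(a, scratch, mid, hi);
  merge(a, scratch, lo, mid, hi);  // combine: O(n) for n = hi - lo
}

int main(void) {
  int a[] = { 5, 2, 8, 1, 9, 3 };
  int n = sizeof(a) / sizeof(a[0]);
  int *scratch = malloc(n * sizeof(int));
  msort(a, scratch, 0, n);
  for (int i = 0; i < n; i++) printf("%d ", a[i]);   // prints 1 2 3 5 8 9
  printf("\n");
  free(scratch);
  return 0;
}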

Let’s put on our mathematician’s hat and write a function T(n) that says how long merge sort takes on an array of size n.

  • T(0) = 0
  • T(1) = 1
  • T(n) = T(n/2) + T(n/2) + n = 2T(n/2) + n

Can we write T(n) in a closed form (not recursively)?

How do we handle the problem? There are several approaches; we’ll look at two (bottom up and top down).

Approach 1: Bottom up

  • T(1) = 1
  • T(2) = 2T(1) + 2 = 2 + 2 = 4 = 2*2 = 2*2^1
  • T(4) = 2T(2) + 4 = 2*4 + 4 = 12 = 3*4 = 3*2^2
  • T(8) = 2T(4) + 8 = 2*12 + 8 = 32 = 4*8 = 4*2^3
  • T(16) = 2T(8) + 16 = 2*32 + 16 = 80 = 5*16 = 5*2^4
  • T(32) = 2T(16) + 32 = 2*80 + 32 = 192 = 6*32 = 6*2^5

Observation: Each time we double the size, we add 1 to the first multiplier and double the multiplicand. The multiplicand in the last case is 2^5.

  • T(2^k) = (k+1)*2^k [formula used below]
  • If n = 2^k, then k = log2n, so T(n) = (log2n + 1)*n = n*log2n + n, which is in O(n*log2n)

That last step again.

  • n = 2^k
  • We’ll take log2 of both sides
  • log2(n) = log2(2^k)
  • log2(2^k) = k (by the definition of log)
    • log2(m) is the number of times you have to multiply 2 by itself to get m.
  • T(n) = T(2^k) [because n = 2^k]
  • = (k+1)*2^k [by the formula above]
  • = (k+1)*n [because 2^k = n]
  • = (log2n + 1) * n [because k = log2n]
  • = n*log2n + n [by the distributive property]
  • which is in O(n*log2n) [by the definition of O]
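We can sanity-check the closed form by evaluating the recurrence directly; here's a tiny C program to do that (my addition, not from class):

#include <stdio.h>

// The merge sort recurrence: T(n) = 2T(n/2) + n, with T(1) = 1 and T(0) = 0.
long T(long n) {
  if (n <= 1) return n;
  return 2 * T(n / 2) + n;
}

int main(void) {
  // Compare the recurrence against the conjectured closed form (k+1)*2^k.
  for (int k = 0; k <= 10; k++) {
    long n = 1L << k;   // n = 2^k
    printf("T(%4ld) = %6ld   (k+1)*2^k = %6ld\n", n, T(n), (long)(k + 1) * n);
  }
  return 0;
}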

We can also try this “top down”.

  • T(n) = 2*T(n/2) + n
    • Side note: T(n/2) = 2*T(n/4) + n/2 [plugging n/2 into the formula]
  • T(n) = 2(2T(n/4) + n/2) + n [substituting T(n/2) into the prior eqn.]
  • T(n) = 4*T(n/4) + n + n [Distribution]
  • T(n) = 4*T(n/4) + 2n [Combining the n’s]
    • T(n/4) = 2*T(n/8) + n/4 [plugging n/4 into the formula]
  • T(n) = 4(2T(n/8) + n/4) + 2n [Substituting T(n/4)]
  • T(n) = 8*T(n/8) + n + 2n [Distribution]
  • T(n) = 8*T(n/8) + 3n [Combining the n’s]

If we think about this in terms of powers of 2

  • T(n) = (2^3)*T(n/(2^3)) + 3n

Generalize

  • T(n) = 2^k*T(n/(2^k)) + kn
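We’re generalizing from three unrollings, so here’s the one-step inductive check (my addition, not from class) that the pattern holds for every k:

\begin{aligned}
T(n) &= 2^k\,T\!\left(n/2^k\right) + kn
  && \text{[assume the pattern holds for } k \text{]} \\
     &= 2^k\left(2\,T\!\left(n/2^{k+1}\right) + n/2^k\right) + kn
  && \text{[apply the recurrence to } T(n/2^k) \text{]} \\
     &= 2^{k+1}\,T\!\left(n/2^{k+1}\right) + n + kn
  && \text{[distribute]} \\
     &= 2^{k+1}\,T\!\left(n/2^{k+1}\right) + (k+1)n
  && \text{[so it holds for } k+1 \text{]}
\end{aligned}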

Let n = 2^k (or k = log2n)

  • T(n) = n*T(n/n) + (log2n)*n
  • T(n) = n*T(1) + (log2n)*n
  • T(n) = n*1 + (log2n)*n = n + n*log2n
  • T(n) is in O(n*log2n)

Practice