Adventures in binary search, part the third: Implementing binary search

Part of an ongoing series of essays tentatively entitled Don’t embarrass me, Don’t embarrass yourself: Notes on thinking in C and Unix.

A simulated group interview, continued.

Welcome back! I hope you enjoyed your second visit to our awesome snack bar. And I’m glad to see that we haven’t lost any more of you. Good job!

Provided you can pass the personality screening, you will all be joining Microgoogazonbook soon. But before we send you off to our teams of trained psychologists, I’d like to hear a bit more about how you came up with your solutions. Don’t worry, there are no wrong answers!

We’ll start with candidate number 11.

I’ve screwed up binary search so many times that I decided to just memorize implementations in most major programming languages.

Yeah, we’ve all screwed up binary search at least once. Memorizing the correct solution is an interesting approach. I’d hope that you’d try to learn from your failures so that you could avoid similar mistakes when writing other algorithms. But we’ll work on that.

How about candidate number 13. What? We don’t have a candidate number 13. Are you sure that they’re not just available only to certain people, like the thirteenth floor of most hotels?

Okay, on to candidate number 3. What did you do?

I carry a flash drive with a library of useful code I’ve written. I just copied from that flash drive.

I’ll make a note to erase the computer you were working on. But it’s great practice to build a library of routines you use. I know of some Microgoogazonbookers who regularly dominate programming competitions because they have a library in which they have implemented every algorithm in CLRS [1].

We’ll continue with candidate number 6.

I am not a number. I am a free man.

Did we really need references to obscure 1960’s TV shows? Candidate 19, let’s hear what you did.

My first version didn’t work. So I made some changes. It still didn’t work. So I made some more changes. In the end, I kept hacking at it until I was sure it worked.

That’s not quite an approach that we like, but I’m glad to hear that your sense of it worked was thorough enough that it passed all of our tests.

Oh, of course I wrote a test suite. I wrote that first. My teachers drilled TDD [2] into my head early.

Well, that does make things sound a bit better. I’m interested to hear the wide variety of approaches you are using. Let’s try candidate number 7.

I know that when I’m writing algorithms that involve arrays and have subtle edge cases, I should write loop invariants. So I wrote a loop invariant.

What a wonderful answer! I expect you’ve read Gries cover to cover.

Well, no. But I think we did enough that I’m confident using loop invariants to write better code.

I’m glad you know how to use them. Let’s hear from some other folks about what they know about loop invariants, particularly the basic principles of writing and using loop invariants.

The loop invariant is something that, if it’s true at the start of the body of the loop, is also true at the end of the loop.

That’s a good start. But true meets that criterion. So does false. What else do we want?

We want to be able to ensure that the loop invariant holds when we first enter the loop.

Okay, you’ve eliminated my snarky false invariant. But we can still use true as the invariant.

The loop invariant tells us something useful about the state of the system. Usually, when the loop is over, we can combine the loop invariant with the inverse of the loop test to ensure that we’ve met the goal of the loop.

Good. So, in order to successfully write loop invariants, we also need to think about the loop test. You’ve been doing well. Let’s give candidate #19 a chance to redeem themselves.

We need to make sure that the loop terminates. Hence, we have some measure of the size of the problem and ensure that the size decreases at every repetition.

So you do know something more sensible than hack at my code until it works. That’s good.

Now, I have enough experience with invariants to know that a pictorial invariant might help at this time. We’ll keep track of the lower bound and upper bound of the range of interest.

+-------+-------------+-------+
| < val | unprocessed | > val |
+-------+------+------+-------+
|       |      |      |       |
0       lb     mid    ub      size

As you’ve likely noted from this diagram, here at Microgoogazonbook, we tend to write our indices at the start of an array cell. If we were phrasing it textually, we might say For all i, 0 <= i < lb, a[i] < val. You can figure out the rest.

Let’s go back to candidate #7. Does this look like your invariant?

Yes, my invariant is If val appears in the array, it appears between indices lb and ub.

Well, that’s a bit vague. Is that range inclusive or exclusive?

Inclusive of lb, exclusive of ub, just as in your picture.

Good. How do we make sure that the invariant holds when we start? Let’s see … candidate #6?

I am not a number. I am a free man.

Didn’t we do that joke already? No one understands it. Why repeat it.

Okay, if we set lb to 0, and ub to size, we know that there are no numbers in the array at indices less than lb, which means that every number at an index less than lb is smaller than val. Similarly, there are no numbers in the array at indices of ub or greater, so all of those values are larger than val.

That sounded right. Let me draw a picture.

+-----------------------------+
|        unprocessed          |
+-----------------------------+
|                             |
0                             size
lb                            ub

Yup, looks good to me. There’s nothing outside of the bounds, so we can be pretty sure that everything to the left of lb is smaller than val and everything to the right of ub is bigger than val.

We’ve written might seem like a useful invariant. We’ve ensured that it holds at the beginning. What’s next? Let’s figure out the loop test. When should we stop?

Anyone? Oh, good, lots of hands. You, young octopus, what is your answer?

We can stop if there’s nothing left in the unprocessed section, because we know the value is not there.

Good. Anything else?

We can stop if a[mid] equals val.

Good. Anything else?

Um, no?

Good. How will we measure progress? Let’s focus on the first invariant, the invariant that lb < ub. Candidate 27?

We can measure the size of the problem by ub - lb. If we ensure that number decreases, we can be sure the loop terminates.

Great. I know that this is review for most of you, but I thought it would be helpful to think through the problem.

One last question, which is somewhat tangential: How do you compute mid? Let’s try someone new.

Um, can’t we just use (lb + ub)/2? Am I missing something?

No! What happens if lb is bigger than INT_MAX? The sum will be bigger than INT_MAX and your program will crash. You probably aren’t ready to work here at Microgoogazonbook. But I did say that anyone who passed the binary search test would be hired. Your job will be to shuttle bikes around our campus so that there are always some for people to use.

That sounds exciting!

I’m glad to hear that. But I’d still like a better way of computing the midpoint.

What? It’s time for the psychological profiles? We haven’t finished exploring how the candidates wrote binary search. Oh well. I’m sure that we’ll learn much more about their design habits over the next few weeks. Good luck everyone!

Readers: You should have enough information to correctly implement binary search. Make sure you think carefully about how you maintain the invariant. Make sure you think carefully about how you ensure that ub - lb decreases. Note that when ub is lb+1, mid will be lb.

[1] Cormen, Leiserson, Rivest, and Stein. Algorithms. 3rd Edition.

[2] TDD is test-driven development.

Citations

The loop invariant in this essay is inspired by one by Jon Bentley in Programming Pearls, originally published in Communications of the ACM.

Jon Bentley. 1983. Programming pearls: Writing correct programs. Commun. ACM 26, 12 (December 1983), 1040-1045. DOI=http://dx.doi.org/10.1145/358476.358484

The idea of students self-identifying as octopi is due to MK.

Version 1.0 of 2017-02-05.

The opinions stated herein are those of Samuel A. Rebelsky and do not necessarily reflect those of Grinnell College, Grinnell's Computer Science Department, the Rebelsky family, CMD-IT, SIGCAS, SIGCSE, any other organizations I am or have been affiliated with, or even most other sentient beings.

Check accessibility with WAVE.

SamR's Assorted Musings and Rants: Adventures in binary search, part the third: Implementing binary search by Samuel A. Rebelsky is licensed under a Creative Commons Attribution 4.0 International License.

This Web site was built using Markdown, some custom scripts, Twitter Bootstrap, and the Bootswatch Readable Theme.