Adventures in binary search, part the third: Implementing binary search
Part of an ongoing series of essays tentatively entitled Don’t embarrass me, Don’t embarrass yourself: Notes on thinking in C and Unix.
A simulated group interview, continued.
Welcome back! I hope you enjoyed your second visit to our awesome snack bar. And I’m glad to see that we haven’t lost any more of you. Good job!
Provided you can pass the personality screening, you will all be joining Microgoogazonbook soon. But before we send you off to our teams of trained psychologists, I’d like to hear a bit more about how you came up with your solutions. Don’t worry, there are no wrong answers!
We’ll start with candidate number 11.
I’ve screwed up binary search so many times that I decided to just memorize implementations in most major programming languages.
Yeah, we’ve all screwed up binary search at least once. Memorizing the correct solution is an interesting approach. I’d hope that you’d try to learn from your failures so that you could avoid similar mistakes when writing other algorithms. But we’ll work on that.
How about candidate number 13. What? We don’t have a candidate number 13. Are you sure that they’re not just available only to certain people, like the thirteenth floor of most hotels?
Okay, on to candidate number 3. What did you do?
I carry a flash drive with a library of useful code I’ve written. I just copied from that flash drive.
I’ll make a note to erase the computer you were working on. But it’s great practice to build a library of routines you use. I know of some Microgoogazonbookers who regularly dominate programming competitions because they have a library in which they have implemented every algorithm in CLRS [1].
We’ll continue with candidate number 6.
I am not a number. I am a free man.
Did we really need references to obscure 1960’s TV shows? Candidate 19, let’s hear what you did.
My first version didn’t work. So I made some changes. It still didn’t work. So I made some more changes. In the end, I kept hacking at it until I was sure it worked.
That’s not quite an approach that we like, but I’m glad to hear that your
sense of it worked
was thorough enough that it passed all of our tests.
Oh, of course I wrote a test suite. I wrote that first. My teachers drilled TDD [2] into my head early.
Well, that does make things sound a bit better. I’m interested to hear the wide variety of approaches you are using. Let’s try candidate number 7.
I know that when I’m writing algorithms that involve arrays and have subtle edge cases, I should write loop invariants. So I wrote a loop invariant.
What a wonderful answer! I expect you’ve read Gries cover to cover.
Well, no. But I think we did enough that I’m confident using loop invariants to write better code.
I’m glad you know how to use them. Let’s hear from some other folks about what they know about loop invariants, particularly the basic principles of writing and using loop invariants.
The loop invariant is something that, if it’s true at the start of the body of the loop, is also true at the end of the loop.
That’s a good start. But true
meets that criterion. So does false
.
What else do we want?
We want to be able to ensure that the loop invariant holds when we first enter the loop.
Okay, you’ve eliminated my snarky false
invariant. But we can still
use true
as the invariant.
The loop invariant tells us something useful about the state of the system. Usually, when the loop is over, we can combine the loop invariant with the inverse of the loop test to ensure that we’ve met the goal of the loop.
Good. So, in order to successfully write loop invariants, we also need to think about the loop test. You’ve been doing well. Let’s give candidate #19 a chance to redeem themselves.
We need to make sure that the loop terminates. Hence, we have some measure of the size of the problem and ensure that the size decreases at every repetition.
So you do know something more sensible than hack at my code until
it works
. That’s good.
Now, I have enough experience with invariants to know that a pictorial invariant might help at this time. We’ll keep track of the lower bound and upper bound of the range of interest.
+-------+-------------+-------+
| < val | unprocessed | > val |
+-------+------+------+-------+
| | | | |
0 lb mid ub size
As you’ve likely noted from this diagram, here at Microgoogazonbook,
we tend to write our indices at the start of an array cell. If we
were phrasing it textually, we might say For all
You can figure out the rest.i
, 0 <= i
< lb
,
a[i] < val
.
Let’s go back to candidate #7. Does this look like your invariant?
Yes, my invariant is
Ifval
appears in the array, it appears between indiceslb
andub
.
Well, that’s a bit vague. Is that range inclusive or exclusive?
Inclusive of
lb
, exclusive ofub
, just as in your picture.
Good. How do we make sure that the invariant holds when we start? Let’s see … candidate #6?
I am not a number. I am a free man.
Didn’t we do that joke already? No one understands it. Why repeat it.
Okay, if we set
lb
to 0, andub
tosize
, we know that there are no numbers in the array at indices less thanlb
, which means that every number at an index less thanlb
is smaller thanval
. Similarly, there are no numbers in the array at indices ofub
or greater, so all of those values are larger thanval
.
That sounded right. Let me draw a picture.
+-----------------------------+
| unprocessed |
+-----------------------------+
| |
0 size
lb ub
Yup, looks good to me. There’s nothing outside of the bounds, so we
can be pretty sure that everything to the left of lb
is smaller than
val
and everything to the right of ub
is bigger than val
.
We’ve written might seem like a useful invariant. We’ve ensured that it holds at the beginning. What’s next? Let’s figure out the loop test. When should we stop?
Anyone? Oh, good, lots of hands. You, young octopus, what is your answer?
We can stop if there’s nothing left in the unprocessed section, because we know the value is not there.
Good. Anything else?
We can stop if
a[mid]
equalsval
.
Good. Anything else?
Um, no?
Good. How will we measure progress? Let’s focus on the first invariant,
the invariant that lb < ub
. Candidate 27?
We can measure the size of the problem by
ub - lb
. If we ensure that number decreases, we can be sure the loop terminates.
Great. I know that this is review for most of you, but I thought it would be helpful to think through the problem.
One last question, which is somewhat tangential: How do you compute mid
?
Let’s try someone new.
Um, can’t we just use
(lb + ub)/2
? Am I missing something?
No! What happens if lb
is bigger than INT_MAX
? The sum will
be bigger than INT_MAX
and your program will crash. You probably
aren’t ready to work here at Microgoogazonbook. But I did say that
anyone who passed the binary search test would be hired. Your job will
be to shuttle bikes around our campus so that there are always some
for people to use.
That sounds exciting!
I’m glad to hear that. But I’d still like a better way of computing the midpoint.
What? It’s time for the psychological profiles? We haven’t finished exploring how the candidates wrote binary search. Oh well. I’m sure that we’ll learn much more about their design habits over the next few weeks. Good luck everyone!
Readers: You should have enough information to correctly implement
binary search. Make sure you think carefully about how you maintain
the invariant. Make sure you think carefully about how you ensure that
ub - lb
decreases. Note that when ub
is lb+1
, mid
will be lb
.
[1] Cormen, Leiserson, Rivest, and Stein. Algorithms. 3rd Edition.
[2] TDD is test-driven development.
Citations
The loop invariant in this essay is inspired by one by Jon Bentley in Programming Pearls, originally published in Communications of the ACM.
Jon Bentley. 1983. Programming pearls: Writing correct programs. Commun. ACM 26, 12 (December 1983), 1040-1045. DOI=http://dx.doi.org/10.1145/358476.358484
The idea of students self-identifying as octopi is due to MK.
Version 1.0 of 2017-02-05.