CSC323 2010S Software Design

Beautiful Code 4: Finding Things

Bray, Tim (2007). Finding Things. Chapter 4 (pp. 41-58) in Oram & Wilson (eds), Beautiful Code. Sebastapol, CA: O'Reilly.

What part of building a large hash table is computationally intensive? Loading 245 MB of data doesn't seem complicated enough to take 55 minutes.

Ruby's method of sorting is described in chapter 4 of beautiful code as:

The sort works its way back and forth through the array being sorted, passing pairs of elements to the block, which has to return a negative number, 0, or a positive number depending on whether the first element is less than, equal to, or greater than the second.

This does not seem to be any kind of sorting I have heard of before. Perhaps in an effort to avoid bothering to explain difficult concepts to foolish youths my teachers have told me quicksort is appropriate for most situations. What kind of sorting is this and why does Ruby use it?

The author is describing the comparator, rather than the sorting method.

Bray states that Perl is one of the most optimized languages regarding speed for handling dictionaries and hash tables. If this is the case, why did he choose to code this in Ruby instead of using Perl?

Because Perl is ugly enough that it's difficult to write beautiful code in Perl. (I use Perl for many things; but Perl code is rarely beautiful.)

Can you explain the Ruby synatx on p48, line 10? I don't understand how that does a sort.


What does it mean that regular expression can be translated into finite automata?

What are finite automata? Why can Regexp be translated into them? Why is it that algorithms that use automaton only have to look once at each character in the text they are trying to match?

We can turn a regular expression into a really simple (and fast) model of computation that looks at each character once. Take 341 for more details.

What are some of the advantages or disadvantages of Ruby vs. Python?

Python is more widely used. Python uses the interesting indentation syntax (some people like it, some don't). Ruby tends to overload operators. Objects and functional programming seem more central to Ruby. We'll talk about other issues when we read the chapter on Ruby.

I understand that regular expressions have made searching much, much easier. However, I'm not quite clear on how we compute the amount of time it will take to run something - can we go over that a little?


I'm slightly confused on how the hash is loaded on pg.51 from lines 8 to 12. What does s[2].intern imply and what is the if-else conditional saying since the << operator doesn't seem to act like I'm used to (probably just my lack of ruby knowledge)?

Line 10 (page 48):

keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }

Lines 9-12 (page 51):

if @hash[s[0]]
   @hash[s[0]] << [ s[1], article ]
   @hash[s[0]] = [ s[1], article ]

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Tue Feb 9 09:52:17 2010.
The source to the document was last modified on Tue Feb 9 09:52:14 2010.
This document may be found at

You may wish to validate this document's HTML ; Valid CSS! ; Creative Commons License

Samuel A. Rebelsky,

Copyright © 2010 Samuel A. Rebelsky. This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.