# Homework 7: Introduction to Approximate String Matching

Assigned: Monday, February 13, 2006
Due: 8:00 a.m., Wednesday, February 15, 2006

Summary: In this assignment, you will write a program that asks the user for a string and then finds the best match for that string in a file of strings.

Purpose: To ground you in a problem that we will return to many times this semester. To increase your facility with repetition in Java.

Contents:

## Background

One of the more interesting problems in computing is that of approximate string matching: Given a string to match and a sequence of candidate matches, find the best match. Spell checkers regularly use such an algorithm to suggest spelling corrections. Biologists also use such an algorithm to match genetic sequences.

We will consider this problem in a number of ways this semester. That is, we will explore different ways to represent the candidate matches and we will will explore different strategies for selecting among those matches.

In this first exploration, you have the freedom to choose how to compare strings.

## The Assignment

Write a file that contains a list of a dozen or so names. (More is better, but a dozen or so should suffice.) You can call the file whatever you'd like, but `names.txt` is a good idea.

Write a Java program that prompts the user for a name and finds the best match in the file for that name. A good match might have only one letter different, or one letter deleted, or one letter added. A less good match might have a combination of those.

You may choose your own metric for best, provided it decides that an exact match (if one exists) is the best.

## Hints

You are likely to need at least two loops: One to loop through the strings in the file, and one to loop through the string and the candidate string comparing characters.

You will probably want to write a subroutine, `qualityOfMatch(String a, String b)` that assigns a number to the quality of the match between `a` and `b`. One of the loops will be in that subroutine.

When you are satisfied with your work, you should email me the method you have written.

## Extra Credit

Those of you who want to get extra credit on this assignment might consider writing particularly nice comparison metrics or provide an ordered list of candidates (from best to worst).

## Questions

Can we hard-code the name of the file that contains the strings to match, or should we prompt the user for that name?
It is much easier to hard-code the name of the file, and you are certainly welcome to do so. You may, however, choose whichever method for getting the file name that you prefer.
Should the strings in the file be just first names (e.g., `Samuel`) or first and last names (e.g., `Samuel Rebelsky`). If the latter, what form should they take?
You may choose the form of the names.

## History

Monday, 13 February 2006 [Samuel A. Rebelsky]

• Created.

Tuesday, 14 February 2006 [Samuel A. Rebelsky]

This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2006S/Homework/hw.07.html`.