CSC152 2006S, Class 55: About Exam 3 Admin: * Exam 3 returned. * Attendance on Friday is required. Overview: * Grading. * Final. * Problem 2: Text Analysis. * Problem 3: Median. * Problem 1: Removing from Binary Search Trees. * Problem 4: Genetic Matching. Problem 2: Text Analysis * Problem: Given: * A dictionary, dict, that maps words (strings) to counters (number of occurences) * A vector, unique, that lists all the different words that appear * An int, total, that gives the total number of words Goal * Find the twenty most frequent words Strategy: * Find the most frequently occuring word public String mostFrequent(Hashtable dict, Vector unique) { // Pick the first element as your guess int indexOfGuess = 0; String guess = unique.get(0); int frequencyOfGuess = dict.get(guess).get(); // Scan through the remaining elements, updating your // guess if they are better for (int i = 1; i < unique.size(); i++) { String nextGuess = unique.get(i); int frequencyOfNext = dict.get(nextGuess).get(); if (frequencyOfNext > frequencyOfGuess) { guess = nextGuess; indexOfGuess = i; frequencyOfGuess = frequencyOfNext; // unique.remove(indexOfGuess); // NO } } // Remove the most frequent one unique.remove(indexOfGuess); // YES return guess; } // mostFrequent for (int i = 0; i < 20; i++) { String tmp = mostFrequent(unique, dict); frequent[i] = new WordFrequency(tmp, dict.get(tmp).get() / total); } Problem: Next time, we'll get the same value Solution: Remove it from the vector --- Median - ithSmallest (something with i smaller values) public static T ithSmallest(Vector vec, int i, Comparator c) { Random r = new Random(); Vector smaller = new Vector(); Vector larger = new Vector(); // Randomly pick a pivot T pivot = vec.get(r.nextInt(vec.size())): // Build a vector of smaller values // Build a vector of larger value for (int j = 0; j < vec.size(); j++) { if (c.compare(vec.get(j), pivot) > 0) { larger.add(vec.get(j)); } else if (c.compare(v.get(j), pivot) < 0) { smaller.add(vec.get(j)); } } // Recurse, as appropriate // If the smaller vector has exactly i values, done if (smaller.size() == i) { return pivot; } // If the smaller vector has > i values, recurse on smaller else if (smaller.size() > i) { return ithSmallest(smaller, i, c); } // Otherwise, recurse on larger else { return ithSmallest(larger, i - smaller.size() - 1, c); } } // ithSmallest(Vector,Comparator) How do we tell if it's O(n)? * Gather data For a variety of sizes Find the best, worst, average running time Plot or Compute best/size, worst/size, average/size Welcome to the (word not to be used in 152) of binary search trees * Problem: Remove the key/value pair for a particular key * Strategy: * Find the node with that key BSTNode removeMe = findNode(top, key); * Find the leftmost child in the right subtree BSTNode leftMost = findLeftMost(top.right); * Copy the key and value from that child removeMe.key = leftMost.key; removeMe.value = leftMost.value; * Delete leftMost // deleteLeftMost(top.right); if (top.right == leftMost) { top.right = leftMost.right; } else { BSTNode parentOfLeftMost = findParentOfLeftMost(top.right); parentOfLeftMost.left = leftMost.right; } BSTNode findNode(BSTNode top, K key) { ... } BSTNode findParentOfLeftMost(BSTNode top) { while (top.left.left != null) { top = top.left; } } BSTNode findLeftMost(BSTNode top) { while (top.left != null) { top = top.left; } } BSTNode findLeftMost(BSTNode top) { if (top.left == null) { return top; } else { return findLeftMost(top.left); } }