CSCI 340 Assignment 5


Purpose

This assignment includes a review of C++ template classes, the STL list class, the concept of function objects, and the hash table data structure.

Assignment

Implement a hash table as a C++ template class, using the division method for the hash function and separate chaining to resolve collisions. Then write a program that uses your hash table class to implement a spell-checking dictionary.

Hash Table Implementation

You will need to write the header files for two classes: The StringHashKey class is a function object that converts a C++ string into an integer key, and the HashTable class implements a hash table using the division method and separate chaining. The data for the hash table will be stored in a STL vector of STL lists.

class StringHashKey

The StringHashKey class is used to create a function object that converts a string key to an integer key. This class has no data members and a single public method, the overloaded operator (). The method will take a reference to a constant string as its single parameter and return an integer. The method should convert the string parameter into an integer. There are a number of possible ways to accomplish this; the logic for one technique is outlined in the pseudocode below:

set an integer sum to 0
for each character in the string
   multiply the sum times 8, then add the value of the current string character
   if the result is greater than 2**27, calculate the remainder mod 2**27

class HashTable

The HashTable class implements a hash table. This will be a template class that takes two parameters. The first parameter is the data type of the items to be stored in the hash table. The second parameter is a function object to convert an item of the type being stored into an integer key that can be hashed. The definition for the HashTable class should be placed in the HashTable.h header file. Because this is a template class, the code for the methods of the class should also be placed in the header file.

A HashTable contains four data members. The first is an integer that contains the number of pointers to linked lists in the hash table; these pointers represent the buckets of the hash table. The second is an integer that contains the number of data items currently stored in the hash table. The third is a function object of the type of the second template parameter which will convert a data item to an integer hash key. The fourth data member is the vector of linked-list buckets, which should be declared as follows:

   vector <list<T> > buckets;

where T is the first template class parameter. Note the space between the two > symbols - it is required. Failure to include it will cause a syntax error.

HashTable constructor

The constructor for the class should take a single parameter, an integer specifying the number of buckets. The constructor should set the number of buckets to this parameter, set the size to 0, and create the vector of buckets. You may need to use the member initialization syntax for the latter.

Even though this class involves dynamic memory, you do not need to write a copy constructor, destructor, or overloaded assignment operator for it because all of the dynamically-allocated storage for the hash table is managed by the embedded STL container classes, which already have the appropriate methods implemented for them.

HashTable clear() method

This method returns nothing and takes no parameters. It clears the hash table, releasing all of the dynamic memory associated with it. The method should reset the size to 0 and call the clear() method for the vector of buckets.

HashTable size() method

This method takes no parameters. It returns the number of data items currently stored in the hash table.

HashTable bucketsUsed() method

This method takes no parameters. It returns the number of buckets currently in use in the hash table, i.e., the number of non-empty lists.

HashTable maxLength() method

This method takes no parameters. It returns the maximum length of any list currently in the hash table.

HashTable empty() method

This method takes no parameters. It returns true if the hash table contains no data items; otherwise it returns false.

HashTable insert() method

This method returns a Boolean value and takes a single parameter, a reference to a constant item of the first template parameter type (the item to insert). The method should "call" the hash key function object, passing it the item to insert. The integer returned by the function object should be divided by the number of buckets; the remainder is the hash value, which can be used as an index to a particular bucket in the vector of buckets. You should perform a linear search of the linked list associated with that bucket. If the item to be inserted is already present in the linked list, return false. Otherwise, insert the item into the list for that bucket, increment the size of the hash table, and return true.

HashTable find() method

This method returns a Boolean value and takes a single parameter, a reference to a constant item of the first template parameter type (the item to find). The method does not alter the hash table data. The method should "call" the hash key function object, passing it the item to insert. The integer returned by the function object should be divided by the number of buckets; the remainder is the hash value, which can be used as an index to a particular bucket in the vector of buckets. You should perform a linear search of the linked list associated with that bucket. If the item to be found is present in that bucket, return true; otherwise return false.

Program

In the second part of this assignment, you will write a main program that uses your hash table class. The main program will simulate the logic of the Linux utility ispell.

The program will take three command line parameters: the number of buckets in the hash table, the name of the dictionary file, and a file of words to spell check.

There are three possible outputs for each word:

We will provide you with a file of about 25,000 words to use as a dictionary. You should test with a much smaller dictionary first. Read the dictionary file and add the words to a hash table of the specified number of buckets. Convert each word to lower case first.

After the dictionary is loaded, print the following information about it, one per line, properly aligned:

Then read the file specified in the second argument. Convert each word to lower case, then check to see whether it is in the dictionary. If so, it should print "ok". If not, it should look for all possible near misses. If the program finds any near misses in the dictionary, it should print them in alphabetical order, four across. If not, it should print "not found". (Make sure you print the original word first!)

To search for near misses, you need to generate every possible near miss and look for it in the dictionary. Specifically:

At the end of your program, print the time needed to spell check the file.

Programming Notes

Standards

Extensions

You can improve the performance of your program by using binary search instead of linear search. Another option would be to use maps to store the contents of the buckets. The STL makes it possible to try these changes with very little programming.

Submission