CS 206 Assignment 4

CMSC 206 (Data Structures)

Assignment#4

Due: October 24, 2012 11:59:59pm

The goal of this assignment is to implement and use the LinkedList ADT in a real application.

You are STRONGLY encouraged to work with one partner on this assignment. If you do work with a partner, you should submit only ONE assignment for both people.

Introduction

A spell checker is a common application used in conjunction with document preparation. Since many documents focus on one subject, many words in a document are often repeated. Hence, it makes sense that checking the spelling of repeated words should be made as fast as possible. In this project you will implement a rudimentary spell checker that provides for faster access to frequently used words. The spell checker will be implemented as a List of Lists.

We will implement two variations of the linked list ADT by deriving two new kinds of linked lists using inheritance. This will require you to implement your own linked list ADT and create the derived list implementations. One new list is called a most-recently used (MRU) list. Every time an element of an MRU List is "found" it is moved to the front of the list. In this way, frequently accessed data is always close to the front of the list, so subsequent "finds" for that data are quicker. New words are inserted at the front of the MRU List. For our spell-checker this means that repeated words in a document are checked more quickly.

The other new Linked List is the "Sorted Linked List" (SLL). A SLL keeps its data sorted at all times by inserting new elements in the appropriate place in the list.

Assignment 4 will be invoked with multiple command line arguments. The first argument is the name of the text file which will be checked for spelling errors. The second argument is the name of the dictionary file. The third command line argument is the number of words to print for each letter on the command line. The remaining argument(s) are single lower-case characters [a-z] for which recently used word lists are printed.

I have already provided a substantial amount of code to get you started as well as skeletons for all the classes you'll need to implement. You can find this code at /home/eeaton/public/cs206/Assignment04.zip on the Bryn Mawr CS linux systems. Just copy it to your own folder and unzip it:

    cp
            /home/eeaton/public/cs206/Assignment04.zip ~

                unzip ~/Assignment04.zip

If you are working remotely on your own laptop, try:

    scp USERNAME@powerpuff.brynmawr.edu:/home/eeaton/public/cs206/Assignment04.zip
                  .

                      unzip Assignment04.zip

Spelling Rules

These spelling rules are already implemented for you in the removePunctuation() functions in Assignment4.java. However, to ensure a thorough understanding of how the program works and to help with debugging, you should read them over several times.

A word in the text file is defined to be a sequence of characters surrounded by whitespace. Any sequence of characters containing something other than the letters A-Z and a-z should be considered misspelled, but see below concerning punctuation characters. For example, in the string "This.example shows many_things." there are 3 words: "This.example", "shows", and "many_things.".
All words should be considered to be case-sensitive. E.g. if you find the word monday in the text file and that word is not found in the dictionary, then you have a misspelled word since Monday would be in the dictionary.
Punctuation is removed from words using the following rules:

All punctuation characters tailing a word should be removed. Punctuation characters are defined as periods, exclamation points, question marks, semi-colons, and commas.
Hyphens are the only punctuation that can be removed from the middle of a word (i.e. word-word).
Quotation marks and parenthesis must be removed from the beginning and end of words.
For example, in the sentence "Hi, you're looking nice today?", you would remove the quotes, comma, and the question mark and search the dictionary for the words Hi, you're, looking, nice and today.

Abstract Data Types

To complete the assignment, you will need to implement the first three abstract data types (ADTs). I strongly suggest you complete them in the order they are presented.

Four other ADTs are provided for you in their entirety: Dictionary, WordList, MisspelledWordList and CountedWord. You may use the classes as-is; there is no need to implement anything for them.

The Linked List ADT

You must complete the provided Linked List skeleton, including implementing LinkedList, ListNode, and ListIterator classes. This can be a single or doubly linked list, and should use the List interface.

The MRU List ADT

The Most Recently Used (MRU) List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- elements which are successfully found are moved to the front of the list. The list is unchanged in the event of an unsuccessful find operation. Since an MRU list is inherently NOT sorted, new elements can be inserted anywhere in the MRU list.

The Sorted List ADT

The Sorted List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- the elements are always in ascending sorted order.

The Dictionary ADT [Provided]

The dictionary contains Strings which represent the complete set of correctly spelled words. To facilitate fast look-up, these words are stored based on their first letter in an MRU list for each letter of the alphabet. The Dictionary is essentially a SortedList of 26 MRU word lists. Note that it does not matter whether the first letter of each word is upper or lowercase.

The Dictionary ADT defines the following operations

void insert( String word ) -- adds the specified word to the dictionary.
boolean contains( String word ) -- searches the dictionary for the specified word.
String printMRU( char letter, int n ) -- returns a String of the n most recently used words in the dictionary that begin with the specified letter, formatted appropriately for output. If less than n words start with the specified letter, then all words beginning with that letter are printed. The words must be printed in a neat columnar format, 4 words per row. Each column must be separated by a minimum of 2 whitespaces. The width of the columns will depend on the length of the longest word to be printed. Note that this operation is case-INsensitive. The parameter passed to the function may be upper- or lower-case. In either event, words that begin with the same upper- or lower-case letter must be printed.

For example, printMRU('a', 10) and printMRU('A', 10) BOTH print words which begin with 'a' OR 'A'.
A default constructor which creates an empty dictionary.

Word List ADT [Provided]

The WordList associates some information with the MRU list for each letter of the alphabet. The WordList ADT encapsulates this information and becomes an adapter for the MRU list. The WordList ADT defines the following operations:

appropriate constructor(s)
appropriate comparison and equals methods
an appropriate toString() function
void insert( String word ) -- inserts the specified word into the WordList
boolean contains( String word ) -- finds the specified word in the WordList

Misspelled Word List ADT [Provided]

The spell checker must maintain a list of misspelled words along with the number of times each misspelled word appears. To do so, we've provided the MisspelledList ADT which defines the following operations:

appropriate constructor(s)
void insert( String word ) -- inserts the misspelled word into the MisspelledList. If the word is already in the list, its count is incremented.
String toString( ) -- prints the Misspelled list in alphabetical order.

CountedWord ADT [Provided]

Since the printed list of misspelled words also includes the number of times each misspelled word appears, good object-oriented design dictates that these data items be stored together in a simple object which we are calling a CountedWord.

The CountedWord ADT defines the following operations

appropriates constructor(s) that takes in the word as a String and initializes its count at 1.
void increment( ) -- increments the word's counter
appropriate comparison operators
String toString( ) that outputs the word and and count in the format "<word> : <count>" (without the quotes).

Requirements

ALL LISTS used in this assignment must be linked lists. This means NO JAVA ARRAYLISTS, LINKED LIST, ARRAYS are permitted.
The MRU list and Sorted Linked List (SLL) must be implemented using inheritance. They are both derived from the linked list class.
All possible exceptions discussed pertaining to a linked list must be caught and dealt with appropriately. Your program should continue processing after an exception if at all possible.
You may include any private data and private functions to any class.
Be prepared for your code to work on large data sets.
You must bulletproof your code as much as possible, e.g. making sure that the command line parameters are all valid, that the files named in it all exist, and that there is at least one report command. In case of invalid report parameters, your program should print them and a message indicating that the command is in error.
Only the Dictionary and MisspelledList classes are instantiated in main() or in functions called from main().
All classes should be placed in the package assignment4.
You may modify any of the provided code however you like, providing that it meets the project requirements.

The main() function and spell checking application is already provided for you in Assignment4.java. Do not rename the Assignment4 class. For your understanding and to help debugging, you need to understand the steps that the Assignment4.main() follows for the application:

The program reads in the following command line arguments in order (see the Sample Output section for an example):

The name of the file containing the text.
The name of the file containing the dictionary.
An integer n, specifying the number of words to print from each MRU list. (optional, but must be paired with item 4)
One or more letters for the MRU lists to print. (optional, but must be paired with item 3)

Reads the file of words and puts them into the dictionary. This file will include one word per line. There is no need to remove punctuation from this file, as it will have already been removed.
Reads the text file and checks each word for proper spelling according to the spelling rules above. To do this, it removes punctuation as described above and then check to see if the word is in the dictionary. If it is not, then it is misspelled. Misspelled words will then be inserted into the misspelled word list.
Prints the list of misspelled words and the number of times each misspelled word appeared. This list is printed in alphabetical order.
The words must be printed in a neat columnar format, 4 words per row. The format is shown in the sample output. For formatting purposes only, we assume that no misspelled word will appear more than 99 times.
For each letter found on the command line, prints the first n (taken from the command line) from that letter's MRU list in the dictionary.