CMSC 206 (Data Structures)

Assignment#4

Due:  October 24, 2012 11:59:59pm

The goal of this assignment is to implement and use the LinkedList ADT in a real application. 

You are STRONGLY encouraged to work with one partner on this assignment.  If you do work with a partner, you should submit only ONE assignment for both people.


Introduction

A spell checker is a common application used in conjunction with document preparation. Since many documents focus on one subject, many words in a document are often repeated. Hence, it makes sense that checking the spelling of repeated words should be made as fast as possible. In this project you will implement a rudimentary spell checker that provides for faster access to frequently used words. The spell checker will be implemented as a List of Lists.

We will implement two variations of the linked list ADT by deriving two new kinds of linked lists using inheritance. This will require you to implement your own linked list ADT and create the derived list implementations. One new list is called a most-recently used (MRU) list. Every time an element of an MRU List is "found" it is moved to the front of the list. In this way, frequently accessed data is always close to the front of the list, so subsequent "finds" for that data are quicker. New words are inserted at the front of the MRU List. For our spell-checker this means that repeated words in a document are checked more quickly.

The other new Linked List is the "Sorted Linked List" (SLL). A SLL keeps its data sorted at all times by inserting new elements in the appropriate place in the list.

Assignment 4 will be invoked with multiple command line arguments. The first argument is the name of the text file which will be checked for spelling errors. The second argument is the name of the dictionary file. The third command line argument is the number of words to print for each letter on the command line. The remaining argument(s) are single lower-case characters [a-z] for which recently used word lists are printed.

I have already provided a substantial amount of code to get you started as well as skeletons for all the classes you'll need to implement.  You can find this code at /home/eeaton/public/cs206/Assignment04.zip on the Bryn Mawr CS linux systems.  Just copy it to your own folder and unzip it:
    cp /home/eeaton/public/cs206/Assignment04.zip ~
    unzip ~/Assignment04.zip

If you are working remotely on your own laptop, try:
    scp USERNAME@powerpuff.brynmawr.edu:/home/eeaton/public/cs206/Assignment04.zip .
    unzip Assignment04.zip


Spelling Rules
These spelling rules are already implemented for you in the removePunctuation() functions in Assignment4.java.  However, to ensure a thorough understanding of how the program works and to help with debugging, you should read them over several times.
  1. A word in the text file is defined to be a sequence of characters surrounded by whitespace. Any sequence of characters containing something other than the letters A-Z and a-z should be considered misspelled, but see below concerning punctuation characters. For example, in the string "This.example shows many_things." there are 3 words: "This.example", "shows", and "many_things.".
  2. All words should be considered to be case-sensitive.  E.g. if you find the word monday in the text file and that word is not found in the dictionary, then you have a misspelled word since Monday would be in the dictionary.
  3. Punctuation is removed from words using the following rules:
    • All punctuation characters tailing a word should be removed. Punctuation characters are defined as periods, exclamation points, question marks, semi-colons, and commas.
    • Hyphens are the only punctuation that can be removed from the middle of a word (i.e. word-word).
    • Quotation marks and parenthesis must be removed from the beginning and end of words.
    • For example, in the sentence "Hi, you're looking nice today?", you would remove the quotes, comma, and the question mark and search the dictionary for the words Hi, you're, looking, nice and today.


Abstract Data Types

To complete the assignment, you will need to implement the first three abstract data types (ADTs).  I strongly suggest you complete them in the order they are presented.

Four other ADTs are provided for you in their entirety:  Dictionary, WordList, MisspelledWordList and CountedWord.  You may use the classes as-is; there is no need to implement anything for them.

The Linked List ADT

You must complete the provided Linked List skeleton, including implementing LinkedList, ListNode, and ListIterator classes.  This can be a single or doubly linked list, and should use the List interface.

The MRU List ADT

The Most Recently Used (MRU) List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- elements which are successfully found are moved to the front of the list. The list is unchanged in the event of an unsuccessful find operation. Since an MRU list is inherently NOT sorted, new elements can be inserted anywhere in the MRU list.

The Sorted List ADT

The Sorted List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- the elements are always in ascending sorted order.

The Dictionary ADT [Provided]

The dictionary contains Strings which represent the complete set of correctly spelled words. To facilitate fast look-up, these words are stored based on their first letter in an MRU list for each letter of the alphabet.  The Dictionary is essentially a SortedList of 26 MRU word lists.  Note that it does not matter whether the first letter of each word is upper or lowercase.

The Dictionary ADT defines the following operations

  1. void insert( String word ) -- adds the specified word to the dictionary.
  2. boolean contains( String word ) -- searches the dictionary for the specified word.
  3. String printMRU( char letter, int n ) -- returns a String of the n most recently used words in the dictionary that begin with the specified letter, formatted appropriately for output. If less than n words start with the specified letter, then all words beginning with that letter are printed.  The words must be printed in a neat columnar format, 4 words per row. Each column must be separated by a minimum of 2 whitespaces. The width of the columns will depend on the length of the longest word to be printed.   Note that this operation is case-INsensitive. The parameter passed to the function may be upper- or lower-case. In either event, words that begin with the same upper- or lower-case letter must be printed.

    For example, printMRU('a', 10) and printMRU('A', 10) BOTH print words which begin with 'a' OR 'A'.

  4. A default constructor which creates an empty dictionary.

Word List ADT [Provided]

The WordList associates some information with the MRU list for each letter of the alphabet. The WordList ADT encapsulates this information and becomes an adapter for the MRU list. The WordList ADT defines the following operations:
  1. appropriate constructor(s)
  2. appropriate comparison and equals methods
  3. an appropriate toString() function
  4. void insert( String word ) -- inserts the specified word into the WordList
  5. boolean contains( String word ) -- finds the specified word in the WordList

Misspelled Word List ADT   [Provided]

The spell checker must maintain a list of misspelled words along with the number of times each misspelled word appears. To do so, we've provided the MisspelledList ADT which defines the following operations:
  1. appropriate constructor(s)
  2. void insert( String word ) -- inserts the misspelled word into the MisspelledList. If the word is already in the list, its count is incremented.
  3. String toString( ) -- prints the Misspelled list in alphabetical order.

CountedWord ADT  [Provided]

Since the printed list of misspelled words also includes the number of times each misspelled word appears, good object-oriented design dictates that these data items be stored together in a simple object which we are calling a CountedWord.

The CountedWord ADT defines the following operations

  1. appropriates constructor(s) that takes in the word as a String and initializes its count at 1.
  2. void increment( ) -- increments the word's counter
  3. appropriate comparison operators
  4. String toString( ) that outputs the word and and count in the format "<word> : <count>" (without the quotes).


Requirements
  1. ALL LISTS used in this assignment must be linked lists. This means NO JAVA ARRAYLISTS, LINKED LIST, ARRAYS are permitted.
  2. The MRU list and Sorted Linked List (SLL) must be implemented using inheritance. They are both derived from the linked list class.
  3. All possible exceptions discussed pertaining to a linked list must be caught and dealt with appropriately. Your program should continue processing after an exception if at all possible.
  4. You may include any private data and private functions to any class.
  5. Be prepared for your code to work on large data sets.
  6. You must bulletproof your code as much as possible, e.g. making sure that the command line parameters are all valid, that the files named in it all exist, and that there is at least one report command. In case of invalid report parameters, your program should print them and a message indicating that the command is in error.
  7. Only the Dictionary and MisspelledList classes are instantiated in main() or in functions called from main().
  8. All classes should be placed in the package assignment4.
  9. You may modify any of the provided code however you like, providing that it meets the project requirements.
The main() function and spell checking application is already provided for you in Assignment4.java.  Do not rename the Assignment4 class.  For your understanding and to help debugging, you need to understand the steps that the Assignment4.main() follows for the application:
  1. The program reads in the following command line arguments in order (see the Sample Output section for an example):
    1. The name of the file containing the text.
    2. The name of the file containing the dictionary.
    3. An integer n, specifying the number of words to print from each MRU list. (optional, but must be paired with item 4)
    4. One or more letters for the MRU lists to print. (optional, but must be paired with item 3)
  2. Reads the file of words and puts them into the dictionary.  This file will include one word per line.  There is no need to remove punctuation from this file, as it will have already been removed.
  3. Reads the text file and checks each word for proper spelling according to the spelling rules above. To do this, it removes punctuation as described above and then check to see if the word is in the dictionary.  If it is not, then it is misspelled.  Misspelled words will then be inserted into the misspelled word list.
  4. Prints the list of misspelled words and the number of times each misspelled word appeared. This list is printed in alphabetical order. 
    The words must be printed in a neat columnar format, 4 words per row. The format is shown in the sample output. For formatting purposes only, we assume that no misspelled word will appear more than 99 times.
  5. For each letter found on the command line, prints the first n (taken from the command line) from that letter's MRU list in the dictionary.

Hints

  1. You may base your linked list implementation on any material from the book or class notes.  However, you may not use the linked list class included in java.util.
  2. Lists are generic in type. This separates the list implementation from the data it contains.  So your implementation must use Java generics (i.e. class LinkedList<T> where T is the generic type provided at declaration and initialization.)
  3. This is not an easy project. Start early and develop it incrementally, at each point ensuring that you have working code.
  4. Use main() functions within an ADT's class definition to test that it works.


Sample Output

A demo of the spell checking program is provided in CS206Assignment4Demo.jar and can be run by java -jar
CS206Assignment4Demo.jar.

The output below is provided for formatting purposes only. Assignment4.main() already outputs the data in an appropriate format.  Your output may vary slightly, but must conform to the output format found elsewhere in this description. The output must be column aligned for each letter. The length of the longest word starting with each letter is used to determine the column width.

>  java assignment4.Assignment4 textfile.txt dictionary.dat 15 f k a

Spell checking textfile.text using dictionary.dat

There were 10 misspelled words
------------------------------
        aren't : 1       brokin :10      brothar : 1       cuddel : 3      
        freind : 2        mooth : 4       times2 : 1        verry : 1

The first 15 words beginning with 'f'
-------------------------------------
        fireman         first         front        flight    
          fudge           fun        famous       finance    
        frankly        future          fast        fickle    

The first 15 words that begin with 'k'
--------------------------------------
          kettle          kitten       ketchup           Karl
         Kringle          kludge      kangaroo         karate
       kidnapper         kidneys        knight        knocker

The first 15 words that begin with 'a'
--------------------------------------
        aspen    asteroid      asthma       ankle    athlete
        atlas      attack        atom       attic     August
    Australia      author   autograph  automobile      avoid




Submitting the Assignment

Copy all your .java files and compiled class files for the project into a folder called Lastname(s)-Assignment4 in dropbox before the deadlines.  You need submit only one copy per team.  You should also submit a README file that lists the names of both team members and any special instructions or features of your assignment.

Make certain that your assignment will run when the command java assignment4.Assignment4 is called from the terminal.

Compress your code and submit it via Dropbox, as detailed in the assignment submission instructions.


The person who does NOT submit the project should include a simple text file in their dropbox folder called README that states the name of the person they teamed with for this assignment.




Acknowledgement:  This assignment was adapted from a project provided by Bergeron, Peng and Edelman.  All code was written by Eric Eaton.