CMSC 206 (Data Structures)
Assignment#4
Due: October 24, 2012
11:59:59pm
The goal of this assignment is
to implement and use the LinkedList ADT in a real
application.
You are STRONGLY encouraged to
work with one partner on this assignment. If you do work
with a partner, you should submit only ONE assignment for both
people.
Introduction
A spell checker is a common application used in conjunction
with document preparation. Since many documents focus on one
subject, many words in a document are often repeated. Hence,
it makes sense that checking the spelling of repeated words
should be made as fast as possible. In this project you will
implement a rudimentary spell checker that provides for faster
access to frequently used words. The spell checker will be
implemented as a List of Lists.
We will implement two variations of the linked list ADT by
deriving two new kinds of linked lists using inheritance. This
will require you to implement your own linked list ADT and
create the derived list implementations. One new list is
called a most-recently used (MRU) list. Every time an element
of an MRU List is "found" it is moved to the front of the
list. In this way, frequently accessed data is always close to
the front of the list, so subsequent "finds" for that data are
quicker. New words are inserted at the front of the MRU List.
For our spell-checker this means that repeated words in a
document are checked more quickly.
The other new Linked List is the "Sorted Linked List" (SLL). A
SLL keeps its data sorted at all times by inserting new
elements in the appropriate place in the list.
Assignment 4 will be invoked with multiple command line
arguments. The first argument is the name of the text file
which will be checked for spelling errors. The second argument
is the name of the dictionary file. The third command line
argument is the number of words to print for each letter on
the command line. The remaining argument(s) are single
lower-case characters [a-z] for which recently used word lists
are printed.
I have already provided a substantial amount of code to get
you started as well as skeletons for all the classes you'll
need to implement. You can find this code at /home/eeaton/public/cs206/Assignment04.zip
on the Bryn Mawr CS linux systems. Just copy it to your
own folder and unzip it:
cp
/home/eeaton/public/cs206/Assignment04.zip ~
unzip ~/Assignment04.zip
If you are working remotely on your own laptop, try:
scp USERNAME@powerpuff.brynmawr.edu:/home/eeaton/public/cs206/Assignment04.zip
.
unzip Assignment04.zip
Spelling
Rules
These spelling rules are already implemented for you in the
removePunctuation() functions in Assignment4.java.
However, to ensure a thorough understanding of how the program
works and to help with debugging, you should read them over
several times.
- A word in the text file is defined to be a sequence of
characters surrounded by whitespace. Any sequence of
characters containing something other than the letters A-Z
and a-z should be considered misspelled, but see below
concerning punctuation characters. For example, in the
string "This.example shows many_things." there are 3 words:
"This.example", "shows", and "many_things.".
- All words should be considered to be case-sensitive.
E.g. if you find the word monday in the text
file and that word is not found in the dictionary, then you
have a misspelled word since Monday would be in the
dictionary.
- Punctuation is removed from words using the following
rules:
- All punctuation characters tailing a word should be
removed. Punctuation characters are defined as periods,
exclamation points, question marks, semi-colons, and
commas.
- Hyphens are the only punctuation that can be removed
from the middle of a word (i.e. word-word).
- Quotation marks and parenthesis must be removed from the
beginning and end of words.
- For example, in the sentence "Hi,
you're looking nice today?", you would remove
the quotes, comma, and the question mark and search the
dictionary for the words Hi, you're, looking, nice and today.
Abstract
Data Types
To complete the assignment, you will need to implement the
first three abstract data types (ADTs). I strongly
suggest you complete them in the order they are presented.
Four other ADTs are provided for you in their entirety:
Dictionary, WordList, MisspelledWordList and
CountedWord. You may use the classes as-is; there is no
need to implement anything for them.
The Linked List ADT
You must complete the provided Linked List skeleton, including
implementing LinkedList, ListNode, and ListIterator
classes. This can be a single or doubly linked list, and
should use the List interface.
The MRU List ADT
The Most Recently Used (MRU) List is derived from LinkedList.
It provides no other functionality. It differs from a
LinkedList in just one way -- elements which are successfully
found are moved to the front of the list. The list is
unchanged in the event of an unsuccessful find operation.
Since an MRU list is inherently NOT sorted, new elements can
be inserted anywhere in the MRU list.
The Sorted List ADT
The Sorted List is derived from LinkedList. It provides no
other functionality. It differs from a LinkedList in just one
way -- the elements are always in ascending sorted order.
The Dictionary ADT [Provided]
The dictionary contains Strings which represent the complete
set of correctly spelled words. To facilitate fast look-up,
these words are stored based on their first letter in an MRU
list for each letter of the alphabet. The Dictionary is
essentially a SortedList of 26 MRU word lists. Note that
it does not matter whether the first letter of each word is
upper or lowercase.
The Dictionary ADT defines the following operations
- void insert( String word ) -- adds the
specified word to the dictionary.
- boolean contains( String word ) -- searches
the dictionary for the specified word.
- String printMRU( char letter, int n ) -- returns a
String of the n
most recently used words in the dictionary that begin with
the specified letter, formatted appropriately for output.
If less than n
words start with the specified letter, then all words
beginning with that letter are printed. The words
must be printed in a neat columnar format, 4 words per
row. Each column must be separated by a minimum of 2
whitespaces. The width of the columns will depend on the
length of the longest word to be printed. Note
that this operation is case-INsensitive. The parameter
passed to the function may be upper- or lower-case. In
either event, words that begin with the same upper- or
lower-case letter must be printed.
For example, printMRU('a',
10) and printMRU('A',
10) BOTH
print words which begin with 'a' OR 'A'.
- A default constructor which creates an empty dictionary.
Word List ADT [Provided]
The WordList associates some information with the MRU list for
each letter of the alphabet. The WordList ADT encapsulates
this information and becomes an adapter for the MRU list. The
WordList ADT defines the following operations:
- appropriate constructor(s)
- appropriate comparison and equals methods
- an appropriate toString() function
- void insert( String word ) -- inserts
the specified word into the WordList
- boolean contains( String word ) -- finds the
specified word in the WordList
Misspelled Word List ADT [Provided]
The spell checker must maintain a list of misspelled words
along with the number of times each misspelled word appears.
To do so, we've provided the MisspelledList ADT which defines
the following operations:
- appropriate constructor(s)
- void insert( String word ) -- inserts
the misspelled word into the MisspelledList. If the word
is already in the list, its count is incremented.
- String toString( ) -- prints the
Misspelled list in alphabetical order.
CountedWord ADT [Provided]
Since the printed list of misspelled words also includes the
number of times each misspelled word appears, good
object-oriented design dictates that these data items be
stored together in a simple object which we are calling a
CountedWord.
The CountedWord ADT defines the following operations
- appropriates constructor(s) that takes in the word as a
String and initializes its count at 1.
- void increment( ) -- increments
the word's counter
- appropriate comparison operators
- String toString( ) that outputs
the word and and count in the format "<word> :
<count>" (without the quotes).
Requirements
- ALL LISTS used
in
this assignment must be linked lists. This
means
NO JAVA ARRAYLISTS, LINKED LIST, ARRAYS are permitted.
- The MRU list and Sorted Linked List (SLL) must be
implemented using inheritance. They are both derived from
the linked list class.
- All possible exceptions discussed pertaining to a linked
list must be caught and dealt with appropriately. Your
program should continue processing after an exception if
at all possible.
- You may include
any private data and private functions to any class.
- Be prepared for your code to work on large data sets.
- You must bulletproof your code as much as possible, e.g.
making sure that the command line parameters are all
valid, that the files named in it all exist, and that
there is at least one report command. In case of invalid
report parameters, your program should print them and a
message indicating that the command is in error.
- Only the
Dictionary and MisspelledList classes are instantiated
in main() or in functions called from main().
- All classes
should be placed in the package assignment4.
- You may
modify any of the provided code however you like,
providing that it meets the project requirements.
The main() function and spell checking application is already
provided for you in Assignment4.java. Do not rename the
Assignment4 class. For your understanding and to help
debugging, you need to understand the steps that the
Assignment4.main() follows for the application:
- The program reads in the following command line
arguments in order (see the Sample Output section for an
example):
- The name of the file containing
the text.
- The name of the file containing
the dictionary.
- An integer n, specifying the
number of words to print from each MRU list.
(optional, but must be paired with item 4)
- One or more letters for the MRU
lists to print. (optional, but must be paired with
item 3)
- Reads the file of words and puts them into the
dictionary. This
file will include one word per line. There
is no need to remove punctuation from this file, as it
will have already been removed.
- Reads the text file and checks each word for proper
spelling according to the spelling rules above. To do this,
it removes punctuation as described above and then check
to see if the word is in the dictionary. If it is
not, then it is misspelled. Misspelled words will
then be inserted into the misspelled word list.
- Prints the list of misspelled words and the number of
times each misspelled word appeared. This list is printed
in alphabetical order.
The words must be printed in a neat columnar format, 4
words per row. The format is shown in the sample output.
For formatting purposes only, we assume that no misspelled
word will appear more than 99 times.
- For each letter found on the command line, prints the
first n (taken
from the command line) from that letter's MRU list in the
dictionary.