CMSC 206 (Data Structures)
Assignment#2
Due: March 1, 2012
11:59:59pm
The goal of this assignment is
to
implement and use the LinkedList ADT in a real
application.
You are STRONGLY encouraged to
work with one
partner on this assignment. If you do work with a
partner, you
should submit only ONE assignment for both people.
Introduction
A spell checker is a common application used in conjunction
with
document preparation. Since many documents focus on one
subject, many
words in a document are often repeated. Hence, it makes sense
that
checking the spelling of repeated words should be made as fast
as
possible. In this project you will implement a rudimentary
spell
checker that provides for faster access to frequently used
words. The
spell checker will be implemented as a List of Lists.
We will implement two variations of the linked list ADT by
deriving two new kinds of linked lists using inheritance. This
will
require you to implement your own linked list ADT and create
the
derived list implementations. One new list is called a
most-recently
used (MRU) list. Every time an element of
an MRU List is "found" it is moved to the front of the list.
In this
way, frequently accessed data is always close to the front of
the list,
so subsequent "finds" for that data are quicker. New words are
inserted
at the front of the MRU List. For our spell-checker this means
that
repeated words in a document are checked more quickly.
The other new Linked List is the "Sorted Linked List" (SLL). A
SLL
keeps its data sorted at all times by inserting new elements
in the
appropriate place in the list.
Assignment 2 will be invoked with multiple command line
arguments. The
first argument is the name of the text file which will be
checked for
spelling errors. The second argument is the name of the
dictionary
file. The third command line argument is the number of words
to print
for each letter on the command line. The remaining argument(s)
are
single lower-case characters [a-z] for which recently used
word lists are printed.
I have already provided a substantial amount of code to get
you started
as well as skeletons for all the classes you'll need to
implement. You can find this code in your dropbox assignment2
submission folder. Just replace the version I placed
there with
your own.
Spelling
Rules
These spelling rules are already implemented for you in the
removePunctuation() functions in Assignment2.java.
However, to
ensure a thorough understanding of how the program works and to
help
with debugging, you should read them over several times.
- A word in the text file is defined to be a sequence of
characters
surrounded by whitespace. Any sequence of characters
containing
something other than the letters A-Z and a-z should be
considered
misspelled, but see below concerning punctuation characters.
For
example, in the string "This.example shows many_things."
there are 3
words:
"This.example", "shows", and "many_things.".
- All words should be considered to be case-sensitive.
E.g.
if you find the word monday in the text
file and that
word is not found in the dictionary, then you have a
misspelled word
since Monday would be in the
dictionary.
- Punctuation is removed from words using the following
rules:
- All
punctuation characters tailing a word should be removed.
Punctuation characters are defined as periods, exclamation
points,
question marks, semi-colons, and commas.
- Hyphens
are the only punctuation that can be removed from the
middle of a word
(i.e. word-word).
- Quotation marks and parenthesis must be removed from
the beginning and end of words.
- For example, in the sentence "Hi,
you're
looking nice today?", you would remove the
quotes,
comma, and the question mark and search the dictionary for
the words Hi, you're, looking, nice and today.
Abstract
Data
Types
To complete the assignment, you will need to implement the
first three abstract data types (ADTs). I strongly
suggest you complete
them in the order they are presented.
Four other ADTs are provided for you in their entirety:
Dictionary, WordList, MisspelledWordList and
CountedWord. You may use the classes
as-is; there is no need to implement anything for them.
The Linked List ADT
You must complete the provided Linked List skeleton, including
implementing LinkedList, ListNode, and ListIterator
classes. This
can be a single or doubly linked list, and should use the List
interface.
The MRU List ADT
The Most Recently Used (MRU) List is derived from LinkedList.
It
provides no other functionality. It differs from a LinkedList
in just
one way -- elements which are successfully found are moved to
the front
of the list. The list is unchanged in the event of an
unsuccessful find
operation. Since an MRU list is inherently NOT sorted, new
elements can
be inserted anywhere in the MRU list.
The Sorted List ADT
The Sorted List is derived from LinkedList. It provides no
other
functionality. It differs from a LinkedList in just one way --
the
elements are always in ascending sorted order.
The Dictionary ADT [Provided]
The dictionary contains Strings which represent the complete
set of
correctly spelled words. To facilitate fast look-up, these
words are
stored based on their first letter in an MRU list for each
letter of
the alphabet. The Dictionary is essentially a SortedList
of 26
MRU word lists. Note that it does not matter whether the
first
letter of each word is upper or lowercase.
The Dictionary ADT defines the following operations
- void insert( String word ) -- adds the
specified word
to the dictionary.
- boolean contains( String word ) -- searches
the dictionary
for the specified word.
- String printMRU( char letter, int n ) -- returns a
String of the n
most recently used words in the
dictionary that begin with the specified letter, formatted
appropriately for output. If less than n words start with
the specified
letter, then all words beginning with that letter are
printed. The
words must be printed
in a neat columnar format, 4 words per row.
Each column must be separated by a minimum of 2
whitespaces. The width
of the columns will depend on the length of the longest
word to be
printed. Note
that this
operation is case-INsensitive. The parameter
passed to the function may be upper- or lower-case. In
either event,
words that begin with the same upper- or lower-case letter
must be
printed.
For example, printMRU('a',
10) and printMRU('A',
10) BOTH
print words which
begin with 'a' OR 'A'.
- A default constructor which creates an empty dictionary.
Word List ADT [Provided]
The WordList associates some
information with the MRU list for each letter of the alphabet.
The
WordList ADT encapsulates this information and becomes an
adapter for
the MRU list. The WordList ADT defines the following
operations:
- appropriate constructor(s)
- appropriate comparison and equals methods
- an appropriate toString() function
- void insert( String word ) -- inserts
the specified
word into the WordList
- boolean contains( String word ) -- finds the
specified word
in the WordList
Misspelled Word List ADT [Provided]
The spell checker must maintain a list of misspelled words
along with
the
number of times each misspelled word appears. To do so, we've
provided
the MisspelledList ADT which defines the following operations:
- appropriate constructor(s)
- void insert( String word ) -- inserts
the misspelled
word into the MisspelledList. If the word is already in
the list, its
count is incremented.
- String toString( ) -- prints the
Misspelled
list in alphabetical order.
CountedWord ADT [Provided]
Since the printed list of misspelled words also includes the
number of
times each misspelled word appears, good object-oriented
design
dictates that these data items be stored together in a simple
object
which we are calling a CountedWord.
The CountedWord ADT defines the following operations
- appropriates constructor(s) that takes in the word as a
String
and initializes its count at 1.
- void increment( ) -- increments
the word's
counter
- appropriate comparison operators
- String toString( ) that outputs
the word and
and count in the format "<word> : <count>"
(without the
quotes).
Requirements
- ALL LISTS used
in
this assignment must
be linked lists. This
means
NO
JAVA ARRAYLISTS, LINKED LIST, ARRAYS are permitted.
- The MRU list and Sorted Linked List (SLL) must be
implemented using inheritance. They are both derived from
the linked
list class.
- All possible exceptions discussed pertaining to a
linked list must be caught and dealt with appropriately.
Your program
should continue processing after an exception if at all
possible.
- You
may include any private data and private
functions to any class.
- Be prepared for your code to work on large data
sets.
- You must bulletproof your code as much as
possible, e.g. making sure that the command line
parameters are all
valid, that the files named in it all exist, and that
there is at least
one report command. In case of invalid report parameters,
your program
should print them and a message indicating that the
command is in error.
- Only
the Dictionary and MisspelledList classes
are instantiated in main() or in functions called from
main().
- All
classes should be placed in the package
assignment2.
- You
may modify any of the provided code however you
like, providing that it
meets the project requirements.
The main() function and spell checking application is already
provided
for you in Assignment2.java. Do not rename the
Assignment2
class. For your understanding and to help debugging, you
need to
understand the steps that the Assignment2.main() follows for
the
application:
- The program reads in the following command line
arguments in
order (see the Sample Output section for an example):
- The
name of the file containing the text.
- The
name of the file containing the dictionary.
- An
integer n, specifying the number of words to print
from each MRU list.
(optional, but must be paired with item 4)
- One
or more letters for the MRU lists to print.
(optional, but must be
paired with item 3)
- Reads the file of words and puts them into the
dictionary. This
file will
include one word per line. There is no need
to remove
punctuation from this file, as it will have already been
removed.
- Reads the text file and checks each word for
proper spelling according to the spelling rules above.
To do this, it removes punctuation as described above and
then
check to see if the word is in the dictionary. If it
is not, then
it
is misspelled. Misspelled words will then
be inserted into the misspelled word list.
- Prints the list of misspelled words and the number of
times
each misspelled word appeared. This list is printed in
alphabetical
order.
The words must be printed in a neat columnar format, 4
words per row.
The format is shown in the sample output. For formatting
purposes only,
we assume that no misspelled word will appear more than 99
times.
- For each letter found on the command line, prints the
first n (taken
from the command line)
from that letter's MRU list in the dictionary.