CMSC 206 (Introduction to Computing)
Assignment#2
Due: October 05, 2011
11:59:59pm
The goal of this assignment is to
implement and use the LinkedList ADT in a real application.
You are encouraged to work with one
partner on this assignment. If you do work with a partner, you
should submit only ONE assignment for both people.
Introduction
A spell checker is a common application used in conjunction with
document preparation. Since many documents focus on one subject, many
words in a document are often repeated. Hence, it makes sense that
checking the spelling of repeated words should be made as fast as
possible. In this project you will implement a rudimentary spell
checker that provides for faster access to frequently used words. The
spell checker will be implemented as a List of Lists.
We will implement two variations of the linked list ADT by
deriving two new kinds of linked lists using inheritance. This will
require you to implement your own linked list ADT and create the
derived list implementations. One new list is called a most-recently
used (MRU) list. Every time an element of
an MRU List is "found" it is moved to the front of the list. In this
way, frequently accessed data is always close to the front of the list,
so subsequent "finds" for that data are quicker. New words are inserted
at the front of the MRU List. For our spell-checker this means that
repeated words in a document are checked more quickly.
The other new Linked List is the "Sorted Linked List" (SLL). A SLL
keeps its data sorted at all times by inserting new elements in the
appropriate place in the list.
Assignment 2 will be invoked with multiple command line arguments. The
first argument is the name of the text file which will be checked for
spelling errors. The second argument is the name of the dictionary
file. The third command line argument is the number of words to print
for each letter on the command line. The remaining argument(s) are
single lower-case characters [a-z] for which recently used
word lists are printed.
I have already provided a substantial amount of code to get you started
as well as skeletons for all the classes you'll need to
implement. You can find this code in your dropbox assignment2
submission folder. Just replace the version I placed there with
your own.
Spelling
Rules
These spelling rules are already implemented for you in the
removePunctuation() functions in Assignment2.java. However, to
ensure a thorough understanding of how the program works and to help
with debugging, you should read them over several times.
- A word in the text file is defined to be a sequence of characters
surrounded by whitespace. Any sequence of characters containing
something other than the letters A-Z and a-z should be considered
misspelled, but see below concerning punctuation characters. For
example, in the string "This.example shows many_things." there are 3
words:
"This.example", "shows", and "many_things.".
- All words should be considered to be case-sensitive. E.g.
if you find the word monday in the text file and that
word is not found in the dictionary, then you have a misspelled word
since Monday would be in the dictionary.
- Punctuation is removed from words using the following rules:
- All
punctuation characters tailing a word should be removed.
Punctuation characters are defined as periods, exclamation points,
question marks, semi-colons, and commas.
- Hyphens
are the only punctuation that can be removed from the middle of a word
(i.e. word-word).
- Quotation marks and parenthesis must be removed from
the beginning and end of words.
- For example, in the sentence "Hi,
you're
looking nice today?", you would remove the quotes,
comma, and the question mark and search the dictionary for the words Hi, you're, looking, nice and today.
Abstract
Data
Types
To complete the assignment, you will need to implement the first
five abstract data types (ADTs). I strongly suggest you complete
them in the order they are presented. Due to the interdependency
between the Dictionary and the WordList, you may wish to implement
those two ADTs in parallel.
Two other ADTs are provided for you in their entirety:
MisspelledWordList and CountedWord. You may use the classes
as-is; there is no need to implement anything for them.
The Linked List ADT
You must complete the provided Linked List skeleton, including
implementing LinkedList, ListNode, and ListIterator classes. This
can be a single or doubly linked list, and should use the List
interface.
The MRU List ADT
The Most Recently Used (MRU) List is derived from LinkedList. It
provides no other functionality. It differs from a LinkedList in just
one way -- elements which are successfully found are moved to the front
of the list. The list is unchanged in the event of an unsuccessful find
operation. Since an MRU list is inherently NOT sorted, new elements can
be inserted anywhere in the MRU list.
The Sorted List ADT
The Sorted List is derived from LinkedList. It provides no other
functionality. It differs from a LinkedList in just one way -- the
elements are always in ascending sorted order.
The Dictionary ADT
The dictionary contains Strings which represent the complete set of
correctly spelled words. To facilitate fast look-up, these words are
stored based on their first letter in an MRU list for each letter of
the alphabet. The Dictionary is essentially a SortedList of 26
MRU word lists. Note that it does not matter whether the first
letter of each word is upper or lowercase.
The Dictionary ADT defines the following operations
- void insert( String word ) -- adds the specified word
to the dictionary.
- boolean contains( String word ) -- searches the dictionary
for the specified word.
- String printMRU( char letter, int n ) -- returns a String of the n most recently used words in the
dictionary that begin with the specified letter, formatted
appropriately for output. If less than n words start with the specified
letter, then all words beginning with that letter are printed. The words must be printed
in a neat columnar format, 4 words per row.
Each column must be separated by a minimum of 2 whitespaces. The width
of the columns will depend on the length of the longest word to be
printed.
You will find that some of the provided code will help you implement
this method -- look around for it! Note that this
operation is case-INsensitive. The parameter
passed to the function may be upper- or lower-case. In either event,
words that begin with the same upper- or lower-case letter must be
printed.
For example, printMRU('a',
10) and printMRU('A', 10) BOTH print words which
begin with 'a' OR 'A'.
- A default constructor which creates an empty dictionary.
Word List ADT
You will find it necessary (or at least easier) to associate some
information with the MRU list for each letter of the alphabet. The
WordList ADT encapsulates this information and becomes an adapter for
the MRU list. The WordList ADT defines the following operations:
- appropriate constructor(s)
- appropriate comparison and equals methods
- an appropriate toString() function
- void insert( String word ) -- inserts the specified
word into the WordList
- boolean contains( String word ) -- finds the specified word
in the WordList
Misspelled Word List ADT [Provided]
The spell checker must maintain a list of misspelled words along with
the
number of times each misspelled word appears. To do so, we've provided
the MisspelledList ADT which defines the following operations:
- appropriate constructor(s)
- void insert( String word ) -- inserts the misspelled
word into the MisspelledList. If the word is already in the list, its
count is incremented.
- String toString( ) -- prints the Misspelled
list in alphabetical order.
CountedWord ADT [Provided]
Since the printed list of misspelled words also includes the number of
times each misspelled word appears, good object-oriented design
dictates that these data items be stored together in a simple object
which we are calling a CountedWord.
The CountedWord ADT defines the following operations
- appropriates constructor(s) that takes in the word as a String
and initializes its count at 1.
- void increment( ) -- increments the word's
counter
- appropriate comparison operators
- String toString( ) that outputs the word and
and count in the format "<word> : <count>" (without the
quotes).
Requirements
- ALL LISTS used
in this assignment must
be linked lists. This
means
NO JAVA ARRAYLISTS, LINKED LIST, ARRAYS are permitted.
- The MRU list and Sorted Linked List (SLL) must be
implemented using inheritance. They are both derived from the linked
list class.
- All possible exceptions discussed pertaining to a
linked list must be caught and dealt with appropriately. Your program
should continue processing after an exception if at all possible.
- You
may include any private data and private
functions to any class.
- Be prepared for your code to work on large data
sets.
- You must bulletproof your code as much as
possible, e.g. making sure that the command line parameters are all
valid, that the files named in it all exist, and that there is at least
one report command. In case of invalid report parameters, your program
should print them and a message indicating that the command is in error.
- Only
the Dictionary and MisspelledList classes
are instantiated in main() or in functions called from main().
- All
classes should be placed in the package assignment2.
- You
may modify any of the provided code however you like, providing that it
meets the project requirements.
The main() function and spell checking application is already provided
for you in Assignment2.java. Do not rename the Assignment2
class. For your understanding and to help debugging, you need to
understand the steps that the Assignment2.main() follows for the
application:
- The program reads in the following command line arguments in
order (see the Sample Output section for an example):
- The
name of the file containing the text.
- The
name of the file containing the dictionary.
- An
integer n, specifying the number of words to print from each MRU list.
(optional, but must be paired with item 4)
- One
or more letters for the MRU lists to print. (optional, but must be
paired with item 3)
- Reads the file of words and puts them into the
dictionary. This file will
include one word per line. There is no need to remove
punctuation from this file, as it will have already been removed.
- Reads the text file and checks each word for
proper spelling according to the spelling rules above.
To do this, it removes punctuation as described above and then
check to see if the word is in the dictionary. If it is not, then
it
is misspelled. Misspelled words will then
be inserted into the misspelled word list.
- Prints the list of misspelled words and the number of times
each misspelled word appeared. This list is printed in alphabetical
order.
The words must be printed in a neat columnar format, 4 words per row.
The format is shown in the sample output. For formatting purposes only,
we assume that no misspelled word will appear more than 99 times.
- For each letter found on the command line, prints the first n (taken from the command line)
from that letter's MRU list in the dictionary.