assignment
sir
sirs
signer
sign
assign
assignments
signer
assigns

So, the dataset consists of words, one word per line. You may assume that every word is in lower case and that every line has a word.
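Reading this format takes only a line-by-line loop. Below is a minimal sketch. The class name `WordReader` and method `readWords` are hypothetical and not part of the assignment. The method accepts any `Reader`, so the same code works on the real dataset (for example `new FileReader(path)`, where the path is an assumption) and on the sample above held in a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads a dataset with one lowercase word per line. */
public class WordReader {
    // Returns the words in file order, trimming stray whitespace.
    public static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // defensive; the spec says every line has a word
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // The sample dataset from the assignment, one word per line.
        String sample = "assignment\nsir\nsirs\nsigner\nsign\nassign\nassignments\nsigner\nassigns\n";
        List<String> words = readWords(new StringReader(sample));
        System.out.println(words.size() + " words: " + words);
    }
}
```

Reading through a `Reader` keeps the input logic testable without touching the file system.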
The first task is to get and use the code Stemmer.java from /home/gtowell/cs206/a6/. Figure out how to use this code to get the "stemmed" form of each word. (The stemmed form is the word with any suffixes removed.) You might discover that the documentation of this code, while long, is woefully insufficient to make the code easy to use. (So all of you who are handing in poorly documented assignments ....) Note that this stemmer is very aggressive and occasionally wrong, so it will remove some things that you would not call suffixes. Do not worry about this; your task is only to use this code, not to correct it.
Once you are able to stem words, build a data structure that holds, for each stemmed word: the stem, all of its unstemmed variants, and the total number of times any unstemmed variant of that stem appears in the dataset.
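The structure described above can be one map from stem to an entry that holds a count and a set of variants. Below is a minimal sketch of one possible design, not the required one. The names `StemIndex`, `StemEntry`, and `record` are hypothetical.

- The stemmer is passed in as a `UnaryOperator<String>`, so the class does not depend on the undocumented interface of Stemmer.java.
- The demo replaces Stemmer.java with a lookup table built from the stems in the example output below.
- If your copy of Stemmer.java is Martin Porter's widely circulated Java version, an adapter would feed each character to `add`, call `stem()`, and read the result with `toString()`. Check this against the actual file.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Hypothetical index from each stem to its count and distinct variants. */
public class StemIndex {
    /** Everything recorded for a single stem. */
    static class StemEntry {
        int count = 0;                                // occurrences of any variant
        final Set<String> variants = new TreeSet<>(); // distinct unstemmed forms
    }

    // A TreeMap keeps the stems in alphabetical order for printing later.
    private final Map<String, StemEntry> entries = new TreeMap<>();
    private final UnaryOperator<String> stemmer;

    public StemIndex(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    /** Records one occurrence of an unstemmed word from the dataset. */
    public void record(String word) {
        String stem = stemmer.apply(word);
        StemEntry entry = entries.computeIfAbsent(stem, s -> new StemEntry());
        entry.count++;
        entry.variants.add(word); // a Set ignores repeated variants
    }

    public Map<String, StemEntry> entries() {
        return entries;
    }

    public static void main(String[] args) {
        // Stand-in for Stemmer.java: stems taken from the assignment's example output.
        Map<String, String> exampleStems = Map.of(
            "assign", "assign", "assigns", "assign",
            "assignment", "assign", "assignments", "assign",
            "sign", "sign", "signer", "sign",
            "sir", "sir", "sirs", "sir");
        StemIndex index = new StemIndex(exampleStems::get);
        String[] dataset = {"assignment", "sir", "sirs", "signer", "sign",
                            "assign", "assignments", "signer", "assigns"};
        for (String word : dataset) {
            index.record(word);
        }
        // Quick check of the contents; not the required output format.
        for (Map.Entry<String, StemEntry> e : index.entries().entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().count + " " + e.getValue().variants);
        }
    }
}
```

Because the count is incremented on every occurrence while the variant set stores each form only once, "signer" appearing twice adds 2 to the count of `sign` but appears once among its variants.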
After reading in all of the words, you should print out an alphabetized list of the stems, the number of times each stem occurred, and the morphological variants that appeared. For example, for the mini dataset above you might print out:
assign 4 assign assigns assignment assignments
sign 3 sign signer
sir 2 sir sirs

For the morphological variants, the order is not important.
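If stems are kept in an unordered structure such as a `HashMap`, they must be sorted before printing. Below is a minimal sketch of that step. It assumes a hypothetical layout of two maps keyed by stem, one for counts and one for variant sets. The class name `StemReport` and method `formatReport` are illustrative only. The method produces one line per stem in the format shown above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical formatter for the "stem count variant variant ..." report. */
public class StemReport {
    // Assumes both maps have the same keys (one entry per stem).
    public static List<String> formatReport(Map<String, Integer> counts,
                                            Map<String, Set<String>> variants) {
        List<String> stems = new ArrayList<>(counts.keySet());
        Collections.sort(stems); // HashMap iteration order is arbitrary, so sort explicitly
        List<String> lines = new ArrayList<>();
        for (String stem : stems) {
            StringBuilder line = new StringBuilder(stem).append(' ').append(counts.get(stem));
            for (String variant : variants.get(stem)) {
                line.append(' ').append(variant); // variant order is not important
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // The example's totals, entered by hand.
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Set<String>> variants = new HashMap<>();
        counts.put("sir", 2);
        variants.put("sir", new LinkedHashSet<>(List.of("sir", "sirs")));
        counts.put("assign", 4);
        variants.put("assign", new LinkedHashSet<>(List.of("assign", "assigns", "assignment", "assignments")));
        counts.put("sign", 3);
        variants.put("sign", new LinkedHashSet<>(List.of("sign", "signer")));
        for (String line : formatReport(counts, variants)) {
            System.out.println(line);
        }
    }
}
```

If the stems are already held in a `TreeMap`, iterating over its entries yields alphabetical order directly, and the explicit sort is unnecessary.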
As always, there are some constraints on your solution:
Part 2: 60 points, April 22. The completed program. The program need not follow the plan you handed in previously, but if it does not, you should explain why not.
I strongly recommend that you think hard about this assignment before writ