Homework 7
Due:
cs113: Nov 17, prior to 11:59:59 PM
cs109: Nov 18, prior to 11:59:59 PM
Overview
More computational linguistics! We know that 'e' is the most common letter in the English language, but is it also the most repeated letter within individual words? In this assignment, you are going to take a step towards answering the question of repeats. Specifically, the question you will answer is:
For a word, how many positions N are there such that the letter at position N also appears at some position M with M > N?
For example, for the word "letter", the answer is 2 because the letter at position 1 (an 'e') also appears at position 4, and the letter at position 2 (a 't') also appears at position 3 (positions are zero-indexed). For the word "mississippi", the answer is 7: the 'i' at position 1 also appears at position 4, the 's' at position 2 also appears at position 3, the 's' at position 3 also appears at position 5, the 'i' at position 4 also appears at position 7, the 's' at position 5 also appears at position 6, the 'i' at position 7 also appears at position 10, and the 'p' at position 8 also appears at position 9. Note that for each position N, you increment the count by at most 1. So in "mississippi", when counting repeats, the fact that there are 3 'i's after position 1 is not important; it is only the fact that there are more than 0 'i's after position 1 that is important.
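The definition above translates directly into two nested loops: the outer loop visits each position N, and the inner loop looks for the same letter at some later position M. The sketch below is one possible implementation, not the required one; the class name RepeatCount and the method countRepeats are made up for illustration. The break in the inner loop is what makes each position N add at most 1 to the count. It compares characters exactly, so 'A' and 'a' count as different letters (an assumption; the assignment does not say).

```java
// Hypothetical helper: counts positions N whose letter appears again at some M > N.
class RepeatCount {

    // Returns the repetition count of word, as defined in the Overview.
    static int countRepeats(String word) {
        int count = 0;
        for (int n = 0; n < word.length(); n++) {
            char letter = word.charAt(n);
            // Look for the same letter anywhere after position n.
            for (int m = n + 1; m < word.length(); m++) {
                if (word.charAt(m) == letter) {
                    count = count + 1;
                    break; // count each position n at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRepeats("letter"));      // 2
        System.out.println(countRepeats("mississippi")); // 7
    }
}
```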
Code Understandability and Readability
You may have read the following statement previously. It still applies.
In addition to writing code that correctly implements the specification, you are also asked to write code that is easy to read and understand.
In particular, part of your score on this assignment will be determined by:
Variable naming: Variables should have meaningful names that indicate what they represent, using full English words or common abbreviations, e.g. "wins" or "votes" instead of "w" or "vot". Likewise, all variable names should begin with a lower case letter and class names should begin with a capital letter.
Appearance: The code should be formatted so that indentation and spacing make it easy to understand which parts of the code are within the bodies of if-statements and loops. Additionally, there should be spaces between variables and operators so that each individual line of code is easy to read. Hint: in VSC, you can reformat your entire program, getting the indenting to look nice, by doing the following:
- Bring up the "Command Palette" (either View menu / Command Palette, or the F1 key, or maybe Option-x).
- Type "reindent lines", or as much of that as you need to get "reindent lines" to be the top item in the list of commands.
- Hit return.
A full description of Java style conventions is available here.
Data Files
Into your working directory, copy the file alice.txt (the full text of "Alice's Adventures in Wonderland"). This file is available on Unix at
/home/gtowell/Public/CS113/HW6/alice.txt
Use scp or cp as appropriate for you (scp if you work on your own computer).
What the program should do
Nov 15 Update
The program as originally assigned is still valid. This update just makes a slightly easier option available. The full version is worth 10 points of extra credit.
Full Credit Program
Your program should do the following
- Check the command line inputs (as described below)
- Read all of the words, word by word, in the file named on the command line (see the code below)
- Count the number of words read
- With each word, determine if the first letter of the word appears anywhere else in the word.
- If the first letter is repeated
- Print the word
- Increment a counter of words with repetitions
- After all the words have been read, print the total number of words and the number of words in which the first letter is repeated somewhere else in the word.
Hint: you can do this without an extra loop by using charAt(0) to get the first character of the word and one of the other String methods discussed this week. (By extra loop, I mean that you will need the while loop to read all the words, but no other loops.)
Here are the first few words that my version of this program prints:
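One way to follow this hint, assuming the "other string function" is String.indexOf(int ch, int fromIndex) (the hint does not name it; lastIndexOf would also work): start searching for the first character at index 1, and treat a result of -1 as "not repeated". The class and method names are hypothetical.

```java
// Hypothetical helper for the full-credit version.
class FirstLetterCheck {

    // True if the first character of word appears again later in word.
    static boolean firstLetterRepeats(String word) {
        if (word.isEmpty()) {
            return false; // no first letter to check
        }
        // indexOf(ch, fromIndex) starts searching at fromIndex and returns -1 if ch is not found.
        return word.indexOf(word.charAt(0), 1) != -1;
    }

    public static void main(String[] args) {
        System.out.println(firstLetterRepeats("sister")); // true
        System.out.println(firstLetterRepeats("Alice"));  // false
    }
}
```

No loop is needed inside this method; the only loop in the program stays the while loop that reads words.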
sister
nothing
peeped
sister
thought
getting
eyes
nothing
_very_
remarkable
Extra Credit Program
Your program should do the following
- Check the command line inputs (as described below)
- Read all of the words, word by word, in the file named on the command line (see the code below)
- Count the number of words read
- With each word, count the number of repeated letters, with repetition defined as above
- If the word has a repetition count greater than zero
- Print the word and its repetition count
- Increment a counter of words with repetitions
- After all the words have been read, print the total number of words and the number of words that have repetitions.
For instance, for the Alice in Wonderland file, the first things printed are:
Adventures 1
Wonderland 2
Carroll 2
Rabbit-Hole 1
beginning 4
sitting 2
sister 1
nothing 1
peeped 3
Your program will have a lot of loops. There will certainly be a 'while' loop for reading words from the file (see "Reading a File" below). Within that 'while' loop you will need at least one and possibly 2 'for' loops (one inside the other). The number of 'for' loops depends on how you actually implement the repetition check.
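To illustrate how the number of 'for' loops depends on the implementation: letting String.indexOf(int ch, int fromIndex) search for a later occurrence of the letter replaces an inner 'for' loop, leaving only one. This is a sketch under that assumption; the class name RepeatCountOneLoop is hypothetical, and the main method uses a few words from the sample output above in place of reading alice.txt.

```java
// Hypothetical alternative: the repetition count using a single for loop.
class RepeatCountOneLoop {

    // Same repetition count as defined in the Overview.
    static int countRepeats(String word) {
        int count = 0;
        for (int n = 0; n < word.length(); n++) {
            // indexOf(ch, n + 1) returns -1 when the letter does not appear after position n
            // (including when n + 1 is past the end of the word).
            if (word.indexOf(word.charAt(n), n + 1) != -1) {
                count = count + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] samples = {"Adventures", "Wonderland", "beginning", "peeped"};
        for (String word : samples) {
            int repeats = countRepeats(word);
            // Only words with at least one repetition are printed.
            if (repeats > 0) {
                System.out.println(word + " " + repeats);
            }
        }
    }
}
```

Running main prints "Adventures 1", "Wonderland 2", "beginning 4", and "peeped 3", one per line, matching the corresponding lines of the sample output.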
Reading a File
Unlike some previous assignments where we read one character at a time, in this assignment you will be reading one word at a time. Hence, you can (and should) use a Java class for reading files that does more than just read one letter at a time. Specifically, you should use the Scanner class. Below is code to read a file word by word using Scanner.
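import java.io.File;
import java.util.Scanner;

try {
    // Open a Scanner that reads from the file alice.txt
    Scanner s = new Scanner(new File("alice.txt"));
    // hasNext() is true while there is another word left in the file
    while (s.hasNext()) {
        String wd = s.next(); // read the next whitespace-separated word
        System.out.println(wd);
    }
    s.close();
} catch (Exception e) {
    System.err.println("Error " + e);
}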
Repeating some text from a recent 113 lab: here, we open a Scanner, telling it to read from a file by giving it a File object, new File("alice.txt"). We then use the hasNext method to check whether another word remains, and the next method to read each word of the file alice.txt.
Java requires you to deal with some problems before they happen. The way it does this is with try...catch, as you have seen with the FileReader on previous assignments. In this case, on the line "Scanner s = ...", a problem could occur if the file "alice.txt" did not exist; the Scanner constructor that takes a File declares that it can throw a FileNotFoundException, so the compiler insists that you handle it. On the next lines, a problem could occur if something weird happened to the file.
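A sketch of the same reading loop that catches the specific exception the Scanner(File) constructor declares, FileNotFoundException, instead of any Exception. It also swaps in try-with-resources, so the Scanner is closed automatically when the block ends. WordCounter and countWords are hypothetical names; returning -1 for a missing file is an illustrative choice, not part of the assignment.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper: counts the words in a file, or returns -1 if it cannot be opened.
class WordCounter {

    static int countWords(String fileName) {
        // try-with-resources closes the Scanner when the block ends.
        try (Scanner s = new Scanner(new File(fileName))) {
            int words = 0;
            // By default, next() returns the next whitespace-separated word.
            while (s.hasNext()) {
                s.next();
                words = words + 1;
            }
            return words;
        } catch (FileNotFoundException e) {
            // Thrown by the Scanner constructor when the file does not exist or cannot be opened.
            System.err.println("Error " + e);
            return -1;
        }
    }

    public static void main(String[] args) {
        // Prints an error message on System.err, then -1.
        System.out.println(countWords("no-such-file.txt"));
    }
}
```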
Command line input
The name of the file to be read should NOT be hard coded into your program. Rather, it should be given on the command line.
Validating the command line inputs
Before reading any words, your program should do the following validation on the command line inputs:
- There is exactly 1 input, the name of a file.
- Optional and worth 3 points of extra credit: the file exists. (Hint look for a method in the File class.)
If the criteria you implemented are not met, your program should print a message saying what is wrong and then quit.
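A minimal sketch of these checks, assuming the file name arrives in the args array of main. The class ArgsCheck and the method checkArgs are hypothetical; the existence test uses java.io.File.exists(), one method that fits the hint for the optional check.

```java
import java.io.File;

// Hypothetical sketch of the command line validation.
class ArgsCheck {

    // Returns a message describing what is wrong, or null if the inputs are OK.
    static String checkArgs(String[] args) {
        if (args.length != 1) {
            return "Expected exactly 1 command line input (a file name), but got " + args.length;
        }
        // Optional extra-credit check: does the named file exist?
        if (!new File(args[0]).exists()) {
            return "The file " + args[0] + " does not exist";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = checkArgs(args);
        if (problem != null) {
            System.out.println(problem);
            return; // quit without reading any words
        }
        System.out.println("Reading words from " + args[0]);
    }
}
```

For example, "java ArgsCheck alice.txt" passes both checks when alice.txt is in the working directory, while "java ArgsCheck" with no input prints a message and quits. Returning from main is one way to quit; System.exit(1) is another.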
Submitting
Create a readme
Use VSC to create another file in your HW7 directory. This file should be named "Readme". The contents of this file should follow this sample.
You should have at least 3 files in your HW7 directory: XXX.java, alice.txt and Readme. (XXX should be a meaningful name.) (You might also have .class files.)
Submit
If you did this work on your own computer
(All of the directions below assume a CS113 directory; if you have a CS109 directory, please substitute as needed.)
You will first need to copy the files from your own computer to a lab computer. Either way, you will need to create an
HW7 directory within your CS113 directory on the Unix machines. Recall, this can be done with the following commands:
cd
cd CS113
mkdir HW7
Once you have made the HW7 directory in Unix, open a terminal on your own computer and in that terminal use "cd" to navigate to the directory containing your work for this assignment. Assuming you use the same directory structure on your own computer and in the lab, this can be done with the following commands:
cd
cd CS113
cd HW7
Then use the scp command to copy each of the files you want to submit from your computer to the lab. For example:
scp Readme UNIX_NAME@goldengate.cs.brynmawr.edu:CS113/HW7/Readme
As always, when you read "UNIX_NAME" put in your UNIX user name. Also, with each scp command you will need to enter your UNIX password.
Actually submit
Open a terminal in UNIX (again, you can use SSH to do so from your laptop) and execute the following Unix commands (assuming you put the HW7 directory into a CS113 directory in your home directory).
cd
cd CS113
/home/gtowell/bin/submit -c 113 -d HW7 -p 7
In response to the submit command you should see a series of messages ending with:
Submitting archive...
Submission complete! Submission timestamp is 2023-08-08-15-30-28-EDT.