CS 206 - Introduction to Data Structures
Homework 9
Linked Lists and Baby Names
Due Wednesday Dec 2 prior to 11:59PM
Remember: Read, and follow, program design principles (http://cs.brynmawr.edu/cs206/design.pdf) and code formatting standards (http://cs.brynmawr.edu/cs206/style.html) carefully.
Overview
In this assignment you will read, store and merge one or more csv files containing the 1000
most common baby names for boys and girls by year. Your program should store the baby
names in linked lists (one for boys and one for girls) that are sorted either alphabetically or reverse alphabetically. As described below, the decision about sort order is made at run time based on a command line input.
The program then needs to be able to look up a name and report the following statistics:
- The sex of the name whose stats are reported
- number - the total number of babies given that name (for that gender) for all years
- percentage - the percentage of babies given that name (for that gender) for all years
- The years in which the name appeared in the list.
- Alphabetical rank (for that gender) for all names seen in all years. The rank is dependent on sort direction
As explained in more detail below, your program should run based on input from the command line. For example, assuming that the class Main contains the main method to be run:
UNIX> java Main ASC -f Mary -f Nancy -m Devon -f Devon -m Mark -f Marlen -f Zoe /home/gtowell/Public206/a4/names2000.csv /home/gtowell/Public206/a4/names2001.csv
Mary
Girls Alpha Rank: 765
Years: [2000, 2001]
Percentage: 0.41863382305636054
Nancy
Girls Alpha Rank: 819
Years: [2000, 2001]
Percentage: 0.09811785098728892
Devon
Boys Alpha Rank: 284
Years: [2000, 2001]
Percentage: 0.16482288847264318
Devon
Girls Alpha Rank: 317
Years: [2000, 2001]
Percentage: 0.023353031820525103
Mark
Boys Alpha Rank: 707
Years: [2000, 2001]
Percentage: 0.2887222362428601
Marlen
Girls Alpha Rank: 760
Years: [2000]
Percentage: 0.007444876309701236
Zoe
Girls Alpha Rank: 1055
Years: [2000, 2001]
Percentage: 0.2960040679927911
UNIX> java Main DESC -f Mary -f Nancy -m Devon -f Devon -m Mark -f Marlen -f Zoe /home/gtowell/Public206/a4/names2000.csv /home/gtowell/Public206/a4/names2001.csv
Everything is as above except for the Alpha Ranks. For Mary, for example, the rank would be 293.
The above formatting is meant as an example, not a requirement. Formatting of the output is up to you. That said, the output must contain the required information.
Note that the girl's name Marlen only appears in the list for the year 2000.
Input File Format
The input files contain lines in the following format:
rank,male-name,male-number,female-name,female-number
where the comma-separated fields have the following meanings:
Field | Meaning
---|---
rank | the ranking of the names in this file
male-name | a male name of this rank
male-number | number of males with this name
female-name | a female name of this rank
female-number | number of females with this name
This is the format of database files obtained from the U.S. Social Security Administration. Here is an example showing data from the year 2002:
1,Jacob,30568,Emily,24463
2,Michael,28246,Madison,21773
3,Joshua,25986,Hannah,18819
4,Matthew,25151,Emma,16538
5,Ethan,22108,Alexis,15636
6,Andrew,22017,Ashley,15342
7,Joseph,21891,Abigail,15297
8,Christopher,21681,Sarah,14758
9,Nicholas,21389,Samantha,14662
10,Daniel,21315,Olivia,14630
...
996,Ean,157,Johana,221
997,Jovanni,157,Juana,221
998,Alton,156,Juanita,221
999,Gerard,156,Katerina,221
1000,Keandre,156,Amiya,220
From the above, in 2002, the most popular baby names were Jacob with 30,568 male babies, and Emily with 24,463 female babies. Similarly, going down the list, there were 220 newborn females named Amiya, making it the 1000th most popular female baby name.
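As a minimal sketch of how one line in this format might be split into its five fields, consider the following. The class name NameRow and its field names are hypothetical, chosen only for illustration. The sketch assumes every line has exactly five comma-separated fields and that no field contains a comma.

```java
// Hypothetical helper: one parsed row of a names file.
class NameRow {
    final int rank;
    final String maleName;
    final int maleCount;
    final String femaleName;
    final int femaleCount;

    NameRow(int rank, String maleName, int maleCount,
            String femaleName, int femaleCount) {
        this.rank = rank;
        this.maleName = maleName;
        this.maleCount = maleCount;
        this.femaleName = femaleName;
        this.femaleCount = femaleCount;
    }

    // Split "rank,male-name,male-number,female-name,female-number" on commas.
    // Assumption: exactly five fields per line.
    static NameRow parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + line);
        }
        return new NameRow(
                Integer.parseInt(parts[0].trim()),
                parts[1].trim(),
                Integer.parseInt(parts[2].trim()),
                parts[3].trim(),
                Integer.parseInt(parts[4].trim()));
    }
}
```

For instance, NameRow.parse("1,Jacob,30568,Emily,24463") would yield a row whose maleName is Jacob and whose femaleCount is 24463.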
The entire data set contains a file for each year from 1990 to 2017, named
names1990.csv, ..., names2017.csv. These files are in /home/gtowell/Public/206/a4. Grading will use file names from the command line.
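The data lines themselves carry no year, yet the program must report the years in which a name appeared. One possible approach, assuming every file name ends in namesYYYY.csv as in the data set described above, is to take the year from the file name. The class YearFromFileName and the method yearOf are hypothetical names used only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recover the year from a path such as ".../names1990.csv".
class YearFromFileName {
    // Assumption: the file name ends with "names" + four digits + ".csv".
    private static final Pattern YEAR = Pattern.compile("names(\\d{4})\\.csv$");

    static int yearOf(String path) {
        Matcher m = YEAR.matcher(path);
        if (!m.find()) {
            throw new IllegalArgumentException("No year in file name: " + path);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Because the pattern is anchored only at the end of the string, the directory part of the path does not matter.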
Specific Tasks
Build two linked lists to store the baby names, one for the male names
and one for the female names.
The linked lists must be kept in alphabetically sorted order by name, case insensitive. The sort order should be determined by the first parameter on the command line, either ASC or DESC. The linked list must be in sorted order at all times.
The two linked lists you are building should be from scratch;
you are not allowed to use Java's built-in LinkedList class. You are allowed to use
any code discussed in class. Your linked lists may be either singly or doubly linked.
Design a class that stores all the relevant stats for a particular name.
Computing the overall percentages requires additional data not stored in the linked lists.
Consider what you need and decide where and how to store the information carefully.
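One way to picture such a class is sketched below. The name NameStats, its methods, and the use of an ArrayList for the years are all assumptions made for illustration, not requirements. The sketch also assumes that the percentage is 100 times the name's count divided by the total count for that gender over all files read, and that this total is tracked outside the class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stats holder for one name of one gender.
class NameStats {
    final String name;
    private long count;                                   // babies over all years read
    private final List<Integer> years = new ArrayList<>(); // years the name appeared

    NameStats(String name, int count, int year) {
        this.name = name;
        record(count, year);
    }

    // Fold in the data from one more file: add the count, remember the year once.
    void record(int count, int year) {
        this.count += count;
        if (!years.contains(year)) {
            years.add(year);
        }
    }

    long getCount() {
        return count;
    }

    List<Integer> getYears() {
        return years;
    }

    // Assumption: the caller supplies the total for this gender over all files.
    double percentageOf(long totalForGender) {
        return 100.0 * count / totalForGender;
    }
}
```

Keeping the gender-wide total outside this class means it only has to be updated in one place as each line is read.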
When adding a name to a linked list, you must handle the case in which the name is
already in the list. In such cases, rather than inserting a new item into the linked
list, you should add information (the count and the year) to the existing item.
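The sorted-insert and merge requirements above can be sketched together in a singly linked list. This is only one possible shape: the class SortedNameList, its Node, and the methods add, rankOf and countOf are hypothetical names, and for brevity each node stores the count and years directly rather than a separate stats object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical singly linked list kept in case-insensitive sorted order.
class SortedNameList {
    private static class Node {
        final String name;
        long count;
        final List<Integer> years = new ArrayList<>();
        Node next;

        Node(String name, long count, int year, Node next) {
            this.name = name;
            this.count = count;
            this.years.add(year);
            this.next = next;
        }
    }

    private Node head;
    private final boolean ascending; // true for ASC, false for DESC

    SortedNameList(boolean ascending) {
        this.ascending = ascending;
    }

    // Negative when a belongs before b in this list's order.
    private int compare(String a, String b) {
        int c = a.compareToIgnoreCase(b);
        return ascending ? c : -c;
    }

    // Insert in sorted position, or merge into the existing node for this name.
    void add(String name, long count, int year) {
        Node prev = null;
        Node cur = head;
        while (cur != null && compare(cur.name, name) < 0) {
            prev = cur;
            cur = cur.next;
        }
        if (cur != null && compare(cur.name, name) == 0) {
            cur.count += count;
            if (!cur.years.contains(year)) {
                cur.years.add(year);
            }
            return;
        }
        Node fresh = new Node(name, count, year, cur);
        if (prev == null) {
            head = fresh;
        } else {
            prev.next = fresh;
        }
    }

    // 1-based position in the list, or -1 if the name is absent.
    int rankOf(String name) {
        int rank = 1;
        for (Node cur = head; cur != null; cur = cur.next, rank++) {
            if (cur.name.equalsIgnoreCase(name)) {
                return rank;
            }
        }
        return -1;
    }

    // Total count stored for the name, or 0 if the name is absent.
    long countOf(String name) {
        for (Node cur = head; cur != null; cur = cur.next) {
            if (cur.name.equalsIgnoreCase(name)) {
                return cur.count;
            }
        }
        return 0;
    }
}
```

Because every insertion walks to the correct position before linking the new node, the list is in sorted order after each call to add, and the alphabetical rank is simply the position of the node.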
Suggested steps:
- Rather than accepting command line arguments, use an array of strings defined
within the main function. For instance, your code could look like:
public static void main(String[] args) {
    // Built-in test arguments, used only when nothing is given on the command line
    String[] myArgs = {"-f", "Dianna", "/home/gtowell/Public206/data/a4/names1990.csv"};
    if (args.length == 0)
        args = myArgs;
    ...
}
This has been discussed previously. Doing something like this will make development far quicker. To test with actual command line input you need only provide input on the command line.
- Read one file into two lists of unique names in sorted order. If you are having trouble debugging the sorted order, I suggest creating a smaller input set. One way to do this is to simply stop reading the file after 2 names. When correct with 2 names, do 3, then 4, and so on. This may seem painful, but it is a time-honored approach. It is usually easier to identify and correct bugs using this procedure than any other.
- Expand your class that holds names to provide storage for usage counts and years
- Compute all the necessary totals to enable yearly percentage reporting.
- Enable single name lookup
- Enable single name lookup on multiple files.
- Enable multiple name lookup on multiple files.
- Enable ASC/DESC to set the sort order
- Use command line arguments that conform to the requirements (described next). You can keep the code suggested in the first step; the point here is to actually use command line args.
Look-up via Command-line Arguments
Your program should take command-line arguments as follows:
1. [ASC | DESC] That is, one or the other of ASC and DESC.
2. A flag, either -m or -f, which indicates a male name or a female name to look up, respectively.
3. A name, for instance Dianna. Capitalization of the name should not matter. "Dianna" should give the same results as "dianna" and "diaNNa", etc.
4. Either the -m/-f flag or the name of a file. If -m or -f then go back to 2. If not, then this is a file name so read and store the data.
5. Additional file names
For example:
java Main DESC -f Dianna /home/gtowell/Public/206/a4/names1990.csv /home/gtowell/Public/206/a4/names2000.csv
will print out the alpha rank, year and percentage (as explained above) of the female name Dianna used in 1990 and 2000.
Other possible command line inputs include (but are not limited to):
java Main ASC -f Mary -f Amie -m DaviD /home/gtowell/Public206/data/a4/names1991.csv
java Main DESC -f Devon -m Devon /home/gtowell/Public206/data/a4/names1993.csv /home/gtowell/Public206/data/a4/names1994.csv /home/gtowell/Public206/data/a4/names1995.csv /home/gtowell/Public206/data/a4/names1991.csv
Filenames are always last; nothing follows a filename other than another filename. That is, once you see a file name you will not see a person name or a -f/-m.
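The argument layout described in this section could be walked in a single pass, roughly as sketched here. The class ParsedArgs and its fields are hypothetical names. The sketch assumes the sort order comes first, that each -m or -f is followed by exactly one name, and that everything after the last flag/name pair is a file name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for the three parts of the command line.
class ParsedArgs {
    boolean ascending;
    final List<String[]> lookups = new ArrayList<>(); // each entry is {flag, name}
    final List<String> files = new ArrayList<>();

    static ParsedArgs parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Expected ASC or DESC first");
        }
        ParsedArgs p = new ParsedArgs();
        if (args[0].equals("ASC")) {
            p.ascending = true;
        } else if (args[0].equals("DESC")) {
            p.ascending = false;
        } else {
            throw new IllegalArgumentException("First argument must be ASC or DESC");
        }

        // Flag/name pairs come next; stop at the first argument that is not a flag.
        int i = 1;
        while (i < args.length && (args[i].equals("-m") || args[i].equals("-f"))) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Flag " + args[i] + " needs a name");
            }
            p.lookups.add(new String[] {args[i], args[i + 1]});
            i += 2;
        }

        // Everything that remains is treated as a file name.
        for (; i < args.length; i++) {
            p.files.add(args[i]);
        }
        return p;
    }
}
```

Since file names are always last, the parser never needs to look back once it has seen the first argument that is not -m or -f.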
Electronic Submissions
Your program will be graded based on how it runs on the
department’s Linux server, not how it runs on your computer.
- README: This file should follow the format of this sample README (https://cs.brynmawr.edu/cs206/HW/README.txt)
- Source files: Every .java file used in the final version of your project
- Unique Data files used: This should be blank as the only data files you should use are as above
DO NOT INCLUDE: Data files that are read from the class site.
The following steps for submission assume that you created a folder named Assignment9 in the directory /home/YOU/cs206/
and that all of your code, along with the README file, is in this directory.
- Put the README file into the project directory (/home/YOU/cs206/Assignment9)
- Go to the directory /home/YOU/cs206
- Enter submit -c 206 -p 9 -d Assignment9
For more on using the submit script click here