CS 206 - Introduction to Data Structures

Homework 4

Linked Lists and Baby Names

Due Thursday, Feb 20 prior to 11:59PM

Remember: Read, and follow, program design principles (http://cs.brynmawr.edu/cs206/design.pdf) and code formatting standards (http://cs.brynmawr.edu/cs206/style.html) carefully.

Overview

In this assignment you will read, store, and merge one or more CSV files containing the 1000 most common baby names for boys and girls by year. Your program should store the baby names in linked lists (one for boys and one for girls) that are sorted reverse alphabetically.

The program then needs to be able to look up a name and report the following statistics:
  1. The sex of the name whose stats are reported
  2. number - the total number of babies given that name (for that gender) for all years
  3. percentage - the percentage of babies given that name (for that gender) for all years
  4. The years in which the name appeared in the list.
  5. Alphabetical rank (for that gender) for all names seen in all years
As explained in more detail below, your program should take its input from the command line. For example, assuming that the class Main contains the main method to be run:
UNIX> java Main -f Mary -f Nancy -m Devon -f Devon -m Mark -f Marlen -f Zoe /home/gtowell/Public206/data/a4/names2000.csv /home/gtowell/Public206/data/a4/names2001.csv

	Mary
	Girls Alpha Rank: 293 of 1057
	Years: [2000, 2001]
	Percentage: 0.41863382305636054

	Nancy
	Girls Alpha Rank: 239 of 1057
	Years: [2000, 2001]
	Percentage: 0.09811785098728892

	Devon
	Boys Alpha Rank: 759 of 1042
	Years: [2000, 2001]
	Percentage: 0.16482288847264318

	Devon
	Girls Alpha Rank: 741 of 1057
	Years: [2000, 2001]
	Percentage: 0.023353031820525103

	Mark
	Boys Alpha Rank: 336 of 1042
	Years: [2000, 2001]
	Percentage: 0.2887222362428601

	Marlen
	Girls Alpha Rank: 298 of 1057
	Years: [2000]
	Percentage: 0.007444876309701236

	Zoe
	Girls Alpha Rank: 3 of 1057
	Years: [2000, 2001]
	Percentage: 0.2960040679927911

The above formatting is meant as an example, not a requirement. Formatting of the output is up to you. That said, the output must contain the required information. Note that the girl's name Marlen only appears in the list for the year 2000.

Input File Format

We'll be taking input from files containing lines in the following format:
rank,male-name,male-number,female-name,female-number
where the comma-separated fields have the following meanings:
  rank            the ranking of the names in this file
  male-name       a male name of this rank
  male-number     number of males with this name
  female-name     a female name of this rank
  female-number   number of females with this name

This is the format of database files obtained from the U.S. Social Security Administration. Here is an example showing data from the year 2002:

1,Jacob,30568,Emily,24463
2,Michael,28246,Madison,21773
3,Joshua,25986,Hannah,18819
4,Matthew,25151,Emma,16538
5,Ethan,22108,Alexis,15636
6,Andrew,22017,Ashley,15342
7,Joseph,21891,Abigail,15297
8,Christopher,21681,Sarah,14758
9,Nicholas,21389,Samantha,14662
10,Daniel,21315,Olivia,14630
...
996,Ean,157,Johana,221
997,Jovanni,157,Juana,221
998,Alton,156,Juanita,221
999,Gerard,156,Katerina,221
1000,Keandre,156,Amiya,220
From the above, in 2002, the most popular baby names were Jacob with 30,568 male babies, and Emily with 24,463 female babies. Similarly, going down the list, there were 220 newborn females named Amiya, making it the 1000th most popular female baby name.
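As a rough illustration of the file format, the following sketch splits one line into its five fields. The class name NameLine and its field names are hypothetical, not part of the assignment, and the sketch assumes that names never contain commas.

```java
/** Hypothetical holder for one parsed line of a names file. */
final class NameLine {
    final int rank;
    final String maleName;
    final int maleNumber;
    final String femaleName;
    final int femaleNumber;

    private NameLine(int rank, String maleName, int maleNumber,
                     String femaleName, int femaleNumber) {
        this.rank = rank;
        this.maleName = maleName;
        this.maleNumber = maleNumber;
        this.femaleName = femaleName;
        this.femaleNumber = femaleNumber;
    }

    /** Parses a line of the form rank,male-name,male-number,female-name,female-number. */
    static NameLine parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                "Expected 5 fields but found " + fields.length + ": " + line);
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
        // when a numeric field is malformed.
        return new NameLine(
            Integer.parseInt(fields[0].trim()),
            fields[1].trim(),
            Integer.parseInt(fields[2].trim()),
            fields[3].trim(),
            Integer.parseInt(fields[4].trim()));
    }
}

class NameLineDemo {
    public static void main(String[] args) {
        NameLine first = NameLine.parse("1,Jacob,30568,Emily,24463");
        System.out.println(first.maleName + " " + first.maleNumber); // prints "Jacob 30568"
    }
}
```

Rejecting malformed lines with an exception is only one option; skipping them with a warning would also be reasonable.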

The entire data set contains a file for each year from 1990 to 2017, named names1990.csv, ..., names2017.csv. These files are in /home/gtowell/Public206/data/a4. You should use the files directly from this location. Do not make a local copy. Grading will not use local copies. (As always, for development you may do what is convenient, but this rule applies to the version you hand in.)
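The data lines themselves do not contain the year, so one plausible way to obtain it is from the file name. The sketch below assumes that every file name ends in namesYYYY.csv, as the files described above do; the class FileYear and the method yearOf are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that recovers the year from a path such as .../names1990.csv. */
final class FileYear {
    // Assumption: the file name ends with "names", four digits, then ".csv".
    private static final Pattern NAME = Pattern.compile("names(\\d{4})\\.csv$");

    static int yearOf(String path) {
        Matcher m = NAME.matcher(path);
        if (!m.find()) {
            throw new IllegalArgumentException("Cannot find a year in file name: " + path);
        }
        return Integer.parseInt(m.group(1));
    }
}

class FileYearDemo {
    public static void main(String[] args) {
        // prints 1990
        System.out.println(FileYear.yearOf("/home/gtowell/Public206/data/a4/names1990.csv"));
    }
}
```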

Specific Tasks

Build two linked lists to store the baby names, one for the male names and one for the female names. The linked lists must be kept sorted by name, case insensitive, in the reverse alphabetical order described in the Overview. Build both linked lists from scratch; you are not allowed to use Java's built-in LinkedList class. You are allowed to use any code discussed in class.

Design a class that stores all the relevant stats for a particular name.

Computing the overall percentages requires additional data not stored in the linked lists. Consider what you need and decide where and how to store the information carefully.
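One possible reading, consistent with the sample output above, is that the percentage is 100 times the count for a name divided by the count of all babies of that gender over every file read. Under that assumption, the extra data is one running total per gender. The class GenderTotals below is a hypothetical sketch of that idea, not a required design.

```java
/** Hypothetical running total for one gender, kept alongside that gender's linked list. */
final class GenderTotals {
    private long totalBabies; // sum of every count read for this gender, over all files

    /** Call once for each name count read from a file. */
    void add(int count) {
        totalBabies += count;
    }

    long total() {
        return totalBabies;
    }

    /** Percentage of all babies of this gender that were given a name with this count. */
    double percentageOf(long nameCount) {
        if (totalBabies == 0) {
            return 0.0; // nothing read yet; avoid dividing by zero
        }
        return 100.0 * nameCount / totalBabies;
    }
}
```

Using a long for the total keeps the sum safe from int overflow when many years are merged.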

When adding a name to a linked list, you must handle the case where the name is already in the list. In that case, rather than inserting a new item into the linked list, add the new information (the count and the year) to the existing item.
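The insert-or-update behavior can be sketched as a single walk down a sorted, singly linked list. Everything named here (NameNode, NameList, record, rankOf) is hypothetical. The sketch keeps the list in reverse alphabetical order, following the Overview and the sample ranks; flipping the comparison gives plain alphabetical order. Storing the years in an ArrayList is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list node: one per distinct name, holding its accumulated stats. */
final class NameNode {
    final String name;
    long total;                                    // babies with this name over all files
    final List<Integer> years = new ArrayList<>(); // years in which the name appeared
    NameNode next;

    NameNode(String name, int year, int count) {
        this.name = name;
        record(year, count);
    }

    /** Adds one file's count and year to this name. */
    void record(int year, int count) {
        total += count;
        if (!years.contains(year)) {
            years.add(year);
        }
    }
}

/** Hypothetical sorted list, kept in reverse alphabetical order, ignoring case. */
final class NameList {
    private NameNode head;
    private int size;

    void add(String name, int year, int count) {
        NameNode prev = null;
        NameNode cur = head;
        // Skip every node whose name sorts after the new name.
        while (cur != null && cur.name.compareToIgnoreCase(name) > 0) {
            prev = cur;
            cur = cur.next;
        }
        if (cur != null && cur.name.compareToIgnoreCase(name) == 0) {
            cur.record(year, count); // already present: update instead of inserting
            return;
        }
        NameNode node = new NameNode(name, year, count);
        node.next = cur;
        if (prev == null) {
            head = node;
        } else {
            prev.next = node;
        }
        size++;
    }

    /** Returns the node for a name, or null if the name is not in the list. */
    NameNode find(String name) {
        for (NameNode cur = head; cur != null; cur = cur.next) {
            if (cur.name.compareToIgnoreCase(name) == 0) {
                return cur;
            }
        }
        return null;
    }

    /** Returns the 1-based position of a name in the list, or -1 if absent. */
    int rankOf(String name) {
        int rank = 1;
        for (NameNode cur = head; cur != null; cur = cur.next, rank++) {
            if (cur.name.compareToIgnoreCase(name) == 0) {
                return rank;
            }
        }
        return -1;
    }

    int size() {
        return size;
    }
}
```

Because the list stays sorted, the position of a node doubles as its rank, which is one way to produce a line such as "Girls Alpha Rank: 3 of 1057".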

Suggested steps:

  1. Rather than accepting command line arguments, use an array of strings defined within the main function. For instance, your code could look like:
    public static void main(String[] args) {
        // Fall back to built-in test arguments when none are given on the command line.
        String[] myArgs = {"-f", "Dianna", "/home/gtowell/Public206/data/a4/names1990.csv"};
        if (args.length == 0) {
            args = myArgs;
        }
        ...
    }
    
    Doing something like this will make development far quicker. To test with actual command line input you need only provide input on the command line.
  2. Read one file into two lists of unique names in sorted order. If you are having trouble debugging the sorted order, I suggest creating a smaller input set. One way to do this is to simply stop reading the file after 2 names. When correct with 2 names, do 3, then 4, ... This may seem painful, but it is a lot easier to identify and correct bugs using this procedure than any other.
  3. Expand your class that holds names to provide storage for number and years.
  4. Compute all the totals needed for percentage reporting and store them sensibly.
  5. Enable single name lookup
  6. Enable single name lookup on multiple files.
  7. Enable multiple name lookup on multiple files.
  8. Use command line arguments that conform to the requirements (described next). You can leave in the code suggested in step 1; the point here is to actually use command line args.
  9. Enable checking of command line arguments to handle bad user input. For instance, unknown files, missing arguments, etc.
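Steps 2 and 9 can both be supported by a small reading helper. The sketch below reads at most a given number of lines, which makes it easy to debug with 2 names, then 3, then 4, and it reports an unknown file as an IOException that the caller can turn into a sensible message. The class LimitedReader and the method readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading only the first few lines of a names file. */
final class LimitedReader {
    /**
     * Returns up to maxLines lines from the file.
     * Throws IOException if the file does not exist or cannot be read.
     */
    static List<String> readLines(String fileName, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails part way.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(fileName))) {
            String line;
            while (lines.size() < maxLines && (line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}

class LimitedReaderDemo {
    public static void main(String[] args) {
        try {
            for (String line : LimitedReader.readLines(args[0], 2)) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.err.println("Cannot read file: " + e.getMessage());
        } catch (ArrayIndexOutOfBoundsException e) {
            System.err.println("Usage: java LimitedReaderDemo FILE");
        }
    }
}
```

Passing Integer.MAX_VALUE as maxLines reads the whole file once the small cases work.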

Look-up via Command-line Arguments

Your program should take command-line arguments as follows:
  1. A flag, either -m or -f, which indicates a male name or a female name to look up, respectively.
  2. A name, for instance Dianna. Capitalization of the name should not matter. "Dianna" should give the same results as "dianna" and "diaNNa", etc.
  3. Either the -m/-f flag or the name of a file. If it is -m or -f, then go back to step 2 to read the next name. If not, then this is a file name, so read and store the data.
  4. Additional file names
For example:
java Main -f Dianna /home/gtowell/Public206/data/a4/names1990.csv /home/gtowell/Public206/data/a4/names2000.csv
will print out the ranks (alphabetic and numeric), number and percentages (as explained above) of the female name Dianna used in 1990 and 2000.

Other possible command line inputs include (but are not limited to):

java Main -f Mary -f Amie -m DaviD /home/gtowell/Public206/data/a4/names1991.csv
java Main -f Devon -m Devon /home/gtowell/Public206/data/a4/names1993.csv /home/gtowell/Public206/data/a4/names1994.csv /home/gtowell/Public206/data/a4/names1995.csv /home/gtowell/Public206/data/a4/names1991.csv
	
Filenames are always last; nothing follows a filename other than another filename. That is, once you see a file name you will not see a person name or a -f/-m. Make sure you error-check your arguments thoroughly, i.e. illegal, badly formatted, or missing options. Your program should behave rationally no matter how unreasonable the input.
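The argument rules above amount to a two-phase scan: flag and name pairs first, then file names only. The sketch below is one way to implement that scan. The names Query and ParsedArgs are hypothetical, and the decision to reject any later argument that starts with "-" is an assumption about what counts as bad input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of one requested lookup. */
final class Query {
    final boolean female; // true for -f, false for -m
    final String name;

    Query(boolean female, String name) {
        this.female = female;
        this.name = name;
    }
}

/** Hypothetical result of scanning the command line. */
final class ParsedArgs {
    final List<Query> queries = new ArrayList<>();
    final List<String> files = new ArrayList<>();

    static ParsedArgs parse(String[] args) {
        ParsedArgs parsed = new ParsedArgs();
        int i = 0;
        // Phase 1: consume (flag, name) pairs.
        while (i < args.length && (args[i].equals("-m") || args[i].equals("-f"))) {
            if (i + 1 >= args.length || args[i + 1].startsWith("-")) {
                throw new IllegalArgumentException("Flag " + args[i] + " must be followed by a name");
            }
            parsed.queries.add(new Query(args[i].equals("-f"), args[i + 1]));
            i += 2;
        }
        // Phase 2: everything that remains must be a file name.
        for (; i < args.length; i++) {
            if (args[i].startsWith("-")) {
                throw new IllegalArgumentException("Unexpected option after file names: " + args[i]);
            }
            parsed.files.add(args[i]);
        }
        if (parsed.queries.isEmpty()) {
            throw new IllegalArgumentException("No names to look up");
        }
        if (parsed.files.isEmpty()) {
            throw new IllegalArgumentException("No data files given");
        }
        return parsed;
    }
}
```

A main method could catch IllegalArgumentException, print the message along with a usage line, and exit, so that bad input never produces a stack trace.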

If you are using Visual Studio Code for development, and your folder is named Assignment4, then you should be able to run your program from the command line as in Lab 4. Specifically, first open a terminal by selecting "Applications / System Tools / MATE Terminal" from the menus in the upper left. Then

	cd /home/YOU/206/Assignment4/
	javac Main.java
After this you should be able to use commands like those above.

Electronic Submissions

Your program will be graded based on how it runs on the department’s Linux server, not how it runs on your computer. DO NOT INCLUDE:
Data files that are read from the class site.

The following steps for submission assume that you created a folder named Assignment4 in the directory /home/YOU/cs206/ and that all of your code, along with the README file, is in this directory.

  1. Put the README file into the project directory (/home/YOU/206/Assignment4)
  2. Go to the directory /home/YOU/cs206
  3. Enter submit -c 206 -p 4 -d Assignment4