CS 206 - Introduction to Data Structures

Linked lists of baby names

In this assignment you will read, store and merge one or more cvs files containing the 1000 most common baby names for boys and girls by year. Your program should store the baby names in two independent linked lists (one for boys and one for girls) that are sorted alphabetically.

The program needs to be able to look up a name and report the following statistics:
  1. The sex of the name whose stats are reported
  2. number - the total number of babies given that name (for that gender) for all years
  3. percentage - the percentage of babies given that name (for that gender) for all years
  4. The years in which the name appeared in the list.
  5. Alphabetical rank (for that gender) for all names seen in all years.
Explained in more detail below, your program should run based on input from the command line and interactive user queries. For example assuming that the class Main contains the main method to be run:
UNIX> java Main  /home/gtowell/Public/206/a9/names2000.csv /home/gtowell/Public/206/a9/names2001.csv
Name  (q to quit): Mary

Used as a girls name
Total Babies       :11921
Years in top 1000  :[2000, 2001]
Percentage         : 0.004186338230563605
Alphabetic Position: 293

Name  (q to quit): Devon

Used as a Boys Name
Total Babies       :5695
Years in top 1000  :[2000, 2001]
Percentage         : 0.0016482288847264317
Alphabetic Position: 759
Used as a girls name
Total Babies       :665
Years in top 1000  :[2000, 2001]
Percentage         : 2.3353031820525102E-4
Alphabetic Position: 741

Name  (q to quit): Mark

Used as a Boys Name
Total Babies       :9976
Years in top 1000  :[2000, 2001]
Percentage         : 0.002887222362428601
Alphabetic Position: 336

Name (q to quit): Marlen

Used as a girls name
Total Babies       :212
Years in top 1000  :[2000]
Percentage         : 7.444876309701235E-5
Alphabetic Position: 298

Name: q

The above formatting is meant as an example; not a requirement. Formating of the output is up to you. That said, the output must contain the required information. Note, in the example above the names are sorted in reverse alphabetic order.

Input File Format

The input files contain lines in the following format:
where the comma-separated fields have the following meanings:
  1. rank the ranking of the names in this file (you can ignore this)
  2. male-name a male name of this rank
  3. male-number number of males with this name
  4. female-name a female name of this rank
  5. female-number number of females with this name
This is the format of database files obtained from the U.S. Social Security Administration. Here is an example showing data from the year 2002:
From the above, in 2002, the most popular baby names were Jacob with 30,568 male babies, and Emily with 24,463 female babies. Similarly, going down the list, there were 220 newborn females named Amiya, making it the 1000th most popular female baby name.

The entire data set contains a file for each year from 1990 to 2017, named names1990.csv, ..., names2017.csv. These files are in /home/gtowell/Public/206/a9.



Computing the overall percentages requires some additional data not stored in the linked lists. Consider what you need and decide where and how to store the information carefully.

When a name is read from a data file, you must be able to handle that the name is already in the list. In such cases, rather than inserting a new item into the linked list, you should add information (the count and the year) to the existing item.

When implementing sorting -- via insertion sort -- first make the class holding the baby name information implement the Comparable interface. Then use the compareTo method when determining where to insert. Note that your compareTo method might be only one line long as all it (probably) needs to do is invoke the compareTo method of the underlying name.

Suggested Steps

  1. For your development purposes, rather than accepting command line arguments, use an array of strings defined within the main function. For instance, your code could look like:
    public static void main(String[] args) {
        String[] myArgs = {"/home/gtowell/Public/206/a9/names1990.csv"};
    	if (args.length == 0)
    		args = myArgs;
    This has been discussed previously. Doing something like this will make development far quicker. To test with actual command line input you need only provide input on the command line.
  2. Initially just work with one input file
  3. Write a toString method that prints out the the contents of a the linked lists. You will use this toString method a lot in the next step.
  4. Read one file into two lists of unique names in sorted order. If you are having trouble debugging the sorted order, I suggest creating a smaller input set. One way to do this is to simply stop reading the file after 2 names. When correct with 2 names, do 3, then 4, ... This may seem painful but it is a time honored approach. It is usually easier to identify and correct bugs using this procedure than any other.
  5. Expand your class that holds names to provide storage for usage counts and years
  6. Figure out a way to extract years from the file name. The file name is the only place that this information appears.
  7. Work on the user interaction
  8. Create a system for looking up names
  9. Expand your system to work with multiple input files
  10. Document your code and make the user interactions clear (better than my sample above).

Electronic Submissions

Your program will be graded based on how it runs on the department’s Linux server, not how it runs on your computer. The submission should include the following items:

The following steps for submission assume that you created a project named AssignmentN in the directory /home/YOU/cs206/

  1. For this assignment N=9
  2. Put the README file into the project directory
  3. Go to the directory /home/YOU/cs206
  4. Enter /home/gtowell/bin/submit -c 206 -p N -d AssignmentN

For more on using the submit script click here