CS 151 - Introduction to Data Structures

Linked lists of baby names

In this assignment you will read, store and merge one or more csv files containing 2000 of the most common baby names by year. Your program must store the baby names in a linked list that is sorted alphabetically.

Once the files are read, the program needs to be able to look up a name and report the following statistics:
  1. name whose stats are reported
  2. number - the total number of babies given that name for all years (this is "freq" in the listing below)
  3. The number of times in which the user supplied name appears in the lists over all years. For many common names this will be exactly the number years of data scanned. Uncommon names may appear only once or twice over all of the years. Some names may appear twice in a given year so the number of times the name appears can be greater than the number of years. (Shown as "uses" below). In the list for a given year, this may be 0, 1 or 2.
  4. Alphabetical rank of the user-supplied from among the names seen in all years read. For instance, if the complete list of names, in alphabetic order is (A D G K L) then the alphabetic position of G is 3. (Shown as Pos below.)
Your program should run based on input from the command line and interactive user queries. For example assuming that the class Main contains the main method to be run:
UNIX> java Main 2000 2001 2002

Name:   (q to quit) aaron
Aaron
    Uses:3
    Freq:28089
    Pos:1

Name:   (q to quit) SYDNEY
sydney
    Uses:3
    Freq:28913
    Pos:1911

Name:   (q to quit) SiDNEY
sidney
    Uses:6
    Freq:4135
    Pos:1868


Name:   (q to quit) SYDny
sydny not found


Name:   (q to quit) Marlen
marlen
    Uses:1
    Freq:212
    Pos:1468

The above formatting is meant as an example; not a requirement. Formatting of the output is up to you. That said, the output must contain the required information.

Input Files

Input files are available at
	https://cs.brynmawr.edu/cs151/Data/Hw9/NNYEAR.csv
They are also available via scp at
	/home/gtowell/Public/151/Data/Hw9/NNYEAR.csv
where YEAR is a number in the range 1990-2020. For instance, a file is NN2000.csv.

These files contain lines in the following format:
name,number
where the comma-separated fields have the following meanings:
  1. name: a name;
  2. number: a number of times name was used in that year -- note that the name may appear more than once in a given file, so this number is not necessarily the number of times the name was used in the year.
For instance, this is the start of one of the data files.
Jacob,34471
Emily,25953
Michael,32035
Hannah,23080
Matthew,28572
Madison,19967
Joshua,27538
Ashley,17997
Christopher,24931
Sarah,17697
Nicholas,24652
Alexis,17629
Andrew,23639
Samantha,17266

Requirements

Thoughts

When a name is read from a data file, you must be able to handle that the name is already in the list. In such cases, rather than inserting a new item into the linked list, you should add information (the frequency of usage and the appearance count) to the existing item.

Suggested Steps

  1. For your development purposes, rather than accepting command line arguments, use an array of strings defined within the main function. For instance, your code could look like:
    public static void main(String[] args) {
        String[] myArgs = {"/home/gtowell/Public/151/Hw9/NN1990.csv"};
    	if (args.length == 0)
    		args = myArgs;
        ...
    }
    
    This has been discussed previously. Doing something like this will make development far quicker (if only because it will allow the use of the run button in VSC). To test with actual command line input you need only provide input on the command line.
  2. Initially just work with one input file
  3. Write a toString method that prints out the the contents of the linked list. You will use this toString method a lot in the next step.
  4. Set up your system so you can specify the number of lines to use (this number may be hard coded as you will use it only during development). Then while working on getting your linked list in sorted order use only the first 4 names. When your sorted linked list is correct for 4 names, use 6 names, then 8, ... This may seem painful but it is a time honored approach because it works. The idea is that by working with a really small data set you can easily spot (and repair) problems (presumably using the interactive debugger).
  5. Once you have sorting working, expand your class that holds names to provide storage for usage counts and appearance frequency. Also allow for updates of these quantities
  6. Work on the user interaction
  7. Create a system for looking up names
  8. Expand your system to work with multiple input files
  9. Document your code and make the user interactions clear (hopefully better than my sample above).

What to Hand in

Electronic Submissions

Your program will be graded based on how it runs on the department’s Linux server, not how it runs on your computer.

The following steps for submission assume that you created a project named AssignmentN in the directory /home/YOU/cs151/

  1. For this assignment N=9
  2. Put the README file into the project directory
  3. Go to the directory /home/YOU/cs151
  4. Enter /home/gtowell/bin/submit -c 151 -p N -d AssignmentN