CS 151 - Introduction to Data Structures
Linked lists of baby names
In this assignment you will read, store and merge one or more csv files containing 2000 of the most common baby names by year. Your program must store the baby names in a linked list that is sorted alphabetically.
Once the files are read, the program needs to be able to look up a name and report the following statistics:
- name whose stats are reported
- number - the total number of babies given that name for all years (this is "freq" in the listing below)
- The number of times in which the user supplied name appears in the lists over all years. For many common names this will be exactly the number years of data scanned. Uncommon names may appear only once or twice over all of the years. Some names may appear twice in a given year so the number of times the name appears can be greater than the number of years. (Shown as "uses" below). In the list for a given year, this may be 0, 1 or 2.
- Alphabetical rank of the user-supplied from among the names seen in all years read. For instance, if the complete list of names, in alphabetic order is (A D G K L) then the alphabetic position of G is 3. (Shown as Pos below.)
Your program should run based on input from the command line and interactive user queries. For example assuming that the class Main contains the main method to be run:
UNIX> java Main 2000 2001 2002
Name: (q to quit) aaron
Aaron
Uses:3
Freq:28089
Pos:1
Name: (q to quit) SYDNEY
sydney
Uses:3
Freq:28913
Pos:1911
Name: (q to quit) SiDNEY
sidney
Uses:6
Freq:4135
Pos:1868
Name: (q to quit) SYDny
sydny not found
Name: (q to quit) Marlen
marlen
Uses:1
Freq:212
Pos:1468
The above formatting is meant as an example; not a requirement. Formatting of the output is up to you. That said, the output must contain the required information.
Input Files
Input files are available at
https://cs.brynmawr.edu/cs151/Data/Hw9/NNYEAR.csv
They are also available via scp at
/home/gtowell/Public/151/Data/Hw9/NNYEAR.csv
where YEAR is a number in the range 1990-2020. For instance, a file is NN2000.csv.
These files contain lines in the following format:
name,number
where the comma-separated fields have the following meanings:
- name: a name;
- number: a number of times name was used in that year -- note that the name may appear more than once in a given file, so this number is not necessarily the number of times the name was used in the year.
For instance, this is the start of one of the data files.
Jacob,34471
Emily,25953
Michael,32035
Hannah,23080
Matthew,28572
Madison,19967
Joshua,27538
Ashley,17997
Christopher,24931
Sarah,17697
Nicholas,24652
Alexis,17629
Andrew,23639
Samantha,17266
Requirements
- All data about individual names must be kept in a linked list.
- The linked list must be kept in alphabetically sorted order by name, case insensitive. The linked list must be in sorted order at all times.
- The linked lists you use should be from scratch; do not use Java's built-in LinkedList class. You may use any code discussed in class or lab or in the textbook. Your linked lists may be either singly or doubly linked. (you may use, in fact are encouraged to use, the linked list you worked on in lab 9.
- The year number(s) to be read must be given (giveable) on the command line as illustrated above.
- The program must use a class of your design that stores all the relevant stats for a particular name.
- Your name holding class must implement the Comparable interface.
- Your linked list class must use the fact that the data items implement the Comparable interface to keep the linked items in sorted order.
- The files to be searched may only be read once.
- The program should never crash. It is OK for the program to choose to die on unexpected input or other issues. However, that should be a decision you, as the programmer made, rather than something that just happens. Use good exception handling. (For instance, suppose on the command line you give the name of a file that does not exist.)
- The interactive component of the program should have a text interface akin to that above. You need not reproduce my format precisely. However, you must have all of the data shown.
- All code should be well-commented and follow the standard style guidelines
- Your linked list implementation must be completely generic. It should, without any code revision, be able to (for example) build a sorted list of Rabbits. (Note that the class(es) that use the linked list will certainly need a lot of code specific to handling the baby name problem. However, this should not affect your linked list code.)
- User interactions should be case insensitive
- You must have a graceful way for the user to quit the program.
- You must show the performance of your program in a "script" file that shows the output of your program on data for the years 1995-2008 (inclusive). The script file should show user interaction for the following names: Lily, Grace, DeVon, Brendan, devin. Be sure to use capitalization as it is here.
Thoughts
When a name is read from a data file, you must be able to handle that the name is already in the list. In such cases, rather than inserting a new item into the linked list, you should add information (the frequency of usage and the appearance count) to the existing item.
Suggested Steps
- For your development purposes, rather than accepting command line arguments, use an array of strings defined within the main function. For instance, your code could look like:
public static void main(String[] args) {
String[] myArgs = {"/home/gtowell/Public/151/Hw9/NN1990.csv"};
if (args.length == 0)
args = myArgs;
...
}
This has been discussed previously. Doing something like this will make development far quicker (if only because it will allow the use of the run button in VSC). To test with actual command line input you need only provide input on the command line.
- Initially just work with one input file
- Write a toString method that prints out the the contents of the linked list. You will use this toString method a lot in the next step.
- Set up your system so you can specify the number of lines to use (this number may be hard coded as you will use it only during development). Then while working on getting your linked list in sorted order use only the first 4 names. When your sorted linked list is correct for 4 names, use 6 names, then 8, ... This may seem painful but it is a time honored approach because it works. The idea is that by working with a really small data set you can easily spot (and repair) problems (presumably using the interactive debugger).
- Once you have sorting working, expand your class that holds names to provide storage for usage counts and appearance frequency. Also allow for updates of these quantities
- Work on the user interaction
- Create a system for looking up names
- Expand your system to work with multiple input files
- Document your code and make the user interactions clear (hopefully better than my sample above).
What to Hand in
- All code, well, documented
- The script file containing the interaction as described in the requirements section
- Readme file with the standard contents
- DO NOT INCLUDE: baby name data files
Electronic Submissions
Your program will be graded based on how it runs on the
department’s Linux server, not how it runs on your computer.
The following steps for submission assume that you created a project named AssignmentN in the directory /home/YOU/cs151/
- For this assignment N=9
- Put the README file into the project directory
- Go to the directory /home/YOU/cs151
- Enter /home/gtowell/bin/submit -c 151 -p N -d AssignmentN