Homework 4: Putting data into databases

Part 1

Take the data you collected for Homework 3 and put it into both PostgreSQL and Mongo. (You may change your data from Homework 3 if you decide that you are already hate your data.) When putting the data into these databases you should consider how your data will be used (see, for instance, homework 5). Further consider how that data can be best represented to take advantage of the natural capabilities of each of these two databases. In homework 3, I asked you to give a relational design for you dataset. You can stick with that design, or change it as you see fit. (If you change, be sure that at least one table, in postgreSQL, expresses a many-to-many relationship.

In PostgreSQL, set up your tables using primary keys, foreign keys, etc with as many constraints as makes sense for your data. (Be gratuitous with constraints.) This will likely involve the interaction of many tables. Do not create a new database; rather put your tables into your default database.

In Mongo, create a new database with your name. Depending on your design and your data, you may have only a single collection (where in PostgreSQL you are required to have multiple tables). You should almost certainly not duplicate your PostGreSQL table structure in Mongo.

In both PostgreSQL and Mongo, create one or more indices on non-primary key attributes to support efficient querying. (Given the probable size of your databases, indices are not really important. Do it anyway.)

With that done write a brief report about the structure of your databases. The report should include:

  1. A schema diagram showing the actual tables you created in PostGreSQL and how those tables interact.
  2. Something similar to the schema diagram, but showing the JSON structure of your Mongo collection(s)
  3. 1-3 pages of text detailing the choices you made in bringing the data into PostGreSQL and Mongo and an explanation of why you made those choices. This report will likely refer extensively to the PostgreSQL and Mongo diagrams. The diagrams should be included in the report rather than being separate files. The diagrams do not count against (or towards) the textual page count of this report.

Part 2

Write a set of at least 3 sample queries, and apply them to both Mongo and PostGreSQL. The queries should retrieve identical, or very close to identical, data from the two databases. At least one of your queries should make use of at least one of the indices you created. Put your queries into file types appropriate to the two databases (ie .sql for PostGreSQL and .js for Mongo.) Separately write a brief description of what the query does and, for any query that uses an index, how and why it uses that index. (You may put these descriptions as an appendix at the end of the document you write for part 1. If you include this text in the part 1 report it would not affect the part 1 textual page count.)

What to hand in

The short answer is everything you did -- except the data -- plus a text file, named README, listing all of the files you are including with your submission and what those files contain. Here is a list of the files I might expect:

How to submit

Use the process for submission from HW1 except put your document(s) in a directory named, for example, hw4. Then go to the directory that contains your hw4 directory, and execute
    /home/gtowell/bin/submit -c 383 -d hw4 -p 4 
There may be other versions of submit on the servers. Be sure to use mine. The arguments to submit have the following meanings: You may submit as many times as you like. I will onl look at your final submission unless you tell me otherwise.