Warning: this assignment is out of date. It may still need to be updated for this year's class. Check with your instructor before you start working on this assignment.

Homework 6: DAN & RNNs

Deadline:

credit: Jordan Boyd-Graber

In this homework you’ll get more experience with PyTorch. You will implement two classifiers: one that builds a sentence representation by averaging word vectors, and one that builds it with an RNN. You will also download tweets directly from Twitter.

This assignment is based on a Shared Task at the WNUT 2020 Workshop.

Partners: For this homework, you are allowed to work with a partner.

Question 0.1: According to the shared task paper linked above, how do the authors define a tweet that is “informative” about COVID-19, and how do they define one that is “uninformative”? Answer in your own words in the README.

What you have to do

1. Download Tweets

In hw06/data/, you should see three CSV files. These files contain Twitter IDs and their corresponding labels. That directory also contains starter code for downloading the data. You need to update the code by 1) using your own credentials and 2) catching the specific Exceptions that the API throws. The code includes descriptions of these Exceptions.

I’d recommend looking at the tweepy documentation to see what those exceptions are.
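The skip-or-retry pattern described above can be sketched without touching the real API. This is only an illustration, not the starter code: `download_all`, `fetch`, `skip_errors`, and `retry_errors` are hypothetical names, and the exception classes you pass in would be whichever tweepy exceptions the starter code describes (for example, one for deleted or private tweets and one for rate limits).

```python
import time


def download_all(tweet_ids, fetch, skip_errors=(), retry_errors=(),
                 wait_seconds=60.0, max_retries=3):
    """Fetch every ID with `fetch`, handling two kinds of failures.

    skip_errors:  exception classes meaning "this tweet is gone"; skip the ID.
    retry_errors: exception classes meaning "try again later"; wait, then retry.
    All names here are hypothetical; they are not part of tweepy.
    """
    texts, skipped = {}, []
    for tweet_id in tweet_ids:
        for attempt in range(max_retries + 1):
            try:
                texts[tweet_id] = fetch(tweet_id)
                break
            except skip_errors:
                # Permanent failure: record the ID and move on.
                skipped.append(tweet_id)
                break
            except retry_errors:
                if attempt == max_retries:
                    # Give up on this ID after the last allowed retry.
                    skipped.append(tweet_id)
                else:
                    time.sleep(wait_seconds)
    return texts, skipped
```

Catching only the specific exception classes, rather than a bare `except`, means unexpected bugs still surface instead of silently dropping tweets.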

Since this will take a while to run, you should run this on the CS lab machines (not directly on goldengate though!).

To keep the code running after you log off, I’d recommend using nohup. Personally, I like to use tmux or screen, but they are not currently installed on the lab machines.

2. Implement a DAN model in `models.py`.
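As a rough illustration of the idea, a deep averaging network embeds each token, averages the embeddings over the non-padding positions, and passes the average through feed-forward layers. The sketch below is one possible shape for such a model, not the interface `models.py` expects: the class name `DanClassifier`, the layer sizes, and the use of index 0 for padding are all assumptions.

```python
import torch
import torch.nn as nn


class DanClassifier(nn.Module):
    """Deep averaging network sketch (names and sizes are assumptions)."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=64,
                 num_classes=2, pad_idx=0, dropout=0.2):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of word indices, padded with pad_idx.
        embedded = self.embedding(token_ids)                       # (batch, seq_len, embed_dim)
        mask = (token_ids != self.pad_idx).unsqueeze(-1).float()   # (batch, seq_len, 1)
        summed = (embedded * mask).sum(dim=1)                      # ignore padding positions
        lengths = mask.sum(dim=1).clamp(min=1.0)                   # avoid division by zero
        averaged = summed / lengths                                # (batch, embed_dim)
        return self.classifier(averaged)                           # unnormalized class scores
```

Dividing by the count of real tokens rather than by `seq_len` keeps the sentence vector from shrinking as more padding is added to a batch.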

3. Implement a classifier in `models.py` that uses an RNN encoder.
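One common way to build such a classifier is to run the embedded tokens through an `nn.RNN` and classify from the final hidden state. The sketch below assumes a `forward(token_ids, lengths)` signature, padding index 0, and arbitrary layer sizes; the class name `RnnClassifier` is hypothetical, and the starter code may expect something different.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class RnnClassifier(nn.Module):
    """RNN-encoder classifier sketch (names and sizes are assumptions)."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=64,
                 num_classes=2, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, seq_len) padded indices; lengths: (batch,) true lengths.
        embedded = self.embedding(token_ids)
        # Packing makes the RNN stop at each sequence's real last token,
        # so the final hidden state is not computed over padding.
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, final_hidden = self.rnn(packed)        # (1, batch, hidden_dim)
        return self.output(final_hidden[-1])      # unnormalized class scores
```

Without packing, the last time step of a short sequence would be a padding position, which tends to wash out the sentence representation.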

4. Implement a classifier in `models.py` that uses a BiLSTM encoder.
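A bidirectional LSTM reads the sentence left to right and right to left, so one reasonable sentence representation is the concatenation of the two final hidden states. The following is a minimal sketch under assumed names (`BiLstmClassifier`, a `forward(token_ids, lengths)` signature, padding index 0, arbitrary sizes), not the required interface.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class BiLstmClassifier(nn.Module):
    """BiLSTM-encoder classifier sketch (names and sizes are assumptions)."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=64,
                 num_classes=2, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.output = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, seq_len) padded indices; lengths: (batch,) true lengths.
        embedded = self.embedding(token_ids)
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, (final_hidden, _) = self.lstm(packed)  # final_hidden: (2, batch, hidden_dim)
        # Index 0 is the forward direction, index 1 the backward direction.
        sentence = torch.cat([final_hidden[0], final_hidden[1]], dim=-1)
        return self.output(sentence)              # unnormalized class scores
```

Other pooling choices, such as max-pooling over all time steps, are also plausible; the concatenated final states are just the simplest option to sketch.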

Feedback (3 points)

In your README, answer the following questions:

How long did you spend on the assignment?
What did you learn by doing this assignment?
Briefly describe a Computational Text Analysis/Text as Data research question where using LDA would be useful (keep this answer to a maximum of 3 sentences).

Feel free to add additional/optional feedback, e.g. what did you like or dislike about the assignment?

Submitting

Submit the following files to the assignment called HW06 on Gradescope:

Make sure to name the Python file exactly as we specify here. Otherwise, our autograders might not work and we might have to take points off.