credit: Jordan Boyd-Graber
In this homework you’ll get more experience with PyTorch. You will be implementing two classifiers, one that encodes a sentence representation by averaging word vectors and one that encodes a sentence representation using an RNN. You will be downloading tweets directly from Twitter as well.
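The two sentence encoders described above can be sketched roughly as follows. This is only an illustrative sketch, not the starter code: the class names AveragingEncoder and RecurrentEncoder, the pad_idx convention, and all dimensions are assumptions made for this example, and the recurrent encoder uses an LSTM as one possible choice of RNN.

```python
import torch
import torch.nn as nn


class AveragingEncoder(nn.Module):
    """Hypothetical encoder: a sentence is the mean of its non-padding word vectors."""

    def __init__(self, vocab_size, embed_dim, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of word indices
        vectors = self.embedding(token_ids)                        # (batch, seq_len, embed_dim)
        mask = (token_ids != self.pad_idx).unsqueeze(-1).float()   # (batch, seq_len, 1)
        summed = (vectors * mask).sum(dim=1)                       # ignore padding positions
        lengths = mask.sum(dim=1).clamp(min=1.0)                   # avoid division by zero
        return summed / lengths                                    # (batch, embed_dim)


class RecurrentEncoder(nn.Module):
    """Hypothetical encoder: a sentence is the final hidden state of an LSTM.

    Note: for simplicity this sketch does not mask padding, so padded
    positions are fed through the LSTM like ordinary tokens.
    """

    def __init__(self, vocab_size, embed_dim, hidden_dim, bidirectional=False, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=bidirectional)

    def forward(self, token_ids):
        vectors = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(vectors)           # h_n: (num_directions, batch, hidden_dim)
        # Concatenate the final state of each direction along the feature axis.
        return torch.cat(list(h_n), dim=-1)       # (batch, num_directions * hidden_dim)
```

Either encoder's output could then be passed to a linear layer that produces the class scores. Setting bidirectional=True doubles the size of the sentence vector, so the classifier layer would need to account for that.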
This assignment is based on a Shared Task at the WNUT 2020 Workshop.
Partners: For this homework, you are allowed to work with a partner.
Question 0.1: According to the shared task paper linked above, in your own words, how do they define a tweet that is “informative” about COVID-19 and how do they define a tweet that is “uninformative” about COVID-19? Answer this question in the README.
What you have to do
In hw06/data/, you should see three CSV files. These files contain Twitter IDs and their corresponding labels. We also provided some starter code in that directory for you to download the data. You need to update the code by 1) using your own credentials and 2) making sure to catch the specific exceptions that the API throws. We provide descriptions of the exceptions in the code.
I’d recommend looking at the tweepy documentation for what those exceptions are.
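As a rough illustration of catching specific exceptions rather than using a bare except, here is a minimal sketch. It does not use tweepy itself: TweetUnavailable and RateLimited are hypothetical stand-ins for whichever exceptions the starter code and the tweepy documentation describe, and download_tweets, fetch, and wait are names invented for this example.

```python
class TweetUnavailable(Exception):
    """Hypothetical stand-in for an API error such as a deleted or private tweet."""


class RateLimited(Exception):
    """Hypothetical stand-in for an API rate-limit error."""


def download_tweets(tweet_ids, fetch, wait=lambda: None):
    """Fetch each tweet, skipping unavailable ones and retrying after rate limits.

    fetch: any callable that maps a tweet ID to its text (assumed for this sketch).
    wait:  callable invoked before retrying a rate-limited request, for example
           a function that sleeps until the rate-limit window resets.
    """
    texts, skipped = {}, []
    for tweet_id in tweet_ids:
        while True:
            try:
                texts[tweet_id] = fetch(tweet_id)
                break
            except RateLimited:
                wait()                    # pause, then retry the same ID
            except TweetUnavailable:
                skipped.append(tweet_id)  # record the ID and move on
                break
    return texts, skipped
```

Handling each exception type separately lets the script retry when the problem is temporary and skip the tweet when it is not, while any unexpected error still surfaces instead of being silently swallowed.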
Since this will take a while to run, you should run this on the CS lab machines (not directly on goldengate though!).
To keep the code running when you sign off, I’d recommend using nohup. Personally I like to use tmux or screen, but they are not installed on the lab machines currently.
models.py.
models.py that uses an RNN encoder.
models.py that uses a BiLSTM encoder.
In your README, answer the following questions:
Feel free to add additional/optional feedback, e.g. what did you like or dislike about the assignment?
Submit the following files to the assignment called HW06 on Gradescope:
Make sure to name the Python file exactly as we specify here. Otherwise, our autograders might not work and we might have to take points off.