credit: Jordan Boyd-Graber
In this homework you’ll get more experience with PyTorch. You will complete the notebook from Monday’s lab (03/13) and implement stochastic gradient ascent for logistic regression, then apply it to the task of determining whether documents are about hockey or baseball. Sound familiar? It should!
Indeed, you will be doing exactly the same thing on exactly the same data as in the previous homework. The only difference is that last time you had to implement logistic regression yourself, while this time you’ll be able to use PyTorch directly.
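As a rough orientation, logistic regression in PyTorch can be sketched as a single linear layer trained with stochastic gradient descent on the binary cross-entropy loss. Minimizing that loss is equivalent to gradient ascent on the log-likelihood. The toy feature matrix, labels, learning rate, and epoch count below are made up for illustration. They are not the hockey/baseball data or the settings expected in lr_pytorch.py.

```python
import torch

torch.manual_seed(0)  # make the toy run repeatable

# Hypothetical toy data: 4 "documents", 3 word-count features each.
X = torch.tensor([[2.0, 0.0, 1.0],
                  [3.0, 0.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [0.0, 3.0, 0.0]])
y = torch.tensor([[1.0], [1.0], [0.0], [0.0]])  # assumed coding: 1 = hockey, 0 = baseball

model = torch.nn.Linear(3, 1)              # weights and bias; outputs a logit
loss_fn = torch.nn.BCEWithLogitsLoss()     # applies the sigmoid internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    # Visit examples in random order, updating after each one ("stochastic").
    for i in torch.randperm(len(X)).tolist():
        optimizer.zero_grad()
        loss = loss_fn(model(X[i:i + 1]), y[i:i + 1])
        loss.backward()
        optimizer.step()

with torch.no_grad():
    probs = torch.sigmoid(model(X)).squeeze(1)   # P(hockey | document)
    predictions = (probs > 0.5).float()
```

On real data you would typically iterate over mini-batches from a DataLoader and evaluate on held-out documents rather than on the training set.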
What you have to do
Complete the notebook from lab. Fill in the missing code and answer the questions in the notebook.
In the last cells, we look at which words are most similar to father, king, africa, and baltimore according to the trained CBOW model.
In a new cell, look at the most similar words to any 5 words of your choosing. Then, in a new text/markdown cell, briefly describe what you found.
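To make "most similar" concrete, here is a minimal library-free sketch that ranks words by cosine similarity. It assumes cosine similarity is the measure in use, so check the notebook, which may use a different one. The function names and the three-dimensional toy vectors are hypothetical. In the notebook the vectors would come from your trained CBOW model instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings, k=3):
    """Return the k words closest to `word`, excluding the word itself."""
    query = embeddings[word]
    scores = [(other, cosine(query, vec))
              for other, vec in embeddings.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Invented toy vectors, for illustration only.
toy_embeddings = {
    "father": [0.9, 0.1, 0.0],
    "mother": [0.8, 0.2, 0.1],
    "king":   [0.7, 0.0, 0.6],
    "africa": [0.0, 0.9, 0.3],
}

neighbors = most_similar("father", toy_embeddings, k=2)
for neighbor, score in neighbors:
    print(f"{neighbor}: {score:.3f}")
```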
Some of the options in Reading 05 looked at biases in word embeddings. For example, the popular paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" showed that the vector created by subtracting woman from man was very close to the vector created by subtracting homemaker from computer programmer.
Using the man to woman analogy, come up with three additional words that might then show gender-based biases in the word embeddings you trained.
Make sure to create a new code cell where you test the new words. Then, in a new text/markdown cell, briefly describe the biases you discovered (if any).
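One possible way to run such a test, sketched under assumptions: build a gender direction by subtracting the woman vector from the man vector, then measure how strongly each candidate word aligns with it. A positive cosine means the word leans toward man, a negative one toward woman, and a value near zero suggests little alignment. The vectors and candidate words below are invented toy values, so they say nothing about real embeddings. Substitute the vectors from your trained model.

```python
import math

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy vectors, for illustration only.
toy = {
    "man":      [1.0, 0.2, 0.1],
    "woman":    [-1.0, 0.2, 0.1],
    "engineer": [0.4, 0.8, 0.1],
    "nurse":    [-0.5, 0.7, 0.2],
    "table":    [0.0, 0.1, 0.9],
}

# Direction pointing from "woman" toward "man".
gender_direction = subtract(toy["man"], toy["woman"])

scores = {word: cosine(toy[word], gender_direction)
          for word in ["engineer", "nurse", "table"]}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

Comparing each score against a few words you expect to be neutral gives a baseline for judging whether an alignment is notable.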
You can install PyTorch on one of the CS lab machines by running:
conda install pytorch=1.13 torchvision -c pytorch
This will install the CPU version of PyTorch 1.13.
If you run pip install -r requirements.txt, it will install the necessary packages.
In your README, answer the following questions:
Feel free to add additional/optional feedback, e.g. what did you like or dislike about the assignment?
Submit the following files to the assignment called HW05 on Gradescope:
lr_pytorch.py
README (this can be a PDF if you include figures, which are better than text). The README should include a link to your notebook as well.
Make sure to name the Python file exactly as specified here. Otherwise, our autograders might not work and we might have to take points off.