Skip to main content

Homework 0: Python Warm Up

Deadline: 2023-01-23 23:59:00EDT

credit: Jordan Boyd-Graber

The goal of this assignment is for you to get practice with programming in Python. You will be writing code to determine whether a poem is a limerick or not. To do this, we will be using the CMU pronunciation dictionary (which is covered in the second chapter of the NLTK book. NLTK is a common python library used for text processing.). You will also be using the NLTK library to tokenize text.

Note You will need to install nltk using pip and the CMU pronunciation dictionary. When you run limerick.py, you will see instructions on how to install the CMU pronunciation dictionary in python.

A limerick is defined as a poem with the form AABBA, where the A lines rhyme with each other, the B lines rhyme with each other (and not the A lines). (English professors may disagree with this definition, but that’s what we’re using here to keep it simple. There are also constraints on how many syllables can be in a line.)

Pre-course survey (2 points)

Please fill out this short survey. Your answers will help your instructor get to know you and learn what you hope to get out of this course. This will help me modify the course to best fit your interests.

Programming (40 points)

Look at the file limerick.py. Your job is to fill in the missing functions in that file so that it does what it’s supposed to do.

What does it mean for two words to rhyme? They should share the same sounds in their pronunciation except for their initial consonant sound(s). (This is a very strict definition of rhyming. This makes the assignment easier.) If one word is longer than the other, then the sounds of shorter word (except for its initial consonant cluster) should be a suffix of the sounds of the longer. To further clarify, when we say “one word is longer than the other”, we are using number of phonemes as the metric, not number of syllables or number of characters.

How do I know if my code is working? Run the provided unit tests provided in tests.py. Initially, many of them will fail. If your program is working correctly, all of them will pass. However, the converse is not true: passing all the tests is not sufficient to demonstrate that your code is working. Using tests.py is strongly encouraged, this is similar to how your code will be graded on gradescope.

How do I separate words from a string of text? Use the word_tokenize function.

What if a word isn’t in the pronouncing dictionary? Assume it doesn’t rhyme with anything and only has one syllable.

How ``hardened’’ should the code be? It should handle ASCII text with punctuation and whitespace in upper or lower case.

What if a word has multiple pronunciations? If a word like “fire” has multiple pronunciations, then you should say that it rhymes with another word if any of the pronunciations rhymes.

What if a word starts with a vowel? Then it has no initial consonant, and then the entire word should be a suffix of the other word.

What about end rhymes? End rhymes are indeed a rhyme, but they make for less interesting limericks. So “conspire” and “fire” do rhyme.

What about stress? The stresses of rhymes should be consistent.

What if a word has no vowels? Then it doesn’t rhyme with anything.

Extra Credit

Extra Credit (create new functions for these features; don’t put them in the required functions that will be run by the autograder):

Feedback (3 points)

In a README file, called README.txt, answer the following questions:

  1. How long did you spend on the assignment?
  2. What did you learn by doing this assignment?
  3. Briefly describe a Computational Text Analysis/Text as Data research question where it would be useful to be able to identify if a poem is a limerick or not (keep this answer to a maximum of 3 sentences).

Feel free to add additional/optional feedback, e.g. what did you like or dislike about the assignment?

Submitting

Submit the following files to the assignment called HW00 on Gradescope:

  1. limerick.py
  2. README.txt

Make sure to name these files exactly what we specify here. Otherwise, our autograders might not work and we might have to take points off.