credit: Jordan Boyd-Graber
The goal of this assignment is for you to get practice with programming in Python. You will be writing code to determine whether a poem is a limerick or not. To do this, we will be using the CMU pronunciation dictionary (which is covered in the second chapter of the NLTK book. NLTK is a common python library used for text processing.). You will also be using the NLTK library to tokenize text.
Note You will need to install nltk using
pip
and theCMU pronunciation dictionary
. When you runlimerick.py
, you will see instructions on how to install theCMU pronunciation dictionary
in python.
A limerick is defined as a poem with the form AABBA, where the A lines rhyme with each other, the B lines rhyme with each other (and not the A lines). (English professors may disagree with this definition, but that’s what we’re using here to keep it simple. There are also constraints on how many syllables can be in a line.)
Please fill out this short survey. Your answers will help your instructor get to know you and learn what you hope to get out of this course. This will help me modify the course to best fit your interests.
Look at the file limerick.py. Your job is to fill in the missing functions in that file so that it does what it’s supposed to do.
num_syllables
: look at how many syllables a word has. Implement this first. This will help you understand what the CMU dictionary data looks like. You don’t need to use it in implementing subsequent functions, but you can.rhyme
: detect whether two words rhyme or notlimerick
: given a candidate limerick, return whether it meets the constraint or not.
More requirements / information appear in the source file.What does it mean for two words to rhyme? They should share the same sounds in their pronunciation except for their initial consonant sound(s). (This is a very strict definition of rhyming. This makes the assignment easier.) If one word is longer than the other, then the sounds of shorter word (except for its initial consonant cluster) should be a suffix of the sounds of the longer. To further clarify, when we say “one word is longer than the other”, we are using number of phonemes as the metric, not number of syllables or number of characters.
How do I know if my code is working? Run the provided unit tests provided in tests.py. Initially, many of them will fail. If your program is working correctly, all of them will pass. However, the converse is not true: passing all the tests is not sufficient to demonstrate that your code is working. Using tests.py is strongly encouraged, this is similar to how your code will be graded on gradescope.
How do I separate words from a string of text? Use the word_tokenize
function.
What if a word isn’t in the pronouncing dictionary? Assume it doesn’t rhyme with anything and only has one syllable.
How ``hardened’’ should the code be? It should handle ASCII text with punctuation and whitespace in upper or lower case.
What if a word has multiple pronunciations? If a word like “fire” has multiple pronunciations, then you should say that it rhymes with another word if any of the pronunciations rhymes.
What if a word starts with a vowel? Then it has no initial consonant, and then the entire word should be a suffix of the other word.
What about end rhymes? End rhymes are indeed a rhyme, but they make for less interesting limericks. So “conspire” and “fire” do rhyme.
What about stress? The stresses of rhymes should be consistent.
What if a word has no vowels? Then it doesn’t rhyme with anything.
Extra Credit (create new functions for these features; don’t put them in the required functions that will be run by the autograder):
apostrophe_tokenize
that handles apostrophes in words correctly so
that “can’t” would rhyme with “pant” if the is_limerick
function used apostrophe\_tokenize
instead of word_tokenize
.guess_syllables
.limerick.txt
).
Add extra credit code as functions to the LimerickDetector
class, but don’t interfere with required functionality.In a README file, called README.txt
, answer the following questions:
Feel free to add additional/optional feedback, e.g. what did you like or dislike about the assignment?
Submit the following files to the assignment called HW00
on Gradescope:
limerick.py
README.txt
Make sure to name these files exactly what we specify here. Otherwise, our autograders might not work and we might have to take points off.