Bryn Mawr College
CS 325: Computational Linguistics - Fall 2024
Assignment #4
Due before class on Wednesday, November 6
Description:
First, if you have not yet done it, do the Stochastic Tagging Lab.
Part 1. Using the tagged Brown Corpus in NLTK, do the following:
- What are the 12 most frequent tags?
- For each tag in the tag set, find the 10 most frequent words.
Do this ONLY with the Universal tag set, and use only the "news" section of the corpus.
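One possible starting point for Part 1 is a single pass over the tagged words that counts tags and, separately for each tag, the words carrying it. This is a sketch, not a reference solution: the function name `tag_stats` is hypothetical, and lower-casing words before counting (so "The" and "the" merge) is an assumption you may or may not want.

```python
from collections import Counter, defaultdict


def tag_stats(tagged_words, n_tags=12, n_words=10):
    """Return the n_tags most frequent tags and, for each tag,
    its n_words most frequent words.

    tagged_words: iterable of (word, tag) pairs. Words are lower-cased
    before counting (an assumption; drop .lower() to keep case distinct).
    """
    tag_counts = Counter()
    words_by_tag = defaultdict(Counter)
    for word, tag in tagged_words:
        tag_counts[tag] += 1
        words_by_tag[tag][word.lower()] += 1
    top_tags = tag_counts.most_common(n_tags)
    top_words = {tag: counts.most_common(n_words)
                 for tag, counts in words_by_tag.items()}
    return top_tags, top_words
```

After `nltk.download('brown')` and `nltk.download('universal_tagset')`, calling `tag_stats(brown.tagged_words(categories='news', tagset='universal'))` (with `from nltk.corpus import brown`) yields both answers. Because the Universal tag set defines 12 tags, the "12 most frequent" list is likely to be the whole set, ranked by frequency.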
Part 2. Using the tagging methods in NLTK presented in class, do the following:
- Build a good tagging system for tagging any tokenized text.
You may use any of the n-gram-based taggers and the cascading (backoff) option. Report your tagger's accuracy on at least two different texts: pick your own pre-tagged texts from the corpus, different from those used in earlier class examples. Measure accuracy in two ways: (1) using the entire text for both training and testing, and (2) using 90% of the text for training and the remaining 10% for testing.
- For the provided test texts (see below), show the output of your taggers after training them on a full existing NLTK tagged corpus. Highlight some of the words that were mistagged. Compare your tagger's output on the test texts against NLTK's HMM tagger (trained on a selected corpus) and NLTK's built-in tagger (nltk.pos_tag()).
- Test Texts: Test your trained tagging system on the two texts: Animals.txt and CS.txt.
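A minimal sketch of one way to approach Part 2 follows, under stated assumptions: all function names are hypothetical, the default tag `"NOUN"` presumes Universal tags (something like `"NN"` would suit the original Brown tags), and the HMM uses Lidstone smoothing with an assumed `gamma=0.1`. `build_backoff_tagger` chains NLTK's trigram, bigram, unigram, and default taggers through their `backoff` argument; `evaluate_both_ways` computes the two accuracy figures; `disagreements` lists tokens on which the taggers differ, which is one way to find candidate mistaggings to highlight.

```python
from nltk.probability import LidstoneProbDist
from nltk.tag import BigramTagger, DefaultTagger, TrigramTagger, UnigramTagger
from nltk.tag.hmm import HiddenMarkovModelTrainer


def build_backoff_tagger(train_sents, default_tag="NOUN"):
    """Cascade trigram -> bigram -> unigram -> default tag.
    Each tagger hands a token to its backoff when it has no answer.
    default_tag="NOUN" assumes the Universal tag set."""
    t0 = DefaultTagger(default_tag)
    t1 = UnigramTagger(train_sents, backoff=t0)
    t2 = BigramTagger(train_sents, backoff=t1)
    return TrigramTagger(train_sents, backoff=t2)


def accuracy(tag_fn, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold tag.
    tag_fn maps a list of tokens to a list of (token, tag) pairs."""
    correct = total = 0
    for sent in gold_sents:
        predicted = tag_fn([word for word, _ in sent])
        for (_, gold), (_, guess) in zip(sent, predicted):
            correct += gold == guess
            total += 1
    return correct / total if total else 0.0


def evaluate_both_ways(tagged_sents, train_fraction=0.9):
    """(1) Train and test on the whole text; (2) train on the first
    90% of the sentences and test on the remaining 10%."""
    sents = list(tagged_sents)
    full = accuracy(build_backoff_tagger(sents).tag, sents)
    cut = int(len(sents) * train_fraction)
    split = accuracy(build_backoff_tagger(sents[:cut]).tag, sents[cut:])
    return full, split


def train_hmm(train_sents, gamma=0.1):
    """Supervised HMM tagger. Lidstone smoothing (gamma is an assumed value)
    keeps unseen words from getting zero probability, as MLE would give."""
    trainer = HiddenMarkovModelTrainer()
    return trainer.train_supervised(
        train_sents,
        estimator=lambda fd, bins: LidstoneProbDist(fd, gamma, bins),
    )


def disagreements(token_sents, taggers):
    """Tag each tokenized sentence with every tagger (name -> tag function)
    and collect the tokens on which the taggers do not all agree."""
    report = []
    for tokens in token_sents:
        outputs = {name: tag_fn(tokens) for name, tag_fn in taggers.items()}
        for i, token in enumerate(tokens):
            tags = {name: out[i][1] for name, out in outputs.items()}
            if len(set(tags.values())) > 1:
                report.append((token, tags))
    return report
```

As a usage example, `brown.tagged_sents(categories=..., tagset='universal')` for a category of your choosing could feed `evaluate_both_ways`. A dictionary such as `{"backoff": backoff.tag, "hmm": hmm.tag, "pos_tag": lambda t: nltk.pos_tag(t, tagset='universal')}` could then be passed to `disagreements`. Here `nltk.pos_tag` needs its tagger model downloaded first, and the `tagset='universal'` mapping keeps all three outputs in the same tag set. Note that this 90/10 split takes the last 10% of sentences rather than a random sample.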
Notes
- You can use the NLTK tokenizers if needed. For stochastic taggers, it is a good idea to tag one sentence at a time (i.e., sentence boundaries are treated as new contexts). You may want to combine hand- and program-based tokenization if necessary, especially for the small test texts provided. Run your program on the texts provided.
- Work incrementally to accomplish the task.
- Try to document your thought process at each step.
- Once done, summarize the process by which you arrived at the final solution in the Report section of your Colab Notebook.
The Summary section should also contain the outcomes of your analyses of the outputs specified above. Finally, conclude the section with your own reflections on the exercise, the process, and how you arrived at the solution(s).
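The first note recommends tagging one sentence at a time. A minimal sketch of that, assuming a rough hand-written sentence-splitting rule (split after `.`, `!`, or `?` followed by whitespace) and NLTK's `wordpunct_tokenize`, which needs no downloaded models; the function names are hypothetical:

```python
import re

from nltk.tokenize import wordpunct_tokenize


def tokenize_sentences(text):
    """Split text into sentences after ., ! or ? followed by whitespace
    (a rough hand-written rule, assumed here), then tokenize each one."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [wordpunct_tokenize(s) for s in sentences if s]


def tag_by_sentence(tag_fn, text):
    """Tag each sentence separately so every sentence starts a new context.
    tag_fn maps a list of tokens to a list of (token, tag) pairs."""
    return [tag_fn(tokens) for tokens in tokenize_sentences(text)]
```

`wordpunct_tokenize` splits a contraction such as *don't* into `don`, `'`, `t`, which may hurt tagging. `nltk.sent_tokenize` and `nltk.word_tokenize` (which need the Punkt models) or hand corrections are alternatives. With a trained tagger, something like `tag_by_sentence(tagger.tag, open('Animals.txt').read())` would tag one of the test texts.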
WHAT TO HAND IN
Once completed, share the link to your Notebook with the instructor via e-mail. To do this, click the "Share" button (top right of the window). In the pop-up window, change the access to "Anyone with link", then copy the link and paste it into the e-mail.