Bryn Mawr College
CS 325: Computational Linguistics
Lab Assignment #2
Due in class on Thursday, September 29, 2011

Description: Write a program to access the contents of a given text and then analyze it as follows:

Notes

  1. You should run the program on two texts of your choosing. For instance, you can get electronic texts from Project Gutenberg's website.
  2. In order to test your program, create a small text file of your own. Run the program on the larger texts only after making sure that it is complete and correct.
  3. You will need to eliminate the 'added' text downloaded from Project Gutenberg.
  4. What is a word? Think about this before you do anything. Arrive at a decision and write it down. Then encode it in your program. Do the same for what counts as a sentence.
  5. Work incrementally to accomplish the task.
  6. Remember that in this domain, the problems generally tend to be ill-defined and solutions also tend to be imperfect.
  7. This exercise is designed to help you face the above reality and yet explore and come up with your own solution(s) to the problem. In this particular instance though, you can get help from your text or other sources.
  8. Try to document your thought process at each step.
  9. Once done, write down the process by which you arrived at the final solution.
  10. Hand in a report containing the outcome of your analyses of the two texts, your well-commented program(s), and a sample output. Also, write a final section on your own reflections on the exercise, the process, and how you arrived at the solution(s). Is your solution general enough? For example, would it be able to extract the same information from another similar source? What changes/modifications would you require for another source?
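
As a starting point for notes 3 and 4, here is a minimal Python sketch, not a required or complete solution. It assumes the downloaded file marks the book's body with lines beginning "*** START OF" and "*** END OF" (the exact wording varies between Project Gutenberg files, so check yours), and it commits to one possible definition of a word and of a sentence. The names strip_gutenberg, words, and sentences are hypothetical; your own decisions may well differ.

```python
import re

# Assumption: the book's body sits between two marker lines that begin
# with "*** START OF" and "*** END OF". Older files may use other wording.
START_RE = re.compile(r"^\*\*\*\s*START OF.*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF.*$", re.MULTILINE)

# Assumption (one possible definition of a word): a run of ASCII letters,
# optionally joined by internal apostrophes or hyphens, e.g. "it's".
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def strip_gutenberg(raw):
    """Return the text between the start and end markers.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(raw)
    begin = start.end() if start else 0
    end = END_RE.search(raw, begin)
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()


def words(text):
    """Return all words in the text, lowercased."""
    return [w.lower() for w in WORD_RE.findall(text)]


def sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace.

    Imperfect by design: abbreviations such as "Mr. Smith" are split too.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# A small hand-made test text, in the spirit of note 2.
SAMPLE = (
    "Header junk\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK TEST ***\n"
    "It's a well-known fact. Is it? Yes!\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK TEST ***\n"
    "License junk\n"
)

body = strip_gutenberg(SAMPLE)
print(len(words(body)), "words in", len(sentences(body)), "sentences")
```

Try the functions on your own small test file first, and note where these definitions break down (abbreviations, numbers, accented letters, quoted dialogue). Those failures are good material for your report.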
