Bryn Mawr College
CS 325: Computational Linguistics
Instructor: Deepak Kumar, 246B Park Hall, 526-7485
E-Mail: dkumar at cs brynmawr dot edu
Lecture Hours: Tuesdays & Thursdays, 2:15p to 3:45p
- Computer Science Lab Room 231 (Science Building)
Texts & Software
Speech and Language Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech Recognition, Second Edition, by Daniel
Jurafsky and James Martin, Prentice Hall Publishers, 2008.
August 30 : First lecture
September 29: Exam 1
November 8: Exam 2
December 8: Last lecture/Exam 3
- Homework (Due on Tuesday, September 6): Pick one of the language processing applications discussed in class and write a short, critical, reflective essay (a "blog" piece) on it. Doesn't have to be technical, but more reflective of the impact of such technology on the society, etc. 1-2 pages max.
- Assignment#2 (Due in class on Thursday, September 22): Click here for details.
- Assignment#3 is posted (Due on Thursday, September 29, 2011): Click here for details.
- Assignment#4 is posted (Due on Tuesday, October 25): Click here for details.
- Assignment#5 is posted (Due on Thursday, November 3): Click here for details.
- Assignment#6 is posted (Due on Tuesday, November 29): Click here for details.
- Week 1 (August 30, September 1)
August 30: Course Introduction. Overview of topics: Words, syntax,
semantics, discourse, etc.
September 1: Language structure, Formal models, applications. Linguistic knowledge: words, syntax, semantics, pragmatics, discourse. Formal models: regular languages, context free grammars, probabilistic models, logic. Applications: Language translation (Google Translate), Sentiment analysis (Truthy), Question answering (Watson), Word lens (iPhone/iPad app), Speech to text systems (Dragon Dictate). The classic pipeline model for linguistic processing. The ambiguity roadblock.
Read: Chapter 1 from Jurafsky & Martin.
Homework (Due on Tuesday, September 6): Pick one of the language processing applications discussed in class and write a short, critical, reflective essay (a "blog" piece) on it. Doesn't have to be technical, but more reflective of the impact of such technology on the society, etc. 1-2 pages max. Use your 'background and preparation' to set the tone/judgements/etc.
- Week 2 (September 6, 8)
September 6: Regular Expressions: for searching and specifying languages.
Basic elements of regular expressions: expressions, anchors, counters,
operator precedence, substitution, memory, examples.
Read: Chapter 2 from Jurafsky & Martin.
Do: Search in Google for "Microsoft Word regular expression search" and look for a link to a page at the office.microsoft.com site on using regular expressions in Word. Follow the tutorial, learn how to use regular expressions in Word, and note the small differences in the use and specification of patterns in Word vs. how we did them in class.
September 8: Putting regular expressions to work. The Python re library. Introduction to NLTK. How to acquire text, text corpora, and web pages.
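As a small illustration of the regular expression elements listed for this week (anchors, counters, substitution, memory), here is a sketch using Python's standard re module. The sample strings are made up for illustration and are not taken from the class notes.

```python
import re

text = "The cat sat on the mat. the end"

# Anchors: ^ matches at the start of the string, $ at the end.
starts_with_the = re.search(r"^The", text) is not None
ends_with_end = re.search(r"end$", text) is not None

# Counters: {2,} means "two or more" of the preceding element.
long_o = re.findall(r"o{2,}", "so soo sooo")

# Character classes and word boundaries: count "The" or "the" as whole words.
the_count = len(re.findall(r"\b[Tt]he\b", text))

# Memory with substitution: \1 and \2 refer back to the parenthesized groups.
swapped = re.sub(r"(\w+) and (\w+)", r"\2 and \1", "salt and pepper")

# Memory inside a pattern: \1 must repeat exactly what group 1 matched,
# which finds accidentally doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "note the the use of of patterns")

print(starts_with_the, ends_with_end, long_o, the_count, swapped, doubled)
```

Note that re.findall returns the contents of the group when the pattern has exactly one group, which is why the last call yields the doubled words themselves.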
Assignment#2 (Due in class on Thursday, September 22): Click here for details.
- Week 3 (September 13, 15)
September 13: No class today as Deepak is out of town (U. Kansas). But we will use a blended learning exercise: Read and do the Python/NLTK tutorials: Part 1, Part 2.
September 15: No class today as Deepak is out of town (Denver, CO). But we will use a blended learning exercise: Read and do the Python/NLTK tutorials: Part 1, Part 2.
Work on your Assignment#2 this week. Deepak will have limited e-mail connectivity but feel free to write for quick clarifications and questions. Please put the string "CS325" in your e-mail's subject header.
- Week 4 (September 20, 22)
September 20: Finite state automata, deterministic and non-deterministic FSAs. The equivalence of deterministic and non-deterministic FSAs. Formal Languages. The equivalence between regular expressions, regular languages, and finite state automata.
Read: Chapter 2 from Jurafsky & Martin.
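To make the equivalence between regular expressions and finite state automata concrete, the following sketch simulates a deterministic FSA from a transition table and checks it against the corresponding regular expression. The "sheep language" baa+! is used as the example; the function name accepts and the state numbering are choices made for this illustration.

```python
import re

def accepts(transitions, start, accepting, tape):
    """Run a deterministic FSA over the tape; a missing transition rejects."""
    state = start
    for symbol in tape:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting

# Transition table for the language b a a+ ! (state 4 is the only accepting state).
SHEEP = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any number of additional a's
    (3, "!"): 4,
}

for s in ["baa!", "baaaa!", "ba!", "baa", "abaa!"]:
    by_fsa = accepts(SHEEP, 0, {4}, s)
    by_regex = re.fullmatch(r"baa+!", s) is not None
    print(s, by_fsa, by_regex)
```

On each test string the automaton and the regular expression agree, which is the point of the equivalence. A non-deterministic FSA would need a set of current states instead of a single one.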
September 22: Morphology: Rules for inflectional and derivational morphology. Agreement: number, tense, gender, etc. Morphological parsing.
Read: Chapter 3 from Jurafsky & Martin.
Assignment#3 is posted (Due on Thursday, September 29, 2011): Click here for details.
- Week 5 (September 27, 29)
September 27: Morphological Parsing. Orthographic rules. Finite State Transducers.
Solutions to Assignment#2 are posted. Click here.
September 29: Exam 1 is today.
- Week 6 (October 4, 6)
October 4: Word and sentence segmentation. MaxMatch algorithm for word segmentation (!!). Hashtags and other applications of word segmentation. Doing empirical tests of the goodness of CL algorithms.
October 6: Measuring the effectiveness of MaxMatch on hashtag segmentation. Minimum Edit Distance and spelling correction. Non-deterministic word segmentation.
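A minimal sketch of greedy MaxMatch segmentation, assuming a small hand-made lexicon (the words below are illustrative, not the word list used in class). At each position it takes the longest lexicon word that matches and falls back to a single character when nothing matches.

```python
def max_match(text, lexicon):
    """Greedy left-to-right segmentation: always take the longest matching word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon word is found.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words

# Hypothetical toy lexicon for illustration.
LEXICON = {"the", "theta", "table", "down", "bled", "own"}

print(max_match("thetabledown", LEXICON))
```

With this lexicon the result is "theta bled own" rather than "the table down": the greedy choice of the longest first word cannot be undone later, which is one reason to test the algorithm empirically.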
Programs: minEditDistance, wordsegmentation
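For reference, minimum edit distance can be sketched with the standard dynamic programming table below. This is an independent illustration, not the minEditDistance program linked above; the sub_cost parameter is an assumption added so that both the unit-cost variant and the variant where a substitution costs 2 can be tried.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Dynamic programming edit distance with insert/delete cost 1."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i              # delete everything from source
    for j in range(1, m + 1):
        dist[0][j] = j              # insert everything from target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost) # substitution or copy
            )
    return dist[n][m]

print(min_edit_distance("intention", "execution"))              # unit substitution cost
print(min_edit_distance("intention", "execution", sub_cost=2))  # substitution costs 2
```

For spelling correction, one simple use is to rank candidate dictionary words by their distance to the misspelled word and propose the closest ones.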
- Week 7 (October 11, 13)
No classes, Fall Break!!
- Week 9 (October 18, 20)
October 18: Parts of speech, POS tagging. Tagsets: Penn TreeBank, C5, Brown. Approaches to POS tagging: rule-based, stochastic.
Watch: Grammar Rock Videos to review basic parts of speech.
Read: Chapter 5 from Jurafsky & Martin.
October 20: Part of Speech (POS) Tagging. Accessing and working with NLTK corpora.
Read/Do: Python for Linguists, Part3
Assignment#4 is posted (Due on Tuesday, October 25): Click here for details.
- Week 10 (October 25, 27)
October 25: Part-of-speech tagging, contd. Default, Regular Expression-based tagging, Ngram taggers. Cascading taggers. Evaluating accuracy of taggers.
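The idea of cascading taggers can be sketched in plain Python as follows: a unigram tagger answers first, a regular expression tagger handles unknown words, and a default tagger catches the rest. The tiny training and gold data, the rules, and the function names are all made up for illustration. NLTK offers its own tagger classes with a backoff mechanism, which this sketch does not use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy training data: lists of (word, tag) pairs.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
]

# Suffix rules for the regular expression tagger (first match wins).
REGEX_RULES = [(r".*ing$", "VBG"), (r".*ed$", "VBD"), (r".*s$", "NNS")]

def default_tagger(word):
    return "NN"                      # tag everything as a noun

def regex_tagger(word):
    for pattern, tag in REGEX_RULES:
        if re.match(pattern, word):
            return tag
    return None                      # no rule applies

def train_unigram(sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def cascade_tag(words, unigram_model):
    """Unigram tagger first, then regex tagger, then default tagger."""
    return [(w, unigram_model.get(w) or regex_tagger(w) or default_tagger(w))
            for w in words]

def accuracy(tagged, gold):
    correct = sum(1 for (_, t), (_, g) in zip(tagged, gold) if t == g)
    return correct / len(gold)

model = train_unigram(TRAIN)
gold = [("the", "DT"), ("cat", "NN"), ("walked", "VBD"), ("home", "RB")]
tagged = cascade_tag([w for w, _ in gold], model)
print(tagged, accuracy(tagged, gold))
```

Here "the" and "cat" come from the unigram model, "walked" from the suffix rule, and "home" from the default tagger, which gets it wrong against the gold tag.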
Solution hints for Assignment#4: web browser, your iPad.
October 27: HMM POS tagging.
Read: Chapter 5 from Jurafsky & Martin.
Read/Do: Python for Linguists, Part4 (Tagging)
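HMM tagging is usually decoded with the Viterbi algorithm. The sketch below runs it on a two-tag toy model; all probabilities are invented for illustration and are not estimated from any corpus. Plain probabilities are used for readability, whereas a practical tagger would work with log probabilities to avoid underflow.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observed words."""
    # best[t][s]: probability of the best path that ends in state s at position t
    best = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = (best[t - 1][prev] * trans_p[prev][s]
                          * emit_p[s].get(words[t], 0.0))
            back[t][s] = prev
    # Follow the back pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model with two tags: N (noun) and V (verb).
STATES = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {
    "N": {"dogs": 0.3, "fish": 0.6, "sleep": 0.1},
    "V": {"dogs": 0.1, "fish": 0.3, "sleep": 0.6},
}

print(viterbi(["dogs", "fish", "sleep"], STATES, START, TRANS, EMIT))
print(viterbi(["fish", "sleep"], STATES, START, TRANS, EMIT))
```

In the first sentence "fish" is tagged N even though V looks better locally after "dogs", because the following "sleep" is most likely a verb and the model prefers N before V.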
Assignment#5 is posted (Due on Thursday, November 3): Click here for details.
- Week 11 (November 1, 3)
November 1: Syntax: Grammars for sentence level constructs. Constituents, common sentence level constructs: declarative, imperative, Yes-No Questions, Wh-Questions. Context Free Grammars.
Read: Chapter 12 from Jurafsky & Martin
November 3: Context Free Grammars. Issues in using CFGs: Modifiers in Noun groups, Verbs, number and person agreement, verb subcategorization, auxiliaries, etc. Grammar Equivalence, Chomsky Normal Form.
Solution hints for Assignment#5: Click here.
- Week 12 (November 8, 10)
November 8: Exam 2 is today.
November 10: Parsing. Top-down and bottom-up parsing. Parsing algorithms/models: RTNs and ATNs, recursive descent, shift-reduce, CYK, and Earley.
Read: Chapter 13 from Jurafsky & Martin
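Of the algorithms listed, CYK is the shortest to write down because it works directly on a grammar in Chomsky Normal Form. The following recognizer is a sketch over a hypothetical toy grammar; the rule tables and names are illustrative only. It fills a table bottom-up, so it is an instance of bottom-up parsing.

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (hypothetical, for illustration).
# Binary rules map a pair of right-hand-side symbols to the possible left-hand sides.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules map a word to its possible preterminals.
LEXICAL = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}

def cyk_recognize(words, binary, lexical, start="S"):
    """Return True if the grammar derives the word sequence from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):                 # width of the span
        for i in range(0, n - span + 1):         # left edge
            j = i + span                         # right edge
            for k in range(i + 1, j):            # split point
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((left, right), set())
    return start in table[0][n]

print(cyk_recognize("the dog chased the cat".split(), BINARY, LEXICAL))
print(cyk_recognize("dog the chased".split(), BINARY, LEXICAL))
```

Storing back pointers alongside each nonterminal in the table would turn this recognizer into a parser that can recover the trees.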
- Week 13 (November 15, 17)
November 15: Parsing Algorithms: Top-down, bottom-up. Chart parsing, Earley Algorithm.
Read/Do: Python for Linguists, Part5 (Parsing)
Assignment#6 is posted (Due on Tuesday, November 29): Click here for details.
November 17: Language & Complexity. Chomsky Hierarchy. Pumping Lemma. Semantics. Meaning representations. First-order Predicate Calculus.
Read: Chapters 16 & 17.
- Week 14 (November 22, 24)
November 22: FOPC, contd. Garden Path sentences.
November 24: Happy Thanksgiving!!
- Week 15 (November 29, December 1)
November 29: Semantics. Syntax driven semantic analysis. Lambda reductions. Augmented CFGs. Examples. Quantifier scoping.
Read: Chapter 18.
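Lambda reductions in syntax-driven semantic analysis can be imitated with ordinary Python functions, as in the sketch below. Each word's meaning is either a constant or a function, and applying a function to its argument plays the role of a beta-reduction. The lexical entries, the example sentences, and the ASCII notation for the logical forms are illustrative choices, not the notation used in class.

```python
# Meanings of proper nouns and mass nouns are constants.
maharani = "Maharani"
meat = "Meat"

# Transitive verb: corresponds to the lambda term  \y.\x.Serves(x, y)
def serves(obj):
    return lambda subj: f"Serves({subj}, {obj})"

# Rule VP -> V NP: apply the verb meaning to the object meaning.
vp = serves(meat)            # now corresponds to  \x.Serves(x, Meat)

# Rule S -> NP VP: apply the VP meaning to the subject meaning.
sentence = vp(maharani)

# Quantified NP: "every" takes a restriction and a scope, both one-place predicates.
def every(restriction):
    return lambda scope: f"forall x. ({restriction('x')} -> {scope('x')})"

restaurant = lambda x: f"Restaurant({x})"
closed = lambda x: f"Closed({x})"
quantified = every(restaurant)(closed)

print(sentence)
print(quantified)
```

The quantified example shows why the NP meaning has to be the function in that case: the determiner supplies the quantifier and the variable, and the verb phrase is passed in as its scope.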
December 1: No class today. Deepak is out of town (at Purdue University).
Watch: IBM Watson takes on Champions of Jeopardy! Day 1 (Part1, Part2), Day 2 (Part1, Part2), Day 3 (Part1, Part2)
- Week 16 (December 6, 8)
December 6: Course Wrap up. Building Watson: A Discussion. Watson vs. Apple Siri.
Building Watson: An Overview of the DeepQA Project, AI Magazine, Fall 2010.
December 8: Exam 3 is today.
All graded work will receive one of the following grades: 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7,
1.3, 1.0, or 0.0. At the end of the semester, final grades will be calculated
as a weighted average of all grades according to the following weights:
Exam 1: 15%
Exam 2: 15%
Exam 3: 15%
Labs & Written Work: 55%
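As a worked example of the weighted average, the sketch below applies the weights above to a made-up set of component grades. The grade values and the key names are hypothetical; how individual labs and written assignments are combined into the single "Labs & Written Work" grade is not specified here and is not modeled.

```python
# Weights from the table above, written as fractions of the final grade.
WEIGHTS = {"exam1": 0.15, "exam2": 0.15, "exam3": 0.15, "labs_written": 0.55}

def final_grade(grades):
    """Weighted average of the component grades on the 4.0 scale."""
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)

# Hypothetical component grades for illustration.
example = {"exam1": 3.7, "exam2": 3.3, "exam3": 4.0, "labs_written": 3.7}

# 0.15 * (3.7 + 3.3 + 4.0) + 0.55 * 3.7 = 1.65 + 2.035 = 3.685
print(round(final_grade(example), 3))
```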
Text's Home Page (Jurafsky & Martin)
The Association for Computational Linguistics (ACL)
Computer Q&A demo
An online version
NLTK Home page
NLTK LITE Tutorials
NLTK LITE API Documentation
Created by firstname.lastname@example.org on
August 11, 2011.