Python for Linguists
A Gentle Introduction to the Python Language
By Deepak Kumar

Part 5: Parsing

 

Top Down Parsing: Recursive Descent

In NLTK, you can learn the process of recursive descent parsing by first experimenting with the built-in graphical application:

>>> import nltk
 >>> nltk.app.rdparser()

A window like the one shown below will pop up (it might be behind your IDLE window):

You can see the CFG that is defined on the left. And on the right, at the bottom, it is showing the input sentence that is to be parsed. Next, using the "Step" or "Autostep" buttons, you can watch the entire recursive descent parse.

First, try the above parser with a few different sentences, including ones that may fail on the given grammar. To enter your own sentence, select the "Edit Sentence" option from the "Edit" menu. Do this a few times.

Next, you can try and define a new grammar (or modify the existing one). To, select the "Edit Grammar" option from the "Edit" menu.

You can also use the same parser in your Python programs as follows:

First, let us define a grammar of your own as shown below:

# A simple CFG
>>> grammar1 = nltk.parse_cfg(
   """
   S -> NP VP
   NP -> Det N | Det N PP | "I"
   VP -> V | V NP | V NP PP
   PP -> P NP
   V -> "saw" | "ate"
   Det -> "a" | "the"
   N -> "man" | "dog" |"telescope" | "park"
   P -> "in" | "under" | "with"
   """)

Next, define or instantiate the parser with this grammar as follows:

>>> parser = nltk.RecursiveDescentParser(grammar1)

Let us now create a test sentence and tokenize it:

>>> sent = "the dog saw a man in the park".split()
>>> sent
['the', 'dog', 'saw', 'a', 'man', 'in', 'the', 'park']

To get a parse from the parser that uses grammar1, do the following:

>>> parser.parse(sent)
Tree('S', 
[Tree('NP', [Tree('Det', ['the']), Tree('N', ['dog'])]),
Tree('VP', [Tree('V', ['saw']),
Tree('NP', [Tree('Det', ['a']), Tree('N', ['man']),
Tree('PP', [Tree('P', ['in']),
Tree('NP', [Tree('Det', ['the']), Tree('N', ['park'])])])])])])

In order to better see the structure returned, you can use the print command:

>>> print parser.parse(sent)
(S
 (NP (Det the) (N dog))
 (VP
 (V saw)
 (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))

Draw the above tree and confirm the parse.

The sentence we are using actually has multiple parses. The recursive descent parser can provide with any number of possible parses:

>>> for p in parser.nbest_parse(sent):
print p
(S (NP (Det the) (N dog)) (VP (V saw) (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park)))))) (S (NP (Det the) (N dog)) (VP (V saw) (NP (Det a) (N man)) (PP (P in) (NP (Det the) (N park)))))

Above we can see that there are two possible parses: The first one associates the preposition phrase ("in the park") with the object noun phrase (i.e. "a man in the park"); the second parse associates the prepositional phrase with the verb (that "saw" event took place in the park).

Bottom-up Parsing

In NLTK, you can learn the process of shift-reduce parsing by first experimenting with the built-in graphical application:

>>> import nltk
>>> nltk.app.srparser()

A window like the one shown below will pop up (it might be behind your IDLE window):

You can see the CFG that is defined on the left. And on the right, at the top, it is showing the input sentence that is to be parsed. Next, using the "Step" button, you can watch the entire parse.

This parser will most likely get stuck due to the choices it makes. You can backtrack and "help" it by chosing the "shift" or "Reduce" options to get a complete parse. This will definitely help you learn the parse process. SR parsers require conflict resolution strategies that are not implemented in this version.

First, try the above parser with a few different sentences, including ones that may fail on the given grammar. To enter your own sentence, select the "Edit Sentence" option from the "Edit" menu. Do this a few times.

Next, you can try and define a new grammar (or modify the existing one). To, select the "Edit Grammar" option from the "Edit" menu.

You can also use the same parser in your Python programs as follows:

parser = nltk.ShiftReduceParser(grammar1)

As in the case of recursive descent parser, the funcstions "parse" and "nbest_parse" are both available.

Chart Parsing: Bottom-up & Top-down

NLTK's chart parser can use the chart to do bottom-up as well as top-down processing. Once again, as above, first we will use the demo version of the bottom-up chart parser (tthe "2" in the first argument indicates that it is for bottom-up; you would specify "1" for a top-down chart parsing (see below).

>>> nltk.parse.chart.demo(2, should_print_times=False, trace=1, sent='I saw John with a dog', numparses=2)
 * Sentence:
 I saw John with a dog
 ['I', 'saw', 'John', 'with', 'a', 'dog']
* Strategy: Bottom-up
   |. I . saw . John . with . a . dog .|
   |[------] . . . . .| [0:1] 'I'
   |. [------] . . . .| [1:2] 'saw'
   |. . [------] . . .| [2:3] 'John'
   |. . . [------] . .| [3:4] 'with'
   |. . . . [------] .| [4:5] 'a'
   |. . . . . [------]| [5:6] 'dog'
   |> . . . . . .| [0:0] NP -> * 'I'
   |[------] . . . . .| [0:1] NP -> 'I' *
   |> . . . . . .| [0:0] S -> * NP VP
   |> . . . . . .| [0:0] NP -> * NP PP
   |[------> . . . . .| [0:1] S -> NP * VP
   |[------> . . . . .| [0:1] NP -> NP * PP
   |. > . . . . .| [1:1] Verb -> * 'saw'
   |. [------] . . . .| [1:2] Verb -> 'saw' *
   |. > . . . . .| [1:1] VP -> * Verb NP
   |. > . . . . .| [1:1] VP -> * Verb
   |. [------> . . . .| [1:2] VP -> Verb * NP
   |. [------] . . . .| [1:2] VP -> Verb *
   |. > . . . . .| [1:1] VP -> * VP PP
   |[-------------] . . . .| [0:2] S -> NP VP *
   |. [------> . . . .| [1:2] VP -> VP * PP
   |. . > . . . .| [2:2] NP -> * 'John'
   |. . [------] . . .| [2:3] NP -> 'John' *
   |. . > . . . .| [2:2] S -> * NP VP
   |. . > . . . .| [2:2] NP -> * NP PP
   |. [-------------] . . .| [1:3] VP -> Verb NP *
   |. . [------> . . .| [2:3] S -> NP * VP
   |. . [------> . . .| [2:3] NP -> NP * PP
   |[--------------------] . . .| [0:3] S -> NP VP *
   |. [-------------> . . .| [1:3] VP -> VP * PP
   |. . . > . . .| [3:3] PP -> * 'with' NP
   |. . . > . . .| [3:3] Prep -> * 'with'
   |. . . [------> . .| [3:4] PP -> 'with' * NP
   |. . . [------] . .| [3:4] Prep -> 'with' *
   |. . . . > . .| [4:4] Det -> * 'a'
   |. . . . [------] .| [4:5] Det -> 'a' *
   |. . . . > . .| [4:4] NP -> * Det Noun
   |. . . . [------> .| [4:5] NP -> Det * Noun
   |. . . . . > .| [5:5] Noun -> * 'dog'
   |. . . . . [------]| [5:6] Noun -> 'dog' *
   |. . . . [-------------]| [4:6] NP -> Det Noun *
   |. . . . > . .| [4:4] S -> * NP VP
   |. . . . > . .| [4:4] NP -> * NP PP
   |. . . [--------------------]| [3:6] PP -> 'with' NP *
   |. . . . [------------->| [4:6] S -> NP * VP
   |. . . . [------------->| [4:6] NP -> NP * PP
   |. . [---------------------------]| [2:6] NP -> NP PP *
   |. [----------------------------------]| [1:6] VP -> VP PP *
   |[=========================================]| [0:6] S -> NP VP *
   |. [---------------------------------->| [1:6] VP -> VP * PP
   |. [----------------------------------]| [1:6] VP -> Verb NP *
   |. . [--------------------------->| [2:6] S -> NP * VP
   |. . [--------------------------->| [2:6] NP -> NP * PP
   |[=========================================]| [0:6] S -> NP VP *
   |. [---------------------------------->| [1:6] VP -> VP * PP
 Nr edges in chart: 53
(S
   (NP I)
   (VP (VP (Verb saw) (NP John)) (PP with (NP (Det a) (Noun dog)))))
(S
   (NP I)
   (VP (Verb saw) (NP (NP John) (PP with (NP (Det a) (Noun dog))))))

The trace is useful to see if and allows you to peek inside the parse process. Since there isn't yet a chart visualizer, this is the best it can do. Try the above command for doing top-down parsing:

>>> nltk.parse.chart.demo(1, should_print_times=False, sent='I saw John with a dog', numparses=2)

Next, to instantiate your own bottom-up or top-down parser with a given grammer, you do the following:

First, as above, create the parser with the given grammer (we will use grammar1 from above):

parser = nltk.parse.BottomUpChartParser(grammar1)

Next, parse the sentence:

>>> print parser.parse(sent)
(S
 (NP (Det the) (N dog))
 (VP
   (V saw)
   (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))
 
>>> for p in parser.nbest_parse(sent):
       print p
(S
   (NP (Det the) (N dog))
   (VP
     (V saw)
     (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))
(S
   (NP (Det the) (N dog))
   (VP
     (V saw)
     (NP (Det a) (N man))
     (PP (P in) (NP (Det the) (N park)))))
 

As you can see from above, you get the same parses.

Chart Parsing: Top-Down

To create a top-down chart parser, you do the following:

>>> parser = nltk.parse.TopDownChartParser(grammar1)
>>> print parser.parse(sent)
(S
 (NP (Det the) (N dog))
 (VP
   (V saw)
   (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))

>>> for p in parser.nbest_parse(sent):
      print p
(S
   (NP (Det the) (N dog))
   (VP
     (V saw)
     (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))
(S
   (NP (Det the) (N dog))
   (VP
     (V saw)
     (NP (Det a) (N man))
     (PP (P in) (NP (Det the) (N park)))))

Earley Chart Parsing

Similarly, in case you want to use the Earley algorithm:

>>> parser = nltk.parse.EarleyChartParser(grammar1)
>>> print parser.parse(sent)
(S
 (NP (Det the) (N dog))
 (VP
 (V saw)
 (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))

>>> for p in parser.nbest_parse(sent): print p
 (S
   (NP (Det the) (N dog))
   (VP
     (V saw)
     (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))
 (S
   (NP (Det the) (N dog))
   (VP
     (V saw)
     (NP (Det a) (N man))
     (PP (P in) (NP (Det the) (N park)))))

Python for Linguists, Part 6... coming soon.