Python for Linguists
A Gentle Introduction to the Python Language
By Deepak Kumar

Part 1: Python Basics

This short tutorial provides a gentle introduction to the Python language. We will focus primarily on features of the language that are particularly useful for processing in the domain of computational linguistics. In addition to the standard Python language, we will also make use of the Python-based toolkit NLTK, which has several useful libraries for doing computational linguistics.

Using Python & NLTK

You will need to have a current version of Python installed on some computer (the sample sessions in this tutorial were run with Python 3.10, and current distributions of NLTK work on Python 3). It is already installed on the Computer Science servers. You can also get your own copy of Python for a computer you own or have access to. Installation instructions for the complete package, including how to get and install Python itself, are available at the NLTK site.

Once installed, you can run Python either from the command line (on Linux) or by using the integrated development environment, IDLE (recommended), which is available in all installations. To start Python from the command line on Linux, just enter the command below (in our examples, whatever you type is shown first and the system's responses follow it):

python

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>

The >>> is a Python prompt where you can enter any Python expression.

On Linux, to start IDLE, enter

idle

Or, on Windows or macOS, start the IDLE application by double-clicking on it. You will get a Python Shell window whose contents will be as shown below:

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>

Basic expressions and data types

The simplest expression you can enter is a value. For example:

>>> 42
42

When you enter a number, it evaluates to itself and its value (42) is returned. Python can handle several different types of values. The number 42 is called an integer value. You can also have fractional values which are represented as floating point values:

>>> 2.5
2.5

For our purposes, another useful type of value is a string. Any sequence of characters enclosed in single quotes (') or double quotes (") is a string. For example:

>>> 'Hello'
'Hello'

>>> 'Computational Linguistics'
'Computational Linguistics'

Values can be operated upon by various operators. For example, you can add, subtract, multiply, and divide numbers:

>>> 2 + 2
4
>>> 34 - 11
23
>>> 3.14159 * 100000
314159.0
>>> 7//3
2
>>> 7/3
2.3333333333333335

The results are more or less as expected, except perhaps for 7//3, which returned the answer 2. // is the integer division operator in Python: it rounds the quotient down to a whole number. Thus 7//3 gives 2, whereas 7/3 gives the floating point value 2.3333333333333335.
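The two division operators can be compared side by side. The following is a minimal sketch; the variable names are chosen here for illustration, and the % (remainder) operator is an addition that the text above does not discuss.

```python
# Compare ordinary division (/) with integer division (//).
numerator = 7
denominator = 3

true_quotient = numerator / denominator     # floating point result
whole_quotient = numerator // denominator   # quotient rounded down to a whole number
remainder = numerator % denominator         # what is left over (not covered in the text)

print(true_quotient)
print(whole_quotient)   # 2
print(remainder)        # 1

# The whole quotient and the remainder together recover the original number.
print(whole_quotient * denominator + remainder == numerator)   # True
```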

Some useful operations you can perform on strings are discussed next. You can concatenate two strings to get a single string using the '+' operator:

>>> 'Computational' + 'Linguistics'
'ComputationalLinguistics'
>>> 'Computational' + ' Linguistics'
'Computational Linguistics'

Notice that a blank (or a space, ' ') is also a character. Thus you have to include it explicitly if it is needed. Another useful operation on a string is the len function:

>>> len('Hello')
5
>>> len('Computational Linguistics')
25

The len function returns the length of the string (the number of characters that make up the string).
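Concatenation and len fit together in a predictable way: the length of a concatenated string is the sum of the lengths of its parts, with the space counted like any other character. A small sketch, with variable names chosen here for illustration:

```python
first = 'Computational'
second = 'Linguistics'

# The space between the words is a character in its own right.
phrase = first + ' ' + second

print(phrase)                                      # Computational Linguistics
print(len(first), len(second), len(phrase))        # 13 11 25
print(len(phrase) == len(first) + 1 + len(second)) # True
```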

Variables

You can store values by naming them as variables. For example:

>>> pi = 3.14159
>>> pi * 2
6.28318
>>> title = 'Computational Linguistics'
>>> course = 'CS325'
>>> course + ' ' + title
'CS325 Computational Linguistics'
>>> pi * 2
6.28318

Variables are useful for storing values you may need over the course of processing. Once a variable has been assigned a value, it can be used in any expression (as shown above). It is a good idea to use variable names that signify the nature of the value they hold. For instance, it would be really confusing to do the following:

>>> president = 'Computational Linguistics'

That is, using the variable name president to store the title of a course. Python leaves these choices up to you, so you have to choose variable names judiciously in order not to confuse yourself (or anyone else who may be reading your Python expressions).

You can also access individual elements of a string by indexing. For example:

>>> title = 'Computational Linguistics'
>>> course = 'CS325'
>>> title[0]
'C'
>>> course[2]
'3'

Notice that index numbering begins with the number 0 (the left-most character in a string). Thus course[2] refers to the third character (i.e. 3). Also, look carefully at the value returned by the indexing operator: it is also a string. Thus, course[2] is a string '3' and not the number 3. Using indexing, you can also take substrings:

>>> title[0:5]
'Compu'

>>> title[5:13]
'tational'

>>> title[:13]
'Computational'

>>> title[14:]
'Linguistics'

Study the above examples carefully and try some of your own to make sure you understand how indexing works.
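One way to check your understanding is to cut a string at some position and glue the two pieces back together. The sketch below assumes a cut position of 13 (the index of the space in title); the names split_at, left and right are made up for this illustration.

```python
title = 'Computational Linguistics'
split_at = 13   # illustrative cut position: the index of the space

left = title[:split_at]    # characters at indexes 0 through 12
right = title[split_at:]   # characters from index 13 to the end

print(repr(left))    # 'Computational'
print(repr(right))   # ' Linguistics' (the space belongs to the right piece)

# The end index is excluded, so the two pieces never overlap
# and together they give back the original string.
print(left + right == title)         # True

# A slice [start:end] contains end - start characters.
print(len(title[5:13]) == 13 - 5)    # True
```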

Lists

Values in Python can be combined into aggregates called lists. Lists are a sequential arrangement of values. They are written enclosed in square brackets, [...], with each value separated by a comma (,). For example:

>>> [42, 314, 75]
[42, 314, 75]

>>> FallCourses = ['CS109', 'CS206', 'CS231', 'CS240', 'CS223', 'CS325']

>>> FallCourses
['CS109', 'CS206', 'CS231', 'CS240', 'CS223', 'CS325']

You can also use the len function to find out the length of a list. Indexing, as illustrated above for strings, can also be applied to lists. Examples are shown below:

>>> FallCourses[0]
'CS109'

>>> FallCourses[0:3]
['CS109', 'CS206', 'CS231']

>>> FallCourses[:6]
['CS109', 'CS206', 'CS231', 'CS240', 'CS223', 'CS325']

>>> FallCourses[2:]
['CS231', 'CS240', 'CS223', 'CS325']

When specifying an index range (see above), if you leave out the starting index (as in FallCourses[:6]) it is assumed to be 0. Likewise, leaving out the ending index (as in FallCourses[2:]) starts from the start index and goes until the end of the list.
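The defaults for omitted indexes can be verified directly. This is a small sketch using the same list; the names count, first_three and from_third are chosen here for illustration.

```python
FallCourses = ['CS109', 'CS206', 'CS231', 'CS240', 'CS223', 'CS325']
count = len(FallCourses)

first_three = FallCourses[:3]   # start index omitted, so it defaults to 0
from_third = FallCourses[2:]    # end index omitted, so it runs to the end

print(first_three == FallCourses[0:3])      # True
print(from_third == FallCourses[2:count])   # True

# Indexing gives back a single element (here a string), while slicing
# always gives back a list, even when it holds only one element.
print(FallCourses[0])     # CS109
print(FallCourses[0:1])   # ['CS109']
```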

Like strings, lists can also be concatenated using the + operator. Thus,

>>> NewCourse = ['ESEM']
>>> FallCourses = NewCourse + FallCourses
>>> FallCourses
['ESEM', 'CS109', 'CS206', 'CS231', 'CS240', 'CS223', 'CS325']

Notice that the second command also illustrates the use of variables to store updated values. As shown above, the new value of FallCourses is the result of concatenating the list stored in NewCourse with the list that was stored in FallCourses.
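It may help to see that + builds a new list and that a variable only changes when something is assigned to it. The sketch below uses an extra variable, combined, which is introduced here purely for illustration.

```python
FallCourses = ['CS109', 'CS206', 'CS231', 'CS240', 'CS223', 'CS325']
NewCourse = ['ESEM']

# Concatenation produces a new list; neither operand is modified.
combined = NewCourse + FallCourses

print(combined)
print(FallCourses)   # still the original six courses
print(len(combined) == len(NewCourse) + len(FallCourses))   # True

# Only the assignment makes FallCourses refer to the longer list.
FallCourses = combined
print(FallCourses[0])   # ESEM
```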

You can also concatenate individual elements of a list, if they are strings:

>>> FallCourses[5]+FallCourses[6]
'CS223CS325'

You can also access the individual characters of a string stored in a list by double-indexing. For example:

>>> FallCourses[6][3]
'2'
>>> FallCourses[6][4]
'5'

You can also enquire about the number of items in a list:

>>> len(FallCourses)
7

Other useful operations on lists include sort and reverse:

>>> FallCourses.sort()
>>> FallCourses
['CS109', 'CS206', 'CS223', 'CS231', 'CS240', 'CS325', 'ESEM']

>>> FallCourses.reverse()
>>> FallCourses
['ESEM', 'CS325', 'CS240', 'CS231', 'CS223', 'CS206', 'CS109']

Notice the different syntax for applying sort and reverse: each is written after the list it applies to, separated by a dot. Also notice that the variable's value is changed in the process. Another useful operation on a list is a kind of search:

>>> FallCourses
['ESEM', 'CS325', 'CS240', 'CS231', 'CS223', 'CS206', 'CS109']
>>> FallCourses.index('CS325')
1

>>> FallCourses.index('CS372')
Traceback (most recent call last):
  File "<pyshell#66>", line 1, in <module>
    FallCourses.index('CS372')
ValueError: 'CS372' is not in list

The second example above shows that the call to index fails if asked for an item not present in the list.
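One way to avoid the error is to test for membership with the in operator before calling index. The following is only a sketch: the in operator and the conditional expression are not introduced in this part of the tutorial, and using -1 to mean "not found" is a convention assumed here for illustration.

```python
FallCourses = ['ESEM', 'CS325', 'CS240', 'CS231', 'CS223', 'CS206', 'CS109']

wanted = 'CS372'
is_offered = wanted in FallCourses   # True or False, never an error
print(is_offered)                    # False

# Call index only when the item is present; otherwise fall back
# to -1 (an assumed "not found" marker, not something index returns).
position = FallCourses.index(wanted) if is_offered else -1
print(position)                      # -1

# For an item that is present, the same pattern gives its position.
present = 'CS325'
print(FallCourses.index(present) if present in FallCourses else -1)   # 1
```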

The * operator is used to 'multiply' strings:

>>> 'boutros'*2
'boutrosboutros'
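A typical use of string multiplication is building a line of repeated characters whose length depends on another string. A minimal sketch, with the name underline chosen here for illustration:

```python
title = 'Computational Linguistics'

# Repeat the dash once for every character in the title.
underline = '-' * len(title)

print(title)
print(underline)
print(len(underline) == len(title))   # True
```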

split and join are also used often in the course of language processing applications:

>>> sentence = ['Mary', 'had', 'a', 'little', 'lamb.']
>>> sentence
['Mary', 'had', 'a', 'little', 'lamb.']

>>> s = ' '.join(sentence)
>>> s
'Mary had a little lamb.'

>>> s2 = s.split(' ')
>>> s2
['Mary', 'had', 'a', 'little', 'lamb.']

Here are some more examples:

>>> '***'.join(sentence)
'Mary***had***a***little***lamb.'

>>> s
'Mary had a little lamb.'
>>> s.split('a')
['M', 'ry h', 'd ', ' little l', 'mb.']

split and join are inverses of each other when they are used with the same separator. For example:

>>> s
'Mary had a little lamb.'

>>> t = s.split('a')
>>> t
['M', 'ry h', 'd ', ' little l', 'mb.']

>>> 'a'.join(t)
'Mary had a little lamb.'
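The choice of separator matters when the text contains runs of spaces. The sketch below uses a made-up line with irregular spacing and also calls split with no argument, a form not shown in the examples above, which splits on any run of whitespace.

```python
# An illustrative line with two spaces after 'Mary' and three after 'a'.
line = 'Mary  had a   little lamb.'

by_space = line.split(' ')   # every single space is a separator
by_whitespace = line.split() # runs of whitespace count as one separator

print(by_space)        # ['Mary', '', 'had', 'a', '', '', 'little', 'lamb.']
print(by_whitespace)   # ['Mary', 'had', 'a', 'little', 'lamb.']

# Joining with the same separator restores the original line exactly.
print(' '.join(by_space) == line)        # True

# After split() the extra spaces are gone, so the round trip differs.
print(' '.join(by_whitespace) == line)   # False
```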

Make sure you understand how these operations work. Later, we will see how split is used to process corpora containing several lines of text.

Python has a nice built-in help facility that you can use to get more information on the operations available. For example, to get more information on lists or strings, just enter

>>> help(list)
>>> help(str)

To get help on individual functions, you can try:

>>> help(str.split)
>>> help(list.append)

List Comprehensions

Lists are a versatile data structure that comes in handy in many different situations. Because lists are so ubiquitous and useful, Python also provides some features that make the construction of lists very easy. For example, if you have a list that contains a sentence:

>>> sentence = ['Mary', 'had', 'a', 'little', 'lamb']

You can compute a list that contains the lengths of each individual string (word) in it:

>>> [len(x) for x in sentence]
[4, 3, 1, 6, 4]
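The same pattern works with any expression in place of len(x). The sketch below reuses the sentence and builds two further lists; the names lengths, initials and doubled are chosen here for illustration, and sum is a built-in function not otherwise used in this tutorial.

```python
sentence = ['Mary', 'had', 'a', 'little', 'lamb']

lengths = [len(word) for word in sentence]    # one length per word
initials = [word[0] for word in sentence]     # first character of each word
doubled = [word * 2 for word in sentence]     # each word repeated twice

print(lengths)    # [4, 3, 1, 6, 4]
print(initials)   # ['M', 'h', 'a', 'l', 'l']
print(doubled)    # ['MaryMary', 'hadhad', 'aa', 'littlelittle', 'lamblamb']

# The result is an ordinary list, so list operations apply to it.
print(sum(lengths))   # 18
```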

We will return to list comprehensions after discussing some control structures.

Dictionaries

Dictionaries provide an efficient way to look up items based on some key value. In a dictionary, an expression x:y represents a key/value pair where x is the key and y is the value. A dictionary is surrounded by curly braces ({}) and key/value pairs are separated by commas. The value associated with a given key can be found by indexing the dictionary with that key using the [] operator. A new key/value pair can be added to a dictionary by indexing the dictionary with the new key and assigning the new value to it. One final consideration concerns order: since Python 3.7, a dictionary keeps its entries in the order in which the keys were first inserted (earlier versions made no guarantees about the order).

A few examples of these operations are shown below.

>>> wordlist = {}
>>> wordlist['Mary'] = 'name'
>>> wordlist['had'] = 'verb'
>>> wordlist['a'] = 'det'
>>> wordlist['little'] = 'adj'
>>> wordlist['lamb'] = 'noun'

You can now display the value of wordlist (in Python 3.7 and later, the entries appear in the order in which they were inserted):

>>> wordlist
{'Mary': 'name', 'had': 'verb', 'a': 'det', 'little': 'adj', 'lamb': 'noun'}

Alternatively, you could have assigned wordlist the same key/value pairs in a single step:

>>> wordlist = {'a': 'det', 'lamb': 'noun', 'little': 'adj', 'had': 'verb', 'Mary': 'name'}
>>> wordlist
{'a': 'det', 'lamb': 'noun', 'little': 'adj', 'had': 'verb', 'Mary': 'name'}

You can access the values for each key as follows:

>>> wordlist['lamb']
'noun'

>>> wordlist.keys()
dict_keys(['a', 'lamb', 'little', 'had', 'Mary'])

>>> 'Mary' in wordlist
True

For more information on dictionaries and the operations available for them, enter help(dict).
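Putting lists and dictionaries together, the wordlist can be used to look up a category for every word in a sentence. This is only a sketch: the get method with a default value is not covered above, and the default string 'unknown' is an assumption made for this illustration.

```python
wordlist = {'Mary': 'name', 'had': 'verb', 'a': 'det',
            'little': 'adj', 'lamb': 'noun'}
sentence = ['Mary', 'had', 'a', 'little', 'lamb']

# Look up the category of each word in turn.
tags = [wordlist[word] for word in sentence]
print(tags)   # ['name', 'verb', 'det', 'adj', 'noun']

# Indexing with a missing key raises an error, so test with 'in' first
# or use get, which returns a default (here the made-up 'unknown').
print('sheep' in wordlist)                  # False
print(wordlist.get('sheep', 'unknown'))     # unknown
print(wordlist.get('lamb', 'unknown'))      # noun
```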

Continued in Part 2.