
ArPython

Version 1.1.7

20 February 2007

Schlumberger Private

Contents

References ......................................................................................................................................... 4
Python language essentials ................................................................................................................ 4
Lexical structure ............................................................................................................................. 4
Objects ........................................................................................................................................... 5
Names ........................................................................................................................................ 7
Assignment................................................................................................................................. 7
lists ..............................................................................................................................................
Dictionaries .................................................................................................................................
sets ........................................................................................................................................... 15
Flow control ................................................................................................................................. 17
while ......................................................................................................................................... 17
for ............................................................................................................................................. 17
iterators and generators .......................................................................................................... 25
list comprehensions ................................................................................................................. 32
Overview ......................................................................................................................................
With.......................................................................................................................................... 33
Functions ...................................................................................................................................... 43
Classes .......................................................................................................................................... 50
Why Decorate? ........................................................................................................................ 58
Creating Decorators ................................................................................................................. 59
Objects as Decorators .............................................................................................................. 59
Stacking Decorators ................................................................................................................. 60
Functions as Decorators .......................................................................................................... 61
Decorators with Arguments ..................................................................................................... 62
Function Attributes .................................................................................................................. 63
Practicing "Safe Decs" .............................................................................................................. 63
Putting It All Together .............................................................................................................. 64
Conclusion ................................................................................................................................ 66
Modules, Packages, paths and imports ........................................................................................... 74
Setting import path of a module to any directory ....................................................................... 74
Notes on .pth files ........................................................................................................................ 75
Documentation ................................................................................................................................ 75
Pydoc ............................................................................................................................................ 75
Docutils ........................................................................................................................................ 75
Distributions of Python Software .................................................................................................... 75
The Working Set ........................................................................................................................... 77
The Environment.......................................................................................................................... 77
Python Eggs .................................................................................................................................. 78
Initialization, Development, and Deployment ............................................................................. 78
Distutils ........................................................................................................................................ 80
Conversion to Executables ............................................................................................................... 80
PyInstaller .................................................................................................................................... 80
Py2exe .......................................................................................................................................... 81
Recipe for plain python (console) ............................................................................................ 81
Recipe for wxpython (windows) .............................................................................................. 82
How does py2exe work and what are all those files? ............................................................. 82
CGI .................................................................................................................................................... 84
CGI Web Applications with Python, Part One.............................................................................. 86
Opening word documents from Python ........................................................................................ 107
Quick recipe ............................................................................................................................... 107
More detailed recipe ................................................................................................................. 107
Printing word files ...................................................................................................................... 108
Patterns .......................................................................................................................................... 109
Typology ..................................................................................................................................... 109
mvc ......................................................................................................................................... 119
DQueues......................................................................................................................................... 119
deque objects ........................................................................................................................... 119
Recipes ................................................................................................................................... 123
Heap queue .................................................................................................................................... 125
Event scheduler.............................................................................................................................. 130
Mutual exclusion support .............................................................................................................. 131
Coroutines and Threading ............................................................................................................. 132
Threading ................................................................................................................................... 132
Queue....................................................................................................................................... 133
Python threads - a first example............................................................................................ 135
THE SAME APPLICATION, WRITTEN USING PYTHON THREADS ............................................. 137
Understanding Threading in Python .......................................................................................... 141
My own coroutine stuff ............................................................................................................. 166
Module threadpool .................................................................................................................... 177
APPENDIX: Type Hierarchy............................................................................................................. 177
Basic date and time types .............................................................................................................. 188
Decimal floating point arithmetic .................................................................................................. 190

References
PyERef   Python Essential Reference (Beazley)
PyNut    "Python in a Nutshell", 2nd ed., by Alex Martelli
PyCBK    "Python Cookbook", 2nd ed., by Alex Martelli
PyLib    Python 2.5 Library Reference
PracPy   Practical Python
PyTut    Python 2.5 Tutorial
Python language essentials

Lexical structure

A program consists of logical lines (each made of one or more physical lines); indentation of contiguous logical lines denotes blocks of statements. Each indentation level conventionally uses four spaces. Tabs are replaced by up to 8 spaces; it is best to configure the editor to expand tabs to spaces automatically.

The character set is ASCII by default; other character sets are allowed if the first comment of
the file defines the codec (e.g. utf-8 or iso-8859-1) with one of the following coding directives:
# -*- coding: utf-8 -*-

or

# encoding: utf-8

Tokens are the elementary lexical components resulting from the breakup of a logical line:
identifiers, keywords, operators, delimiters, and literals.

A source file is a sequence of simple and compound statements. A simple statement lies entirely
within a logical line and can be, for instance, an expression or an assignment. A compound
statement controls one or more other statements and their execution, by means of one or more
clauses. A clause has a header of the form <keyword> ... : and a body made of a single statement
or a block of statements.

Objects
All data values are objects. Every object has a

Type, which determines which operations are supported, as well as the attributes
(properties) and items associated with the object (for instance, methods, as covered
later on) and whether the object's value can be altered (mutability).

Unique identity, which is an integer that corresponds to the object's location in
memory.

Content (also called value).

An object that contains references to other objects is said to be a container or collection.

Some objects allow you to change their content (without changing the identity or the type, that
is). Some objects don't allow you to change their content (more below).

You cannot change the identity.

You cannot change the type. Actually, this is quite subtle:

The type is represented by a type object, which knows more about objects of this type
(how many bytes of memory they usually occupy, what methods they have, etc.). You can,
in fact, change the type under some rather limited circumstances: the internal structure
(the "C-level type") of the types involved must be identical. The types must (to quote the
checkin messages):

- have the same basic size
- have the same item size
- have the same dict offset
- have the same weaklist offset
- have the same GC flag bit
- have a common base that is the same except for maybe the
  dict and weaklist (which may have been added separately at
  the same offsets in both types)
- both be heap types

This basically limits the feature to classes defined at the Python level (just like before the
type/class unification); most attempts to use arbitrary types will fail, e.g.

>>> x.__class__ = dict
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: __class__ assignment: only for heap types

>>> class c(list):
...     pass
...
>>> x.__class__ = c
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: __class__ assignment: 'a' object layout differs from 'c'

Objects may also have this:

zero or more methods (provided by the type object)

zero or more names

Some objects have methods that allow you to change the contents of the object (modify it in place, that is).
Some objects only have methods that allow you to access the contents, not change them. Some objects don't have any
methods at all. Even if they have methods, you can never change the type, nor the identity.

Things like attribute assignment and item references are just syntactic sugar (more below).

Names
Names are a bit different: they're not really properties of the object, and the object itself doesn't know what it's
called. An object can have any number of names, or no name at all.

Names live in namespaces (such as a module namespace, an instance namespace, a function's local namespace).
Namespaces are collections of (name, object reference) pairs (implemented using dictionaries).
When you call a function or a method, its namespace is initialized with the arguments you call it with (the names are
taken from the function's argument list, the objects are those you pass in).

Assignment
Assignment statements modify namespaces, not objects. In other words,

name = 10

means that you're adding the name "name" to your local namespace, and making it refer to an integer object containing
the value 10.

If the name is already present, the assignment rebinds the name:

name = 10
name = 20

means that you're first adding the name "name" to the local namespace, and making it refer to an integer object
containing the value 10. You're then rebinding the name, making it point to an integer object containing the value 20.
The original "10" object isn't affected by this operation, and it doesn't care.

In contrast, if you do:

name = []
name.append(1)

you're first adding the name "name" to the local namespace, making it refer to an empty list object; this modifies the
namespace. You're then calling a method on that object, telling it to append an integer object to itself. This modifies the
content of the list object, but it doesn't touch the namespace, and it doesn't touch the integer object.

Things like name.attr and name[index] are just syntactic sugar for method calls. The first corresponds to
__setattr__/__getattr__, the second to __setitem__/__getitem__ (depending on which side of the assignment they
appear on).
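The distinction between modifying a namespace and modifying an object can be made visible with the identity test. A minimal sketch (runs under both Python 2 and 3):

```python
a = [1, 2]
b = a                   # two names, one list object
a.append(3)             # modifies the object's content, not the namespace
assert b == [1, 2, 3]   # the change is visible through the other name
assert a is b           # still the very same object

a = [1, 2, 3]           # rebinds the name "a" to a brand-new list
assert a == b           # equal content...
assert a is not b       # ...but a different object (different identity)
b.append(4)
assert a == [1, 2, 3]   # the new object is unaffected by changes to the old one
```

Mutation through one name is seen through every other name bound to the same object; rebinding a name never affects the object it previously referred to.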

The built-in object type is the base class for all new-style classes; calling object() returns a
bland, featureless instance.

Types may be built-in or user-defined. The built-in types are:

None
Numbers          int, long, float, complex, bool
Sequences        list, tuple, str, xrange, unicode, basestring
Mapping          dict
Sets             set, frozenset
Callable         object, types (see Appendix)
Modules          types.ModuleType
Classes          object
Types            type
Files            file
Internal         types (see Appendix)
Classic classes  types.ClassType, types.InstanceType

The built-in type(obj) accepts any object as its argument and returns the type object that is the type
of obj (built-in or in the types module). Be aware of the polymorphic nature of the type built-in:
it also accepts the form type(name, bases, dict) to create a new type object (the same as defining a
new class).

t = (1, 2, 3)
l = [1, 2, 3]
d = dict(a=1, b=2)
s = set()

class C:
    pass

class Cnew(object):
    pass

def name(x):
    return x.__name__

print "type: %s" % type(1)
print "type: %s" % type("")
print "type: %s" % type(l)
print "type: %s" % type(t)
print "type: %s" % type(s)
print "type: %s" % type(d)
print "type: %s" % type(object)
print "type: %s" % type(name)
print "type: %s" % type(C)
print "type: %s" % type(Cnew)

emits:

type: <type 'int'>
type: <type 'str'>
type: <type 'list'>
type: <type 'tuple'>
type: <type 'set'>
type: <type 'dict'>
type: <type 'type'>
type: <type 'function'>
type: <type 'classobj'>
type: <type 'type'>

The attributes associated with an object may be retrieved using the dir() built-in. Some objects,
like classes, have the attribute __name__, while others do not. Some objects, in particular
user-defined objects, allow the __name__ attribute to be set.

For instance,

def setName(obj, x):
    obj.__name__ = str(x)

def getName(x):
    return x.__name__

c = C()
#c.__name__ = 'my name is c'
#setName(c, 'my name is bim bam bola')
print "name: %s" % getName(C)
print "name: %s" % getName(c)

#emits:
name: C
Traceback (most recent call last):
  File "D:\Python\lib\dump.py", line 47, in getName
    return x.__name__
AttributeError: 'C' object has no attribute '__name__'

However, if either of the commented-out lines setting the name is enabled, it emits:

name: C
name: my name is bim bam bola

There are objects that do not allow the __name__ attribute to be set, such as instances of built-in
types (e.g. lists): you get the following error when trying to do that on a list l=[1,2,3]:

Traceback (most recent call last):
    l.__setattr__('__name__', 'hallo')
AttributeError: 'list' object has no attribute '__name__'

and dir(l) renders:

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__',
'__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__',
'__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__', 'append',
'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

A similar thing happens with a dictionary or a set; dir(d) renders:

['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__',
'__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__',
'__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setitem__', '__str__', 'clear', 'copy', 'fromkeys', 'get',
'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault',
'update', 'values']

The following test script renders interesting results:

#testattributes.py

#class L(object, list): # would raise TypeError: Error when calling the metaclass bases:
#   Cannot create a consistent method resolution order (MRO) for bases object, list
class L(list):
    pass

l = L([1, 2, 3])
x = l.pop()
print l

print dir(L) # does not return attribute __name__ although it exists
print L.__name__

#L.__setattr__(L, '__name__', 'ANTONIO') # would raise: TypeError: can't apply this __setattr__ to type object
#l.__setattr__('__name__', 'ANTONIO') # would raise: AttributeError: 'list' object has no attribute '__name__'
#print l.__name__ # would raise: AttributeError: 'L' object has no attribute '__name__'
l.__dict__['__name__'] = 'antonio' # works
print l.__name__

Li = [1, 2, 3]
print dir(Li)
#print Li.__name__ # would raise: AttributeError: 'list' object has no attribute '__name__'
#Li.__dict__['__name__'] = 'antonio' # would raise: AttributeError: 'list' object has no attribute '__dict__'

"""
results:
[1, 2]
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__',
'__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getslice__',
'__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__',
'__str__', '__weakref__', 'append', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']
L
antonio
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__',
'__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getslice__',
'__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__',
'__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse',
'sort']
"""

Data values (objects) are accessed through references; a reference is a name that refers to an
object's location in memory, taking the form of a variable, an object attribute, or an item.

A name is bound to an object through an assignment statement (name=value in the simplified form,
or more generically target=expression). Generically, the target may be an identifier, an
attribute reference (obj.name, where name is an attribute name of the object), an
indexing (obj[expr]), or a slicing (obj[start:stop:stride]).

References may be unbound using a del statement, followed by one or more target references.

Objects can be (arbitrarily) categorized as

collections
  o lists
  o dictionaries
  o sets

sets
A set object is an unordered collection of immutable values. Common uses include membership
testing, removing duplicates from a sequence, and computing mathematical operations such as
intersection, union, difference, and symmetric difference.

Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection,
sets do not record element position or order of insertion. Accordingly, sets do not support
indexing, slicing, or other sequence-like behavior.

There are currently two built-in set types, set and frozenset. The set type is mutable -- the
contents can be changed using methods like add() and remove(). Since it is mutable, it has
no hash value and cannot be used as either a dictionary key or as an element of another set. The
frozenset type is immutable and hashable -- its contents cannot be altered after it is created;
however, it can be used as a dictionary key or as an element of another set.

set([iterable])

Return a set whose elements are taken from iterable. The elements must be immutable.
To represent sets of sets, the inner sets should be frozenset objects. If iterable is not
specified, returns a new empty set, set([]).

Instances of set and frozenset provide the following operations:

Operation                   Equivalent  Result
len(s)                                  cardinality of set s
x in s                                  test x for membership in s
x not in s                              test x for non-membership in s
s.issubset(t)               s <= t      test whether every element in s is in t
s.issuperset(t)             s >= t      test whether every element in t is in s
s.union(t)                  s | t       new set with elements from both s and t
s.intersection(t)           s & t       new set with elements common to s and t
s.difference(t)             s - t       new set with elements in s but not in t
s.symmetric_difference(t)   s ^ t       new set with elements in either s or t but not both
s.copy()                                new set with a shallow copy of s
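The operations in the table above can be exercised directly; a quick sketch (runs under both Python 2 and 3):

```python
s = set([1, 2, 3])
t = set([3, 4])

assert len(s) == 3                  # cardinality
assert 2 in s and 5 not in s        # membership tests
assert set([1, 2]).issubset(s)      # same as: set([1, 2]) <= s
assert s.issuperset(set([3]))       # same as: s >= set([3])
assert s | t == set([1, 2, 3, 4])   # union
assert s & t == set([3])            # intersection
assert s - t == set([1, 2])         # difference
assert s ^ t == set([1, 2, 4])      # symmetric difference

c = s.copy()                        # shallow copy: equal but distinct object
assert c == s and c is not s

f = frozenset(s)                    # the immutable, hashable variant
d = {f: "ok"}                       # usable as a dictionary key
assert d[f] == "ok"
```

Note that the operator forms (|, &, -, ^) require both operands to be sets, while the method forms accept any iterable.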

Flow control
A program's control flow is the order in which the program's code executes. The control flow of a
Python program is regulated by conditional statements, loops, and function calls.

Conditional and loop control flow statements are: if, while, for.

while
The while statement in Python supports repeated execution of a statement or block of
statements that are controlled by a conditional expression. Here's the syntax for the while
statement:

while expression:
statement(s)

A while statement can also include an else clause.
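A small runnable sketch of while with an else clause; the else body runs only when the loop condition becomes false, not when the loop is left via break (the helper name is made up for illustration):

```python
def has_factor_below(n, limit):
    # scan candidate divisors with a while loop
    k = 2
    while k < limit:
        if n % k == 0:
            break       # leaves the loop; the else clause is skipped
        k += 1
    else:
        return False    # reached only when the condition turned false
    return True

assert has_factor_below(15, 10) is True    # 3 divides 15 -> break taken
assert has_factor_below(13, 10) is False   # condition ran out -> else taken
```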

for
The for statement in Python supports repeated execution of a statement, or block of statements,
controlled by an iterable expression. Here's the syntax for the for statement:
for target in iterable:
statement(s)

The in keyword is part of the syntax of the for statement and is distinct from the in operator,
which tests membership. A for statement can also include an else clause.
iterable may be any Python expression suitable as an argument to the built-in function iter, which
returns an iterator object (iterators are covered below). In particular, any sequence is iterable.


target is normally an identifier that names the control variable of the loop; the for statement

successively rebinds this variable to each item of the iterator, in order.



The statement or statements that make up the loop body execute once for each item in
iterable (unless the loop ends because an exception is raised or a break or return statement
executes). Note that, since the loop body may contain a break statement to terminate the loop,
this is one case in which you may want to use an unbounded iterable, one that, per se, would
never cease yielding items.
You can also have a target with multiple identifiers, as with an unpacking assignment. In this case,
the iterator's items must then be iterables, each with exactly as many items as there are
identifiers in the target.
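For example, a target with multiple identifiers, where each item of the iterable is itself a pair that gets unpacked:

```python
pairs = [("a", 1), ("b", 2), ("c", 3)]

keys = []
total = 0
for name, value in pairs:   # each two-item tuple is unpacked into two identifiers
    keys.append(name)
    total += value

assert keys == ["a", "b", "c"]
assert total == 6
```

If any item has more or fewer elements than there are identifiers in the target, the unpacking raises a ValueError.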

Some interesting comments:

understanding Python's "for" statement

One of the things I noticed when skimming through the various reactions to my recent "with"-article
is that some people seem to have a somewhat fuzzy understanding of Python's other block
statement, the good old for-in loop statement. The with statement didn't introduce code blocks in
Python; they've always been there. To rectify this, for-in probably deserves its own article, so here
we go (but be warned that the following is a bit rough; I reserve the right to tweak it a little over the
next few days).

On the surface, Python's for-in statement is taken straight from Python's predecessor ABC,
where it's described as:

FOR name,... IN train:
    commands
Take each element of train in turn

In ABC, what's called statements in Python are known as commands, and sequences are known as
trains. (The whole language is like that, by the way; lots of common mechanisms described using
less-common names. Maybe they thought that renaming everything would make it easier for people
to pick up the subtle details of the language, instead of assuming that everything worked exactly as in
other seemingly similar languages. Or maybe it only makes sense if you're Dutch.)

Anyway, to take each element (item) from a train (sequence) in turn, we can simply do (using a
pseudo-Python syntax):

name = train[0]
do something with name
name = train[1]
do something with name
name = train[2]
do something with name
... etc ...
and keep doing that until we run out of items. When we do, we'll get an IndexError exception, which
tells us that it's time to stop.

And in its simplest and original form, this is exactly what the for-in statement does; when you write

for name in train:
    do something with name

the interpreter will simply fetch train[0] and assign it to name, and then execute the code block. It'll
then fetch train[1], train[2], and so on, until it gets an IndexError.

The code inside the for-in loop is executed in the same scope as the surrounding code; in the
following example:

train = 1, 2, 3
for name in train:
    value = name * 10
    print value

the variables train, name, and value all live in the same namespace.

This is pretty straightforward, of course, but it immediately gets a bit more interesting once you
realize that you can use custom objects as trains. Just implement the __getitem__ method, and you
can control how the loop behaves. The following code:

class MyTrain:
    def __getitem__(self, index):
        if not condition:
            raise IndexError("that's enough!")
        value = fetch item identified by index
        return value # hand control back to the block

for name in MyTrain():
    do something with name

will run the loop as long as the given condition is true, with values provided by the custom train. In
other words, the do something part is turned into a block of code that's being executed under the
control of the custom sequence object. The above is equivalent to:

index = 0
while True: # run forever
    if not condition:
        break
    name = fetch item identified by index
    do something with name
    index = index + 1

except that index is a hidden variable, and the controlling code is placed in a separate object.

You can use this mechanism for everything from generating sequence elements on the fly (like
xrange):
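The pseudo-code above can be turned into a small runnable sketch; the Countdown class here is a made-up example, and the for loop drives it through __getitem__ alone (this legacy protocol still works, even in current Python, when no __iter__ is defined):

```python
class Countdown:
    """A 'train' driven purely by the old __getitem__ protocol."""
    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        value = self.start - index   # the item "identified by index"
        if value <= 0:
            raise IndexError("that's enough!")  # ends the loop
        return value                 # hand control back to the block

collected = []
for name in Countdown(3):            # fetches index 0, 1, 2, then IndexError
    collected.append(name)

assert collected == [3, 2, 1]
```

The for statement supplies the indexes 0, 1, 2, ... itself; the object only decides, per index, whether to return a value or raise IndexError.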

class MySequence:
    def __getitem__(self, index):
        if index > 10:
            raise IndexError("that's enough!")
        return index * 10 # returns 0, 10, 20, ..., 100

and fetching data from an external source:

class MyTable:
    def __getitem__(self, index):
        value = fetch item index from database table
        if value not found:
            raise IndexError("not found")
        return value

or from a stream:

class MyFileIterator:
    def __getitem__(self, index):
        text = get next line from file
        if end of file:
            raise IndexError("end of file")
        return text

to fetching data from some other source:

class MyEventSource:
    def __getitem__(self, index):
        event = get next event
        if event == terminate:
            raise IndexError
        return event

for event in MyEventSource():
    process event

It's more explicit in the latter examples, but in all these examples, the code in __getitem__ is
basically treating the block of code inside the for-in loop as an in-lined callback.
Also note how the last two examples don't even bother to look at the index; they just keep calling
the for-in block until they run out of data. Or, less obviously, until they run out of bits in the
internal index variable.

To deal with this, and also to avoid the issue of having objects that look a lot like sequences but
don't support random access, the for-in statement was redesigned in Python 2.2. Instead of using
the __getitem__ interface, for-in now starts by looking for an __iter__ hook. If present, this method
is called, and the resulting object is then used to fetch items, one by one. The new protocol
behaves like this:

obj = train.__iter__()
name = obj.next()
do something with name
name = obj.next()
do something with name
...

where obj is an internal variable, and the next method indicates end of data by raising the
StopIteration exception, instead of IndexError. Using a custom object can look something like:

class MyTrain:
    def __iter__(self):
        return self
    def next(self):
        if not condition:
            raise StopIteration
        value = calculate next value
        return value # hand control over to the block

for name in MyTrain():
    do something with name

(Here, the MyTrain object returns itself, which means that the for-in statement will call MyTrain's
own next method to do the actual work. In some cases, it makes more sense to use an independent
object for the iteration.)

Using this mechanism, we can now rewrite the file iterator from above as:

class MyFileIterator:
    def __iter__(self):
        return self  # use myself

    def next(self):
        text = get next line from file
        if end of file:
            raise StopIteration()
        return text

and, with very little work, get an object that doesn't support normal indexing, and doesn't break
down if used on a file with more than 2 billion lines.

But what about ordinary sequences, you ask? That's of course easily handled by a wrapper object
that keeps an internal counter and maps next calls to __getitem__ calls, in exactly the same way
as the original for-in statement did. Python provides a standard implementation of such a wrapper,
created by the built-in iter function, which is used automatically if __iter__ doesn't exist.
This wasn't very difficult, was it?
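As a sketch of that fallback behaviour (written in modern Python 3 syntax, where the iterator method is spelled __next__ and is called via the next() built-in): iter() still builds such a wrapper automatically for an object that only defines __getitem__, counting up from index 0 until IndexError is raised. The Squares class below is a hypothetical example, not from the original text.

```python
class Squares:
    # legacy sequence protocol: no __iter__, only __getitem__
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError  # tells the for-in machinery to stop
        return index * index

# iter() wraps the object in an index-counting iterator automatically
print(list(iter(Squares())))   # -> [0, 1, 4, 9, 16]

# the for-in statement uses the same fallback
for square in Squares():
    print(square)
```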

Footnote: In Python 2.2 and later, several non-sequence objects have been extended to support
the new protocol. For example, you can loop over both text files and dictionaries; the former return
lines of text, the latter dictionary keys.
for line in open("file.txt"):
    do something with line

for key in my_dict:
    do something with key


iterators and generators


Two interesting kinds of objects are iterators and generators.
To iterate means to repeat something several times, which is what you do with loops. In Python you can
iterate over special objects, iterators, which represent a stream of data and return the data one
element at a time. An iterator must support a method called next() that always returns the next element
of the stream: you can only go forward, and there is no way to get a previous element.

Iterators can be built by implicit or explicit calls to the built-in function iter. Calling a generator
also returns an iterator.
Notice that after consuming all of the iterator's output the iterator is exhausted: if you need to
do something different with the same stream, you need to create a new (different) iterator.
L = [1, 2, 3]; C = (1, 2, 3); S = set(L); D = dict(x=1, y=2, z=3)

class K:
    def __init__(self, values):
        self.values = values
    def next(self):
        try:
            return self.values.pop()
        except:
            raise StopIteration
    def __iter__(self):
        return self

k = K(L)


Formally, an iterator is an object i such that you can call i.next() with no arguments.
i.next() returns the next item of iterator i or, when iterator i has no more items, raises a
StopIteration exception. The built-in iter() function takes an arbitrary object and tries to
return an iterator that will return the object's contents or elements, raising TypeError if the
object does not support iteration. An object is said to be iterable if you can get an iterator for it.
Any Python sequence type (lists, tuples, strings) as well as dictionaries are iterable: they
automatically support the creation of an iterator. When you write a class, you can allow instances of
the class to be iterators by defining a next method and an __iter__ method.

X=k

it1 = iter(X)

print it1
for item in it1: print item
print it1.next()

emits the following, since the last statement referred to an exhausted iterator:

<__main__.K instance at 0x009F6468>
3
2
1
Traceback (most recent call last):
  File "D:\Python\TESTS\test.py", line 24, in <module>
    print it1.next()
  File "D:\Python\TESTS\test.py", line 13, in next
    raise StopIteration
StopIteration

The situation does not change if we make the following change:

for item in it1: print item

it1 = iter(X)    # does not change anything; the iterator is exhausted!
print it1.next()
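The same exhaustion behaviour can be seen with a plain list (a small sketch in Python 3 syntax, where the method is __next__ and you call the next() built-in); note that iter() on the list itself produces a fresh, independent iterator each time, unlike the K class above, which returns itself:

```python
it = iter([1, 2, 3])
print(list(it))     # [1, 2, 3]
print(list(it))     # [] -- this iterator is exhausted

# iter() on the *list* returns a new iterator every call
fresh = iter([1, 2, 3])
print(next(fresh))  # 1
```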

The following file provides iterator tests.



A generator² is a function whose body contains one or more occurrences of the keyword yield.
When the generator is called, the function body does not execute: instead, the call returns a special
iterator object that wraps the function body, its local variables (including its parameters) and
the current point of execution, which is initially the start of the function. When the next method
of this iterator object is called, the function body executes up to the next yield statement (a yield
expression). At that point the function's execution is frozen, with the current point of execution and
local variables intact, and the expression following yield is returned as the result of the next
method. When next is called again, execution of the function body resumes where it left off,
again up to the next yield statement. If the function body ends or executes a return statement,
the iterator raises a StopIteration exception, to indicate that the iteration is finished. Such a
return statement must have no arguments³. A generator may be unbounded (returning an infinite stream of
results) or bounded. In Python 2.5 a generator object also has the methods send, throw and close.
Python's generators provide a convenient way to implement the iterator protocol. If a container
object's __iter__() method is implemented as a generator, it will automatically return an iterator
object (technically, a generator object) supplying the __iter__() and next() methods.

Footnote 2: Generators and iterators are related to the concept of coroutines. Coroutines are more generic than
subroutines. The start of a subroutine is its only point of entry; the start of a coroutine is its first point of entry,
and places within a coroutine following returns (yields) are subsequent points of entry. Subroutines can return only
once; in contrast, coroutines can return (yield) several times. The lifespan of subroutines is dictated by last in,
first out (the last subroutine called is the first to return); in contrast, the lifespan of coroutines is dictated
entirely by their use and need.
Here's a simple example of how coroutines can be useful. Suppose you have a consumer-producer relationship
where one routine creates items and adds them to a queue and another removes items from the queue and uses
them. For reasons of efficiency, you want to add and remove several items at once. The code might look like this:

var q := new queue

coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield to consume

coroutine consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield to produce

Each coroutine does as much work as it can before yielding control to the other using the yield command. The yield
causes control in the other coroutine to pick up where it left off, but now with the queue modified so that it can do
more work. Although this example is often used to introduce multithreading, it's not necessary to have two threads to
effect this dynamic: the yield statement can be implemented by a branch directly from one routine into the other.

Footnote 3: A return statement with an argument is a syntax error inside a generator:

def updown(N):
    """ small generator function"""
    for x in xrange(1, N): yield x
    if x == 3: return 5
    for x in xrange(N, 0, -1): yield x

for i in updown(4): print i

renders:

>>> File "<Module2>", line 4 SyntaxError: 'return' with argument inside generator

One framework for coroutines is spasmodic, supporting asynchronous I/O (and other tasks). The SpasmodicEngine
selects tasks (spasmoids) from a (heapq-based) priority queue. The tasks are Python 2.5 extended generators
(some call them coroutines: PEP 342). The engine calls task.send() with an appropriate argument. One of the
library's tasks is Pollster; Pollster calls poll() for tasks that are waiting on I/O. Tasks that are ready for
I/O are fed to the priority queue. Spasmodic provides an efficient way to manage a large number of sockets and/or
files. Other processing works well too, if it can be subdivided into brief spasms.
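The lazy execution described above can be observed with a minimal generator (written in Python 3 syntax, where next is spelled __next__ and is invoked via the next() built-in); countdown below is a hypothetical example, not from the original text:

```python
def countdown(n):
    # nothing in the body runs until the first next() call
    while n > 0:
        yield n
        n -= 1

it = countdown(3)
print(next(it))   # 3 -- the body ran up to the first yield, then froze
print(list(it))   # [2, 1] -- the remaining values; the generator is now exhausted
```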

The for statement implicitly calls iter to get an iterator. The following statement:
for x in c:
statement(s)

is exactly equivalent to:


_temporary_iterator = iter(c)
while True:
    try: x = _temporary_iterator.next()
    except StopIteration: break
    statement(s)

where _temporary_iterator is some arbitrary name that is not used elsewhere in the current scope.

Thus, if iter(c) returns an iterator i such that i.next() never raises StopIteration (an unbounded
iterator), the loop for x in c never terminates (unless the statements in the loop body include suitable
break or return statements, or raise or propagate exceptions). iter(c), in turn, calls the special method
c.__iter__() to obtain and return an iterator on c.
Many of the best ways to build and manipulate iterators are found in the standard library module
itertools.

while and for statements may optionally have a trailing else clause. The statement or block
under that else executes when the loop terminates naturally (at the end of the for iterator, or
when the while loop condition becomes false), but not when the loop terminates prematurely
(via break, return, or an exception). When a loop contains one or more break statements, you
often need to check whether the loop terminated naturally or prematurely. You can use an else
clause on the loop for this purpose:

for x in some_container:
    if is_ok(x): break    # item x is satisfactory, terminate loop
else:
    print "Warning: no satisfactory item was found in container"
    x = None
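A few illustrative calls into the itertools module mentioned above (written in Python 3 syntax; count, islice and chain are real members of the module):

```python
import itertools

# count() is an unbounded iterator; islice() bounds it safely
evens = (2 * n for n in itertools.count())
print(list(itertools.islice(evens, 4)))      # [0, 2, 4, 6]

# chain() concatenates several iterables into one stream
print(list(itertools.chain([1, 2], "ab")))   # [1, 2, 'a', 'b']
```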

any and all built-in functions in Python 2.5

$ python2.5
Python 2.5 (r25:51908, Sep 27 2006, 12:21:46)
[GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> any
<built-in function any>
>>> all
<built-in function all>
>>> print any.__doc__
any(iterable) -> bool

Return True if bool(x) is True for any x in the iterable.
>>> print all.__doc__
all(iterable) -> bool

Return True if bool(x) is True for all values x in the iterable.
>>> any([1,2])
True
>>> all([1,2])
True
>>> any([1,0])
True
>>> all([1,0])
False
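Because any and all accept any iterable, they pair naturally with generator expressions and stop as soon as the answer is known; a small sketch (Python 3 syntax, hypothetical data):

```python
nums = [1, 0, 3]
print(any(x == 0 for x in nums))   # True  -- stops at the first zero found
print(all(x > 0 for x in nums))    # False -- stops at the first non-positive
print(any([]))                     # False -- no element is true
print(all([]))                     # True  -- vacuously, every element is true
```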

list comprehensions
A common use of a for loop is to inspect each item in an iterable and build a new list by
appending the results of an expression computed on some or all of the items. The expression
form known as a list comprehension lets you code this common idiom concisely and directly.
Since a list comprehension is an expression (rather than a block of statements), you can use it
wherever you need an expression (e.g., as an argument in a function call, in a return statement,
or as a subexpression for some other expression).
A list comprehension has the following syntax:

[ expression for target in iterable lc-clauses ]

target and iterable are the same as in a regular for statement. You must enclose the
expression in parentheses if it denotes a tuple.
lc-clauses is a series of zero or more clauses, each with one of the following forms:

for target in iterable
if expression

target and iterable in each for clause of a list comprehension have the same syntax and
meaning as those in a regular for statement, and the expression in each if clause of a list
comprehension has the same syntax and meaning as the expression in a regular if statement.
A list comprehension is equivalent to a for loop that builds the same list by repeated calls to the
resulting list's append method. For example (assigning the list comprehension result to a variable
for clarity):
result1 = [x+1 for x in some_sequence]

is the same as the for loop:


result2 = []
for x in some_sequence:
    result2.append(x+1)

Here's a list comprehension that uses an if clause:


result3 = [x+1 for x in some_sequence if x>23]

This list comprehension is the same as a for loop that contains an if statement:
result4 = []
for x in some_sequence:
    if x > 23:
        result4.append(x+1)

And here's a list comprehension that uses a for clause:


result5 = [x+y for x in alist for y in another]

This is the same as a for loop with another for loop nested inside:
result6 = []
for x in alist:
    for y in another:
        result6.append(x+y)

As these examples show, the order of for and if in a list comprehension is the same as in the
equivalent loop, but in the list comprehension, the nesting remains implicit.
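Because a list comprehension is an expression, it can appear directly wherever a value is expected, for example as a function argument; a brief sketch (Python 3 syntax, hypothetical data):

```python
words = ["alpha", "beta", "gamma"]

# a comprehension used directly as an argument to sum()
total = sum([len(w) for w in words])
print(total)   # 14

# two for clauses and an if clause, nested implicitly left to right
pairs = [(x, y) for x in [1, 2] for y in "ab" if x > 1]
print(pairs)   # [(2, 'a'), (2, 'b')]
```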

The with statement is used to wrap the execution of a block with functionality provided by a
separate guard object (see context-managers). This allows common try-except-finally usage
patterns to be encapsulated for convenient reuse.
from __future__ import with_statement   # to enable in 2.5

Syntax:
with expression [as target] :
suite
or:
with expression [as ( target list ) ] :
suite
The expression is evaluated once, and should yield a context guard, which is used to control
execution of the suite. The guard can provide execution-specific data, which is assigned to the
target (or target list).
Note that if a target list is used instead of a single target, the list must be parenthesized.
Here's a more detailed description:
1. The context expression is evaluated, to obtain a context guard.
2. The guard object's __enter__ method is invoked to obtain the context entry value.
3. The context entry value is assigned to the target or target list, if present.
4. The suite is executed.
5. No matter what happens in the suite, the guard object's __exit__ method is invoked.
   If an exception caused the suite to be exited, its type, value, and traceback are passed
   as arguments to __exit__. Otherwise, three None arguments are supplied.

If the suite was exited due to an exception, and the return value from the __exit__ method is
false, the exception is reraised. If the return value is true, the exception is suppressed, and
execution continues with the statement following the with statement.
If the suite was exited for any reason other than an exception (e.g., by falling off the end of
the suite, or via return, break, or continue), the return value from __exit__ is ignored, and
execution proceeds at the normal location for the kind of exit that was taken.
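The steps above can be traced with a minimal guard object (a sketch in Python 3 syntax; Guard and the events list are hypothetical, and __exit__ returns False so exceptions would not be suppressed):

```python
events = []

class Guard:
    def __enter__(self):
        events.append("enter")       # steps 1-2: guard obtained, __enter__ invoked
        return "thing"               # the context entry value
    def __exit__(self, exc_type, exc_value, traceback):
        events.append(("exit", exc_type))  # step 5: always invoked
        return False                 # false value: re-raise any pending exception

with Guard() as value:               # step 3: entry value assigned to `value`
    events.append(("suite", value))  # step 4: the suite runs

print(events)   # [('enter'), ('suite', 'thing'), ('exit', None)] -- exc_type is None
```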

Note: In Python 2.5, the with statement is only allowed when the with_statement feature
has been enabled. It will always be enabled in Python 2.6. This __future__ import statement
can be used to enable the feature (see future):
from __future__ import with_statement
See Also: PEP 0343, The "with" statement The specification, background, and examples for
the Python with statement.

with statement considered hard?


The documentation refers to these objects as context managers. The term comes from the draft
documentation, which doesn't explain it further. From that draft:
Context managers ensure a particular action is taken to establish the context before the
contained suite is entered, and a second action to clean up the context when the suite is exited.

and from the PEP

the actual state modifications made by the context manager



Note: The with statement guarantees that if the __enter__ method returns without an error,
then __exit__ will always be called. Thus, if an error occurs during the assignment to the
target list, it will be treated the same as an error occurring within the suite would be. See
step 5 above.

All of the examples in the PEP and the what's new document use the with statement for things
like locks, ensuring a socket or file is closed, database transactions and temporarily modifying a
system setting. These are heavy duty things. Deep things. Things most people don't want to mess
with. The what's new document even says
Under the hood, the 'with' statement is fairly complicated. Most people will only use 'with' in
company with existing objects and don't need to know these details
I think this viewpoint is wrong, or at least overly limited. I think the PEP is more generally useful
than those examples show and the term "context manager" is too abstract. I also conjecture that
the people working on the PEP were systems developers and not applications developers, hence
the bias towards system/state modification examples. ;)
The with statement is very similar to the using block in C#.
Consider an example of opening a file and making sure that it is closed after the block ends:

with open('/etc/passwd', 'r') as f:
    for line in f:
        print line

so after the with block ends the f object is completely destroyed and the file is closed.
And, like C#, the f object should implement some kind of cleanup functions to make it possible for
the with block to call these functions automatically; in C# the object should implement
IDisposable, in Python the context manager protocol:

The expression is evaluated and should result in an object called a ``context manager''.
The context manager must have __enter__() and __exit__() methods.
The context manager's __enter__() method is called. The value returned is assigned to
VAR. If no 'as VAR' clause is present, the value is simply discarded.
The code in BLOCK is executed.
If BLOCK raises an exception, __exit__(type, value, traceback) is called with the
exception details, the same values returned by sys.exc_info(). The method's return value
controls whether the exception is re-raised: any false value re-raises the exception, and
True will result in suppressing it. You'll only rarely want to suppress the exception,
because if you do, the author of the code containing the 'with' statement will never realize
anything went wrong.
If BLOCK didn't raise an exception, the __exit__() method is still called, but type, value,
and traceback are all None.

The new with statement handles the part I care about: making it easier to write code that works
correctly in the case of failures.

The 3 typical use cases are a file that needs to be closed, a lock that needs to be released, and
a database transaction that needs to be either committed or rolled back. The database case
is the most interesting, since you need to handle success and failure differently, and before
version 2.5 Python would not allow you to have try/except/finally: you had to pick either
try/except or try/finally. Python 2.5 also provides a unified try/except/finally, but the with
statement is easier to write, and easier to read.

I've borrowed an example of what user code would look like for a database connection using the
new with statement from the python docs. The idea is that the block of code should run, and
then the transaction should either be committed or rolled back depending on whether the block
exited normally or with an exception:

db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...

In order for this to work, it looks like the classes that you are working with need to properly
support a context manager which defines what should happen for success and error. But not all
classes will need to implement a full-blown context manager; the contextlib module allows for an
easy way to add support to existing objects without the need to write a new class:

from contextlib import contextmanager

@contextmanager
def db_transaction(connection):
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        connection.rollback()
        raise
    else:
        connection.commit()

db = DatabaseConnection()
with db_transaction(db) as cursor:
    ...

This was one area where Ruby code was much cleaner than Python, so it's great to see the new
functionality. It's pretty hard for me to write code which doesn't touch a file, a database, or
threads, so it will be used a lot.
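DatabaseConnection above is pseudocode; the same commit-or-rollback pattern can be sketched with the standard sqlite3 module (shown in Python 3 syntax, with a hypothetical transaction helper):

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    # commit on normal exit, roll back (and re-raise) on exception
    cur = conn.cursor()
    try:
        yield cur
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("create table t (x integer)")

with transaction(conn) as cur:
    cur.execute("insert into t values (1)")   # committed

try:
    with transaction(conn) as cur:
        cur.execute("insert into t values (2)")
        raise RuntimeError("boom")            # forces a rollback
except RuntimeError:
    pass

print(conn.execute("select x from t").fetchall())   # [(1,)] -- the 2 was rolled back
```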

Examples:

# Public domain

from __future__ import with_statement


import contextlib, time

@contextlib.contextmanager
def accum_time(L):
    """
    Add time used inside a with block to the value of L[0].
    """
    start = time.clock()
    try:
        yield
    finally:
        end = time.clock()
        L[0] += end - start

# Example: measure time to execute code inside with blocks.

t = [0]
with accum_time(t):
    print sum(range(1000000))
with accum_time(t):
    print sum(range(2000000))
print 'Time:', t[0]

another description (effbot):

understanding the "with" statement

Judging from comp.lang.python and other forums, Python 2.5's new with statement seems to be
a bit confusing even for experienced Python programmers.

As with most other things in Python, the with statement is actually very simple, once you understand
the problem it's trying to solve. Consider this piece of code:

set things up
try:
    do something
finally:
    tear things down

Here, "set things up" could be opening a file, or acquiring some sort of external resource, and
"tear things down" would then be closing the file, or releasing or removing the resource. The
try-finally construct guarantees that the "tear things down" part is always executed, even if the code
that does the work doesn't finish.

If you do this a lot, it would be quite convenient if you could put the "set things up" and "tear
things down" code in a library function, to make it easy to reuse. You can of course do something
like

def controlled_execution(callback):
    set things up
    try:
        callback(thing)
    finally:
        tear things down

def my_function(thing):
    do something

controlled_execution(my_function)

But that's a bit verbose, especially if you need to modify local variables. Another approach is to
use a one-shot generator, and use the for-in statement to "wrap" the code:

def controlled_execution():
    set things up
    try:
        yield thing
    finally:
        tear things down

for thing in controlled_execution():
    do something with thing

But yield isn't even allowed inside a try-finally in 2.4 and earlier. And while that could be fixed
(and it has been fixed in 2.5), it's still a bit weird to use a loop construct when you know that you
only want to execute something once.
So after contemplating a number of alternatives, GvR and the python-dev team finally came up
with a generalization of the latter, using an object instead of a generator to control the behaviour
of an external piece of code:

class controlled_execution:
    def __enter__(self):
        set things up
        return thing
    def __exit__(self, type, value, traceback):
        tear things down

with controlled_execution() as thing:
    some code

Now, when the "with" statement is executed, Python evaluates the expression, calls the
__enter__ method on the resulting value (which is called a "context guard"), and assigns
whatever __enter__ returns to the variable given by as. Python will then execute the code body,
and no matter what happens in that code, call the guard object's __exit__ method.

As an extra bonus, the __exit__ method can look at the exception, if any, and suppress it or act
on it as necessary. To suppress the exception, just return a true value. For example, the following
__exit__ method swallows any TypeError, but lets all other exceptions through:

def __exit__(self, type, value, traceback):
    return isinstance(value, TypeError)

In Python 2.5, the file object has been equipped with __enter__ and __exit__ methods; the
former simply returns the file object itself, and the latter closes the file:

>>> f = open("x.txt")
>>> f
<open file 'x.txt', mode 'r' at 0x00AE82F0>
>>> f.__enter__()
<open file 'x.txt', mode 'r' at 0x00AE82F0>
>>> f.read(1)
'X'
>>> f.__exit__(None, None, None)
>>> f.read(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file

so to open a file, process its contents, and make sure to close it, you can simply do:

with open("x.txt") as f:
    data = f.read()
    do something with data
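The exception-suppression rule described above can be verified with a small class (a sketch in Python 3 syntax; SwallowTypeError is a hypothetical name):

```python
class SwallowTypeError:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        # a true return value suppresses the exception
        return isinstance(exc_value, TypeError)

with SwallowTypeError():
    raise TypeError("swallowed")
print("execution continues after the with statement")

# other exceptions still propagate:
try:
    with SwallowTypeError():
        raise ValueError("not swallowed")
except ValueError:
    print("ValueError propagated")
```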


Functions
A function is a group of statements that execute upon request. Python provides many built-in
functions and allows programmers to define their own functions. A request to execute a function
is known as a function call. When you call a function, you can pass arguments that specify data
upon which the function performs its computation. In Python, a function always returns a result
value, either None or a value that represents the results of the computation. Functions defined
within class statements are also known as methods. Functions are objects (values) that are
handled like other objects. Thus, you can pass a function as an argument in a call to another
function. Similarly, a function can return another function as the result of a call. A function, just
like any other object, can be bound to a variable, an item in a container, or an attribute of an
object. Functions can also be keys into a dictionary. For example, if you need to quickly find a
function's inverse given the function, you could define a dictionary whose keys and values are
functions and then make the dictionary bidirectional:
inverse = {sin: asin, cos: acos, tan: atan, log: exp}
for f in inverse.keys(): inverse[inverse[f]] = f
The def statement is the most common way to define a function. def is a single-clause
compound statement with the following syntax:
def function-name(parameters):
statement(s)
function-name is an identifier. It is a variable that gets bound (or rebound) to the function

object when def executes.


parameters is an optional list of identifiers, known as formal parameters or just parameters, that

get bound to the values supplied as arguments when the function is called. In the simplest case, a
function doesn't have any formal parameters, which means the function doesn't take any
arguments when it is called. In this case, the function definition has empty parentheses after
function-name.
When a function does take arguments, parameters contains one or more identifiers, separated
by commas (,). In this case, each call to the function supplies values, known as arguments,
corresponding to the parameters listed in the function definition. The parameters are local
variables of the function.

Page 43of 191

Schlumberger Private


The def statement sets some attributes of a function object. The attribute func_name, also
accessible as __name__, refers to the identifier string given as the function name in the def
statement. You may rebind the attribute to any string value, but trying to unbind it raises an
exception. The attribute func_defaults, which you may freely rebind or unbind, refers to the
tuple of default values for the optional parameters (or the empty tuple, if the function has no
optional parameters).
Another function attribute is the documentation string, also known as the docstring. You may use
or rebind a function's docstring attribute as either func_doc or __doc__. If the first statement
in the function body is a string literal, the compiler binds that string as the function's docstring
attribute.
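A quick check of these attributes (in Python 3 syntax, where func_name and func_defaults are spelled __name__ and __defaults__, and func_doc is __doc__; greet is a hypothetical function):

```python
def greet(name, greeting="hello"):
    """Return a one-line greeting."""
    return "%s, %s" % (greeting, name)

print(greet.__name__)      # greet
print(greet.__defaults__)  # ('hello',)
print(greet.__doc__)       # Return a one-line greeting.

# arbitrary attributes can be attached after the def executes
greet.calls = 0
```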
In addition to its predefined attributes, a function object may have other arbitrary attributes. To
create an attribute of a function object, bind a value to the appropriate attribute reference in an
assignment statement after the def statement executes. For example, a function could count
how many times it gets called:
def counter():
    counter.count += 1
    return counter.count

counter.count = 0

Note that this is not common usage. More often, when you want to group together some state
(data) and some behavior (code), you should use the object-oriented mechanisms .
The return statement is allowed only inside a function body and can optionally be followed by
an expression. When return executes, the function terminates, and the value of the expression
is the function's result. A function returns None if it terminates by reaching the end of its body or
by executing a return statement that has no expression (or, of course, by executing return
None).
As a matter of style, you should never write a return statement without an expression at the end
of a function body. If some return statements in a function have an expression, all return
statements should have an expression. return None should only be written explicitly to meet
this style requirement. Python does not enforce these stylistic conventions.
A function call is an expression with the following syntax:
function-object(arguments)
function-object may be any reference to a function (or other callable) object; most often, it's

the function's name. The parentheses denote the function-call operation itself. arguments, in the
simplest case, is a series of zero or more expressions separated by commas (,), giving values for
the function's corresponding parameters. When the function call executes, the parameters are


bound to the argument values, the function body executes, and the value of the function-call
expression is whatever the function returns.
Note that just mentioning a function (or other callable object) does not call it. To call a function
(or other object) without arguments, you must use ( ) after the function's name.
In traditional terms, all argument passing in Python is by value. For example, if you pass a variable
as an argument, Python passes to the function the object (value) to which the variable currently
refers, not "the variable itself." Thus, a function cannot rebind the caller's variables. However, if
you pass a mutable object as an argument, the function may request changes to that object
because Python passes the object itself, not a copy. Rebinding a variable and mutating an object
are totally disjoint concepts. For example:
def f(x, y):
    x = 23
    y.append(42)

a = 77
b = [99]
f(a, b)
print a, b    # prints: 77 [99, 42]

The print statement shows that a is still bound to 77. Function f's rebinding of its parameter x
to 23 has no effect on f's caller, nor, in particular, on the binding of the caller's variable that
happened to be used to pass 77 as the parameter's value. However, the print statement also
shows that b is now bound to [99, 42]. b is still bound to the same list object as before the call,
but that object has mutated, as f has appended 42 to that list object. In either case, f has not
altered the caller's bindings, nor can f alter the number 77, since numbers are immutable.
However, f can alter a list object, since list objects are mutable. In this example, f mutates the
list object that the caller passes to f as the second argument by calling the object's append
method.
Arguments that are just expressions are known as positional arguments. Each positional
argument supplies the value for the parameter that corresponds to it by position (order) in the
function definition.
In a function call, zero or more positional arguments may be followed by zero or more named
arguments, each with the following syntax:
identifier=expression



The identifier must be one of the parameter names used in the def statement for the
function. The expression supplies the value for the parameter of that name. Most built-in
functions do not accept named arguments, you must call such functions with positional
arguments only. However, all normal functions coded in Python accept named as well as
positional arguments, so you may call them in different ways.
A function call must supply, via a positional or a named argument, exactly one value for each
mandatory parameter, and zero or one value for each optional parameter. For example:
def divide(divisor, dividend):
    return dividend // divisor

print divide(12, 94)                   # prints: 7
print divide(dividend=94, divisor=12)  # prints: 7

A common use of named arguments is to bind some optional parameters to specific values, while
letting other optional parameters take default values:
def f(middle, begin='init', end='finis'):
    return begin + middle + end

print f('tini', end='')    # prints: inittini

Thanks to named argument end='', the caller can specify a value, the empty string '', for f's
third parameter, end, and still let f's second parameter, begin, use its default value, the string
'init'.
At the end of the arguments in a function call, you may optionally use either or both of the
special forms *seq and **dct. If both forms are present, the form with two asterisks must be
last. *seq passes the items of seq to the function as positional arguments (after the normal
positional arguments, if any, that the call gives with the usual syntax). seq may be any iterable.
**dct passes the items of dct to the function as named arguments, where dct must be a
dictionary whose keys are all strings. Each item's key is a parameter name, and the item's value is
the argument's value.
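For instance, both forms can appear in a single call. A small sketch (the function and variable names are illustrative):

```python
def describe(a, b, c):
    # joins its three arguments into one string
    return '%s-%s-%s' % (a, b, c)

args = (1, 2)        # any iterable: supplies positional arguments
kwds = {'c': 3}      # a dict with string keys: supplies named arguments

result = describe(*args, **kwds)   # equivalent to describe(1, 2, c=3)
```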

As the divide example earlier showed, the two calls to divide are equivalent. You can pass named
arguments for readability purposes whenever you think that identifying the role of each argument
and controlling the order of arguments enhances your code's clarity.

Sometimes you want to pass an argument of the form *seq or **dct when the parameters use
similar forms. For example, using the function sum_args defined earlier (and shown again
here), you may want to print the sum of all the values in dictionary d. This is easy with *seq:
def sum_args(*numbers):
    return sum(numbers)

print sum_args(*d.values( ))

(Of course, in this case, print sum(d.values( )) would be simpler and more direct!)
However, you may also pass arguments of the form *seq or **dct when calling a function that
does not use the corresponding forms in its parameters. In that case, of course, you must ensure
that iterable seq has the right number of items, or, respectively, that dictionary dct uses the right
names as its keys; otherwise, the call operation raises an exception.
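A sketch of this case, with an illustrative two-parameter function:

```python
def area(width, height):
    return width * height

dims = [3, 4]
a = area(*dims)        # same as area(3, 4): the list has the right length

try:
    area(*[3, 4, 5])   # one item too many for the two parameters
except TypeError:
    a_failed = True    # the call operation raises TypeError
```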

Variables that are not local are known as global variables (in the absence of nested function
definitions). Global variables are attributes of the module object. Whenever a function's local
variable has the same name as a global variable, that name, within the function body, refers to
the local variable, not the global one. We express this by saying that the local variable hides the
global variable of the same name throughout the function body.
By default, any variable that is bound within a function body is a local variable of the function. If a
function needs to rebind some global variables, the first statement of the function must be:
global identifiers

where identifiers is one or more identifiers separated by commas (,). The identifiers listed in a
global statement refer to the global variables (i.e., attributes of the module object) that the
function needs to rebind.
Don't use global if the function body just uses a global variable (including mutating the object
bound to that variable if the object is mutable). Use a global statement only if the function body
rebinds a global variable (generally by assigning to the variable's name). As a matter of style,
don't use global unless it's strictly necessary, as its presence will cause readers of your program
to assume the statement is there for some useful purpose. In particular, never use global except
as the first statement in a function body.


A function's parameters, plus any variables that are bound (by assignment or by other binding
statements, such as def) in the function body, make up the function's local namespace, also
known as local scope. Each of these variables is known as a local variable of the function.

A def statement within a function body defines a nested function, and the function whose body
includes the def is known as an outer function to the nested one. Code in a nested function's
body may access (but not rebind) local variables of an outer function, also known as free
variables of the nested function.
The simplest way to let a nested function access a value is often not to rely on nested scopes, but
rather to explicitly pass that value as one of the function's arguments. If necessary, the
argument's value can be bound when the nested function is defined by using the value as the
default for an optional argument. For example:
def percent1(a, b, c):
    def pc(x, total=a+b+c):
        return (x*100.0) / total
    print "Percentages are:", pc(a), pc(b), pc(c)

Here's the same functionality using nested scopes:


def percent2(a, b, c):
    def pc(x):
        return (x*100.0) / (a+b+c)
    print "Percentages are:", pc(a), pc(b), pc(c)

In this specific case, percent1 has a tiny advantage: the computation of a+b+c happens only
once, while percent2's inner function pc repeats the computation three times. However, if the
outer function rebinds its local variables between calls to the nested function, repeating the
computation can be necessary. It's therefore advisable to be aware of both approaches, and
choose the most appropriate one case by case.
A nested function that accesses values from outer local variables is also known as a closure. The
following example shows how to build a closure:
def make_adder(augend):
    def add(addend):
        return addend + augend
    return add
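For example, each call to make_adder builds an independent closure (the definition is repeated here so the snippet stands alone):

```python
def make_adder(augend):
    def add(addend):
        return addend + augend
    return add

add7 = make_adder(7)    # a function that adds 7 to its argument
add2 = make_adder(2)    # an independent closure with its own augend
x = add7(10)            # 17
y = add2(10)            # 12
```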

Closures are an exception to the general rule that the object-oriented mechanisms are the best
way to bundle together data and code. When you need specifically to construct callable objects,
with some parameters fixed at object construction time, closures can be simpler and more
effective than classes. For example, the result of make_adder(7) is a function that accepts a
single argument and adds 7 to that argument. An outer function that returns a closure is a
"factory" for members of a family of functions distinguished by some parameters, such as the
value of argument augend in the previous example, and may often help you avoid code
duplication.


Classes
A class is a Python object with several characteristics:
You can call a class object as if it were a function. The call returns another object, known
as an instance of the class; the class is also known as the type of the instance.

A class has arbitrarily named attributes that you can bind and reference.

The values of class attributes can be descriptors (including functions) or normal data
objects.

Class attributes bound to functions are also known as methods of the class.

A method can have a special Python-defined name with two leading and two trailing
underscores. Python implicitly invokes such special methods, if a class supplies them,
when various kinds of operations take place on instances of that class.

A class can inherit from other classes, meaning it delegates to other class objects the
lookup of attributes that are not found in the class itself.

An instance of a class is a Python object with arbitrarily named attributes that you can bind and
reference. An instance object implicitly delegates to its class the lookup of attributes not found in
the instance itself. The class, in turn, may delegate the lookup to the classes from which it
inherits, if any.
In Python, classes are objects (values) and are handled like other objects. Thus, you can pass a
class as an argument in a call to a function. Similarly, a function can return a class as the result of
a call. A class, just like any other object, can be bound to a variable (local or global), an item in a
container, or an attribute of an object. Classes can also be keys into a dictionary. The fact that
classes are ordinary objects in Python is often expressed by saying that classes are first-class
objects.
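An illustrative sketch of classes as first-class objects (the class names are made up):

```python
class Dog(object):
    sound = 'woof'

class Cat(object):
    sound = 'meow'

# classes stored as dictionary values, and even used as dictionary keys
factories = {'dog': Dog, 'cat': Cat}
labels = {Dog: 'canine', Cat: 'feline'}

def make_pet(kind):
    cls = factories[kind]    # a class passed around like any value
    return cls()

pet = make_pet('cat')
```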

A descriptor is any new-style object whose class supplies a special method named __get__.
Descriptors that are class attributes control the semantics of accessing and setting attributes on
instances of that class. Roughly speaking, when you access an instance attribute, Python obtains
the attribute's value by calling __get__ on the corresponding descriptor, if any.

If a descriptor's class also supplies a special method named __set__, then the descriptor is
known as an overriding descriptor (or, by an older and slightly confusing terminology, a data
descriptor); if the descriptor's class supplies only __get__, and not __set__, then the
descriptor is known as a nonoverriding (or nondata) descriptor. For example, the class of function
objects supplies __get__, but not __set__; therefore, function objects are nonoverriding
descriptors. Roughly speaking, when you assign a value to an instance attribute with a
corresponding descriptor that is overriding, Python sets the attribute value by calling __set__
on the descriptor.
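A minimal sketch of both kinds of descriptor (the class and attribute names are illustrative):

```python
class NonOverriding(object):
    # supplies only __get__: a nonoverriding (nondata) descriptor
    def __get__(self, obj, objtype=None):
        return 'from descriptor'

class Overriding(object):
    # supplies __set__ as well: an overriding (data) descriptor
    def __init__(self):
        self.values = {}
    def __get__(self, obj, objtype=None):
        return self.values.get(id(obj), 'unset')
    def __set__(self, obj, value):
        self.values[id(obj)] = value

class C(object):
    plain = NonOverriding()
    managed = Overriding()

c = C()
before = c.managed        # Overriding.__get__ runs: 'unset'
c.managed = 'stored'      # Overriding.__set__ runs
c.plain = 'shadowed'      # instance dict wins over a nonoverriding descriptor
```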

A common task is to create instances of different classes depending on some condition, or to
avoid creating a new instance if an existing one is available for reuse. A common misconception is
that such needs might be met by having __init__ return a particular object, but such an
approach is absolutely unfeasible: Python raises an exception when __init__ returns any
value other than None. The best way to implement flexible object creation is by using an ordinary
function rather than calling the class object directly. A function used in this role is known as a
factory function.

Calling a factory function is a flexible approach: a function may return an existing reusable
instance, or create a new instance by calling whatever class is appropriate. Say you have two
almost interchangeable classes (SpecialCase and NormalCase) and want to flexibly generate
instances of either one of them, depending on an argument. The following appropriateCase
factory function allows you to do just that:

class SpecialCase(object):
    def amethod(self): print "special"

class NormalCase(object):
    def amethod(self): print "normal"

def appropriateCase(isnormal=True):
    if isnormal: return NormalCase( )
    else: return SpecialCase( )

aninstance = appropriateCase(isnormal=False)
aninstance.amethod( )   # prints "special", as desired

The built-in object type is the ancestor of all built-in types and new-style classes. The object
type defines some special methods that implement the default semantics of objects:

__new__  __init__
You can create a direct instance of object by calling object( ) without any arguments. The
call implicitly uses object.__new__ and object.__init__ to make and return an
instance object without attributes (and without even a __dict__ in which to hold
attributes). Such an instance object may be useful as a "sentinel," guaranteed to compare
unequal to any other distinct object.

__delattr__  __getattribute__  __setattr__
By default, an object handles attribute references using these methods of object.

__hash__  __repr__  __str__
Any object can be passed to functions hash and repr and to type str.

A subclass of object may override any of these methods and/or add others.
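A sketch of the "sentinel" idiom mentioned above (the function and names are illustrative):

```python
_missing = object()     # unique marker: identical only to itself

def lookup(mapping, key, default=None):
    value = mapping.get(key, _missing)
    if value is _missing:        # key truly absent
        return default
    return value                 # the stored value, even if it is None

settings = {'timeout': None}
a = lookup(settings, 'timeout', 30)    # None was stored explicitly
b = lookup(settings, 'retries', 3)     # key absent: default used
```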

A class method is a method you can call on a class or on any instance of the class. Python binds
the method's first parameter to the class on which you call the method, or the class of the
instance on which you call the method; it does not bind it to the instance, as for normal bound
methods. There is no equivalent of unbound methods for class methods. The first parameter of a
class method is conventionally named cls. While it is never necessary to define class methods
(you could always alternatively define a normal function that takes the class object as its first
parameter), some programmers consider them to be an elegant alternative to such functions.

To build a class method, call built-in type classmethod and bind its result to a class attribute. Like
all binding of class attributes, this is normally done in the body of the class, but you may also
choose to perform it elsewhere. The only argument to classmethod is the function to invoke
when Python calls the class method. Here's how you can define and call a class method:

class ABase(object):
    def aclassmet(cls): print 'a class method for', cls.__name__
    aclassmet = classmethod(aclassmet)

class ADeriv(ABase): pass

bInstance = ABase( )
dInstance = ADeriv( )
ABase.aclassmet( )       # prints: a class method for ABase
bInstance.aclassmet( )   # prints: a class method for ABase
ADeriv.aclassmet( )      # prints: a class method for ADeriv
dInstance.aclassmet( )   # prints: a class method for ADeriv

Python supplies a built-in overriding descriptor type, which you may use to give a class's instances
properties.

A property is an instance attribute with special functionality. You reference, bind, or unbind the
attribute with the normal syntax (e.g., print x.prop, x.prop=23, del x.prop). However, rather
than following the usual semantics for attribute reference, binding, and unbinding, these
accesses call on instance x the methods that you specify as arguments to the built-in type
property. Here's how you define a read-only property:

class Rectangle(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
    def getArea(self):
        return self.width * self.height
    area = property(getArea, doc='area of the rectangle')

Each instance r of class Rectangle has a synthetic read-only attribute r.area, computed on the
fly in method r.getArea( ) by multiplying the sides of the rectangle. The docstring
Rectangle.area.__doc__ is 'area of the rectangle'. Attribute r.area is read-only
(attempts to rebind or unbind it fail) because we specify only a get method in the call to
property, no set or del methods.

Properties perform tasks similar to those of special methods __getattr__, __setattr__,
and __delattr__, but in a faster and simpler way. You build a property by calling built-in type
property and binding its result to a class attribute. Like all binding of class attributes, this is
normally done in the body of the class, but you may also choose to perform it elsewhere. Within
the body of a class C, use the following syntax:

attrib = property(fget=None, fset=None, fdel=None, doc=None)

When x is an instance of C and you reference x.attrib, Python calls on x the method you passed
as argument fget to the property constructor, without arguments. When you assign x.attrib =
value, Python calls the method you passed as argument fset, with value as the only argument.
When you execute del x.attrib, Python calls the method you passed as argument fdel,
without arguments. Python uses the argument you passed as doc as the docstring of the
attribute. All parameters to property are optional. When an argument is missing, the
corresponding operation is forbidden (Python raises an exception when some code attempts that
operation). For example, in the Rectangle example, we made property area read-only, because
we passed an argument only for parameter fget, and not for parameters fset and fdel.

The crucial importance of properties is that their existence makes it perfectly safe and indeed
advisable for you to expose public data attributes as part of your class's public interface. If it ever
becomes necessary, in future versions of your class or other classes that need to be polymorphic
to it, to have some code executed when the attribute is referenced, rebound, or unbound, you
know you will be able to change the plain attribute into a property and get the desired effect
without any impact on any other code that uses your class (a.k.a. "client code"). This lets you
avoid goofy idioms, such as accessor and mutator methods, required by OO languages that lack
properties or equivalent machinery. For example, client code can simply use natural idioms such
as:
someInstance.widgetCounter += 1

rather than being forced into contorted nests of accessors and mutators such as:

someInstance.setWidgetCounter(someInstance.getWidgetCounter( ) + 1)

If at any time you're tempted to code methods whose natural names are something like getThis
or setThat, consider wrapping those methods into properties, for clarity.
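For instance, a get/set pair can be folded into one read-write property. A sketch (the class and attribute names are hypothetical):

```python
class Widget(object):
    def __init__(self):
        self._counter = 0
    def _getCounter(self):
        return self._counter
    def _setCounter(self, value):
        if value < 0:
            raise ValueError('counter cannot be negative')
        self._counter = value
    widgetCounter = property(_getCounter, _setCounter,
                             doc='count of widgets')

w = Widget()
w.widgetCounter += 1      # natural syntax; the set method still validates
```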

All references to instance attributes for new-style instances proceed through special method
__getattribute__. This method is supplied by base class object, where it implements all the
details of object attribute reference semantics. However, you may override __getattribute__
for special purposes, such as hiding inherited class attributes (e.g., methods) for your subclass's
instances. The following example shows one way to implement a list without append in the
new-style object model:

class listNoAppend(list):
    def __getattribute__(self, name):
        if name == 'append': raise AttributeError, name
        return list.__getattribute__(self, name)

An instance x of class listNoAppend is almost indistinguishable from a built-in list object, except
that performance is substantially worse, and any reference to x.append raises an exception.



Due to the existence of descriptor types such as staticmethod and classmethod, which take as
their argument a function object, Python somewhat frequently uses, within class bodies, idioms
such as:
def f(cls, ...):
    ...definition of f snipped...
f = classmethod(f)

Having the call to classmethod occur textually after the def statement may decrease code
readability because, while reading f's definition, the reader of the code is not yet aware that f is
destined to become a class method rather than an ordinary instance method. The code would be
more readable if the mention of classmethod could be placed right before, rather than after, the
def. Python 2.4 allows such placement, through the new syntax form known as decoration:
@classmethod
def f(cls, ...):
    ...definition of f snipped...

The @classmethod decoration must be immediately followed by a def statement and means that
f=classmethod(f) executes right after the def statement (for whatever name f the def
defines). More generally, @expression evaluates the expression (which must be a name, possibly
qualified, or a call) and binds the result to an internal temporary name (say, __aux); any such
decoration must be immediately followed by a def statement and means that f=__aux(f)
executes right after the def statement (for whatever name f the def defines). The object bound
to __aux is known as a decorator, and it's said to decorate function f.

Decoration affords a handy shorthand for some higher-order functions (and other callables that
work similarly to higher-order functions). You may apply decoration to any def statement, not
just to def statements occurring in class bodies. You may also code custom decorators, which are
just higher-order functions, accepting a function object as an argument and returning a function
object as the result. For example, here is a decorator that does not modify the function it
decorates, but rather emits the function's docstring to standard output at function-definition
time:

def showdoc(f):
    if f.__doc__:
        print '%s: %s' % (f.__name__, f.__doc__)
    else:
        print '%s: No docstring!' % f.__name__
    return f

@showdoc
def f1():            # emits: f1: a docstring
    """a docstring"""
    pass

@showdoc
def f2():            # emits: f2: No docstring!
    pass

Python 2.4 Decorators


Decorators are a powerful Python 2.4 feature that helps you reduce code
duplication and consolidate knowledge.
By Phillip Eby, Dr. Dobb's Journal
Apr 01, 2005
URL: http://www.ddj.com/dept/lightlang/184406073

Phillip is the author of the open-source Python libraries PEAK and PyProtocols, and has
contributed fixes and enhancements to the Python interpreter. He is the author of the
Python Web Server Gateway Interface specification (PEP 333). He can be contacted at
pje@telecommunity.com.

As software environments become more complex and programs get larger, it becomes
more and more necessary to find ways to reduce code duplication and scattering of
knowledge. While simple code duplication is easy to factor out into functions or methods,

more complex code duplication is not. For example, if a method needs to be wrapped in a
transaction, synchronized in a lock, or have its calls transmitted to a remote object, there
often is no simple way to factor out a function or method to be called, because the part of
the behavior that varies needs to be wrapped inside the common behavior.
A second and related problem is scattering of knowledge. Sometimes a framework needs
to be able to locate all of a program's functions or methods that have a particular
characteristic, such as "all of the remote methods accessible to users with authorization
X." The typical solution is to put this information in external configuration files, but then
you run the risk of configuration being out of sync with the code. For example, you might
add a new method, but forget to also add it to the configuration file. And of course, you'll
be doing a lot more typing, because you'll have to put the method names in the
configuration file, and any renaming you do requires editing two files.

So no matter how you slice it, duplication is a bad thing for both developer productivity
and software reliability, which is why Python 2.4's new "decorator" feature lets you
address both kinds of duplication. Decorators are Python objects that can register,
annotate, and/or wrap a Python function or method.

For example, the Python atexit module contains a register function that registers a
callback to be invoked when a Python program is exited. Without the new decorator
feature, a program that uses this function looks something like Listing One(a).

When Listing One(a) is run, it prints "Goodbye, world!" because when it exits, the
goodbye() function is invoked. Now look at the decorator version in Listing One(b), which
does exactly the same thing, but uses decorator syntax instead: an @ sign and
expression on the line before the function definition.
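The article's listings are not reproduced in this copy; Listing One presumably resembled the following sketch (using sys.stdout.write rather than print so the same code runs under both Python 2 and 3):

```python
import atexit
import sys

# (a) without decorator syntax: the name goodbye appears twice
def goodbye():
    sys.stdout.write("Goodbye, world!\n")
atexit.register(goodbye)

# (b) with decorator syntax: registration happens at the def site
# (atexit.register returns its argument, so farewell stays a function)
@atexit.register
def farewell():
    sys.stdout.write("Goodbye, world!\n")
```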

This new syntax lets the registration be placed before the function definition, which
accomplishes two things. First, you are made aware that the function is an atexit function
before you read the function body, giving you a better context for understanding the
function. With such a short function, it hardly makes a difference, but for longer functions
or methods, it can be very helpful to know in advance what you're looking at. Second, the
function name is not repeated. The first program refers to goodbye twice, so there is more
duplication, precisely the thing we're trying to avoid.
Why Decorate?
The original motivation for adding decorator syntax was to allow class methods and static
methods to be obvious to someone reading a program. Python 2.2 introduced the
classmethod and staticmethod built-ins, which were used as in Listing Two(a). Listing
Two(b) shows the same code using decorator syntax, which avoids the unnecessary
repetitions of the method name, and gives you a heads-up that a classmethod is being
defined.

While this could have been handled by creating a syntax specifically for class or static
methods, one of Python's primary design principles is that: "Special cases aren't special
enough to break the rules." That is, the language should avoid having privileged features
that you can't reuse for other purposes. Since class methods and static methods in
Python are just objects that wrap a function, it would not make sense to create special
syntax for just two kinds of wrapping. Instead, a syntax was created to allow arbitrary
wrapping, annotation, or registration of functions at the point where they're defined.

Many syntaxes for this feature were discussed, but in the end, a syntax resembling Java
1.5 annotations was chosen. Decorators, however, are considerably more flexible than
Java's annotations, as they are executed at runtime and can have arbitrary behavior,
while Java annotations are limited to only providing metadata about a particular class or
method.
Creating Decorators
Decorators may appear before any function definition, whether that definition is part of a
module, a class, or even contained in another function definition. You can even stack
multiple decorators on the same function definition, one per line.
But before you can do that, you first need to have some decorators to stack. A decorator
is a callable object (like a function) that accepts one argument: the function being
decorated. The return value of the decorator replaces the original function definition. See
the script in Listing Three(a), which produces the output in Listing Three(b),
demonstrating that the mydecorator function is called when the function is defined.

For the first example decorator, I had it return the original function object unchanged, but
in practice, it's rare that you'll do that (except for registration decorators). More often,
you'll either be annotating the function (by adding attributes to it), or wrapping the function
with another function, then returning the wrapper. The returned wrapper then replaces the
original function. For example, the script in Listing Four prints "Hello, world!" because the
does_nothing function is replaced with the return value of stupid_decorator.
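Listing Four is likewise missing from this copy; a plausible sketch (returning the string rather than printing it, so the replacement is easy to inspect):

```python
def stupid_decorator(func):
    # ignores the decorated function entirely
    def replacement():
        return "Hello, world!"
    return replacement

@stupid_decorator
def does_nothing():
    pass

message = does_nothing()   # runs replacement, not the original body
```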


Objects as Decorators
As you can see, Python doesn't care what kind of object you return from a decorator,
which means that for advanced uses, you can turn functions or methods into specialized
objects of your own choosing. For example, if you wanted to trace certain functions'
execution, you could use something like Listing Five.


When run, Listing Five prints "entering" and "exiting" messages around the "Hello, world"
function. As you can see, a decorator doesn't have to be a function; it can be a class, as
long as it can be called with a single argument. (Remember that in Python, calling a class
returns a new instance of that class.) Thus, the traced class is a decorator that replaces a
function with an instance of the traced class.
So after the hello function definition in Listing Five, hello is no longer a function, but is
instead an instance of the traced class that has the old hello function saved in its func
attribute.
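Listing Five is not included in this copy; a sketch of such a traced class (recording messages in a list rather than printing them, so the behavior is easy to check):

```python
log = []   # stands in for the article's print statements

class traced(object):
    def __init__(self, func):
        self.func = func             # save the original function
    def __call__(self, *args, **kwds):
        log.append('entering ' + self.func.__name__)
        try:
            return self.func(*args, **kwds)
        finally:
            log.append('exiting ' + self.func.__name__)

@traced
def hello():
    return "Hello, world"

result = hello()   # hello is now a traced instance, not a function
```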
When that wrapper instance is called (by the hello() statement at the end of the script),
Python's class machinery invokes the instance's __call__() method, which then invokes
the original function between printing trace messages.


Stacking Decorators
Now that we have an interesting decorator, you can stack it with another decorator to see
how decorators can be combined.
The script in Listing Six prints "Called with <class '__main__.SomeClass'>", wrapped in
"entering" and "exiting" messages. The ordering of the decorators determines the
structure of the result. Thus, someMethod is a classmethod descriptor wrapping a traced
instance wrapping the original someMethod function. So, outer decorators are listed
before inner decorators.
Therefore, if you are using multiple decorators, you must know what kind of object each
decorator expects to receive, and what kind of object it returns, so that you can arrange
them in a compatible wrapping order, with the output of the innermost decorator
compatible with the input of the next-outer decorator.

Usually, most decorators expect a function on input, and return either a function or an
attribute descriptor as their output. The Python built-ins classmethod, staticmethod, and

property all return attribute descriptors, so their output cannot be passed to a decorator
that expects a function. That's why I had to put classmethod first in Listing Six. As an
experiment, try reversing the order of @traced and @classmethod in Listing Six, and
see if you can guess what will happen.
Functions as Decorators
Because most decorators expect an actual function as their input, some of them may not
be compatible with our initial implementation of @traced, which returns an instance of the

traced class. Let's rework @traced such that it returns an actual function object, so it'll be
compatible with a wider range of decorators.

Listing Seven provides the same functionality as the original traced decorator, but instead
of returning a traced object instance, it returns a new function object that wraps the
original function. If you've never used Python closures before, you might be a little
confused by this function-in-a-function syntax.
Basically, when you define a function inside of another function, any undefined local
variables in the inner function will take the value of that variable in the outer function. So
here, the value of func in the inner function comes from the value of func in the outer
function.
Because the inner function definition is executed each time the outer function is called,
Python actually creates a new wrapper function object each time. Such function objects
are called "lexical closures," because they enclose a set of variables from the lexical
scope where the function was defined.


A closure does not actually duplicate the code of the function, however. It simply encloses
a reference to the existing code, and a reference to the free variables from the enclosing
function. In this case, that means that the wrapper closure is essentially a pointer to the
Python bytecode making up the wrapper function body, and a pointer to the local
variables of the traced function during the invocation when the closure was created.
Because a closure is really just a normal Python function object (with some predefined
variables), and because most decorators expect to receive a function object, creating a
closure is perhaps the most popular way of creating a stackable decorator.
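Listing Seven is not reproduced here; the closure-based rework presumably looked something like this sketch (again logging to a list instead of printing):

```python
log = []

def traced(func):
    # wrapper is a closure: it keeps a reference to func
    # even after traced itself has returned
    def wrapper(*args, **kwds):
        log.append('entering ' + func.__name__)
        try:
            return func(*args, **kwds)
        finally:
            log.append('exiting ' + func.__name__)
    return wrapper

@traced
def hello():
    return "Hello, world"

result = hello()   # hello is now an ordinary function object again
```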
Decorators with Arguments

Many applications of decorators call for parameterization. For example, say you want to
create a pair of @require and @ensure decorators so that you can record a method's
precondition and postcondition. Python lets us specify arguments with our decorators; see
Listing Eight. (Of course, Listing Eight is for illustration only. A full-featured
implementation of preconditions and postconditions would need to be a lot more
sophisticated than this to deal with things like inheritance of conditions, allowing
postconditions to access before/after expressions, and allowing conditions to access
function arguments by name instead of by position.)
You'll notice that the require() decorator creates two closures. The first closure creates a
decorator function that knows the expr that was supplied to @require(). This means

require itself is not really the decorator function here. Instead, require returns the
decorator function, here called decorator. This is very different from the previous
decorators, and this change is necessary to implement parameterized decorators.
The second closure is the actual wrapper function that evaluates expr whenever the
original function is called. Try calling the test() function with different numbers of

arguments, and see what happens. Also, try changing the @require line to use a different
precondition, or stack multiple @require lines to combine preconditions. You'll also notice
that @require(expr="len(__args)==1") still works. Decorator invocations follow the same
syntax rules as normal Python function or method calls, so you can use positional
arguments, keyword arguments, or both.
Function Attributes
All of the examples so far have been things that can't be done quite so directly with Java
annotations. But what if all you really need is to tack some metadata onto a function or
method for later use? For this purpose, you may wish to use function attributes in your
decorator. Function attributes, introduced in Python 2.1, let you record arbitrary values as
attributes on a function object. For example, suppose you want to track the author of a function or
method, using an @author() decorator? You could implement it as in Listing Nine. In this
example, you simply set an author_name attribute on the function and return it, rather
than creating a wrapper. Then, you can retrieve the attribute at a later time as part of
some metadata-gathering operation.
Practicing "Safe Decs"
To keep the examples simple, I've been ignoring "safe decorator" practices. It's easy to
create a decorator that will work by itself, but creating a decorator that will work properly
when combined with other decorators is a bit more complex. To the extent possible, your
decorator should return an actual function object, with the same name and attributes as
the original function, so as not to confuse an outer decorator or cancel out the work of an
inner decorator.

This means that decorators that simply modify and return the function they were given
(like Listings Three and Nine), are already safe. But decorators that return a wrapper
function need to do two more things to be safe:

Set the new function's name to match the old function's name.
Copy the old function's attributes to the new function.

These can be accomplished by adding just three short lines to our old decorators.
(Compare the version of @require in Listing Ten with the original in Listing Eight.)
Before returning the wrapper function, the decorator function in Listing Ten changes the

wrapper function's name (by setting its __name__ attribute) to match the original
function's name, and sets its __dict__ attribute (the dictionary containing its attributes) to
the original function's __dict__, so it will have all the same attributes that the original
function did. It also changes the wrapper function's documentation (its __doc__ attribute)
to match the original function's documentation. Thus, if you used this new @require()
decorator stacked over the @author() decorator, the resulting function would still have an
author_name attribute, even though it was a different function object than the original one
being decorated.
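To see the effect, here is a sketch combining Listing Ten's safe @require with Listing Nine's @author (the decorated function is illustrative only); on Python 2.5 and later, functools.wraps automates the same three-line bookkeeping:

```python
def require(expr):
    def decorator(func):
        def wrapper(*__args, **__kw):
            assert eval(expr), "Precondition failed"
            return func(*__args, **__kw)
        # the three "safe decorator" lines from Listing Ten
        wrapper.__name__ = func.__name__
        wrapper.__dict__ = func.__dict__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator

def author(author_name):
    def decorator(func):
        func.author_name = author_name
        return func
    return decorator

@require("len(__args) == 1")
@author("Lemony Snicket")
def sequenceOf(unfortunate_events):
    """Return nothing."""
    pass

print(sequenceOf.__name__)     # -> sequenceOf, not wrapper
print(sequenceOf.author_name)  # -> Lemony Snicket
```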
Putting It All Together
To illustrate, I'll use a few of these techniques to implement a complete, useful decorator
that can be combined with other decorators. Specifically, I'll implement an @synchronized
decorator (Listing Eleven) that implements Java-like synchronized methods. A given
object's synchronized methods can only be invoked by one thread at a time. That is, as
long as any synchronized method is executing, any other thread must wait until all the
synchronized methods have returned.

To implement this, you need to have a lock that you can acquire whenever the method is
executing. Then you can create a wrapping decorator that acquires and releases the lock
around the original method call. I'll store this lock in a _sync_lock attribute on the object,
automatically creating a new lock if there's no _sync_lock attribute already present.
But what if one synchronized method calls another synchronized method on the same
object? Using simple mutual exclusion locks would result in a deadlock in this case, so
we'll use a threading.RLock instead. An RLock may be held by only one thread, but it can
be recursively acquired and released. Thus, if one synchronized method calls another on
the same object, the lock count of the RLock simply increases, then decreases as the
methods return. When the lock count reaches zero, other threads can acquire the lock
and can, therefore, invoke synchronized methods on the object again.
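The re-entrant behavior is easy to demonstrate in isolation; with a plain threading.Lock, the nested acquisition in this minimal sketch would deadlock:

```python
import threading

lock = threading.RLock()

def outer():
    with lock:        # first acquisition
        return inner()

def inner():
    with lock:        # same thread re-acquires: count goes to 2, no deadlock
        return "done"

print(outer())  # -> done
```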
There are two little tricks being done in Listing Eleven's wrapper code that are worth
knowing about. First, the code uses a try/except block to catch an attribute error in the
case where the object does not already have a synchronization lock. Since in the
common case the lock should exist, this is generally faster than using an if/then test to
check whether the lock exists (because the if/then test would have to execute every time,
but the AttributeError will occur only once).
Second, when the lock doesn't exist, the code uses the setdefault method of the object's
attribute dictionary (its __dict__) to either retrieve an existing value of _sync_lock, or to set
a new one if there was no value there before. This is important because it's possible that
two threads could simultaneously notice that the object has no lock, and then each would
create and successfully acquire its own lock, while ignoring the lock created by the other!
This would mean that our synchronization could fail on the first call to a synchronized
method of a given object.

Using the atomic setdefault operation, however, guarantees that no matter how many
threads simultaneously detect the need for a new lock, they will all receive the same
RLock object. That is, one setdefault() operation sets the lock, then all subsequent
setdefault() operations receive that lock object. Therefore, all threads end up using the
same lock object, and thus only one is able to enter the wrapped method at a time, even if
the lock object was just created.
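The identity guarantee is visible even without starting threads; here the two setdefault() calls play the part of two racing threads, each offering its own freshly created RLock (the class name is illustrative):

```python
import threading

class Resource(object):
    pass

obj = Resource()

# each "thread" offers its own new RLock; only the first one is stored,
# and both callers get back the stored lock
lock_a = obj.__dict__.setdefault('_sync_lock', threading.RLock())
lock_b = obj.__dict__.setdefault('_sync_lock', threading.RLock())

print(lock_a is lock_b)  # -> True
```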
Conclusion
Python decorators are a simple, highly customizable way to wrap functions or methods,
annotate them with metadata, or register them with a framework of some kind. But, as a
relatively new feature, their full possibilities have not yet been explored, and perhaps the
most exciting uses haven't even been invented yet. Just to give you some ideas, here are
links to a couple of lists of use cases that were posted to the mailing list for the
developers working on the next version of Python: http://mail.python.org/pipermail/python-dev/2004-April/043902.html and http://mail.python.org/pipermail/python-dev/2004-April/044132.html.
Each message uses different syntax for decorators, based on some C#-like alternatives
being discussed at the time. But the actual decorator examples presented should still be
usable with the current syntax. And, by the time you read this article, there will likely be
many other uses of decorators out there. For example, Thomas Heller has been working
on experimental decorator support for the ctypes package (http://ctypes.sourceforge.net/),
and I've been working on a complete generic function package using decorators, as part
of the PyProtocols system (http://peak.telecommunity.com/PyProtocols.html).
So, have fun experimenting with decorators! (Just be sure to practice "safe decs," to
ensure that your decorators will play nice with others.)
DDJ

Listing One
(a)
import atexit

def goodbye():
    print "Goodbye, world!"

atexit.register(goodbye)

(b)
import atexit

@atexit.register
def goodbye():
    print "Goodbye, world!"

Back to article

Listing Two
(a)
class Something(object):
    def someMethod(cls,foo,bar):
        print "I'm a class method"
    someMethod = classmethod(someMethod)

(b)
class Something(object):
    @classmethod
    def someMethod(cls,foo,bar):
        print "I'm a class method"

Back to article

Listing Three
(a)
def mydecorator(func):
    print "decorating", func
    return func

print "before definition"

@mydecorator
def some_function():
    print "I'm never called, so you'll never see this message"

print "after definition"


(b)
before definition
decorating <function some_function at 0x00A933C0>
after definition

Back to article

Listing Four
def stupid_decorator(func):
    return "Hello, world!"


@stupid_decorator
def does_nothing():
    print "I'm never called, so you'll never see this message"

print does_nothing

Back to article

Listing Five
class traced:
    def __init__(self,func):
        self.func = func

    def __call__(__self,*__args,**__kw):
        print "entering", __self.func
        try:
            return __self.func(*__args,**__kw)
        finally:
            print "exiting", __self.func

@traced
def hello():
    print "Hello, world!"

hello()

Back to article

Listing Six
class SomeClass(object):
    @classmethod
    @traced
    def someMethod(cls):
        print "Called with class", cls

SomeClass.someMethod()

Back to article

Listing Seven
def traced(func):
    def wrapper(*__args,**__kw):
        print "entering", func
        try:
            return func(*__args,**__kw)
        finally:
            print "exiting", func
    return wrapper

Back to article

Listing Eight
def require(expr):
    def decorator(func):
        def wrapper(*__args,**__kw):
            assert eval(expr),"Precondition failed"
            return func(*__args,**__kw)
        return wrapper
    return decorator

@require("len(__args)==1")
def test(*args):
    print args[0]

test("Hello world!")

Back to article

Listing Nine
def author(author_name):
    def decorator(func):
        func.author_name = author_name
        return func
    return decorator

@author("Lemony Snicket")
def sequenceOf(unfortunate_events):
    pass

print sequenceOf.author_name

# prints "Lemony Snicket"

Back to article

Listing Ten
def require(expr):
    def decorator(func):
        def wrapper(*__args,**__kw):
            assert eval(expr),"Precondition failed"
            return func(*__args,**__kw)
        wrapper.__name__ = func.__name__
        wrapper.__dict__ = func.__dict__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator

Back to article

Listing Eleven
def synchronized(func):
    def wrapper(self,*__args,**__kw):
        try:
            rlock = self._sync_lock
        except AttributeError:
            from threading import RLock
            rlock = self.__dict__.setdefault('_sync_lock',RLock())
        rlock.acquire()
        try:
            return func(self,*__args,**__kw)
        finally:
            rlock.release()
    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__ = func.__doc__
    return wrapper

class SomeClass:
    """Example usage"""
    @synchronized
    def doSomething(self,someParam):
        """This method can only be entered
        by one thread at a time"""

Modules, Packages, paths and imports

Setting import path of a module to any directory


The key is to place a file python.pth, specifying the directory where the package is
located, into site-packages; the module can then be imported normally.

Example:
#python.pth
# This file allows access to the packages in d:\python\lib
# For this purpose, python.pth should be located in D:\Python24\Lib\site-packages\python.pth
d:\python\lib

Assuming that there is a package wxpy in d:\python\lib and you need to import the
module widgets from this package, the usage would be:

import wxpy.widgets

Of course, all modules within the lib directory are accessible at any time, as if they were in
the standard Python location. Hence, there is no need for any specific import:

import path

Notes on .pth files


# A path configuration file is a file whose name has the form package.pth;
  its contents are additional items (one per line) to be added to sys.path.
# Non-existing items are never added to sys.path, but no check is made that
  the item refers to a directory (rather than a file).
# No item is added to sys.path more than once.
# Blank lines and lines beginning with # are skipped.
# Lines starting with import are executed.
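These rules can be observed programmatically: site.addsitedir() processes a directory's .pth files the same way site-packages is processed at startup. A sketch using a throwaway directory:

```python
import os
import site
import sys
import tempfile

base = tempfile.mkdtemp()
libdir = os.path.join(base, 'lib')
os.makedirs(libdir)                  # the .pth entry must exist to be added

# write a one-line .pth file naming libdir
with open(os.path.join(base, 'extra.pth'), 'w') as f:
    f.write(libdir + '\n')

site.addsitedir(base)                # processes extra.pth
print(libdir in sys.path)  # -> True
```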

Documentation

Pydoc
Docutils
Distributions of Python Software
A "Distribution" is a collection of files that represent a "Release" of a "Project" as of a particular
point in time, denoted by a "Version". Releases may have zero or more "Requirements", which
indicate what releases of another project the release requires in order to function. A Requirement
names the other project, expresses some criteria as to what releases of that project are
acceptable, and lists any "Extras" that the requiring release may need from that project. (An Extra
is an optional feature of a Release, that can only be used if its additional Requirements are
satisfied.)
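setuptools' pkg_resources module (assuming it is installed) models these concepts directly; a single requirement string names the project, its version criteria, and any Extras:

```python
import pkg_resources

# project name, extras, and version criteria in one requirement string
req = pkg_resources.Requirement.parse("FooProject[extra1]>=1.2,<2.0")

print(req.project_name)  # -> FooProject
print(list(req.extras))  # -> ['extra1']
```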

A "Project" is a library, framework, script, application, or collection of data or other files relevant to
Python. "Projects" must have unique names, in order to tell them apart. Currently, PyPI is useful
as a way of registering project names for uniqueness, because the 'name' argument to distutils
'setup()' command is used to identify the project on PyPI, as well as to generate Distributions' file
names.

Notice, by the way, that this definition of Distribution is broad enough to include directories
containing Python packages or modules, not just "built distributions" created by the distutils. For
example, the directory containing the Python standard library is a "distribution" by this definition,
and so are the directories you edit your project's code in! In other words, every copy of a project's
code is a "distribution", even if you don't take any special steps to make it one.

A "Pluggable Distribution" or "Pluggable" is an importable distribution that satisfies two
important additional properties:
1. Its project name and distribution format can be unambiguously determined from file or
directory names, without actually examining any file contents. (Most distutils distribution
formats cannot guarantee this, because they do not place any restrictions on project name
strings, and thus allow ambiguity as to what part of their filenames is the project name, and
what part is the version.)
2. A pluggable distribution contains metadata identifying its release's version, requirements,
extras, and any additional requirements needed to implement those extras. It may also
contain other metadata specific to an application or framework, to support integrating the
pluggable's project with that application or framework.

Distributions that satisfy these two properties are thus "pluggable", because they can be
automatically discovered and "activated" (by adding them to sys.path), then used for importing
Python modules or accessing other resource files and directories that are part of the distributed
project.


The Working Set


The collection of distributions that are currently activated is called a Working Set. Note that a
Working Set can contain any importable distribution, not just pluggable ones. For example, the
Python standard library is an importable distribution that will usually be part of the Working Set,
even though it is not pluggable. Similarly, when you are doing development work on a project, the
files you are editing are also a Distribution. (And, with a little attention to the directory names
used, and including some additional metadata, such a "development distribution" can be made
pluggable as well.)

When Python runs a program, that program must have all its requirements met by importable
distributions in the working set. Initially, a Python program's Working Set consists only of the
importable distributions (whether pluggable or not) listed in sys.path, such as the directory
containing the program's __main__ script, and the directories containing the standard library and
site-packages. If these are the only distributions that the program requires, then of course that
program can run.

However, if some of the requirements are not satisfied by the working set, this can lead to errors
that may be hard to diagnose. So, if a Python program were made part of a Project, and the
project explicitly defines its Requirements, which are then expressed as part of a Pluggable
Distribution, then a runtime facility could automatically attempt to locate suitable pluggables and
add them to the working set, or at least give a more specific error message if a requirement can't
be satisfied.

The Environment
A set of directories that may be searched for pluggable distributions is called an Environment. By
default, the Environment consists of all existing directories on sys.path, plus any distribution
sources registered with the runtime.

Given an Environment, and a Requirement to be satisfied, our proposed runtime facility would
search the environment for pluggable distributions that satisfy the requirement (and the
requirements of those distributions, recursively), such that it returns a list of distributions to be
added to the working set, or raises a DependencyNotFound error.


Note that a Working Set should not contain multiple distributions for the same project, so the
runtime system must not propose to add a pluggable distribution to a Working Set if that set
already contains a pluggable for the same project. If a project's requirements can't be met without
adding a conflicting pluggable to the working set, a VersionConflict error is raised. (Unlike a
working set, an Environment may contain more than one pluggable for a given project, because
these are simply distributions that are *available* to be activated.)

Python Eggs
"Python Eggs" are distributions in specific formats that implement the concept of a "Pluggable
Distribution". An egg may be a zipfile or directory whose name ends with '.egg', that contains
Python modules or packages, plus an 'EGG-INFO' subdirectory containing metadata. An egg
may also be a directory containing one or more 'ProjectName.egg-info' subdirectories with
metadata.

The latter form is primarily intended to add discoverability to distributions that -- for whatever
reason -- cannot be restructured to the primary egg format. For example, by placing appropriate
.egg-info directories in site-packages, one could document what distributions are already installed
in that directory. While this would not make those releases capable of being individually
activated, it does allow the runtime system to be aware that any requirements for those projects
are already met, and to know that it should not attempt to add any other releases of
those projects to the working set.

The last form of egg is a '.egg-link' file. These exist to support symbolic linking on platforms that
do not natively support symbolic links (e.g. Windows). These consist simply of a single line
indicating the location of a directory that contains either an EGG-INFO or ProjectName.egg-info
subdirectory. This format will be used by project management utilities to add an in-development
distribution to the development Environment.

Initialization, Development, and Deployment


Pluggable distributions can be manually made part of the working set by modifying sys.path. This
can be done via PYTHONPATH, .pth files, or direct code manipulation. However, it is generally
more useful to put distributions in the working set by automatically locating them in an
appropriate Environment.

The default Environment is the directories already on sys.path, so simply placing pluggable
distributions in those directories suffices to make them available for adding to the working set.

But *something* must add them to the working set, even if it is just to designate the project the
current program is part of, so that its dependencies can be automatically resolved and added to
the working set. This means that either a program's start scripts must invoke the
runtime facility and make this initial request, or there must be some automatic means by which
this is accomplished.

For development, however, one does not generally want to have to "install" scripts that one is
actively editing. So, future versions of the runtime facility will have an option to automatically
create wrapper scripts that invoke the in-development versions of the scripts, rather than versions
installed in eggs. This will allow developers to continue to write scripts without embedding any
project or version information in them.

Essentially, for development purposes, there will be a tool to "install" an in-development
distribution to a given Environment, using a symlink or .egg-link file to include the distribution, and
generating wrapper scripts to invoke any "main program" scripts in the project. Thus, a user's

development Environment can include one or more projects whose source code he or she is
editing, as well as any number of built distributions. He or she can then also build source or
binary distributions of their project for deployment, whenever it is necessary or convenient to do
so.

The EasyInstall program accomplishes this by creating wrapper scripts when a distribution is
installed. The wrapper scripts know what project the "real" script is part of, and so can ensure
that the right working set is active when the scripts run. The scripts' author does not need to
invoke the runtime facility directly, nor do they even need to be aware that it exists.

Distutils

Distutils
Setuptools
EasyInstall

Conversion to Executables

PyInstaller

A little more complex in usage, but the results seem to be more reliable than py2exe. The trac
project site is here (doc proxy on d :).

After configuring it for one particular python interpreter, you need to run Makespec.py and
Build.py for each project.

Makespec uses parameters: --onefile D:\Python\Projects\Connector\unlockmsc\unlockmsc.py


And creates a specification in a folder.
Beware that there are two executable flavours: console and windows, also specifiable as a
parameter.

Build.py uses the created specification as simple parameter: unlockmsc\unlockmsc.spec

Py2exe
Beware that there are two executable flavours: console and windows.

Recipe for plain python (console)


Adding the line import py2exe at the start of the normal distutils script setup.py and running
python setup.py py2exe will cause the building and collection, in a subdirectory of the
distribution root directory, of an .exe file and one or more .dll files.

# setup.py script for the use with py2exe
from distutils.core import setup
import py2exe
import sys, os

# this script is only useful for py2exe, so just run that distutils command.
# that allows running it with a simple double click.
sys.argv.append('py2exe')

setup(
    options = {'py2exe': {
        #'includes': ['serial','struct'],  # unclear what it is needed for
        'excludes': ['javax.comm','TERMIOS','FCNTL'],
        #'optimize': 2,  # when this flag was set, serial library was not found!
        'dist_dir': 'dist25',
        }
    },
    name = "unlockmsc",
    console = [
        {
            'script': "unlockmsc.py",
        },
    ],
    #zipfile = "stuff.lib",  # what for?
    #packages = ['serial'],  # when do you need it?
    description = "msc unlocker",
    version = "0.1",
    author = "a aranda",
    author_email = "aaranda@gemalto.com",
    url = "http://gemalto.com",
)

Recipe for wxpython (windows)

In principle the same as above, but with windows= instead of console=.

How does py2exe work and what are all those files?
Let's start from the needed results going back to how py2exe does its job.
Python is an interpreted language and as long as Microsoft will not ship a Python interpreter (and
its accompanying class library) with every copy of its flagship operating systems products, there is
no direct way to execute a Python script on a vanilla Microsoft OS machine. For most casual user
of py2exe, that means that you must create an executable (.exe) file that when clicked on will just
run the script. This is what py2exe does for you. After py2exe has done its magic, you should have
a "dist" directory with all the files necessary to run your python script. No install necessary. Click
and run. No DLL hell, nothing else to download.
myprog.exe      The actual executable. You can select a custom icon by using some specific
                target options (see CustomIcons)

python??.dll    the python interpreter library. This is the brain of your executable

library.zip     This is a standard zip file where all the pure source modules will be inserted
                (using the "zipfile" option, you can also select to put that file in a sub-directory
                and with a different name)

*.pyd           The pyd files are actually standard Windows DLLs (I used the useful depends.exe
                to check things around). They are also standard modules for Python. A Python
                program can import those pyd. Some applications build pyd to provide
                accelerated features. Also they are necessary to provide support to native
                functions of the operating system (see also CTypes to never have to use SWIG
                again!). Those files also go into the subdirectory where library.zip will be
                installed

*.dll           some pyd probably have some DLL dependencies, and here they come

w9xpopen.exe    This is needed on Win9x platforms.

To run, your program needs all those files as a necessary condition. But it might happen that this
is not a sufficient condition. For example, encodings are imported "by name". If you use a
feature that requires encodings, you will need to put an option to include encodings
unconditionally, or to import them explicitly from one of your scripts (see EncodingsAgain and
EvenMoreEncodings). Some other modules (eg pyserial-pyparallel) also conditionally import
modules for each platform. You can avoid the warning by putting the correct "ignores" options in
py2exe. Last but not least, modules like pygtk seem to create a module reference on-the-fly and
therefore the corresponding warnings also are harmless (see ExcludingDlls to learn how to
correct that).
An important point to note: the main script (the one passed as an option to "windows" or
"console" in your setup file) is not put with all the other files in the library.zip. Instead it is
byte-compiled (see OptimizedBytecode for some details on optimization) and inserted into a
named resource in the executable shell. This technique also allows you to insert binary strings in the final
executable (very nice if you want to add a custom version tag) through the usage of the
"other_resources" options for the target (see CustomDataInExe).

CGI
With the recent rise of web frameworks, like Ruby on Rails, Django, Turbogears and friends, it
might be fair to assume that the old workhorse CGI can be put out to pasture.
Well as recently as July [1], Bruce Eckel blogged about Testing Python CGIs.

CGI is a simple protocol that describes how a webserver passes http requests to a program, and
how that program makes a response. The beauty of CGI is its simplicity: most of the request [2]
is passed on stdin, and the program passes a response back to the server on stdout.

The Python CGI Module makes understanding http requests very easy. To send a response, write
to stdout.

CGI is still a very good way of connecting simple programs to the web. It's also a great way of
cutting your teeth with web programming, particularly learning about http. In fact WSGI (the
new Python protocol that allows components in a webstack to communicate) aims to be a
modern evolution of CGI.

Quite sophisticated programs can be written as CGIs, but they are less efficient than modern
web frameworks, as each request is handled by a separate Python process.

I've hacked around with a few CGIs of my own. I've also written a couple of tutorials, if you're
interested in learning:


Writing Web Applications as CGI - Part One


Writing Web Applications as CGI - Part Two

I have a nice utility, modules.cgi, to retrieve and display as HTML all the modules in a Python
distribution.


CGI Web Applications with Python, Part One


By Michael Foord | 2004-03-15

Contents

Headers and Line Endings
o Hello World
User Interface and HTML FORMs
o Example Form
Receiving FORM Submissions
o Functions
o Using getform()
o Using getall()
o A List of Values
Experimenting Yourself
The Error 500 Checklist
Conclusion

One of today's hottest topics is "web applications." Unlike traditional "shrinkwrapped"
"executable" software that runs locally on a desktop machine, a web application runs on a
centralized server and delivers its features via the Internet, usually via HTTP and a
common web browser. Web applications are increasingly popular, because they can be
accessed readily -- just point the browser to a URL -- and can be accessed
simultaneously by a number of users. Some web applications provide e-commerce (think
eBay) some provide entertainment (such as Yahoo! Games), and others, such as
Salesforce.com, manage enterprise information.
While Java, Perl, and PHP are often lauded as ideal programming languages for web
application development, Python is just as capable. Indeed, Python is perfectly suited to
delivering dynamic content across the Internet.

The simplest way to create web applications with Python is to use the Common Gateway
Interface (CGI) 1. CGI is just another protocol: it describes how to connect clients to web
applications.
Normally, when you fetch static content from a web server, the server finds the file 2 that
you're requesting and sends it back in a response. For example, a request for

http://www.example.com/contact.html returns the HTML page contact.html. However, if
the request refers to a CGI script, then instead of returning the script (as content), the
the request refers to a CGI script, then instead of returning the script (as content), the
script runs and the output of the script is sent in the response. CGI scripts generate
content dynamically in response to a request (and its parameters, as you'll see shortly).

Once you understand how CGI works, producing dynamic content is as simple as using
the print statement. And contrary to its reputation, CGI is not necessarily slow. Even
though the Python interpreter launches for each and every script invocation, these days,
you should try CGI before choosing a more complex web application framework.

Let's dive into CGI programming with Python. This first of two parts explains the basics of
CGI, describes how HTML forms are sent, and explains how to process form input. The
next article provides an example application and covers more advanced CGI topics, such
as CGI environment variables, HTML templating, and Unicode.
All code in this article is intended to work with Python 2.2 and beyond.
Headers and Line Endings
Half the battle of writing a web application is returning the right headers in response to a
request. Sending valid headers isn't just important for the receiving client -- if your
program doesn't emit valid headers, the web server assumes that your script has failed
and displays the dreaded Error 500... Internal Server Error.

There are lots of different headers you can send 4. But at a minimum, you must send a
Content-Type header (in fact, in many situations this may be the only header you need to
send) and you must end your list of headers with a blank line.

All headers are of the form header-type: header-value\r\n. The line ending \r\n is
required to comply with the relevant RFC 5. However, most clients and servers allow just
\n, which is what you'll get as a normal line ending on UNIX type systems.
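A sketch of what a complete response looks like on the wire (header, blank line, then the body), and how the blank line cleanly separates the two parts:

```python
# each header line ends with \r\n; an empty line separates headers from body
headers = "Content-Type: text/html\r\n"
body = "<html><body>Hello</body></html>"
response = headers + "\r\n" + body

# splitting on the blank line recovers the two parts
head, content = response.split("\r\n\r\n", 1)
print(head)     # -> Content-Type: text/html
print(content)  # -> <html><body>Hello</body></html>
```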

Hello World
Let's do the obligatory "Hello, World" program as a CGI:
#!/usr/bin/python
import sys
try:
    import cgitb
    cgitb.enable()
except ImportError:
    sys.stderr = sys.stdout

def cgiprint(inline=''):
    sys.stdout.write(inline)
    sys.stdout.write('\r\n')
    sys.stdout.flush()

contentheader = 'Content-Type: text/html'

thepage = '''<html><head>
<title>%s</title>
</head><body>
%s
</body></html>
'''

h1 = '<h1>%s</h1>'

if __name__ == '__main__':
    cgiprint(contentheader)   # content header
    cgiprint()                # finish headers with blank line
    title = 'Hello World'
    headline = h1 % 'Hello World'
    print thepage % (title, headline)

Let's walk through the code.


If you're running the CGI script on a Linux or Unix system, you must include the obligatory
"shebang" line (#!/usr/bin/python) at line 1 to tell the script where to find Python. 7

The next part of the script is a try/except block that attempts to import the cgitb module.
Normally, errors in a Python program are sent to sys.stderr. However, when running
CGIs, sys.stderr translates to the server error log. But constantly digging out errors from
the error log is a nuisance when debugging. Instead, cgitb pretty-prints tracebacks,
including useful information like variable values, to the browser. (This module was only
introduced in Python 2.2.) If the import fails, stderr redirects to stdout, which does a
similar, but not so effective job. (Do not use the cgitb module in production applications.
The information it displays includes details about your system that may be useful to a
would-be attacker.)
Next come the headers. cgiprint() writes a line terminated with the correct '\r\n' line ending, and need only be used for the header lines. The script sends a Content-Type header; because it is sending a web page (which is a form of text), the type/subtype is text/html. Only one header is sent, then the headers are terminated with a blank line.

cgiprint() also flushes the output buffer using sys.stdout.flush(). Most servers buffer the output of scripts until the script completes. For long-running scripts, 8 buffered output may frustrate your user, who'll wonder what's happening. You can either regularly flush your buffer, or run Python in unbuffered mode. The command-line option to do this is -u, which you can specify as #!/usr/bin/python -u in your shebang line.
Finally, the script sends a small HTML page, which should look very familiar to you, if
you've used HTML before.
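The header-plus-blank-line protocol is easy to get wrong, so here is a small, server-free sketch of it. The make_response() helper below is purely illustrative (not part of any standard module); it mirrors cgiprint() but collects output in a string so the header/body split can be inspected, using modern print-function syntax:

```python
import io

def make_response(title, headline):
    """Build a CGI response: header lines end in '\r\n' and a
    blank line separates the headers from the HTML page."""
    out = io.StringIO()

    def cgiprint(line=''):
        # every header line is terminated with CRLF
        out.write(line)
        out.write('\r\n')

    cgiprint('Content-Type: text/html')
    cgiprint()  # blank line: end of headers
    out.write('<html><head><title>%s</title></head>'
              '<body><h1>%s</h1></body></html>' % (title, headline))
    return out.getvalue()

response = make_response('Hello World', 'Hello World')
# the first CRLF-CRLF pair is the header/body boundary
headers, page = response.split('\r\n\r\n', 1)
print(headers)  # Content-Type: text/html
```

Splitting on the first '\r\n\r\n' is exactly what the web server and browser do to separate your headers from the page itself.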
User Interface and HTML FORMs
When writing CGIs, your user interface is the web browser. Combining JavaScript, Dynamic HTML (DHTML), and HTML forms, you can create rich web applications.

The basic HTML elements used to communicate with CGIs are forms and form input components, including text boxes, radio buttons, check boxes, pulldown menus, and the like. 9
Example Form
A typical, simple HTML form might be coded like this:

<form action="/cgi-bin/formprocessor.py" method="get">
What is Your Name : <input name="param1"
    type="text" value="Joe Bloggs" /><br />
<input name="param2" type="radio" value="this"
    checked="checked" /> Select This<br />
<input name="param2" type="radio" value="that" />or That<br />
<input name="param3" type="checkbox" checked="checked" />
Check This<br />
<input name="param4" type="checkbox" checked="checked" />
and This Too ?<br />
<input name="hiddenparam" type="hidden" value="some_value" />
<input type="reset" />
<input type="submit" />
</form>

This translates into a rendered form (border added for effect): a "What is Your Name" text box pre-filled with "Joe Bloggs", the "Select This" / "or That" radio buttons, the "Check This" and "and This Too ?" checkboxes, and the Reset and Submit Query buttons.

When the user hits the Submit button, his (or her) form settings are encapsulated into an HTTP request. Inside the form tag are two parameters that determine how that encapsulation occurs. The action parameter is the URI of your CGI script. This is where the request is sent. The method parameter specifies how the values are encoded into the request. The two possible methods are GET and POST.
The simpler of the two encoding choices is GET. With GET, the form's values are encoded to be "URL safe" 10 and are then added onto the end of the URL as a list of parameters. With POST, the encoded values are sent as the body of the request, after the headers are sent.

While GET is simpler, the length of URLs is limited. Hence, using GET imposes a maximum limit on the form entry that can be sent. (About 1,000 characters is the limit for many servers.) If you're using a form to get a long text entry, use POST. POST is more suitable for requests where more data is being sent. 11

One advantage of GET, though, is that you can encode values yourself into a normal
HTML link. This means parameters can be sent to your program without the user having
to hit a submit button. An encoded set of values looks like:
param1=value1&param2=value+2&param3=value%263

(An HTTP GET request has this string added to the URL.) So, the whole URL might become something like http://www.someserver.com/cgi-bin/test.py?param1=value1&param2=value+2&param3=value%263.
The ? separates the URI of your script from the encoded parameters. The & characters
separate the parameters from each other. The + represents a space (which shouldn't be
sent as part of a URI, of course), and the %26 is the encoded value that represents an &.
& shouldn't be sent as part of a value or the CGI would think that a new parameter was
being sent.
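The escaping rules above can be checked directly with the standard library. In Python 3 the functions live in urllib.parse (in the Python 2 used elsewhere in this document, urlencode lives in plain urllib); the parameter names here match the example string above:

```python
from urllib.parse import urlencode, parse_qs  # urllib.urlencode in Python 2

params = {'param1': 'value1', 'param2': 'value 2', 'param3': 'value&3'}
# sort the items so the output order is predictable
query = urlencode(sorted(params.items()))
print(query)  # param1=value1&param2=value+2&param3=value%263

# decoding reverses the escapes: '+' -> ' ', '%26' -> '&'
decoded = parse_qs(query)
print(decoded['param3'])  # ['value&3']
```

Note that the space becomes + and the literal & in the value becomes %26, exactly as described in the paragraph above.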
If you encode your own values into a URL, use the urlencode() function from the urllib module, like this:

import urllib

value_dict = {'param_1': 'value1', 'param_2': 'value2'}
encoded_params = urllib.urlencode(value_dict)
full_link = script_url + '?' + encoded_params

Receiving FORM Submissions


HTML forms are encapsulated into requests in a way that equates well to Python's dictionary data type. Each form input element has a name and a corresponding value. For instance, if the item is a radio button, the value sent is the value of the selected button. For example, in the form above, the radio button has the name param2 and its value is either this or that. For a checkbox with no explicit value, say param3 or param4 above, the value sent is on when the box is checked; an unchecked box sends nothing at all.
Now that you know the basics of how forms are encoded and sent to CGI, it's time to
introduce Python's cgi module. The cgi module is your interface to receiving form
submissions. It makes things very easy.
Reading form data is slightly complicated by two facts. First, form input element names
can be repeated, so values can be lists. (Think of a form that allows you to check all of
the answers that apply.) Second, by default, an input element that has no value -- such as
a text box that hasn't been filled in -- will be missing rather than just empty.
The cgi module's FieldStorage class represents the form data; calling cgi.FieldStorage() returns an object that is almost a dictionary. Rather than repeat the page of the manual on using the cgi module, let's look at a couple of general-purpose functions that, given an object created by FieldStorage(), do return dictionaries.
Functions

def getform(theform, valuelist, notpresent='', nolist=False):
    """
    This function, given a CGI form as a FieldStorage instance,
    extracts the data from it, based on the valuelist passed in.
    Any non-present values are set to '' - although this can be
    changed. (e.g. to return None so you can test for missing
    keywords - where '' is a valid answer but to have the field
    missing isn't.) It also takes a keyword argument 'nolist'.
    If this is True, list values only return their first value.
    """
    data = {}
    for field in valuelist:
        if not theform.has_key(field):
            # if the field is not present (or was empty)
            data[field] = notpresent
        else:
            # the field is present
            if type(theform[field]) != type([]):
                # is it a list or a single item
                data[field] = theform[field].value
            else:
                if not nolist:
                    # do we want a list ?
                    data[field] = theform.getlist(field)
                else:
                    # just fetch the first item
                    data[field] = theform.getfirst(field)
    return data

def getall(theform, nolist=False):
    """
    Passed a form (cgi.FieldStorage instance) return *all* the
    values. This doesn't take into account multipart form data
    (file uploads). It also takes a keyword argument 'nolist'.
    If this is True, list values only return their first value.
    """
    data = {}
    for field in theform.keys():
        # we can't just iterate over it, but must use the keys() method
        if type(theform[field]) == type([]):
            if not nolist:
                data[field] = theform.getlist(field)
            else:
                data[field] = theform.getfirst(field)
        else:
            data[field] = theform[field].value
    return data

def isblank(indict):
    """
    Passed an indict of values, it checks if any of the values
    are set. Returns True if the indict is empty, else returns
    False. I use it on a form processed with getform to tell if
    my CGI has been activated without any form values.
    """
    for key in indict.keys():
        if indict[key]:
            return False
    return True

For almost all CGIs that receive input from a form, you'll know what parameters to expect. (After all, you probably wrote the form.) If you pass the getform() function your FieldStorage instance and a list of all the parameters you expect to receive, it returns a dictionary of values. Any missing parameters have the default value '', unless you modify the notpresent keyword. If you want to make sure that you don't receive any list values, set the nolist keyword. If a form variable was a list, nolist returns only the first value in the list.

Or, if you want to retrieve all of the values sent by the form, use the getall() function above. It also accepts the optional nolist keyword argument.

isblank() is a special function: it performs a quick test to determine whether all the values in the dictionary returned by getall() or getform() are empty. If they are, the CGI was called without parameters. In that case, it's typical to generate a welcome page and a form. If the dictionary isn't blank (isblank() returns False), there's a form to process.
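The defaulting and emptiness logic can be seen without a web server by letting a plain dictionary stand in for the FieldStorage object. getform_dict() below is a simplified, hypothetical stand-in for getform() (no list handling), paired with the same isblank() test:

```python
def getform_dict(formdata, valuelist, notpresent=''):
    """Like getform(), but for a plain dict standing in for
    cgi.FieldStorage: expected fields that are missing get a default."""
    return dict((field, formdata.get(field, notpresent))
                for field in valuelist)

def isblank(indict):
    """True when no value in the dict is set."""
    return not any(indict.values())

expected = ['param1', 'param2', 'param3']
submission = {'param1': 'Joe Bloggs', 'param2': 'this'}

formdict = getform_dict(submission, expected)
print(formdict['param3'])                    # '' - missing, so defaulted
print(isblank(formdict))                     # False - a real submission
print(isblank(getform_dict({}, expected)))   # True - no submission at all
```

The last line is exactly the "was the CGI called without parameters?" check described above.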
Using getform()
In the next article, all of these functions will be used to build a basic application.
But to illustrate their use here, let's process a submission from the Example Form. This
program snippet needs the functions above and the first few lines from Hello World.
import cgi

mainpage = '''<html><head><title>Receiving a \
Form</title></head><body>%s</body></html>'''
error = '''<h1>Error</h1><h2>No Form Submission Was Received</h2>'''
result = '''<h1>Receiving a Form Submission</h1>
<p>We received the following parameters from the form :</p>
<ul>
<li>Your name is "%s".</li>
<li>You selected "%s".</li>
<li>"this" is "%s". </li>
<li>"this too" is "%s". </li>
<li> A hidden parameter was sent "%s".</li>
</ul>
'''
possible_parameters = ['param1', 'param2', 'param3', 'param4',
                      'hiddenparam']

if __name__ == '__main__':
    cgiprint(contentheader)   # content header
    cgiprint()                # finish headers with blank line

    theform = cgi.FieldStorage()
    formdict = getform(theform, possible_parameters)

    if isblank(formdict):
        body = error
    else:
        name = formdict['param1']
        radio = formdict['param2']    # should be 'this' or 'that'
        check1 = formdict['param3']   # 'on' if checked, '' if not
        check2 = formdict['param4']
        hidden = formdict['hiddenparam']
        body = result % (name, radio, check1, check2, hidden)

    print mainpage % body

Let's walk through this code. There are three main chunks of HTML: mainpage is the frame of the page, which just needs the body to be inserted into it. Then error displays if the script is called without parameters. However, if the script is called from a form submission, then the parameters are extracted and put into result.
The script prints the obligatory headers and then creates the FieldStorage instance to
represent the form submission. theform is then passed to the function getform(), along
with the list of expected parameters.
If no form submission was made, then all the values in the dictionary returned by getform() are blank ('' in fact). In this case isblank() returns True and body is set to be the error message.

If a form was submitted, then isblank() returns False and the values from the dictionary are extracted and inserted into result. The name variable contains the name entered into the text box. The value from the radio button (in radio) is either this or that, depending on which one was selected. check1 and check2 are on if the corresponding checkbox was checked, and '' (the getform() default for a missing field) if not. The hidden parameter is always returned.
Finally, the page is printed, displaying either the error or the results. Easy, no? Using
hidden values opens up the possibility of generating unique values and encoding them
into the form. These could link requests together, so you can dynamically tailor the
content for each user as they navigate through your application (but that's another story).
Using getall()
If the application were larger, with several possible forms, you might not know in advance
exactly which parameters are going to be present. In that case, you can use getall()
instead of getform(). You can then check for the presence of specific parameters and
perform different actions based on which form has been submitted:
formdict = getall(theform)
if formdict.has_key('rating'):
    # user is submitting feedback
    process_feedback(formdict)
elif formdict.has_key('email'):
    # user is subscribing to the email list
    subscribe(formdict)
else:
    # display a form with all the options in
    optionlist()

Using getall(), you can actually turn our last example script into something a bit more generic and useful:
import cgi

mainpage = '''<html><head><title>Receiving a \
Form</title></head><body>%s</body></html>'''
result = '''<h1>Receiving a Form Submission</h1>
<p>We received the following parameters from the form :</p>
<ul>%s</ul>'''
li = "<li>%s = %s</li>"

if __name__ == '__main__':
    cgiprint(contentheader)   # content header
    cgiprint()                # finish headers with blank line

    theform = cgi.FieldStorage()
    formdict = getall(theform)
    params = []
    for entry in formdict:
        params.append(li % (entry, str(formdict[entry])))

    print mainpage % (result % ''.join(params))

This code gets all the parameters submitted to it using getall(). It then inserts them into the page as an unordered list. If you send this script a form submission, the page it displays shows you all the parameters received, where each line will look like parameter = value. Because the line of code that produces this uses the str() function for each value, it can cope with list values.
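The list-building part of that script runs on its own with an ordinary dictionary in place of the getall() result (the sample values below are made up):

```python
li = "<li>%s = %s</li>"
formdict = {'name': 'Joe Bloggs', 'interests': ['sewing', 'scuba']}

params = []
for entry in sorted(formdict):          # sorted for a predictable order
    # str() renders single values and list values alike
    params.append(li % (entry, str(formdict[entry])))

page_body = ''.join(params)
print(page_body)
# <li>interests = ['sewing', 'scuba']</li><li>name = Joe Bloggs</li>
```

Note how the list value simply appears in its Python list notation; str() is what keeps the loop from crashing on non-string values.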


A List of Values
As mentioned before, it's possible for different parameters in the form to have the same name. In this case, the value returned in the FieldStorage is a list. You could use this to gather information from your user - for example, a list of the areas they are interested in, for newsletters you may be sending out:
<form action="/cgi-bin/formprocessor.py" method="get">
What is Your Name : <input name="name"
    type="text" value="Joe Bloggs" /><br />
Email Address : <input name="email"
    type="text" /><br />
<input name="interests" type="checkbox" value="computers" />Computers<br />
<input name="interests" type="checkbox" value="sewing" />Sewing<br />
<input name="interests" type="checkbox" value="ballet" />Ballet<br />
<input name="interests" type="checkbox" value="scuba" />Scuba Diving<br />
<input name="interests" type="checkbox" value="cars" />Cars<br />
<input type="reset" />
<input type="submit" />
</form>

When the form above is submitted, it will have a value for the user's name, their email address, and a list of all the interests they checked. The code to directly fetch the value from the FieldStorage instance is:
import cgi
theform = cgi.FieldStorage()
interests = theform['interests'].value

The difficulty here is that if the user checks only one choice, then interests is a single value rather than the list we are expecting. The alternative is to use the higher-level methods available in FieldStorage. The getlist() method always returns a list, even if only a single value was supplied. If no boxes at all were checked, it returns an empty list.
import cgi
theform = cgi.FieldStorage()
interests = theform.getlist('interests')

It would be very easy to adapt the getform() and getall() functions to your particular needs when dealing with values that you expect to be lists.
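The same one-or-many issue shows up when parsing a query string by hand. urllib's parser (urllib.parse in Python 3) always returns lists, mirroring the getlist() behaviour, and omits parameters that were never sent:

```python
from urllib.parse import parse_qs

one = parse_qs('name=Joe+Bloggs&interests=sewing')
many = parse_qs('name=Joe+Bloggs&interests=sewing&interests=scuba')

# every value is a list, even when the parameter appeared once...
print(one['interests'])    # ['sewing']
print(many['interests'])   # ['sewing', 'scuba']
# ...and unchecked boxes simply never appear in the data
print('interests' in parse_qs('name=Joe+Bloggs'))  # False
```

This always-a-list convention is exactly why code like getform() above has to decide between getlist() and getfirst().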


Experimenting Yourself
You don't need an online server to test CGIs. You can code and debug web applications
on your local machine, which is good news for those who still pay for Internet access by
the minute. With a server running as localhost on your own machine, you can perform the
"code, test, tear out hair, debug, and repeat" cycle from the comfort of your own armchair.
Try Xitami. It's a fast and lightweight web server, particularly for the Windows platform.
You need to take care when setting up the CGI on the server. It's not difficult, but there
are several steps that must be done.
If the script is going on another server, rather than your own machine, you will probably
have to upload it to the server with FTP. Your FTP client must be set to upload Python
scripts as text. Once copied to the right directory, set the permissions correctly for it to

run. Be sure also to set the proper path in the shebang line for the server. (See The Error 500 Checklist section for a few other pitfalls.)
You can find a web page full of CGI examples at
http://www.voidspace.org.uk/python/cgi.shtml. These are available to test or download.
They include an online anagram generator and various smaller test scripts. There is also
a complete framework for doing user authentication and management from CGI, called
logintools.
The Error 500 Checklist
Debugging CGIs can be frustrating. By default, any problem with your CGI script results in the anonymous error 500. Actual details of the error are written into the server log, which can be helpful, if you can get access to the log.


However, more than half of 500 errors can be easily solved by checking the following
common sources of mistakes. You'd be surprised by how often one of these basic
gotchas will getcha!

Was your script uploaded to the server in 'text' mode? (Is your FTP client set to recognize
.py files as text?)
Have you set the script permissions to mode 755 (executable by everyone)? 12
Have you set the path to Python in the first line correctly?
Did you print valid header lines, including the final blank line?
Finally, some servers require the script to be in the cgi-bin folder (or a subdirectory) and
some even require the file extension to be .cgi rather than .py.

Conclusion
We've covered all the basics of CGI. The information here is enough to get you up and
running, and at least looking in the right direction for information.

There's a lot more though: character encoding, using templates to output similar HTML
code repeatedly, and finding out about the HTTP request the user sent, to mention just a
few topics.
In the next part of "Python at the Other End of the Web," we'll touch on these subjects
when we use what we've learnt so far to build an example application.
[1] The full CGI specification can be found at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
[2] There's actually no requirement for URLs to map directly to files, but for static content it's the obvious way of doing it.
[3] One common alternative is to embed an interpreter into your server, for example using Apache with mod_python. This means that the interpreter doesn't have to restart in between requests. This can also make session management easier. It does introduce a whole host of other problems of course. Another alternative is to use a special application server like Zope.
[4] Quick Reference to HTTP Headers: http://www.cs.tut.fi/~jkorpela/http.html
[5] The RFC stating that headers end '\r\n' is the very long RFC 2616. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
[6] Most Python CGIs will run on Linux type servers. If header lines are sent using the normal 'print' command then they will be terminated with '\n'. This is technically invalid, but usually won't matter.
[7] #!/usr/bin/python is one of the more common ones. #!/usr/bin/env python and #!/usr/local/bin/python are also common. It is likely that one of these will work.
[8] On shared hosting accounts, CGI scripts are likely to be restricted to a maximum running time of 60 seconds or even 30 seconds. After this, the server usually kills them. If you use your own server this won't be a problem of course. Using something like mod_python may be a way round this CGI restriction, or you can code around it by "chaining" requests.
[9] There is a good forms tutorial at http://www.csd.abdn.ac.uk/~apreece/teaching/CS1009/practicals/forms.html (It's HTML rather than XHTML, but it's still a nice reference.)
[10] The RFC defining URL encoding is RFC 1738. See http://rfc.net/rfc1738.html
[11] As well as sending the values from forms in a POST request, it is also possible to allow file uploads. This allows your user to select a file from their local hard drive and encode it into their request. An example CGI that can receive file uploads (including the HTML form needed to allow it) can be found at http://www.voidspace.org.uk/python/cgi.shtml#upload
[12] On a Linux server the script is run as nobody. This means that the script must be executable by everybody or it won't run as a CGI. It also means that any files it needs to access/write must be readable/writable by everybody.


Opening word documents from Python

Quick recipe
import win32api
win32api.ShellExecute(0,None,"winword.exe",None,"",1)

More detailed recipe

if val[-4:].lower() == ".doc":
    print "doc"
    import win32api
    import os.path
    if os.path.exists(val):
        file = os.path.basename(val)
        path = os.path.dirname(val)
        print path, file
        sfile = os.path.basename(win32api.GetShortPathName(val))
        spath = os.path.dirname(win32api.GetShortPathName(val))
        print spath + "\\" + sfile
        win32api.ShellExecute(
            0,                     # handle of parent window - 0 means no parent
            "open",                # operation: open, print - None defaults to open
            "winword.exe",         # the executable
            spath + "\\" + sfile,  # params of the executable (here, the filename)
                                   # - a problem if the name has blanks in it
            "",                    # initial directory for the application
            1)                     # whether the application is shown when opened
    else:
        print "++++ %s not found" % val


Printing word files


import struct
from time import sleep
from os import spawnv, P_NOWAIT
from dynwin import windll, windde

def PrintUsingWinWord(filename):
    app = r"C:\MSOFFICE\WINWORD\WINWORD.EXE"
    spawnv(P_NOWAIT, app, (app, "/n"))
    print 'spawned WinWord, sleeping 5 seconds'
    sleep(5)
    s = windde.dde_session()
    s.initialize()
    CC = windll.membuf(36)
    CC.write(struct.pack('l', 36) + (32 * '\000'))
    retries = 10
    while retries:
        try:
            s.connect('WinWord', 'System', CC)
            s.execute('[FileOpen("' + filename + '")]')
            #s.execute('[FilePrint 0][FileExit 2]')
            s.uninitialize()
            retries = 0
        except:
            print 'Unable to connect to WinWord, remain', retries, 'attempts'
            print 'sleeping 10 seconds'
            sleep(10)
            retries = retries - 1
    del s
    return

Patterns
There are three categories of object-oriented patterns:

Creational patterns: patterns that can be used to create objects.


Structural patterns: patterns that can be used to combine objects and classes in order to build structured
objects.
Behavioral patterns: patterns that can be used to build a computation and to control data flows.

Creational patterns. There are two main categories of creational patterns: those for creating objects without having to know the class name, which you could call "abstract object makers" (abstract factory and factory method), and those that ensure a certain property regarding object creation, such as prohibiting more than one instance of a class (singleton), building a set of instances from different classes in a consistent way (builder), or creating an instance with a specific state (prototype).

Abstract factory: an abstract factory is an object maker, where, instead of specifying a class name, you
specify the kind of object you want. For instance, say that you want to create an agent to run analysis
programs, you can ask a factory to do it for you:
clustalw = AnalysisFactory.program('clustalw')
result = clustalw.run(seqfile = 'myseqs')
print result.alig


The clustalw object is an instance of, say, the AnalysisAgent.Clustalw class, but you do not have to know
about it at creation time. The only thing you know is the name of the program you want ('clustalw'), and the
factory will do the rest for you.
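A minimal sketch of such a factory follows. The Clustalw and Muscle classes here are hypothetical stand-ins (the real Biopython wrappers work differently); the point is only that the caller names a program, never a class:

```python
class Clustalw:
    name = 'clustalw'
    def run(self, seqfile):
        # a real agent would launch the program; this just reports
        return 'aligned %s with %s' % (seqfile, self.name)

class Muscle:
    name = 'muscle'
    def run(self, seqfile):
        return 'aligned %s with %s' % (seqfile, self.name)

class AnalysisFactory:
    """Maps program names to classes, so callers never name a class."""
    _programs = {'clustalw': Clustalw, 'muscle': Muscle}

    @classmethod
    def program(cls, name):
        return cls._programs[name]()

clustalw = AnalysisFactory.program('clustalw')
print(clustalw.run('myseqs'))  # aligned myseqs with clustalw
```

Adding a new program means registering one entry in the mapping; no calling code changes.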

Factory method: a factory method is very similar to an abstract factory: just, instead of being a class, it is a
method.
For instance, you can create a sequence object (Bio.Seq.Seq in Biopython) by asking the
get_seq_by_num method of an alignment object (Bio.Align.Generic.Alignment):
first_seq = align.get_seq_by_num(0)

The method which creates this instance of Bio.Seq.Seq is a factory method. Another difference from a factory class is that the factory method is often more than an object maker: it sometimes incorporates much more knowledge about the way to create the object than a factory would.

A simpler factory method would be a new method defined in a class to create new instances of the same class:

my_scoring = scoring.new()

In this case, notice that in order to create my_scoring, you really do not have to know the actual class of scoring: the only thing you know is that you will get the same one, even if there is a whole hierarchy of different classes of scoring.

Another example could be a factory method in the Plot class:

p = Plot()
p.new_curve([...])

that would create an instance of the Curve class and draw it as soon as it is created.

Singleton: ensures that you cannot create more than one instance. For example, you can define a class to contain operations and data for the genetic code: you need only one instance of this class to perform the task. Actually, this pattern would not be implemented with a class in Python, but rather with a module, at least if you can define it statically (a dynamic singleton could not be a module, for a module has to be a file):

>>> import genetic_code
>>> genetic_code.aa('TTT')

Prototype: this pattern also lets you create a new object without knowing its class, but here, the new target object is created with the same state as the source object:

another_seq = seq.copy()

The interest is that you do not get an "empty" object here, but an object identical to seq. You can thus play with another_seq, change its attributes, etc., without breaking the original object.
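A sketch of the prototype idea, using a hypothetical Seq class whose copy() method clones the full state of the source object via the standard copy module:

```python
import copy

class Seq:
    """Hypothetical sequence class illustrating the prototype pattern."""
    def __init__(self, letters):
        self.letters = letters

    def copy(self):
        # deepcopy reproduces the complete state of the source object
        return copy.deepcopy(self)

seq = Seq('ATCG')
another_seq = seq.copy()
another_seq.letters = 'GGGG'   # playing with the clone...
print(seq.letters)             # ...leaves the original intact: ATCG
```

Note that the caller never names the Seq class when cloning; whatever subclass seq happens to be, copy() returns one of the same kind.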

Builder: you sometimes need to create a complex object composed of several parts. This is the role of the builder. For instance, a builder is needed to build the whole set of nodes and leafs of a tree. Or you could design a builder to create both Curve and Plot instances in a coherent way as parts of a GraphicalCurves complex object:

gc = GraphicalCurves(file='my_curves')

For instance, my_curves might contain a description of a set of curves to display in the same plot or in different plots. The Blast parser in Biopython simultaneously instantiates several classes that are all component parts of a hit: Description, Alignment and Header.

Structural patterns. Structural patterns address issues regarding how to combine and structure objects. For this reason, several structural patterns provide alternative solutions to design problems that would otherwise involve inheritance relationships between classes.

Decorator, proxy, adapter: these patterns all enable you to combine two (or more) components, as shown in Figure 18.10. There is one component, A, "in front" of another one, B. A is the visible object a client will see. The role of A is either to extend or restrict B, or to help in using B. So, this pattern is similar to subclassing, except that, where a subclass inherits a method from a base class, the decorator delegates to its decoratee when it does not have the required method. The advantage is flexibility (see Section 18.4.2): you can combine several of these components in any order at run time without having to create a big and complex hierarchy of subclasses.
Delegation

Generally, the Python code of the A class looks like:

class A:
    def __init__(self, b):
        """storing of the decoratee b (b is an instance of class B)"""
        self.b = b

    def __getattr__(self, name):
        """
        methods/attributes A does not know about are delegated to b
        """
        return getattr(self.b, name)

At use time, an instance of class A is created by providing a b instance:

b = B()
a = A(b)
print a.f()

Everything that class A cannot perform is forwarded to b (providing that class B knows about it).

The decorator enables you to add functionality to another object. Example 18.8 shows a very simple decorator that prints a sequence in uppercase.

An uppercase sequence class

import string

class UpSeq:
    def __init__(self, seq):
        self.seq = seq

    def __str__(self):
        return string.upper(self.seq)

    def __getattr__(self, name):
        return getattr(self.seq, name)

The way to use it is for instance:

>>> s = UpSeq(DNA(name='name1', seq='atcgctgtc'))
>>> print s
ATCGCTGTC
>>> s[0:3]
'atc'
>>> len(s)

The proxy rather handles the access to an object. There are several kinds of proxy:

protection proxy: to protect the access to an object.

virtual proxy: to physically fetch data only when needed. Database dictionaries in Biopython work this way:

prosite = Bio.Prosite.ExPASyDictionary()
entry = prosite['PS00079']

Data are fetched only when an access to an entry is actually requested.

remote proxy: to simulate a local access for a remotely activated procedure.

The adapter (or wrapper) helps in connecting two components that have been developed independently and that have different interfaces. For instance, the Pise package transforms Unix program interfaces into standardized interfaces, either Web interfaces or an API. For instance, the golden Unix command has the following interface:

bash> golden embl:MMVASP

But the Pise wrapper enables you to run it and get the result from a Python program, with an interface defined in the Python language:

factory = PiseFactory()
golden = factory.program("golden", db="embl", query="MMVASP")
job = golden.run()
print job.content()

Composite: this pattern is often used to handle complex composite recursive structures. Example 18.9 shows a set of classes for a tree structure, illustrated in Figure 18.11. The main idea of the composite design pattern is to provide a uniform interface to instances from different classes in the same hierarchy, where the instances are all components of the same composite complex object. In Example 18.9, you have two types of nodes: Node and Leaf, but you want a similar interface for them, that is, at least defined by a common base class, AbstractNode, with two operations: printing a node (__str__) and extracting a subtree. These operations should be callable on any node instance, without knowing its actual sub-class.

>>> t1 = Node ( Leaf ( 'A', 0.71399),
...             Node ( Node ( Leaf('B', -0.00804),
...                           Leaf('C', 0.07470),
...                           0.15685),
...                    Leaf ('D', -0.04732)
...             )
...     )
>>> print t1
(A: 0.71399, ((B: -0.00804, C: 0.0747), D: -0.04732))
>>> t2 = t1.right.subtree()
>>> print t2
((B: -0.00804, C: 0.0747), D: -0.04732)
>>> t3 = t1.left.subtree()
>>> print t3
A: 0.71399

A composite tree

class AbstractNode:
    def __str__(self):
        pass

    def subtree(self):
        pass

class Node(AbstractNode):
    def __init__(self, left=None, right=None, length=None):
        self.left = left
        self.right = right
        self.length = length

    def __str__(self):
        return "(" + self.left.__str__() + ", " + self.right.__str__() + ")"

    def subtree(self):
        return Node(self.left, self.right)

class Leaf(AbstractNode):
    def __init__(self, name, length=None):
        self.name = name
        self.length = length
        self.left = None
        self.right = None

    def __str__(self):
        return self.name + ": " + str(self.length)

    def subtree(self):
        return Leaf(self.name, self.length)

The abstract class AbstractNode is the base class for both Node and Leaf.
Internal nodes are instances of the Node class.
Leafs are instances of the Leaf class.

Behavioral patterns. Patterns of this category are very useful in sequence analysis, where you often have to combine
algorithms and to analyze complex data structure in a flexible way.

Template: this pattern consists in separating the invariant part of an algorithm from the variant part. In a sorting procedure, you can generally separate the function which compares items from the main body of the algorithm. The template method, in this case, is the sort() method, whereas compare() can be defined by each subclass depending on its implementation or data types. In the dynamic programming align function, the template methods can be the begin, end, and inner methods. The scoring method may vary depending on the preferred scoring scheme, as defined in subclasses of the Scoring class.

Strategy: it is the object-oriented equivalent of passing a function as an argument. The Scoring class is a strategy.

Variations on methods. You don't necessarily need inheritance to make a function a parameter. For
instance, the following function enables you to provide your own functions for score_gap and compare:

def align(matrice, begin, inner, end, score_gap, compare):
    ...

So, do we need a subclass for this, or even a Scoring class at all? With a class and
inheritance, your methods are all pooled together in one "packet", but there is some additional burden on
your side, since you have to define a subclass and instantiate it. On the other hand, passing a function as a
parameter has a limit: you can't change the default values for its parameters, such as the default value for
gap initiation or extension in our example.
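The trade-off can be sketched in a few lines. The names below (score_pair, similarity, Scoring) are illustrative, not the course's align code: a plain function works as a strategy, while a class-based strategy can carry its own default parameters as state.

```python
# A scoring strategy as a plain function (hypothetical names):
def score_pair(a, b, match=1, mismatch=-1):
    return match if a == b else mismatch

def similarity(s1, s2, score=score_pair):
    # Apply the supplied strategy to each aligned pair of characters.
    return sum(score(a, b) for a, b in zip(s1, s2))

print(similarity("GATTACA", "GACTATA"))            # 5 matches - 2 mismatches

# A class-based strategy can hold configured defaults as instance state:
class Scoring(object):
    def __init__(self, match=1, mismatch=-1):
        self.match, self.mismatch = match, mismatch
    def score(self, a, b):
        return self.match if a == b else self.mismatch

print(similarity("GATTACA", "GACTATA", Scoring(match=2).score))
```

The bound method Scoring(match=2).score is still "a function passed as an argument", but its defaults travel with the instance.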

Usually, one distinguishes internal versus external iterators. An external iterator is an iterator which enables
you to do a for or a while loop over a range of values that are returned by the iterator:
for e in l.elements():
    f(e)

or:
i = l.iterator()
e = i.next()
while e:
    f(e)
    e = i.next()

In the above examples, you control the loop. On the other hand, an internal iterator just lets you define a
function or a method (say, f) to apply to all elements:
l.iterate(f)
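The two styles can be shown side by side on one container. The Bag class and its elements()/iterate() methods are illustrative, not from any library:

```python
class Bag(object):
    """Illustrative container offering both iteration styles."""
    def __init__(self, items):
        self._items = list(items)

    # External iteration: the caller controls the loop.
    def elements(self):
        return iter(self._items)

    # Internal iteration: the container runs the loop and applies f.
    def iterate(self, f):
        for item in self._items:
            f(item)

bag = Bag([1, 2, 3])
squares = [x * x for x in bag.elements()]   # external: we drive the loop
collected = []
bag.iterate(collected.append)               # internal: the bag drives the loop
print(squares)
print(collected)
```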


Iterator: an iterator is an object that lets you browse a sequence of items from the beginning to the end.
Generally, it provides:
a method to start iteration
a method to get the next item
a method to test for the end of the iteration
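The three operations above can be sketched with a small hand-written iterator. The Countdown class is illustrative; the next/__next__ aliasing keeps it valid in both Python 2 and 3:

```python
class Countdown(object):
    """Counts start, start-1, ..., 1, then signals the end."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):          # starting the iteration
        return self

    def __next__(self):          # getting the next item
        if self.current <= 0:
            raise StopIteration  # testing/signalling the end of the iteration
        self.current -= 1
        return self.current + 1

    next = __next__              # Python 2 spelling of the same method

print(list(Countdown(3)))        # [3, 2, 1]
```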

In the Biopython package, files and databases are generally available through an iterator.
handle = open(...)
iter = Bio.Fasta.Iterator(handle)   # starting the iterator
seq = iter.next()                   # getting the next element
while seq:                          # testing for the end of the iteration
    print seq.name
    print seq.seq
    seq = iter.next()               # getting the next element
handle.close()

Visitor: this pattern is useful to specify a function that will be applied on each item of a collection. The
Python map function provides a way to use visitors, such as the f function, which visits each item of the l list
in turn:
>>> def f(x):
...     return x + 1
...
>>> l = [0, 1, 2]
>>> map(f, l)
[1, 2, 3]

The map function is an example of an internal iterator (with the f function as a visitor).

Observer: the observer pattern provides a framework for maintaining a consistent distributed state between
loosely coupled components. One agent, the subject (also called the observable), maintains a list of subscribers, i.e.,
components that have subscribed to be informed about changes in a given state. Whenever that
state changes, the subject has to inform each subscriber about it.
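A minimal sketch of the pattern, with illustrative class and method names (subscribers here are plain callables):

```python
class Subject(object):
    """Keeps the subscriber list and notifies every subscriber on each change."""
    def __init__(self):
        self._subscribers = []
        self._state = None

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_state(self, value):
        self._state = value
        for callback in self._subscribers:   # inform each subscriber
            callback(value)

seen = []
subject = Subject()
subject.subscribe(seen.append)               # first subscriber records the value
subject.subscribe(lambda v: seen.append(v * 2))  # second reacts differently
subject.set_state(10)
print(seen)   # [10, 20]
```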

MVC

A well-known example is the Model-View-Controller framework. The view components, the ones which actually
display data, subscribe to "edit events" in order to be able to refresh and redisplay the data whenever a change occurs.
MVC can be organized around the following classes:

Views, which represent the graphical display

Controllers, which handle user interaction with the widgets in a View

Delegates, which combine a View and a Controller

Models, which are special mixins for your domain objects

Proxies, special types of Delegate designed to implement forms

deque objects

deque([iterable])
Returns a new deque object initialized left-to-right (using append()) with data from
iterable. If iterable is not specified, the new deque is empty.
Deques are a generalization of stacks and queues (the name is pronounced
``deck'' and is short for ``double-ended queue''). Deques support thread-safe,
memory efficient appends and pops from either side of the deque with
approximately the same O(1) performance in either direction.
Though list objects support similar operations, they are optimized for fast
fixed-length operations and incur O(n) memory movement costs for "pop(0)" and
"insert(0, v)" operations which change both the size and position of the
underlying data representation. New in version 2.4.
Deque objects support the following methods:

append(x)
Add x to the right side of the deque.
appendleft(x)
Add x to the left side of the deque.
clear()
Remove all elements from the deque leaving it with length 0.
extend(iterable)
Extend the right side of the deque by appending elements from the iterable argument.
extendleft(iterable)
Extend the left side of the deque by appending elements from iterable. Note, the series of
left appends results in reversing the order of elements in the iterable argument.

pop()
Remove and return an element from the right side of the deque. If no elements are
present, raises an IndexError.

popleft()
Remove and return an element from the left side of the deque. If no elements are
present, raises an IndexError.

remove(value)
Remove the first occurrence of value. If not found, raises a ValueError. New in
version 2.5.

rotate(n)
Rotate the deque n steps to the right. If n is negative, rotate to the left. Rotating one step
to the right is equivalent to: "d.appendleft(d.pop())".

In addition to the above, deques support iteration, pickling, "len(d)", "reversed(d)",
"copy.copy(d)", "copy.deepcopy(d)", membership testing with the in operator, and
subscript references such as "d[-1]".

Example:

>>> from collections import deque
>>> d = deque('ghi')              # make a new deque with three items
>>> for elem in d:                # iterate over the deque's elements
...     print elem.upper()
G
H
I

>>> d.append('j')                 # add a new entry to the right side
>>> d.appendleft('f')             # add a new entry to the left side
>>> d                             # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])

>>> d.pop()                       # return and remove the rightmost item
'j'
>>> d.popleft()                   # return and remove the leftmost item
'f'
>>> list(d)                       # list the contents of the deque
['g', 'h', 'i']
>>> d[0]                          # peek at leftmost item
'g'
>>> d[-1]                         # peek at rightmost item
'i'

>>> list(reversed(d))             # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d                      # search the deque
True
>>> d.extend('jkl')               # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1)                   # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1)                  # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])

>>> deque(reversed(d))            # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear()                     # empty the deque
>>> d.pop()                       # cannot pop from an empty deque
Traceback (most recent call last):
    File "<pyshell#6>", line 1, in -toplevel-
        d.pop()
IndexError: pop from an empty deque

>>> d.extendleft('abc')           # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])

Recipes
This section shows various approaches to working with deques.
The rotate() method provides a way to implement deque slicing and deletion. For
example, a pure Python implementation of del d[n] relies on the rotate() method to
position elements to be popped:

def delete_nth(d, n):
    d.rotate(-n)
    d.popleft()
    d.rotate(n)

To implement deque slicing, use a similar approach applying rotate() to bring a target
element to the left side of the deque. Remove old entries with popleft(), add new
entries with extend(), and then reverse the rotation.
With minor variations on that approach, it is easy to implement Forth style stack
manipulations such as dup, drop, swap, over, pick, rot, and roll.
A round-robin task server can be built from a deque using popleft() to select the
current task and append() to add it back to the task list if the input stream is not
exhausted:

def roundrobin(*iterables):
    pending = deque(iter(i) for i in iterables)
    while pending:
        task = pending.popleft()
        try:
            yield task.next()
        except StopIteration:
            continue
        pending.append(task)
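Following the slicing approach described above, here is a sketch of a slice-copy helper. deque_slice is a hypothetical name, not part of collections; it copies d[start:stop] non-destructively by undoing its own pops and rotation:

```python
from collections import deque

def deque_slice(d, start, stop):
    """Return a list copy of d[start:stop], leaving d unchanged."""
    d.rotate(-start)                  # bring the target element to the left side
    result = [d.popleft() for _ in range(stop - start)]
    d.extendleft(reversed(result))    # put the removed entries back in order
    d.rotate(start)                   # reverse the rotation
    return result

d = deque('abcdef')
print(deque_slice(d, 2, 5))   # ['c', 'd', 'e']
print(list(d))                # ['a', 'b', 'c', 'd', 'e', 'f']
```

Note the reversed() inside extendleft(): since extendleft() itself reverses its argument, the double reversal restores the original order.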

>>> for value in roundrobin('abc', 'd', 'efgh'):
...     print value
a
d
e
b
f
c
g
h

Multi-pass data reduction algorithms can be succinctly expressed and efficiently coded by
extracting elements with multiple calls to popleft(), applying the reduction function,
and calling append() to add the result back to the queue.
For example, building a balanced binary tree of nested lists entails reducing two adjacent
nodes into one by grouping them in a list:
def maketree(iterable):
    d = deque(iterable)
    while len(d) > 1:
        pair = [d.popleft(), d.popleft()]
        d.append(pair)
    return list(d)

>>> print maketree('abcdefgh')
[[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]

Heap queue
This module provides an implementation of the heap queue algorithm, also known as the
priority queue algorithm.
Heaps are arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting
elements from zero. For the sake of comparison, non-existing elements are considered to
be infinite. The interesting property of a heap is that heap[0] is always its smallest element.
The API below differs from textbook heap algorithms in two aspects: (a) We use
zero-based indexing. This makes the relationship between the index for a node and the
indexes for its children slightly less obvious, but is more suitable since Python uses
zero-based indexing. (b) Our pop method returns the smallest item, not the largest (called a
"min heap" in textbooks; a "max heap" is more common in texts because of its suitability
for in-place sorting).
These two aspects make it possible to view the heap as a regular Python list without surprises:
heap[0] is the smallest item, and heap.sort() maintains the heap invariant!

To create a heap, use a list initialized to [], or you can transform a populated list into a
heap via function heapify().
The following functions are provided:
heappush(heap, item)
Push the value item onto the heap, maintaining the heap invariant.
heappop(heap)
Pop and return the smallest item from the heap, maintaining the heap invariant. If the
heap is empty, IndexError is raised.
heapify(x)
Transform list x into a heap, in-place, in linear time.

heapreplace(heap, item)
Pop and return the smallest item from the heap, and also push the new item. The heap
size doesn't change. If the heap is empty, IndexError is raised. This is more efficient
than heappop() followed by heappush(), and can be more appropriate when using a
fixed-size heap. Note that the value returned may be larger than item! That constrains
reasonable uses of this routine unless written as part of a conditional replacement:

if item > heap[0]:
    item = heapreplace(heap, item)

Example of use:

>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
...     heappush(heap, item)
...
>>> sorted = []
>>> while heap:
...     sorted.append(heappop(heap))
...
>>> print sorted
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> data.sort()
>>> print data == sorted
True

The module also offers two general purpose functions based on heaps.

nlargest(n, iterable[, key])
Return a list with the n largest elements from the dataset defined by iterable. key, if
provided, specifies a function of one argument that is used to extract a comparison key
from each element in the iterable (for example, "key=str.lower"). Equivalent to:
"sorted(iterable, key=key, reverse=True)[:n]". New in version 2.4.
Changed in version 2.5: Added the optional key argument.

nsmallest(n, iterable[, key])
Return a list with the n smallest elements from the dataset defined by iterable. key, if
provided, specifies a function of one argument that is used to extract a comparison key
from each element in the iterable (for example, "key=str.lower"). Equivalent to:
"sorted(iterable, key=key)[:n]". New in version 2.4. Changed in version 2.5:
Added the optional key argument.

Both functions perform best for smaller values of n. For larger values, it is more efficient
to use the sorted() function. Also, when n==1, it is more efficient to use the built-in
min() and max() functions.
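For example (the key argument requires Python 2.5 or later):

```python
from heapq import nlargest, nsmallest

data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
print(nlargest(3, data))      # [9, 8, 7]
print(nsmallest(3, data))     # [0, 1, 2]

# key= extracts a comparison key from each element:
words = ['pear', 'Apple', 'fig']
print(nlargest(1, words, key=str.lower))   # ['pear']
```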

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for all k, counting elements
from 0. For the sake of comparison, non-existing elements are considered to be infinite.
The interesting property of a heap is that a[0] is always its smallest element.
The strange invariant above is meant to be an efficient memory representation for a
tournament. The numbers below are k, not a[k]:
                                   0

                  1                                 2

          3               4                5               6

      7       8       9       10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30

In the tree above, each cell k is topping 2*k+1 and 2*k+2. In a usual binary tournament we
see in sports, each cell is the winner over the two cells it tops, and we can trace the
winner down the tree to see all opponents s/he had. However, in many computer
applications of such tournaments, we do not need to trace the history of a winner. To be
more memory efficient, when a winner is promoted, we try to replace it by something else
at a lower level, and the rule becomes that a cell and the two cells it tops contain three
different items, but the top cell "wins" over the two topped cells.
If this heap invariant is protected at all times, index 0 is clearly the overall winner. The
simplest algorithmic way to remove it and find the "next" winner is to move some loser
(let's say cell 30 in the diagram above) into the 0 position, and then percolate this new 0
down the tree, exchanging values, until the invariant is re-established. This is clearly
logarithmic on the total number of items in the tree. By iterating over all items, you get an
O(n log n) sort.
A nice feature of this sort is that you can efficiently insert new items while the sort is going
on, provided that the inserted items are not "better" than the last 0'th element you
extracted. This is especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled time. When an
event schedules other events for execution, they are scheduled into the future, so they can
easily go into the heap. So, a heap is a good structure for implementing schedulers (this
is what I used for my MIDI sequencer :-).
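The scheduler idea can be sketched with heapq directly. This is a toy event queue with illustrative names; each entry is a (scheduled_time, event) pair, so heap[0] is always the earliest event:

```python
import heapq

events = []
heapq.heappush(events, (5.0, 'stop'))
heapq.heappush(events, (1.0, 'start'))
heapq.heappush(events, (3.0, 'checkpoint'))

order = []
while events:
    when, what = heapq.heappop(events)   # always pops the earliest event
    order.append(what)
print(order)   # ['start', 'checkpoint', 'stop']
```

Tuples compare element by element, so the scheduled time drives the heap ordering.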

are good for this, as they are reasonably speedy, the speed is almost constant, and the
worst case is not much different than the average case. However, there are other
representations which are more efficient overall, yet the worst cases might be terrible.
Heaps are also very useful in big disk sorts. You most probably all know that a big sort
implies producing "runs" (which are pre-sorted sequences, which size is usually related to
the amount of CPU memory), followed by a merging passes for these runs, which
merging is often very cleverly organised5.1. It is very important that the initial sort
produces the longest runs possible. Tournaments are a good way to that. If, using all the
memory available to hold a tournament, you replace and percolate items that happen to fit
the current run, you'll produce runs which are twice the size of the memory for random
input, and much better for input fuzzily ordered.
Moreover, if you output the 0'th item on disk and get an input which may not fit in the
current tournament (because the value "wins" over the last output value), it cannot fit in
the heap, so the size of the heap decreases. The freed memory could be cleverly reused
Page 129of 191

Schlumberger Private

Various structures for implementing schedulers have been extensively studied, and heaps

immediately for progressively building a second heap, which grows at exactly the same
rate the first heap is melting. When the first heap completely vanishes, you switch heaps
and start a new run. Clever and quite effective!
In a word, heaps are useful memory structures to know. I use them in a few applications,
and I think it is good to keep a `heap' module around. :-)

Footnotes

... organised5.1
The disk balancing algorithms which are current, nowadays, are more annoying than
clever, and this is a consequence of the seeking capabilities of the disks. On devices which
cannot seek, like big tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement would be the most effective
possible (that is, would best participate in "progressing" the merge). Some tapes were even
able to read backwards, and this was also used to avoid the rewinding time. Believe me,
real good tape sorts were quite spectacular to watch! From all times, sorting has always
been a Great Art! :-)

Event scheduler

The sched module defines a class which implements a general purpose event scheduler:

class scheduler(timefunc, delayfunc)
The scheduler class defines a generic interface to scheduling events. It needs two
functions to actually deal with the ``outside world'' -- timefunc should be callable without
arguments, and return a number (the ``time'', in any units whatsoever). The delayfunc
function should be callable with one argument, compatible with the output of timefunc,
and should delay that many time units. delayfunc will also be called with the argument 0
after each event is run to allow other threads an opportunity to run in multi-threaded
applications.

Example:
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(): print "From print_time", time.time()
...
>>> def print_some_times():
...     print time.time()
...     s.enter(5, 1, print_time, ())
...     s.enter(10, 1, print_time, ())
...     s.run()
...     print time.time()
...
>>> print_some_times()
930343690.257
From print_time 930343695.274
From print_time 930343700.273
930343700.276

Mutual exclusion support

The mutex module defines a class that allows mutual-exclusion via acquiring and
releasing locks. It does not require (or imply) threading or multi-tasking, though it could be
useful for those purposes.
The mutex module defines the following class:

class mutex()
Create a new (unlocked) mutex.
A mutex has two pieces of state -- a ``locked'' bit and a queue. When the mutex is
not locked, the queue is empty. Otherwise, the queue contains zero or more
(function, argument) pairs representing functions (or methods) waiting to acquire the
lock. When the mutex is unlocked while the queue is not empty, the first queue
entry is removed and its function(argument) pair called, implying it now has the lock.
Of course, no multi-threading is implied - hence the funny interface for lock(),
where a function is called once the lock is acquired.


Coroutines and Threading

Threading
Threading is a technique for decoupling tasks which are not sequentially dependent. Threads
can be used to improve the responsiveness of applications that accept user input while other
tasks run in the background. A related use case is running I/O in parallel with computations in
another thread.
The following code shows how the high level threading module can run tasks in background
while the main program continues to run:
import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print 'Finished background zip of: ', self.infile

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print 'The main program continues to run in foreground.'

background.join()    # Wait for the background task to finish
print 'Main program waited until background was done.'


The principal challenge of multi-threaded applications is coordinating threads that share data
or other resources. To that end, the threading module provides a number of synchronization
primitives including locks, events, condition variables, and semaphores.
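As a minimal sketch of the simplest of those primitives, a Lock can protect a shared counter against lost updates. The names below are illustrative:

```python
import threading

counter = {'n': 0}
lock = threading.Lock()

def bump(times):
    for _ in range(times):
        with lock:             # only one thread mutates the counter at a time
            counter['n'] += 1

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter['n'])   # 4000
```

Without the lock, the read-modify-write in counter['n'] += 1 could interleave between threads and updates could be lost.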
While those tools are powerful, minor design errors can result in problems that are difficult to
reproduce. So, the preferred approach to task coordination is to concentrate all access to a
resource in a single thread and then use the Queue module to feed that thread with requests
from other threads. Applications using Queue objects for inter-thread communication and
coordination are easier to design, more readable, and more reliable.

Queue
The Queue module is often used for inter-thread communication. This small example shows a
single Queue being created, as well as a Receiver object and a Sender object. The Sender
puts messages into the Queue, which the Receiver receives and prints out:

import threading
from Queue import Queue

class Receiver(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            x = self.queue.get()  # blocks
            print x

class Sender(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            self.queue.put("Hello")
            self.queue.put("from")
            self.queue.put("the")
            self.queue.put("sender!")
            break

q = Queue()
r = Receiver(q)  # pass in the Queue
s = Sender(q)    # pass in the same Queue
r.start()
s.start()  # causes messages to get sent, which Receiver will print
s.join()   # only wait for s to end

Python threads - a first example

If you have a process that you want to do several things at the same time, threads may be the
answer for you. They let you set up a series of processes (or sub-processes) each of which can be
run independently, but which can be brought back together later and/or co-ordinated as they
run.
For many applications, threads are overkill but on some occasions they can be useful.
A PYTHON APPLICATION WHERE THREADS WOULD HELP

Let's say that you want to check the availability of many computers on a network ... you'll use
ping. But there's a problem - if you "ping" a host that's not running, it takes a while to time out, so
that when you check through a whole lot of systems that aren't responding - the very time a
quick response is probably needed - it can take an age.
Here's a Python program that "ping"s 10 hosts in sequence.

import os
import re
import time
import sys

lifeline = re.compile(r"(\d) received")
report = ("No response", "Partial Response", "Alive")

print time.ctime()
for host in range(60, 70):
    ip = "192.168.200." + str(host)
    pingaling = os.popen("ping -q -c2 " + ip, "r")
    print "Testing ", ip,
    sys.stdout.flush()
    while 1:
        line = pingaling.readline()
        if not line: break
        igot = re.findall(lifeline, line)
        if igot:
            print report[int(igot[0])]
print time.ctime()

Running that program, it works but it's a bit slow:


[trainee@buttercup trainee]$ python alive
Mon May 9 05:22:51 2005
Testing 192.168.200.60 No response
Testing 192.168.200.61 No response
Testing 192.168.200.62 No response
Testing 192.168.200.63 No response
Testing 192.168.200.64 No response
Testing 192.168.200.65 No response
Testing 192.168.200.66 Alive
Testing 192.168.200.67 No response
Testing 192.168.200.68 No response
Testing 192.168.200.69 No response
Mon May 9 05:23:19 2005
[trainee@buttercup trainee]$

That was 28 seconds - in other words, an extra 3 seconds per unavailable host.
THE SAME APPLICATION, WRITTEN USING PYTHON THREADS

I'll write the application and test it first ... then add a few notes at the bottom.
import os
import re
import time
import sys
from threading import Thread

class testit(Thread):
    def __init__(self, ip):
        Thread.__init__(self)
        self.ip = ip
        self.status = -1
    def run(self):
        pingaling = os.popen("ping -q -c2 " + self.ip, "r")
        while 1:
            line = pingaling.readline()
            if not line: break
            igot = re.findall(testit.lifeline, line)
            if igot:
                self.status = int(igot[0])

testit.lifeline = re.compile(r"(\d) received")
report = ("No response", "Partial Response", "Alive")

print time.ctime()
pinglist = []
for host in range(60, 70):
    ip = "192.168.200." + str(host)
    current = testit(ip)
    pinglist.append(current)
    current.start()
for pingle in pinglist:
    pingle.join()
    print "Status from ", pingle.ip, "is", report[pingle.status]
print time.ctime()

And running:


[trainee@buttercup trainee]$ python kicking


Mon May 9 05:23:36 2005
Status from 192.168.200.60 is No response
Status from 192.168.200.61 is No response
Status from 192.168.200.62 is No response
Status from 192.168.200.63 is No response
Status from 192.168.200.64 is No response
Status from 192.168.200.65 is No response
Status from 192.168.200.66 is Alive
Status from 192.168.200.67 is No response
Status from 192.168.200.68 is No response
Status from 192.168.200.69 is No response
Mon May 9 05:23:39 2005
[trainee@buttercup trainee]$

3 seconds - much more acceptable than the 28 seconds that we got when we pinged the hosts
one by one and waited on each.
HOW DOES IT WORK?

We're going to run code concurrently to ping each host computer, and (this being Python) we create
an object for each of the concurrent tests (threads) we wish to run. Each of these objects inherits
from the Thread class so that we can use all of the logic already written in Python to provide our
parallelism.

Although the constructor builds the thread, it does not start it; rather, it leaves the thread-based
object at the starting gate. The start method on the testit object actually triggers it off: internally,
the start method triggers the run method of the testit class and also returns to the
calling code. In other words, this is the point at which the parallel processing actually starts, and
the run method is called indirectly (via a callback, in a similar way to how sort and map make
callbacks).
Once parallel processes have started, you'll want some way to bring the responses back together
at the end and in this first simple example, I've used a join. Note that this is NOT the same join
method that you have in the string class ;-)
A join waits for a run method to terminate. So, having spawned a series of 10 pings in our
example, our code waits for them to finish .... and it waits in the order that they were started.
Some will be queuing up and completed long before our loop of joins gets to them, but that
doesn't matter!

If you're going to be writing a threaded application, there are (broadly) two main approaches you
can take. The example that I've shown above uses a separate thread to take each request
through from beginning to end and all the threads have the same structure. An alternative
strategy is to write a series of "worker" threads, each of which performs a step in a multi-step
process, and have them passing data on to one another. It's the difference between having an
employee to walk each order through your factory and having employees at a production line
passing each job on.
Where you're running threads, you have to be very much aware of the effect they can have on
one another. As soon as you have two workers, their work may interfere with one another, and
you get involved in synchronisation (locking) of objects to ensure that this doesn't happen.
Locking / synchronisation brings its own further complications, in that you have to avoid
"deadlocks", where two threads each require two resources ... each grabs the first, then waits for
the second, which (because the other has it) never becomes available.
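The standard defence against that kind of deadlock is a fixed global lock ordering: every thread acquires the two resources in the same order, so no thread can hold the second while waiting for the first. A sketch (illustrative code, not part of the ping example):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
done = []

def worker(name):
    with lock_a:          # first resource: always taken first, in every thread
        with lock_b:      # second resource: always taken second
            done.append(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(done))   # [0, 1, 2, 3]
```

If one thread took lock_b first while another took lock_a first, the two could block each other forever; the shared ordering makes that interleaving impossible.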
FOOTNOTES

We've used the operating system's ping process in this example program. Ping responses vary
between operating systems and you may need to alter the ping command and regular expression
to match the response. The example above has been tested on Fedora Core Linux.
Threading makes heavy use of Operating System capabilities and is NOT as portable (no matter
what language you're programming in) as most code.

LOOKING AHEAD - MORE ON THREADS

Understanding Threading in Python


By Krishna G Pai

When programming, in any language, the capability to spawn worker threads is integral to the
performance of any application. Whether it be running a separate thread to handle user
interaction in a GUI app while running a potentially blocking process in the background (like your
browser is doing now), threading is essential. This document attempts to show what is and is not
possible with threading in Python.

1. Why Threading in Python?

Let us say you write, in Python, a nifty utility that lets you filter your mail.
You build a GUI frontend using PyGTK. Now if you embed the filter code in the frontend, you risk
making the application unresponsive (you still have a dial up connection, and any server
interaction entails a considerable waiting time). Since you don't work at Microsoft, you decide
this is unacceptable, and thus you start a separate thread each time you want to filter your mail.
Thus threads increase the responsiveness of your programs. Threads also increase the efficiency and
speed of a program, not to mention the algorithmic simplicity.
Combined with the power of Python, this makes programming in Python very attractive indeed.

The Basics

Let us first see how to start a simple thread. Threading is supported via the thread and
threading modules. These modules are supposed to be optional, but if you use an OS that
doesn't support threading, you'd better switch to Linux.

The code given below runs a simple thread in the background.

#!/usr/bin/env python

import time
import thread

def myfunction(string, sleeptime, *args):
    while 1:
        print string
        time.sleep(sleeptime)  # sleep for a specified amount of time

if __name__ == "__main__":
    thread.start_new_thread(myfunction, ("Thread No:1", 2))
    while 1: pass

We start a new thread by using the start_new_thread() function, which takes the
function to be run, along with the arguments to be passed to it, packed in a tuple.

Locks

Now that we have one thread running, running multiple threads is as simple as calling
start_new_thread() multiple times. The problem now would be to synchronize the many threads
which we would be running. Synchronization is done using a Lock object. Locks are created
using the allocate_lock() factory function.
Locks are used as mutex objects for handling critical sections of code. A thread
enters the critical section by calling the acquire() method, which can be either blocking or
non-blocking. A thread exits the critical section by calling the release() method.
The following listing shows how to use the Lock object.

#!/usr/bin/env python

import time
import thread

def myfunction(string, sleeptime, lock, *args):
    while 1:
        # entering critical section
        lock.acquire()
        print string, " Now Sleeping after Lock acquired for ", sleeptime
        time.sleep(sleeptime)
        print string, " Now releasing lock and then sleeping again"
        lock.release()
        # exiting critical section
        time.sleep(sleeptime)  # why?

if __name__ == "__main__":
    lock = thread.allocate_lock()
    thread.start_new_thread(myfunction, ("Thread No:1", 2, lock))
    thread.start_new_thread(myfunction, ("Thread No:2", 2, lock))
    while 1: pass

The code given above is fairly straightforward. We call lock.acquire() just before entering the
critical section and then call lock.release() to exit the critical section.
The inquisitive reader may now be wondering why we sleep after exiting the critical section.
Let us examine the output of the above listing.


Output.

Thread No:2 Now Sleeping after Lock acquired for 2


Thread No:2 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again
Thread No:2 Now Sleeping after Lock acquired for 2
Thread No:2 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again
Thread No:2 Now Sleeping after Lock acquired for 2

Here every thread is given an opportunity to enter the critical section. But the same cannot be
said if we remove time.sleep(sleeptime) from the above listing.

Output without time.sleep(sleeptime)

Thread No:1 Now Sleeping after Lock acquired for 2


Thread No:1 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again

Thread No:1 Now Sleeping after Lock acquired for 2


Thread No:1 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2
Thread No:1 Now releasing lock and then sleeping again
Thread No:1 Now Sleeping after Lock acquired for 2

Why does this happen? The answer lies in the fact that Python is not fully thread-safe. Unlike
Java, where threading was considered so important that it is part of the syntax, in Python
threads were laid down at the altar of portability.

In fact the documentation reads:


Not all built-in functions that may block waiting for I/O allow other threads to run. (The most
popular ones (time.sleep(), file.read(), select.select()) work as expected.)

It is not possible to interrupt the acquire() method on a lock -- the KeyboardInterrupt exception
will happen after the lock has been acquired.


What this means is that, quite probably, any code like the following:

while 1:
    lock.acquire()
    .....
    #some operation
    .....
    lock.release()

will cause starvation of one or more threads.
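The listings above use the old thread module; the same release-then-pause pattern can be sketched with the modern threading module (Python 3 syntax; the names worker and log are illustrative, not from the original listing):

```python
import threading
import time

def worker(name, lock, log, rounds=5):
    for _ in range(rounds):
        with lock:              # enter the critical section
            log.append(name)
        time.sleep(0.01)        # pause *outside* the lock so other threads can acquire it

lock = threading.Lock()
log = []
threads = [threading.Thread(target=worker, args=(n, lock, log)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(set(log)))         # ['A', 'B'] -- both threads got the lock
```

Dropping the sleep() reproduces the starvation shown above: whichever thread releases the lock is usually the first to re-acquire it.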

The Global Interpreter Lock

Currently, the Python interpreter (Python 2.3.4) is not thread-safe. There are no priorities and no
thread groups; threads cannot be stopped, suspended, resumed, or interrupted. That is, the
support provided is very basic. However, a lot can still be accomplished with this meager
support through the threading module, as we shall see in the following sections. One
of the main reasons is that in actuality only one thread is running at a time, because of
something called the Global Interpreter Lock (GIL). In order to support multi-threaded Python
programs, there is a global lock that must be held by the current thread before it can safely access
Python objects. Without the lock, competing threads could cause havoc: for example, if two
threads simultaneously incremented the reference count of the same object, the reference count
could end up being incremented only once instead of twice. Thus only the thread that has
acquired the GIL may operate on Python objects or call Python C API functions.

In order to support multi-threaded Python programs, the interpreter regularly releases and
reacquires the lock, by default every 10 bytecode instructions. This can, however, be changed
using the sys.setcheckinterval() function. The lock is also released and reacquired around
potentially blocking I/O operations like reading or writing a file, so that other threads can run
while the thread that requests the I/O is waiting for the operation to complete.
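sys.setcheckinterval() is the Python 2 knob; in Python 3.2 and later it was replaced by sys.setswitchinterval(), which is measured in seconds rather than bytecode instructions. A small sketch:

```python
import sys

default = sys.getswitchinterval()     # 0.005 s in current CPythons
sys.setswitchinterval(0.001)          # request more frequent thread switches
print(sys.getswitchinterval())        # 0.001
sys.setswitchinterval(default)        # restore the default
```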

In particular, note:

C extensions can release the GIL.
Blocking I/O can release the GIL.

The Python interpreter keeps some book-keeping information per thread, for which it uses a data
structure called PyThreadState. Earlier this state was stored in global variables, and switching
threads could cause problems. In particular, exception handling is now thread-safe when the
application uses sys.exc_info() to access the exception last raised in the current thread. There is
one global variable left, however: the pointer to the current PyThreadState structure. While
most thread packages have a way to store ``per-thread global data,'' Python's internal platform-independent
thread abstraction doesn't support this yet. Therefore, the current thread state
must be manipulated explicitly. The global interpreter lock is used to protect the pointer to the
current thread state. When releasing the lock and saving the thread state, the current thread
state pointer must be retrieved before the lock is released (since another thread could
immediately acquire the lock and store its own thread state in the global variable). Conversely,
when acquiring the lock and restoring the thread state, the lock must be acquired before storing
the thread state pointer.

[Diagram: one Global Thread Pointer per thread (Thread No 1 through Thread No 4), all
funnelling through the single Global Interpreter Lock.]

Using the Threading Module

Python manages to get a lot done using very little. The threading module uses the built-in
thread package to provide some very useful features that make your programming a whole
lot easier: there are built-in mechanisms providing critical-section locks, wait/notify locks,
etc. In particular we shall look at:

Using the Thread object
Profiling threaded code
Using the Condition, Event, and Queue objects
Using the threading library


The major Components of the Threading module are:

Lock object
RLock object
Semaphore Object
Condition Object
Event Object
Thread Object

The Thread Object

The Thread object is a wrapper around the start_new_thread() function we saw earlier, but with a
little more functionality. The Thread object is never used directly, but only by subclassing the
threading.Thread interface. The user is then expected to override the __init__() and/or
run() methods. Do not override the start() method, or provide more than one argument to run().

Note that you are supposed to call Thread.__init__() if you are overriding __init__().

While we have visited the Lock object in the previous sections, the RLock object is something
new. RLock provides a mechanism for a thread to acquire multiple instances of the same lock,
incrementing the depth of locking on each acquire and decrementing it on each release.
RLock makes it very easy to write code that conforms to the classical
readers-writers problem. The Semaphore object (rather, the Semaphore object factory) is the
general implementation of the semaphore proposed by Dijkstra. We shall understand the
Condition, Event and Thread objects via some examples.

Let us see a simple example:

#!/usr/bin/env python

#simple code which uses threads

import time
from threading import Thread

class MyThread(Thread):

    def __init__(self, bignum):
        Thread.__init__(self)
        self.bignum = bignum

    def run(self):
        for l in range(10):
            for k in range(self.bignum):
                res = 0
                for i in range(self.bignum):
                    res += 1

def test():
    bignum = 1000
    thr1 = MyThread(bignum)
    thr1.start()
    thr1.join()

if __name__=="__main__":
    test()

There are two things to note here: the thread does not start running until the start() method is
called, and join() makes the calling thread wait until the thread has finished execution.

So far, so good! However, being ever curious, we wonder whether there are any performance gains
in using threads.

It is the practice of every good programmer to profile his code, to find out his weak spots, his
strengths, and in general to know his inner soul ;-). And since we are dealing with the Tao of
threading in Python, we might as well ask ourselves which is faster: two threads sharing
the load, or one heavy-duty brute-force thread?

Which is faster: two threads sharing the load,

thread1
-------
for i in range(bignum):
    for k in range(bignum):
        res+=i

thread2
-------
for i in range(bignum):
    for k in range(bignum):
        res+=i

or one thread doing all the work?

thread3
-------
for i in range(bignum):
    for k in range(bignum):
        res+=i

for i in range(bignum):
    for k in range(bignum):
        res+=i

Profiling Threaded Code

Following the way of the masters, we make no assumptions and let the code do the talking.
Generally there are two ways to profile code in Python: the most common and comprehensive way
is to use the profile.run() method; the other is to time the execution of the code using
time.clock(). We shall do both. Consider the listing shown below.
#!/usr/bin/env python

#Let us profile code which uses threads

import time
from threading import Thread

class MyThread(Thread):

    def __init__(self, bignum):
        Thread.__init__(self)
        self.bignum = bignum

    def run(self):
        for l in range(10):
            for k in range(self.bignum):
                res = 0
                for i in range(self.bignum):
                    res += 1

def myadd_nothread(bignum):

    for l in range(10):
        for k in range(bignum):
            res = 0
            for i in range(bignum):
                res += 1

    for l in range(10):
        for k in range(bignum):
            res = 0
            for i in range(bignum):
                res += 1

def thread_test(bignum):
    #We create 2 Thread objects for the 2 threads.
    thr1 = MyThread(bignum)
    thr2 = MyThread(bignum)

    thr1.start()
    thr2.start()

    thr1.join()
    thr2.join()

def test():

    bignum = 1000

    #Let us test the threading part
    starttime = time.clock()
    thread_test(bignum)
    stoptime = time.clock()

    print "Running 2 threads took %.3f seconds" % (stoptime-starttime)

    #Now run without threads
    starttime = time.clock()
    myadd_nothread(bignum)
    stoptime = time.clock()

    print "Running Without Threads took %.3f seconds" % (stoptime-starttime)

if __name__=="__main__":

    test()

Profiling Threaded Code in Python

We get some surprising results, notably the following:

Running 2 threads took 0.000 seconds


Running Without Threads took 5.160 seconds


Being ever sceptical, we try to profile the threaded code by running
profile.run('test()'). What we see seems to add credence to the results achieved
earlier.

42 function calls in 5.170 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)


1 0.000 0.000 5.170 5.170 :1(?)
2 0.000 0.000 0.000 0.000 prof3.py:10(__init__)

1 5.170 5.170 5.170 5.170 prof3.py:24(myadd_nothread)


1 0.000 0.000 0.000 0.000 prof3.py:38(thread_test)
1 0.000 0.000 5.170 5.170 prof3.py:50(test)
1 0.000 0.000 5.170 5.170 profile:0(prof3.test())
0 0.000 0.000 profile:0(profiler)

2 0.000 0.000 0.000 0.000 threading.py:147(Condition)


2 0.000 0.000 0.000 0.000 threading.py:152(__init__)
1 0.000 0.000 0.000 0.000 threading.py:180(_release_save)
1 0.000 0.000 0.000 0.000 threading.py:183(_acquire_restore)
1 0.000 0.000 0.000 0.000 threading.py:186(_is_owned)
1 0.000 0.000 0.000 0.000 threading.py:195(wait)
2 0.000 0.000 0.000 0.000 threading.py:356(_newname)
2 0.000 0.000 0.000 0.000 threading.py:373(__init__)
2 0.000 0.000 0.000 0.000 threading.py:387(_set_daemon)
4 0.000 0.000 0.000 0.000 threading.py:39(__init__)

2 0.000 0.000 0.000 0.000 threading.py:402(start)


6 0.000 0.000 0.000 0.000 threading.py:44(_note)
2 0.000 0.000 0.000 0.000 threading.py:468(join)
2 0.000 0.000 0.000 0.000 threading.py:507(isDaemon)
5 0.000 0.000 0.000 0.000 threading.py:609(currentThread)
As I said, it seems to add credence, until we notice that run() isn't even called! What does this
teach us? Apart from distrusting the profiler: never follow any mechanical way of testing your
code. The reason the output is misleading and the profiler silent is that both
time.clock() and the profiler measure only the time spent running the current thread.
How then do we measure the time taken by the two threads? We use the time.time() function.

The correct way to profile the code, then, is the same listing with the time.clock() calls
replaced by time.time().
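A minimal Python 3 sketch of that wall-clock measurement (the helper names busy and timed_threads are illustrative, not from the original listing):

```python
import threading
import time

def busy(n):
    res = 0
    for i in range(n):
        res += i
    return res

def timed_threads(n):
    start = time.time()       # wall-clock time: counts time spent in *other* threads too
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

print("Running 2 threads took %.3f seconds" % timed_threads(100000))
```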

The results we now obtain are more realistic and reliable.

Running 2 threads took 5.125 seconds


Running Without Threads took 5.137 seconds

As we can see, there is no significant difference between the threaded and non-threaded versions.

But doesn't time.time() give the absolute (wall-clock) time, you ask? What about context switches?
True, but since we are only interested in measuring the total time taken, and not the work
distribution, we can ignore context switches (and indeed the code has been structured to ignore
context switches).

Condition, Event and Queue Objects

Conditions are a way of synchronizing access between multiple threads that wait for a
particular condition to become true before starting any major processing. Condition objects are a
very elegant mechanism with which to implement the classic Producer-Consumer problem
(indeed, this elegance is true of most things in Python). Conditions take a lock object or, if none
is supplied, create their own RLock object. A thread waits for a particular condition to become
true by calling the wait() method, and signals other threads using the notify() or notifyAll()
method.

Let us see how the classical Producer-Consumer problem is solved using this.
#!/usr/bin/env python

#Producer-Consumer using Condition objects

import time
from threading import *

class itemQ:

    def __init__(self):
        self.count = 0

    def produce(self, num=1):
        self.count += num

    def consume(self):
        if self.count: self.count -= 1

    def isEmpty(self):
        return not self.count

class Producer(Thread):

    def __init__(self, condition, itemq, sleeptime=1):
        Thread.__init__(self)
        self.cond = condition
        self.itemq = itemq
        self.sleeptime = sleeptime

    def run(self):
        cond = self.cond
        itemq = self.itemq

        while 1:
            cond.acquire() #acquire the lock
            print currentThread(), "Produced One Item"
            itemq.produce()
            cond.notifyAll()
            cond.release()

            time.sleep(self.sleeptime)

class Consumer(Thread):

    def __init__(self, condition, itemq, sleeptime=2):
        Thread.__init__(self)
        self.cond = condition
        self.itemq = itemq
        self.sleeptime = sleeptime

    def run(self):
        cond = self.cond
        itemq = self.itemq

        while 1:
            time.sleep(self.sleeptime)

            cond.acquire() #acquire the lock
            while itemq.isEmpty():
                cond.wait()
            itemq.consume()
            print currentThread(), "Consumed One Item"
            cond.release()

if __name__=="__main__":

    q = itemQ()
    cond = Condition()

    pro = Producer(cond, q)
    cons1 = Consumer(cond, q)
    cons2 = Consumer(cond, q)

    pro.start()
    cons1.start()
    cons2.start()
    while 1: pass

Producer Consumer Listing in Python


Here the currentThread() function returns the id of the currently running thread. Note that
wait() has an optional argument specifying the number of seconds after which it times out. I
would discourage its use, because the timeout is implemented with a polling mechanism:
according to the source code, we poll at least 20 times every second. Once again, I would like to
point out how not to use the Condition object. Consider the following, borrowed from
Python-2.3.4/Lib/threading.py:

def notify(self, n=1):
    currentThread() # for side-effect
    assert self._is_owned(), "notify() of un-acquire()d lock"
    __waiters = self.__waiters
    waiters = __waiters[:n]
    if not waiters:
        if __debug__:
            self._note("%s.notify(): no waiters", self)
        return
    self._note("%s.notify(): notifying %d waiter%s", self, n,
               n!=1 and "s" or "")
    for waiter in waiters:
        waiter.release()
        try:
            __waiters.remove(waiter)
        except ValueError:
            pass

def notifyAll(self):
    self.notify(len(self.__waiters))

Python-2.3.4/Lib/threading.py

What threading.py does is maintain a list of threads waiting on the current condition. It
then attempts to notify them by removing the first n waiters from the list. And since it removes
the same first n waiters every time, this can potentially cause starvation of certain threads. To
test our theory, let us modify the Producer-Consumer listing given above by making the
following changes:

cons1=Consumer(cond,q,sleeptime=1)
cons2=Consumer(cond,q,sleeptime=1)

This will potentially starve one of the threads, depending upon how they are inserted
into the list. In fact, a test run with the above changes shows this to be the case. Thus you
will have to be careful of potential pitfalls when using Python threading. Note that simply
calling notifyAll() everywhere is no fix either, since it is inefficient and should be avoided.

By now you must have a pretty good idea about how to go about threading in Python. In closing,
I shall briefly describe the Event and Queue objects and how to use them.

The Event object is actually a thin wrapper around the Condition object, so that we don't have to
mess about with locks. The methods provided are self-explanatory: set(), clear(),
isSet(), wait(timeout). One thing to note is that the Event object uses notifyAll(),
so use it only when necessary.

A simple example is given in the accompanying listing event.py.
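That listing isn't reproduced here, but a minimal sketch looks like this (modern Python 3 spelling: isSet() is now is_set()):

```python
import threading
import time

event = threading.Event()
results = []

def waiter():
    event.wait()                      # block until another thread calls set()
    results.append(event.is_set())

t = threading.Thread(target=waiter)
t.start()
time.sleep(0.05)                      # the waiter is now blocked in wait()
event.set()                           # wake every waiting thread
t.join()
print(results)                        # [True]
event.clear()                         # reset the flag so the Event can be reused
print(event.is_set())                 # False
```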



Although Queues don't come under the threading module, they do provide an easy interface
which should be suitable for solving most problems. The main advantage of Queue is that it does
not require the threading module; you can use it with the plain thread module instead.
Queues are a simple and efficient way of implementing a stack, a priority queue, etc., since they
handle both data protection and synchronization. The methods used are put(item, block),
get(block), Queue(maxsize), qsize(), empty() and full().
A simple example is given in the accompanying listing q.py.
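That listing isn't reproduced here either; a minimal sketch follows (the module is spelled Queue in Python 2 and queue in Python 3):

```python
import queue          # named 'Queue' in Python 2
import threading

q = queue.Queue(maxsize=10)
results = []

def worker():
    while True:
        item = q.get()        # blocks until an item is available
        if item is None:      # sentinel value: tell the worker to stop
            break
        results.append(item * 2)

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    q.put(i)                  # put()/get() do their own locking
q.put(None)
t.join()
print(results)                # [0, 2, 4]
```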

References

I suggest you read the following for more info:



threading.py (the module source)
http://www.python.org/doc/current/api/threads.html
http://starship.python.net/crew/aahz


My own coroutine stuff


#cooperativetasks.py
"""
generators to achieve simple, cooperative multitasking (from cookbook)
"""
from itertools import chain
from types import GeneratorType
from collections import deque

def continuator(gen):
    """ Yielding from generator used inside another generator """
    while True:
        i = gen.next()
        if isinstance(i, GeneratorType):
            gen = chain(i, gen)
        else:
            yield i

class Task:
    def __init__(self, pool):
        self.generator = self.main()
        pool.add(self)
    def main(self):
        "Must be a generator"
        pass

class TaskPool:
    """
    NOTE max speed ~~ 20000 task switches per second per 100MHz
    NOTE using pyrex or psyco ~~ 25% speed improvement
    NOTE ram usage ~~ 1KB per task
    """
    def __init__(self):
        self.tasks = deque()
    def add(self, task):
        self.tasks.append(task)
    def iteration(self, iter_cnt=1):
        tasks = self.tasks
        for i in range(iter_cnt):
            try:
                tasks[0].generator.next()
                tasks.rotate(-1)
            except StopIteration:
                del tasks[0]
            except IndexError:
                # allow internal exception to propagate
                if len(tasks) > 0: raise

#### EXAMPLE taskpool #########################################################

class ExampleTask(Task):
    def __init__(self, pool, name, max_iterations):
        self.name = name
        self.max_iterations = max_iterations
        Task.__init__(self, pool)
    def main(self):
        i = 0
        while i < self.max_iterations:
            print self.name, i
            i += 1
            yield 0
        print self.name, 'finishing'

pool = TaskPool()
task_a = ExampleTask(pool, 'AAA', 5)
task_b = ExampleTask(pool, 'bbb', 10)

for i in xrange(100):
    pool.iteration()

#microthreading.py

import sys, signal
# credit: original idea was based on an article by David Mertz
# http://gnosis.cx/publish/programming/charming_python_b7.txt
# some example 'microthread' generators

def empty(name):
    """ This is an empty task for demonstration purposes. """
    while True:
        print "<empty process>", name
        yield None

def terminating(name, maxn):
    """ This is a counting task for demonstration purposes. """
    for i in xrange(maxn):
        print "Here %s, %s out of %s" % (name, i, maxn)
        yield None
    print "Done with %s, bailing out after %s times" % (name, maxn)

def delay(duration=0.8):
    """ Do nothing at all for 'duration' seconds. """
    import time
    while True:
        print "<sleep %d s.>" % duration
        time.sleep(duration)
        yield None

class GenericScheduler(object):
    """ The constructor accepts a list of microthreads to run cooperatively,
    relinquishing control to the scheduler (via yield) or just completing
    (finishing off) via return. Whenever a task is completed, the scheduler
    supplants it with a no-op task. """
    def __init__(self, threads, stop_asap=False):
        signal.signal(signal.SIGINT, self.shutdownHandler)
        self.shutdownRequest = False
        self.threads = threads
        self.stop_asap = stop_asap

    def shutdownHandler(self, n, frame):
        """ Initiate a request to shut down cleanly on SIGINT. """
        print "Request to shut down."
        self.shutdownRequest = True

    def schedule(self):
        def noop():
            while True:
                #print '.',
                yield None
        n = len(self.threads)
        while True:
            for i, thread in enumerate(self.threads):
                # enumerate(iterator1) returns an iterator2 producing
                # tuples (count, value from iterator1)
                #print "#", list(enumerate(self.threads))  # [(0,...),...,(5,...)]
                try:
                    thread.next()
                except StopIteration:
                    if self.stop_asap: return
                    n -= 1
                    if n == 0: return
                    self.threads[i] = noop()
                if self.shutdownRequest:
                    return

if __name__== "__main__":
    s = GenericScheduler([ empty('boo'), delay(), empty('foo'),
                           terminating('ant', 5), terminating('ar', 9), delay(0.5),
                         ], stop_asap=False)
    s.schedule()
    sys.exit(0)
    s = GenericScheduler([ empty('boo'), delay(), empty('foo'),
                           terminating('fie', 5), delay(0.5),
                         ], stop_asap=False)
    s.schedule()

#priorityqueue.py
"""
simple scheduler
"""
import heapq, time

class Scheduler(object):
    def __init__(self):
        self.queue = []
    def add(self, job, t=0):
        heapq.heappush(self.queue, [t, job])
    def process_job(self):
        t, task = heapq.heappop(self.queue)
        time.sleep(t)
        for job in self.queue:
            job[0] -= t
        task.run()
    def process_loop(self):
        while self.queue:
            self.process_job()

class cq:
    """ circular queue """
    def __init__(self, q):
        self.q = q
    def __iter__(self):
        return self
    def next(self):
        self.q = self.q[1:] + [self.q[0]]
        return self.q[-1]

class WakeUp(object):
    def run(self):
        print "Wake up!"

class Heartbeat(object):
    def run(self):
        print "tick"
        #scheduler.add(self, 1)

scheduler = Scheduler()
scheduler.add(Heartbeat(), 1)
scheduler.add(Heartbeat(), 0)
scheduler.add(WakeUp(), 5)

scheduler.process_loop()

cs = [(Heartbeat, 0), (Heartbeat, 0), (WakeUp, 0)]
citer = cq(cs)

for task, i in citer:
    scheduler.add(task(), i)
    scheduler.process_loop()

#coroutines_gvr.py
import sys, types   # needed by schedule() below
import collections

class Trampoline:
    """Manage communications between coroutines

    A simple co-routine scheduler or "trampoline" that lets
    coroutines "call" other coroutines by yielding the coroutine
    they wish to invoke. Any non-generator value yielded by
    a coroutine is returned to the coroutine that "called" the
    one yielding the value. Similarly, if a coroutine raises an
    exception, the exception is propagated to its "caller".
    In effect, this example emulates simple tasklets as are used
    in Stackless Python, as long as you use a yield expression to
    invoke routines that would otherwise "block".
    """
    running = False

    def __init__(self):
        self.queue = collections.deque()

    def add(self, coroutine):
        """Request that a coroutine be executed"""
        self.schedule(coroutine)

    def run(self):
        result = None
        self.running = True
        try:
            while self.running and self.queue:
                func = self.queue.popleft()
                result = func()
            return result
        finally:
            self.running = False

    def stop(self):
        self.running = False

    def schedule(self, coroutine, stack=(), value=None, *exc):

        def resume():
            try:
                if exc:
                    value = coroutine.throw(value, *exc)
                else:
                    value = coroutine.send(value)
            except:
                if stack:
                    # send the error back to the "caller"
                    self.schedule(
                        stack[0], stack[1], *sys.exc_info()
                    )
                else:
                    # Nothing left in this pseudothread to
                    # handle it, let it propagate to the
                    # run loop
                    raise

            if isinstance(value, types.GeneratorType):
                # Yielded to a specific coroutine, push the
                # current one on the stack, and call the new
                # one with no args
                self.schedule(value, (coroutine, stack))
            elif stack:
                # Yielded a result, pop the stack and send the
                # value to the caller
                self.schedule(stack[0], stack[1], value)
            # else: this pseudothread has ended

        self.queue.append(resume)

"""
A simple "echo" server, and code to run it using a trampoline
(presumes the existence of "nonblocking_read",
"nonblocking_write", and other I/O coroutines, that e.g. raise
ConnectionLost if the connection is closed):
"""

# coroutine function that echos data back on a connected socket
def echo_handler(sock):
    while True:
        try:
            data = yield nonblocking_read(sock)
            yield nonblocking_write(sock, data)
        except ConnectionLost:
            pass  # exit normally if connection lost

# coroutine function that listens for connections on a
# socket, and then launches a service "handler" coroutine
# to service the connection
def listen_on(trampoline, sock, handler):
    while True:
        # get the next incoming connection
        connected_socket = yield nonblocking_accept(sock)
        # start another coroutine to handle the connection
        trampoline.add( handler(connected_socket) )

#stubs so the example is self-contained
def nonblocking_read(sock): pass
def nonblocking_write(sock, data): pass
def nonblocking_accept(sock): pass
class ConnectionLost(Exception): pass
def listening_socket(host, echo):
    return 1

# Create a scheduler to manage all our coroutines
t = Trampoline()

# Create a coroutine instance to run the echo_handler on
# incoming connections
server = listen_on( t, listening_socket("localhost","echo"), echo_handler )

# Add the coroutine to the scheduler
t.add(server)

# loop forever, accepting connections and servicing them
# "in parallel"
t.run()


Module threadpool
http://chrisarndt.de/en/software/python/threadpool/api/

An easy-to-use object-oriented thread pool framework.
A thread pool is an object that maintains a pool of worker threads to perform time-consuming
operations in parallel. It assigns jobs to the threads by putting them in a work-request queue, where
they are picked up by the next available thread. The thread then performs the requested operation in
the background and puts the results in another queue.
The thread pool object can then collect the results from all threads from this queue as soon as they
become available, or after all threads have finished their work. It is also possible to define callbacks
to handle each result as it comes in.

Basic usage:
>>> pool = ThreadPool(poolsize)
>>> requests = makeRequests(some_callable, list_of_args, callback)
>>> [pool.putRequest(req) for req in requests]
>>> pool.wait()

See the end of the module code for a brief, annotated usage example.
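The same idea ships in the modern standard library as concurrent.futures.ThreadPoolExecutor (shown here as a rough stand-in, not the threadpool module's own API):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # stand-in for a time-consuming operation
    return x * x

# worker threads pick jobs off an internal work queue; map() returns results in order
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(square, [1, 2, 3, 4]))

print(results)    # [1, 4, 9, 16]
```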

The basic concept and some code was taken from the book "Python in a Nutshell" by Alex
Martelli, copyright 2003, ISBN 0-596-00188-6, from section 14.5 "Threaded Program
Architecture". I wrapped the main program logic in the ThreadPool class, added the WorkRequest
class and the callback system and tweaked the code here and there. Kudos also to Florent Aide for
the exception handling mechanism.

APPENDIX: Type Hierarchy

None
This type has a single value. There is a single object with this value. This object is accessed through the built-in name
None. It is used to signify the absence of a value in many situations, e.g., it is returned from functions that don't
explicitly return anything. Its truth value is false.
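For instance:

```python
def returns_nothing():
    pass                               # no explicit return

print(returns_nothing() is None)       # True
print(bool(None))                      # False -- None is false in a boolean context
```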

NotImplemented
This type has a single value. There is a single object with this value. This object is accessed through the built-in name
NotImplemented. Numeric methods and rich comparison methods may return this value if they do not implement
the operation for the operands provided. (The interpreter will then try the reflected operation, or some other
fallback, depending on the operator.) Its truth value is true.

Ellipsis
This type has a single value. There is a single object with this value. This object is accessed through the built-in name
Ellipsis. It is used to indicate the presence of the "..." syntax in a slice. Its truth value is true.

Numbers
These are created by numeric literals and returned as results by arithmetic operators and arithmetic built-in
functions. Numeric objects are immutable; once created their value never changes. Python numbers are of course
strongly related to mathematical numbers, but subject to the limitations of numerical representation in computers.
Python distinguishes between integers, floating point numbers, and complex numbers:

Integers
These represent elements from the mathematical set of integers (positive and negative).
There are three types of integers:

Plain integers
These represent numbers in the range -2147483648 through 2147483647. (The range may be
larger on machines with a larger natural word size, but not smaller.) When the result of an
operation would fall outside this range, the result is normally returned as a long integer (in some
cases, the exception OverflowError is raised instead). For the purpose of shift and mask operations,
integers are assumed to have a binary, 2's complement notation using 32 or more bits, and hiding
no bits from the user (i.e., all 4294967296 different bit patterns correspond to different values).

Long integers
These represent numbers in an unlimited range, subject to available (virtual) memory only. For the
purpose of shift and mask operations, a binary representation is assumed, and negative numbers
are represented in a variant of 2's complement which gives the illusion of an infinite string of sign
bits extending to the left.

Booleans
These represent the truth values False and True. The two objects representing the values False and
True are the only Boolean objects. The Boolean type is a subtype of plain integers, and Boolean
values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that
when converted to a string, the strings "False" or "True" are returned, respectively.

Floating point numbers


These represent machine-level double precision floating point numbers. You are at the mercy of the
underlying machine architecture (and C or Java implementation) for the accepted range and handling of
overflow. Python does not support single-precision floating point numbers; the savings in processor and
memory usage that are usually the reason for using these is dwarfed by the overhead of using objects in
Python, so there is no reason to complicate the language with two kinds of floating point numbers.

Complex numbers
These represent complex numbers as a pair of machine-level double precision floating point numbers. The
same caveats apply as for floating point numbers. The real and imaginary parts of a complex number z can
be retrieved through the read-only attributes z.real and z.imag.
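For example:

```python
z = 3.0 + 4.0j
print(z.real)      # 3.0
print(z.imag)      # 4.0
print(abs(z))      # 5.0 -- the modulus, computed from the two double-precision parts
```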

Sequences
These represent finite ordered sets indexed by non-negative numbers. The built-in function len() returns the number
of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of
sequence a is selected by a[i].

Page 179of 191

Schlumberger Private

The rules for integer representation are intended to give the most meaningful interpretation of
shift and mask operations involving negative integers and the least surprises when switching
between the plain and long integer domains. Any operation except left shift, if it yields a result in
the plain integer domain without causing overflow, will yield the same result in the long integer
domain or when using mixed operands.

Sequences also support slicing: a[i:j] selects all items with index k such that i <= k < j. When used as an expression, a
slice is a sequence of the same type. This implies that the index set is renumbered so that it starts at 0.
Some sequences also support ``extended slicing'' with a third ``step'' parameter: a[i:j:k] selects all items of a with
index x where x = i + n*k, n >= 0 and i <= x < j.
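The indexing and slicing rules above can be sketched with a short list example:

```python
a = [10, 20, 30, 40, 50]

print(len(a))      # 5 items; the index set is 0..4
print(a[2])        # 30 -- item i is selected by a[i]
print(a[1:4])      # [20, 30, 40] -- all items with index k, 1 <= k < 4
print(a[1:4][0])   # 20 -- the slice is renumbered to start at 0
print(a[0:5:2])    # [10, 30, 50] -- extended slice a[i:j:k] with step 2
```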

Sequences are distinguished according to their mutability:

Immutable sequences
An object of an immutable sequence type cannot change once it is created. (If the object contains
references to other objects, these other objects may be mutable and may be changed; however, the
collection of objects directly referenced by an immutable object cannot change.)
The following types are immutable sequences:

Strings
The items of a string are characters. There is no separate character type; a character is represented
by a string of one item. Characters represent (at least) 8-bit bytes. The built-in functions chr() and
ord() convert between characters and nonnegative integers representing the byte values. Bytes
with the values 0-127 usually represent the corresponding ASCII values, but the interpretation of
values is up to the program. The string data type is also used to represent arrays of bytes, e.g., to
hold data read from a file.

(On systems whose native character set is not ASCII, strings may use EBCDIC in their internal
representation, provided the functions chr() and ord() implement a mapping between ASCII and
EBCDIC, and string comparison preserves the ASCII order. Or perhaps someone can propose a
better rule?)

Unicode
The items of a Unicode object are Unicode code units. A Unicode code unit is represented by a
Unicode object of one item and can hold either a 16-bit or 32-bit value representing a Unicode
ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how
Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and
will be reported as two separate items. The built-in functions unichr() and ord() convert between
code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode
Standard 3.0. Conversions from and to other encodings are possible through the Unicode method
encode() and the built-in function unicode().

Tuples
The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by
comma-separated lists of expressions. A tuple of one item (a `singleton') can be formed by affixing
a comma to an expression (an expression by itself does not create a tuple, since parentheses must
be usable for grouping of expressions). An empty tuple can be formed by an empty pair of
parentheses.
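The tuple-forming rules above can be checked directly; note the role of the trailing comma:

```python
empty = ()            # an empty pair of parentheses forms the empty tuple
single = (42,)        # the trailing comma forms a singleton tuple
grouped = (42)        # just a parenthesized expression -- an integer
pair = 1, 2           # a comma-separated list of expressions

print(len(empty))     # 0
print(len(single))    # 1
print(grouped)        # 42
print(pair)           # (1, 2)
```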

Mutable sequences
Mutable sequences can be changed after they are created. The subscription and slicing notations can be
used as the target of assignment and del (delete) statements.
There is currently a single intrinsic mutable sequence type:

Lists
The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list
of expressions in square brackets. (Note that there are no special cases needed to form lists of
length 0 or 1.)

The extension module array provides an additional example of a mutable sequence type.

Mappings
These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item
indexed by k from the mapping a; this can be used in expressions and as the target of assignments or del statements.
The built-in function len() returns the number of items in a mapping.
There is currently a single intrinsic mapping type:

Dictionaries
These represent finite sets of objects indexed by nearly arbitrary values. The only types of values not acceptable as
keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by
object identity, the reason being that the efficient implementation of dictionaries requires a key's hash value to
remain constant. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers
compare equal (e.g., 1 and 1.0) then they can be used interchangeably to index the same dictionary entry.

Dictionaries are mutable; they can be created by the {...} notation (see section 5.2.6, ``Dictionary Displays'').
The extension modules dbm, gdbm, and bsddb provide additional examples of mapping types.
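The key rules described above can be sketched briefly:

```python
d = {}                 # created by the {...} notation
d[1] = "one"
print(d[1.0])          # "one" -- 1 and 1.0 compare equal, so they index the same entry
d["items"] = [1, 2]    # values may be mutable; keys may not be
try:
    d[[1, 2]] = "bad"  # lists are unhashable, so they cannot be keys
except TypeError:
    print("unhashable key rejected")
del d["items"]         # del works on mapping subscripts
print(len(d))          # 1
```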

Callable types
These are the types to which the function call operation can be applied:

User-defined functions
A user-defined function object is created by a function definition (see section 7.6, ``Function definitions''). It should
be called with an argument list containing the same number of items as the function's formal parameter list.

Special attributes:

Attribute      Meaning                                                         Writable?
func_doc       The function's documentation string, or None if unavailable     Writable
__doc__        Another way of spelling func_doc                                Writable
func_name      The function's name                                             Writable
__name__       Another way of spelling func_name                               Writable
__module__     The name of the module the function was defined in, or None
               if unavailable                                                  Writable
func_defaults  A tuple containing default argument values for those
               arguments that have defaults, or None if no arguments have
               a default value                                                 Writable
func_code      The code object representing the compiled function body         Writable
func_globals   A reference to the dictionary that holds the function's
               global variables -- the global namespace of the module in
               which the function was defined                                  Read-only
func_dict      The namespace supporting arbitrary function attributes          Writable
func_closure   None or a tuple of cells that contain bindings for the
               function's free variables                                       Read-only

Most of the attributes labelled ``Writable'' check the type of the assigned value.

Changed in version 2.4: func_name is now writable.

Function objects also support getting and setting arbitrary attributes, which can be used, for example, to attach
metadata to functions. Regular attribute dot-notation is used to get and set such attributes. Note that the current
implementation only supports function attributes on user-defined functions. Function attributes on built-in functions
may be supported in the future.

Additional information about a function's definition can be retrieved from its code object; see the description of
internal types below.
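A small sketch of these attributes in use; the double-underscore spellings are shown because they also survive in later Python versions (the func_* forms are the Python 2 names for the same objects):

```python
def greet(name, greeting="hello"):
    """Return a simple greeting."""
    return "%s, %s" % (greeting, name)

print(greet.__name__)       # greet -- same object as func_name in Python 2
print(greet.__doc__)        # Return a simple greeting.
print(greet.__module__)     # the defining module's name

greet.__doc__ = "Updated."  # the attribute is writable
greet.author = "example"    # arbitrary function attributes are also allowed
print(greet.author)         # example
```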

User-defined methods
A user-defined method object combines a class, a class instance (or None) and any callable object (normally a user-defined function).

Methods also support accessing (but not setting) the arbitrary function attributes on the underlying function object.
User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that
class), if that attribute is a user-defined function object, an unbound user-defined method object, or a class method
object.

When the attribute is a user-defined method object, a new method object is only created if the class from which it is
being retrieved is the same as, or a derived class of, the class stored in the original method object; otherwise, the
original method object is used as it is.

When a user-defined method object is created by retrieving a user-defined function object from a class, its im_self
attribute is None and the method object is said to be unbound. When one is created by retrieving a user-defined
function object from a class via one of its instances, its im_self attribute is the instance, and the method object is said
to be bound. In either case, the new method's im_class attribute is the class from which the retrieval takes place,
and its im_func attribute is the original function object.

Special read-only attributes: im_self is the class instance object, im_func is the function object; im_class is the class
of im_self for bound methods or the class that asked for the method for unbound methods; __doc__ is the method's
documentation (same as im_func.__doc__); __name__ is the method name (same as im_func.__name__);
__module__ is the name of the module the method was defined in, or None if unavailable. Changed in version 2.2:
im_self used to refer to the class that defined the method.

When a user-defined method object is created by retrieving another method object from a class or instance, the
behaviour is the same as for a function object, except that the im_func attribute of the new instance is not the
original method object but its im_func attribute.

When a user-defined method object is created by retrieving a class method object from a class or instance, its
im_self attribute is the class itself (the same as the im_class attribute), and its im_func attribute is the function
object underlying the class method.

When an unbound user-defined method object is called, the underlying function (im_func) is called, with the
restriction that the first argument must be an instance of the proper class (im_class) or of a derived class thereof.

When a bound user-defined method object is called, the underlying function (im_func) is called, inserting the class
instance (im_self) in front of the argument list. For instance, when C is a class which contains a definition for a
function f(), and x is an instance of C, calling x.f(1) is equivalent to calling C.f(x, 1).

When a user-defined method object is derived from a class method object, the ``class instance'' stored in im_self will
actually be the class itself, so that calling either x.f(1) or C.f(1) is equivalent to calling f(C, 1) where f is the underlying
function.

Note that the transformation from function object to (unbound or bound) method object happens each time the
attribute is retrieved from the class or instance. In some cases, a fruitful optimization is to assign the attribute to a
local variable and call that local variable. Also notice that this transformation only happens for user-defined
functions; other callable objects (and all non-callable objects) are retrieved without transformation. It is also
important to note that user-defined functions which are attributes of a class instance are not converted to bound
methods; this only happens when the function is an attribute of the class.

Generator functions
A function or method which uses the yield statement is called a generator function. Such a function, when called,
always returns an iterator object which can be used to execute the body of the function: calling the iterator's next()
method will cause the function to execute until it provides a value using the yield statement. When the function
executes a return statement or falls off the end, a StopIteration exception is raised and the iterator will have reached
the end of the set of values to be returned.
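The generator behaviour can be sketched as follows; the iterator's method is spelled it.next() in Python 2 and next(it) from 2.6 onward:

```python
def countdown(n):
    """Yield n, n-1, ..., 1, then raise StopIteration."""
    while n > 0:
        yield n
        n -= 1

it = countdown(3)
print(next(it))            # 3 -- the body runs until the first yield
print(next(it))            # 2 -- execution resumes after the yield
print(list(countdown(3)))  # [3, 2, 1] -- exhausting the iterator
```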

Built-in functions
A built-in function object is a wrapper around a C function. Examples of built-in functions are len() and math.sin()
(math is a standard built-in module). The number and type of the arguments are determined by the C function.
Special read-only attributes: __doc__ is the function's documentation string, or None if unavailable; __name__ is the
function's name; __self__ is set to None (but see the next item); __module__ is the name of the module the function
was defined in or None if unavailable.

Built-in methods
This is really a different disguise of a built-in function, this time containing an object passed to the C function as an
implicit extra argument. An example of a built-in method is alist.append(), assuming alist is a list object. In this case,
the special read-only attribute __self__ is set to the object denoted by alist.

Class Types
Class types, or ``new-style classes,'' are callable. These objects normally act as factories for new instances of
themselves, but variations are possible for class types that override __new__(). The arguments of the call are passed
to __new__() and, in the typical case, to __init__() to initialize the new instance.

Classic Classes
Class objects are described below. When a class object is called, a new class instance (also described below) is
created and returned. This implies a call to the class's __init__() method if it has one. Any arguments are passed on
to the __init__() method. If there is no __init__() method, the class must be called without arguments.

Class instances
Class instances are described below. Class instances are callable only when the class has a __call__() method;
x(arguments) is a shorthand for x.__call__(arguments).

Modules
Modules are imported by the import statement (see section 6.12, ``The import statement''). A module object has a
namespace implemented by a dictionary object (this is the dictionary referenced by the func_globals attribute of
functions defined in the module). Attribute references are translated to lookups in this dictionary, e.g., m.x is
equivalent to m.__dict__["x"]. A module object does not contain the code object used to initialize the module (since
it isn't needed once the initialization is done).

Attribute assignment updates the module's namespace dictionary, e.g., "m.x = 1" is equivalent to "m.__dict__["x"] =
1".

Special read-only attribute: __dict__ is the module's namespace as a dictionary object.

Predefined (writable) attributes: __name__ is the module's name; __doc__ is the module's documentation string, or
None if unavailable; __file__ is the pathname of the file from which the module was loaded, if it was loaded from a
file. The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension
modules loaded dynamically from a shared library, it is the pathname of the shared library file.
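These module attribute rules can be checked interactively; math is used here only as a convenient example module:

```python
import math

print(math.__name__)                   # math
print(math.__dict__["pi"] == math.pi)  # True -- attribute lookup goes through __dict__
math.x = 1                             # equivalent to math.__dict__["x"] = 1
print(math.__dict__["x"])              # 1
```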

Classes
Class objects are created by class definitions (see section 7.7, ``Class definitions''). A class has a namespace
implemented by a dictionary object. Class attribute references are translated to lookups in this dictionary, e.g., "C.x"
is translated to "C.__dict__["x"]". When the attribute name is not found there, the attribute search continues in the
base classes. The search is depth-first, left-to-right in the order of occurrence in the base class list.

When a class attribute reference (for class C, say) would yield a user-defined function object or an unbound
user-defined method object whose associated class is either C or one of its base classes, it is transformed into an
unbound user-defined method object whose im_class attribute is C. When it would yield a class method object, it is
transformed into a bound user-defined method object whose im_class and im_self attributes are both C. When it
would yield a static method object, it is transformed into the object wrapped by the static method object. See
section 3.4.2 for another way in which attributes retrieved from a class may differ from those actually contained in
its __dict__.

Class attribute assignments update the class's dictionary, never the dictionary of a base class.

A class object can be called (see above) to yield a class instance (see below).

Special attributes: __name__ is the class name; __module__ is the module name in which the class was defined;
__dict__ is the dictionary containing the class's namespace; __bases__ is a tuple (possibly empty or a singleton)
containing the base classes, in the order of their occurrence in the base class list; __doc__ is the class's
documentation string, or None if undefined.

Class instances

A class instance is created by calling a class object (see above). A class instance has a namespace implemented as a
dictionary which is the first place in which attribute references are searched. When an attribute is not found there,
and the instance's class has an attribute by that name, the search continues with the class attributes. If a class
attribute is found that is a user-defined function object or an unbound user-defined method object whose associated
class is the class (call it C) of the instance for which the attribute reference was initiated or one of its bases, it is
transformed into a bound user-defined method object whose im_class attribute is C and whose im_self attribute is
the instance. Static method and class method objects are also transformed, as if they had been retrieved from
class C; see above under ``Classes''. See section 3.4.2 for another way in which attributes of a class retrieved via its
instances may differ from the objects actually stored in the class's __dict__. If no class attribute is found, and the
object's class has a __getattr__() method, that is called to satisfy the lookup.

Attribute assignments and deletions update the instance's dictionary, never a class's dictionary. If the class has a
__setattr__() or __delattr__() method, this is called instead of updating the instance dictionary directly.
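A sketch of this lookup order -- instance dictionary first, then the class, then __getattr__() as a last resort; the class and attribute names here are purely illustrative:

```python
class Temperature(object):
    scale = "celsius"               # class attribute

    def __init__(self, degrees):
        self.degrees = degrees      # instance attribute

    def __getattr__(self, name):
        # called only when the normal lookup fails
        if name == "fahrenheit":
            return self.degrees * 9.0 / 5 + 32
        raise AttributeError(name)

t = Temperature(100)
print(t.degrees)      # 100 -- found in the instance dictionary
print(t.scale)        # celsius -- found in the class dictionary
print(t.fahrenheit)   # 212.0 -- supplied by __getattr__
```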

Special attributes: __dict__ is the attribute dictionary; __class__ is the instance's class.

Class instances can pretend to be numbers, sequences, or mappings if they have methods with certain special names.

Files
A file object represents an open file. File objects are created by the open() built-in function, and also by os.popen(),
os.fdopen(), and the makefile() method of socket objects (and perhaps by other functions or methods provided by
extension modules). The objects sys.stdin, sys.stdout and sys.stderr are initialized to file objects corresponding to the
interpreter's standard input, output and error streams. See the Python Library Reference for complete
documentation of file objects.

Internal types
A few types used internally by the interpreter are exposed to the user. Their definitions may change with future
versions of the interpreter, but they are mentioned here for completeness.

Code objects
Code objects represent byte-compiled executable Python code, or bytecode.
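A code object can be inspected through a function; the attribute is spelled func_code in Python 2 and __code__ in later versions:

```python
def add(x, y):
    return x + y

code = add.__code__        # the compiled function body; func_code in Python 2
print(code.co_name)        # add
print(code.co_argcount)    # 2
print(code.co_varnames)    # ('x', 'y')
```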

Frame objects
Frame objects represent execution frames. They may occur in traceback objects.
Traceback objects
Traceback objects represent a stack trace of an exception. A traceback object is created when an exception
occurs.
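A traceback object can be examined after catching an exception; sys.exc_info() returns it as the third item of its result tuple:

```python
import sys
import traceback

try:
    1 / 0
except ZeroDivisionError:
    tb = sys.exc_info()[2]             # the traceback object for the exception
    frames = traceback.extract_tb(tb)  # (filename, line, function, text) entries
    print(len(frames) >= 1)            # at least the failing frame is recorded
```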

Slice objects
Slice objects are used to represent slices when extended slice syntax is used. This is a slice using two colons, or
multiple slices or ellipses separated by commas, e.g., a[i:j:step], a[i:j, k:l], or a[..., i:j]. They are also created by the
built-in slice() function.
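A slice object created with the built-in slice() function behaves exactly like the corresponding extended-slice literal:

```python
a = list(range(10))
s = slice(2, 8, 2)              # equivalent to the literal a[2:8:2]

print(a[s])                     # [2, 4, 6]
print(a[2:8:2] == a[s])         # True -- same selection either way
print(s.start, s.stop, s.step)  # 2 8 2
```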

Static method objects
Static method objects provide a way of defeating the transformation of function objects to method objects described
above. A static method object is a wrapper around any other object, usually a user-defined method object. When a
static method object is retrieved from a class or a class instance, the object actually returned is the wrapped object,
which is not subject to any further transformation. Static method objects are not themselves callable, although the
objects they wrap usually are. Static method objects are created by the built-in staticmethod() constructor.

Class method objects
A class method object, like a static method object, is a wrapper around another object that alters the way in which
that object is retrieved from classes and class instances. The behaviour of class method objects upon such retrieval is
described above, under ``User-defined methods''. Class method objects are created by the built-in classmethod()
constructor.

Basic date and time types

The datetime module supplies classes for manipulating dates and times in both simple
and complex ways. While date and time arithmetic is supported, the focus of the
implementation is on efficient member extraction for output formatting and manipulation.

There are two kinds of date and time objects: ``naive'' and ``aware''. This distinction refers
to whether the object has any notion of time zone, daylight saving time, or other kind of
algorithmic or political time adjustment. Whether a naive datetime object represents
Coordinated Universal Time (UTC), local time, or time in some other timezone is purely
up to the program, just like it's up to the program whether a particular number represents
metres, miles, or mass. Naive datetime objects are easy to understand and to work
with, at the cost of ignoring some aspects of reality.

For applications requiring more, datetime and time objects have an optional time zone
information member, tzinfo, that can contain an instance of a subclass of the abstract
tzinfo class. These tzinfo objects capture information about the offset from UTC
time, the time zone name, and whether Daylight Saving Time is in effect. Note that no
concrete tzinfo classes are supplied by the datetime module. Supporting timezones
at whatever level of detail is required is up to the application. The rules for time
adjustment across the world are more political than rational, and there is no standard
suitable for every application.

The datetime module exports the following constants:
MINYEAR
The smallest year number allowed in a date or datetime object. MINYEAR is 1.
MAXYEAR
The largest year number allowed in a date or datetime object. MAXYEAR is 9999.

The datetime class does not directly support parsing formatted time strings. You can use
time.strptime to do the parsing and create a datetime object from the tuple it returns:
>>> s = "2005-12-06T12:13:14"
>>> from datetime import datetime
>>> from time import strptime
>>> datetime(*strptime(s, "%Y-%m-%dT%H:%M:%S")[0:6])
datetime.datetime(2005, 12, 6, 12, 13, 14)

Decimal floating point arithmetic


New in version 2.4.
The decimal module provides support for decimal floating point arithmetic. It offers
several advantages over the float() datatype:

Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have
an exact representation in binary floating point. End users typically would not expect 1.1
to display as 1.1000000000000001 as it does with binary floating point.

The exactness carries over into arithmetic. In decimal floating point, "0.1 + 0.1 +
0.1 - 0.3" is exactly equal to zero. In binary floating point, the result is
5.5511151231257827e-017. While near to zero, the differences prevent reliable
equality testing and differences can accumulate. For this reason, decimal would be
preferred in accounting applications which have strict equality invariants.

The decimal module incorporates a notion of significant places so that "1.30 + 1.20"
is 2.50. The trailing zero is kept to indicate significance. This is the customary
presentation for monetary applications. For multiplication, the ``schoolbook'' approach
uses all the figures in the multiplicands. For instance, "1.3 * 1.2" gives 1.56 while
"1.30 * 1.20" gives 1.5600.

Unlike hardware-based binary floating point, the decimal module has a user-settable
precision (defaulting to 28 places) which can be as large as needed for a given problem:

>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal("0.142857")
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal("0.1428571428571428571428571429")

Both binary and decimal floating point are implemented in terms of published standards.
While the built-in float type exposes only a modest portion of its capabilities, the decimal
module exposes all required parts of the standard. When needed, the programmer has
full control over rounding and signal handling.

The module design is centered around three concepts: the decimal number, the context
for arithmetic, and signals.

A decimal number is immutable. It has a sign, coefficient digits, and an exponent. To
preserve significance, the coefficient digits do not truncate trailing zeroes. Decimals also
include special values such as Infinity, -Infinity, and NaN. The standard also
differentiates -0 from +0.

The context for arithmetic is an environment specifying precision, rounding rules, limits on
exponents, flags indicating the results of operations, and trap enablers which determine
whether signals are treated as exceptions. Rounding options include ROUND_CEILING,
ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_EVEN,
ROUND_HALF_UP, and ROUND_UP.

Signals are groups of exceptional conditions arising during the course of computation.
Depending on the needs of the application, signals may be ignored, considered as
informational, or treated as exceptions. The signals in the decimal module are: Clamped,
InvalidOperation, DivisionByZero, Inexact, Rounded, Subnormal, Overflow,
and Underflow.

For each signal there is a flag and a trap enabler. When a signal is encountered, its flag is
incremented from zero and, then, if the trap enabler is set to one, an exception is raised.
Flags are sticky, so the user needs to reset them before monitoring a calculation.
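A sketch of flags and trap enablers; DivisionByZero is trapped by default, so the trap is disabled first to observe the flag instead of an exception:

```python
from decimal import Decimal, getcontext, DivisionByZero

ctx = getcontext()
ctx.traps[DivisionByZero] = False        # treat the signal as informational
print(Decimal(1) / Decimal(0))           # Infinity -- no exception raised
print(bool(ctx.flags[DivisionByZero]))   # True -- the sticky flag was set
ctx.flags[DivisionByZero] = False        # flags are sticky; reset before monitoring
```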
