Professional Documents
Culture Documents
Version 1.1.7
20 February 2007
Contents
References ......................................................................................................................................... 4
Schlumberger Private
With.......................................................................................................................................... 33
Functions ...................................................................................................................................... 43
Classes .......................................................................................................................................... 50
Why Decorate? ........................................................................................................................ 58
Creating Decorators ................................................................................................................. 59
Objects as Decorators .............................................................................................................. 59
Stacking Decorators ................................................................................................................. 60
Functions as Decorators .......................................................................................................... 61
Decorators with Arguments ..................................................................................................... 62
Function Attributes .................................................................................................................. 63
Distutils ........................................................................................................................................ 80
Conversion to Executables ............................................................................................................... 80
PyInstaller .................................................................................................................................... 80
Py2exe .......................................................................................................................................... 81
Recipe for plain python (console) ............................................................................................ 81
Recipe for wxpython (windows) .............................................................................................. 82
How does py2exe work and what are all those files? ............................................................. 82
CGI .................................................................................................................................................... 84
CGI Web Applications with Python, Part One.............................................................................. 86
Opening word documents from Python ........................................................................................ 107
Queue....................................................................................................................................... 133
Python threads - a first example............................................................................................ 135
THE SAME APPLICATION, WRITTEN USING PYTHON THREADS ............................................. 137
Understanding Threading in Python .......................................................................................... 141
My own coroutine stuff ............................................................................................................. 166
Module threadpool .................................................................................................................... 177
APPENDIX: Type Hierarchy............................................................................................................. 177
Basic date and time types .............................................................................................................. 188
Decimal floating point arithmetic .................................................................................................. 190
References
PyERef: Python Essential Reference (Beazley)
PyLib
Lexical structure
A Python source file is a sequence of logical lines (each consisting of one or more physical lines); the indentation of contiguous logical lines denotes blocks of statements. Each indentation level conventionally uses four spaces. Tabs are replaced by up to 8 spaces; it is best to have the editor expand tabs to spaces automatically when editing.
The character set is ASCII by default; other character sets are allowed if the first comment defines the codec (utf-8 or iso-8859-1, for example) with a coding directive.
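For example, a source file declaring the UTF-8 codec would begin with the comment below (PEP 263 form; the string is just an illustration):

```python
# -*- coding: utf-8 -*-
# the directive above lets the source contain non-ASCII string literals
s = "héllo"
assert "é" in s
```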
Objects
All data values are objects. Every object has a type, which determines the operations the object supports, the attributes (properties) and items associated with the object (for instance, methods, as covered later on), and whether the object's value can be altered (mutability).
A source file is a sequence of simple and compound statements. A simple statement lies entirely within a logical line and can be, for example, an expression or an assignment. A compound statement controls one or more other statements and their execution by means of one or more clauses. A clause has a header of the form <keyword> ... : and a body made of a single statement or a block of statements.
Some objects allow you to change their content (without changing the identity or the type, that is). Some objects don't allow you to change their content (more below).
Some objects have methods that allow you to change the contents of the object (modify it in place, that is). Some objects only have methods that allow you to access the contents, not change them. Some objects don't have any methods at all. Even if they have methods, you can never change the type, nor the identity.
(Strictly speaking, in Python 2.2 and later you can in fact change the type by assigning to __class__, as long as the internal structure (the "C-level type") of the types involved is identical, to quote the checkin messages. This basically limits the feature to classes defined at the Python level, just like before the type/class unification; most attempts to use arbitrary types will fail, e.g.:
>>> x.__class__ = c
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: __class__ assignment: 'a' object layout differs from 'c'
)
Things like attribute assignment and item references are just syntactic sugar (more below).
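A minimal sketch of the distinction, with arbitrary values: a list can be modified in place, while a tuple cannot.

```python
l = [1, 2, 3]
l[0] = 99                # mutable: contents change, identity and type don't
assert l == [99, 2, 3]

t = (1, 2, 3)
try:
    t[0] = 99            # immutable: item assignment is rejected
except TypeError:
    pass
else:
    raise AssertionError("tuple item assignment should fail")
```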
Names
Names are a bit different: they're not really properties of the object, and the object itself doesn't know what it's called.
Names live in namespaces (such as a module namespace, an instance namespace, a function's local namespace).
Namespaces are collections of (name, object reference) pairs (implemented using dictionaries).
When you call a function or a method, its namespace is initialized with the arguments you call it with (the names are
taken from the function's argument list, the objects are those you pass in).
Assignment
Assignment statements modify namespaces, not objects. In other words,
name = 10
means that you're adding the name "name" to your local namespace, and making it refer to an integer object containing the value 10.
If the name is already present, the assignment replaces the original binding:
name = 10
name = 20
means that you're first adding the name "name" to the local namespace, and making it refer to an integer object containing the value 10. You're then rebinding the name, making it point to an integer object containing the value 20. The original "10" object isn't affected by this operation, and it doesn't care.
name = []
name.append(1)
means that you're first adding the name "name" to the local namespace, making it refer to an empty list object. This modifies the namespace. You're then calling a method on that object, telling it to append an integer object to itself. This modifies the content of the list object, but it doesn't touch the namespace, and it doesn't touch the integer object.
Things like name.attr and name[index] are just syntactic sugar for method calls. The first corresponds to
__setattr__/__getattr__, the second to __setitem__/__getitem__ (depending on which side of the assignment they
appear).
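A sketch of that sugar (Box is a made-up class, not from the text above):

```python
class Box(object):
    """Illustrates that b[key] = value is sugar for b.__setitem__(key, value)."""
    def __init__(self):
        self.data = {}
    def __setitem__(self, key, value):
        self.data[key] = value
    def __getitem__(self, key):
        return self.data[key]

b = Box()
b['x'] = 1               # really calls b.__setitem__('x', 1)
assert b['x'] == 1       # really calls b.__getitem__('x')
assert b.data == {'x': 1}
```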
The built-in object() returns a bland object, which is the abstract base class for all classes and
objects.
None
Numbers: int, long, float, complex, bool
Sequences: list, tuple, str, xrange, unicode, basestring
Mapping: dict
Sets: set, frozenset
Callable
Modules: types.ModuleType
Classes
Types: type
Files: file
Internal: object
t = (1, 2, 3)
l = [1, 2, 3]
d = dict(a=1, b=2)
s = set()
class C:
    pass
class Cnew(object):
    pass
def name(x):
    return x.__name__
The built-in type(obj) accepts any object as argument and returns the type object that is the type of obj (built-in, or defined in the types module). Be aware of the polymorphic nature of the type built-in: it also accepts the form type(name, bases, dict), which creates a new type object (the same as defining a new class).
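Both forms of the polymorphic type built-in can be sketched as follows (Point is an illustrative name; asserts are used instead of prints so the snippet is version-neutral):

```python
t = (1, 2, 3)
assert type(t) is tuple                   # one argument: inspect the type

# three arguments: create a new type, same as a class statement
Point = type('Point', (object,), {'dims': 2})
p = Point()
assert Point.__name__ == 'Point'
assert p.dims == 2
assert type(p) is Point
```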
The attributes associated to an object may be retrieved using the dir() built-in. Some objects have the attribute __name__ (classes, for instance), while others do not. Some objects allow the __name__ attribute to be set, in particular user-defined objects.
For instance,
def setName(obj, x):
    obj.__name__ = str(x)
def getName(x):
    return x.__name__
c = C()
#c.__name__ = 'my name is c'
#setName(c, 'my name is bim bam bola')
print "name: %s" % getName(C)
print "name: %s" % getName(c)
emits:
name: C
Traceback (most recent call last):
...
There are objects that do not allow setting the __name__ attribute, such as instances of built-in types (lists, for example): you get the following error when trying to do that on a list l = [1, 2, 3]:
l.__setattr__('__name__', 'hallo')
AttributeError: 'list' object has no attribute '__name__'
#testattributes.py
#class L(object, list):  # would raise TypeError: Error when calling the metaclass bases:
#   Cannot create a consistent method resolution order (MRO) for bases object, list
class L(list):
    pass
l = L([1, 2, 3])
x = l.pop()
#works:
print l.__name__
Li = [1, 2, 3]
print dir(Li)
#print Li.__name__  # would raise: AttributeError: 'list' object has no attribute '__name__'
#Li.__dict__['__name__'] = 'antonio'  # would raise: AttributeError: 'list' object has no attribute '__dict__'
print l
"""
results:
[1, 2]
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__',
'__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getslice__',
'__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__',
'__str__', '__weakref__', 'append', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']
L
"""
Data values (objects) are accessed through references. A reference is a name that refers to an object (its location in memory) and takes the form of a variable, an object attribute, or an item.
References may be unbound using a del statement, followed by one or more target references.
collections
o lists
o dictionaries
o sets
A set object is an unordered collection of immutable values. Common uses include membership
testing, removing duplicates from a sequence, and computing mathematical operations such as
intersection, union, difference, and symmetric difference.
Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection,
sets do not record element position or order of insertion. Accordingly, sets do not support
indexing, slicing, or other sequence-like behavior.
There are currently two built-in set types, set and frozenset. The set type is mutable: the contents can be changed using methods like add() and remove(). Since it is mutable, it has no hash value and cannot be used as either a dictionary key or as an element of another set. The frozenset type is immutable and hashable: its contents cannot be altered after it is created; however, it can be used as a dictionary key or as an element of another set.
set([iterable])
Return a set whose elements are taken from iterable. The elements must be immutable. To represent sets of sets, the inner sets should be frozenset objects. If iterable is not specified, returns a new empty set, set([]).
Operation                   Equivalent   Result
len(s)                                   cardinality of set s
x in s                                   test x for membership in s
x not in s                               test x for non-membership in s
s.issubset(t)               s <= t       test whether every element in s is in t
s.issuperset(t)             s >= t       test whether every element in t is in s
s.union(t)                  s | t        new set with elements from both s and t
s.intersection(t)           s & t        new set with elements common to s and t
s.difference(t)             s - t        new set with elements in s but not in t
s.symmetric_difference(t)   s ^ t        new set with elements in either s or t but not both
s.copy()                                 new set with a shallow copy of s
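These operations can be exercised directly; a small sketch with arbitrary values:

```python
a = set([1, 2, 3])
b = set([2, 3, 4])
assert a | b == set([1, 2, 3, 4])        # union
assert a & b == set([2, 3])              # intersection
assert a - b == set([1])                 # difference
assert a ^ b == set([1, 4])              # symmetric difference
assert set([1, 2]).issubset(a)           # same as set([1, 2]) <= a

# frozenset is hashable, so it can serve as a dictionary key
d = {frozenset(a): 'abc'}
assert d[frozenset([3, 2, 1])] == 'abc'
```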
Flow control
A program's control flow is the order in which the program's code executes. The control flow of a
Python program is regulated by conditional statements, loops, and function calls.
Conditional and loop control flow statements are: if, while, for.
while
The while statement in Python supports repeated execution of a statement or block of
statements that are controlled by a conditional expression. Here's the syntax for the while
statement:
while expression:
statement(s)
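A minimal concrete sketch of the while loop (the values are arbitrary):

```python
count, total = 0, 0
while count < 3:          # repeats until the condition becomes false
    count += 1
    total += count
assert (count, total) == (3, 6)
```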
for
The for statement in Python supports repeated execution of a statement, or block of statements,
controlled by an iterable expression. Here's the syntax for the for statement:
for target in iterable:
statement(s)
The in keyword is part of the syntax of the for statement and is distinct from the in operator,
which tests membership. A for statement can also include an else clause.
iterable may be any Python expression suitable as an argument to built-in function iter, which builds and returns an iterator. The statement or statements that make up the loop body execute once for each item in iterable (unless the loop ends because an exception is raised or a break or return statement executes). Note that, since the loop body may contain a break statement to terminate the loop, this is one case in which you may want to use an unbounded iterable, one that, per se, would never cease yielding items.
You can also have a target with multiple identifiers, as with an unpacking assignment. In this case,
the iterator's items must then be iterables, each with exactly as many items as there are
identifiers in the target.
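A sketch of a multiple-identifier target; each item of the iterable is itself a 2-item iterable:

```python
pairs = [(1, 'one'), (2, 'two')]
result = []
for number, word in pairs:    # unpacking assignment on each iteration
    result.append('%d=%s' % (number, word))
assert result == ['1=one', '2=two']
```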
On the surface, Python's for-in statement is taken straight from Python's predecessor ABC. In ABC, what's called statements in Python are known as commands, and sequences are known as trains. (The whole language is like that, by the way; lots of common mechanisms described using less-common names. Maybe they thought that renaming everything would make it easier for people to pick up the subtle details of the language, instead of assuming that everything worked exactly as in other seemingly similar languages; or maybe it only makes sense if you're Dutch.)
One of the things I noticed when skimming through the various reactions to my recent "with" article is that some people seem to have a somewhat fuzzy understanding of Python's other block statement, the good old for-in loop statement. The with statement didn't introduce code blocks in Python; they've always been there. To rectify this, for-in probably deserves its own article, so here we go (but be warned that the following is a bit rough; I reserve the right to tweak it a little over the next few days).
Anyway, to take each element (item) from a train (sequence) in turn, we can simply do (using a pseudo-Python syntax):
name = train[0]
do something with name
name = train[1]
do something with name
name = train[2]
do something with name
... etc ...
and keep doing that until we run out of items. When we do, we'll get an IndexError exception, which
tells us that it's time to stop.
And in its simplest and original form, this is exactly what the for-in statement does; when you write
for name in train:
    do something with name
the interpreter will simply fetch train[0] and assign it to name, and then execute the code block. It'll then fetch train[1], train[2], and so on, until it gets an IndexError.
The code inside the for-in loop is executed in the same scope as the surrounding code; in the
following example:
train = 1, 2, 3
for name in train:
    value = name * 10
    print value
the variables train, name, and value all live in the same namespace.
This is pretty straightforward, of course, but it immediately gets a bit more interesting once you realize that you can use custom objects as trains. Just implement the __getitem__ method, and you can control how the loop behaves. The following code:
class MyTrain:
    def __getitem__(self, index):
        if not condition:
            raise IndexError("that's enough!")
        value = fetch item identified by index
        return value  # hand control back to the block
will run the loop as long as the given condition is true, with values provided by the custom train. In
other words, the do something part is turned into a block of code that's being executed under the
control of the custom sequence object. The above is equivalent to:
index = 0
while True:  # run forever
    if not condition:
        break
    name = fetch item identified by index
    do something with name
    index = index + 1
except that index is a hidden variable, and the controlling code is placed in a separate object.
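The same mechanism in runnable form (Squares is a made-up sequence; for-in calls __getitem__ with 0, 1, 2, ... until IndexError is raised):

```python
class Squares:
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError("that's enough!")
        return index * index

assert list(Squares()) == [0, 1, 4, 9, 16]
```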
You can use this mechanism for everything from generating sequence elements on the fly (like
xrange):
class MySequence:
    def __getitem__(self, index):
        if index > 10:
            raise IndexError("that's enough!")
        return index * 10  # returns 0, 10, 20, ..., 100
and fetching data from an external source:
class MyTable:
    def __getitem__(self, index):
        value = fetch item index from database table
        if value not found:
            raise IndexError("not found")
        return value
or from a stream:
class MyFileIterator:
    def __getitem__(self, index):
        text = get next line from file
        if end of file:
            raise IndexError("end of file")
        return text
to fetching data from some other source:
class MyEventSource:
    def __getitem__(self, index):
        event = get next event
        if event == terminate:
            raise IndexError
        return event

for event in MyEventSource():
    process event
It's more explicit in the latter examples, but in all these examples, the code in __getitem__ is
basically treating the block of code inside the for-in loop as an in-lined callback.
Also note how the last two examples don't even bother to look at the index; they just keep calling
the for-in block until they run out of data. Or, less obvious, until they run out of bits in the internal
index variable.
To deal with this, and also to avoid issues with objects that look a lot like sequences but don't support random access, the for-in statement was redesigned in Python 2.2. Instead of using the __getitem__ interface, for-in now starts by looking for an __iter__ hook. If present, this method is called, and the resulting object is then used to fetch items, one by one. This new protocol behaves like this:
obj = train.__iter__()
name = obj.next()
do something with name
name = obj.next()
do something with name
...
where obj is an internal variable, and the next method indicates end of data by raising the StopIteration exception, instead of IndexError. Using a custom object can look something like:
class MyTrain:
    def __iter__(self):
        return self
    def next(self):
        if not condition:
            raise StopIteration
        value = calculate next value
        return value  # hand control over to the block
Using this mechanism, we can now rewrite the file iterator from above as:
class MyFileIterator:
    def __iter__(self):
        return self  # use myself
    def next(self):
        text = get next line from file
        if end of file:
            raise StopIteration()
        return text
and, with very little work, get an object that doesn't support normal indexing, and doesn't break
down if used on a file with more than 2 billion lines.
But what about ordinary sequences, you ask? That's of course easily handled by a wrapper object,
that keeps an internal counter, and maps next calls to __getitem__ calls, in exactly the same way
as the original for-in statement did. Python provides a standard implementation of such an object,
iter, which is used automatically if __iter__ doesn't exist.
This wasn't very difficult, was it?
Footnote: In Python 2.2 and later, several non-sequence objects have been extended to support
the new protocol. For example, you can loop over both text files and dictionaries; the former return
lines of text, the latter dictionary keys.
for line in open("file.txt"):
    do something with line
Iterators can be built by implicit or explicit calls to built-in function iter. Calling a generator
also returns an iterator.
Notice that after consuming all of the iterator's output the iterator is exhausted: if you need to do something different with the same stream, you need to create a new (different) iterator.
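A quick sketch of exhaustion:

```python
it = iter([1, 2, 3])
assert list(it) == [1, 2, 3]                # consumes the iterator
assert list(it) == []                       # exhausted: nothing left
assert list(iter([1, 2, 3])) == [1, 2, 3]   # a fresh iterator starts over
```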
L = [1, 2, 3]; C = (1, 2, 3); S = set(L); D = dict(x=1, y=2, z=3)
class K:
    def __init__(self, values):
        self.values = values
    def next(self):
        try:
            return self.values.pop()
        except IndexError:
            raise StopIteration
    def __iter__(self):
        return self
k = K(L)
Formally, an iterator is an object i such that you can call i.next() with no arguments. i.next() returns the next item of iterator i or, when iterator i has no more items, raises a StopIteration exception. The built-in iter() function takes an arbitrary object and tries to return an iterator that will return the object's contents or elements, raising TypeError if the object does not support iteration. An object is said to be iterable if you can get an iterator for it. Any Python sequence type (lists, tuples, strings) as well as dictionaries are iterable; they automatically support creation of an iterator. When you write a class, you can allow instances of the class to be iterators by defining a next method and an __iter__ method.
X = k
it1 = iter(X)
print it1
for item in it1: print item
print it1.next()
emits the following, since the last statement referred to an exhausted iterator:
2
1
Traceback (most recent call last):
File "D:\Python\TESTS\test.py", line 24, in <module>
print it1.next()
File "D:\Python\TESTS\test.py", line 13, in next
raise StopIteration
StopIteration
A generator is a function whose body contains one or more occurrences of the keyword yield. When the generator is called, the function body does not execute: instead, it returns a special iterator object that wraps the function body, its local variables (including its parameters) and the current point of execution, which is initially the start of the function. When the next method of this iterator object is called, the function body executes up to the next yield statement (yield expression). At this point, the function execution is frozen, with current point of execution and local variables intact, and the expression following yield is returned as the result of the next method. When next is called again, execution of the function body resumes where it left off, again up to the next yield statement. If the function body ends or executes a return statement, the iterator raises a StopIteration exception, to indicate that the iteration is finished. A return statement inside a generator may carry no argument; for instance,
def updown(N):
    """ small generator function"""
    for x in xrange(1, N): yield x
    if x == 3: return 5
    for x in xrange(N, 0, -1): yield x
renders:
>>> File "<Module2>", line 4 SyntaxError: 'return' with argument inside generator
A generator may be unbounded (returning an infinite stream of results) or bounded. In Python 2.5 a generator object also has the methods send, throw and close.
Python's generators provide a convenient way to implement the iterator protocol. If a container object's __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() methods.
Note: generators and iterators are related to the concept of coroutines. Coroutines are more generic than subroutines. The start of a subroutine is the only point of entry; the start of a coroutine is the first point of entry, and places within a coroutine following returns (yields) are subsequent points of entry. Subroutines can return only once; in contrast, coroutines can return (yield) several times. The lifespan of subroutines is dictated by last in, first out (the last subroutine called is the first to return); in contrast, the lifespan of coroutines is dictated entirely by their use and need.
Here's a simple example of how coroutines can be useful. Suppose you have a consumer-producer relationship where one routine creates items and adds them to a queue and another removes items from the queue and uses them. For reasons of efficiency, you want to add and remove several items at once. The code might look like this (in pseudocode):
var q := new queue
coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield to consume
coroutine consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield to produce
Each coroutine does as much work as it can before yielding control to the other using the yield command. The yield causes control in the other coroutine to pick up where it left off, but now with the queue modified so that it can do more work. Although this example is often used to introduce multithreading, it's not necessary to have two threads to effect this dynamic: the yield statement can be implemented by a branch directly from one routine into the other.
One framework for coroutines is spasmodic, supporting asynchronous I/O (and other tasks). The SpasmodicEngine selects tasks (spasmoids) from a (heapqueue-based) priority queue. The tasks are Python 2.5 extended generators (some call them coroutines: PEP 342). The engine calls task.send() with an appropriate argument. One of the library of tasks is Pollster; Pollster calls poll() for tasks that are waiting on I/O. Tasks that are ready for I/O are fed to the priority queue. Spasmodic provides an efficient way to manage a large number of sockets and/or files. Other processing works well too, if it can be subdivided into brief spasms.
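For contrast with the illegal return above, a minimal legal generator (no return value; range is used here so the sketch also runs on modern Pythons):

```python
def updown(n):
    for x in range(1, n + 1):      # count up: 1 .. n
        yield x
    for x in range(n - 1, 0, -1):  # count back down: n-1 .. 1
        yield x

assert list(updown(3)) == [1, 2, 3, 2, 1]
```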
The for statement implicitly calls iter to get an iterator. The statement:
for x in c:
    statement(s)
is equivalent to:
_temporary_iterator = iter(c)
while True:
    try:
        x = _temporary_iterator.next()
    except StopIteration:
        break
    statement(s)
where _temporary_iterator is some arbitrary name that is not used elsewhere in the current scope.
Thus, if iter(c) returns an iterator i such that i.next() never raises StopIteration (an unbounded iterator), the loop for x in c never terminates (unless the statements in the loop body include suitable break or return statements, or raise or propagate exceptions). iter(c), in turn, calls the special method c.__iter__() to obtain and return an iterator on c.
Many of the best ways to build and manipulate iterators are found in standard library module
itertools.
while and for statements may optionally have a trailing else clause. The statement or block
under that else executes when the loop terminates naturally (at the end of the for iterator, or
when the while loop condition becomes false), but not when the loop terminates prematurely
(via break, return, or an exception). When a loop contains one or more break statements, you
often need to check whether the loop terminates naturally or prematurely. You can use an else
clause on the loop for this purpose:
for x in some_container:
    if is_ok(x): break   # item x is satisfactory, terminate loop
else:
    print "Warning: no satisfactory item was found in container"
    x = None
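The same pattern wrapped in a function so it can be exercised (find_ok and is_ok are illustrative names):

```python
def find_ok(container, is_ok):
    for x in container:
        if is_ok(x):
            break          # satisfactory item found: loop ends prematurely
    else:
        x = None           # loop ended naturally: nothing satisfactory
    return x

assert find_ok([1, 8, 3], lambda v: v > 5) == 8
assert find_ok([1, 2, 3], lambda v: v > 5) is None
```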
$ python2.5
Python 2.5 (r25:51908, Sep 27 2006, 12:21:46)
[GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> any
<built-in function any>
>>> all
<built-in function all>
>>> all([1, 0])
False
list comprehensions
A common use of a for loop is to inspect each item in an iterable and build a new list by
appending the results of an expression computed on some or all of the items. The expression
form known as a list comprehension lets you code this common idiom concisely and directly.
Since a list comprehension is an expression (rather than a block of statements), you can use it
wherever you need an expression (e.g., as an argument in a function call, in a return statement,
or as a subexpression for some other expression).
A list comprehension has the following syntax:
[ expression for target in iterable lc-clauses ]
target and iterable are the same as in a regular for statement. You must enclose the expression in parentheses if it indicates a tuple.
lc-clauses is a series of zero or more clauses, each with one of the following forms:
for target in iterable
if expression
target and iterable in each for clause of a list comprehension have the same syntax and
meaning as those in a regular for statement, and the expression in each if clause of a list
comprehension has the same syntax and meaning as the expression in a regular if statement.
A list comprehension is equivalent to a for loop that builds the same list by repeated calls to the
resulting list's append method. For example (assigning the list comprehension result to a variable
for clarity):
result1 = [x+1 for x in some_sequence]
is the same as:
result2 = []
for x in some_sequence:
    result2.append(x+1)
A list comprehension with an if clause:
result3 = [x+1 for x in some_sequence if x>23]
is the same as a for loop that contains an if statement:
result4 = []
for x in some_sequence:
    if x>23:
        result4.append(x+1)
A list comprehension with nested for clauses:
result5 = [x+y for x in alist for y in another]
is the same as a for loop with another for loop nested inside:
result6 = []
for x in alist:
    for y in another:
        result6.append(x+y)
As these examples show, the order of for and if in a list comprehension is the same as in the
equivalent loop, but in the list comprehension, the nesting remains implicit.
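The equivalences above, checked with concrete (arbitrary) values:

```python
some_sequence = [10, 24, 30]
assert [x+1 for x in some_sequence] == [11, 25, 31]
assert [x+1 for x in some_sequence if x > 23] == [25, 31]

alist, another = [1, 2], [10, 20]
# nesting order matches the equivalent nested for loops
assert [x+y for x in alist for y in another] == [11, 21, 12, 22]
```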
The with statement is used to wrap the execution of a block with functionality provided by a
separate guard object (see context-managers). This allows common try-except-finally usage
patterns to be encapsulated for convenient reuse.
from __future__ import with_statement  # to enable the statement in Python 2.5
Syntax:
with expression [as target] :
suite
or:
with expression [as ( target list ) ] :
suite
The expression is evaluated once, and should yield a context guard, which is used to control
execution of the suite. The guard can provide execution-specific data, which is assigned to the
target (or target list).
Note that if a target list is used instead of a single target, the list must be parenthesized.
With
Here's a more detailed description; the five steps of the with statement's execution model are spelled out at the end of this section.
If the suite was exited due to an exception, and the return value from the __exit__ method is
false, the exception is reraised. If the return value is true, the exception is suppressed, and
execution continues with the statement following the with statement.
If the suite was exited for any reason other than an exception (e.g., by falling off the end of
the suite, or via return, break, or continue), the return value from __exit__ is ignored, and
execution proceeds at the normal location for the kind of exit that was taken.
Note: In Python 2.5, the with statement is only allowed when the with_statement feature
has been enabled. It will always be enabled in Python 2.6. This __future__ import statement
can be used to enable the feature (see future):
from __future__ import with_statement
See Also: PEP 0343, The "with" statement The specification, background, and examples for
the Python with statement.
Note: The with statement guarantees that if the __enter__ method returns without an error,
then __exit__ will always be called. Thus, if an error occurs during the assignment to the
target list, it will be treated the same as an error occurring within the suite would be. See
step 5 above.
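A minimal hand-written context manager, sketching the protocol described above (Guard is a made-up name):

```python
class Guard(object):
    def __enter__(self):
        self.entered = True
        return self            # this value is bound to the 'as' target
    def __exit__(self, exc_type, exc_value, traceback):
        self.exited = True     # always called once __enter__ succeeds
        return False           # False: do not suppress exceptions

with Guard() as g:
    pass
assert g.entered and g.exited
```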
All of the examples in the PEP and the what's new document use the with statement for things
like locks, ensuring a socket or file is closed, database transactions and temporarily modifying a
system setting. These are heavy duty things. Deep things. Things most people don't want to mess
with. The what's new document even says:
"Under the hood, the 'with' statement is fairly complicated. Most people will only use 'with' in company with existing objects and don't need to know these details."
I think this viewpoint is wrong, or at least overly limited. I think the PEP is more generally useful
than those examples show and the term "context manager" is too abstract. I also conjecture that
the people working on the PEP were systems developers and not applications developers, hence
the bias towards system/state modification examples. ;)
The with statement is very much like the using block in C#. Consider the example of opening a file and making sure that it is closed after the block ends: after the with block ends, the f object is completely destroyed and the file is closed. As in C#, the object must implement certain cleanup functions so that the with block can call them automatically; in C# the object implements IDisposable, while in Python it implements the context-manager protocol (the __enter__ and __exit__ methods).
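A minimal sketch (ours, in Python 3 syntax; the class name is invented) of the protocol the with block relies on, playing the role that IDisposable plays in C#:

```python
class ManagedResource:
    """A toy resource that acquires on entry and releases on exit."""

    def __enter__(self):
        # Acquire the resource; the return value is bound to the 'as' target.
        print("acquired")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always called when the block ends, like IDisposable.Dispose().
        print("released")
        return False  # do not suppress any exception


with ManagedResource() as r:
    print("working")
# output order: acquired, working, released
```

Returning False (or None) from __exit__ lets any exception propagate; that is the behaviour you almost always want.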
1. The expression is evaluated and should result in an object called a "context manager". The context manager must have __enter__() and __exit__() methods.
2. The context manager's __enter__() method is called. The value returned is assigned to VAR. If no 'as VAR' clause is present, the value is simply discarded.
3. The code in BLOCK is executed.
4. If BLOCK raises an exception, the __exit__(type, value, traceback) method is called with the exception details, the same values returned by sys.exc_info(). The method's return value controls whether the exception is re-raised: any false value re-raises the exception, and a true value suppresses it. You'll only rarely want to suppress the exception, because if you do, the author of the code containing the 'with' statement will never realize anything went wrong.
5. If BLOCK didn't raise an exception, the __exit__() method is still called, but type, value, and traceback are all None.
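The sequence above can be observed directly with a small tracing context manager (our own sketch, in Python 3 syntax; the class name Tracer is invented):

```python
class Tracer:
    def __enter__(self):
        print("__enter__ called")      # step 2
        return "VAR"                   # this value is assigned to the target

    def __exit__(self, exc_type, exc_value, traceback):
        # Called with (None, None, None) on a normal exit (step 5),
        # or with the exception details if the block raised (step 4).
        print("__exit__ called with", exc_type)
        return exc_type is ValueError  # suppress only ValueError


with Tracer() as v:
    print("block runs, target =", v)   # step 3 and 4

with Tracer():
    raise ValueError("swallowed")      # suppressed, since __exit__ returns True
print("execution continues after the with statement")
```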
The new with statement handles the part I care about: making it easier to write code that works correctly in the case of failures.
The three typical use cases are a file that needs to be closed, a lock that needs to be released, and a database transaction that needs to be either committed or rolled back. The database case is the most interesting, since you need to handle success and failure differently, and before version 2.5 Python would not allow you to have try/except/finally; you had to pick either try/except or try/finally. Python 2.5 also provides a unified try/except/finally, but the with statement is easier to write, and easier to read.
db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...
In order for this to work, the classes that you are working with need to properly support a context manager, which defines what should happen on success and on error. But not all classes will need to implement a full-blown context manager; the contextlib module provides an easy way to add support to existing objects without the need to write a new class.
I've borrowed an example of what user code would look like for a database connection using the new with statement from the Python docs. The idea is that the block of code should run, and then the transaction should either be committed or rolled back depending on whether the block exited normally or with an exception:
from contextlib import contextmanager

@contextmanager
def db_transaction(connection):
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        connection.rollback()
        raise
    else:
        connection.commit()

db = DatabaseConnection()
...
This was one area where Ruby code was much cleaner than Python, so it's great to see the new functionality. It's pretty hard for me to write code which doesn't touch a file, a database, or threads, so it will be used a lot.
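As a concrete, runnable version of this pattern, here is a sketch (ours) that applies the db_transaction idea to the standard library's sqlite3 module instead of the hypothetical DatabaseConnection class:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def db_transaction(connection):
    """Commit on normal exit, roll back if the block raises."""
    cursor = connection.cursor()
    try:
        yield cursor
    except Exception:
        connection.rollback()
        raise
    else:
        connection.commit()


db = sqlite3.connect(":memory:")
db.execute("create table t (x integer)")

with db_transaction(db) as cursor:
    cursor.execute("insert into t values (1)")

print(db.execute("select count(*) from t").fetchone()[0])  # prints: 1
```

If the block raises, the rollback branch runs and the insert is discarded; the exception then propagates to the caller because of the re-raise.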
Examples:
# Public domain
import contextlib
import time

@contextlib.contextmanager
def accum_time(L):
    """
    Add time used inside a with block to the value of L[0].
    """
    start = time.clock()
    try:
        yield
    finally:
        end = time.clock()
        L[0] += end - start

t = [0]
with accum_time(t):
    print sum(range(1000000))
with accum_time(t):
    print sum(range(2000000))
print 'Time:', t[0]
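The example above uses Python 2.5's time.clock, which later versions of Python removed; a present-day equivalent (our sketch, Python 3 syntax, using time.perf_counter) looks like this:

```python
import contextlib
import time


@contextlib.contextmanager
def accum_time(acc):
    """Add the time spent inside the with block to acc[0]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        acc[0] += time.perf_counter() - start


t = [0.0]
with accum_time(t):
    sum(range(1000000))
with accum_time(t):
    sum(range(2000000))
print("Time:", t[0])
```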
Judging from comp.lang.python and other forums, Python 2.5's new with statement seems to be
a bit confusing even for experienced Python programmers.
As with most other things in Python, the with statement is actually very simple, once you understand the problem it's trying to solve. Consider this piece of code:
set things up
try:
    do something
finally:
    tear things down
If you do this a lot, it would be quite convenient if you could put the "set things up" and "tear things down" code in a library function, to make it easy to reuse. You can of course do something like
def controlled_execution(callback):
    set things up
    try:
        callback(thing)
    finally:
        tear things down

def my_function(thing):
    do something

controlled_execution(my_function)
Here, "set things up" could be opening a file, or acquiring some sort of external resource, and "tear things down" would then be closing the file, or releasing or removing the resource. The try-finally construct guarantees that the "tear things down" part is always executed, even if the code that does the work doesn't finish.
But that's a bit verbose, especially if you need to modify local variables. Another approach is to
use a one-shot generator, and use the for-in statement to "wrap" the code:
def controlled_execution():
    set things up
    try:
        yield thing
    finally:
        tear things down

for thing in controlled_execution():
    do something with thing
But yield isn't even allowed inside a try-finally in 2.4 and earlier. And while that could be fixed
(and it has been fixed in 2.5), it's still a bit weird to use a loop construct when you know that you
only want to execute something once.
So after contemplating a number of alternatives, GvR and the python-dev team finally came up
with a generalization of the latter, using an object instead of a generator to control the behaviour
of an external piece of code:
class controlled_execution:
    def __enter__(self):
        set things up
        return thing
    def __exit__(self, type, value, traceback):
        tear things down

with controlled_execution() as thing:
    some code
Now, when the "with" statement is executed, Python evaluates the expression, calls the
__enter__ method on the resulting value (which is called a "context guard"), and assigns
whatever __enter__ returns to the variable given by as. Python will then execute the code body,
and no matter what happens in that code, call the guard object's __exit__ method.
In Python 2.5, the file object has been equipped with __enter__ and __exit__ methods; the former simply returns the file object itself, and the latter closes the file:
>>> f = open("x.txt")
>>> f
<open file 'x.txt', mode 'r' at 0x00AE82F0>
>>> f.__enter__()
<open file 'x.txt', mode 'r' at 0x00AE82F0>
>>> f.read(1)
'X'
>>> f.__exit__(None, None, None)
>>> f.read(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
As an extra bonus, the __exit__ method can look at the exception, if any, and suppress it or act on it as necessary. To suppress the exception, just return a true value. For example, the following __exit__ method swallows any TypeError, but lets all other exceptions through:
def __exit__(self, type, value, traceback):
    return isinstance(value, TypeError)
So to open a file, process its contents, and make sure to close it, you can simply do:
with open("x.txt") as f:
    data = f.read()
    do something with data
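Put together as a complete, runnable script (our sketch, Python 3 syntax, writing a temporary file so the example is self-contained):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "x.txt")

# Create the file; the with block guarantees it is closed afterwards.
with open(path, "w") as f:
    f.write("hello")

# Read it back; f.__enter__() returns f itself, so 'as f' binds the file.
with open(path) as f:
    data = f.read()

print(data)       # prints: hello
print(f.closed)   # prints: True; __exit__ closed the file for us
```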
Functions
A function is a group of statements that execute upon request. Python provides many built-in
functions and allows programmers to define their own functions. A request to execute a function
is known as a function call. When you call a function, you can pass arguments that specify data
upon which the function performs its computation. In Python, a function always returns a result
value, either None or a value that represents the results of the computation. Functions defined
within class statements are also known as methods. Functions are objects (values) that are
handled like other objects. Thus, you can pass a function as an argument in a call to another
function. Similarly, a function can return another function as the result of a call. A function, just
like any other object, can be bound to a variable, an item in a container, or an attribute of an
object. Functions can also be keys into a dictionary. For example, if you need to quickly find a
function's inverse given the function, you could define a dictionary whose keys and values are
functions and then make the dictionary bidirectional:
inverse = {sin:asin, cos:acos, tan:atan, log:exp}
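Made runnable (our sketch, Python 3 syntax; the line above assumes sin, asin, and friends are already in scope, e.g. imported from the math module):

```python
import math

# Map each function to its inverse...
inverse = {math.sin: math.asin, math.cos: math.acos,
           math.tan: math.atan, math.log: math.exp}
# ...and make the dictionary bidirectional.
inverse.update([(v, k) for k, v in inverse.items()])

f = math.sin
g = inverse[f]        # look up the inverse, given the function
print(g(f(0.5)))      # prints 0.5, up to floating-point error
```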
The def statement is the most common way to define a function. def is a single-clause
compound statement with the following syntax:
def function-name(parameters):
    statement(s)
function-name is an identifier. It is a variable that gets bound (or rebound) to the function object when def executes. parameters is an optional list of identifiers, known as formal parameters, that get bound to the values supplied as arguments when the function is called. In the simplest case, a
function doesn't have any formal parameters, which means the function doesn't take any
arguments when it is called. In this case, the function definition has empty parentheses after
function-name.
When a function does take arguments, parameters contains one or more identifiers, separated
by commas (,). In this case, each call to the function supplies values, known as arguments,
corresponding to the parameters listed in the function definition. The parameters are local
variables of the function.
The def statement sets some attributes of a function object. The attribute func_name, also accessible as __name__, refers to the identifier string given as the function name in the def statement. You may rebind the attribute to any string value, but trying to unbind it raises an exception. The attribute func_defaults, which you may freely rebind or unbind, refers to the tuple of default values for the optional parameters (or the empty tuple, if the function has no optional parameters).
Another function attribute is the documentation string, also known as the docstring. You may use or rebind a function's docstring attribute as either func_doc or __doc__. If the first statement in the function body is a string literal, the compiler binds that string as the function's docstring attribute.
In addition to its predefined attributes, a function object may have other arbitrary attributes. To
create an attribute of a function object, bind a value to the appropriate attribute reference in an
assignment statement after the def statement executes. For example, a function could count
how many times it gets called:
def counter( ):
    counter.count += 1
    return counter.count
counter.count = 0
Note that this is not common usage. More often, when you want to group together some state (data) and some behavior (code), you should use the object-oriented mechanisms.
The return statement is allowed only inside a function body and can optionally be followed by
an expression. When return executes, the function terminates, and the value of the expression
is the function's result. A function returns None if it terminates by reaching the end of its body or
by executing a return statement that has no expression (or, of course, by executing return
None).
As a matter of style, you should never write a return statement without an expression at the end
of a function body. If some return statements in a function have an expression, all return
statements should have an expression. return None should only be written explicitly to meet
this style requirement. Python does not enforce these stylistic conventions.
A function call is an expression with the following syntax:
function-object(arguments)
function-object may be any reference to a function (or other callable) object; most often, it's
the function's name. The parentheses denote the function-call operation itself. arguments, in the
simplest case, is a series of zero or more expressions separated by commas (,), giving values for
the function's corresponding parameters. When the function call executes, the parameters are
bound to the argument values, the function body executes, and the value of the function-call
expression is whatever the function returns.
Note that just mentioning a function (or other callable object) does not call it. To call a function
(or other object) without arguments, you must use ( ) after the function's name.
In traditional terms, all argument passing in Python is by value. For example, if you pass a variable
as an argument, Python passes to the function the object (value) to which the variable currently
refers, not "the variable itself." Thus, a function cannot rebind the caller's variables. However, if
you pass a mutable object as an argument, the function may request changes to that object
because Python passes the object itself, not a copy. Rebinding a variable and mutating an object
are totally disjoint concepts. For example:
def f(x, y):
    x = 23
    y.append(42)

a = 77
b = [99]
f(a, b)
print a, b    # prints: 77 [99, 42]
The print statement shows that a is still bound to 77. Function f's rebinding of its parameter x
to 23 has no effect on f's caller, nor, in particular, on the binding of the caller's variable that
happened to be used to pass 77 as the parameter's value. However, the print statement also
shows that b is now bound to [99, 42]. b is still bound to the same list object as before the call,
but that object has mutated, as f has appended 42 to that list object. In either case, f has not
altered the caller's bindings, nor can f alter the number 77, since numbers are immutable.
However, f can alter a list object, since list objects are mutable. In this example, f mutates the
list object that the caller passes to f as the second argument by calling the object's append
method.
Arguments that are just expressions are known as positional arguments. Each positional
argument supplies the value for the parameter that corresponds to it by position (order) in the
function definition.
In a function call, zero or more positional arguments may be followed by zero or more named
arguments, each with the following syntax:
identifier=expression
The identifier must be one of the parameter names used in the def statement for the function. The expression supplies the value for the parameter of that name. Most built-in functions do not accept named arguments; you must call such functions with positional arguments only. However, all normal functions coded in Python accept named as well as positional arguments, so you may call them in different ways.
A function call must supply, via a positional or a named argument, exactly one value for each
mandatory parameter, and zero or one value for each optional parameter. For example:
def divide(divisor, dividend):
    return dividend // divisor

print divide(12, 94)                    # prints: 7
print divide(dividend=94, divisor=12)   # prints: 7
A common use of named arguments is to bind some optional parameters to specific values, while
letting other optional parameters take default values:
def f(middle, begin='init', end='finis'):
    return begin+middle+end

print f('tini', end='')    # prints: inittini
Thanks to named argument end='', the caller can specify a value, the empty string '', for f's
third parameter, end, and still let f's second parameter, begin, use its default value, the string
'init'.
At the end of the arguments in a function call, you may optionally use either or both of the
special forms *seq and **dct. If both forms are present, the form with two asterisks must be
last. *seq passes the items of seq to the function as positional arguments (after the normal
positional arguments, if any, that the call gives with the usual syntax). seq may be any iterable.
**dct passes the items of dct to the function as named arguments, where dct must be a
dictionary whose keys are all strings. Each item's key is a parameter name, and the item's value is
the argument's value.
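A brief runnable sketch of both special forms (the function name describe is ours; Python 3 syntax):

```python
def describe(a, b, c=0):
    return "a=%s b=%s c=%s" % (a, b, c)


args = (1, 2)        # any iterable works for the *seq form
kwargs = {"c": 3}    # keys must be strings naming parameters

# Items of args become positional arguments, items of kwargs named ones.
print(describe(*args, **kwargs))   # prints: a=1 b=2 c=3
```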
As you can see, the two calls to divide are equivalent. You can pass named arguments for
readability purposes whenever you think that identifying the role of each argument and
controlling the order of arguments enhances your code's clarity.
Sometimes you want to pass an argument of the form *seq or **dct when the parameters use similar forms. For example, using the function sum_args shown here, you may want to print the sum of all the values in dictionary d. This is easy with *seq:
def sum_args(*numbers):
    return sum(numbers)

print sum_args(*d.values( ))
(Of course, in this case, print sum(d.values( )) would be simpler and more direct!)
However, you may also pass arguments of the form *seq or **dct when calling a function that
does not use the corresponding forms in its parameters. In that case, of course, you must ensure
that iterable seq has the right number of items, or, respectively, that dictionary dct uses the right
names as its keys; otherwise, the call operation raises an exception.
Variables that are not local are known as global variables (in the absence of nested function definitions). Global variables are attributes of the module object. Whenever a function's local variable has the same name as a global variable, that name, within the function body, refers to the local variable, not the global one. We express this by saying that the local variable hides the global variable of the same name throughout the function body.
By default, any variable that is bound within a function body is a local variable of the function. If a
function needs to rebind some global variables, the first statement of the function must be:
global identifiers
where identifiers is one or more identifiers separated by commas (,). The identifiers listed in a
global statement refer to the global variables (i.e., attributes of the module object) that the
function needs to rebind.
Don't use global if the function body just uses a global variable (including mutating the object
bound to that variable if the object is mutable). Use a global statement only if the function body
rebinds a global variable (generally by assigning to the variable's name). As a matter of style,
don't use global unless it's strictly necessary, as its presence will cause readers of your program
to assume the statement is there for some useful purpose. In particular, never use global except
as the first statement in a function body.
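A minimal sketch of the rule (our own example, Python 3 syntax):

```python
counter = 0          # a global variable (an attribute of the module object)


def bump():
    global counter   # needed: this function rebinds the global
    counter += 1


def peek():
    return counter   # no global needed: this function only reads it


bump()
bump()
print(peek())        # prints: 2
```

Without the global statement, the assignment in bump would make counter a local variable, and counter += 1 would raise UnboundLocalError.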
A function's parameters, plus any variables that are bound (by assignment or by other binding
statements, such as def) in the function body, make up the function's local namespace, also
known as local scope. Each of these variables is known as a local variable of the function.
A def statement within a function body defines a nested function, and the function whose body
includes the def is known as an outer function to the nested one. Code in a nested function's
body may access (but not rebind) local variables of an outer function, also known as free
variables of the nested function.
The simplest way to let a nested function access a value is often not to rely on nested scopes, but
rather to explicitly pass that value as one of the function's arguments. If necessary, the
argument's value can be bound when the nested function is defined by using the value as the
default for an optional argument. For example:
def percent1(a, b, c):
    def pc(x, total=a+b+c): return (x*100.0) / total
    print "Percentages are:", pc(a), pc(b), pc(c)

The alternative, percent2, relies on nested scopes instead:

def percent2(a, b, c):
    def pc(x): return (x*100.0) / (a+b+c)
    print "Percentages are:", pc(a), pc(b), pc(c)
In this specific case, percent1 has a tiny advantage: the computation of a+b+c happens only
once, while percent2's inner function pc repeats the computation three times. However, if the
outer function rebinds its local variables between calls to the nested function, repeating the
computation can be necessary. It's therefore advisable to be aware of both approaches, and
choose the most appropriate one case by case.
A nested function that accesses values from outer local variables is also known as a closure. The
following example shows how to build a closure:
def make_adder(augend):
    def add(addend):
        return addend+augend
    return add
Closures are an exception to the general rule that the object-oriented mechanisms are the best
way to bundle together data and code. When you need specifically to construct callable objects,
with some parameters fixed at object construction time, closures can be simpler and more
effective than classes. For example, the result of make_adder(7) is a function that accepts a
single argument and adds 7 to that argument. An outer function that returns a closure is a
"factory" for members of a family of functions distinguished by some parameters, such as the
value of argument augend in the previous example, and may often help you avoid code
duplication.
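Using the make_adder factory (our usage sketch, Python 3 syntax):

```python
def make_adder(augend):
    def add(addend):
        return addend + augend   # augend is a free variable of add
    return add


add7 = make_adder(7)     # a closure with augend fixed at 7
add100 = make_adder(100)
print(add7(5))           # prints: 12
print(add100(5))         # prints: 105
```

Each call to the factory produces an independent function; the two closures keep their own bindings for augend.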
Classes
A class is a Python object with several characteristics:
- You can call a class object as if it were a function. The call returns another object, known as an instance of the class; the class is also known as the type of the instance.
- A class has arbitrarily named attributes that you can bind and reference.
- The values of class attributes can be descriptors (including functions) or normal data objects.
- Class attributes bound to functions are also known as methods of the class.
- A method can have a special Python-defined name with two leading and two trailing underscores. Python implicitly invokes such special methods, if a class supplies them, when various kinds of operations take place on instances of that class.
- A class can inherit from other classes, meaning it delegates to other class objects the lookup of attributes that are not found in the class itself.
An instance of a class is a Python object with arbitrarily named attributes that you can bind and
reference. An instance object implicitly delegates to its class the lookup of attributes not found in
the instance itself. The class, in turn, may delegate the lookup to the classes from which it
inherits, if any.
In Python, classes are objects (values) and are handled like other objects. Thus, you can pass a
class as an argument in a call to a function. Similarly, a function can return a class as the result of
a call. A class, just like any other object, can be bound to a variable (local or global), an item in a
container, or an attribute of an object. Classes can also be keys into a dictionary. The fact that
classes are ordinary objects in Python is often expressed by saying that classes are first-class
objects.
A descriptor is any new-style object whose class supplies a special method named __get__. Descriptors that are class attributes control the semantics of accessing and setting attributes on instances of that class. Roughly speaking, when you access an instance attribute, Python obtains the attribute's value by calling __get__ on the corresponding descriptor, if any.
If a descriptor's class also supplies a special method named __set__, then the descriptor is known as an overriding descriptor (or, by an older and slightly confusing terminology, a data descriptor); if the descriptor's class supplies only __get__, and not __set__, then the descriptor is known as a nonoverriding (or nondata) descriptor. For example, the class of function objects supplies __get__, but not __set__; therefore, function objects are nonoverriding descriptors. Roughly speaking, when you assign a value to an instance attribute with a corresponding descriptor that is overriding, Python sets the attribute value by calling __set__ on the descriptor.
Calling a factory function is a flexible approach: a function may return an existing reusable instance, or create a new instance by calling whatever class is appropriate. Say you have two almost interchangeable classes (SpecialCase and NormalCase) and want to flexibly generate instances of either one of them, depending on an argument. The following appropriateCase factory function allows you to do just that:

class SpecialCase(object):
    def amethod(self): print "special"
class NormalCase(object):
    def amethod(self): print "normal"
def appropriateCase(isnormal=True):
    if isnormal: return NormalCase( )
    else: return SpecialCase( )

aninstance = appropriateCase(isnormal=False)
aninstance.amethod( )    # prints: special

The built-in object type is the ancestor of all built-in types and new-style classes. The object type defines some special methods that implement the default semantics of objects:

__new__, __init__
    You can create a direct instance of object by calling object( ) without any arguments. The call implicitly uses object.__new__ and object.__init__ to make and return an instance object without attributes (and without even a __dict__ in which to hold attributes). Such an instance object may be useful as a "sentinel," guaranteed to compare unequal to any other distinct object.

__delattr__, __getattribute__, __setattr__
    By default, an object handles attribute references using these methods of object.

__hash__, __repr__, __str__
    Any object can be passed to functions hash and repr and to type str.

A subclass of object may override any of these methods and/or add others.
To build a class method, call built-in type classmethod and bind its result to a class attribute. Like
all binding of class attributes, this is normally done in the body of the class, but you may also
choose to perform it elsewhere. The only argument to classmethod is the function to invoke
when Python calls the class method. Here's how you can define and call a class method:
class ABase(object):
    def aclassmet(cls): print 'a class method for', cls.__name__
    aclassmet = classmethod(aclassmet)
class ADeriv(ABase): pass

bInstance = ABase( )
dInstance = ADeriv( )
ABase.aclassmet( )       # prints: a class method for ABase
bInstance.aclassmet( )   # prints: a class method for ABase
ADeriv.aclassmet( )      # prints: a class method for ADeriv
dInstance.aclassmet( )   # prints: a class method for ADeriv
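The same example in present-day syntax (our sketch, Python 3, using the @classmethod decorator covered later in this chapter, and returning the string instead of printing it):

```python
class ABase:
    @classmethod
    def aclassmet(cls):
        # cls is the class the method was called on, never the instance.
        return 'a class method for ' + cls.__name__


class ADeriv(ABase):
    pass


print(ABase.aclassmet())      # prints: a class method for ABase
print(ABase().aclassmet())    # prints: a class method for ABase
print(ADeriv.aclassmet())     # prints: a class method for ADeriv
print(ADeriv().aclassmet())   # prints: a class method for ADeriv
```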
A class method is a method you can call on a class or on any instance of the class. Python binds the method's first parameter to the class on which you call the method, or the class of the instance on which you call the method; it does not bind it to the instance, as for normal bound methods. There is no equivalent of unbound methods for class methods. The first parameter of a class method is conventionally named cls. While it is never necessary to define class methods (you could always alternatively define a normal function that takes the class object as its first parameter), some programmers consider them to be an elegant alternative to such functions.
Python supplies a built-in overriding descriptor type, property, which you may use to give a class's instances properties.
A property is an instance attribute with special functionality. You reference, bind, or unbind the
attribute with the normal syntax (e.g., print x.prop, x.prop=23, del x.prop). However, rather
than following the usual semantics for attribute reference, binding, and unbinding, these
accesses call on instance x the methods that you specify as arguments to the built-in type
property. Here's how you define a read-only property:
class Rectangle(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
    def getArea(self):
        return self.width * self.height
    area = property(getArea, doc='area of the rectangle')
Each instance r of class Rectangle has a synthetic read-only attribute r.area, computed on the fly in method r.getArea( ) by multiplying the sides of the rectangle. The docstring Rectangle.area.__doc__ is 'area of the rectangle'. Attribute r.area is read-only (attempts to rebind or unbind it fail) because we specify only a get method in the call to property, no set or del methods.
When x is an instance of C and you reference x.attrib, Python calls on x the method you passed as argument fget to the property constructor, without arguments. When you assign x.attrib = value, Python calls the method you passed as argument fset, with value as the only argument. When you execute del x.attrib, Python calls the method you passed as argument fdel, without arguments. Python uses the argument you passed as doc as the docstring of the attribute. All parameters to property are optional. When an argument is missing, the corresponding operation is forbidden (Python raises an exception when some code attempts that operation). For example, in the Rectangle example, we made property area read-only, because we passed an argument only for parameter fget, and not for parameters fset and fdel.
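Exercising the Rectangle property (our sketch, in Python 3 syntax):

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def getArea(self):
        return self.width * self.height

    area = property(getArea, doc='area of the rectangle')


r = Rectangle(3, 4)
print(r.area)          # prints: 12; computed on the fly via getArea
try:
    r.area = 99        # no fset was passed to property...
except AttributeError:
    print("read-only") # ...so assignment raises AttributeError
```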
The crucial importance of properties is that their existence makes it perfectly safe and indeed
advisable for you to expose public data attributes as part of your class's public interface. If it ever
becomes necessary, in future versions of your class or other classes that need to be polymorphic
to it, to have some code executed when the attribute is referenced, rebound, or unbound, you
know you will be able to change the plain attribute into a property and get the desired effect
without any impact on any other code that uses your class (a.k.a. "client code"). This lets you
avoid goofy idioms, such as accessor and mutator methods, required by OO languages that lack
properties or equivalent machinery. For example, client code can simply use natural idioms such
as:
someInstance.widgetCounter += 1

rather than being forced into contorted nests of accessors and mutators, such as:

someInstance.setWidgetCounter(someInstance.getWidgetCounter( ) + 1)
If at any time you're tempted to code methods whose natural names are something like getThis
or setThat, consider wrapping those methods into properties, for clarity.
All references to instance attributes for new-style instances proceed through the special method __getattribute__. This method is supplied by base class object, where it implements all the details of object attribute reference semantics. However, you may override __getattribute__ for special purposes, such as hiding inherited class attributes (e.g., methods) for your subclass's instances. The following example shows one way to implement a list without append in the new-style object model:

class listNoAppend(list):
    def __getattribute__(self, name):
        if name == 'append': raise AttributeError, name
        return list.__getattribute__(self, name)

An instance x of class listNoAppend is almost indistinguishable from a built-in list object, except that performance is substantially worse, and any reference to x.append raises an exception.
Due to the existence of descriptor types such as staticmethod and classmethod, which take as
their argument a function object, Python somewhat frequently uses, within class bodies, idioms
such as:
def f(cls,...):
    ...definition of f snipped...
f = classmethod(f)
Having the call to classmethod occur textually after the def statement may decrease code
readability because, while reading f's definition, the reader of the code is not yet aware that f is
destined to become a class method rather than an ordinary instance method. The code would be
more readable if the mention of classmethod could be placed right before, rather than after, the
def. Python 2.4 allows such placement, through the new syntax form known as decoration:
@classmethod
def f(cls,...):
    ...definition of f snipped...
Decoration affords a handy shorthand for some higher-order functions (and other callables that
work similarly to higher-order functions). You may apply decoration to any def statement, not
just to def statements occurring in class bodies. You may also code custom decorators, which are
just higher-order functions, accepting a function object as an argument and returning a function
object as the result. For example, here is a decorator that does not modify the function it
decorates, but rather emits the function's docstring to standard output at function-definition
time:
def showdoc(f):
    if f.__doc__:
        print '%s: %s' % (f.__name__, f.__doc__)
    else:
        print '%s: No docstring!' % f.__name__
    return f
The @classmethod decoration must be immediately followed by a def statement and means that
f=classmethod(f) executes right after the def statement (for whatever name f the def
defines). More generally, @expression evaluates the expression (which must be a name, possibly
qualified, or a call) and binds the result to an internal temporary name (say, __aux); any such
decoration must be immediately followed by a def statement and means that f=__aux(f)
executes right after the def statement (for whatever name f the def defines). The object bound
to __aux is known as a decorator, and it's said to decorate function f.
@showdoc
def f1(): # emits: f1:  a docstring
    """ a docstring """

@showdoc
def f2(): # emits: f2: No docstring!
    pass
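The @ syntax is exactly equivalent to rebinding the name after the def. A minimal sketch (the mark decorator here is a hypothetical example, not part of the text above) showing both spellings produce the same result:

```python
# A hypothetical 'mark' decorator: it just annotates the function it receives.
def mark(func):
    func.marked = True
    return func

@mark
def f1():
    pass

def f2():
    pass
f2 = mark(f2)   # exactly what the @mark spelling does behind the scenes

assert f1.marked and f2.marked
```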
Decorators
Phillip is the author of the open-source Python libraries PEAK and PyProtocols, and has
contributed fixes and enhancements to the Python interpreter. He is the author of the
Python Web Server Gateway Interface specification (PEP 333). He can be contacted at
pje@telecommunity.com.
As software environments become more complex and programs get larger, it becomes
more and more necessary to find ways to reduce code duplication and scattering of
knowledge. While simple code duplication is easy to factor out into functions or methods,
more complex code duplication is not. For example, if a method needs to be wrapped in a
transaction, synchronized in a lock, or have its calls transmitted to a remote object, there
often is no simple way to factor out a function or method to be called, because the part of
the behavior that varies needs to be wrapped inside the common behavior.
A second and related problem is scattering of knowledge. Sometimes a framework needs
to be able to locate all of a program's functions or methods that have a particular
characteristic, such as "all of the remote methods accessible to users with authorization
X." The typical solution is to put this information in external configuration files, but then
you run the risk of configuration being out of sync with the code. For example, you might
add a new method, but forget to also add it to the configuration file. And of course, you'll
be doing a lot more typing, because you'll have to put the method names in the
configuration file, and any renaming you do requires editing two files.
So no matter how you slice it, duplication is a bad thing for both developer productivity
and software reliability, which is why Python 2.4's new "decorator" feature lets you
address both kinds of duplication. Decorators are Python objects that can register,
annotate, and/or wrap a Python function or method.
For example, the Python atexit module contains a register function that registers a
callback to be invoked when a Python program is exited. Without the new decorator
feature, a program that uses this function looks something like Listing One(a).
When Listing One(a) is run, it prints "Goodbye, world!" because when it exits, the
goodbye() function is invoked. Now look at the decorator version in Listing One(b), which
does exactly the same thing, but uses decorator syntax instead: an @ sign and
expression on the line before the function definition.
This new syntax lets the registration be placed before the function definition, which
accomplishes two things. First, you are made aware that the function is an atexit function
before you read the function body, giving you a better context for understanding the
function. With such a short function, it hardly makes a difference, but for longer functions
or methods, it can be very helpful to know in advance what you're looking at. Second, the
function name is not repeated. The first program refers to goodbye twice, so there is more
duplication, precisely the thing we're trying to avoid.
Why Decorate?
The original motivation for adding decorator syntax was to allow class methods and static
methods to be obvious to someone reading a program. Python 2.2 introduced the
classmethod and staticmethod built-ins, which were used as in Listing Two(a). Listing
Two(b) shows the same code using decorator syntax, which avoids the unnecessary
repetitions of the method name, and gives you a heads-up that a classmethod is being
defined.
While this could have been handled by creating a syntax specifically for class or static
methods, one of Python's primary design principles is that: "Special cases aren't special
enough to break the rules." That is, the language should avoid having privileged features
that you can't reuse for other purposes. Since class methods and static methods in
Python are just objects that wrap a function, it would not make sense to create special
syntax for just two kinds of wrapping. Instead, a syntax was created to allow arbitrary
wrapping, annotation, or registration of functions at the point where they're defined.
Many syntaxes for this feature were discussed, but in the end, a syntax resembling Java
1.5 annotations was chosen. Decorators, however, are considerably more flexible than
Java's annotations, as they are executed at runtime and can have arbitrary behavior,
while Java annotations are limited to only providing metadata about a particular class or
method.
Creating Decorators
Decorators may appear before any function definition, whether that definition is part of a
module, a class, or even contained in another function definition. You can even stack
multiple decorators on the same function definition, one per line.
But before you can do that, you first need to have some decorators to stack. A decorator
is a callable object (like a function) that accepts one argument: the function being
decorated. The return value of the decorator replaces the original function definition. See
the script in Listing Three(a), which produces the output in Listing Three(b),
demonstrating that the mydecorator function is called when the function is defined.
For the first example decorator, I had it return the original function object unchanged, but
in practice, it's rare that you'll do that (except for registration decorators). More often,
you'll either be annotating the function (by adding attributes to it), or wrapping the function
with another function, then returning the wrapper. The returned wrapper then replaces the
original function. For example, the script in Listing Four prints "Hello, world!" because the
When run, Listing Five prints "entering" and "exiting" messages around the "Hello, world"
function. As you can see, a decorator doesn't have to be a function; it can be a class, as
long as it can be called with a single argument. (Remember that in Python, calling a class
returns a new instance of that class.) Thus, the traced class is a decorator that replaces a
function with an instance of the traced class.
So after the hello function definition in Listing Five, hello is no longer a function, but is
instead an instance of the traced class that has the old hello function saved in its func
attribute.
When that wrapper instance is called (by the hello() statement at the end of the script),
Python's class machinery invokes the instance's __call__() method, which then invokes
the original function saved in the instance's func attribute.
Usually, most decorators expect a function on input, and return either a function or an
attribute descriptor as their output. The Python built-ins classmethod, staticmethod, and
property all return attribute descriptors, so their output cannot be passed to a decorator
that expects a function. That's why I had to put classmethod first in Listing Six. As an
experiment, try reversing the order of @traced and @classmethod in Listing Six, and
see if you can guess what will happen.
Functions as Decorators
Because most decorators expect an actual function as their input, some of them may not
be compatible with our initial implementation of @traced, which returns an instance of the
traced class. Let's rework @traced such that it returns an actual function object, so it'll be
compatible with decorators that expect one.
Listing Seven provides the same functionality as the original traced decorator, but instead
of returning a traced object instance, it returns a new function object that wraps the
original function. If you've never used Python closures before, you might be a little
confused by this function-in-a-function syntax.
Basically, when you define a function inside of another function, any undefined local
variables in the inner function will take the value of that variable in the outer function. So
here, the value of func in the inner function comes from the value of func in the outer
function.
Because the inner function definition is executed each time the outer function is called,
Python actually creates a new wrapper function object each time. Such function objects
are called "lexical closures," because they enclose a set of variables from the lexical
scope where the function was defined.
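A tiny standalone sketch of this (make_adder is an illustrative example, not from the listings): each call to the outer function produces a distinct closure over its own free variable.

```python
# Each call to make_adder runs the inner def again, producing a fresh
# closure over that call's value of n.
def make_adder(n):
    def add(x):
        return x + n   # n is a free variable captured from make_adder
    return add

add3 = make_adder(3)
add5 = make_adder(5)
assert add3(10) == 13
assert add5(10) == 15
```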
A closure does not actually duplicate the code of the function, however. It simply encloses
a reference to the existing code, and a reference to the free variables from the enclosing
function. In this case, that means that the wrapper closure is essentially a pointer to the
Python bytecode making up the wrapper function body, and a pointer to the local
variables of the traced function during the invocation when the closure was created.
Because a closure is really just a normal Python function object (with some predefined
variables), and because most decorators expect to receive a function object, creating a
closure is perhaps the most popular way of creating a stackable decorator.
Decorators with Arguments
Many applications of decorators call for parameterization. For example, say you want to
create a pair of @require and @ensure decorators so that you can record a method's
precondition and postcondition. Python lets us specify arguments with our decorators; see
Listing Eight. (Of course, Listing Eight is for illustration only. A full-featured
implementation of preconditions and postconditions would need to be a lot more
sophisticated than this to deal with things like inheritance of conditions, allowing
postconditions to access before/after expressions, and allowing conditions to access
function arguments by name instead of by position.)
You'll notice that the require() decorator creates two closures. The first closure creates a
decorator function that knows the expr that was supplied to @require(). This means
require itself is not really the decorator function here. Instead, require returns the
decorator function, here called decorator. This is very different from the previous
decorators, and this change is necessary to implement parameterized decorators.
The second closure is the actual wrapper function that evaluates expr whenever the
original function is called. Try calling the test() function with different numbers of
arguments, and see what happens. Also, try changing the @require line to use a different
precondition, or stack multiple @require lines to combine preconditions. You'll also notice
that @require(expr="len(__args)==1") still works. Decorator invocations follow the same
syntax rules as normal Python function or method calls, so you can use positional
arguments, keyword arguments, or both.
Function Attributes
All of the examples so far have been things that can't be done quite so directly with Java
annotations. But what if all you really need is to tack some metadata onto a function or
method for later use? For this purpose, you may wish to use function attributes in your
decorator.
Function attributes, introduced in Python 2.1, let you record arbitrary values as attributes
on a function object. For example, suppose you want to track the author of a function or
method, using an @author() decorator. You could implement it as in Listing Nine. In this
example, you simply set an author_name attribute on the function and return it, rather
than creating a wrapper. Then, you can retrieve the attribute at a later time as part of
some metadata-gathering operation.
Practicing "Safe Decs"
To keep the examples simple, I've been ignoring "safe decorator" practices. It's easy to
create a decorator that will work by itself, but creating a decorator that will work properly
when combined with other decorators is a bit more complex. To the extent possible, your
decorator should return an actual function object, with the same name and attributes as
the original function, so as not to confuse an outer decorator or cancel out the work of an
inner decorator.
This means that decorators that simply modify and return the function they were given
(like Listings Three and Nine), are already safe. But decorators that return a wrapper
function need to do two more things to be safe:
Set the new function's name to match the old function's name.
Copy the old function's attributes to the new function.
These can be accomplished by adding just three short lines to our old decorators.
(Compare the version of @require in Listing Ten with the original in Listing Eight.)
Before returning the wrapper function, the decorator function in Listing Ten changes the
wrapper function's name (by setting its __name__ attribute) to match the original
function's name, and sets its __dict__ attribute (the dictionary containing its attributes) to
the original function's __dict__, so it will have all the same attributes that the original
function did. It also changes the wrapper function's documentation (its __doc__ attribute)
to match the original function's documentation. Thus, if you used this new @require()
decorator stacked over the @author() decorator, the resulting function would still have an
author_name attribute, even though it was a different function object than the original one
being decorated.
Putting It All Together
To illustrate, I'll use a few of these techniques to implement a complete, useful decorator
that can be combined with other decorators. Specifically, I'll implement an @synchronized
decorator (Listing Eleven) that implements Java-like synchronized methods. A given
object's synchronized methods can only be invoked by one thread at a time. That is, as
long as any synchronized method is executing, any other thread must wait until all the
synchronized methods have returned.
To implement this, you need to have a lock that you can acquire whenever the method is
executing. Then you can create a wrapping decorator that acquires and releases the lock
around the original method call. I'll store this lock in a _sync_lock attribute on the object,
automatically creating a new lock if there's no _sync_lock attribute already present.
But what if one synchronized method calls another synchronized method on the same
object? Using simple mutual exclusion locks would result in a deadlock in this case, so
we'll use a threading.RLock instead. An RLock may be held by only one thread, but it can
be recursively acquired and released. Thus, if one synchronized method calls another on
the same object, the lock count of the RLock simply increases, then decreases as the
methods return. When the lock count reaches zero, other threads can acquire the lock.
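The reentrancy can be sketched directly, outside of any decorator (this toy pair of functions is for illustration only):

```python
from threading import RLock

lock = RLock()

def inner():
    lock.acquire()       # same thread re-acquires: count becomes 2
    try:
        return "ok"
    finally:
        lock.release()   # count back to 1

def outer():
    lock.acquire()       # count becomes 1
    try:
        return inner()   # a plain Lock would deadlock here
    finally:
        lock.release()   # count back to 0; other threads may acquire

assert outer() == "ok"
```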
There are two little tricks being done in Listing Eleven's wrapper code that are worth
knowing about. First, the code uses a try/except block to catch an attribute error in the
case where the object does not already have a synchronization lock. Since in the
common case the lock should exist, this is generally faster than using an if/then test to
check whether the lock exists (because the if/then test would have to execute every time,
but the AttributeError will occur only once).
Second, when the lock doesn't exist, the code uses the setdefault method of the object's
attribute dictionary (its __dict__) to either retrieve an existing value of _sync_lock, or to set
a new one if there was no value there before. This is important because it's possible that
two threads could simultaneously notice that the object has no lock, and then each would
create and successfully acquire its own lock, while ignoring the lock created by the other!
This would mean that our synchronization could fail on the first call to a synchronized
method of a given object.
Using the atomic setdefault operation, however, guarantees that no matter how many
threads simultaneously detect the need for a new lock, they will all receive the same
RLock object. That is, one setdefault() operation sets the lock, then all subsequent
setdefault() operations receive that lock object. Therefore, all threads end up using the
same lock object, and thus only one is able to enter the wrapped method at a time, even if
the lock object was just created.
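The setdefault guarantee is easy to observe with a plain dictionary (using lists here as stand-ins for lock objects):

```python
d = {}
first = d.setdefault('_sync_lock', ['lock A'])   # key absent: stores and returns it
second = d.setdefault('_sync_lock', ['lock B'])  # key present: new value ignored
assert first is second
assert d['_sync_lock'] == ['lock A']
```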
Conclusion
Python decorators are a simple, highly customizable way to wrap functions or methods,
annotate them with metadata, or register them with a framework of some kind. But, as a
relatively new feature, their full possibilities have not yet been explored, and perhaps the
most exciting uses haven't even been invented yet. Just to give you some ideas, here are
links to a couple of lists of use cases that were posted to the mailing list for the
developers working on the next version of Python: http://mail.python.org/pipermail/python-dev/2004-April/043902.html and http://mail.python.org/pipermail/python-dev/2004-April/044132.html.
Each message uses different syntax for decorators, based on some C#-like alternatives
being discussed at the time. But the actual decorator examples presented should still be
usable with the current syntax. And, by the time you read this article, there will likely be
many other uses of decorators out there. For example, Thomas Heller has been working
on experimental decorator support for the ctypes package (http://ctypes.sourceforge.net/),
and I've been working on a complete generic function package using decorators, as part
of the PyProtocols system (http://peak.telecommunity.com/PyProtocols.html).
So, have fun experimenting with decorators! (Just be sure to practice "safe decs," to
ensure that your decorators will play nice with others.)
DDJ
Listing One
(a)
import atexit
def goodbye():
    print "Goodbye, world!"
atexit.register(goodbye)
(b)
import atexit
@atexit.register
def goodbye():
    print "Goodbye, world!"
Listing Two
(a)
class Something(object):
    def someMethod(cls,foo,bar):
        print "I'm a class method"
    someMethod = classmethod(someMethod)
(b)
class Something(object):
    @classmethod
    def someMethod(cls,foo,bar):
        print "I'm a class method"
Listing Three
(a)
def mydecorator(func):
    print "decorating", func
    return func
print "before definition"
@mydecorator
def some_function():
    print "I'm never called, so you'll never see this message"
print "after definition"
(b)
before definition
decorating <function some_function at 0x00A933C0>
after definition
Listing Four
def stupid_decorator(func):
Listing Five
class traced:
    def __init__(self,func):
        self.func = func
    def __call__(__self,*__args,**__kw):
        print "entering", __self.func
        try:
            return __self.func(*__args,**__kw)
        finally:
            print "exiting", __self.func
Listing Six
class SomeClass(object):
    @classmethod
    @traced
    def someMethod(cls):
        print "Called with class", cls
SomeClass.someMethod()
Listing Seven
def traced(func):
    def wrapper(*__args,**__kw):
        print "entering", func
        try:
            return func(*__args,**__kw)
        finally:
            print "exiting", func
    return wrapper
Listing Eight
def require(expr):
    def decorator(func):
        def wrapper(*__args,**__kw):
            assert eval(expr),"Precondition failed"
            return func(*__args,**__kw)
        return wrapper
    return decorator

@require("len(__args)==1")
def test(*args):
    print args[0]
test("Hello world!")
Listing Nine
def author(author_name):
    def decorator(func):
        func.author_name = author_name
        return func
    return decorator

@author("Lemony Snicket")
def sequenceOf(unfortunate_events):
    pass
print sequenceOf.author_name
Listing Ten
def require(expr):
    def decorator(func):
        def wrapper(*__args,**__kw):
            assert eval(expr),"Precondition failed"
            return func(*__args,**__kw)
        wrapper.__name__ = func.__name__
        wrapper.__dict__ = func.__dict__
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator
Listing Eleven
def synchronized(func):
    def wrapper(self,*__args,**__kw):
        try:
            rlock = self._sync_lock
        except AttributeError:
            from threading import RLock
            rlock = self.__dict__.setdefault('_sync_lock',RLock())
        rlock.acquire()
        try:
            return func(self,*__args,**__kw)
        finally:
            rlock.release()
    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__ = func.__doc__
    return wrapper

class SomeClass:
    """Example usage"""
    @synchronized
    def doSomething(self,someParam):
        """This method can only be entered
        by one thread at a time"""
Example:
# python.pth
# This file allows access to the packages in d:\python\lib.
# For this purpose, python.pth should be located at D:\Python24\Lib\site-packages\python.pth.
d:\python\lib
Assuming that there is a package wxpy in d:\python\lib and you need to import the
module widgets from this package, the usage would be:
import wxpy.widgets
Of course, all modules within the lib directory are accessible at any time, as if they were in
the standard Python location. Hence, there is no need for any specific import path.
A path configuration file is a file whose name has the form package.pth; its contents are
additional items (one per line) to be added to sys.path. Non-existing items are never
added to sys.path, but no check is made that the item refers to a directory rather than a file.
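The same .pth processing that happens for site-packages at startup can be triggered explicitly with site.addsitedir. A small sketch using a throwaway temporary directory:

```python
import os, site, sys, tempfile

# Build a throwaway directory containing a .pth file that names a subdirectory.
base = tempfile.mkdtemp()
extra = os.path.join(base, 'mylib')
os.mkdir(extra)
f = open(os.path.join(base, 'python.pth'), 'w')
f.write(extra + '\n')
f.close()

site.addsitedir(base)   # processes python.pth, adding 'mylib' to sys.path
assert extra in sys.path
```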
Documentation
Pydoc
Docutils
Distributions of Python Software
A "Distribution" is a collection of files that represent a "Release" of a "Project" as of a particular
point in time, denoted by a "Version". Releases may have zero or more "Requirements", which
indicate what releases of another project the release requires in order to function. A Requirement
names the other project, expresses some criteria as to what releases of that project are
acceptable, and lists any "Extras" that the requiring release may need from that project. (An Extra
is an optional feature of a Release, that can only be used if its additional Requirements are
satisfied.)
Notice, by the way, that this definition of Distribution is broad enough to include directories
containing Python packages or modules, not just "built distributions" created by the distutils. For
example, the directory containing the Python standard library is a "distribution" by this definition,
and so are the directories you edit your project's code in! In other words, every copy of a project's
code is a "distribution", even if you don't take any special steps to make it one.
Distributions that satisfy these two properties are thus "pluggable", because they can be
automatically discovered and "activated" (by adding them to sys.path), then used for importing
Python modules or accessing other resource files and directories that are part of the distributed
project.
A "Project" is a library, framework, script, application, or collection of data or other files relevant to
Python. "Projects" must have unique names, in order to tell them apart. Currently, PyPI is useful
as a way of registering project names for uniqueness, because the 'name' argument to distutils
'setup()' command is used to identify the project on PyPI, as well as to generate Distributions' file
names.
When Python runs a program, that program must have all its requirements met by importable
distributions in the working set. Initially, a Python program's Working Set consists only of the
importable distributions (whether pluggable or not) listed in sys.path, such as the directory
containing the program's __main__ script, and the directories containing the standard library and
site-packages. If these are the only distributions that the program requires, then of course that
program can run.
The Environment
A set of directories that may be searched for pluggable distributions is called an Environment. By
default, the Environment consists of all existing directories on sys.path, plus any distribution
sources registered with the runtime.
Given an Environment, and a Requirement to be satisfied, our proposed runtime facility would
search the environment for pluggable distributions that satisfy the requirement (and the
requirements of those distributions, recursively), such that it returns a list of distributions to be
added to the working set, or raises a DependencyNotFound error.
However, if some of the requirements are not satisfied by the working set, this can lead to errors
that may be hard to diagnose. So, if a Python program were made part of a Project, and the
project explicitly defines its Requirements, which are then expressed as part of a Pluggable
Distribution, then a runtime facility could automatically attempt to locate suitable pluggables and
add them to the working set, or at least give a more specific error message if a requirement can't
be satisfied.
Note that a Working Set should not contain multiple distributions for the same project, so the
runtime system must not propose to add a pluggable distribution to a Working Set if that set
already contains a pluggable for the same project. If a project's requirements can't be met without
adding a conflicting pluggable to the working set, a VersionConflict error is raised. (Unlike a
working set, an Environment may contain more than one pluggable for a given project, because
these are simply distributions that are *available* to be activated.)
Python Eggs
"Python Eggs" are distributions in specific formats that implement the concept of a "Pluggable
Distribution". An egg may be a zipfile or directory whose name ends with '.egg', that contains
Python modules or packages, plus an 'EGG-INFO' subdirectory containing metadata. An egg
may also be a directory containing one or more 'ProjectName.egg-info' subdirectories with
metadata.
A third form of egg is a '.egg-link' file. These exist to support symbolic linking on platforms that
do not natively support symbolic links (e.g. Windows). These consist simply of a single line
indicating the location of a directory that contains either an EGG-INFO or ProjectName.egg-info
subdirectory. This format will be used by project management utilities to add an in-development
distribution to the development Environment.
The .egg-info form is primarily intended to add discoverability to distributions that -- for whatever
reason -- cannot be restructured to the primary egg format. For example, by placing appropriate
.egg-info directories in site-packages, one could document what distributions are already installed
in that directory. While this would not make those releases capable of being individually
activated, it does allow the runtime system to be aware that any requirements for those projects
are already met, and to know that it should not attempt to add any other releases of those projects.
Pluggable distributions can be manually made part of the working set by modifying sys.path. This
can be done via PYTHONPATH, .pth files, or direct code manipulation. However, it is generally
more useful to put distributions in the working set by automatically locating them in an
appropriate Environment.
The default Environment is the directories already on sys.path, so simply placing pluggable
distributions in those directories suffices to make them available for adding to the working set.
But *something* must add them to the working set, even if it is just to designate the project the
current program is part of, so that its dependencies can be automatically resolved and added to
the working set. This means that either a program's start scripts must invoke the
runtime facility and make this initial request, or there must be some automatic means by which
this is accomplished.
For development, however, one does not generally want to have to "install" scripts that one is
actively editing. So, future versions of the runtime facility will have an option to automatically
create wrapper scripts that invoke the in-development versions of the scripts, rather than versions
installed in eggs. This will allow developers to continue to write scripts without embedding any
project or version information in them.
In this way, a developer's development Environment can include one or more projects whose source code he or she is
editing, as well as any number of built distributions. He or she can then also build source or
binary distributions of their project for deployment, whenever it is necessary or convenient to do
so.
The EasyInstall program accomplishes this by creating wrapper scripts when a distribution is
installed. The wrapper scripts know what project the "real" script is part of, and so can ensure
that the right working set is active when the scripts run. The scripts' author does not need to
invoke the runtime facility directly, nor do they even need to be aware that it exists.
Distutils
Setuptools
EasyInstall
Conversion to Executables
PyInstaller
A little more complex in usage, but the results seem to be more reliable than py2exe. The trac
project site is here (doc proxy on d:).
After configuring it for one particular Python interpreter, you need to run Makespec.py and
Build.py for each project.
Py2exe
Beware that there are two executable flavours: console and windows.
# setup.py
from distutils.core import setup
import py2exe
import sys, os

# This script is only useful for py2exe, so just run that distutils command;
# that allows running it with a simple double click.
sys.argv.append('py2exe')

setup(
    options = {'py2exe': {
        #'includes': ['serial','struct'], # unclear why this is needed
        'excludes': ['javax.comm','TERMIOS','FCNTL'],
        #'optimize': 2, # when this flag was set, the serial library was not found!
        'dist_dir': 'dist25',
        }
    },
    name = "unlockmsc",
    console = [
        {
            'script': "unlockmsc.py",
        },
    ],
    #zipfile = "stuff.lib", # purpose unclear
    #packages = ['serial'],
    url = "http://gemalto.com",
)
How does py2exe work and what are all those files?
Let's start from the needed results going back to how py2exe does its job.
Python is an interpreted language, and as long as Microsoft does not ship a Python interpreter (and
its accompanying class library) with every copy of its flagship operating system products, there is
no direct way to execute a Python script on a vanilla Microsoft OS machine. For most casual users
of py2exe, that means that you must create an executable (.exe) file that, when clicked, will just
run the script. This is what py2exe does for you. After py2exe has done its magic, you should have
a "dist" directory with all the files necessary to run your python script. No install necessary. Click
and run. No DLL hell, nothing else to download.
myprog.exe
    The actual executable. You can select a custom icon by using some specific
    target options (see CustomIcons).
python??.dll
    The Python interpreter DLL.
library.zip
    This is a standard zip file into which all the pure-Python source modules are inserted
    (using the "zipfile" option, you can also choose to put that file in a subdirectory
    and give it a different name).
*.pyd
    The pyd files are actually standard Windows DLLs (I used the useful depends.exe
    to check things around). They are also standard modules for Python: a Python
    program can import those pyds. Some applications build pyds to provide
    accelerated features; they are also necessary to provide support for native
    functions of the operating system (see also CTypes to never have to use SWIG
    again!). Those files also go into the subdirectory where library.zip will be
    installed.
*.dll
    Some pyds probably have some DLL dependencies, and here they come.
To run, your program needs all those files as a necessary condition. But that might not be a
sufficient condition. For example, encodings are imported "by name": if you use a feature that
requires encodings, you will need to set an option to include encodings unconditionally, or to
import them explicitly from one of your scripts (see EncodingsAgain and EvenMoreEncodings).
Some other modules (e.g. pyserial-pyparallel) also conditionally import modules for each
platform. You can avoid the warning by putting the correct "ignores" options in py2exe. Last but
not least, modules like pygtk seem to create a module reference on-the-fly, and the
corresponding warnings are therefore also harmless (see ExcludingDlls to learn how to correct
that).
An important point to note: the main script (the one passed as an option to "windows" or
"console" in your setup file) is not put with all the other files in library.zip. Instead it is
byte-compiled (see OptimizedBytecode for some details on optimization) and inserted into a named
resource in the executable shell. This technique also allows you to insert a binary string into the
final executable (very nice if you want to add a custom version tag) through the usage of the
"other_resources" option for the target (see CustomDataInExe).
CGI
With the recent rise of web frameworks, like Ruby on Rails, Django, Turbogears and friends, it
might be fair to assume that the old workhorse CGI can be put out to pasture.
Well, as recently as July [1], Bruce Eckel blogged about Testing Python CGIs.
CGI is a simple protocol that describes how a webserver passes http requests to a program, and
how that program makes a response.
The Python CGI Module makes understanding http requests very easy. To send a response, write
to stdout.
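The idea can be sketched in a few lines. This is an illustrative helper, not code from the article: the function name and sample query string are assumptions, and `urllib.parse` is the modern home of `parse_qs` (Python 2, which this article predates Python 3, kept it in the `cgi` module):

```python
from urllib.parse import parse_qs


def get_params(environ):
    # The web server hands a CGI program the request through
    # environment variables; QUERY_STRING carries GET form data.
    return parse_qs(environ.get("QUERY_STRING", ""))


# Simulate the environment a server would provide:
params = get_params({"QUERY_STRING": "name=Joe+Bloggs&colour=red"})
```

The program would then inspect `params` and print its response, headers first, to stdout.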
CGI is still a very good way of connecting simple programs to the web. It's also a great way of
cutting your teeth with web programming, particularly learning about http. In fact WSGI (the
new Python protocol that aims to allow components in a webstack to communicate) aims to be a
modern evolution of CGI.
Quite sophisticated programs can be written as CGIs, but they are less efficient than modern
web frameworks, as each request is handled by a separate Python process.
I've hacked around with a few CGIs of my own. I've also written a couple of tutorials, if you're
interested in learning.
The beauty of CGI is its simplicity. Most of the request [2] is passed on stdin, and the program
passes a response back to the server on stdout.
I have a nice utility, modules.cgi, to retrieve and display as HTML all the modules in a Python
distribution.
The simplest way to create web applications with Python is to use the Common Gateway
Interface (CGI) [1]. CGI is just another protocol: it describes how to connect clients to web
applications.
Normally, when you fetch static content from a web server, the server finds the file [2] that
you're requesting and sends it back in a response. Once you understand how CGI works,
producing dynamic content is as simple as using the print statement. And contrary to its
reputation, CGI is not necessarily slow, even though the Python interpreter launches for each
and every script invocation. These days, you should try CGI before choosing a more complex
web application framework.
Let's dive into CGI programming with Python. This first of two parts explains the basics of
CGI, describes how HTML forms are sent, and explains how to process form input. The
next article provides an example application and covers more advanced CGI topics, such
as CGI environment variables, HTML templating, and Unicode.
All code in this article is intended to work with Python 2.2 and beyond.
Headers and Line Endings
Half the battle of writing a web application is returning the right headers in response to a
request. Sending valid headers isn't just important for the receiving client -- if your
program doesn't emit valid headers, the web server assumes that your script has failed
and displays the dreaded Error 500... Internal Server Error.
There are lots of different headers you can send [4]. But at a minimum, you must send a
Content-Type header (in fact, in many situations this may be the only header you need to
send), and you must end your list of headers with a blank line.
All headers are of the form header-type: header-value\r\n. The line ending \r\n is
required to comply with the relevant RFC [5]. However, most clients and servers allow just
\n, which is what you'll get as a normal line ending on UNIX type systems.
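The header rules above can be captured in a tiny helper. This is a sketch (the function name is an assumption, not from the article), showing the "name: value" lines plus the terminating blank line:

```python
def header_block(headers):
    # Each header is "name: value" terminated by \r\n; the whole
    # block ends with one extra blank line, as the RFC requires.
    lines = ["%s: %s\r\n" % (name, value) for name, value in headers]
    return "".join(lines) + "\r\n"


block = header_block([("Content-Type", "text/html")])
```

Everything written after this block is treated by the client as the response body.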
Hello World
Let's do the obligatory "Hello, World" program as a CGI:
#!/usr/bin/python

import sys
try:
    import cgitb
    cgitb.enable()
except ImportError:
    sys.stderr = sys.stdout

def cgiprint(inline=''):
    sys.stdout.write(inline)
    sys.stdout.write('\r\n')
    sys.stdout.flush()

contentheader = 'Content-Type: text/html'

thepage = '''<html><head>
<title>%s</title>
</head><body>
%s
</body></html>
'''

h1 = '<h1>%s</h1>'

if __name__ == '__main__':
    cgiprint(contentheader) # content header
    cgiprint() # finish the headers with a blank line
    print thepage % ('Hello World', h1 % 'Hello, World!')
The next part of the script is a try/except block that attempts to import the cgitb module.
Normally, errors in a Python program are sent to sys.stderr. However, when running
CGIs, sys.stderr translates to the server error log. But constantly digging out errors from
the error log is a nuisance when debugging. Instead, cgitb pretty-prints tracebacks,
including useful information like variable values, to the browser. (This module was only
introduced in Python 2.2.) If the import fails, stderr redirects to stdout, which does a
similar, but not so effective job. (Do not use the cgitb module in production applications.
The information it displays includes details about your system that may be useful to a
would-be attacker.)
Next, the script emits the header lines using cgiprint(), which terminates each line with the
correct line endings. (cgiprint() need only be used for the header lines.) The script sends a
Content-Type header. Because the script is sending a web page (which is a form of text),
the type/subtype is text/html. Only one header is sent; then the headers terminate with a
blank line.
cgiprint() also flushes the output buffer using sys.stdout.flush(). Most servers buffer
the output of scripts until it's completed. For long running scripts, [8] buffering output may
frustrate your user, who'll wonder what's happening. You can either regularly flush your
buffer, or run Python in unbuffered mode. The command-line option to do this is -u, which
you can specify as #!/usr/bin/python -u in your shebang line.
Finally, the script sends a small HTML page, which should look very familiar to you if
you've used HTML before.
User Interface and HTML FORMs
When writing CGIs, your user interface is the web browser. Combining Javascript,
Dynamic HTML (DHTML), and HTML forms, you can create rich web applications.
The basic HTML elements used to communicate with CGIs are forms and form input
components, including text boxes, radio buttons, check boxes, pulldown menus, and the
like. [9]
Example Form
A typical, simple HTML form might be coded like this:
<form action="/cgi-bin/formprocessor.py" method="get">
What is Your Name : <input name="param1"
type="text" value="Joe Bloggs" /><br />
<input name="param2" type="radio" value="this"
checked="checked" />Select This
<input name="param2" type="radio" value="that" />or That<br />
<input name="param3" type="checkbox" />Check This
<input name="param4" type="checkbox" />and This Too ?<br />
<input name="hidden_param" type="hidden" value="..." />
<input type="reset" />
<input type="submit" />
</form>
This translates into something like this (border added for effect): a text box containing
"Joe Bloggs", radio buttons labelled "Select This" and "or That", check boxes labelled
"Check This" and "and This Too ?", and Reset and Submit Query buttons.
When the user hits the Submit button, his (or her) form settings are encapsulated into an
HTTP request. Inside the form tag are two parameters that determine how that
encapsulation occurs. The action parameter is the URI of your CGI script. This is where
the request is sent to. The method parameter specifies how the values are encoded into
the request. The two possible methods are GET and POST.
The simpler of the two encoding choices is GET. With GET, the form's values are encoded
to be "URL safe" [10] and are then added onto the end of the URL as a list of parameters.
With POST, the encoded values are sent as the body of the request, after the headers are
sent.
While GET is simpler, the length of URLs is limited. Hence, using GET imposes a maximum
limit on the form entry that can be sent. (About 1,000 characters is the limit for many
servers.) If you're using a form to get a long text entry, use POST. POST is
more suitable for requests where more data is being sent. [11]
One advantage of GET, though, is that you can encode values yourself into a normal
HTML link. This means parameters can be sent to your program without the user having
to hit a submit button. An encoded set of values looks like:
param1=value1&param2=value+2&param3=value%263
(An HTTP GET request has this string added to the URL.) So, the whole URL might become
something like http://www.someserver.com/cgi-bin/test.py?param1=value1&param2=value+2&param3=value%263.
The ? separates the URI of your script from the encoded parameters. The & characters
separate the parameters from each other. The + represents a space (which shouldn't be
sent as part of a URI, of course), and %26 is the encoded value that represents an &. A
literal & shouldn't be sent as part of a value, or the CGI would think that a new parameter
was being sent.
If you encode your own values into a URL, use the urllib.urlencode() function from the
urllib module.
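A short sketch of the encoding, reproducing the example string above. Note the import location is an assumption for modern Python: in Python 2, which this article targets, the same function is `urllib.urlencode`:

```python
from urllib.parse import urlencode  # urllib.urlencode in Python 2

params = {"param1": "value1", "param2": "value 2", "param3": "value&3"}
query = urlencode(params)
# spaces become '+' and a literal '&' becomes '%26', as described above
```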
Form submissions map naturally onto a dictionary data type. Each form input element has a
name and a corresponding value. For instance, if the item is a radio button, the value sent
is the value of the selected button. For example, in the form above, the radio button has
the name param2 and its value is either this or that. For a checkbox, say param3 or
param4 above, the value sent is off or on.
Now that you know the basics of how forms are encoded and sent to CGI, it's time to
introduce Python's cgi module. The cgi module is your interface to receiving form
submissions. It makes things very easy.
Reading form data is slightly complicated by two facts. First, form input element names
can be repeated, so values can be lists. (Think of a form that allows you to check all of
the answers that apply.) Second, by default, an input element that has no value -- such as
a text box that hasn't been filled in -- will be missing rather than just empty.
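Both complications can be seen directly in a parsed query string. A sketch using `urllib.parse.parse_qs` as a stand-in for a real form submission (the sample field names are illustrative):

```python
from urllib.parse import parse_qs

# 'interests' is repeated, so its values form a list; 'comment' was
# left blank, so by default it disappears from the parsed result.
raw = "name=Joe&interests=computers&interests=sewing&comment="
form = parse_qs(raw)
kept = parse_qs(raw, keep_blank_values=True)
```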
The FieldStorage() class of the cgi module returns an object that represents the form
data. It's almost a dictionary. Rather than repeat the manual page on using the cgi
module, let's look at a couple of general purpose functions that, given an object created
by FieldStorage(), do return dictionaries.
Functions
def getform(theform, valuelist, notpresent='', nolist=False):
    """
    Passed a form (cgi.FieldStorage
    instance) and a list of expected
    parameters, return a dictionary of
    values. Missing parameters get the
    value of the 'notpresent' keyword
    argument (default ''). It also takes
    a keyword argument 'nolist'. If this
    is True list values only return their
    first value.
    """
    data = {}
    for field in valuelist:
        if not theform.has_key(field):
            # if the field is not present (or was empty)
            data[field] = notpresent
        else:
            # the field is present
            if type(theform[field]) != type([]):
                # is it a list or a single item
                data[field] = theform[field].value
            else:
                if not nolist:
                    # do we want a list ?
                    data[field] = theform.getlist(field)
                else:
                    # just fetch the first item
                    data[field] = theform.getfirst(field)
    return data
"""
Passed a form (cgi.FieldStorage
instance) return *all* the values.
This doesn't take into account
multipart form data (file uploads).
It also takes a keyword argument
'nolist'. If this is True list values
only return their first value.
"""
Schlumberger Private
data = {}
for field in theform.keys():
# we can't just iterate over it, but must use the keys() method
if type(theform[field]) == type([]):
if not nolist:
data[field] = theform.getlist(field)
else:
data[field] = theform.getfirst(field)
else:
data[field] = theform[field].value
return data
def isblank(indict):
    """
    Passed an indict of values it checks
    if any of the values are set. Returns
    True if the indict is empty, else
    returns False. I use it on a form
    processed with getform to tell if my
    CGI has been activated without any
    form values.
    """
    for value in indict.values():
        if value:
            return False
    return True
For almost all CGIs that receive input from a form, you'll know what parameters to expect.
(After all, you probably wrote the form.) If you pass the getform() function your
FieldStorage() instance, along with the list of parameters you expect, it returns a
dictionary of values. Any missing parameters have the default value '', unless you modify
the notpresent keyword. If you want to make sure that you don't receive any list values,
set the nolist keyword. If a form variable was a list, nolist returns only the first value in
the list.
Or, if you want to retrieve all of the values sent by the form, use the getall() function
above. It also accepts the optional nolist keyword argument.
isblank() tests whether the dictionary returned by getall() or getform() is empty. If it is,
the CGI was called without parameters. In that case, it's typical to generate a welcome
page and a form. If the dictionary isn't blank (isblank() returns False), there's a form to
process.
Using getform()
In the next article, all of these functions will be used to build a basic application.
But to illustrate their use here, let's process a submission from the Example Form. This
program snippet needs the functions above and the first few lines from Hello World.
import cgi

mainpage = '''<html><head><title>Receiving a \
Form</title></head><body>%s</body></html>'''

error = '''<h1>Error</h1><h2>No Form Submission Was Received</h2>'''

result = '''<h1>Receiving a Form Submission</h1>
<p>We received the following parameters from the form :</p>
<ul>
<li>Your name is "%s".</li>
<li>You selected "%s".</li>
<li>"this" is "%s". </li>
<li>"this too" is "%s". </li>
<li> A hidden parameter was sent "%s".</li>
</ul>
'''

possible_parameters = ['param1', 'param2', 'param3', 'param4', \
'hidden_param']

if __name__ == '__main__':
    cgiprint(contentheader) # content header
    cgiprint()
    theform = cgi.FieldStorage()
    formdict = getform(theform, possible_parameters)
    if isblank(formdict):
        body = error
    else:
        body = result % (formdict['param1'], formdict['param2'],
                         formdict['param3'], formdict['param4'],
                         formdict['hidden_param'])
    print mainpage % body
Let's walk through this code. There are three main chunks of HTML: mainpage is the frame
of the page, which just needs the body to be inserted into it. Then error displays if the
script is called without parameters. However, if the script is called from a form
submission, then the parameters are extracted and put into result.
The script prints the obligatory headers and then creates the FieldStorage instance to
represent the form submission. theform is then passed to the function getform(), along
with the list of expected parameters.
If no form submission was made, then all the values in the dictionary returned by
getform() are blank ('' in fact). In this case isblank() returns True and body is set to the
error message.
If you don't know in advance what parameters you'll receive, use getall()
instead of getform(). You can then check for the presence of specific parameters and
perform different actions based on which form has been submitted:
formdict = getall(theform)
if formdict.has_key('rating'):
    # user is submitting feedback
    process_feedback(formdict)
elif formdict.has_key('email'):
    # user is subscribing to the email list
    subscribe(formdict)
else:
    # display a form with all the options in
    optionlist()
Using getall(), you can actually turn our last example script into something a bit more
generic and useful:
import cgi

mainpage = '''<html><head><title>Receiving a \
Form</title></head><body>%s</body></html>'''

result = '''<h1>Receiving a Form Submission</h1>
<p>We received the following parameters from the form :</p>
<ul>%s</ul>'''

li = "<li>%s = %s</li>"

if __name__ == '__main__':
    cgiprint(contentheader) # content header
    cgiprint()
    theform = cgi.FieldStorage()
    formdict = getall(theform)
    params = []
    for entry in formdict:
        params.append(li % (entry, str(formdict[entry])))
    print mainpage % (result % ''.join(params))
This code gets all the parameters submitted to it using getall(). It then inserts them into
the page as an unordered list. If you send this script a form submission, the page it
displays shows you all the parameters received, where each line will look like parameter
= value. Because the line of code that produces this uses the str() function for each
value, list values display properly as well.
Several checkboxes sharing the same name can be used to gather information from your
user. For example, a list of areas they are interested in for newsletters you may be
sending out:
<form action="/cgi-bin/formprocessor.py" method="get">
What is Your Name : <input name="name"
type="text" value="Joe Bloggs" /><br />
Email Address : <input name="email"
type="text" /><br />
<input name="interests" type="checkbox" value="computers" />Computers<br />
<input name="interests" type="checkbox" value="sewing" />Sewing<br />
<input type="submit" />
</form>
When the form above is submitted, it will have a value for the user's name, their email
address, and a list of all the interests they checked. The code to directly fetch the value
from the FieldStorage instance is:
import cgi
theform = cgi.FieldStorage()
interests = theform['interests'].value
The difficulty here is that if the user only checks one choice, then interests is a single
value rather than the list we are expecting. The alternative is to use the higher-level
methods available in FieldStorage.
The getlist() method always returns a list, even if only a single value was supplied. If no
boxes at all were checked, it returns an empty list.
import cgi
theform = cgi.FieldStorage()
interests = theform.getlist('interests')
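The getlist() behaviour is easy to mimic and check without a web server. This sketch uses `urllib.parse.parse_qs` as a stand-in for FieldStorage (the helper name is an assumption, illustrating the semantics only):

```python
from urllib.parse import parse_qs


def getlist(parsed, field):
    # Mimics FieldStorage.getlist(): always a list, even for a single
    # value, and an empty list when nothing was submitted.
    return parsed.get(field, [])


one = parse_qs("interests=computers")   # only one box checked
none = parse_qs("name=Joe")             # no box checked at all
```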
It would be very easy to adapt the getform() and getall() functions to your particular
needs. Be sure also to set the proper path in the shebang line for the server. (See The
Error 500 Checklist section for a few other pitfalls.)
You can find a web page full of CGI examples at
http://www.voidspace.org.uk/python/cgi.shtml. These are available to test or download.
They include an online anagram generator and various smaller test scripts. There is also
a complete framework for doing user authentication and management from CGI, called
logintools.
The Error 500 Checklist
Debugging CGIs can be frustrating. By default, any problem with your CGI script results in
the anonymous Error 500. Actual details of the error are written into the server log, which
you may not even have access to. So check the following:
Was your script uploaded to the server in 'text' mode? (Is your FTP client set to recognize
.py files as text?)
Have you set the script permissions to mode 755 (executable by everyone)? [12]
Have you set the path to Python in the first line correctly?
Did you print valid header lines, including the final blank line?
Finally, some servers require the script to be in the cgi-bin folder (or a subdirectory) and
some even require the file extension to be .cgi rather than .py.
Conclusion
We've covered all the basics of CGI. The information here is enough to get you up and
running, and at least looking in the right direction for more information.
There's a lot more though: character encoding, using templates to output similar HTML
code repeatedly, and finding out about the HTTP request the user sent, to mention just a
few topics.
In the next part of "Python at the Other End of the Web," we'll touch on these subjects
when we use what we've learnt so far to build an example application.
[1] The full CGI specification can be found at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
[2] There's actually no requirement for URLs to map directly to files, but for static content it's the
obvious way of doing it.
[3] One common alternative is to embed an interpreter into your server, for example using
Apache with mod_python. This means that the interpreter doesn't have to restart in between
requests. This can also make session management easier. It does introduce a whole host of
other problems of course. Another alternative is to use a special application server like Zope.
[5] The RFC stating that headers end '\r\n' is the very long RFC 2616. See
http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
[6] Most Python CGIs will run on Linux type servers. If header lines are sent using the normal
'print' command then they will be terminated with '\n'. This is technically invalid, but usually
won't matter.
[8] On shared hosting accounts, CGI scripts are likely to be restricted to a maximum running time
of 60 seconds or even 30 seconds. After this, the server usually kills them. If you use your own
server this won't be a problem of course. Using something like mod_python may be a way
round this CGI restriction, or you can code around it by "chaining" requests.
[9] There is a good forms tutorial at
http://www.csd.abdn.ac.uk/~apreece/teaching/CS1009/practicals/forms.html (It's HTML
Quick recipe
import win32api
# launch Word directly ("open" is the default operation when None is given)
win32api.ShellExecute(0, None, "winword.exe", None, "", 1)
# or open a specific document:
win32api.ShellExecute(0, "open", "winword.exe", spath + "\\" + sfile, "", 1)
def PrintUsingWinWord(filename):
    # needs: from os import spawnv, P_NOWAIT; from time import sleep;
    # import struct; plus the windde/windll DDE support modules
    app = r"C:\MSOFFICE\WINWORD\WINWORD.EXE"
    spawnv(P_NOWAIT, app, (app, "/n"))
    print 'spawned WinWord, sleeping 5 seconds'
    sleep(5)
    s = windde.dde_session()
    s.initialize()
    CC = windll.membuf(36)
    CC.write(struct.pack('l', 36) + (32 * '\000'))
    retries = 10
    while retries:
        try:
            s.connect('WinWord', 'System', CC)
            break
        except Exception:
            # (reconstructed retry loop) the DDE server may not be ready yet
            retries = retries - 1
            sleep(1)
Patterns
Typology
There are three categories of object-oriented patterns:
Creational patterns. There are two main categories of creational patterns: those for creating objects without having
to know the class name, which you could call "abstract object makers" (abstract factory and factory method), and those
that ensure a certain property regarding object creation, such as prohibiting more than one instance of a class
(singleton), building a set of instances from different classes in a consistent way (builder), or creating an instance with
a specific state (prototype).
Abstract factory: an abstract factory is an object maker where, instead of specifying a class name, you
specify the kind of object you want. For instance, say that you want to create an agent to run analysis
programs; you can ask a factory to do it for you:
clustalw = AnalysisFactory.program('clustalw')
result = clustalw.run(seqfile = 'myseqs')
print result.alig
The clustalw object is an instance of, say, the AnalysisAgent.Clustalw class, but you do not have to know
about it at creation time. The only thing you know is the name of the program you want ('clustalw'), and the
factory will do the rest for you.
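A minimal sketch of such a factory, with hypothetical class names mirroring the example above (this is not the real Biopython or Pise API, just an illustration of the dispatch-by-name idea):

```python
class Clustalw:
    def run(self, seqfile):
        return "clustalw(%s)" % seqfile


class Blast:
    def run(self, seqfile):
        return "blast(%s)" % seqfile


class AnalysisFactory:
    # The client asks for a program by name and never needs to
    # mention, or even know, the concrete class.
    _programs = {"clustalw": Clustalw, "blast": Blast}

    @staticmethod
    def program(name):
        return AnalysisFactory._programs[name]()


clustalw = AnalysisFactory.program("clustalw")
```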
Factory method: a factory method is very similar to an abstract factory; it is simply a
method rather than a class.
For instance, you can create a sequence object (Bio.Seq.Seq in Biopython) by asking the
get_seq_by_num method of an alignment object (Bio.Align.Generic.Alignment):
first_seq = align.get_seq_by_num(0)
The method which creates this instance of Bio.Seq.Seq is a factory method. The difference with a
factory class is also that the factory method is often more than an object maker: it sometimes
incorporates much more knowledge about the way to create the object than a factory would.
A simpler factory method would be a new method defined in a class to create new instances of
the same class:
my_scoring = scoring.new()
In this case, notice that in order to create my_scoring, you really do not have to know the actual
class of scoring: the only thing you know is that you will get the same one, even if there is a whole
hierarchy of different classes of scoring.
Another example is a method such as:
p.new_curve([...])
that would create an instance of the Curve class and draw it as soon as it is created.
Singleton: ensures that you cannot create more than one instance. For example, you can define a class to
contain operations and data for the genetic code: you need only one instance of this class to perform the task.
Actually, this pattern would not be implemented with a class in Python, but rather with a module, at least if
you can define it statically (a dynamic singleton could not be a module, for a module has to be a file):
>>> import genetic_code
>>> genetic_code.aa('TTT')
Prototype: this pattern also lets you create a new object without knowing its class, but here, the new target
object is created with the same state as the source object:
another_seq = seq.copy()
The benefit is that you do not get an "empty" object here, but an object identical to seq. You can thus play
with another_seq, change its attributes, etc., without breaking the original object.
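A sketch of how such a copy() method can be written. The Seq class and its attributes here are illustrative, not the Biopython one; deepcopy is one straightforward way to clone the full state:

```python
import copy


class Seq:
    def __init__(self, data, alphabet="DNA"):
        self.data = data
        self.alphabet = alphabet

    def copy(self):
        # Prototype: the clone starts with the same state as the
        # source, whatever the actual class of 'self' happens to be.
        return copy.deepcopy(self)


seq = Seq("ATC")
another_seq = seq.copy()
another_seq.data = "GGG"   # mutating the clone leaves seq intact
```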
Builder: you sometimes need to create a complex object composed of several parts. This is the role of the
builder.
For instance, a builder is needed to build the whole set of nodes and leaves of a tree.
Or you could design a builder to create both Curve and Plot instances in a coherent way, as parts of a
GraphicalCurves complex object:
gc = GraphicalCurves(file='my_curves')
For instance, my_curves might contain a description of a set of curves to display in the same plot or in
different plots.
The Blast parser in Biopython simultaneously instantiates several classes that are all component
parts of a hit: Description, Alignment and Header.
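A tiny builder sketch along those lines. The class names echo the example above but the code is hypothetical; in particular, it takes curve data directly instead of reading a file:

```python
class Curve:
    def __init__(self, points):
        self.points = points


class Plot:
    def __init__(self):
        self.curves = []


class GraphicalCurves:
    # Builder: one entry point assembles several coordinated parts
    # (a Plot plus its Curve instances) into a coherent whole.
    def __init__(self, curve_data):
        self.plot = Plot()
        for points in curve_data:
            self.plot.curves.append(Curve(points))


gc = GraphicalCurves([[(0, 0), (1, 1)], [(0, 1), (1, 0)]])
```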
Structural patterns. Structural patterns address issues regarding how to combine and structure objects. For this
reason, several structural patterns provide alternative solutions to design problems that would otherwise involve
inheritance relationships between classes.
Decorator, proxy, adapter: these patterns all enable you to combine two (or more) components, as shown in
Figure 18.10. There is one component, A, "in front" of another one, B. A is the visible object a client will see.
The role of A is either to extend or restrict B, or to help in using B. So, this pattern is similar to subclassing,
except that, where a sub-class inherits a method from a base class, the decorator delegates to its decoratee
when it does not have the required method. The advantage is flexibility (see Section 18.4.2): you can combine
several of these components in any order at run time, without having to create a big and complex hierarchy of
subclasses.
Delegation
The general scheme of delegation is:
b = B()
a = A(b)
print a.f()
Everything that class A cannot perform is forwarded to b (provided that class B knows about it).
The decorator enables you to add functionality to another object. Example 18.8 shows a very simple
decorator that prints a sequence in uppercase.
An uppercase sequence class:
import string

class UpSeq:
    def __init__(self, seq):
        self.seq = seq
    def __str__(self):
        return string.upper(self.seq)
    def __getattr__(self, name):
        return getattr(self.seq, name)

>>> s = UpSeq('atc')
>>> print s
ATC
>>> s[0:3]
'atc'
>>> len(s)
3
The proxy rather handles the access to an object. There are several kinds of proxy:
protection proxy: to protect the access to an object.
virtual proxy: to physically fetch data only when needed. Database dictionaries in
Biopython work this way:
prosite = Bio.Prosite.ExPASyDictionary()
entry = prosite['PS00079']
Data are fetched only when an access to an entry is actually requested.
remote proxy: to give access to a program that runs elsewhere. The Pise wrapper, for
instance, enables you to run such a program and get the result from a Python program,
through an interface defined in the Python language:
factory = PiseFactory()
golden = factory.program("golden", db="embl", query="MMVASP")
job = golden.run()
print job.content()
Composite: this pattern is often used to handle complex composite recursive structures. Example 18.9 shows
a set of classes for a tree structure, illustrated in Figure 18.11. The main idea of the composite design pattern
is to provide a uniform interface to instances from different classes in the same hierarchy, where instances
are all components of the same composite complex object. In Example 18.9, you have two types of nodes,
Node and Leaf, but you want a similar interface for them, defined at least by a common base class,
AbstractNode, with two operations: printing and subtree(). These operations should be callable on any node
instance, without knowing its actual sub-class.
>>> t1 = Node ( Leaf ( 'A', 0.71399),
                Node ( Node ( Leaf('B', -0.00804),
                              Leaf('C', 0.07470),
                              0.15685),
                       Leaf ('D', -0.04732)
                     )
              )
>>> print t1
(A: 0.71399, ((B: -0.00804, C: 0.0747), D: -0.04732))
>>> t2 = t1.right.subtree()
>>> print t2
((B: -0.00804, C: 0.0747), D: -0.04732)
>>> t3 = t1.left.subtree()
>>> print t3
A: 0.71399
A composite tree
class AbstractNode:
    def __str__(self):
        pass
    def subtree(self):
        pass

class Node(AbstractNode):
    def __init__(self, left=None, right=None, length=None):
        self.left = left
        self.right = right
        self.length = length
    def __str__(self):
        return "(" + self.left.__str__() + ", " + self.right.__str__() + ")"
    def subtree(self):
        return Node(self.left, self.right)

class Leaf(AbstractNode):
    def __init__(self, name, length=None):
        self.name = name
        self.length = length
        self.left = None
        self.right = None
    def __str__(self):
        return self.name + ": " + str(self.length)
    def subtree(self):
        return Leaf(self.name, self.length)
Abstract class AbstractNode is the base class for both Node and Leaf.
Internal nodes are instances of the Node class.
Leaves are instances of the Leaf class.
Behavioral patterns. Patterns of this category are very useful in sequence analysis, where you often have to combine
algorithms and to analyze complex data structure in a flexible way.
Template: this pattern consists in separating the invariant part of an algorithm from the variant part. In a
sorting procedure, you can generally separate the function which compares items from the main body of the
algorithm. The template method, in this case, is the sort() method, whereas compare() can be defined by
each subclass depending on its implementation or data types. In the dynamic programming align function, the
template methods can be the begin, end and inner methods. The Scoring method may vary depending on the
preferred scoring scheme, as defined in subclasses of the Scoring class.
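The sorting example can be sketched directly. This is an illustrative template method (the class names are assumptions): sort() is the invariant skeleton, compare() the hook that subclasses override:

```python
class Sorter:
    # Template method: sort() is the invariant part of the algorithm;
    # compare() is the step that subclasses redefine.
    def compare(self, a, b):
        return a < b

    def sort(self, items):
        items = list(items)
        for i in range(1, len(items)):       # simple insertion sort
            j = i
            while j > 0 and self.compare(items[j], items[j - 1]):
                items[j], items[j - 1] = items[j - 1], items[j]
                j -= 1
        return items


class ReverseSorter(Sorter):
    def compare(self, a, b):
        return a > b


asc = Sorter().sort([3, 1, 2])
desc = ReverseSorter().sort([3, 1, 2])
```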
Strategy: it is the object-oriented equivalent of passing a function as an argument. The Scoring class is a
strategy.
Variations on methods: you don't necessarily need inheritance to have a function as a parameter. For
instance, the following function enables you to provide your own functions for score_gap and compare:
def align(matrice, begin, inner, end, score_gap, compare):
    ...
So, do we need a subclass for this, or even to have a Scoring class at all? The answer is that with classes and
inheritance, your methods are all pooled together in a "packet", but there is some additional burden on
your side, since you have to define a subclass and instantiate it. On the other hand, passing a function as a
parameter has a limit: you can't change the default values for parameters, such as the default value for
gap initiation or extension in our example.
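The function-as-strategy alternative can be sketched in miniature (the score function and its defaults are hypothetical, loosely inspired by the alignment example):

```python
def default_compare(a, b):
    return a == b


def score(seq1, seq2, compare=default_compare, match=1, mismatch=-1):
    # The comparison function is a strategy passed as a plain
    # argument; keyword defaults keep the common case convenient.
    total = 0
    for a, b in zip(seq1, seq2):
        total += match if compare(a, b) else mismatch
    return total


plain = score("ATC", "ATG")                                     # strict comparison
lax = score("atc", "ATG", compare=lambda a, b: a.upper() == b.upper())
```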
Iterator: an iterator is an object that lets you browse a sequence of items from the beginning to the end.
Generally, it provides:
a method to start iteration
a method to get the next item
a method to test for the end of the iteration
Usually, one distinguishes internal versus external iterators. An external iterator is an iterator which enables
you to do a for or a while loop on a range of values that are returned by the iterator:
for e in l.elements():
    f(e)
or:
i = l.iterator()
e = i.next()
while e:
    f(e)
    e = i.next()
In the above examples, you control the loop. On the other hand, an internal iterator just lets you define a
function or a method (say, f) to apply to all elements:
l.iterate(f)
In the Biopython package, files and databases are generally available through an iterator:
handle = open(...)
iter = Bio.Fasta.Iterator(handle)
seq = iter.next()
while seq:
    print seq.name
    print seq.seq
    seq = iter.next()
handle.close()
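In modern Python the three operations (start, next, end-test) are folded into the iterator protocol itself: `__iter__`, `__next__` and the StopIteration exception. A small sketch (the CountDown class is an invented example):

```python
class CountDown:
    # __iter__ starts the iteration, __next__ yields the next item,
    # and StopIteration signals the end, so for-loops work directly.
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


values = list(CountDown(3))
```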
Visitor: this pattern is useful to specify a function that will be applied to each item of a collection. The Python map function provides a way to use visitors, such as the f function, which visits each item of the list l in turn:

>>> def f(x):
...     return x + 1
>>> l = [0, 1, 2]
>>> map(f, l)
[1, 2, 3]
Observer: the observer pattern provides a framework for maintaining a consistent distributed state between loosely coupled components. One agent, the observer, is in charge of maintaining a list of subscribers, i.e. components that have subscribed to be informed about changes in a given state. Whenever a change occurs in that state, the observer has to inform each subscriber about it.
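A minimal sketch of the pattern as just described (the Observable class and its method names are illustrative):

```python
class Observable(object):
    def __init__(self):
        self._subscribers = []
        self._state = None

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_state(self, value):
        self._state = value
        for callback in self._subscribers:   # inform every subscriber
            callback(value)

seen = []
model = Observable()
model.subscribe(seen.append)   # a "view" that refreshes on every edit
model.set_state("edited")
```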
MVC
A well-known example is the Model-View-Controller framework. The view components, the ones that actually display data, subscribe to "edit events" in order to refresh and redisplay the data whenever a change occurs. MVC can be organized around Model, View and Controller classes.
deque objects

deque([iterable])
Returns a new deque object initialized left-to-right (using append()) with data from iterable. If iterable is not specified, the new deque is empty.

Deques are a generalization of stacks and queues (the name is pronounced ``deck'' and is short for ``double-ended queue''). Deques support thread-safe, memory-efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for "pop(0)" and "insert(0, v)" operations which change both the size and position of the underlying data representation. New in version 2.4.
Deque objects support the following methods:
append(x)
Add x to the right side of the deque.
appendleft(x)
Add x to the left side of the deque.
clear()
Remove all elements from the deque leaving it with length 0.
extend(iterable)
Extend the right side of the deque by appending elements from the iterable argument.
extendleft(iterable)
pop()
Remove and return an element from the right side of the deque. If no elements are
present, raises an IndexError.
popleft()
Remove and return an element from the left side of the deque. If no elements are
present, raises an IndexError.
remove(value)
Remove the first occurrence of value. If not found, raises a ValueError. New in version 2.5.
rotate(n)
Rotate the deque n steps to the right. If n is negative, rotate to the left. Rotating one step
to the right is equivalent to: "d.appendleft(d.pop())".
In addition to the above, deques support iteration, pickling, "len(d)", "reversed(d)",
"copy.copy(d)", "copy.deepcopy(d)", membership testing with the in operator, and
subscript references such as "d[-1]".
Note: extendleft() extends the left side of the deque by appending elements from iterable; the series of left appends results in reversing the order of elements in the iterable argument.

Example:

>>> from collections import deque
>>> d = deque('ghi')             # make a new deque with three items
>>> for elem in d:               # iterate over the deque's elements
...     print elem.upper()
G
H
I
>>> d.append('j')                # add a new entry to the right side
>>> d.appendleft('f')            # add a new entry to the left side
>>> d                            # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                      # return and remove the rightmost item
'j'
>>> d.popleft()                  # return and remove the leftmost item
'f'
>>> list(d)                      # list the contents of the deque
['g', 'h', 'i']
>>> d[-1]                        # peek at rightmost item
'i'
>>> list(reversed(d))            # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d                     # search the deque
True
>>> d.extend('jkl')              # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1)                  # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1)                 # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> deque(reversed(d))           # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear()                    # empty the deque
>>> d.pop()                      # cannot pop from an empty deque
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in -toplevel-
    d.pop()
IndexError: pop from an empty deque

>>> d.extendleft('abc')          # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])
Recipes
This section shows various approaches to working with deques.
The rotate() method provides a way to implement deque slicing and deletion. For
example, a pure python implementation of del d[n] relies on the rotate() method to
position elements to be popped:
def delete_nth(d, n):
    d.rotate(-n)
    d.popleft()
    d.rotate(n)
To implement deque slicing, use a similar approach applying rotate() to bring a target
element to the left side of the deque. Remove old entries with popleft(), add new
entries with extend(), and then reverse the rotation.
With minor variations on that approach, it is easy to implement Forth style stack
manipulations such as dup, drop, swap, over, pick, rot, and roll.
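A few of those Forth-style words might be sketched like this (a rough sketch; the right end of the deque is treated as the top of the stack):

```python
from collections import deque

def dup(d):
    d.append(d[-1])        # duplicate the top item

def drop(d):
    d.pop()                # discard the top item

def swap(d):
    a, b = d.pop(), d.pop()
    d.append(a)
    d.append(b)            # exchange the two top items

def over(d):
    d.append(d[-2])        # copy the second item onto the top

s = deque([10, 20, 30])
dup(s)    # 10 20 30 30
drop(s)   # 10 20 30
swap(s)   # 10 30 20
over(s)   # 10 30 20 30
```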
A roundrobin task server can be built from a deque using popleft() to select the
current task and append() to add it back to the tasklist if the input stream is not
exhausted:
def roundrobin(*iterables):
    pending = deque(iter(i) for i in iterables)
    while pending:
        task = pending.popleft()
        try:
            yield task.next()
        except StopIteration:
            continue
        pending.append(task)

>>> for value in roundrobin('abc', 'd', 'efgh'):
...     print value
a
d
e
b
f
c
g
h
Multi-pass data reduction algorithms can be succinctly expressed and efficiently coded by
extracting elements with multiple calls to popleft(), applying the reduction function,
and calling append() to add the result back to the queue.
For example, building a balanced binary tree of nested lists entails reducing two adjacent
nodes into one by grouping them in a list:
def maketree(iterable):
    d = deque(iterable)
    while len(d) > 1:
        pair = [d.popleft(), d.popleft()]
        d.append(pair)
    return list(d)

>>> print maketree('abcdefgh')
[[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]
Heap queue

This module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.

Heaps are arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that heap[0] is always its smallest element.

The API below differs from textbook heap algorithms in two aspects: (a) We use zero-based indexing. This makes the relationship between the index for a node and the indexes for its children slightly less obvious, but is more suitable since Python uses zero-based indexing. (b) Our pop method returns the smallest item, not the largest (called a "min heap" in textbooks; a "max heap" is more common in texts because of its suitability for in-place sorting).

These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!
To create a heap, use a list initialized to [], or you can transform a populated list into a
heap via function heapify().
The following functions are provided:
heappush(heap, item)
Push the value item onto the heap, maintaining the heap invariant.
heappop(heap)
Pop and return the smallest item from the heap, maintaining the heap invariant. If the
heap is empty, IndexError is raised.
heapify(x)
Transform list x into a heap, in-place, in linear time.
heapreplace(heap, item)
Pop and return the smallest item from the heap, and also push the new item. The heap
size doesn't change. If the heap is empty, IndexError is raised. This is more efficient
than heappop() followed by heappush(), and can be more appropriate when using a
fixed-size heap. Note that the value returned may be larger than item! That constrains
reasonable uses of this routine unless written as part of a conditional replacement:
if item > heap[0]:
    item = heapreplace(heap, item)
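That conditional-replacement pattern is what makes a fixed-size heap useful for keeping, say, the n largest items seen so far. A small sketch (largest_n is an invented name; it assumes the input yields at least n items):

```python
from heapq import heapify, heapreplace

def largest_n(iterable, n):
    # Seed a fixed-size min-heap with the first n items.
    it = iter(iterable)
    heap = [next(it) for _ in range(n)]
    heapify(heap)
    for item in it:
        if item > heap[0]:            # the conditional replacement above
            heapreplace(heap, item)   # drop the smallest, push the new item
    return sorted(heap)
```

Because the heap only ever holds n items, this uses constant memory no matter how long the input is.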
Example of use:

>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
...     heappush(heap, item)
...
>>> sorted = []
>>> while heap:
...     sorted.append(heappop(heap))
...
>>> print sorted
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> data.sort()
>>> print data == sorted
True
>>>
The module also offers two general purpose functions based on heaps: nlargest(n, iterable) and nsmallest(n, iterable), which return lists of the n largest and n smallest elements of the iterable, respectively (new in version 2.4).
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for all k, counting elements
from 0. For the sake of comparison, non-existing elements are considered to be infinite.
The interesting property of a heap is that a[0] is always its smallest element.
The strange invariant above is meant to be an efficient memory representation for a tournament. The numbers below are k, not a[k]:

                                 0

               1                                  2

       3               4                 5               6

   7       8       9       10       11      12      13      14

 15 16   17 18   19 20   21 22    23 24   25 26   27 28   29 30
In the tree above, each cell k is topping 2*k+1 and 2*k+2. In a usual binary tournament we
see in sports, each cell is the winner over the two cells it tops, and we can trace the
winner down the tree to see all opponents s/he had. However, in many computer
applications of such tournaments, we do not need to trace the history of a winner. To be
more memory efficient, when a winner is promoted, we try to replace it by something else
at a lower level, and the rule becomes that a cell and the two cells it tops contain three
different items, but the top cell "wins" over the two topped cells.
If this heap invariant is protected at all times, index 0 is clearly the overall winner. The
simplest algorithmic way to remove it and find the "next" winner is to move some loser
(let's say cell 30 in the diagram above) into the 0 position, and then percolate this new 0
down the tree, exchanging values, until the invariant is re-established. This is clearly
logarithmic on the total number of items in the tree. By iterating over all items, you get an
O(n log n) sort.
A nice feature of this sort is that you can efficiently insert new items while the sort is going
on, provided that the inserted items are not "better" than the last 0'th element you
extracted. This is especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled time. When an event schedules other events for execution, they are scheduled into the future, so they can
event schedule other events for execution, they are scheduled into the future, so they can
easily go into the heap. So, a heap is a good structure for implementing schedulers (this
is what I used for my MIDI sequencer :-).
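A minimal sketch of such an event scheduler, using (time, action) tuples so that heap[0] is always the earliest event (the Scheduler class here is illustrative, not the sched module):

```python
import heapq

class Scheduler(object):
    def __init__(self):
        self.events = []                     # a heap of (time, action)

    def schedule(self, when, action):
        heapq.heappush(self.events, (when, action))

    def run_until(self, now):
        # Fire every event whose time has come; heap[0] is always
        # the earliest scheduled event, so we only inspect the top.
        fired = []
        while self.events and self.events[0][0] <= now:
            when, action = heapq.heappop(self.events)
            fired.append(action())
        return fired

s = Scheduler()
s.schedule(5, lambda: "late")
s.schedule(1, lambda: "early")
first = s.run_until(3)    # only the early event has come due
second = s.run_until(10)  # now the late one fires too
```

Note that an event handler may itself call schedule() with a future time, which is exactly the insert-while-sorting property described above.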
Various structures for implementing schedulers have been extensively studied, and heaps are good for this, as they are reasonably speedy, the speed is almost constant, and the worst case is not much different than the average case. However, there are other representations which are more efficient overall, yet the worst cases might be terrible.
Heaps are also very useful in big disk sorts. You most probably all know that a big sort implies producing "runs" (which are pre-sorted sequences, whose size is usually related to the amount of CPU memory), followed by merging passes for these runs, which merging is often very cleverly organised [1]. It is very important that the initial sort produces the longest runs possible. Tournaments are a good way to achieve that. If, using all the memory available to hold a tournament, you replace and percolate items that happen to fit the current run, you'll produce runs which are twice the size of the memory for random input, and much better for input fuzzily ordered.
Moreover, if you output the 0'th item on disk and get an input which may not fit in the
current tournament (because the value "wins" over the last output value), it cannot fit in
the heap, so the size of the heap decreases. The freed memory could be cleverly reused
immediately for progressively building a second heap, which grows at exactly the same
rate the first heap is melting. When the first heap completely vanishes, you switch heaps
and start a new run. Clever and quite effective!
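The run-generation idea above can be sketched with two heaps, as the text describes: items that still fit the current run go back into the live heap, while "losers" wait in a second heap that becomes the next run (make_runs is an invented name; the memory parameter models the tournament size):

```python
import heapq
from itertools import islice

def make_runs(items, memory=3):
    source = iter(items)
    heap = list(islice(source, memory))      # fill the tournament
    heapq.heapify(heap)
    runs, run, next_heap = [], [], []
    while heap:
        smallest = heapq.heappop(heap)       # output the current winner
        run.append(smallest)
        for item in islice(source, 1):       # pull one replacement, if any
            if item >= smallest:
                heapq.heappush(heap, item)   # still fits the current run
            else:
                next_heap.append(item)       # must wait for the next run
        if not heap:                         # the first heap has melted
            runs.append(run)
            run, heap = [], next_heap
            heapq.heapify(heap)
            next_heap = []
    if run:
        runs.append(run)
    return runs
```

Each emitted run is sorted, and on random input the runs come out noticeably longer than the memory size, just as the text claims.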
In a word, heaps are useful memory structures to know. I use them in a few applications,
and I think it is good to keep a `heap' module around. :-)
Footnotes

[1] The disk balancing algorithms which are current, nowadays, are more annoying than clever, and this is a consequence of the seeking capabilities of the disks. On devices which cannot seek, like big tape drives, the story was quite different, and one had to be very clever to ensure (far in advance) that each tape movement would be the most effective possible (that is, would best participate at "progressing" the merge). Some tapes were even able to read backwards, and this was also used to avoid the rewinding time. Believe me, real good tape sorts were quite spectacular to watch! From all times, sorting has always been a Great Art! :-)
Event scheduler
The sched module defines a class which implements a general purpose event scheduler:
class scheduler(timefunc, delayfunc)
The scheduler class defines a generic interface to scheduling events. It needs two
functions to actually deal with the ``outside world'' -- timefunc should be callable without
arguments, and return a number (the ``time'', in any units whatsoever). The delayfunc
function should be callable with one argument, compatible with the output of timefunc,
and should delay that many time units. delayfunc will also be called with the argument 0
after each event is run to allow other threads an opportunity to run in multi-threaded
applications.
Example:
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(): print "From print_time", time.time()
...
>>> def print_some_times():
...     print time.time()
...     s.enter(5, 1, print_time, ())
...     s.enter(10, 1, print_time, ())
...     s.run()
...     print time.time()
...
>>> print_some_times()
930343690.257
From print_time 930343695.274
From print_time 930343700.273
930343700.276
Mutex

A mutex has two pieces of state: a "locked" bit and a queue. When the mutex is not locked, the queue is empty. Otherwise, the queue contains zero or more (function, argument) pairs representing functions (or methods) waiting to acquire the lock. When the mutex is unlocked while the queue is not empty, the first queue entry is removed and its function(argument) pair called, implying it now has the lock. Of course, no multi-threading is implied - hence the funny interface for lock(), where a function is called once the lock is acquired.
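The described behaviour can be sketched as a small class (modelled on the old mutex module's interface; this is an illustration, not the module itself):

```python
from collections import deque

class Mutex(object):
    def __init__(self):
        self.locked = False
        self.queue = deque()      # waiting (function, argument) pairs

    def testandset(self):
        # Atomic in spirit only: grab the lock if it is free.
        if not self.locked:
            self.locked = True
            return True
        return False

    def lock(self, function, argument):
        if self.testandset():
            function(argument)    # got the lock immediately
        else:
            self.queue.append((function, argument))

    def unlock(self):
        if self.queue:            # hand the lock to the next waiter
            function, argument = self.queue.popleft()
            function(argument)
        else:
            self.locked = False

order = []
m = Mutex()
m.lock(order.append, "first")     # runs at once
m.lock(order.append, "second")    # queued until unlock()
m.unlock()                        # "second" now runs, lock stays held
m.unlock()                        # queue empty: lock finally released
```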
Threading
Threading is a technique for decoupling tasks which are not sequentially dependent. Threads
can be used to improve the responsiveness of applications that accept user input while other
tasks run in the background. A related use case is running I/O in parallel with computations in
another thread.
The following code shows how the high level threading module can run tasks in background
while the main program continues to run:
import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print 'Finished background zip of: ', self.infile

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print 'The main program continues to run in foreground.'
background.join()    # wait for the background task to finish
Queue

The Queue module is often used for inter-thread communication. This small example shows a single Queue being created, as well as a Receiver object and a Sender object. The Sender puts messages into the Queue, which the Receiver receives and prints out:

import threading
from Queue import Queue

class Receiver(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            x = self.queue.get() # blocks
            print x

class Sender(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            self.queue.put("Hello")
            self.queue.put("from")
            self.queue.put("the")
            self.queue.put("sender!")
            break

q = Queue()
r = Receiver(q) # pass in the Queue
s = Sender(q)   # pass in the same Queue
r.start()
s.start() # causes messages to get sent, which Receiver will print
s.join()  # only wait for s to end
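Note that the Receiver above never terminates, since get() blocks forever once the Sender is done. A common variation (sketched below in a version-neutral style with a plain function instead of a Thread subclass) is to send a sentinel value telling the receiver when to stop; the sentinel is a convention, not part of the Queue API:

```python
import threading
try:
    from queue import Queue      # current spelling of the module
except ImportError:
    from Queue import Queue      # spelling used in the text

SENTINEL = None                  # agreed-upon "no more messages" marker
received = []

def receiver(q):
    while True:
        x = q.get()              # blocks until an item is available
        if x is SENTINEL:
            break
        received.append(x)

q = Queue()
t = threading.Thread(target=receiver, args=(q,))
t.start()
for word in ("Hello", "from", "the", "sender!"):
    q.put(word)
q.put(SENTINEL)                  # ask the receiver to finish
t.join()                         # now the join can actually complete
```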
If you have a process that you want to do several things at the same time, threads may be the answer for you. They let you set up a series of processes (or sub-processes), each of which can be run independently, but which can be brought back together later and/or co-ordinated as they run.

For many applications threads are overkill, but on some occasions they can be useful.
A PYTHON APPLICATION WHERE THREADS WOULD HELP
Let's say that you want to check the availability of many computers on a network ... you'll use ping. But there's a problem - if you "ping" a host that's not running, it takes a while to time out, so that when you check through a whole lot of systems that aren't responding - at the very time a quick response is probably needed - it can take an age.
Here's a Python program that "ping"s 10 hosts in sequence.
import os
import re
import time
import sys

lifeline = re.compile(r"(\d) received")
report = ("No response","Partial Response","Alive")

print time.ctime()
for host in range(60,70):
    ip = "192.168.200."+str(host)
    pingaling = os.popen("ping -q -c2 "+ip,"r")
    while 1:
        line = pingaling.readline()
        if not line: break
        igot = re.findall(lifeline,line)
        if igot:
            print "Status from ",ip,"is",report[int(igot[0])]
print time.ctime()
That was 28 seconds - in other words, an extra 3 seconds per unavailable host.
THE SAME APPLICATION, WRITTEN USING PYTHON THREADS
I'll write the application and test it first ... then add a few notes at the bottom.
import os
import re
import time
import sys
from threading import Thread
class testit(Thread):
    def __init__ (self,ip):
        Thread.__init__(self)
        self.ip = ip
        self.status = -1
    def run(self):
        pingaling = os.popen("ping -q -c2 "+self.ip,"r")
        while 1:
            line = pingaling.readline()
            if not line: break
            igot = re.findall(testit.lifeline,line)
            if igot:
                self.status = int(igot[0])

testit.lifeline = re.compile(r"(\d) received")
report = ("No response","Partial Response","Alive")

print time.ctime()
pinglist = []
for host in range(60,70):
    ip = "192.168.200."+str(host)
    current = testit(ip)
    pinglist.append(current)
    current.start()
for pingle in pinglist:
    pingle.join()
    print "Status from ",pingle.ip,"is",report[pingle.status]
print time.ctime()
And running:
3 seconds - much more acceptable than the 28 seconds that we got when we pinged the hosts
one by one and waited on each.
HOW DOES IT WORK?
We're going to run code concurrently to ping each host computer and (being Python), we create
an object for each of the concurrent tests (threads) we wish to run. Each of these objects inherits
from the Thread class so that we can use all of the logic already written in Python to provide our
parallelism.
Although the constructor builds the thread, it does not start it; rather, it leaves the thread-based object at the starting gate. The start method on the testit object actually triggers it off: internally, start invokes the run method of the testit class and also returns to the calling code. In other words, this is the point at which the parallel processing actually starts, and the run method is called indirectly (via a callback, in a similar way to how sort and map make callbacks).
Once parallel processes have started, you'll want some way to bring the responses back together
at the end and in this first simple example, I've used a join. Note that this is NOT the same join
method that you have in the string class ;-)
A join waits for a run method to terminate. So, having spawned a series of 10 pings in our
example, our code waits for them to finish .... and it waits in the order that they were started.
Some will be queuing up and completed long before our loop of joins gets to them, but that
doesn't matter!
If you're going to be writing a threaded application, there are (broadly) two main approaches you
can take. The example that I've shown above uses a separate thread to take each request
through from beginning to end and all the threads have the same structure. An alternative
strategy is to write a series of "worker" threads each of which performs a step in a multistep
process, and have them passing data on to one another. It's the difference between having an
employee to walk each order through your factory and having employees at a production line
passing each job on.
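The "production line" strategy can be sketched with one queue between each pair of workers (a rough sketch; the two processing steps and the None sentinel convention are invented for illustration):

```python
import threading
try:
    from queue import Queue      # current spelling of the module
except ImportError:
    from Queue import Queue      # older spelling used in this document

def worker(inbox, outbox, step):
    # Each "employee" performs one step, then passes the job along.
    while True:
        job = inbox.get()
        if job is None:          # sentinel: forward it and stop
            outbox.put(None)
            break
        outbox.put(step(job))

q1, q2, q3 = Queue(), Queue(), Queue()
t1 = threading.Thread(target=worker, args=(q1, q2, str.upper))
t2 = threading.Thread(target=worker, args=(q2, q3, lambda s: s + "!"))
t1.start(); t2.start()
for job in ("ping", "pong"):
    q1.put(job)
q1.put(None)
t1.join(); t2.join()

results = []
while True:
    item = q3.get()
    if item is None:
        break
    results.append(item)
```

Because each queue handles its own locking, the workers never touch shared state directly.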
Where you're running threads, you have to be very much aware of the effect they can have on one another. As soon as you have two workers, their work may interfere, and you get involved in synchronisation (locking) of objects to ensure that this doesn't happen. Locking / synchronisation brings its own further complications, in that you have to avoid "deadlocks", where two threads both require two resources ... each grabs the first, then waits for the second, which (because the other has it) never becomes available.
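A standard cure is to impose one fixed global order on lock acquisition, so no thread can hold one resource while waiting for the other; a sketch (lock_a and lock_b are illustrative names):

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
log = []

def use_both(name):
    lock_a.acquire()     # every thread locks A first...
    lock_b.acquire()     # ...and only then B, so no circular wait forms
    log.append(name)
    lock_b.release()
    lock_a.release()

threads = [threading.Thread(target=use_both, args=("t%d" % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```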
FOOTNOTES
We've used the operating system's ping process in this example program. Ping responses vary
between operating systems and you may need to alter the ping command and regular expression
to match the response. The example above has been tested on Fedora Core Linux.
Threading makes heavy use of operating system capabilities and is NOT as portable (no matter what language you're programming in) as most code.
Let us say you write, in Python, a nifty utility that lets you filter your mail. You build a GUI frontend using PyGTK. Now if you embed the filter code in the frontend, you risk making the application unresponsive (you still have a dial-up connection, and any server interaction entails a considerable waiting time). Since you don't work at Microsoft, you decide this is unacceptable, and thus you start a separate thread each time you want to filter your mail.

Thus threads increase the responsiveness of your programs. Threads can also increase the efficiency and speed of a program, not to mention the algorithmic simplicity. Combined with the power of Python, this makes programming in Python very attractive indeed.
The Basics
Let us first see how to start a simple thread. Threading is supported via the thread and
threading modules. These modules are supposed to be optional, but if you use an OS that
doesn't support threading, you'd better switch to Linux.
The code given below runs a simple thread in the background.
#!/usr/bin/env python
import time
import thread

def myfunction(string,sleeptime,*args):
    while 1:
        print string
        time.sleep(sleeptime) # sleep for a specified amount of time

if __name__=="__main__":
    thread.start_new_thread(myfunction,("Thread No:1",2))
    while 1:pass
We start a new thread by using the start_new_thread() function, which takes the address of the object to be run, along with the arguments to be passed to it, given as a tuple.
Locks
Now that we have one thread running, running multiple threads is as simple as calling
start_new_thread() multiple times. The problem now would be to synchronize the many threads
which we would be running. Synchronization is done using a Lock object. Locks are created
using the allocate_lock() factory function.
Locks are used as mutex objects, for handling critical sections of code. A thread enters the critical section by calling the acquire() method, which can be either blocking or non-blocking. A thread exits the critical section by calling the release() method.
The following listing shows how to use the Lock object.
#!/usr/bin/env python
import time
import thread

def myfunction(string,sleeptime,lock,*args):
    while 1:
        #entering critical section
        lock.acquire()
        print string," Now Sleeping after Lock acquired for ",sleeptime
        time.sleep(sleeptime)
        print string," Now releasing lock and then sleeping again"
        lock.release()
        #exiting critical section
        time.sleep(sleeptime) # why?

if __name__=="__main__":
    lock=thread.allocate_lock()
    thread.start_new_thread(myfunction,("Thread No:1",2,lock))
    thread.start_new_thread(myfunction,("Thread No:2",2,lock))
    while 1:pass
The code given above is fairly straight forward. We call lock.acquire() just before entering the
critical section and then call lock.release() to exit the critical section.
The inquisitive reader now may be wondering why we sleep after exiting the critical section.
Let us examine the output of the above listing.
Output.
Here every thread is given an opportunity to enter the critical section. But the same cannot be
said if we remove time.sleep(sleeptime) from the above listing.
Why does this happen? The answer lies in the fact that Python is not fully thread-safe. Unlike Java, where threading was considered so important that it is part of the syntax, in Python threads were laid down at the altar of portability.
Not all built-in functions that may block waiting for I/O allow other threads to run. (The most
popular ones (time.sleep(), file.read(), select.select()) work as expected.)
It is not possible to interrupt the acquire() method on a lock -- the KeyboardInterrupt exception
will happen after the lock has been acquired.
What this means is that quite probably any code like the following:
while 1:
    lock.acquire()
    .....
    #some operation
    .....
    lock.release()

cannot be interrupted with Ctrl-C while it is waiting in acquire(): the KeyboardInterrupt will only be delivered once the lock has been acquired.
Currently, the Python interpreter (Python 2.3.4) is not thread-safe. There are no priorities and no thread groups; threads cannot be stopped, suspended, resumed or interrupted. That is, the support provided is very basic. However, a lot can still be accomplished with this meagre support through the use of the threading module, as we shall see in the following sections. One of the main reasons is that in actuality only one thread is running at a time, because of something called the Global Interpreter Lock (GIL). In order to support multi-threaded Python
programs, there's a global lock that must be held by the current thread before it can safely access
Python objects. Without the lock competing threads could cause havoc, for example: when two
threads simultaneously increment the reference count of the same object, the reference count
could end up being incremented only once instead of twice. Thus only the thread that has
acquired the GIL may operate on Python Objects or call Python C API functions.
In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock, by default every 10 bytecode instructions. This can however be changed using the sys.setcheckinterval() function. The lock is also released and reacquired around
potentially blocking I/O operations like reading or writing a file, so that other threads can run
while the thread that requests the I/O is waiting for the I/O operation to complete.
In particular note:

The Python interpreter keeps some book-keeping info per thread, for which it uses a data structure called PyThreadState. Earlier the state was stored in global variables and switching threads could cause problems. In particular, exception handling is now thread-safe, when the application uses sys.exc_info() to access the exception last raised in the current thread.

There's one global variable left, however: the pointer to the current PyThreadState structure. While most thread packages have a way to store ``per-thread global data,'' Python's internal platform-independent thread abstraction doesn't support this yet. Therefore, the current thread state must be manipulated explicitly.

The global interpreter lock is used to protect the pointer to the current thread state. When releasing the lock and saving the thread state, the current thread state pointer must be retrieved before the lock is released (since another thread could immediately acquire the lock and store its own thread state in the global variable). Conversely, when acquiring the lock and restoring the thread state, the lock must be acquired before storing the thread state pointer.

Schematically: threads 1 through 4 all contend for the single Global Interpreter Lock, and whichever thread currently holds it stores its own PyThreadState pointer in the one global thread-state variable.
Python manages to get a lot done using so little. The threading module uses the built-in thread package to provide some very interesting features that will make your programming a whole lot easier. There are built-in mechanisms which provide critical-section locks, wait/notify locks, etc. In particular we shall look at:
Lock object
RLock object
Semaphore Object
Condition Object
Event Object
Thread Object
Note that you are supposed to call Thread.__init__() if you are overriding __init__().
While we have visited the Lock object in the previous sections, the RLock object is something new. RLock provides a mechanism for a thread to acquire multiple instances of the same lock, each time incrementing the depth of locking when acquiring, and decrementing the depth when releasing. RLock makes it very easy to write code which conforms to the classical Readers-Writers problem. The Semaphore object (rather, the Semaphore object factory) is the general implementation of the semaphore mooted by Dijkstra. We shall understand the implementation of the Condition, Event and Thread objects via some examples.
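The depth-counting behaviour of RLock can be sketched as follows (a plain Lock would deadlock at the point marked in the comment; the function names are illustrative):

```python
import threading

rlock = threading.RLock()

def inner():
    # Re-acquiring here would deadlock with a plain Lock; with an
    # RLock the depth simply goes from 1 to 2 for the owning thread.
    rlock.acquire()
    rlock.release()
    return "re-entered"

def outer():
    rlock.acquire()        # depth 1
    result = inner()       # depth briefly reaches 2
    rlock.release()        # depth back to 0: lock fully released
    return result
```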
#!/usr/bin/env python
import time
from threading import Thread

class MyThread(Thread):
    def __init__(self,bignum):
        Thread.__init__(self)
        self.bignum=bignum
    def run(self):
        for l in range(10):
            for k in range(self.bignum):
                res=0
                for i in range(self.bignum):
                    res+=1

def test():
    bignum=1000
    thr1=MyThread(bignum)
    thr1.start()
    thr1.join()

if __name__=="__main__":
    test()
There are two things to note here: the thread does not start running until the start() method is called, and join() makes the calling thread wait until the thread has finished execution.

So far, so good! However, being ever curious, we wonder whether there are any performance gains in using threads.

It is the practice of every good programmer to profile his code, to find out his weak spots, his strengths, and in general to know his inner soul ;-). And since we are dealing with the Tao of threading in Python, we might as well ask ourselves which is faster: two threads sharing the load, or one heavy-duty brute-force thread?
Which is faster?

thread1                        thread2
--------                       --------
for i in range(bignum):        for i in range(bignum):
    for k in range(bignum):        for k in range(bignum):
        res+=i                         res+=i

or?

thread3
--------
for i in range(bignum):
    for k in range(bignum):
        res+=i
for i in range(bignum):
    for k in range(bignum):
        res+=i

Following the way of the masters, we make no assumptions and let the code do the talking.
Generally there are two ways to profile code in Python: the most common and comprehensive way is to use the profile.run() method, or you can time the execution of the code using time.clock(). We shall do both. Consider the listing shown below.
#!/usr/bin/env python
import time
from threading import Thread

class MyThread(Thread):
    def __init__(self,bignum):
        Thread.__init__(self)
        self.bignum=bignum
    def run(self):
        for l in range(10):
            for k in range(self.bignum):
                res=0
                for i in range(self.bignum):
                    res+=1

def myadd_nothread(bignum):
    for l in range(10):
        for k in range(bignum):
            res=0
            for i in range(bignum):
                res+=1
    for l in range(10):
        for k in range(bignum):
            res=0
            for i in range(bignum):
                res+=1

def thread_test(bignum):
    #We create 2 Thread objects for the 2 threads.
    thr1=MyThread(bignum)
    thr2=MyThread(bignum)
    thr1.start()
    thr2.start()
    thr1.join()
    thr2.join()

def test():
    bignum=1000
    starttime=time.clock()
    thread_test(bignum)
    stoptime=time.clock()
    print "Threaded time:",stoptime-starttime
    starttime=time.clock()
    myadd_nothread(bignum)
    stoptime=time.clock()
    print "Non-threaded time:",stoptime-starttime

if __name__=="__main__":
    test()
The correct way to profile the code is to run both the threaded and non-threaded versions under profile.run().
As we can see there is no significant difference between threaded and non threaded apps.
But doesn't time.time() give the absolute time, you ask? What about context switches? True, but since we are only interested in measuring the total time taken, and not the work distribution, we can ignore context switches (and indeed the code has been structured to ignore them).
Conditions are a way of synchronizing access between multiple threads, which wait for a particular condition to become true before starting any major processing. Condition objects are a very elegant mechanism by which it is possible to implement the Producer-Consumer problem. A Condition takes a lock object or, if none is supplied, creates its own RLock object. A thread waits for a particular condition to become true by using the wait() function, and signals another thread by using the notify() or notifyAll() method.

Let us see how the classical Producer-Consumer problem is solved using this.
#!/usr/bin/env python
import time
from threading import *

class itemQ:
    def __init__(self):
        self.count=0
    def produce(self,num=1):
        self.count+=num
    def consume(self):
        if self.count: self.count-=1
    def isEmpty(self):
        return not self.count

class Producer(Thread):
    def __init__(self,condition,itemq,sleeptime=1):
        Thread.__init__(self)
        self.cond=condition
        self.itemq=itemq
        self.sleeptime=sleeptime
    def run(self):
        cond=self.cond
        itemq=self.itemq
        while 1:
            cond.acquire()            # hold the lock while changing the queue
            print currentThread(),"Produced One Item"
            itemq.produce()
            cond.notify()             # wake a waiting consumer
            cond.release()
            time.sleep(self.sleeptime)
class Consumer(Thread):
    def __init__(self,condition,itemq,sleeptime=2):
        Thread.__init__(self)
        self.cond=condition
        self.itemq=itemq
        self.sleeptime=sleeptime
    def run(self):
        cond=self.cond
        itemq=self.itemq
        while 1:
            time.sleep(self.sleeptime)
            cond.acquire()            # must hold the lock to call wait()
            while itemq.isEmpty():
                cond.wait()           # releases the lock until notified
            itemq.consume()
            print currentThread(),"Consumed One Item"
            cond.release()
if __name__=="__main__":
    q=itemQ()
    cond=Condition()
    pro=Producer(cond,q)
    cons1=Consumer(cond,q)
    cons2=Consumer(cond,q)
    pro.start()
    cons1.start()
    cons2.start()
    while 1: pass
Here the currentThread() function returns the Thread object for the currently running thread.
Note that wait() has an optional argument specifying the number of seconds it should wait
before it times out. I would discourage using this argument, because the timeout is implemented
with a polling mechanism: according to the source code, it polls at least 20 times every second.
Once again I would like to point out how not to use the Condition object. Consider the
following, borrowed from Python-2.3.4/Lib/threading.py:
        if not waiters:
            if __debug__:
                self._note("%s.notify(): no waiters", self)
            return
        self._note("%s.notify(): notifying %d waiter%s", self, n,
                   n!=1 and "s" or "")
        for waiter in waiters:
            waiter.release()
            try:
                __waiters.remove(waiter)
            except ValueError:
                pass

    def notifyAll(self):
        self.notify(len(self.__waiters))

Python-2.3.4/Lib/threading.py
What threading.py does is maintain a list of threads waiting on the current condition. It
then attempts to notify them by removing the first n waiters from the list. And since it removes
the same first n waiters every time, this can potentially cause starvation of certain threads. So to
test our theory, let us modify the Producer-Consumer listing given above by making the
following changes:
cons1=Consumer(cond,q,sleeptime=1)
cons2=Consumer(cond,q,sleeptime=1)
This will potentially cause starvation of one of the threads, depending upon how they are inserted
into the list. In fact, a test run with the above changes shows this to be the case. Thus you
will have to be careful of potential pitfalls when using Python threading. Notice also that
calling notifyAll() is inefficient and should be avoided where possible.
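Whichever waiter happens to be woken, the safe usage pattern is always to re-check the predicate in a loop around wait(). The following minimal sketch shows that pattern (notify_all() is the newer spelling of notifyAll(); the timeout here is only a safety net):

```python
from threading import Condition, Thread

items = []
cond = Condition()

def producer():
    with cond:                    # "with cond" acquires and releases the lock
        items.extend([1, 2, 3])
        cond.notify_all()         # wake every waiter; each re-checks the predicate

def consumer(results):
    with cond:
        while not items:          # always re-check: a wakeup is not a guarantee
            cond.wait(timeout=5)
        results.append(items.pop())

results = []
c = Thread(target=consumer, args=(results,))
c.start()
p = Thread(target=producer)
p.start()
p.join(); c.join()
print(results)
```

Whether the consumer waits first or finds the items already produced, the wait-in-a-loop pattern gives the same result.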
By now you must have a pretty good idea about how to go about threading in Python. To wrap up,
I shall briefly describe the Event and Queue objects and how to use them.
The Event object is actually a thin wrapper around the Condition object, so that we don't have to
mess about with locks. The methods provided are self-explanatory: set(), clear(),
isSet() and wait(timeout). One thing to note is that the Event object uses notifyAll(),
thus use it only when necessary.
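A minimal sketch of the Event object in action (is_set() is the newer spelling of isSet()):

```python
from threading import Event, Thread

go = Event()
log = []

def worker():
    go.wait(timeout=5)                    # block until the event is set
    log.append("started: %s" % go.is_set())

t = Thread(target=worker)
t.start()
go.set()                                  # wakes every waiter at once
t.join()
print(log)
```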
Although Queues don't come under the threading module, they do provide an easy interface
which should be suitable for solving most problems. The main advantage of Queue is that it does
not depend on the threading module, so you can use it with the low-level thread module instead.
Queues are a simple and efficient way of implementing stacks, priority queues and the like, since
they handle both data protection and synchronization. The methods used are put(item,block),
get(block), Queue(maxsize), qsize(), empty() and full().
A simple example using Queues is given in the listing q.py.
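The q.py listing itself is not reproduced here, so the following is only a stand-in sketch of the same idea (note that the module is named Queue in Python 2 and queue in Python 3):

```python
try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2

q = queue.Queue(maxsize=10)   # handles locking and signalling internally
for item in (1, 2, 3):
    q.put(item, block=True)   # blocks if the queue is full

out = []
while not q.empty():
    out.append(q.get(block=False))
print(out)
```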
References

threading.py
http://www.python.org/doc/current/api/threads.html
http://starship.python.net/crew/aahz
from itertools import chain
from types import GeneratorType
from collections import deque

def continuator(gen):
    """ Yielding from generator used inside another generator """
    while True:
        i = gen.next()
        if isinstance(i, GeneratorType):
            gen = chain(i, gen)
        else:
            yield i

class Task:
    def __init__(self, pool):
        self.generator = self.main()
        pool.add(self)
    def main(self):
        "Must be a generator"
        pass

class TaskPool:
    """
    NOTE max speed ~~ 20000 task switches per second per 100MHz
    NOTE using pyrex or psyco ~~ 25% speed improvement
    """
            del tasks[0]
        except IndexError:
            # allow internal exception to propagate
            if len(tasks) > 0: raise
class ExampleTask(Task):
    def __init__(self, pool, name, max_iterations):
        self.name = name
        self.max_iterations = max_iterations
        Task.__init__(self, pool)
    def main(self):
        i = 0
        while i < self.max_iterations:
            print self.name, i
            i += 1
            yield 0

pool = TaskPool()
task_a = ExampleTask(pool, 'AAA', 5)
for i in xrange(100):
    pool.iteration()
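Since the body of TaskPool is only partially shown above, here is a minimal self-contained sketch of such a round-robin generator pool (the MiniPool and counter names are mine, not part of the original listing):

```python
from collections import deque

class MiniPool:
    def __init__(self):
        self.tasks = deque()
    def add(self, gen):
        self.tasks.append(gen)
    def iteration(self):
        # run each task up to its next yield, dropping finished ones
        for _ in range(len(self.tasks)):
            gen = self.tasks.popleft()
            try:
                next(gen)
            except StopIteration:
                continue              # task finished; do not re-queue it
            self.tasks.append(gen)

def counter(name, n, out):
    for i in range(n):
        out.append((name, i))
        yield                         # hand control back to the pool

out = []
pool = MiniPool()
pool.add(counter('AAA', 2, out))
pool.add(counter('BBB', 3, out))
for _ in range(5):
    pool.iteration()
print(out)
```

The interleaved output shows the round-robin switching: the two tasks alternate until the shorter one runs out.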
#microthreading.py
import sys,signal
# credit: original idea was based on an article by David Mertz
# http://gnosis.cx/publish/programming/charming_python_b7.txt

# some example 'microthread' generators
def empty(name):
    """ This is an empty task for demonstration purposes. """
    while True:
        print "<empty process>", name
        yield None

def delay(duration=0.8):
    """ Do nothing at all for 'duration' seconds. """
    import time
    while True:
        print "<sleep %s s.>" % duration
        time.sleep(duration)
        yield None

def terminating(name, max_iterations):
    """ A task that yields a fixed number of times, then returns.
    (Reconstructed: the original definition is not preserved in this copy.) """
    for i in range(max_iterations):
        print name, i
        yield None
class GenericScheduler(object):
    """ The constructor accepts a list of microthreads to run cooperatively;
    a task relinquishes control to the scheduler (via yield) or just completes
    (finishes off) via return. Whenever a task is completed, the scheduler
    supplants it with a no-op task. """
    def __init__(self, threads, stop_asap=False):
        signal.signal(signal.SIGINT, self.shutdownHandler)
        self.shutdownRequest = False
        self.threads = threads
        self.stop_asap = stop_asap
    def shutdownHandler(self, n, frame):
        # (reconstructed) request a clean shutdown on SIGINT
        self.shutdownRequest = True
    def schedule(self):
        def noop( ):
            while True:
                #print '.',
                yield None
        n = len(self.threads)
        while True:
            for i, thread in enumerate(self.threads):
                # enumerate(iterator1) returns an iterator2 producing
                # tuples (count, value from iterator1)
                #print "#",list(enumerate(self.threads)) # [(0,...),...,(5,...)]
                try: thread.next( )
                except StopIteration:
                    if self.stop_asap: return
                    n -= 1
                    if n==0: return
                    self.threads[i] = noop( )
                if self.shutdownRequest:
                    return
if __name__== "__main__":
s = GenericScheduler([ empty('boo'), delay( ), empty('foo'),
terminating('ant', 5), terminating('ar', 9), delay(0.5),
], stop_asap=False)
s.schedule( )
sys.exit(0)
s = GenericScheduler([ empty('boo'), delay( ), empty('foo'),
terminating('fie', 5), delay(0.5),
Schlumberger Private
], stop_asap=False)
s.schedule( )
#priorityqueue.py
"""
simple scheduler
"""
import heapq, time

class Scheduler(object):
    def __init__(self):
        self.queue = []
    def add(self, job, t=0):
        heapq.heappush(self.queue, [t, job])
    def process_job(self):
        t, task = heapq.heappop(self.queue)
        time.sleep(t)
        # the remaining entries hold delays relative to the job just run
        for job in self.queue:
            job[0] -= t
        task.run()
    def process_loop(self):
        while self.queue:
            self.process_job()
class cq:
    """ circular queue """
    def __init__(self,q):
        self.q=q
    def __iter__(self):
        return self
    def next(self):
        self.q=self.q[1:] +[self.q[0]]
        return self.q[-1]
class WakeUp(object):
    def run(self):
        print "Wake up!"

class Heartbeat(object):
    def run(self):
        print "tick"
        #scheduler.add(self, 1)

scheduler = Scheduler()
scheduler.add(Heartbeat(), 1)
scheduler.add(Heartbeat(), 0)
scheduler.add(WakeUp(), 5)
scheduler.process_loop()

cs= [(Heartbeat,0),(Heartbeat,0),(WakeUp,0)]
citer= cq(cs)
#coroutines_gvr.py
import sys, types, collections

class Trampoline:
    """Manage communications between coroutines"""
    def __init__(self):
        self.queue = collections.deque()
    def run(self):
        result = None
        self.running = True
        try:
            while self.running and self.queue:
                func = self.queue.popleft()
                result = func()
            return result
        finally:
            self.running = False
    def stop(self):
        self.running = False
    def schedule(self, coroutine, stack=(), val=None, *exc):
        # (signature and the first lines of resume() reconstructed from the
        # PEP 342 original, from which this listing is taken)
        def resume():
            value = val
            try:
                if exc:
                    value = coroutine.throw(value, *exc)
                else:
                    value = coroutine.send(value)
            except:
                if stack:
                    # send the error back to the "caller"
                    self.schedule(
                        stack[0], stack[1], *sys.exc_info()
                    )
                else:
                    # Nothing left in this pseudothread to
                    # handle it, let it propagate to the
                    # run loop
                    raise
            if isinstance(value, types.GeneratorType):
                # Yielded to a specific coroutine, push the
                # current one on the stack, and call the new
                # one with no args
                self.schedule(value, (coroutine,stack))
            elif stack:
                # Yielded a result, pop the stack and send the
                # value to the caller
                self.schedule(stack[0], stack[1], value)
        self.queue.append(resume)
"""
A simple "echo" server, and code to run it using a trampoline
(presumes the existence of "nonblocking_read",
"nonblocking_write", and other I/O coroutines, that e.g. raise
Schlumberger Private
#stub
def nonblocking_read(sock): pass
def nonblocking_write(sock,data):pass
def listening_socket(host,echo):
return 1
Schlumberger Private
Module threadpool
http://chrisarndt.de/en/software/python/threadpool/api/
Easy to use object-oriented thread pool framework.
A thread pool is an object that maintains a pool of worker threads to perform time-consuming
operations in parallel. It assigns jobs to the threads by putting them in a work request queue, where
they are picked up by the next available thread. The thread then performs the requested operation in
the background and puts the results in another queue.
The thread pool object can then collect the results from all threads from this queue as soon as they
become available, or after all threads have finished their work. It's also possible to define callbacks
to handle each result as it comes in.
Basic usage:
>>> pool = ThreadPool(poolsize)
>>> requests = makeRequests(some_callable, list_of_args, callback)
>>> [pool.putRequest(req) for req in requests]
>>> pool.wait()
See the end of the module code for a brief, annotated usage example.
The basic concept and some code was taken from the book "Python in a Nutshell" by Alex
Martelli, copyright 2003, ISBN 0-596-00188-6, from section 14.5 "Threaded Program
Architecture". I wrapped the main program logic in the ThreadPool class, added the WorkRequest
class and the callback system and tweaked the code here and there. Kudos also to Florent Aide for
the exception handling mechanism.
None
This type has a single value. There is a single object with this value. This object is accessed through the built-in name
None. It is used to signify the absence of a value in many situations, e.g., it is returned from functions that don't
explicitly return anything. Its truth value is false.
NotImplemented
This type has a single value. There is a single object with this value. This object is accessed through the built-in name
NotImplemented. Numeric methods and rich comparison methods may return this value if they do not implement
the operation for the operands provided. (The interpreter will then try the reflected operation, or some other
fallback, depending on the operator.) Its truth value is true.
Ellipsis
This type has a single value. There is a single object with this value. This object is accessed through the built-in name
Ellipsis. It is used to indicate the presence of the "..." syntax in a slice. Its truth value is true.
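The behaviour of these three singletons can be checked directly:

```python
def no_return():
    pass  # no explicit return

# None signals "no value"; its truth value is false
assert no_return() is None
assert bool(None) is False

# Ellipsis is the object behind the "..." slice syntax; its truth value is true
assert bool(Ellipsis) is True

# NotImplemented is returned by rich comparisons that cannot handle the operands
class OnlyInts(object):
    def __eq__(self, other):
        if not isinstance(other, int):
            return NotImplemented  # the interpreter then tries the reflected operation
        return True

print(OnlyInts() == "text")  # both sides decline, so Python falls back to False
```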
Numbers
These are created by numeric literals and returned as results by arithmetic operators and arithmetic built-in
functions. Numeric objects are immutable; once created their value never changes. Python numbers are of course
strongly related to mathematical numbers, but subject to the limitations of numerical representation in computers.
Python distinguishes between integers, floating point numbers, and complex numbers:
Integers
These represent elements from the mathematical set of integers (positive and negative).
There are three types of integers:
Plain integers
These represent numbers in the range -2147483648 through 2147483647. (The range may be
larger on machines with a larger natural word size, but not smaller.) When the result of an
operation would fall outside this range, the result is normally returned as a long integer (in some
cases, the exception OverflowError is raised instead). For the purpose of shift and mask operations,
integers are assumed to have a binary, 2's complement notation using 32 or more bits, and hiding
no bits from the user (i.e., all 4294967296 different bit patterns correspond to different values).
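The 2's complement guarantee for shift and mask operations can be verified directly:

```python
# -1 is "all sign bits"; masking with 32 one-bits exposes the 2's complement pattern
assert -1 & 0xFFFFFFFF == 4294967295

# right-shifting a negative number behaves as if the binary notation were used
assert (-8 >> 1) == -4

# the most negative plain integer masks to the corresponding unsigned bit pattern
assert (-2147483648 & 0xFFFFFFFF) == 2147483648
```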
Long integers
These represent numbers in an unlimited range, subject to available (virtual) memory only. For the
purpose of shift and mask operations, a binary representation is assumed, and negative numbers
are represented in a variant of 2's complement which gives the illusion of an infinite string of sign
bits extending to the left.
Booleans
These represent the truth values False and True. The two objects representing the values False and
True are the only Boolean objects. The Boolean type is a subtype of plain integers, and Boolean
values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that
when converted to a string, the strings "False" or "True" are returned, respectively.
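For example:

```python
# Boolean values behave like 0 and 1 in almost all contexts...
assert True + True == 2
assert isinstance(True, int)       # bool is a subtype of plain integers
assert ["a", "b"][False] == "a"    # usable anywhere an index is expected
# ...except when converted to a string
assert str(True) == "True"
```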
Complex numbers
These represent complex numbers as a pair of machine-level double precision floating point numbers. The
same caveats apply as for floating point numbers. The real and imaginary parts of a complex number z can
be retrieved through the read-only attributes z.real and z.imag.
Sequences
These represent finite ordered sets indexed by non-negative numbers. The built-in function len() returns the number
of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of
sequence a is selected by a[i].
The rules for integer representation are intended to give the most meaningful interpretation of
shift and mask operations involving negative integers and the least surprises when switching
between the plain and long integer domains. Any operation except left shift, if it yields a result in
the plain integer domain without causing overflow, will yield the same result in the long integer
domain or when using mixed operands.
Sequences also support slicing: a[i:j] selects all items with index k such that i <= k < j. When used as an expression, a
slice is a sequence of the same type. This implies that the index set is renumbered so that it starts at 0.
Some sequences also support ``extended slicing'' with a third ``step'' parameter: a[i:j:k] selects all items of a with
index x where x = i + n*k, n >= 0 and i <= x < j.
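For example:

```python
a = [0, 1, 2, 3, 4, 5]

# a[i:j] selects items with index k such that i <= k < j, renumbered from 0
assert a[1:4] == [1, 2, 3]
assert a[1:4][0] == 1          # the slice's own index set starts at 0

# extended slicing a[i:j:k] selects x = i + n*k with n >= 0 and i <= x < j
assert a[0:6:2] == [0, 2, 4]
```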
Immutable sequences
An object of an immutable sequence type cannot change once it is created. (If the object contains
references to other objects, these other objects may be mutable and may be changed; however, the
collection of objects directly referenced by an immutable object cannot change.)
The following types are immutable sequences:
Strings
The items of a string are characters. There is no separate character type; a character is represented
by a string of one item. Characters represent (at least) 8-bit bytes. The built-in functions chr() and
ord() convert between characters and nonnegative integers representing the byte values. Bytes
with the values 0-127 usually represent the corresponding ASCII values, but the interpretation of
values is up to the program. The string data type is also used to represent arrays of bytes, e.g., to
hold data read from a file.
(On systems whose native character set is not ASCII, strings may use EBCDIC in their internal
representation, provided the functions chr() and ord() implement a mapping between ASCII and
EBCDIC, and string comparison preserves the ASCII order. Or perhaps someone can propose a
better rule?)
Unicode
The items of a Unicode object are Unicode code units. A Unicode code unit is represented by a
Unicode object of one item and can hold either a 16-bit or 32-bit value representing a Unicode
ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how
Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and
will be reported as two separate items. The built-in functions unichr() and ord() convert between
code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode
Standard 3.0. Conversions from and to other encodings are possible through the Unicode method
encode() and the built-in function unicode().
Tuples
The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by
comma-separated lists of expressions. A tuple of one item (a `singleton') can be formed by affixing
a comma to an expression (an expression by itself does not create a tuple, since parentheses must
be usable for grouping of expressions). An empty tuple can be formed by an empty pair of
parentheses.
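For example:

```python
empty = ()
singleton = (1,)      # the comma, not the parentheses, makes the tuple
not_a_tuple = (1)     # just a parenthesized expression

assert len(empty) == 0
assert len(singleton) == 1
assert not_a_tuple == 1
assert (1, 2) + (3,) == (1, 2, 3)
```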
Mutable sequences
Mutable sequences can be changed after they are created. The subscription and slicing notations can be
used as the target of assignment and del (delete) statements.
There is currently a single intrinsic mutable sequence type:
Lists
The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list
of expressions in square brackets. (Note that there are no special cases needed to form lists of
length 0 or 1.)
The extension module array provides an additional example of a mutable sequence type.
Mappings
These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item
indexed by k from the mapping a; this can be used in expressions and as the target of assignments or del statements.
The built-in function len() returns the number of items in a mapping.
There is currently a single intrinsic mapping type:
Dictionaries
These represent finite sets of objects indexed by nearly arbitrary values. The only types of values not acceptable as
keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by
object identity, the reason being that the efficient implementation of dictionaries requires a key's hash value to
remain constant. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers
compare equal (e.g., 1 and 1.0) then they can be used interchangeably to index the same dictionary entry.
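For example:

```python
d = {}
d[1] = "one"
d[1.0] = "still one"   # 1 == 1.0, so both index the same entry
assert len(d) == 1
assert d[1] == "still one"

try:
    d[[1, 2]] = "no"   # lists are mutable and compared by value
except TypeError:
    pass               # hence they cannot be used as keys
```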
Dictionaries are mutable; they can be created by the {...} notation (see section 5.2.6, ``Dictionary Displays'').
The extension modules dbm, gdbm, and bsddb provide additional examples of mapping types.
Callable types
These are the types to which the function call operation can be applied:
User-defined functions
A user-defined function object is created by a function definition (see section 7.6, ``Function definitions''). It should
be called with an argument list containing the same number of items as the function's formal parameter list.
Special attributes:

Attribute      Meaning
func_doc       The function's documentation string, or None if unavailable.          Writable
__doc__        Another way of spelling func_doc.                                     Writable
func_name      The function's name.                                                  Writable
__name__       Another way of spelling func_name.                                    Writable
__module__     The name of the module the function was defined in, or None if
               unavailable.                                                          Writable
func_defaults  A tuple containing default argument values for those arguments
               that have defaults, or None if no arguments have a default value.     Writable
func_code      The code object representing the compiled function body.              Writable
func_globals   A reference to the dictionary that holds the function's global
               variables -- the global namespace of the module in which the
               function was defined.                                                 Read-only
func_dict      The namespace supporting arbitrary function attributes.               Writable
func_closure   None or a tuple of cells that contain bindings for the function's
               free variables.                                                       Read-only

Most of the attributes labelled ``Writable'' check the type of the assigned value.
Function objects also support getting and setting arbitrary attributes, which can be used, for example, to attach
metadata to functions. Regular attribute dot-notation is used to get and set such attributes. Note that the current
implementation only supports function attributes on user-defined functions. Function attributes on built-in functions
may be supported in the future.
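For example:

```python
def greet():
    """Say hello."""
    return "hello"

greet.author = "me"            # arbitrary attribute attached to the function
assert greet.author == "me"
assert greet.__name__ == "greet"
assert greet.__doc__ == "Say hello."
```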
Additional information about a function's definition can be retrieved from its code object; see the description of
internal types below.
User-defined methods
A user-defined method object combines a class, a class instance (or None) and any callable object (normally a user-defined function).
Methods also support accessing (but not setting) the arbitrary function attributes on the underlying function object.
User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that
class), if that attribute is a user-defined function object, an unbound user-defined method object, or a class method
object.
When the attribute is a user-defined method object, a new method object is only created if the class from which it is
being retrieved is the same as, or a derived class of, the class stored in the original method object; otherwise, the
original method object is used as it is.
When a user-defined method object is created by retrieving a user-defined function object from a class, its im_self
attribute is None and the method object is said to be unbound. When one is created by retrieving a user-defined
function object from a class via one of its instances, its im_self attribute is the instance, and the method object is said
to be bound. In either case, the new method's im_class attribute is the class from which the retrieval takes place,
and its im_func attribute is the original function object.
Special read-only attributes: im_self is the class instance object, im_func is the function object; im_class is the class
of im_self for bound methods or the class that asked for the method for unbound methods; __doc__ is the method's
documentation (same as im_func.__doc__); __name__ is the method name (same as im_func.__name__);
__module__ is the name of the module the method was defined in, or None if unavailable. Changed in version 2.2:
im_self used to refer to the class that defined the method.
When a user-defined method object is created by retrieving another method object from a class or instance, the
behaviour is the same as for a function object, except that the im_func attribute of the new instance is not the
original method object but its im_func attribute.
When a user-defined method object is created by retrieving a class method object from a class or instance, its
im_self attribute is the class itself (the same as the im_class attribute), and its im_func attribute is the function
object underlying the class method.
When an unbound user-defined method object is called, the underlying function (im_func) is called, with the
restriction that the first argument must be an instance of the proper class (im_class) or of a derived class thereof.
When a user-defined method object is derived from a class method object, the ``class instance'' stored in im_self will
actually be the class itself, so that calling either x.f(1) or C.f(1) is equivalent to calling f(C,1) where f is the underlying
function.
Note that the transformation from function object to (unbound or bound) method object happens each time the
attribute is retrieved from the class or instance. In some cases, a fruitful optimization is to assign the attribute to a
local variable and call that local variable. Also notice that this transformation only happens for user-defined
functions; other callable objects (and all non-callable objects) are retrieved without transformation. It is also
important to note that user-defined functions which are attributes of a class instance are not converted to bound
methods; this only happens when the function is an attribute of the class.
Generator functions
A function or method which uses the yield statement is called a generator function. Such a function, when called,
always returns an iterator object which can be used to execute the body of the function: calling the iterator's next()
method will cause the function to execute until it provides a value using the yield statement. When the function
executes a return statement or falls off the end, a StopIteration exception is raised and the iterator will have reached
the end of the set of values to be returned.
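A minimal generator function illustrates this behaviour (next(g) is the newer spelling of g.next()):

```python
def count_up_to(n):
    i = 0
    while i < n:
        yield i      # each next() resumes execution here
        i += 1

g = count_up_to(3)
assert next(g) == 0
assert next(g) == 1
assert next(g) == 2
try:
    next(g)          # falling off the end raises StopIteration
except StopIteration:
    pass
```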
When a bound user-defined method object is called, the underlying function (im_func) is called, inserting the class
instance (im_self) in front of the argument list. For instance, when C is a class which contains a definition for a
function f(), and x is an instance of C, calling x.f(1) is equivalent to calling C.f(x, 1).
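This equivalence is easy to verify:

```python
class C(object):
    def f(self, a):
        return a + 1

x = C()
# calling the bound method inserts the instance in front of the argument list
assert x.f(1) == C.f(x, 1) == 2

# the transformation happens at attribute-retrieval time,
# so a retrieved bound method can be stored and called later
m = x.f
assert m(1) == 2
```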
Built-in functions
A built-in function object is a wrapper around a C function. Examples of built-in functions are len() and math.sin()
(math is a standard built-in module). The number and type of the arguments are determined by the C function.
Special read-only attributes: __doc__ is the function's documentation string, or None if unavailable; __name__ is the
function's name; __self__ is set to None (but see the next item); __module__ is the name of the module the function
was defined in or None if unavailable.
Built-in methods
This is really a different disguise of a built-in function, this time containing an object passed to the C function as an
implicit extra argument. An example of a built-in method is alist.append(), assuming alist is a list object. In this case,
the special read-only attribute __self__ is set to the object denoted by alist.
Class Types
Classic Classes
Class objects are described below. When a class object is called, a new class instance (also described below) is
created and returned. This implies a call to the class's __init__() method if it has one. Any arguments are passed on
to the __init__() method. If there is no __init__() method, the class must be called without arguments.
Class instances
Class instances are described below. Class instances are callable only when the class has a __call__() method;
x(arguments) is a shorthand for x.__call__(arguments).
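A small sketch:

```python
class Adder(object):
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        # x(arguments) is shorthand for x.__call__(arguments)
        return self.n + x

add3 = Adder(3)
assert add3(4) == 7
assert add3.__call__(4) == 7
assert callable(add3)
```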
Modules
Modules are imported by the import statement (see section 6.12, ``The import statement''). A module object has a
namespace implemented by a dictionary object (this is the dictionary referenced by the func_globals attribute of
functions defined in the module). Attribute references are translated to lookups in this dictionary, e.g., m.x is
equivalent to m.__dict__["x"]. A module object does not contain the code object used to initialize the module (since
it isn't needed once the initialization is done).
Class types, or ``new-style classes,'' are callable. These objects normally act as factories for new instances of
themselves, but variations are possible for class types that override __new__(). The arguments of the call are passed
to __new__() and, in the typical case, to __init__() to initialize the new instance.
Attribute assignment updates the module's namespace dictionary, e.g., "m.x = 1" is equivalent to "m.__dict__["x"] =
1".
Predefined (writable) attributes: __name__ is the module's name; __doc__ is the module's documentation string, or
None if unavailable; __file__ is the pathname of the file from which the module was loaded, if it was loaded from a
file. The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension
modules loaded dynamically from a shared library, it is the pathname of the shared library file.
Classes
When a class attribute reference (for class C, say) would yield a user-defined function object or an unbound user-defined method object whose associated class is either C or one of its base classes, it is transformed into an unbound
user-defined method object whose im_class attribute is C. When it would yield a class method object, it is
transformed into a bound user-defined method object whose im_class and im_self attributes are both C. When it
would yield a static method object, it is transformed into the object wrapped by the static method object. See
section 3.4.2 for another way in which attributes retrieved from a class may differ from those actually contained in
its __dict__.
Class attribute assignments update the class's dictionary, never the dictionary of a base class.
A class object can be called (see above) to yield a class instance (see below).
Special attributes: __name__ is the class name; __module__ is the module name in which the class was defined;
__dict__ is the dictionary containing the class's namespace; __bases__ is a tuple (possibly empty or a singleton)
containing the base classes, in the order of their occurrence in the base class list; __doc__ is the class's
documentation string, or None if undefined.
Class instances
Class objects are created by class definitions (see section 7.7, ``Class definitions''). A class has a namespace
implemented by a dictionary object. Class attribute references are translated to lookups in this dictionary, e.g., "C.x"
is translated to "C.__dict__["x"]". When the attribute name is not found there, the attribute search continues in the
base classes. The search is depth-first, left-to-right in the order of occurrence in the base class list.
A class instance is created by calling a class object (see above). A class instance has a namespace implemented as a
dictionary which is the first place in which attribute references are searched. When an attribute is not found there,
and the instance's class has an attribute by that name, the search continues with the class attributes. If a class
attribute is found that is a user-defined function object or an unbound user-defined method object whose associated
class is the class (call it C) of the instance for which the attribute reference was initiated or one of its bases, it is
transformed into a bound user-defined method object whose im_class attribute is C and whose im_self attribute is
the instance. Static method and class method objects are also transformed, as if they had been retrieved from
class C; see above under ``Classes''. See section 3.4.2 for another way in which attributes of a class retrieved via its
instances may differ from the objects actually stored in the class's __dict__. If no class attribute is found, and the
object's class has a
__setattr__() or __delattr__() method, this is called instead of updating the instance dictionary directly.
Special attributes: __dict__ is the attribute dictionary; __class__ is the instance's class.
Files
A file object represents an open file. File objects are created by the open() built-in function, and also by os.popen(),
os.fdopen(), and the makefile()method of socket objects (and perhaps by other functions or methods provided by
extension modules). The objects sys.stdin, sys.stdout and sys.stderr are initialized to file objects corresponding to the
interpreter's standard input, output and error streams. See the Python Library Reference for complete
documentation of file objects.
Internal types
A few types used internally by the interpreter are exposed to the user. Their definitions may change with future
versions of the interpreter, but they are mentioned here for completeness.
Code objects
Code objects represent byte-compiled executable Python code, or bytecode.
Class instances can pretend to be numbers, sequences, or mappings if they have methods with certain special names.
Frame objects
Frame objects represent execution frames. They may occur in traceback objects .
Traceback objects
Traceback objects represent a stack trace of an exception. A traceback object is created when an exception
occurs.
Slice objects
Slice objects are used to represent slices when extended slice syntax is used. This is a slice using two colons, or
multiple slices or ellipses separated by commas, e.g., a[i:j:step], a[i:j, k:l], or a[..., i:j]. They are also created by the
built-in slice() function.
Static method objects provide a way of defeating the transformation of function objects to method objects described
above. A static method object is a wrapper around any other object, usually a user-defined method object. When a
static method object is retrieved from a class or a class instance, the object actually returned is the wrapped object,
which is not subject to any further transformation. Static method objects are not themselves callable, although the
objects they wrap usually are. Static method objects are created by the built-in staticmethod() constructor.
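A minimal sketch (the class and method are invented for the example) showing that the wrapped function is returned unchanged whether retrieved from the class or an instance:

```python
class Converter:
    @staticmethod
    def to_celsius(fahrenheit):
        # No implicit self: attribute lookup returns the plain wrapped function.
        return (fahrenheit - 32) * 5.0 / 9.0

# The same unwrapped function comes back from the class or an instance.
via_class = Converter.to_celsius(212)
via_instance = Converter().to_celsius(32)
```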
The rules for time adjustment across the world are more political than rational, and there is no standard
suitable for every application.
The datetime module exports the following constants:
MINYEAR
The smallest year number allowed in a date or datetime object. MINYEAR is 1.
MAXYEAR
The largest year number allowed in a date or datetime object. MAXYEAR is 9999.
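The two constants can be checked directly; a date outside their range is rejected:

```python
from datetime import MINYEAR, MAXYEAR, date

# Dates must fall within [MINYEAR, MAXYEAR]; anything else raises ValueError.
earliest = date(MINYEAR, 1, 1)
try:
    date(MAXYEAR + 1, 1, 1)
except ValueError:
    out_of_range = True
```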
The datetime class does not directly support parsing formatted time strings. You can use
time.strptime to do the parsing and create a datetime object from the tuple it returns:
>>> s = "2005-12-06T12:13:14"
>>> from datetime import datetime
>>> from time import strptime
>>> datetime(*strptime(s, "%Y-%m-%dT%H:%M:%S")[0:6])
datetime.datetime(2005, 12, 6, 12, 13, 14)
Supporting timezones at whatever level of detail is required is up to the application.
For example, the precision in effect can be changed through the decimal module's arithmetic context:
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal("0.142857")
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal("0.1428571428571428571428571429")
Both binary and decimal floating point are implemented in terms of published standards.
While the built-in float type exposes only a modest portion of its capabilities, the decimal
module exposes all required parts of the standard. When needed, the programmer has
full control over rounding and signal handling.
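As a sketch of that control (the values are chosen arbitrarily for the example): quantize() applies an explicit rounding mode, and a context's trap enablers turn signals such as Inexact into exceptions.

```python
from decimal import Decimal, Inexact, ROUND_DOWN, localcontext

# Rounding control: quantize to two places with an explicit rounding mode.
amount = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_DOWN)

# Signal handling: trap Inexact so that any lossy result raises an exception.
with localcontext() as ctx:
    ctx.traps[Inexact] = True
    try:
        Decimal(1) / Decimal(3)  # cannot be represented exactly
    except Inexact:
        trapped = True
```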
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have
an exact representation in binary floating point. End users typically would not expect 1.1
to display as 1.1000000000000001 as it does with binary floating point.
The exactness carries over into arithmetic. In decimal floating point, "0.1 + 0.1 +
0.1 - 0.3" is exactly equal to zero. In binary floating point, the result is
5.5511151231257827e-017. While close to zero, such differences prevent reliable
equality testing, and errors can accumulate. For this reason, decimal is
preferred in accounting applications, which have strict equality invariants.
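The contrast above can be reproduced directly:

```python
from decimal import Decimal

# Binary floating point leaves a tiny nonzero residue.
binary = 0.1 + 0.1 + 0.1 - 0.3

# Decimal floating point gives exactly zero.
exact = Decimal("0.1") + Decimal("0.1") + Decimal("0.1") - Decimal("0.3")
```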
The decimal module incorporates a notion of significant places, so that "1.30 + 1.20"
is 2.50. The trailing zero is kept to indicate significance; this is the customary
presentation for monetary applications. For multiplication, the "schoolbook" approach
uses all the figures in the multiplicands. For instance, "1.3 * 1.2" gives 1.56, while
"1.30 * 1.20" gives 1.5600.
Unlike hardware-based binary floating point, the decimal module has a user-settable
precision (defaulting to 28 places) which can be as large as needed for a given problem.
The module design is centered around three concepts: the decimal number, the context
for arithmetic, and signals.
A decimal number is immutable. It has a sign, coefficient digits, and an exponent. To
preserve significance, the coefficient digits do not truncate trailing zeroes. Decimals also
include special values such as Infinity, -Infinity, and NaN. The standard also
differentiates -0 from +0.
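A short sketch of these special values and their comparison behaviour:

```python
from decimal import Decimal

inf = Decimal("Infinity")
neg_inf = Decimal("-Infinity")
nan = Decimal("NaN")
neg_zero = Decimal("-0")

# NaN is unordered: it compares unequal even to itself.
nan_equal = (nan == nan)

# -0 and +0 compare equal, but the sign is preserved on the value.
zeros_equal = (neg_zero == Decimal("0"))
```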
The context for arithmetic is an environment specifying precision, rounding rules, limits on
exponents, flags indicating the results of operations, and trap enablers which determine
whether signals are treated as exceptions. Rounding options include ROUND_CEILING,