
Python 4: Advanced Python

Lesson 1: Going Further with Functions

About Eclipse
Perspectives and the Red Leaf Icon
Working Sets

Functions Are Objects

Function Attributes
Function and Method Calls
Function Composition
Lambdas: Anonymous Functions

Quiz 1 Project 1

Lesson 2: Data Structures
Organizing Data
Handling Multi-Dimensional Arrays in Python

Creating a Two-Dimensional Array

List of Lists Example
Using a Single List to Represent an Array
Using an array.array instead of a List
Using a dict instead of a List

Summary
Quiz 1 Project 1
Lesson 3: Delegation and Composition
Extending Functionality by Inheritance
More Complex Delegation
Extending Functionality by Composition
Recursive Composition
Quiz 1 Project 1
Lesson 4: Publish and Subscribe
On Program Structure
Publish and Subscribe
Publish and Subscribe in Action
Validating Requests and Identifying Output
Making the Algorithm More General

A Note on Debugging
Quiz 1 Project 1
Lesson 5: Optimizing Your Code
Start with Correctness
Where to Optimize

The Profile Module

Two Different Modules
Using the Profile Module
More Complex Reporting

What to Optimize
Loop Optimizations
Pre-computing Attribute References
Local Variables are Faster than Global Variables

How to Optimize
Don't Optimize Prematurely
Use Timings, Not Intuition
Make One Change at a Time
The Best Way is Not Always Obvious

Quiz 1 Project 1

Lesson 6: Using Exceptions Wisely
Exceptions Are Not (Necessarily) Errors
Specifying Exceptions

Creating Exceptions and Raising Instances

Using Exceptions Wisely
Exception Timings

Quiz 1 Project 1

Lesson 7: Advanced Uses of Decorators
Decorator Syntax
Classes as Decorators
Class Decorators
Odd Decorator Tricks
Static and Class Method Decorators
Parameterizing Decorators
Quiz 1 Project 1
Lesson 8: Advanced Generators
What Generators Represent
Uses of Infinite Sequences
The Itertools Module
itertools.tee: duplicating generators
itertools.chain() and itertools.islice(): Concatenating Sequences and Slicing Generators Like Lists
itertools.count(), itertools.cycle() and itertools.repeat()
itertools.dropwhile() and itertools.takewhile()

Generator Expressions
Quiz 1 Project 1
Lesson 9: Uses of Introspection
The Meaning of 'Introspection'
Some Simple Introspection Examples

Attribute Handling Functions

What Use is Introspection?
The Inspect module
The getmembers() Function
Introspecting Functions

Quiz 1 Project 1

Lesson 10: Multi-Threading
Threads and Processes
Multiprogramming
Multiprocessing
Multi-Threading
Threading, Multiprocessing, CPython and the GIL

The Threading Library Module

Creating Threads (1)
Waiting for Threads
Creating Threads (2)

Quiz 1 Project 1

Lesson 11: More on Multi-Threading
Thread Synchronization
threading.Lock Objects

The Queue Standard Library

Adding Items to Queues: Queue.put()
Removing Items from Queues: Queue.get()
Monitoring Completion: Queue.task_done() and Queue.join()
A Simple Scalable Multi-Threaded Workhorse
The Output Thread
The Worker Threads
The Control Thread
Other Approaches

Quiz 1 Project 1


Lesson 12: Multi-Processing
The Multiprocessing Library Module
multiprocessing Objects
A Simple Multiprocessing Example

A Multiprocessing Worker Process Pool

The Output Process
The Worker Process
The Control Process

Quiz 1 Project 1

Lesson 13: Functions and Other Objects
A Deeper Look at Functions
Required Keyword Arguments
Function Annotations
Nested Functions and Namespaces
Partial Functions

More Magic Methods
How Python Expressions Work

Quiz 1 Project 1

Lesson 14: Context Managers
Another Python Control Structure: The With Statement
Using a Simple Context Manager
The Context Manager Protocol: __enter__() and __exit__()
Writing Context Manager Classes
Library Support for Context Managers
Nested Context Managers

Decimal Arithmetic and Arithmetic Contexts

Decimal Arithmetic Contexts
Decimal Signals
The Default Decimal Context

Quiz 1 Project 1

Lesson 15: Memory-Mapped Files
Memory Mapping
Memory-Mapped Files Are Still Files
The mmap Interface
What Use is mmap(), and How Does it Work?

A Memory-Mapped Example

Quiz 1 Project 1
Lesson 16: Your Future with Python
Python Conferences
Tutorials
Talks
The Hallway Track
Open Space
Lightning Talks
Birds of a Feather Sessions (BOFs)
Sprints: Moving Ahead

The Python Job Market and Career Choices

Python Development
Tips and Tricks
Quiz 1 Project 1

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Going Further with Functions
Welcome to the O'Reilly School of Technology (OST) Advanced Python course! We're happy you've chosen to learn Python programming with us. By the time you finish this course, you will have expanded your knowledge of Python and applied it to some really interesting technologies.

Course Objectives
When you complete this course, you will be able to:

extend Python code functionality through inheritance, complex delegation, and recursive composition.
publish, subscribe, and optimize your code.
create advanced class decorators and generators in Python.
demonstrate knowledge of Python introspection.
apply multi-threading and multi-processing to Python development.
manage arithmetic contexts and memory mapping.
demonstrate understanding of the Python community, conferences, and job market.
develop a multi-processing solution to a significant data processing problem.

This course builds on your existing Python knowledge, incorporating further object-oriented design principles and techniques with the intention of rounding out your skill set. Techniques like recursion, composition, and delegation are explained and put into practice through the ever-present test-driven practical work.

Learning with O'Reilly School of Technology Courses

As with every O'Reilly School of Technology course, we'll take a user-active approach to learning. This means that you (the user) will be active! You'll learn by doing, building live programs, testing them and experimenting with them hands-on!

To learn a new skill or technology, you have to experiment. The more you experiment, the more you learn. Our system is designed to maximize experimentation and help you learn to learn a new skill.

We'll program as much as possible to be sure that the principles sink in and stay with you.

Each time we discuss a new concept, you'll put it into code and see what YOU can do with it. On occasion we'll even give you code that doesn't work, so you can see common mistakes and how to recover from them. Making mistakes is actually another good way to learn.

Above all, we want to help you to learn to learn. We give you the tools to take control of your own learning experience.

When you complete an OST course, you know the subject matter, and you know how to expand your knowledge, so you can handle changes like software and operating system updates.

Here are some tips for using O'Reilly School of Technology courses effectively:

Type the code. Resist the temptation to cut and paste the example code we give you. Typing the code actually gives you a feel for the programming task. Then play around with the examples to find out what else you can make them do, and to check your understanding. It's highly unlikely you'll break anything by experimentation. If you do break something, that's an indication to us that we need to improve our system!
Take your time. Learning takes time. Rushing can have negative effects on your progress. Slow down and let your brain absorb the new information thoroughly. Taking your time helps to maintain a relaxed, positive approach. It also gives you the chance to try new things and learn more than you otherwise would if you blew through all of the coursework too quickly.
Experiment. Wander from the path often and explore the possibilities. We can't anticipate all of your questions and ideas, so it's up to you to experiment and create on your own. Your instructor will help if you go completely off the rails.
Accept guidance, but don't depend on it. Try to solve problems on your own. Going from misunderstanding to understanding is the best way to acquire a new skill. Part of what you're learning is problem solving. Of course, you can always contact your instructor for hints when you need them.
Use all available resources! In real-life problem-solving, you aren't bound by false limitations; in OST courses, you are free to use any resources at your disposal to solve problems you encounter: the Internet, reference books, and online help are all fair game.
Have fun! Relax, keep practicing, and don't be afraid to make mistakes! Your instructor will keep you at it until you've mastered the skill. We want you to get that satisfied, "I'm so cool! I did it!" feeling. And you'll have some projects to show off when you're done.

Lesson Format
We'll try out lots of examples in each lesson. We'll have you write code, look at code, and edit existing code. The code will be presented in boxes that will indicate what needs to be done to the code inside.

Whenever you see white boxes like the one below, you'll type the contents into the editor window to try the example yourself. The CODE TO TYPE bar on top of the white box contains directions for you to follow:

CODE TO TYPE:

White boxes like this contain code for you to try out (type into a file to run).

If you have already written some of the code, new code for you to add looks like this.

If we want you to remove existing code, the code to remove will look like this.

We may also include instructive comments that you don't need to type.

We may run programs and do some other activities in a terminal session in the operating system or other command-line environment. These will be shown like this:

INTERACTIVE SESSION:

The plain black text that we present in these INTERACTIVE boxes is provided by the system (not for you to type). The commands we want you to type look like this.

Code and information presented in a gray OBSERVE box is for you to inspect and absorb. This information is often color-coded, and followed by text explaining the code in detail:

OBSERVE:

Gray "Observe" boxes like this contain information (usually code specifics) for you to observe.

The paragraph(s) that follow may provide additional details on information that was highlighted in the Observe box.

We'll also set especially pertinent information apart in "Note" boxes:

Note: Notes provide information that is useful, but not absolutely necessary for performing the tasks at hand.

Tip: Tips provide information that might help make the tools easier for you to use, such as shortcut keys.

WARNING: Warnings provide information that can help prevent program crashes and data loss.

Before you start programming in Python, let's review a couple of the tools you'll be using. If you've already taken the OST course on Introduction to Python, Getting More Out of Python and/or The Python Environment, you can skip to the next section if you like, or you might want to go through this section to refresh your memory.

About Eclipse
We use an Integrated Development Environment (IDE) called Eclipse. It's the program filling up your screen right now. IDEs assist programmers by performing tasks that need to be done repetitively. IDEs can also help to edit and debug code, and organize projects.
Perspectives and the Red Leaf Icon
The Ellipse Plug-in for Eclipse was developed by OST. It adds a Red Leaf icon to the toolbar in Eclipse. This icon is your "panic button." Because Eclipse is versatile and allows you to move things around, like views, toolbars, and such, it's possible to lose your way. If you do get confused and want to return to the default perspective (window layout), the Red Leaf icon is the fastest and easiest way to do that.

Use the Red Leaf icon to:

reset the current perspective: click the icon.
change perspectives: click the drop-down arrow beside the icon to select a perspective.
select a perspective: click the drop-down arrow beside the Red Leaf icon and select the course (Java, Python, C++, etc.). Selecting a specific course opens the perspective designed for that particular course.

For this course, select Python:

Working Sets
In this course, we'll use working sets. All projects created in Eclipse exist in the workspace directory of your account on our server. As you create projects throughout the course, your directory could become pretty cluttered. A working set is a view of the workspace that behaves like a folder, but it's actually an association of files. Working sets allow you to limit the detail that you see at any given time. The difference between a working set and a folder is that a working set doesn't actually exist in the file system.

A working set is a convenient way to group related items together. You can assign a project to one or more working sets. In some cases, like the Python extension to Eclipse, new projects are created in a catch-all "Other Projects" working set. To organize your work better, we'll have you assign your projects to an appropriate working set when you create them. To do that, you'll right-click the project name and select the Assign Working Sets menu item.

We've already created some working sets for you in the Eclipse IDE. You can turn the working set display on or off in Eclipse.

For this course, we'll display only the working sets you need. In the upper-right corner of the Package Explorer panel, click the downward arrow and select Configure Working Sets:
Select the Other Projects working set as well as the ones that begin with "Python4," then click OK:

Let's create a project to store our programs for this lesson. Select File | New | Pydev Project, and enter the information as shown:
Click Finish. When asked if you want to open the associated perspective, check the Remember my decision box and click No:

By default, the new project is added to the Other Projects working set. Find Python4_Lesson01 there, right-click it, and select Assign Working Sets... as shown:
Select the Python4_Lessons working set and click OK:
In the next section, we'll get to enter some Python code and run it!

Functions Are Objects

Everything in Python is an object, but unlike most objects in Python, function objects are not created by calling a class. Instead you use the def statement, which causes the interpreter to compile the indented suite that comprises the function body and bind the compiled code object to the function's name in the current local namespace.

Function Attributes
Like any object in Python, functions have a particular type; and like with any object in Python, you can examine a function's namespace with the dir() function. Let's open a new interactive session. Select the Console tab, click the down arrow and select Pydev console:
In the dialog that appears, select Python console:

Then, type the commands shown:

INTERACTIVE SESSION:

>>> def g(x):
...     return x*x
...
>>> g
<function g at 0x100572490>
>>> type(g)
<class 'function'>
>>> dir(g)
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
'__defaults__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__',
'__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__',
'__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__']
>>>

Note: Keep this interactive session open throughout this lesson.

While this tells you what attributes function objects possess, it does not make it very clear which of them are unique to functions. A good Python programmer like you needs to be able to think of a way to discover the attributes of a function that aren't also attributes of the base object, object.

Think about it for a minute. Here's a hint: think about sets.

You may remember that the set() function produces a set when applied to any iterable (which includes lists: the dir() function returns a list). You may also remember that sets implement a subtraction operation: if a and b are sets, then a-b is the set of items in a that are not also in b. Continue the interactive session as shown:

INTERACTIVE SESSION:

>>> def f(x):
...     return x
...
>>> function_attrs = set(dir(f))
>>> object_attrs = set(dir(object))
>>> function_attrs -= object_attrs
>>> from pprint import pprint
>>> pprint(sorted(function_attrs))
['__annotations__',
 '__call__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__dict__',
 '__get__',
 '__globals__',
 '__kwdefaults__',
 '__module__',
 '__name__']
>>>

At this stage in your Python programming career, you don't need to worry about most of these, but there's certainly no harm in learning what they do. Some of the features they offer are very advanced. You can read more about them in the official Python documentation. You can learn a lot by working on an interactive terminal session and by reading the documentation.
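
A few of these attributes are easy to explore right away. Here is a quick sketch you can try in the same console (the scale function here is just an example name, not something used elsewhere in this course):

INTERACTIVE SESSION:

>>> def scale(x, factor=2):
...     "Multiply x by factor."
...     return x * factor
...
>>> scale.__name__
'scale'
>>> scale.__doc__
'Multiply x by factor.'
>>> scale.__defaults__
(2,)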

Function and Method Calls

The __call__() method is interesting: its name implies that it has something to do with function calling, and this is correct. The interpreter calls any callable object by making use of its __call__() method. You can actually call this method directly if you want to; it's exactly the same as calling the function directly.

INTERACTIVE SESSION:

>>> def f1(x):
...     print("f1({}) called".format(x))
...     return x
...
>>> f1.__call__(23)  # should be equivalent to f1(23)
f1(23) called
23
>>>

You can define your own classes to include a __call__() method, and if you do, the instances you create from that class will be callable directly, just like functions. This is a fairly general mechanism that illustrates a Python equivalence you haven't observed yet:
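
Roughly speaking (a quick sketch of the equivalence, reusing the f1 function you defined earlier in this session), calling an object and invoking its class's __call__() method do the same thing:

INTERACTIVE SESSION:

>>> f1(23)                     # ordinary call syntax
f1(23) called
23
>>> type(f1).__call__(f1, 23)  # what the interpreter does behind the scenes
f1(23) called
23
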
Give it a try. Create a class with instances that are callable. Then verify that you can call the instances:

INTERACTIVE SESSION:

>>> class Func:
...     def __call__(self, arg):
...         print("%r(%r) called" % (self, arg))
...         return arg
...
>>> f2 = Func()
>>> f2
<__main__.Func object at 0x100569dd0>
>>> f2("Danny")
<__main__.Func object at 0x100569dd0>('Danny') called
'Danny'
>>>

As we've seen, when you define a __call__() method on the class, you can call its instances. These calls result in the activation of the __call__() method, with the instance provided (as always on a method call) as the first argument, followed by the positional and keyword arguments that were passed to the instance call. Methods are normally defined on a class. While it is possible to bind callable objects to names in an instance's namespace, the interpreter does not treat it as a true method, and as such, it does not add the instance as a first argument. So, callables in the instance's __dict__ are called with only the arguments present on the call line; no instance is implicitly added as a first argument.

Note: The so-called "magic" methods (those with names that begin and end with a double underscore) are never looked for on the instance; the interpreter goes straight to the classes for these methods. So even when the instance's __dict__ contains the key "__call__", it is ignored and the class's __call__() method is activated.

Let's continue our console session:

INTERACTIVE SESSION:

>>> def userfunc(arg):
...     print("Userfunc called: ", arg)
...
>>> f2.regular = userfunc
>>> f2.regular("Instance")
Userfunc called: Instance
>>> f2.__call__ = userfunc
>>> f2("Hopeful")
<__main__.Func object at 0x100569dd0>('Hopeful') called
'Hopeful'

Since all callables have a __call__() method, and the __call__() method is callable, you might wonder whether it too has a __call__() method. The answer is yes, it does (and so does that __call__() method, and so on...):
INTERACTIVE SESSION:

>>> "__call__" in dir(f2.__call__)
True
>>> f2.__call__("Audrey")
Userfunc called: Audrey
>>> f2.__call__.__call__("Audrey")
Userfunc called: Audrey
>>> f2.__call__.__call__.__call__("Audrey")
Userfunc called: Audrey
>>>

Function Composition
Because functions are first-class objects, they can be passed as arguments to other functions, and so on. If f and g are functions, then mathematicians define the composition f * g of those two functions by saying that (f * g)(x) = f(g(x)). In other words, the composition of two functions is a new function that behaves the same as applying the first function to the output of the second.

Suppose you were given two functions; could you construct their composition? Of course you could! For example, you could write a function that takes two functions as arguments, then internally defines a function that calls the first on the result of the second. Then the compose function returns that function. It's actually almost easier to write the function than it is to describe it:

INTERACTIVE SESSION:

>>> def compose(g, h):
...     def anon(x):
...         return g(h(x))
...     return anon
...
>>> f3 = compose(f1, f2)
>>> f3("Shillalegh")
<__main__.Func object at 0x100569dd0>('Shillalegh') called
f1('Shillalegh') called
'Shillalegh'

While it's pretty straightforward to compose functions this way, a mathematician would find it much more natural to compose the functions with a multiplication operator (the asterisk, *). Unfortunately, an attempt to multiply two functions together is doomed to fail, as Python functions have not been designed to be multiplied. If we could add a __mul__() method to our functions, we might stand a chance, but as we've seen, this is not possible with function instances, and the class of functions is a built-in object written in C: impossible to change and difficult to inherit from. Even if you could subclass the function type, how would you create instances? The def statement will always create regular functions.

While you may not be able to subclass the function object, you do know how to create object classes with callable instances. Using this technique, you could create a class with instances that act as proxies for the functions. This class could define a __mul__() method, which would take another similar class as an argument and return the composition of the two proxied functions. This is typical of the way that Python allows you to "hook" into its workings to achieve a result that is simpler to use.

In your Python4_Lesson01/src folder, create a program called composable.py as shown below:
CODE TO TYPE:

"""
composable.py: defines a composable function class.
"""
class Composable:
    def __init__(self, f):
        "Store reference to proxied function."
        self.func = f
    def __call__(self, *args, **kwargs):
        "Proxy the function, passing all arguments through."
        return self.func(*args, **kwargs)
    def __mul__(self, other):
        "Return the composition of proxied and another function."
        if type(other) is Composable:
            def anon(x):
                return self.func(other.func(x))
            return Composable(anon)
        raise TypeError("Illegal operands for multiplication")
    def __repr__(self):
        return "<Composable function {0} at 0x{1:X}>".format(
            self.func.__name__, id(self))

Save and run it. (Remember how to run a Python program in OST's sandbox environment? Right-click in the editor window for the composable.py file, and select Run As | Python Run.)

Note: An alternative implementation of the __mul__() method might have used the statement return self(other(x)). Do you think that this would have been a better implementation? Why or why not?

You will need tests, of course. So you should also create a program called test_composable.py that reads as follows.
CODE TO TYPE:

"""
test_composable.py: performs simple tests of composable functions.
"""
import unittest
from composable import Composable

def reverse(s):
    "Reverses a string using negative-stride sequencing."
    return s[::-1]

def square(x):
    "Multiplies a number by itself."
    return x*x

class ComposableTestCase(unittest.TestCase):

    def test_inverse(self):
        reverser = Composable(reverse)
        nulltran = reverser * reverser
        for s in "", "a", "0123456789", "abcdefghijklmnopqrstuvwxyz":
            self.assertEqual(nulltran(s), s)

    def test_square(self):
        squarer = Composable(square)
        po4 = squarer * squarer
        for v, r in ((1, 1), (2, 16), (3, 81)):
            self.assertEqual(po4(v), r)

    def test_exceptions(self):
        fc = Composable(square)
        with self.assertRaises(TypeError):
            fc = fc * 3

if __name__ == "__main__":
    unittest.main()

The unit tests are relatively straightforward, simply comparing the expected results from known inputs with expected outputs. In older Python releases it could be difficult to find out which iteration of a loop had caused the assertion to fail, but with the improved error messages of newer releases this is much less of a problem: argument values for failing assertions are much better reported than previously.

The exception is tested by calling the TestCase's assertRaises() method with a single argument (specifying the exception(s) that are expected and acceptable). Under these circumstances the method returns what is called a "context manager" that will catch and analyze any exceptions raised from the indented suite. (There is a broader treatment of context managers in a later lesson.) When you run the test program you should see three successful tests.

Output from test_composable.py:

...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK
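
If you'd like to see composition in action outside the tests, here is a quick interactive sketch (assuming your console can import modules from the same src folder):

INTERACTIVE SESSION:

>>> from composable import Composable
>>> from test_composable import reverse, square
>>> sq = Composable(square)
>>> rev = Composable(reverse)
>>> (sq * sq)(3)         # square(square(3))
81
>>> (rev * rev)("hello")  # reversing twice gives the original string back
'hello'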

Once you get the idea of how this works, you'll soon realize that the __mul__() method could be extended to handle a regular function. In other words, as long as the operand to the left of the "*" is a Composable, the operand to the right could be either a Composable or a function. So the method can be extended slightly to make Composables more usable.

Let's go ahead and edit composable.py to allow composition with 'raw' functions:
CODE TO TYPE:

"""
composable.py: defines a composable function class.
"""
import types

class Composable:
    def __init__(self, f):
        "Store reference to proxied function."
        self.func = f
    def __call__(self, *args, **kwargs):
        "Proxy the function, passing all arguments through."
        return self.func(*args, **kwargs)
    def __mul__(self, other):
        "Return the composition of proxied and another function."
        if type(other) is Composable:
            def anon(x):
                return self.func(other.func(x))
            return Composable(anon)
        elif type(other) is types.FunctionType:
            def anon(x):
                return self.func(other(x))
            return Composable(anon)
        raise TypeError("Illegal operands for multiplication")
    def __repr__(self):
        return "<Composable function {0} at 0x{1:X}>".format(
            self.func.__name__, id(self))

Now the updated __mul__() method does one thing if the right operand (other) is a Composable: it defines and returns a function that extracts the functions from both Composables and is the composition of those two functions. But if the right-side operand is a function (which you check for by using the types module, designed specifically to allow easy reference to the less usual Python types), then the function passed in as an argument can be used directly rather than having to be extracted from a Composable.
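
With that change in place, composition with a plain function on the right works as you'd hope. A quick sketch (again assuming composable.py is importable from a fresh console):

INTERACTIVE SESSION:

>>> from composable import Composable
>>> def increment(x):
...     return x + 1
...
>>> def square(x):
...     return x * x
...
>>> sq = Composable(square)
>>> (sq * increment)(3)    # the right operand is a plain function
16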

The tests need to be modified, but not as much as you might think. The simplest change is to have the test_square() method use a function as the right operand of its multiplications. This should not lose any testing capability, since the first two tests were formerly testing essentially the same things. A further exception test is also added to ensure that when the function is the left operand an exception is also raised.
CODE TO TYPE:

"""
test_composable.py: performs simple tests of composable functions.
"""
import unittest
from composable import Composable

def reverse(s):
    "Reverses a string using negative-stride sequencing."
    return s[::-1]

def square(x):
    "Multiplies a number by itself."
    return x*x

class ComposableTestCase(unittest.TestCase):

    def test_inverse(self):
        reverser = Composable(reverse)
        nulltran = reverser * reverser
        for s in "", "a", "0123456789", "abcdefghijklmnopqrstuvwxyz":
            self.assertEqual(nulltran(s), s)

    def test_square(self):
        squarer = Composable(square)
        po4 = squarer * square
        for v, r in ((1, 1), (2, 16), (3, 81)):
            self.assertEqual(po4(v), r)

    def test_exceptions(self):
        fc = Composable(square)
        with self.assertRaises(TypeError):
            fc = fc * 3
        with self.assertRaises(TypeError):
            fc = square * fc

if __name__ == "__main__":
    unittest.main()

A TypeError exception therefore is raised when you attempt to multiply a function by a Composable. The tests as modified should all succeed. If not, then debug your solution until they do, with your mentor's assistance if necessary.

The extensions you made to the Composable class in the last exercise made it more capable, but the last example shows that there are always wrinkles that you need to take care of to make your code as fully general as it can be. How far to go in adapting to all possible circumstances is a matter of judgment. Having a good set of tests at least ensures that the code is being exercised (it's also a good idea to employ coverage testing, to ensure that your tests don't leave any of the code unexecuted: this is not always as easy as you might think).
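
If you want to try coverage measurement yourself, one common approach is the third-party coverage package (treat this as a sketch; it isn't something this course requires you to install):

INTERACTIVE SESSION:

coverage run test_composable.py
coverage report -m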

Lambdas: Anonymous Functions

Python also has a feature that allows you to define simple functions as an expression. The lambda expression is a way of expressing a function without having to use a def statement. Because it's an expression, there are limits to what you can do with a lambda. Some programmers use them frequently, but others prefer to define all of their functions. It's important for you to understand them, because you'll likely encounter them in other people's code.
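
To see why people call lambdas "anonymous functions," compare the two forms below; for practical purposes they do the same job (a sketch of the equivalence, with add1 as an arbitrary example name):

OBSERVE:

add1 = lambda x: x + 1     # a lambda expression bound to a name...

def add1(x):               # ...is nearly the same as this def statement
    return x + 1
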
While the equivalence above is not exact, it's close enough for all practical purposes. The keyword lambda is followed by the names of any parameters (all parameters to lambdas are positional) in a comma-separated list. A colon separates the parameters from the expression (normally referencing the parameters). The value of the expression will be returned from a call (you may need to restart the console, so you'll need to redefine some of the functions):

INTERACTIVE SESSION:

>>> def compose(g, h):
...     def anon(x):
...         return g(h(x))
...     return anon
...
>>>
>>> add1 = lambda x: x+1
>>> add1
<function <lambda> at 0x100582270>
>>> sqr = lambda x: x*x
>>> sqp1 = compose(sqr, add1)
>>> sqp1(5)
36
>>> type(add1)
<class 'function'>
>>>

It is relatively easy to write a lambda equivalent to the compose() function we created earlier, and it works as it would with any callable. The last result shows you that to the interpreter, lambda expressions are entirely equivalent to functions (lambda expressions and functions have the same type, "<class 'function'>").

Also, the lambda has no name (or more precisely: all lambdas have the same name). When you define a function with def, the interpreter stores the name from the def statement as its __name__ attribute. All lambdas have the same name, '<lambda>', when they are created. You can change that name by assignment to the attribute, but in general, if you're going to spend more than one line on a lambda, then you might as well just write a named function instead.
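
A quick illustration of that naming difference:

INTERACTIVE SESSION:

>>> add1 = lambda x: x + 1
>>> add1.__name__
'<lambda>'
>>> def add_one(x):
...     return x + 1
...
>>> add_one.__name__
'add_one'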

Finally, keep in mind that lambda is deliberately restricted to functions with bodies that comprise a single expression (which is implicitly what the lambda returns when called, with any argument values substituted for the parameters in the expression). Again, rather than writing expressions that continue over several lines, it would be better to write a named function (which, among other things, can be properly documented with docstrings). If you do wish to continue the expression over multiple lines, the best way to do that is to parenthesize the lambda expression. Do you think the parenthesized second version is an improvement? Think about that as you work through this interactive session:
INTERACTIVE SESSION:

>>> def f1(x):
...     print("f1({}) called".format(x))
...     return x
...
>>> class Func:
...     def __call__(self, arg):
...         print("%r(%r) called" % (self, arg))
...         return arg
...
>>> f2 = Func()
>>> ff = lambda f, g: lambda x: f(g(x))
>>> lam = ff(f1, f2)
>>> lam("Ebenezer")
<__main__.Func object at 0x10057a510>('Ebenezer') called
f1('Ebenezer') called
'Ebenezer'
>>>
>>> ff = lambda f, g: (lambda x:
...     f(g(x)))
>>> lam = ff(f1, f2)
>>> lam("Ebenezer")
<__main__.Func object at 0x10057a510>('Ebenezer') called
f1('Ebenezer') called
'Ebenezer'
>>>

If you understand that last example, consider yourself a highly competent Python programmer. Well done! These points are subtle, and your understanding of the language is becoming increasingly thorough as you continue here.

The tools from this lesson will allow you to use callables with greater flexibility and to better purpose. You've learned ways to write code that is able to collaborate with the interpreter and will allow you to accomplish many of your desired programming tasks more efficiently. Nice work!

When you finish the lesson, return to the syllabus and complete the quizzes and projects.

Data Structures
Lesson Objectives
When you complete this lesson, you will be able to:

organize data efficiently.
create a two-dimensional array.

Organizing Data
In general, programming models the real world. Keep that in mind and it will help you to choose appropriate data representations for specific objects. This may sound pretty straightforward, but in fact, it takes a considerable amount of experience to get it right.

Initially, you might struggle to find the best data structure for an application, but ultimately working through those struggles will make you a better programmer. Of course you could bypass such challenges and follow some other programmer's prior direction, but I wouldn't recommend doing that. There's no substitute for working through programming challenges yourself. You develop a more thorough understanding of your programs when you make your own design decisions.

As you write more Python, you'll be able to accommodate increasingly complex data structures. So far, most of the structures we've created have been lists or dicts of the basic Python types: the immutables, like numbers and strings. However, there's no reason you can't use lists, tuples, dicts, or other complex objects (of your own creation or created using some existing library) as the elements of your data structures.

Data structures are important within your objects, as well. You define the behavior of a whole class of objects with a class statement. This class statement defines the behavior of each instance of the class by providing methods that the user can call to effect specific actions. Each instance has its own namespace though, which makes it appear like a data structure with behaviors common to all members of its class.

Handling Multi-Dimensional Arrays in Python

Python's "array" module provides a way to store a sequence of values of the same type in a compact representation that does not require Python object overhead for each value in the array. Array objects are one-dimensional, similar to Python lists, and most code actually creates arrays from an iterable containing the relevant values. With large numbers of elements, this can represent a substantial memory savings, but the features offered by this array type are limited. For full multi-dimensional arrays of complex data types, you would normally go to the (third-party, but open source) NumPy package. In most computer languages, multiple dimensions can be addressed by using multiple subscripts. So the Nth item in the Mth row of an array called D would be D(M, N) in Fortran (which uses parentheses for subscripting).
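
Before moving on, here is a quick sketch of the standard library array module in action (one-dimensional only; the "i" type code stores C integers):

INTERACTIVE CONSOLE SESSION

>>> import array
>>> a = array.array("i", [0, 1, 2, 3, 4])
>>> a
array('i', [0, 1, 2, 3, 4])
>>> a[2]
2
>>> a[2] = 99
>>> a
array('i', [0, 1, 2, 99, 4])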

INTERACTIVE CONSOLE SESSION

>>> mylst = ["one", "two", "three"]
>>> mylst[1]
'two'
>>> mylst[1.3]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: list indices must be integers, not float
>>> mylst[(1, 3)]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: list indices must be integers, not tuple
>>>

A Python list may have only a single integer or a slice as an index; anything else will raise a TypeError exception such as "list indices must be integers."

A list is a one-dimensional array, with only a single length. A two-dimensional array has a size in each of two dimensions (often discussed as the numbers of rows and columns). Think of it as a sequence of one-dimensional lists: an array of arrays. Similarly, consider a three-dimensional array as a sequence of two-dimensional arrays, and so on (although four-dimensional arrays are not used all that frequently).

In Python we can usually create a class to execute any task. You may remember that indexing is achieved by the use of the __getitem__() method. Let's create a basic class that reports the arguments that call that class's __getitem__() method. This will help us to see how Python indexing works.

The only two types that can be used as indexes on a sequence are integers and slices. The contents within the square brackets in the indexing construct may be more complex than a regular integer. You won't usually work directly with slices, because in Python you can get the same access to sequences using multiple subscripts, separated by colons (often referred to as slicing notation). You can slice a sequence with notation like s[m:n], and you can even specify a third item by adding what is known as the stride (a stride of S causes only every Sth value to be included in the slice) using the form s[M:N:S]. Although there are no Python types that implement multi-dimensional arrays, the language is ready for them, and even allows multiple slices as subscripts. The NumPy package frequently incorporates slicing notation to help facilitate data subsetting.

INTERACTIVE CONSOLE SESSION

>>> class GI:
...     def __getitem__(self, *args, **kw):
...         print("Args:", args)
...         print("Kws: ", kw)
...
>>> gi = GI()
>>> gi[0]
Args: (0,)
Kws:  {}
>>> gi[0:1]
Args: (slice(0, 1, None),)
Kws:  {}
>>> gi[0:10:-2]
Args: (slice(0, 10, -2),)
Kws:  {}
>>> gi[1, 2, 3]
Args: ((1, 2, 3),)
Kws:  {}
>>> gi[1:2:3, 4:5:6]
Args: ((slice(1, 2, 3), slice(4, 5, 6)),)
Kws:  {}
>>> gi[1, 2:3, 4:5:6]
Args: ((1, slice(2, 3, None), slice(4, 5, 6)),)
Kws:  {}
>>> gi[(1, 2:3, 4:5:6)]
  File "<console>", line 1
    gi[(1, 2:3, 4:5:6)]
            ^
SyntaxError: invalid syntax
>>> (1, 2:3, 4:5:6)
  File "<console>", line 1
    (1, 2:3, 4:5:6)
          ^
SyntaxError: invalid syntax
>>>

Slices are allowed only as top-level elements of a tuple of subscripting expressions. Parenthesizing the tuple, or trying to use a similar expression outside of subscripting brackets, both result in syntax errors. A single integer index is passed through to the __getitem__() method without change. But the interpreter creates a special object called a slice object for constructs that contain colons. The slice object is passed through to the __getitem__() method. The last line in the example demonstrates that the interpreter allows us to use multiple slice notations as subscripts, and the __getitem__() method will receive a tuple of slice objects. This gives you the freedom to implement subscripting and slicing just about any way you want; of course, you have to understand how to use slice objects to take full advantage of the notation. For our purposes now, this isn't absolutely necessary, but the knowledge will be valuable later in many other contexts. To summarize what we've learned so far about Python subscripting: a single subscript such as a[M] results in the call a.__getitem__(M), and a tuple of subscripts such as a[M, N] results in the call a.__getitem__((M, N)).

Note: The first equivalence holds true whether M is an integer or a slice. When a slice is provided as a single argument, the interpreter passes a slice object to __getitem__() in just the same way.

The list is a basic Python sequence, and like all the built-in sequence types, it is one-dimensional (that is, any item can be addressed with a single integer subscript of appropriate value). But multi-dimensional lists are often more convenient from a programmer's perspective, and, with the exception of the slicing notation, if you write a tuple of values as a subscript, then that tuple is passed directly through to the __getitem__() method. So it's possible to map tuples onto integer subscripts that can select a given item from an underlying list. Here's how a two-dimensional array should look to the programmer.
The most straightforward way to represent an array in Python is as a list of lists. Well actually, that would represent a two-dimensional array; a three-dimensional array would have to be a list of lists of lists, but you get the idea. So, in order to represent the array shown above, we could store it as either a list of rows or a list of columns. It doesn't really matter which type of list you choose, as long as you remain consistent. We'll use "row major order" (meaning we'll store a reference to the rows and then use the column number to index the element within that row) this time around.

For example, we could represent a 6 x 5 array as a six-element list, each item in that list consisting of a five-element list which represents a row of the array. To access a single item, you first have to index the row list with a row number (resulting in a reference to a row list), and then index that list to extract the element from the required column. Take a look:
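
Here is a minimal sketch of that layout (the values are arbitrary, just to show the two-step indexing):

INTERACTIVE CONSOLE SESSION

>>> rows = [[0] * 5 for _ in range(6)]   # a 6 x 5 array as a list of six row lists
>>> rows[2][3] = 1                       # row 2, column 3
>>> rows[2]
[0, 0, 0, 1, 0]
>>> rows[2][3]
1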

Creating a Two-Dimensional Array

List of Lists Example
Let's write some code to create an identity matrix. This is a square array where every element is zero except for the main diagonal (the elements that have the same number for both row and column), which holds values of one. When you are dealing with complicated data structures, the pprint module often presents them more readably than a plain print.
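
For example, here is a small identity matrix shown both ways (a quick sketch; the width argument just forces pprint to wrap):

INTERACTIVE CONSOLE SESSION

>>> from pprint import pprint
>>> identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
>>> print(identity)
[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
>>> pprint(identity, width=20)
[[1, 0, 0],
 [0, 1, 0],
 [0, 0, 1]]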

While it might be easier to bang away at a console window for small pieces of code, it's good practice to define an API and write tests to exercise that API. This will allow you to try out different representations efficiently, and to improve your tests as you go. Create a Python4_Lesson02 project, and in its /src folder, create testarray.py as shown:

CODE TO TYPE: testarray.py

"""
Test list-of-list based array implementations.
"""
import unittest
import arr

class TestArray(unittest.TestCase):
    def test_zeroes(self):
        for N in range(4):
            a = arr.array(N, N)
            for i in range(N):
                for j in range(N):
                    self.assertEqual(a[i][j], 0)

    def test_identity(self):
        for N in range(4):
            a = arr.array(N, N)
            for i in range(N):
                a[i][i] = 1
            for i in range(N):
                for j in range(N):
                    self.assertEqual(a[i][j], i==j)

if __name__ == "__main__":
    unittest.main()

The tests are fairly limited at first, but even these basic tests allow you to detect gross errors in the code. Next, you'll need an arr module on which the test will operate. Let's start with a basic arr module for now. Create arr.py in the same folder as shown:

CODE TO TYPE: arr.py

"""
Naive implementation of list-of-lists creation.
"""

def array(M, N):
    "Create an M-element list of N-element row lists."
    rows = []
    for _ in range(M):
        cols = []
        for _ in range(N):
            cols.append(0)
        rows.append(cols)
    return rows

Run testarray; all tests pass.

OBSERVE:
..
----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK

By now you may be able to devise ways to make the array code simpler. Right now, our code is straightforward, but rather verbose. Let's trim it down a little by building each row with list replication ([0] * N) instead of an explicit inner loop. Modify your code as shown:
CODE TO EDIT: Modify arr.py

"""
Naive implementation of list-of-lists creation.
"""

def array(M, N):
    "Create an M-element list of N-element row lists."
    rows = []
    for _ in range(M):
        rows.append([0] * N)
    return rows

All the tests still pass:

OBSERVE:
..
----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK

At the moment, we are working strictly in two dimensions. But we are using "double subscripting", [M][N], rather than the "tuple of subscripts" notation, [M, N], that most programmers use (and that the Python interpreter is already prepared to accept). So let's modify our tests to use that notation, and verify that our existing implementation breaks when called that way. Modify testarray.py as shown:

CODE TO TYPE: testarray.py

"""
Test list-of-list array implementations using tuple subscripting.
"""
import unittest
import arr

class TestArray(unittest.TestCase):
    def test_zeroes(self):
        for N in range(4):
            a = arr.array(N, N)
            for i in range(N):
                for j in range(N):
                    self.assertEqual(a[i, j], 0)

    def test_identity(self):
        for N in range(4):
            a = arr.array(N, N)
            for i in range(N):
                a[i, i] = 1
            for i in range(N):
                for j in range(N):
                    self.assertEqual(a[i, j], i==j)

if __name__ == "__main__":
    unittest.main()

The test output indicates that something isn't quite right in the array code once tuple subscripting is used:
OBSERVE:

EE
======================================================================
ERROR: test_identity (__main__.TestArray)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "V:\workspace\Python4_Lesson02\src\testarray.py", line 19, in test_identity
    a[i, i] = 1
TypeError: list indices must be integers, not tuple

======================================================================
ERROR: test_zeroes (__main__.TestArray)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "V:\workspace\Python4_Lesson02\src\testarray.py", line 13, in test_zeroes
    self.assertEqual(a[i, j], 0)
TypeError: list indices must be integers, not tuple

----------------------------------------------------------------------
Ran 2 tests in 0.000s

FAILED (errors=2)

The only way to fix this is to define a class with a __getitem__() method, which will allow you direct access to the values passed as subscripts. This will make it easier to locate the correct element. Of course, the __init__() method has to create the lists and bind them to an instance variable that __getitem__() can access. The test code includes setting some array elements, so you also have to implement __setitem__(). (To respond properly to the del statement, a __delitem__() method should also be implemented, but this is not necessary for our immediate purposes.) Rewrite arr.py as shown:

CODE TO TYPE: arr.py

"""
Class-based list-of-lists allowing tuple subscripting.
"""

class array:

    def __init__(self, M, N):
        "Create an M-element list of N-element row lists."
        self._rows = []
        for _ in range(M):
            self._rows.append([0] * N)

    def __getitem__(self, key):
        "Returns the appropriate element for a two-element subscript tuple."
        row, col = key
        return self._rows[row][col]

    def __setitem__(self, key, value):
        "Sets the appropriate element for a two-element subscript tuple."
        row, col = key
        self._rows[row][col] = value

Save it and rerun the test. With __getitem__() and __setitem__() in place on your array class, the tests pass again.
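
To get a feel for the new interface, you can also try the class from a console (a quick sketch, assuming the console can import arr from your Python4_Lesson02/src folder):

INTERACTIVE CONSOLE SESSION

>>> import arr
>>> a = arr.array(3, 4)
>>> a[0, 0]
0
>>> a[1, 2] = 5
>>> a[1, 2]
5
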
Using a Single List to Represent an Array
Using the standard subscripting API, you have built a way to reference two-dimensional arrays represented internally as a list of lists. If you wanted to represent a three-dimensional array, you'd have to change the code to operate on a list of lists of lists, and so on. However, the code might be more adaptable if it used just a single list and performed arithmetic on the subscripts to work out which element to access.

Now let's modify your current version of the arr module to demonstrate the principle on a 2-D array. We aren't going to extend the number of dimensions yet, but you might get an idea for how the code could be extended. Modify arr.py as shown:

CODE TO EDIT: arr.py

"""
Class-based single-list allowing tuple subscripting
"""

class array:

    def __init__(self, M, N):
        "Create a list long enough to hold M*N elements."
        self._data = [0] * M * N
        self._rows = M
        self._cols = N

    def __getitem__(self, key):
        "Returns the appropriate element for a two-element subscript tuple."
        row, col = self._validate_key(key)
        return self._data[row*self._cols+col]

    def __setitem__(self, key, value):
        "Sets the appropriate element for a two-element subscript tuple."
        row, col = self._validate_key(key)
        self._data[row*self._cols+col] = value

    def _validate_key(self, key):
        """Validates a key against the array's shape, returning good tuples.
        Raises KeyError on problems."""
        row, col = key
        if (0 <= row < self._rows and
            0 <= col < self._cols):
            return key
        raise KeyError("Subscript out of range")

The changes that have been made here are pretty much invisible to the code that uses the module.

The __init__() method now initializes a single list that is big enough to hold all rows and columns. It also saves the array size in rows and columns. Previous versions could rely on access to the lists to detect any illegal values in the subscripts; now it has to be done explicitly because the location of the required element in the list now has to be calculated. We can no longer rely on IndexError exceptions to detect an out-of-bounds subscript. The current __getitem__() and __setitem__() methods use a _validate_key() method to verify that the subscript values do indeed fall within the required bounds before using them.

Although all existing tests pass, this detail about the index bounds checking reminds us to add tests to verify that the logic works and that a KeyError exception is raised when illegal values are used. The resulting changes are not complex. Modify testarray.py as shown:
CODE TO EDIT: testarray.py

"""
Test list-of-list array implementations using tuple subscripting.
"""
import unittest
import arr

class TestArray(unittest.TestCase):
    def test_zeroes(self):
        for N in range(4):
            a = arr.array(N, N)
            for i in range(N):
                for j in range(N):
                    self.assertEqual(a[i, j], 0)

    def test_identity(self):
        for N in range(4):
            a = arr.array(N, N)
            for i in range(N):
                a[i, i] = 1
            for i in range(N):
                for j in range(N):
                    self.assertEqual(a[i, j], i==j)

    def _index(self, a, r, c):
        return a[r, c]

    def test_key_validity(self):
        a = arr.array(10, 10)
        self.assertRaises(KeyError, self._index, a, -1, 1)
        self.assertRaises(KeyError, self._index, a, 10, 1)
        self.assertRaises(KeyError, self._index, a, 1, -1)
        self.assertRaises(KeyError, self._index, a, 1, 10)

if __name__ == "__main__":
    unittest.main()

When all three tests pass, you can be confident in your bounds-checking logic. Keep in mind that it's just as important to make sure your program fails when it should, as it is to make sure it runs correctly when it should!

OBSERVE
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK

As long as the API remains the same, you'll have considerable flexibility and programming technique options. Let's consider alternative representations.

Using an array.array instead of a List

The array module defines a single data type (also called "array"), which is similar to a list, except that it stores homogeneous values (each cell can hold values of a given type only, that type being passed when the array is created). The changes required to use such an array instead of a list are minimal. Modify arr.py as shown:
CODE TO EDIT: arr.py

"""
Class-based array allowing tuple subscripting
"""
import array as sys_array

class array:

    def __init__(self, M, N):
        "Create a list long enough to hold M*N elements."
        self._data = sys_array.array("i", [0] * M * N)
        self._rows = M
        self._cols = N

    def __getitem__(self, key):
        "Returns the appropriate element for a two-element subscript tuple."
        row, col = self._validate_key(key)
        return self._data[row*self._cols+col]

    def __setitem__(self, key, value):
        "Sets the appropriate element for a two-element subscript tuple."
        row, col = self._validate_key(key)
        self._data[row*self._cols+col] = value

    def _validate_key(self, key):
        """Validates a key against the array's shape, returning good tuples.
        Raises KeyError on problems."""
        row, col = key
        if (0 <= row < self._rows and
            0 <= col < self._cols):
            return key
        raise KeyError("Subscript out of range")

The testing doesn't change in this case (note that the updated code in the arr module requires the numbers stored in the array.array to be integers), and so your tests pass immediately. The advantage of this implementation (for applications using integer data) is most evident when you're working with extremely large data structures. In these cases, values can be packed closely together within memory, because the array.array structure does not store them as Python values. This could save large amounts of memory overhead with large datasets, and further smaller savings would result from not having to allocate memory for the lists that refer to rows or individual values.
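
If you're curious about the difference, a rough way to compare the two representations is to measure their sizes with sys.getsizeof() (a sketch for experimentation only; the exact numbers depend on your platform and Python build, so none are shown here):

OBSERVE:

"""
Compare the memory footprint of a plain list and an array.array.
(A sketch for experimentation; not part of the arr module or its tests.)
"""
import array
import sys

N = 100000
as_list = [0] * N
as_array = array.array("i", [0] * N)

# getsizeof() reports the container's own footprint; for the list, the
# integer objects it refers to are not included in this figure.
print("list:       ", sys.getsizeof(as_list), "bytes")
print("array.array:", sys.getsizeof(as_array), "bytes")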

Using a dict instead of a List

Some mathematical techniques use "sparse" data sets. These are usually representations of very large data sets where the majority of the values are zero (and therefore do not need to be stored individually). This technique lends itself to using a dict to store the non-zero values, keyed by the subscript tuple passed in to the __getitem__() method.

Since the data storage element does not provide any bounds checking, the methods should still do that. There is no need to initialize the dict with zeroes, because the absence of a value implies a zero! Modify arr.py as shown:
arr.py as sho wn:
CODE TO EDIT: arr.py

"""
Class-based dict allowing tuple subscripting and sparse data
"""

class array:

    def __init__(self, M, N):
        "Create an empty dict to hold the non-zero elements."
        self._data = {}
        self._rows = M
        self._cols = N

    def __getitem__(self, key):
        "Returns the appropriate element for a two-element subscript tuple."
        row, col = self._validate_key(key)
        try:
            return self._data[row, col]
        except KeyError:
            return 0

    def __setitem__(self, key, value):
        "Sets the appropriate element for a two-element subscript tuple."
        row, col = self._validate_key(key)
        self._data[row, col] = value

    def _validate_key(self, key):
        """Validates a key against the array's shape, returning good tuples.
        Raises KeyError on problems."""
        row, col = key
        if (0 <= row < self._rows and
            0 <= col < self._cols):
            return key
        raise KeyError("Subscript out of range")

Save it and run the test again. The testing is somewhat simplified in this version, since zero values do not
need to be asserted explicitly. (Note that the current __setitem__() method is deficient in one way: storing a
zero should result in the given key being removed from the dict if it is present.)
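As a minimal sketch of how that deficiency might be addressed (the name sparse_array is invented for illustration; this is not code the lesson requires), __setitem__() could simply delete the key when a zero is stored, so the dict only ever holds non-zero values:

class sparse_array:
    "Minimal sparse 2-D array sketch; only the parts needed to show the fix."
    def __init__(self, M, N):
        self._data = {}
        self._rows, self._cols = M, N
    def _validate_key(self, key):
        row, col = key
        if 0 <= row < self._rows and 0 <= col < self._cols:
            return key
        raise KeyError("Subscript out of range")
    def __getitem__(self, key):
        row, col = self._validate_key(key)
        return self._data.get((row, col), 0)
    def __setitem__(self, key, value):
        "Store only non-zero values; storing zero removes any existing entry."
        row, col = self._validate_key(key)
        if value:
            self._data[row, col] = value
        else:
            self._data.pop((row, col), None)

a = sparse_array(3, 3)
a[1, 2] = 5
a[1, 2] = 0
print(a._data)   # {} -- storing zero removed the entry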

Summary
So now we have loads of options at our disposal to complete our various Python tasks. Having so much flexibility
enables you to choose specific techniques to suit your specific needs. With some practice, you'll be able to make
sensible compromises between efficient use of storage and adequate computation speed. You're doing a fine job
so far! See you in the next lesson...

When you finish the lesson, don't forget to return to the syllabus and complete the homework.

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Delegation and Composition
Lesson Objectives
When you complete this lesson, you will be able to:

extend functionality by inheritance.
execute more complex delegation.
extend functionality by composition.
utilize recursive composition.

Let's get right to it then, shall we?

In Python, it's unusual to come across deep inheritance trees (E inherits from D, which inherits from C, which inherits from B,
which inherits from A). While such program structures are possible, they can become unwieldy quickly. If you want to implement
a dict-like object with some additional properties, you could choose to inherit from dict and extend its behavior, or you could
decide to compose your own object from scratch and make use of a dict internally to provide the desired dict-like properties.

Extending Functionality by Inheritance


Suppose you want to make your program keep count of how many items have been added, that is, how many times a
previously non-existent key was bound in the table. (If the key already exists, it isn't an addition; it's a replacement.)
With inheritance, you'd do it like this:

INTERACTIVE CONSOLE SESSION

>>> class Dict(dict):


... def __init__(self, *args, **kw):
... dict.__init__(self, *args, **kw)
... self.adds = 0
... def __setitem__(self, key, value):
... if key not in self:
... self.adds += 1
... dict.__setitem__(self, key, value)
...
>>> d = Dict(a=1, b=2)
>>> print("Adds:", d.adds)
Adds: 0
>>> d["newkey"] = "add"
>>> print("Adds:", d.adds)
Adds: 1
>>> d["newkey"] = "replace"
>>> print("Adds:", d.adds)
Adds: 1
>>>

This code behaves as we'd expect. Albeit limited, it provides functionality over and above that of dict objects.

OBSERVE:

class Dict(dict):
    def __init__(self, *args, **kw):
        self.adds = 0
        dict.__init__(self, *args, **kw)
    def __setitem__(self, key, value):
        if key not in self:
            self.adds += 1
        dict.__setitem__(self, key, value)
Our Dict class inherits from the dict built-in. Because this Dict class needs to perform some initialization of its own, it has to
make sure that the underlying dict object initializes properly. It accomplishes this with an explicit call to the parent class
(dict) with the arguments that were provided to the initializing call. dict.__init__(self, *args, **kw) passes on
all the positional and keyword arguments that the caller passed, beginning with the current instance as an
explicit first argument (remember, the automatic provision of the instance argument only happens when a method is
called on an instance; this method is being called on the superclass).

Because the dict type can be called with many different arguments, it is necessary to adopt this style so that this Dict
can be used just like a regular dict. We might say that the Dict object delegates most of its initialization to its
superclass. Similarly, the only difference between the __setitem__() method and a pure dict appears when testing to
determine whether the key already exists in the dict, and if not, incrementing the "adds" count. The remainder of
the method is implemented by calling the superclass (the standard dict) to perform the normal item assignment, by
calling its __setitem__() method with the same arguments: dict.__setitem__(self, key, value).

The initializer function does not call the __setitem__() method to add any initial elements; the adds attribute still
has the value zero immediately after creation, despite the fact that the instance was created with two items.

Note: We didn't do it here, but if you are going to deliver code to paying customers, or if you expect the code to
see heavy use, you'll want to run tests that verify it operates correctly. Writing tests can be difficult, but
when something is going into production, it's important to have a bank of tests available. That way, if
anyone refactors your code, they can do so with some confidence that if the tests still pass, they haven't
broken anything.

The Dict class inherits from dict. This is appropriate because most of the behavior you want is standard dict behavior.
Since both the __init__() and __setitem__() methods of Dict call the equivalent methods of dict as a part of their
code, we say that those methods extend the corresponding dict methods.
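If you did want the items supplied at creation time to be counted as additions as well, one possible approach (a hedged sketch with an invented name, CountingDict; this is not part of the lesson's code) is to initialize the superclass empty and then route every initial item through your own __setitem__():

class CountingDict(dict):
    "Like Dict, but counts items supplied at creation time as additions too."
    def __init__(self, *args, **kw):
        dict.__init__(self)              # start empty
        self.adds = 0
        for key, value in dict(*args, **kw).items():
            self[key] = value            # goes through __setitem__, so it is counted
    def __setitem__(self, key, value):
        if key not in self:
            self.adds += 1
        dict.__setitem__(self, key, value)

d = CountingDict(a=1, b=2)
print(d.adds)   # 2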

More Complex Delegation


In general, the more of a particular object's behaviors you need, the more likely you are to inherit from it. But if only a
small part of the behavior you require is provided by an existing class, you might choose instead to create an instance of that
class and bind it to an instance variable of your own class. The approach is similar, but does not use
inheritance. Let's take a look at that:

INTERACTIVE CONSOLE SESSION

>>> class MyDict:


... def __init__(self, *args, **kwargs):
... self._d = dict(*args, **kwargs)
... def __setitem__(self, key, value):
... return self._d.__setitem__(key, value)
... def __getitem__(self, key):
... return self._d.__getitem__(key)
... def __delitem__(self, key):
... return self._d.__delitem__(key)
...
>>> dd = MyDict(wynken=1, blynken=2)
>>> dd['blynken']
2
>>> dd['nod']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 7, in __getitem__
KeyError: 'nod'
>>> dd['nod'] = 3
>>> dd['nod']
3
>>> del dd['nod']
>>> dd.keys()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MyDict' object has no attribute 'keys'
>>>
Here the MyDict class creates a dict in its __init__() method and binds it to the instance's _d variable. Three methods
of the MyDict class are delegated to that instance, but none of the other methods of the dict are available to the MyDict
user (which may or may not be what you intend). In this particular case, the MyDict class doesn't subclass dict, and so
not all dict methods are available.

The final attempt to access the keys of the MyDict instance shows one potential shortcoming of this approach:
methods of the underlying object have to be made available explicitly. This technique can be useful when only a limited
subset of behaviors is required, along with other functionality (provided by additional methods) not available from the
base type. Where most of the behaviors of the base type are required, it is usually better to use inheritance, and then
override the methods that you don't want to make available with a method that raises an exception.
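For example, here is a minimal sketch of that inheritance-plus-blocking approach (the class name NoClearDict is invented for illustration): the class gets every dict behavior for free, and the one method we don't want to expose is overridden to raise an exception.

class NoClearDict(dict):
    "A dict that deliberately refuses to expose clear()."
    def clear(self):
        raise NotImplementedError("clear() is not supported on NoClearDict")

d = NoClearDict(a=1)
d["b"] = 2              # all normal dict behavior still works
print(sorted(d))        # ['a', 'b']
try:
    d.clear()
except NotImplementedError as exc:
    print(exc)          # clear() is not supported on NoClearDict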

Extending Functionality by Composition


Object composition allows you to create complex objects by using other objects, typically bound to instance variables.
One example of where you might use such a composite object is in an attempt to simulate Python's namespace access.
You have already seen that Python gives many objects a namespace, and you know that the interpreter, when looking
for an attribute of a particular name, will first look in the instance's namespace, next in the instance's class's
namespace, and so on until it gets to the "top" of the inheritance chain (which is the built-in object class).

It is relatively straightforward to model Python namespaces; they are almost indistinguishable from dicts. Names are
used as keys, and the values associated with the names are the natural parallel to the values of the variables with
those names. Multiple dicts can be stored in a list, with the dict to be searched first placed first, as the lowest-numbered
element.

INTERACTIVE CONSOLE SESSION

>>> class Ns:


... def __init__(self, *args):
... "Initialize a tuple of namespaces presented as dicts."
... self._dlist = args
... def __getitem__(self, key):
... for d in self._dlist:
... try:
... return d[key]
... except KeyError:
... pass
... raise KeyError("{!r} not present in Ns object".format(key))
...
>>> ns = Ns(
... {"one": 1, "two": 2},
... {"one": 13, "three": 3},
... {"one": 14, "four": 4}
... )
>>>
>>> ns["one"]
1
>>> ns["four"]
4
>>>

The Ns class uses a list of dicts as its primary data store, and doesn't call any of their methods directly. It does call
their methods indirectly, though, because the __getitem__() method iterates over the list and tries to access the
required element from each dict in turn. Each failure raises a KeyError exception, which is ignored by the pass
statement so the loop can move on to the next dict. So, effectively, the __getitem__() method searches a list of dicts,
stopping as soon as it finds something to return. That is why ns["one"] returned 1. Although 14 is associated with
the same key, that association lives in a dict later in the list and so is never considered; the method has already
found the same key in an earlier dict and returned that key's value.

Think of an Ns object as being "composed" of a list and dicts. Technically, any object can be considered as being
composed of all of its instance variables, but we don't normally regard composition as extending to simple types such
as numbers and strings. If you think about Python namespaces, they act a bit like this: there are often a number of
namespaces that the interpreter needs to search. Adding a new namespace (like adding a new layer of inheritance to a
class's instances, for example) would be the equivalent of inserting a new dict at position 0. (Do you know which list
method will do that?)
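The answer is list.insert(). Here is a hedged sketch of how Ns might grow an add_namespace() method (the method name is invented for illustration); note that it stores self._dlist as a list rather than keeping the args tuple, so it can be modified later:

class Ns:
    def __init__(self, *args):
        "Store the namespaces (dicts) as a list so it can be modified later."
        self._dlist = list(args)
    def add_namespace(self, d):
        "Make d the first namespace searched, shadowing existing names."
        self._dlist.insert(0, d)
    def __getitem__(self, key):
        for d in self._dlist:
            try:
                return d[key]
            except KeyError:
                pass
        raise KeyError("{!r} not present in Ns object".format(key))

ns = Ns({"one": 1})
ns.add_namespace({"one": "shadowed"})
print(ns["one"])   # 'shadowed'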
Recursive Composition
Some data structures are simple; others are complex. Certain complex data structures are composed of other
instances of the same type of object; such structures are sometimes said to be recursively composed. A typical
example is the tree, used in many languages to store data in such a way that it can easily be retrieved both randomly
and sequentially (in the order of the keys). The tree is made up of nodes. Each node contains data and two pointers.
One of the data elements will typically be used as the key, which determines the ordering to be maintained among the
nodes. The first pointer points to a subtree containing only nodes with key values less than the key value of the
current node, and the second points to a subtree containing only nodes with key values greater than that of the
current node.

Either of the subtrees may be empty (there may not be any nodes with the required key values); if both subtrees are
empty, the node is said to be a leaf node, containing only data. If the relevant subtree is empty, the corresponding
pointer element will have the value None (all nodes start out containing only data, with None as the left and right
pointers).

Note: In a real program, the nodes would have other data attached to them as well as the keys, but we have
omitted this feature to allow you to focus on the logic needed to maintain a tree.

Create a new PyDev project named Python4_Lesson03 and assign it to the Python4_Lessons working set. Then,
in your Python4_Lesson03/src folder, create mytree.py as shown:

CODE TO TYPE:
'''
Created on Aug 18, 2011

@author: sholden
'''
class Tree:
    def __init__(self, key):
        "Create a new Tree object with empty L & R subtrees."
        self.key = key
        self.left = self.right = None
    def insert(self, key):
        "Insert a new element into the tree in the correct position."
        if key < self.key:
            if self.left:
                self.left.insert(key)
            else:
                self.left = Tree(key)
        elif key > self.key:
            if self.right:
                self.right.insert(key)
            else:
                self.right = Tree(key)
        else:
            raise ValueError("Attempt to insert duplicate value")
    def walk(self):
        "Generate the keys from the tree in sorted order."
        if self.left:
            for n in self.left.walk():
                yield n
        yield self.key
        if self.right:
            for n in self.right.walk():
                yield n

if __name__ == '__main__':
    t = Tree("D")
    for c in "BJQKFAC":
        t.insert(c)

    print(list(t.walk()))
Here again we chose not to have you write tests for your code, but we do test it rather informally with the code
following the class declaration. The tree, as created, consists of a single node. After creation, a loop inserts a number
of characters, and then finally the walk() method is used to visit each node and print out the value of each data
element.

The root of the tree is a Tree object, which in turn may point to other Tree nodes. This means that each subtree has the
same structure as its parent, which implies that the same methods and algorithms can be used on the subtrees. This can
make the processing logic for recursive structures quite compact.

The insert() method locates the correct place for the insertion by comparing the node's key with the key to be inserted. If
the new key is less than the node's key, it must be positioned in the left subtree; if greater, in the right subtree. If there
isn't a subtree there (indicated by the left or right attribute having a value of None), a newly-created node is bound as
that attribute's value. If the subtree exists, its insert() method is called to place the key correctly. So not only is the data
structure recursive, so is the algorithm that deals with it!

The walk() method is designed to produce values from the nodes in sorted order. Again the algorithm is recursive:
first it walks the left subtree (if one exists), then it produces the current node's key, then it walks the right subtree. (It
yields the key value, but clearly the data would be preferable, either instead of or in addition to the key value, if any
were being stored; here we are more concerned with the basics of the tree structure than with having the tree carry
data, which could easily be added as a new Tree instance variable passed in to the __init__() call on creation.)
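As a hedged sketch of that last remark (illustrative only; the names value and search() are not part of the lesson's code), each node could carry a payload alongside its key, turning the tree into a small key-to-value map:

class Tree:
    def __init__(self, key, value=None):
        "Each node stores a key for ordering plus an arbitrary data value."
        self.key = key
        self.value = value
        self.left = self.right = None
    def insert(self, key, value=None):
        if key < self.key:
            if self.left:
                self.left.insert(key, value)
            else:
                self.left = Tree(key, value)
        elif key > self.key:
            if self.right:
                self.right.insert(key, value)
            else:
                self.right = Tree(key, value)
        else:
            raise ValueError("Attempt to insert duplicate value")
    def search(self, key):
        "Return the value stored under key, or raise KeyError."
        if key == self.key:
            return self.value
        subtree = self.left if key < self.key else self.right
        if subtree is None:
            raise KeyError(key)
        return subtree.search(key)

t = Tree("D", 4)
t.insert("B", 2)
print(t.search("B"))   # 2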

In essence, a Tree is a "root node" (the first one added, in this case the one with key "D") that contains a key value and two
subtrees: the first one for key values less than that of the root node, the second for key values greater than that of the
root node. The subtrees, of course, are defined in exactly the same way, and so can be processed in the same way.
Recursive data structures and recursive algorithms tend to go together. The tree offers a fairly decent visual
representation for your brain to latch onto.

Such recursive algorithms aren't quite the same as delegation, but still, you could think of walk() and insert() as
delegating a part of the processing to the subtrees. When you run mytree.py, you'll see this:

OBSERVE:
['A', 'B', 'C', 'D', 'F', 'J', 'K', 'Q']
This is how the tree actually stores its elements, in terms of Tree objects referencing each other (in the lesson's
diagram, the diagonal lines represent Python references and the letters are the keys).

Although the keys were added in random order, the walk() method produces them in the correct order because it yields
the keys of the left subtree, followed by the key of the root node, followed by the keys of the right subtree (and it deals
with subtrees in the same way).

Great work! You've now used composition in examples and projects. Now that you have a handle on composition, ponder
the many ways you could incorporate it into other programs!

When you finish the lesson, don't forget to return to the syllabus and complete the homework.

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Publish and Subscribe
Lesson Objectives
When you complete this lesson, you will be able to:

structure programs.
publish objects.
validate requests and identify output.
subscribe to objects.

In this lesson, we'll go over program structuring, as well as Publish and Subscribe.

On Program Structure
Ideally, every part o f yo ur pro gram will co mmunicate via kno wn APIs o nly, but acco mplishing that can be a real
challenge. When yo u are writing framewo rks to be used in a wide variety o f circumstances, it can be difficult to predict
what the enviro nment will lo o k like. Data must be pro duced, but it may be co nsumed by a variety o f functio ns. Co nsider
a spreadsheet, fo r example. It may display bo th a bar chart and a pie chart o f the same data. Ho w do es the co de that
updates the cells as users type in new numbers kno w to update the graphics, and ho w many graphics there are? The
answer lies in a generic technique kno wn as "publish-and-subscribe", which is a general mechanism to allo w flexible
distributio n o f data.

Publish and Subscribe


Thanks to publish-and-subscribe and similar systems, data producers do not need to know in advance who will be
using their data. The term "data producer" is deliberately vague, because publish-and-subscribe is a broad and
encompassing architectural pattern. A data producer (the "publisher" element in publish-and-subscribe) might be a
stock price ticker that periodically spits out new prices for stocks, or a weather forecasting program that produces new
forecasts every six hours, or even the lowly ticket machine that provides people with numbers to take turns at a grocery
counter. Anyone who wants to make use of the data must subscribe (typically by calling a method of the producer
object to "register" a subscription); then, when new data is available, the publisher distributes it to all subscribers by
calling a method of each subscribed object with the new information as an argument.

This "loosens the coupling" between the producers and consumers of data, allowing each to be written in a general
way, pretty much independently of the other. Each subscriber needs to know only about its own relationship with the
publisher, regardless of any other subscribers.

Publish and Subscribe in Action


Suppose you have a class Publisher, whose instances can be given objects to publish, and that a number of
consumers are potentially interested in consuming that "data feed." The Publisher class will need methods to
allow the subscribers to subscribe when they want to start receiving the feed and unsubscribe when they no
longer require it.

The consumers, in turn, have to know how the Publisher will transmit the data to them, which will normally be
achieved by the Publisher calling one of their methods. So consumers may need to provide an API to satisfy the
requirements of the Publisher. We'll create an example.

For our purposes, we'll write a module that asks for lines of input from the user, and then distributes the lines
to any subscribed consumers. The subscription interface will consist of subscribe and unsubscribe methods that
add and remove items from the publisher's subscriber list. Subscribers must provide a "process" method,
which the publisher will call with each new input.

We will have the subscribers print the input string after processing it in basic, but distinguishable, ways. In the
first example, subscribers print out the uppercase version of the string they've received.

Create a Python4_Lesson04 project and add it to your Python4_Lessons working set. Then, create
pubandsub.py in your Python4_Lesson04/src folder as shown:
CODE TO TYPE:
class Publisher:
    def __init__(self):
        self.subscribers = []
    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)
    def unsubscribe(self, subscriber):
        self.subscribers.remove(subscriber)
    def publish(self, s):
        for subscriber in self.subscribers:
            subscriber.process(s)

if __name__ == '__main__':
    class SimpleSubscriber:
        def __init__(self, publisher):
            publisher.subscribe(self)
            self.publisher = publisher
        def process(self, s):
            print(s.upper())

    publisher = Publisher()
    for i in range(3):
        newsub = SimpleSubscriber(publisher)
        line = input("Input {}: ".format(i))
        publisher.publish(line)

The program asks you for three lines of input. The first is echoed in uppercase once, the second twice, and
the third three times, because each time through the loop a new subscriber is subscribed to the publisher.

OBSERVE:
Input 0: pub
PUB
Input 1: and
AND
AND
Input 2: sub
SUB
SUB
SUB

The Publisher keeps a list of subscribers (which starts out empty). Subscribing an object appends it to the
subscriber list; unsubscribing an object removes it. The SimpleSubscriber object takes a publisher as an
argument to its __init__() method and immediately subscribes to that publisher.

These same principles can be applied to programs you may already use. For example, a spreadsheet
program may have to process spreadsheets in which there are multiple graphics based on the data, all of which
must be updated as the data changes. One way to arrange that is to enlist the graphics as subscribers to an
event stream publisher, which publishes an alert every time any change is made to the data. To avoid
unnecessary computing, the event stream publisher might publish the event after a change only when no
further changes were made to the data within a fixed (and preferably short) period of time; a simple sketch of
that idea is shown below.
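Here is a minimal, hedged sketch of that coalescing idea (the names CoalescingPublisher, data_changed() and flush() are invented for illustration; in a real GUI or event loop, flush() would be called periodically rather than after an explicit sleep):

import time

class CoalescingPublisher:
    "Collects rapid-fire changes and only publishes after a quiet period."
    def __init__(self, quiet_seconds=0.5):
        self.subscribers = []
        self.quiet_seconds = quiet_seconds
        self._pending = None
        self._last_change = None
    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)
    def data_changed(self, s):
        "Record a change; it is not sent to subscribers yet."
        self._pending = s
        self._last_change = time.time()
    def flush(self):
        "Publish the latest pending change if the quiet period has elapsed."
        if (self._pending is not None and
                time.time() - self._last_change >= self.quiet_seconds):
            for subscriber in self.subscribers:
                subscriber.process(self._pending)
            self._pending = None

class Printer:
    def process(self, s):
        print("updated:", s)

p = CoalescingPublisher(quiet_seconds=0.1)
p.subscribe(Printer())
p.data_changed("A1=3")
p.data_changed("A1=4")   # a rapid second change replaces the first
time.sleep(0.2)
p.flush()                # only "A1=4" reaches the subscriber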

We can refine this process further in various ways because it allows very loose coupling between the
publisher and the subscriber: neither needs to have advance knowledge of the other, and the connections are
created at run-time rather than determined in advance. We like loose coupling in systems design because it's
flexible and allows dynamic relationships between objects.

Validating Requests and Identifying Output


Our initial implementation is defective in a couple of ways. First, there is nothing to stop a given subscriber
from being subscribed multiple times. Similarly, there is nothing present to check whether a subscriber
requesting unsubscription (code not yet exercised in the main program) is actually subscribed. Passing a
nonexistent subscriber would cause the list's remove() method to raise an exception:
OBSERVE:
>>> [1, 2, 3].remove(4)
Traceback (most recent call last):
File "<console>", line 1, in <module>
ValueError: list.remove(x): x not in list
>>>

In order to make the message associated with the exception easier to understand, you'll want to trap it, or test
beforehand for the condition that would cause the exception and then raise your own, more meaningful,
exception.

Finally, the original version of our program does not identify which specific subscriber is responsible for an
individual message. We want it to identify the culprit, though, because that will make the operation of the
program easier to understand. Let's revise it so that each subscriber instance takes an additional argument
(its name), which it will then use to identify all of its output. Modify pubandsub.py to check for errors and
identify subscribers:

CODE TO TYPE:
class Publisher:
    def __init__(self):
        self.subscribers = []
    def subscribe(self, subscriber):
        if subscriber in self.subscribers:
            raise ValueError("Multiple subscriptions are not allowed")
        self.subscribers.append(subscriber)
    def unsubscribe(self, subscriber):
        if subscriber not in self.subscribers:
            raise ValueError("Can only unsubscribe subscribers")
        self.subscribers.remove(subscriber)
    def publish(self, s):
        for subscriber in self.subscribers:
            subscriber.process(s)

if __name__ == '__main__':
    class SimpleSubscriber:
        def __init__(self, name, publisher):
            publisher.subscribe(self)
            self.name = name
            self.publisher = publisher
        def process(self, s):
            print(self.name, ":", s.upper())

    publisher = Publisher()
    for i in range(3):
        newsub = SimpleSubscriber("Sub"+str(i), publisher)
        line = input("Input {}: ".format(i))
        publisher.publish(line)

This version of the program doesn't actually trigger any of the newly-added exceptions, but the inclusion of
the tests makes our code more robust. The SimpleSubscriber.process() method identifies each output line
with the name of the instance that was responsible for it, which can be especially helpful in more complex
situations. The code that creates the subscribers generates names such as "Sub0", "Sub1" and so on for the
subscribers. You should see output that looks like this:
OBSERVE:

Input 0: sub
Sub0 : SUB
Input 1: and
Sub0 : AND
Sub1 : AND
Input 2: pub
Sub0 : PUB
Sub1 : PUB
Sub2 : PUB

If we were to write unit tests for this code, we might include assertRaises() tests to ensure that double
subscription and attempts to remove non-subscribed objects were handled correctly. In the absence of unit
tests, we should at least make sure that exceptions will be raised under the expected circumstances. We can do
that in an interactive console with the help of Eclipse.

First, make sure that you activate the editor session containing the pubandsub.py source. Then, in the
Console pane, click Open Console and select PyDev Console from the drop-down menu that appears:

You will see a dialog asking you which type of console window you want to create. Select Console for
currently active editor and click OK:

Now you will be able to import modules from the Python4_Lesson04/src directory. Next, verify that
exceptions are properly raised:
INTERACTIVE CONSOLE SESSION

>>> from pubandsub import Publisher


>>> publisher = Publisher()
>>> publisher.unsubscribe(None)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "pubandsub.py", line 16, in unsubscribe
raise ValueError("Can only unsubscribe subscribers")
ValueError: Can only unsubscribe subscribers
>>> publisher.subscribe(None)
>>> publisher.subscribe(None)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "pubandsub.py", line 12, in subscribe
raise ValueError("Multiple subscriptions are not allowed")
ValueError: Multiple subscriptions are not allowed
>>>

Since exceptions appear to be raised under the correct circumstances, we could proceed without modifying
the code further, but it's a good idea to copy and paste the interactive session into your source as a doctest. A
simple copy-and-paste from the console panel is not adequate, however, because the console is designed
to let you copy and paste only the code, so when you copy from the interactive session in Eclipse, the
necessary prompt strings (">>> " and "... ") are absent from the pasted content. doctest and Eclipse don't
always play nicely together. It's a good thing Eclipse has so many other useful features. (A sketch of what
such a doctest might look like is shown below.)
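For reference, here is a minimal sketch of how the hand-restored doctest could look (the prompt strings are typed back in by hand; this is an illustration, not the lesson's required code). Running the file directly as shown, or with python -m doctest pubandsub.py, reports nothing when all the tests pass:

class Publisher:
    """Distributes published lines to subscribed consumers.

    >>> publisher = Publisher()
    >>> publisher.unsubscribe(None)
    Traceback (most recent call last):
        ...
    ValueError: Can only unsubscribe subscribers
    >>> publisher.subscribe(None)
    >>> publisher.subscribe(None)
    Traceback (most recent call last):
        ...
    ValueError: Multiple subscriptions are not allowed
    """
    def __init__(self):
        self.subscribers = []
    def subscribe(self, subscriber):
        if subscriber in self.subscribers:
            raise ValueError("Multiple subscriptions are not allowed")
        self.subscribers.append(subscriber)
    def unsubscribe(self, subscriber):
        if subscriber not in self.subscribers:
            raise ValueError("Can only unsubscribe subscribers")
        self.subscribers.remove(subscriber)
    def publish(self, s):
        for subscriber in self.subscribers:
            subscriber.process(s)

if __name__ == "__main__":
    import doctest
    doctest.testmod()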

So far our program has not tested the non-error branch of the unsubscribe code. We'll perform that test next
by restricting the number of subscribers. This can be done either internally (from within the
Publisher.subscribe() method, for example) or by truncating the subscription list from the main loop. We're
going to do the latter. We'll add a few loops to make sure that the strategy is properly tested. After each new
subscription, we'll remove the least recent subscriber if the length of the subscription list exceeds three. This will
ensure that no input sees more than three responses. Modify pubandsub.py as shown below:
CODE TO TYPE:
class Publisher:
    def __init__(self):
        self.subscribers = []
    def subscribe(self, subscriber):
        if subscriber in self.subscribers:
            raise ValueError("Multiple subscriptions are not allowed")
        self.subscribers.append(subscriber)
    def unsubscribe(self, subscriber):
        if subscriber not in self.subscribers:
            raise ValueError("Can only unsubscribe subscribers")
        self.subscribers.remove(subscriber)
    def publish(self, s):
        for subscriber in self.subscribers:
            subscriber.process(s)

if __name__ == '__main__':
    class SimpleSubscriber:
        def __init__(self, name, publisher):
            publisher.subscribe(self)
            self.name = name
            self.publisher = publisher
        def process(self, s):
            print(self.name, ":", s.upper())

    publisher = Publisher()
    for i in range(5):
        newsub = SimpleSubscriber("Sub"+str(i), publisher)
        if len(publisher.subscribers) > 3:
            publisher.unsubscribe(publisher.subscribers[0])
        line = input("Input {}: ".format(i))
        publisher.publish(line)

This code is not much different from the last example, except that there are never more than three responses
to any input, which indicates that the unsubscribe function is working correctly. Each time the subscriber count
exceeds three, the list is trimmed from the left:

OBSERVE:
Input 0: sub
Sub0 : SUB
Input 1: and
Sub0 : AND
Sub1 : AND
Input 2: pub
Sub0 : PUB
Sub1 : PUB
Sub2 : PUB
Input 3: more
Sub1 : MORE
Sub2 : MORE
Sub3 : MORE
Input 4: inputs
Sub2 : INPUTS
Sub3 : INPUTS
Sub4 : INPUTS

Making the Algorithm More General


At present, the publisher requires subscribers to have a "process" method, which it calls to have each
subscriber process the published data. This works well enough, but it does constrain the nature of the
subscribers. For example, there is no way to subscribe plain functions, because there is no natural way to give
a function a process() method.

Let's modify the program so that it registers a callable directly instead of registering an instance and
then calling a specific method of it. Our program will then allow any callable to be registered. We'll verify this by
defining a simple function and registering it with the publisher before the loop begins. Modify pubandsub.py
to allow registration of any callable:

CODE TO TYPE:
class Publisher:
    def __init__(self):
        self.subscribers = []
    def subscribe(self, subscriber):
        if subscriber in self.subscribers:
            raise ValueError("Multiple subscriptions are not allowed")
        self.subscribers.append(subscriber)
    def unsubscribe(self, subscriber):
        if subscriber not in self.subscribers:
            raise ValueError("Can only unsubscribe subscribers")
        self.subscribers.remove(subscriber)
    def publish(self, s):
        for subscriber in self.subscribers:
            subscriber(s)

if __name__ == '__main__':
    def multiplier(s):
        print(2*s)

    class SimpleSubscriber:
        def __init__(self, name, publisher):
            self.name = name
            self.publisher = publisher
            publisher.subscribe(self.process)
        def process(self, s):
            print(self, ":", s.upper())
        def __repr__(self):
            return self.name

    publisher = Publisher()
    publisher.subscribe(multiplier)
    for i in range(6):
        newsub = SimpleSubscriber("Sub"+str(i), publisher)
        line = input("Input {}: ".format(i))
        publisher.publish(line)
        if len(publisher.subscribers) > 3:
            publisher.unsubscribe(publisher.subscribers[0])

The SimpleSubscriber object now registers its (bound) process method as a callable, and the
Publisher.publish() method calls each subscriber directly rather than calling a method of the subscriber. This
makes it possible to subscribe plain functions to the Publisher:
OBSERVE:
Input 0: pub
pubpub
Sub0 : PUB
Input 1: and
andand
Sub0 : AND
Sub1 : AND
Input 2: sub
subsub
Sub0 : SUB
Sub1 : SUB
Sub2 : SUB
Input 3: and
Sub0 : AND
Sub1 : AND
Sub2 : AND
Sub3 : AND
Input 4: dub
Sub1 : DUB
Sub2 : DUB
Sub3 : DUB
Sub4 : DUB
Input 5: and
Sub2 : AND
Sub3 : AND
Sub4 : AND
Sub5 : AND

The full "publish and subscribe" algo rithm is general eno ugh to allo w co mmunicatio n between
Note co mpletely different pro cesses. Technically, we have been studying a subset o f publish-and-
subscribe also referred to as "the o bserver pattern."

A Note on Debugging
Eclipse has some advanced debugging features, but we've ignored them here. You won't always have Eclipse at your
disposal (at least when you aren't in the lab), so instead we've directed our attention to assuring your code through
testing.

The relatively simple expedient of inserting print() calls in your code is good enough to solve many problems, and in
the upcoming project the most important part of the exercise is to use this technique to discover exactly how the
suggested modification breaks the program. See you in the next lesson!

When you finish the lesson, don't forget to return to the syllabus and complete the homework.

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Optimizing Your Code
Lesson Objectives
When you complete this lesson, you will be able to:

focus your attention on the proper elements from the beginning.
use the profile module.
identify which elements should be optimized.
optimize.

Start with Correctness


"Speed is fine, but accuracy is everything."
-Wyatt Earp

Inexperienced programmers often devote the majority of their attention to speed and performance. This is a common
mistake; worrying about speed too early in the programming process tends to introduce additional mistakes of its
own. During development, your initial focus should be on producing programs that
work correctly and are supported by tests. When you do begin to consider speed and performance, you're likely to alter
your code; that's when tests become indispensable. If your changes break your tests, you'll need to fix your code before
you address issues of speed and performance. The prevailing programmer's wisdom applies: "First, make it work,
then make it faster."

When you write a working program, it's generally fast enough already. That isn't to say that your programs can't be
made faster (most of them can), but a good programmer knows when to leave well enough alone.

Usually we optimize for time (that is, we make the program run as quickly as possible), but sometimes a program
appears to use an excessive amount of memory. There is generally a trade-off between memory and time: you can
often reduce memory usage by accepting a slower algorithm, and vice versa.

Guido van Rossum, Python's inventor, discussed optimizing one particular function in his essay "Python Patterns - An
Optimization Anecdote." That discussion shows just how many different approaches there are to solving a single problem.

Where to Optimize
Faced with an under-performing program, you first need to determine which parts of the program are causing
the issues. To do that, you'll need to "profile" your program, that is, find out how much time is being
spent in each part of it. This will allow you to see which pieces are taking up the most CPU time;
those pieces will then be the primary targets for optimization. The Python standard library includes a profile module
that enables you to gather detailed information about how much time is being spent in different parts of your
program.

You can determine which pieces of code run faster using the facilities of the timeit module. For our
purposes, you'll be using just a few features of each library, but I encourage you to investigate the Python library
documentation outside the labs to learn more about them. Also, try out your own versions of code to learn more
about different approaches to a given problem and how well they perform.

The Profile Module
The profile module allows you to trace your program, keeping information about function call and return events,
as well as exceptions that are raised. It can provide detailed explanations of where your program is spending its time.
The module collects and summarizes data about the various function calls in a program.

Two Different Modules
The cProfile module (written in C) functions just like the profile module, only faster. cProfile is not available in
every Python installation, though. When cProfile is unavailable, use the profile module instead. You can allow
your program to make use of cProfile when it is available, and profile when it is not. A quick illustration will
help you understand these tools. Here's how to import one of two modules under the same name:
OBSERVE:

try:
    import cProfile as profile
except ImportError:
    import profile

If cProfile is available, it is imported under the name profile. If it isn't available, the attempt to import it raises an
ImportError exception, and the profile module is imported instead.

Using the Profile Module


Create a new PyDev project named Python4_Lesson05, assign it to the Python4_Lessons working set,
and then create a new file in your Python4_Lesson05/src folder named prfl.py, as shown below:

CODE TO TYPE:
def f1():
    for i in range(300):
        f2()

def f2():
    for i in range(300):
        f3()

def f3():
    for i in range(300):
        pass

import cProfile as profile

profile.run("f1()")

The profile.run() function takes as its argument a string containing the code to be run, and then runs it with
profiling active. If only one argument is given, the function produces output at the end of the run that
summarizes the operation of the code.

Save and run it; you'll see something like this:

OBSERVE:
90304 function calls in 1.110 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.110    1.110 <string>:1(<module>)
        1    0.000    0.000    1.110    1.110 prfl.py:1(f1)
      300    0.030    0.000    1.110    0.004 prfl.py:5(f2)
    90000    1.080    0.000    1.080    0.000 prfl.py:9(f3)
        1    0.000    0.000    1.110    1.110 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

A total of 90304 function calls were recorded during the execution of the code, in a total of 1.110 seconds. The
rest of the output is sorted by function name by default. The columns are:

ncalls: the total number of calls made to the listed function.
tottime: the total time spent executing the listed function itself.
percall (first): the average execution time for a single call of the function.
cumtime: the cumulative execution time of all calls of this function, including the time taken to execute all functions called from this one.
percall (second): the average cumulative execution time for a single call of the function.
filename:lineno(function): the details of the source code defining the function.

By looking at the "tottime" column, we can see that the majority of the program's time is spent in the f3()
function. In fact, if you could eliminate the time taken by the rest of the program altogether, the total execution
time would fall by less than 5%. In other words, the f3() function is taking up over 95% of the
program's execution time. As Guido van Rossum says:

Rule number one: only optimize when there is a proven speed bottleneck. Only optimize the innermost loop.
(This rule is independent of Python, but it doesn't hurt repeating it, since it can save a lot of work.) :-)

More Complex Reporting


Sometimes you'll want more specific information from a profiling run. When that's the case, you can use the
second argument to profile.run(): the name of a file to which the raw profiling data will be written.
You can then process this data separately using the pstats module. In order to give the module enough data
to work with, we'll use another artificially constructed program (there is no real computation taking place, but
there are many function calls). Modify prfl.py to add more function calls:

CODE TO TYPE:
def f1():
    for i in range(300):
        f2(); f3(); f5()

def f2():
    for i in range(300):
        f3()

def f3():
    for i in range(300):
        pass

def f4():
    for i in range(100):
        f5()

def f5():
    i = 0
    for j in range(100):
        i += j
    f6()

def f6():
    for i in range(100):
        f3()

import cProfile as profile

profile.run("f1()", "profiledata")

When you run this program, you won't see any output in the console window. The program creates a file
named profiledata in the folder where prfl.py is located (refresh the Package Explorer window [press F5]
to see it). Now, if you start up a console window in the same directory (make sure the program is in the active
editor window, select PyDev Console from the Open Console pull-down menu, select Console for
currently active editor, then click OK), you can work with that file using the pstats module, which was written precisely
to allow analysis of profile data.

The primary element in the pstats module is the Stats class. When you create an instance, you can give it the
name(s) of one or more files as positional arguments. These files will have been created by profiling. You
can also provide a stream keyword argument, which is an open file to which output will be sent (this defaults
to standard output, meaning you see the output straight away).

Note: The next series of operations should all be performed in the same console window, so do not
close it down between operations.

Make sure to keep this window open after this interactive session:
INTERACTIVE CONSOLE SESSION

>>> import pstats


>>> s = pstats.Stats("V:\\workspace\\Python4_Lesson05\\src\\profiledata")

>>> s.print_stats()
Mon Jun 25 17:55:43 2012 V:\workspace\Python4_Lesson05\src\profiledata

121204 function calls in 3.275 seconds

Random listing order was used

ncalls tottime percall cumtime percall filename:lineno(function)


1 0.000 0.000 3.275 3.275 {built-in method exec}
300 0.770 0.003 2.458 0.008 V:\workspace\Python4_Lesson05\src\
prfl.py:5(f2)
300 0.259 0.001 0.795 0.003 V:\workspace\Python4_Lesson05\src\
prfl.py:23(f6)
1 0.007 0.007 3.275 3.275 V:\workspace\Python4_Lesson05\src\
prfl.py:1(f1)
1 0.000 0.000 3.275 3.275 <string>:1(<module>)
120300 2.229 0.000 2.229 0.000 V:\workspace\Python4_Lesson05\src\
prfl.py:9(f3)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Prof
iler' objects}
300 0.010 0.000 0.804 0.003 V:\workspace\Python4_Lesson05\src\
prfl.py:17(f5)

<pstats.Stats object at 0x0000000002955198>


>>>

Note: The times and paths in your output will vary from the values in the above console session.

When you create a pstats.Stats instance, it loads the data, and you can manipulate it before producing output
(you'll see how shortly). There are several refinements you can make to the output by calling methods of
your Stats instance.
INTERACTIVE CONSOLE SESSION

>>> s.strip_dirs() # shorten function references


<pstats.Stats object at 0x0000000002955198>
>>> s.print_stats()
Mon Jun 25 17:55:43 2012 V:\workspace\Python4_Lesson05\src\profiledata

121204 function calls in 3.275 seconds

Random listing order was used

ncalls tottime percall cumtime percall filename:lineno(function)


1 0.000 0.000 3.275 3.275 {built-in method exec}
1 0.007 0.007 3.275 3.275 prfl.py:1(f1)
120300 2.229 0.000 2.229 0.000 prfl.py:9(f3)
300 0.259 0.001 0.795 0.003 prfl.py:23(f6)
300 0.770 0.003 2.458 0.008 prfl.py:5(f2)
1 0.000 0.000 3.275 3.275 <string>:1(<module>)
300 0.010 0.000 0.804 0.003 prfl.py:17(f5)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Prof
iler' objects}

<pstats.Stats object at 0x0000000002955198>


>>>

The strip_dirs() method has removed all of the directory information from the last column. strip_dirs() is
often applied because the full path information isn't generally required. Next, you can sort the output to put
the most significant items first by providing one or more keys to the Stats.sort_stats() method. The
keys that are currently acceptable are:

'calls': the total count of calls of the function (including "recursive calls," where a function calls itself, or calls other functions which in turn call it).
'cumulative': cumulative execution time.
'file': file name from which the function was loaded.
'module': same as 'file'.
'pcalls': count of primitive calls (that is, calls made to the function while it is not already executing).
'line': line number.
'name': function name.
'nfl': name/file/line.
'stdname': the function name as printed.
'time': internal time.

You may have noticed that 3 of the 8 lines of the output aren't particularly useful for our requirements.
Fortunately, you can filter out the results you don't want by placing one or more restrictions on the output.
Those restrictions are passed as additional arguments to print_stats() and can take one of three forms:

An integer will limit the output to the given number of lines.
A floating-point number between 0 and 1 will restrict the output to the given proportion of entries.
A regular expression will limit the output to those entries whose filename:lineno(function) fields contain the given regular expression.

You can omit the details of the "structural" entries (those that relate strictly to the profiling
framework) using the simple expression r"\.py", or, once the entries are sorted into the right order, by using an
integer (5, in this case).

The restrictions are applied in order, so print_stats(0.1, "test") reports those lines out of the top tenth that
match "test", whereas print_stats("test", 0.1) reports a tenth of all the lines matching "test". So, if there
were a hundred lines in the source data, print_stats(0.1, "test") would print whichever of the first ten lines
contain "test", while print_stats("test", 0.1) would print one tenth of ALL the lines that contain "test". If every
third line contained "test", print_stats(0.1, "test") would report lines 3, 6, and 9 (the matches within the first
ten lines), whereas print_stats("test", 0.1) would report roughly a tenth of all the matching lines, wherever
they appear in the listing.

INTERACTIVE CONSOLE SESSION

>>> s.sort_stats('calls', 'time')


<pstats.Stats object at 0x10057c510>
>>> s.print_stats(r"\.py")
Mon Jun 25 17:55:43 2012 V:\workspace\Python4_Lesson05\src\profiledata

121204 function calls in 3.275 seconds

Ordered by: call count, internal time


List reduced from 8 to 5 due to restriction <'\\.py'>

ncalls tottime percall cumtime percall filename:lineno(function)


120300 2.229 0.000 2.229 0.000 prfl.py:9(f3)
300 0.770 0.003 2.458 0.008 prfl.py:5(f2)
300 0.259 0.001 0.795 0.003 prfl.py:23(f6)
300 0.010 0.000 0.804 0.003 prfl.py:17(f5)
1 0.007 0.007 3.275 3.275 prfl.py:1(f1)

<pstats.Stats object at 0x0000000002955198>


>>> s.print_stats(5)
Mon Jun 25 17:55:43 2012 V:\workspace\Python4_Lesson05\src\profiledata

121204 function calls in 3.275 seconds

Ordered by: call count, internal time


List reduced from 8 to 5 due to restriction <5>

ncalls tottime percall cumtime percall filename:lineno(function)


120300 2.229 0.000 2.229 0.000 prfl.py:9(f3)
300 0.770 0.003 2.458 0.008 prfl.py:5(f2)
300 0.259 0.001 0.795 0.003 prfl.py:23(f6)
300 0.010 0.000 0.804 0.003 prfl.py:17(f5)
1 0.007 0.007 3.275 3.275 prfl.py:1(f1)

<pstats.Stats object at 0x0000000002955198>


>>>

Note: You may have wondered why all of the methods of the pstats.Stats object seem to return the
same pstats.Stats instance. It's to allow users to utilize a technique called method chaining.
Since each method call returns the instance, you can apply a method call directly to the result of
a previous method call, as in:

s.strip_dirs().sort_stats('calls', 'time').print_stats()

You'll also want to know which functions call which other functions. The pstats.Stats object has the
print_callers() and print_callees() methods, which show you the calling relationships between the various functions:
INTERACTIVE CONSOLE SESSION

>>> s.sort_stats('calls', 'time')


<pstats.Stats object at 0x0000000002955198>

>>> s.print_callers(r"\.py")
Ordered by: call count, internal time
List reduced from 8 to 5 due to restriction <'\\.py'>

Function was called by...


ncalls tottime cumtime
prfl.py:9(f3) <- 300 0.005 0.005 prfl.py:1(f1)
90000 1.688 1.688 prfl.py:5(f2)
30000 0.536 0.536 prfl.py:23(f6)
prfl.py:5(f2) <- 300 0.770 2.458 prfl.py:1(f1)
prfl.py:23(f6) <- 300 0.259 0.795 prfl.py:17(f5)
prfl.py:17(f5) <- 300 0.010 0.804 prfl.py:1(f1)
prfl.py:1(f1) <- 1 0.007 3.275 <string>:1(<module>)

<pstats.Stats object at 0x0000000002955198>


>>> s.print_callees(r"\.py")
Ordered by: call count, internal time
List reduced from 8 to 5 due to restriction <'\\.py'>

Function called...
ncalls tottime cumtime
prfl.py:9(f3) ->
prfl.py:5(f2) -> 90000 1.080 1.080 prfl.py:9(f3)
prfl.py:23(f6) -> 30000 0.355 0.355 prfl.py:9(f3)
prfl.py:17(f5) -> 300 0.010 0.365 prfl.py:23(f6)
prfl.py:1(f1) -> 300 0.027 1.107 prfl.py:5(f2)
300 0.004 0.004 prfl.py:9(f3)
300 0.004 0.369 prfl.py:17(f5)

<pstats.Stats object at 0x0000000002955198>


>>>

Being aware of which function calls which other functions can be useful when you are trying to locate specific
calls that take more time than others.

What to Optimize
You can use the profile module to home in on the parts of your program that are using the most CPU time. Your next
consideration will be figuring out how to speed up the code in your "hot spots." To do this, we'll use the timeit module,
which allows you to measure the relative speeds of different Python snippets. The timeit module contains more
features than we need for our task, but it's a good idea to familiarize yourself with its documentation for future tasks.

The timeit module defines a Timer class which gives you full control over the creation and execution of timed code,
but we'll just use the module's timeit() function; it allows you to specify a statement to be timed and some initialization
code to execute before timing starts. The function runs the initialization code and then executes the statement
repeatedly (one million times by default), returning the total execution time in seconds. Take a look:
INTERACTIVE CONSOLE SESSION

>>> from timeit import timeit


>>> timeit("i = i + 1", "i=0")
0.11318016052246094
>>> timeit("i = i + 1", "i=0")
0.11426806449890137
>>> timeit("i = i + 1", "i=0")
0.1136329174041748
>>> timeit("i += 1", "i=0")
0.11641097068786621
>>> timeit("i += 1", "i=0")
0.11541509628295898
>>> timeit("i += 1", "i=0")
0.11439919471740723
>>>

The example demonstrates two things. First, timings are not completely repeatable (and therefore shouldn't be relied upon for
absolute information). Second, there isn't a big difference between the time it takes to execute
a regular addition and the time required to execute the augmented addition operator.

Note: The timeit() function creates an entirely new namespace in which to run the code being timed, so the
examples use an initialization statement to set i to zero before the timed code is run; without that, you'd
see an exception indicating that i had not been defined.

Now that you know how these modules work, we can concentrate on getting your code to run faster. To help facilitate writing
your timing tests, you'll usually define functions containing the code to be measured and have the timing routine call them.

Loop Optimizations
Sometimes you write code that puts a computation inside a loop when it doesn't need to be there. Under those
circumstances there are gains to be made by moving the computation out of the loop, a technique usually
referred to as "loop hoisting." Here is an example of loop hoisting:

INTERACTIVE CONSOLE SESSION

>>> def loop1():


... lst = range(10)
... for i in lst:
... x = float(i)/len(lst)
...
>>> def loop2():
... lst = range(10)
... ln = len(lst)
... for i in lst:
... x = float(i)/ln
...
>>> timeit("loop1()", "from __main__ import loop1")
7.349833011627197
>>> timeit("loop2()", "from __main__ import loop2")
4.197483062744141
>>>

What seems like a small change to the code makes a substantial difference!

Actually, the best way to optimize a loop is to remove it altogether. Sometimes you can do that using
Python's built-in functions. Let's time four different ways to build the upper-case version of a list:
INTERACTIVE CONSOLE SESSION

>>> oldlist = "the quick brown fox jumps over the lazy dog".split()
>>> def lf1(lst):
... newlist = []
... for w in lst:
... newlist.append(w.upper())
... return newlist
...
>>> def lf2(lst):
... return [w.upper() for w in lst]
...
>>> def lf3(lst):
... return list(w.upper() for w in lst)
...
>>> def lf4(lst):
... return map(str.upper, lst)
...
>>>
>>> timeit("lf1(oldlist)", "from __main__ import lf1, oldlist")
4.409790992736816
>>> timeit("lf2(oldlist)", "from __main__ import lf2, oldlist")
3.492004156112671
>>> timeit("lf3(oldlist)", "from __main__ import lf3, oldlist")
4.758850812911987
>>> timeit("lf4(oldlist)", "from __main__ import lf4, oldlist")
0.5220911502838135
>>>

You haven't run into the map() built-in before, but it has some good things going for it. Its first argument is a
function (in this case, the unbound upper() method of the built-in str type), and any remaining arguments are
iterables. There are as many iterables as the function takes arguments, and the result contains the
return values of the function when called with corresponding elements of each iterable (if the iterables are not
all the same length, map stops as soon as the first one is exhausted).

So, why is the map()-based solution so much faster? Partly because the method lookup and the looping are
done inside map(), which is written in C, and partly because in Python 3 map() returns a lazy iterator rather
than a list, so the timing above doesn't include the cost of actually producing all the uppercased strings. To
make the comparison fair, you would time list(map(str.upper, lst)); even then, map() with a built-in method
like str.upper is usually still very competitive, because all the per-item work happens in C.
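A related trick, sketched below (the function name lf5 is invented; it assumes the same oldlist of words as above), is to hoist the method lookup out of the loop yourself: a comprehension that uses a pre-bound str.upper avoids repeating the attribute lookup on every iteration, and the fairer list(map(...)) timing is shown alongside it.

from timeit import timeit

oldlist = "the quick brown fox jumps over the lazy dog".split()

def lf5(lst, upper=str.upper):
    "List comprehension with the method lookup hoisted into a local name."
    return [upper(w) for w in lst]

print(timeit("lf5(oldlist)", "from __main__ import lf5, oldlist"))
print(timeit("list(map(str.upper, oldlist))", "from __main__ import oldlist"))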

Another way to remove a loop is to write the loop contents out as literal code. This is really only practical for
short loops with a known number of iterations, but it can be a very effective technique, as the next example of
"inlining loop code" shows:

INTERACTIVE CONSOLE SESSION

>>> def f1():


... pass
...
>>> def loopfunc():
... for i in range(8):
... f1()
...
>>> def inline():
... f1(); f1(); f1(); f1(); f1(); f1(); f1(); f1()
...
>>> timeit("loopfunc()", "from __main__ import loopfunc")
1.9027259349822998
>>> timeit("inline()", "from __main__ import inline")
1.2639250755310059
>>>

There can be a substantial amount of overhead in looping. When the function calls are written out explicitly, the
execution time is about 30% faster, a worthwhile gain. Of course, in this example the loop overhead does tend to
dominate because there is so little actual computation happening.

Pre-computing Attribute References


Due to Python's dynamic nature, when the interpreter comes across an expression like a.b.c, it looks up a
(trying first the local namespace, then the global namespace, and finally the built-in namespace), then it looks
in that object's namespace to resolve the name b, and finally it looks in that object's namespace to resolve
the name c. These lookups are reasonably fast; for local variables, lookups are extremely fast, since the
interpreter knows which variables are local and can assign them a known position in memory. So there are
definite gains to be had by storing the result of repeated attribute lookups in a local variable. Let's try removing
attribute resolution from loops:

INTERACTIVE CONSOLE SESSION

>>> class Small:
...     class Smaller:
...         x = 20
...     smaller = Smaller
...
>>> small = Small()
>>>
>>> def attr1():
...     ttl = 0
...     for i in range(50):
...         ttl += small.smaller.x
...     return ttl
...
>>> def attr2():
...     ttl = 0
...     x = small.smaller.x
...     for i in range(50):
...         ttl += x
...     return ttl
...
>>> timeit("attr1()", "from __main__ import small, attr1")
11.901235103607178
>>> timeit("attr2()", "from __main__ import small, attr2")
6.448068141937256
>>>

Here, the function doesn't actually execute a huge amount of computation, but we gain a lot in speed.

Local Variables are Faster than Global Variables


As we mentioned before, the interpreter knows which names inside your functions are local, and it assigns
them specific (known) locations inside the function call's memory. This makes references to locals much
faster than references to globals and (most especially) to built-ins. Let's test name reference speed from the various namespaces:
INTERACTIVE CONSOLE SESSION

>>> glen = len    # provides a global reference to a built-in
>>>
>>> def flocal():
...     name = len
...     for i in range(25):
...         x = name
...
>>> def fglobal():
...     for i in range(25):
...         x = glen
...
>>> def fbuiltin():
...     for i in range(25):
...         x = len
...
>>> timeit("flocal()", "from __main__ import flocal")
1.743438959121704
>>> timeit("fglobal()", "from __main__ import fglobal")
2.192162036895752
>>> timeit("fbuiltin()", "from __main__ import fbuiltin")
2.259413003921509
>>>

This difference in speed isn't huge here, but it definitely shows that accessing a local variable is faster than
accessing a global or a built-in. If a global or built-in is used many times inside a function, it makes sense to
store a local reference to it. By contrast, if it is used only once, then you'd only be adding overhead to
your function!
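
If you do take many references to a built-in inside a hot function, one common idiom (a sketch of the style, not something this lesson requires) is to bind it once, either at the top of the function or as a default argument that is evaluated only once, at definition time:

def shout(words, _upper=str.upper):
    # _upper is bound once, when the function is defined, so the loop body
    # avoids both the built-in lookup and the attribute lookup on str.
    return [_upper(w) for w in words]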

How to Optimize
Optimizing code isn't easy, and it would be impossible to show you all the gotchas you can introduce into your code
here. For now, here are a few guidelines that can help you avoid common pitfalls.

Don't Optimize Prematurely


Don't consider performance while you're writing the code (although ignoring it is difficult even for experienced
programmers). The primary goal of the initial programming process is a correct, functioning
algorithm that is relatively easy to understand. Only after your tests demonstrate correct operation should you
address performance.

Use Timings, Not Intuition

Our intuition is not always the best gauge of what will run fast. You're much better off using timings to
determine how well your program is running.

Make One Change at a Time

If you make two changes to a program, and the first makes a 10% improvement, that's great, right? But if the
second takes performance down by 25%, the overall result will be worse than that of the unchanged
program. Make your changes individually and methodically.

The Best Way is Not Always Obvious

Guido van Rossum has yet more wisdom to share with us (I am a fan). In the article we mentioned above he
presents us with a problem: given a list of integers in the range 0-127 (these are ASCII values; Python 2 was
current when Guido wrote this), how does one create a string in which the characters have the ordinal values
held in the corresponding positions in the list of integers? Guido (I think we have spent enough quality time
with Guido to be on a first-name basis now) realized that the fastest way to create such a string was to take
advantage of the array module's ability to store one-byte integers; he came up with this code:
OBSERVE:
import array
def f7(list):
    return array.array('B', list).tostring()
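
One caveat: tostring() belongs to the Python 2 era (Python 3 spells it tobytes(), and the old name was eventually removed), and in Python 3 the result would be bytes rather than str. A minimal Python 3 sketch of the same idea, assuming every integer really is in range(128), might be:

import array

def f7_py3(ints):
    # 'B' stores unsigned one-byte integers; decode turns the bytes into a str.
    return array.array('B', ints).tobytes().decode('ascii')
    # bytes(ints).decode('ascii') is an even shorter spelling of the same idea.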

When you are writing code, the obvious way is usually the best. But when you need to extract maximum performance, the best way is not
always obvious! Did I really say this was a short lesson? Time flies when we're deep into the Python! You're
doing really well so far.

When you finish the lesson, don't forget to return to the syllabus and complete the homework.

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Using Exceptions Wisely
Lesson Objectives
When you complete this lesson, you will be able to:

identify which exceptions are errors.
create exceptions and raise instances.
use exceptions wisely.

Exceptions Are Not (Necessarily) Errors


Raising an exception alters the flow of control in a program. The interpreter normally executes statements one after the
other (with looping to provide repetition, and conditionals to allow decision-making). When an exception is raised,
however, an entirely different mechanism takes over. Precisely because it is exceptional, we tend to be less familiar
with it, but knowing how exceptions are raised and handled lets you write programs that focus on the main task,
confident that when exceptional conditions do occur, they will be handled appropriately. Knowing how, and when, to
use exceptions is part of your development as a Python programmer.

Exceptions offer such programming convenience that we would likely be quite happy to pay a modest penalty in
performance. The happy fact is, though, that when used judiciously, exceptions can actually enhance your programs'
performance as well as making them easier to read.

Specifying Exceptions
Python's built-in exceptions are all available (in the built-in namespace, naturally) without any import. There is
an inheritance hierarchy among them. From the Python documentation:
Python's Built-In Exception Hierarchy

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- ArithmeticError
      |    +-- FloatingPointError
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      +-- AssertionError
      +-- AttributeError
      +-- BufferError
      +-- EOFError
      +-- ImportError
      +-- LookupError
      |    +-- IndexError
      |    +-- KeyError
      +-- MemoryError
      +-- NameError
      |    +-- UnboundLocalError
      +-- OSError
      |    +-- BlockingIOError
      |    +-- ChildProcessError
      |    +-- ConnectionError
      |    |    +-- BrokenPipeError
      |    |    +-- ConnectionAbortedError
      |    |    +-- ConnectionRefusedError
      |    |    +-- ConnectionResetError
      |    +-- FileExistsError
      |    +-- FileNotFoundError
      |    +-- InterruptedError
      |    +-- IsADirectoryError
      |    +-- NotADirectoryError
      |    +-- PermissionError
      |    +-- ProcessLookupError
      |    +-- TimeoutError
      +-- ReferenceError
      +-- RuntimeError
      |    +-- NotImplementedError
      +-- SyntaxError
      |    +-- IndentationError
      |         +-- TabError
      +-- SystemError
      +-- TypeError
      +-- ValueError
      |    +-- UnicodeError
      |         +-- UnicodeDecodeError
      |         +-- UnicodeEncodeError
      |         +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
           +-- ResourceWarning

Although everything inherits from the BaseException class, its first three subclasses (SystemExit,
KeyboardInterrupt and GeneratorExit) should not be caught and handled by regular programs under
normal circumstances. About the most general specification to catch would normally be except Exception,
and that would be reserved for programs such as long-running network servers or equipment control and
monitoring applications.

The full syntax of the except clause allows you to specify not just a single exception but a whole class or tuple
of them, all to be handled in the same way by the same except clause. When you specify an exception class,
any of its subclasses will also be caught (unless, that is, the subclass appears in an earlier except clause for
the same try and has therefore been caught already). In other words, if your program catches ArithmeticError, it also
catches FloatingPointError, OverflowError and ZeroDivisionError. As the next interactive session
should make plain, under some circumstances the ordering of the except clauses will make a difference in
which handler handles the exception.
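
For instance, here is a small sketch of the tuple form (the lookup() function and its behavior are invented for illustration, not taken from the lesson), where one handler deals with two unrelated lookup failures in the same way:

def lookup(container, key):
    "Return container[key], or None if the key or index is missing."
    try:
        return container[key]
    except (KeyError, IndexError):
        # one except clause handles both dict and sequence lookup failures
        return None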

Where subclasses are concerned, except clause ordering is significant

>>> try:
...     raise ZeroDivisionError
... except ArithmeticError:
...     print("ArithmeticError")
... except ZeroDivisionError:
...     print("ZeroDivisionError")
...
ArithmeticError
>>> try:
...     raise ZeroDivisionError
... except ZeroDivisionError:
...     print("ZeroDivisionError")
... except ArithmeticError:
...     print("ArithmeticError")
...
ZeroDivisionError
>>>

OBSERVE:
try:
    raise ZeroDivisionError
except ArithmeticError:
    print("ArithmeticError")
except ZeroDivisionError:
    print("ZeroDivisionError")

ArithmeticError

try:
    raise ZeroDivisionError
except ZeroDivisionError:
    print("ZeroDivisionError")
except ArithmeticError:
    print("ArithmeticError")

ZeroDivisionError

In the first example, since ZeroDivisionError is a subclass of ArithmeticError, the first except clause is
triggered, and the ZeroDivisionError clause is never tested (the second except clause is never
evaluated). In the second example, the ZeroDivisionError is specifically recognized because it is tested for
before the ArithmeticError.

Creating Exceptions and Raising Instances


If you want to create your own exceptions, simply subclass the built-in Exception class or one of its existing
subclasses. Then create instances as required to raise exceptions. You may want to include an __init__() method on
your subclass. The standard Exception.__init__() saves the tuple of positional arguments to the args attribute, so you
can either do the same yourself or call Exception.__init__() to do it on your behalf. Your exceptions may at some stage
be passed to a piece of code that expects to find an args instance variable.

Here's an example of a user-defined exception.

How to Define an Exception [keep this session open and re-use it]

>>> class LocalError(Exception):
...     def __init__(self, msg):
...         self.args = (msg, )
...         self.msg = msg
...     def __str__(self):
...         return self.msg
...
>>> try:
...     raise LocalError("Appropriate message")
... except LocalError as e:
...     print("Trapped", e)
...
Trapped Appropriate message
>>> raise LocalError
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'msg'
>>>

This exception class requires an argument when an instantiation call is made to create a new instance; without one,
the __init__() method does not receive enough arguments. You can see this happening when the raise LocalError
statement is executed at the end of the session: when you use a class to raise an exception, the interpreter attempts to
create an instance of that exception by calling the class with no arguments. So the message you see has nothing to do
with the exception you have tried to raise; it's reporting the interpreter's inability to create an exception instance
because of an argument mismatch in the __init__() method.

Exception objects are generally simple; the most they normally do is establish attribute values that can be used by the
handler to extract information about the exception. Since they are classes, it is possible to add complex logic in
multiple methods, but this is normally not done. As usual in Python, simplicity is the order of the day.
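
As a small illustration of that pattern (the ValidationError class here is hypothetical, invented for the example), an exception can simply carry the attributes its handler will want:

>>> class ValidationError(Exception):
...     def __init__(self, field, value):
...         Exception.__init__(self, field, value)   # also populates self.args
...         self.field = field
...         self.value = value
...
>>> try:
...     raise ValidationError("age", -3)
... except ValidationError as e:
...     print("Bad value %r for field %r" % (e.value, e.field))
...
Bad value -3 for field 'age'
>>>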

Understanding the straightforward flow of control when an exception is raised in the try suite is relatively easy. It is less
easy to appreciate what happens when exceptions occur in the except or finally suites. To look at that, define a function
that raises exceptions in one of those three places, then see what it does under those circumstances.

Create a new PyDev project named Python4_Lesson06 and assign it to the Python4_Lessons working set. Then,
in your Python4_Lesson06/src folder, create fxfin.py as shown:
CODE TO TYPE: Create the following file as fxfin.py

class LocalError(Exception):
    def __init__(self, msg):
        self.args = (msg, )
        self.msg = msg
    def __str__(self):
        return self.msg

def fxfin(where):
    "Demonstrate exceptions in various places."
    try:
        if where == "try":
            raise LocalError("LocalError in try")
        raise ValueError("ValueError in try")
    except (ValueError, LocalError) as e:
        print("Caught", e)
        if where == "except":
            raise LocalError("LocalError in except")
        print("Exception not raised in except")
    finally:
        print("Running finalization")
        if where == "finally":
            raise LocalError("LocalError in finally")
        print("Exception not raised in finally")

for where in "try", "except", "finally":
    print("---- Exception in %s ----" % where)
    try:
        fxfin(where)
    except Exception as e:
        print("!!!", e, "raised")
    else:
        print("+++ No exception raised +++")

Run the program; you see the following output:

Results of running fxfin.py


---- Exception in try ----
Caught LocalError in try
Exception not raised in except
Running finalization
Exception not raised in finally
+++ No exception raised +++
---- Exception in except ----
Caught ValueError in try
Running finalization
Exception not raised in finally
!!! LocalError in except raised
---- Exception in finally ----
Caught ValueError in try
Exception not raised in except
Running finalization
!!! LocalError in finally raised

When the exception is raised in the try suite, everything is perfectly normal and comprehensible, and both the except
and finally handlers run without interruption. By the time the finally suite runs, the exception has already been fully
handled. Notice that the except suite always runs, because the try suite always raises something: a LocalError when
the argument is "try", or the explicit ValueError otherwise. With the "except" argument the handler raises
a second exception. This terminates the except handler, but the finally handler still runs; once it is complete, the second
exception is still raised from the function. When the exception is raised in the finally suite, the finally handler does not
run to completion, and the exception is passed up to the surrounding code (which here catches it and prints the "!!!"
line; an uncaught exception would instead produce a traceback).

Note that when an exception is raised during the handling of another exception, the resulting traceback reports that a
second exception occurred during the processing of the first. This information may be confusing to end users, but it can
be invaluable to a programmer.

Using Exceptions Wisely


Let's take a look at the bytecodes that the CPython 3.1 interpreter produces for a simple function with exception
handling.

Note: Different Python interpreters may use entirely different techniques to handle exceptions, but the effect
should always be the same as in these descriptions.

Examine the CPython byte code for try/except

>>> import dis
>>> def fex1():
...     try:
...         a = 1
...     except KeyError:
...         b = 2
...
>>> dis.dis(fex1)
2 0 SETUP_EXCEPT 10 (to 13)

3 3 LOAD_CONST 1 (1)
6 STORE_FAST 0 (a)
9 POP_BLOCK
10 JUMP_FORWARD 24 (to 37)

4 >> 13 DUP_TOP
14 LOAD_GLOBAL 0 (KeyError)
17 COMPARE_OP 10 (exception match)
20 POP_JUMP_IF_FALSE 36
23 POP_TOP
24 POP_TOP
25 POP_TOP

5 26 LOAD_CONST 2 (2)
29 STORE_FAST 1 (b)
32 POP_EXCEPT
33 JUMP_FORWARD 1 (to 37)
>> 36 END_FINALLY
>> 37 LOAD_CONST 0 (None)
40 RETURN_VALUE
>>>

The interpreter establishes an exception-handling context by pointing at location 13 as the place to go if an exception
occurs (this is what the SETUP_EXCEPT opcode does). This is followed by the body of the try clause. If the try clause
reaches the end, the POP_BLOCK opcode throws away the exception-handling context and the JUMP_FORWARD
sends the interpreter off to perform the implicit return None that terminates every function.

If an exception is raised, however, control is transferred to location 13, where the interpreter attempts to match the
exception to the except specifications. If a match is found (and after various housekeeping operations we will ignore),
location 26 is where the except suite is performed, after which another JUMP_FORWARD again selects the implicit return
None. If no match is found for the exception, the END_FINALLY ensures that the exception is re-raised to activate any
surrounding exception-handling contexts.

The try/except blocks in your program can be nested lexically (that is, a try/except can be a part of the try suite of
another try statement) or dynamically (that is, a try suite can call a function that activates one or more try/excepts). When a try
block is nested dynamically, it will be deactivated by termination of the function even if the return statement is in the try
suite or an except suite. The finally suite is always executed, even when the function returns from an unexpected place.
An explicit return in the finally suite does not allow that suite to run to completion; instead the return is executed
immediately (overriding any return value that might have triggered the execution of the finally clause).
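
A tiny console sketch makes that last point concrete; the same mechanism also means that a return in a finally suite discards any exception that was still pending:

>>> def swallow():
...     try:
...         raise ValueError("about to be lost")
...     finally:
...         return "finally wins"    # executed immediately; the pending ValueError is discarded
...
>>> swallow()
'finally wins'
>>>
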
Exception Timings
Sometimes in optimization, it's useful to know how "expensive" it is in time to handle an exception.
With judicious coding, you can actually save time using exceptions, but you (as always) need to think about
what you are doing rather than just applying rules blindly. The next interactive session shows that it can be
good or bad to rely on exceptions, depending on the surrounding circumstances.

Exception timings depend on how frequently the exception is raised

>>> def fdct1():
...     wdict = {}
...     for word in words:
...         if word not in wdict:
...             wdict[word] = 0
...         wdict[word] += 1
...
>>> def fdct2():
...     wdict = {}
...     for word in words:
...         try:
...             wdict[word] += 1
...         except KeyError:
...             wdict[word] = 1
...
>>> from timeit import timeit
>>> words = "the quick brown fox jumps over the lazy dog".split()
>>> timeit("fdct1()", "from __main__ import fdct1")
4.041514158248901
>>> timeit("fdct2()", "from __main__ import fdct2")
6.705680847167969
>>> words = ["same"] * 9
>>> timeit("fdct1()", "from __main__ import fdct1")
2.6857001781463623
>>> timeit("fdct2()", "from __main__ import fdct2")
2.948345899581909
>>>

Here you did two sets of timings: the first with a word list in which there was only one duplicate, the second
with one where every word was the same. Under the former conditions the KeyError is raised on almost every
iteration, and the specific test for word not in wdict won out over raising an exception. In the second case,
where the exception is raised only once, the exception-based solution was at least competitive, although still
not actually faster. Thus the optimal code can depend to some extent on the data. If you have advance
information about the make-up of your data, that's all very well, but if not, it is more difficult to choose
between approaches.

The important thing is not to run away with the idea that exceptions are somehow only to be used in
exceptional circumstances. If your logic is easier to express with exceptions, use them. If for some reason
your program, once working, does not work fast enough, you can refactor it (making sure you do not break
any tests) for better performance.
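
Neither version above is the only possibility, of course. A third sketch, using collections.defaultdict from the standard library, avoids both the membership test and the exception; whether it beats the others on your data is something to measure rather than assume:

>>> from collections import defaultdict
>>> def fdct3():
...     wdict = defaultdict(int)    # missing keys spring into existence with value 0
...     for word in words:
...         wdict[word] += 1
...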

Confidence in using exceptions to flag abnormal processing conditions is important to keep your logic simple. Without
exceptions, you have to have functions return sentinel values to indicate that problems occurred during processing. With them,
you can just write the logic of the main task "in a straight line" inside a try clause, and use except clauses to catch exceptions that
indicate special processing is required.

When you finish the lesson, don't forget to return to the syllabus and complete the homework.

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Advanced Uses of Decorators
Lesson Objectives
When you complete this lesson, you will be able to:

use decorator syntax.
use classes as decorators.
use class decorators.
employ some odd decorator tricks.
utilize static and class method decorators.
parameterize decorators.

When we discussed properties, we noted that you can use the decorator syntax to apply a function to another function. In this
lesson, we'll immerse you a little more thoroughly in the uses of decoration. It can be difficult to think of small examples,
however, because decorators are typically written to be applied in large systems without users having to think too deeply about
them.

Decorator Syntax
Let's jump right in!

Decorator Syntax (use the same interactive session throughout this lesson)

>>> def trace(f):
...     "Decorate a function to print a message before and after execution."
...     def traced(*args, **kw):
...         "Print message before and after a function call."
...         print("Entering", f.__name__)
...         result = f(*args, **kw)
...         print("Leaving", f.__name__)
...         return result
...     return traced
...
>>> @trace
... def myfunc(x, a=None):
...     "Simply prints a message and arguments."
...     print("Inside myfunc")
...     print("x:", x, "a:", a)
...
>>> myfunc("ONE", "TWO")
Entering myfunc
Inside myfunc
x: ONE a: TWO
Leaving myfunc
>>>

In the example above, the trace function is a decorator. That means that it takes a single argument (which is normally
the function being decorated). Internally, it defines a function traced() that prints out a line of text, calls the decorated
function with whatever arguments it was itself called with, prints out another line of text and then returns the result
obtained from the decorated function. Then, trace returns the function it has just defined.

This means that you can apply trace() to any function, and the result will do just what the original function did, as well as
printing out a line before and after the call to the decorated function. This is how most decorators work (although as
always there are some smart people who have found non-standard ways to use decorators that were not originally
intended by the specification). That's why you often see the internal function written to accept any combination of
positional and keyword arguments; it means that the decorator can be applied to any function, no matter what its
signature.

Remember, the decorator syntax is really just an abbreviation; it doesn't do anything that you couldn't do without the
syntax. When you write @trace before the definition of myfunc(), it's exactly equivalent to writing myfunc =
trace(myfunc) after the function definition. The syntax was added because with longer function definitions it was
often difficult to notice the reassignment to the name when it followed the function definition. The feature was restricted
to functions when it was originally introduced, but now you can also decorate classes. While this is a little more
complicated than decorating functions, it does have its uses.

Because the above decorator defines a function that contains a call to the decorated function as a part of its code
(traced() in the example above), we say that the decorator wraps the decorated function. This has certain unfortunate
side effects: mostly, the name of the function appears to change to the name of the wrapper function from inside the
decorator, and the docstring is that of the wrapper.

The decorated function name differs from the undecorated one

>>> trace.__name__    # undecorated
'trace'
>>> myfunc.__name__   # decorated
'traced'
>>> myfunc.__doc__
'Print message before and after a function call.'
>>>

Fortunately, this issue can be handled using the wraps decorator from the functools library. This is provided
precisely to ensure that decorated functions continue to "look like themselves." Until you get the hang of using it,
however, it seems a little weird, because it means you end up using a decorator on the wrapper function inside your
decorator! But honestly, it isn't difficult.

Use functools.wraps to avoid loss of name and docstring

>>> from functools import wraps
>>> def simpledec(f):
...     "A really simple decorator to demonstrate functools.wraps."
...     @wraps(f)
...     def wrapper(arg):
...         print("Calling f with arg", arg)
...         return f(arg)
...     return wrapper
...
>>> @simpledec
... def f(x):
...     "Simply prints its argument."
...     print("Inside f, arg is", x)
...
>>> f("Hello")
Calling f with arg Hello
Inside f, arg is Hello
>>> f.__name__
'f'
>>> f.__doc__
'Simply prints its argument.'
>>>

Classes as Decorators
While decorators are usually functions, they don't need to be; any callable can be used as a decorator. This means
that you could use a class as a decorator, and when the decoration takes place the class's __init__() method is called
with the object to be decorated (whether it's a function or a class: note that a decorator is typically designed to decorate
either functions or classes but not both, because they are fairly different in nature).

If you want to decorate a function with a class, remember that calling a class calls its __init__() method and returns an
instance of the class. As always, the first argument to __init__() is self, the newly created instance, so in this case the
function that the interpreter passes to the decorator will end up as the second argument to __init__(). Since calling the
class creates an instance, and since normally you want to be able to call the decorated function, the classes you use
as decorators should define a __call__() method, which will then be called when the decorated function is called.
Classes can be decorators too!

>>> class ctrace:
...     def __init__(self, f):
...         "__init__ records the passed function for later use in __call__()."
...         self.__doc__ = f.__doc__
...         self.__name__ = f.__name__
...         self.f = f
...     def __call__(self, *args, **kw):
...         "Prints a trace line before calling the wrapped function."
...         print("Called", self.f.__name__)
...         return self.f(*args, **kw)
...
>>> @ctrace
... def simple(x):
...     "Just prints arg and returns it."
...     print("simple called with", x)
...     return x
...
>>> simple("walking")
Called simple
simple called with walking
'walking'
>>> simple.__name__
'simple'
>>> simple.__doc__
'Just prints arg and returns it.'
>>>

By the time the decorator is called, the simple() function has already been compiled, and it is passed to the decorator's
__init__() method, where it is stored as an instance variable. To make sure the decorated function retains its name and
docstring, those attributes of the function are also copied into instance variables with the same names.

Class Decorators
Up until now, we have decorated functions, but once the feature was introduced into Python, it was only a matter of time
before it was extended to classes. So now you can decorate classes in just the same way as functions. The principle
is exactly the same: the decorator receives a class as an argument, and (usually) returns a class. Because classes are
more complicated than functions, you will find it most convenient to modify the class in place and return the modified
class as the result of the decorator.

Note: Decorators can be applied individually to the methods of a class. Methods are essentially the same as
functions, and so exactly the same techniques can be used with methods as with regular functions.

To demonstrate this, suppose that you want to be able to have each of the methods of a class print out a trace call
during debugging. You could simply apply the trace decorator above to each method, but that would mean extensive
editing for a large class when you wanted to switch the debugging off. It is simpler for programmers to use a class
decorator, so we might well accept a slightly higher level of complexity in the decorator to avoid the editing. Once the
interpreter has processed the class definition, it calls the decorator with the class as its argument, and the decorator
can then either create a new class (which is fairly difficult) or modify the class and return it.

Since the interactive session has already defined a simple tracing function, we'll use that to wrap each of the methods
in our decorated class. Finding the methods is not as easy as you might imagine. It involves looking through the
class's __dict__ and finding callable items whose names do not begin and end with "__" (it's best not to mess with the
"magic" methods). Once such an item is found, it is wrapped with the trace() function and replaced in the class's
__dict__.
Using a class decorator to wrap each method

>>> def callable(o):
...     return hasattr(o, "__call__")
...
>>> def mtrace(cls):
...     for key, val in cls.__dict__.items():
...         if key.startswith("__") and key.endswith("__") \
...                 or not callable(val):
...             continue
...         setattr(cls, key, trace(val))
...         print("Wrapped", key)
...     return cls
...
>>> @mtrace
... class dull:
...     def method1(self, arg):
...         print("Method 1 called with arg", arg)
...     def method2(self, arg):
...         print("Method 2 called with arg", arg)
...
Wrapped method2
Wrapped method1
>>> d = dull()
>>> d.method1("Hello")
Entering method1
Method 1 called with arg Hello
Leaving method1
>>> d.method2("Goodbye")
Entering method2
Method 2 called with arg Goodbye
Leaving method2
>>>

Note: The __dict__ of a class (as opposed to that of an instance) isn't a plain dict like the ones you know. It is
actually an object called a dict_proxy. To keep them as lightweight as possible, dict_proxy objects do not directly
support item assignment like a standard dict does. This is why, in the mtrace() function, the wrapped
method replaces the original version by using the setattr() built-in function.

Note: The callable() function was present by accident in 3.0. The developers had intended to remove it, thinking
that it could easily be replaced by hasattr(obj, "__call__"). Consequently it was removed from Python 3.1.
It was then reinstated in Python 3.2 when some developers pointed out that a more specific version could
be written in C with full access to the object structures.

As you can see, when you call method1() and method2(), they print out the standard "before and after" trace lines,
because they are now wrapped by the trace() function.

Odd Decorator Tricks

Sometimes you don't want to wrap the function: instead you want to alter it in some other way, such as adding
attributes (yes, you can add attributes to functions the same way as you can to most of the other objects in Python). In
that case, the decorator simply returns the function that was passed in as an argument, having modified it in
whatever ways it needs to. So next we'll write a decorator that flags a function as part of a framework by adding a
"framework" attribute.
Using a decorator to add attributes rather than wrapping a function

>>> def framework(f):
...     f.framework = True
...     f.author = "Myself"
...     return f
...
>>> @framework
... def somefunc(x):
...     pass
...
>>> somefunc.framework
True
>>> somefunc.author
'Myself'
>>>

Note that the decorator does still return a function, but since there is no need to wrap the decorated function, it simply
returns the function that it was passed (now resplendent with new attributes). Since this avoids a second function call, it
will be slightly quicker, and there is no need to use functools.wraps because the function is not being wrapped.

Static and Class Method Decorators


Python includes two built-in functions that are intended for use in decorating methods. The staticmethod() function
modifies a method so that the special behavior of providing the instance as an implicit first argument is no longer
applied. In fact, the method can be called on either an instance or the class itself, and it will receive only the arguments
explicitly provided in the call. It becomes a static method. You can think of static methods as functions that don't
need any information from either their class or their instance, so they do not need a reference to it. Such functions are
relatively infrequently seen in the wild.

If you want to write a method that relies on data from the class (class variables are a common way to share data
among the various instances of the class) but does not need any data from the specific instance, you should decorate
the method with the classmethod() function to create a class method. Like static methods, class methods can be called
on either the class or an instance of the class. The difference is that calls to a class method do receive an implicit
first argument. Unlike a standard method call, though, this first argument is the class through which the method was
called (or the class of the instance it was called on) rather than the instance itself. The conventional name for this
argument is cls, which makes it more obvious that you are dealing with a class method.

You may well ask what static and class methods are for; why use them when we already have standard methods that
are perfectly satisfactory for most purposes? Why not just use functions instead of static methods, since no additional
arguments are provided? The answer to this question lies in the fact that these functions are methods of a class, and
so will be inherited (and can be overridden or extended) by any subclasses you may define. Further, the instances of
the class can reference class variables rather than using a global; this is always safer because there is no guarantee,
when your code lands in someone else's program, that their code isn't using the same global name for some other
purpose. It is difficult to think of any example where the use of a classmethod would be absolutely required, but
sometimes it can simplify your design a little.

A typical application for class methods has each of the instances using configuration data that is common to all, and
saved in the class. If you provide methods to alter the configuration data (for example, changing the frequency a
wireless transmitter works on, or changing the function that the instances call to allocate resources), they do not need
to reference any of the instances, so a class method would be ideal.
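
A minimal sketch of that idea (the Transmitter class and its numbers are invented for illustration) might look like this:

>>> class Transmitter:
...     frequency = 146.52                  # configuration shared by all instances
...     @classmethod
...     def retune(cls, mhz):
...         "Alter the shared configuration; receives the class, not an instance."
...         cls.frequency = mhz
...     @staticmethod
...     def band(mhz):
...         "Needs no class or instance data at all."
...         return "VHF" if 30 <= mhz < 300 else "other"
...
>>> Transmitter.retune(145.8)
>>> Transmitter.frequency
145.8
>>> Transmitter().band(Transmitter.frequency)
'VHF'
>>>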

Parameterizing Decorators
Sometimes you want to write a decorator that takes parameters. Remember, though, that the decorator syntax requires
a callable that takes precisely one argument (the class or function to be decorated). So if you want to parameterize a
decorator, you have to do so "at one remove": the function that takes the arguments has to return a function that takes
one argument and returns the decorated object. This can be a little brain-twisting, so an example may help. Or, it may
just make your head explode!

Suppose that you wanted to have your program record the number of calls that are made to each of several different
types of function. When you define a function, you want to give a parameter to the decorator to specify the classification
of the decorated function.
Required decorator syntax to count function f as a 'special' function
@countable('special')
def f(...):
...

In other words, @countable('special') has to return a function that is a conventional decorator: it takes a single
function as an argument and returns the decorated version of the function as its result. This means that we need to
nest functions three levels deep! We will use a global variable to store a dict, and the different function-type strings will
be the keys. Here we go!
Using a parameterized decorator

>>> counts = {}
>>> def countable(ftype):
...     "Returns a decorator that counts each call of a function against ftype."
...     def decorator(f):
...         "Decorates a function to count each call."
...         def wrapper(*args, **kw):
...             "Counts every call as being of the given type."
...             try:
...                 counts[ftype] += 1
...             except KeyError:
...                 counts[ftype] = 1
...             return f(*args, **kw)
...         return wrapper
...     return decorator
...
>>> @countable("short")
... def f1(a, b=None):
...     print("f1 called with", a, b)
...
>>> @countable("f2")
... def f2():
...     print("f2 called")
...
>>> @countable("short")
... def f3(*args, **kw):
...     print("f3 called:", args, kw)
...
>>> for i in range(10):
...     f1(1)
...     f2()
...     f3(i, i*i, a=i)
...
f1 called with 1 None
f2 called
f3 called: (0, 0) {'a': 0}
f1 called with 1 None
f2 called
f3 called: (1, 1) {'a': 1}
f1 called with 1 None
f2 called
f3 called: (2, 4) {'a': 2}
f1 called with 1 None
f2 called
f3 called: (3, 9) {'a': 3}
f1 called with 1 None
f2 called
f3 called: (4, 16) {'a': 4}
f1 called with 1 None
f2 called
f3 called: (5, 25) {'a': 5}
f1 called with 1 None
f2 called
f3 called: (6, 36) {'a': 6}
f1 called with 1 None
f2 called
f3 called: (7, 49) {'a': 7}
f1 called with 1 None
f2 called
f3 called: (8, 64) {'a': 8}
f1 called with 1 None
f2 called
f3 called: (9, 81) {'a': 9}
>>> for k in sorted(counts.keys()):
...     print(k, ":", counts[k])
...
f2 : 10
short : 20
>>>

As you can see, f1 and f3 are classified as "short", while f2 is classified as "f2". Every time a @countable function is
called, one is added to the count for its category. There were 30 function calls in all, 20 of them to category "short" (f1 and f3).
Calling countable() returns a decorator whose action is to add one to the count identified by its argument. Your code
defines a function, countable(), which defines a function, decorator() (the decorator proper), which in turn defines a
function, wrapper(), that wraps the function f passed as an argument to decorator(), which was produced by calling countable().
This is probably about as far as anyone wants to go with decorators (and a little bit further than most).

In this survey of decorators, you can appreciate that decorators enable you to perform arbitrary manipulations of the functions
and classes that you write, as you write them. Decorators can, of course, also be applied without the decorator syntax,
though you should exercise extreme caution in doing so. This practice, when applied to "black box" code (code for which you
have no source, and no knowledge of internal structure) is called "monkey patching", and is not generally well regarded as a
production technique. But it can be valuable during experimentation.

When you finish the lesson, don't forget to return to the syllabus and complete the homework.

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Advanced Generators
Lesson Objectives
When you complete this lesson, you will be able to:

explain what generators represent.
use infinite sequences.
use the itertools module.
use generator expressions.

What Generators Represent


Generators were added to Python to allow computation with sequences without having to actually build a data
structure to hold the values of the sequence. This can yield large savings in memory. Earlier you saw that generators
obey the same iteration protocol that other iterators do, and that you can write generator functions and generator
expressions to avoid the creation of such sequences.

You can also use generators as "filters," to remove some of the values from an input sequence. The general pattern of
such a filter is:

The General Form of a Sequence Filter

def filter(s):
    for v in s:
        if some_condition_on(v):
            yield v

This technique can easily be used to "stack" filters, by providing one filter as the argument to another. To demonstrate
this technique, suppose that you wanted to examine a file, ignoring blank lines and lines beginning with a "#". While
there are several ways to do this, it would be fairly simple to use generators (remembering that text files are iterators
too, in Python). Create a Python4_Lesson08 project and assign it to your Python4_Lessons working set. Then, in
the Python4_Lesson08 project, create filterfile.py as follows.

filterfile.py: Using generators to filter the contents of a file

"""
Filter file contents using a sequence of generators.
"""
def nocomment(f):
"Generate the non-comment lines of a file."
for line in f:
if not line.startswith("#"):
yield line

def nospaces(f):
"Generate the lines of a file without leading or trailing spaces."
for line in f:
yield line.strip()

def noblanks(f):
"Generate the non-blank lines of a file."
for line in f:
if line:
yield line

if __name__ == "__main__":
for line in nocomment(noblanks(nospaces(open("py08-01.txt")))):
print(line)

Now, create py08-01.txt as shown:

CODE TO TYPE: py08-01.txt

# Excluded because a comment.


# This is also a comment, and the next two lines are blank.

This line should be the first of four lines in the output.


# The next line contains spaces and tabs, and should not appear.

And this should be the second.

# This should not appear (leading spaces but a comment).


# Neither should this (leading tabs but a comment).
This should be the third line of output.

And this should be the last.

Save the files and run filterfile.py:

Expected output from filterfile.py

This line should be the first of four lines in the output.
And this should be the second.
This should be the third line of output.
And this should be the last.

The essence of this program is the for loop guarded by the if __name__ == "__main__": condition. open("py08-
01.txt") is used to generate the raw text lines from the file, then the nospaces() generator strips the spaces from the
lines, after which the noblanks() generator removes blank lines, and then finally the nocomment() generator yields only
the lines that aren't comments.

Each individual filter performs a very simple task, but used in combination they can be much more powerful. (This is
the philosophy behind the UNIX operating system, by the way: provide simple primitive commands but allow them to
be combined together to create more powerful commands.)

Uses of Infinite Sequences


You can never create all the values of an infinite sequence. With a generator, though, you can produce as many members of a
sequence of indefinite length as you like, which is useful when you do not know in advance how many values will be
required. This can occur, for example, when you need to generate a value for each member of a sequence of unknown
length. Such requirements can arise in many contexts: when the user is entering a series of values, when you are
processing the output of another generator, and so on. (The one major advantage of sequences over generators is
that you can always find out how many elements they contain.)

This is the result of generators' "lazy evaluation": the values are not all produced first and then consumed by the client
code. Instead, when another value of the sequence is required, the generator produces it, and is then suspended
(retaining the values of all local variables from the function call) until it is resumed to produce the next value in the
sequence. So as long as the client code eventually stops asking for values, there really is no problem with an infinite
generator. Just don't expect it to ever produce all its values; that would take an infinite amount of time!
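
A minimal sketch of an infinite generator (the naturals() name is invented for illustration) shows the idea; the client, not the generator, decides when to stop:

>>> def naturals():
...     "Yield 0, 1, 2, ... for as long as anyone keeps asking."
...     n = 0
...     while True:
...         yield n
...         n += 1
...
>>> g = naturals()
>>> [next(g) for i in range(5)]
[0, 1, 2, 3, 4]
>>>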

The Itertools Module
Once generators and generator expressions were introduced into the language, iteration became a focus for
development. This led to the introduction of the itertools module, first released with Python 2.3. itertools contains
many useful functions to operate on generators and sequences. The algorithms are implemented in C, and so they run
a lot faster than pure-Python equivalents. When you look at the Python documentation for the module, however, you
will find that many of the functions are documented with broadly-equivalent Python code to explain them more fully.

It's important to remember that generators are a "one-shot deal": once data has been consumed, it isn't possible to go back
and retrieve it again. Therefore, most of the operations you perform on generated sequences are not
repeatable, unlike operations on tuples, lists, and strings.
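
A two-line console experiment shows what "one-shot" means in practice:

>>> squares = (i * i for i in range(3))
>>> list(squares)
[0, 1, 4]
>>> list(squares)    # the generator is already exhausted
[]
>>>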

itertools.tee: duplicating generators


tee takes two arguments: the first is a generator and the second is a count (2, if not specified). The result is
the given number of generators that can be used independently of each other.

Note: Because the resulting generators can be used independently, the implementation must store
any values that have been consumed from one of the result generators but not from all the
others. Consequently, if your code consumes most of the values from one of the result
generators before the rest, you may find it more efficient to simply construct a list and use
multiple iterations over that.

In your Python4_Lesson08/src folder, create teesamp.py as shown:

teesamp.py: Tee a generator to simplify program logic


"""
Demonstrate simple use of itertools.tee.
"""
import itertools

actions = "save", "delete"


data = ["file1.py", "file2.py", "save", "file3.py", "file4.py",
"delete", "file5.py", "save", "file6.py",
"file7.py", "file8.py", "file9.py", "save"]
saved = []
deleted = []

def datagen(d):
"A 'toy' data generator using static data"
for item in d:
yield item

commands, files = itertools.tee(datagen(data))


for action in commands:
if action in actions:
for file in files:
if file == action:
break
if action == "save":
saved.append(file)
elif action == "delete":
deleted.append(file)
print("Saved:", ", ".join(saved))
print("Deleted:", ", ".join(deleted))

The program tees a single data source containing filenames and commands into two separate generators. It
then iterates over the first generator until it finds a command. Then, it iterates over the second generator,
performing the requested action on the files it retrieves until it "catches up" with the first generator (detected
because the command is seen). This avoids the need to save the filenames in an ancillary list until the
program knows what to do with them.

Save and run it:

Results expected from teesamp.py


Saved: file1.py, file2.py, file5.py, file6.py, file7.py, file8.py, file9.py
Deleted: file3.py, file4.py

itertools.chain() and itertools.islice(): Concatenating Sequences and Slicing Generators Like Lists

The chain() function can be called with any number of sequences as arguments. It yields all the elements of
the first sequence, followed by all the elements of the second sequence, and so on until the last sequence
argument is exhausted.

It isn't possible to subscript a generator as you can a sequence such as a list or a tuple, because subscripting
requires all the elements of a sequence to be in memory at the same time. Sometimes, however, you need to
select elements from a generated sequence in much the same way you do for an in-memory sequence. The
itertools module allows you to do this with its islice() function.

It takes up to four arguments: (seq, [start,] stop [, step]). If only two arguments are provided, the second
argument is the length of the slice to be generated, starting at the beginning of the sequence. When three
arguments are provided, the second argument M is the index of the starting element and the third argument N
is the index of the element after the last one in the result. This closely parallels the seq[M:N] of standard
sequence slicing. Finally, when all four arguments are present, the last argument is a "stride", which
determines the gap between selected elements. As mentioned above, slicing operations on generated
sequences will not be repeatable, because the operation consumes data from the sequence, and each value
can be produced only once.

The following interactive example demonstrates the use of chaining and slicing on generated sequences.

Using sequence chaining and slicing

>>> import itertools
>>> s1 = (1, 3, 5, 7, 11)
>>> s2 = ['one', 'two', 'three', 'four']
>>> def sqq(n):
...     for i in range(n):
...         yield i*i
...
>>> s3 = sqq(10)
>>>
>>> input = itertools.chain(s1, s2, s3)
>>> list(itertools.islice(input, 2, 7, 2))
[5, 11, 'two']
>>> list(itertools.islice(input, 3))
['three', 'four', 0]
>>>

It is important to observe here that the second operation on the chained sequences starts with the first
element not consumed by the previous operation.

itertools.count(), itertools.cycle() and itertools.repeat()


These three functions provide convenient infinite sequences for use in other contexts. count(start=0,
step=1) generates a sequence starting with the value of its start argument and incremented by the step
amount (with a default of 1) for each value. cycle(i) takes an iterable argument i and yields each of its elements until the
sequence is exhausted, whereupon it returns to the start of the sequence and begins again. repeat(x) simply
yields its argument x every time a value is requested.
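
Since all three produce infinite sequences, islice() is a convenient way to peek at a few values, as in this small sketch:

>>> import itertools
>>> list(itertools.islice(itertools.count(10, 5), 4))
[10, 15, 20, 25]
>>> list(itertools.islice(itertools.cycle("ab"), 5))
['a', 'b', 'a', 'b', 'a']
>>> list(itertools.islice(itertools.repeat("x"), 3))
['x', 'x', 'x']
>>>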

itertools.dropwhile() and itertools.takewhile()


Sometimes you only want to deal with the end of a sequence, and sometimes you only want to deal with the
beginning. These functions allow you to do so by providing a predicate function that is used to determine
when to start or stop yielding elements. The function is applied to successive values in the sequence. In the
case of dropwhile(), elements are discarded until one is found for which the function returns False, after
which the remaining values are yielded without testing them. takewhile(), on the other hand, returns
elements of the sequence until it encounters one for which the function returns False, at which point it
immediately raises a StopIteration exception.

You can learn a little more about these functions in an interactive console session.

Experimenting with dropwhile() and takewhile()

>>> import itertools
>>> def lt5(n):
...     return n < 5
...
>>> s1 = [1, 3, 2, 4, 6, 4, 2, 3, 1]
>>> list(itertools.dropwhile(lt5, s1))
[6, 4, 2, 3, 1]
>>> list(itertools.takewhile(lt5, s1))
[1, 3, 2, 4]
>>>

For any function f and sequence s:

list(takewhile(f, s)) + list(dropwhile(f, s)) == list(s)

The two functions are therefore complementary in nature.

This has only "scratched the surface" of the itertools module, but there is plenty more to reward your reading of its
documentation should you feel so inclined.

Generator Expressions
In the same way that list comprehensions offer a more succinct way to create lists, generator expressions help you to
use generators without having to write a generator function.

Since list and tuple creation is relatively fast in Python, you will probably find that you have to be working with fairly
large data sets in order to see compelling advantages for generators over lists. Try it with some sample random data
to get a feel for the relative speed of lists. In this example, we'll sum a bunch of numbers from a list of random
numbers between 0 and 1 in two ways: the first sums the values using a generator expression, the second creates a
list and sums that.

Note: The lists get so large that it is entirely possible there is not enough memory to create the larger ones. In
that case, you may see MemoryError exceptions such as the one demonstrated below (this particular
run was made on a testing machine with limits on the amount of memory one process can use, so you
may not see the exception because you are using better-resourced production machines in your lab
sessions). When you are finished with the interactive session, you should terminate the console and
start a new console session. Once an interpreter process has suffered a memory error, it
may not be able to reclaim all that memory, so it is best to start again.
Test the relative speed of lists and generator expressions

>>> from random import random
>>> from timeit import timeit
>>> for i in (10000, 100000, 1000000, 10000000, 20000000, 50000000):
...     lst = [random() for j in range(i)]
...     print("Length", i)
...     print(timeit("sum(x+1 for x in lst)", "from __main__ import lst", number=1))
...     print(timeit("sum([x+1 for x in lst])", "from __main__ import lst", number=1))
...
Length 10000
0.0032087877090524073
0.0031928638975065268
Length 100000
0.032067762802255595
0.03326847406583798
Length 1000000
0.18962521773018637
0.2972891806081499
Length 10000000
2.405814984865395
2.7992426411736684
Length 20000000
4.417569830802519
5.341360144934622
Length 50000000
10.820288612100143
Traceback (most recent call last):
File "<console>", line 5, in <module>
File "C:\Python\lib\timeit.py", line 213, in timeit
return Timer(stmt, setup, timer).timeit(number)
File "C:\Python\lib\timeit.py", line 178, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "<timeit-src>", line 6, in <listcomp>
MemoryError
>>>

The code uses timeit's number argument to ensure that only one timed operation of the sample code is run. This
means that the timings are not necessarily repeatable, but they are at least indicative of the relative times of the different
operations. It seems that the longer the sequence, the more improvement you can expect to see from using a
generator expression. For comparison with the timings on Windows, here is the output from a MacOS machine (with
more memory) running the same code in a new Python console session.

Results of the same test on a different machine


Length 10000
0.00169491767883
0.00142598152161
Length 100000
0.017655134201
0.0198609828949
Length 1000000
0.18835401535
0.206699848175
Length 10000000
1.77904486656
2.16294407845
Length 20000000
3.62438511848
4.16168618202
Length 50000000
9.03414511681
76.5883550644
>>>
You can see that there is sufficient memory for this computer to create the larger lists. While the relative performance of the list-
based technique and the generator expression follows the same pattern, the difference does not seem to be quite as marked.
These tests were run on a different operating system, which may have something to do with it. Note that with fifty
million elements in the last test iteration, the creation of the list adds a large overhead, and the generator
expression is markedly faster.

You have already come across list comprehensions such as [x*x for x in sequence]. You can, if you want, think of
list comprehensions as generator expressions surrounded by list brackets. The brackets tell the interpreter that it is
required to create a list, so it runs the generator to exhaustion and adds each element to a newly-created list. There is
no essential difference between the expression above and list(x*x for x in sequence), but the latter does seem to
be about 25% slower on implementations current at the time of writing, whether the sequence is a list or a generator
function.

Generators, while a relatively late addition to the Python language, are rapidly becoming an essential part of it. When you are
dealing with large data sets, a good command of generators can make all the difference between a slow program and a fast
one. It is therefore important to be aware of their possibilities. This is not too difficult, once you realise that they are often simply
a faster and more efficient way to handle data.

When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Uses of Introspection
Lesson Objectives
When you complete this lesson, you will be able to:

explain 'Introspection.'
use the Attribute Handling Functions.
use Introspection.
use the Inspect Module.

T he Meaning of 'Introspection'
The word "introspection" means "looking inside." Introspective people are ones who think about themselves, usually
to increase self-understanding. In Python, introspection is a way that your programs can learn about the environment
in which they operate and the properties of the modules they import.

You have already learned about several of Python's introspection mechanisms. The built-in dir() function, for example,
attempts to return (to quote from the documentation) "an interesting set of names," meaning the names of attributes
accessible from the object passed as an argument. If no argument is passed, it returns the attributes found in the
current local namespace.

dir() in Python 3.x has a hook that looks for a __dir__() method on its argument. If such a method is present, it is called
and dir() returns what the method returns. This allows you to determine what users see about your object, and this can
be useful if you are using "virtual" attributes (that is, if your objects handle access to methods that do not appear in the
class's __dict__). If no __dir__() method is found, dir() uses a standard mechanism to compose its result after
examining its argument.
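
As a minimal illustration (the class and attribute names here are invented for the example), a __dir__() method completely determines what dir() reports; in current CPython versions the result is sorted before being returned:

>>> class Virtual:
...     def __getattr__(self, name):
...         if name in ("alpha", "beta"):      # the "virtual" attributes
...             return 42
...         raise AttributeError(name)
...     def __dir__(self):
...         return ["beta", "alpha"]           # what users should see
...
>>> v = Virtual()
>>> dir(v)
['alpha', 'beta']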

Some Simple Introspection Examples


x.__class__.__name__ will tell you the name of an object's class (and is much more reliable than trying to
analyze a repr() string):

The Right and Wro ng Way to Extract a Class Name

>>> class Something:


... pass
...
>>> s = Something()
>>> s
<__main__.Something object at 0x10063be50>
>>> repr(s)[1:-1].split()[0].split(".")[1] # WRONG!
'Something'
>>> s.__class__.__name__ # RIGHT (AND SO MUCH EASIER)
'Something'
>>> repr(4)[1:-1].split()[0].split(".")[1] # Fail
Traceback (most recent call last):
File "<console>", line 1, in <module>
IndexError: list index out of range
>>> 4.__class__.__name__
File "<console>", line 1
4.__class__.__name__
^
SyntaxError: invalid syntax
>>> (4).__class__.__name__ # SUCCEED
'int'
>>> str(type(4))[1:-1].split()[1][1:-1] # Way too complex
'int'
>>> str(type(s))[1:-1].split()[1][1:-1] # And only gives the same result for built-ins (see s above)
'__main__.Something'
The failed attempt to extract the class name from the integer 4's repr() string shows just how fragile the
"wrong" method is: it applies only to objects with a very specific representation. When handed an int instance
it explodes, raising an exception. The syntax error occurred because the interpreter took the period (".") to be
part of a number, and then could not understand why it was followed by an identifier. Putting the 4 in
parentheses allows the lexical analysis routines to parse things correctly, and we see that the class name is
available from built-in classes just as it is on self-declared ones. If you find yourself writing code like the first
and last examples, you should question whether there isn't a better way: Python is designed to avoid the
need for such contortions.

some_object.__doc__ can be useful, but if things are properly written, you'll get a better presentation from
help(some_object), which is designed to print the necessary documentation in a legible way.
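
For example, with a self-documented function of your own (the exact layout of help()'s output varies slightly between versions, but it looks something like this):

>>> def area(r):
...     """Return the area of a circle of radius r."""
...     return 3.14159 * r * r
...
>>> area.__doc__
'Return the area of a circle of radius r.'
>>> help(area)
Help on function area in module __main__:

area(r)
    Return the area of a circle of radius r.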

Attribute Handling Functions


If you took earlier courses in this Certificate Series (or otherwise, possibly from private study) you've encountered the
getattr(obj, name), setattr(obj, name, value), and delattr(obj, name) functions, and learned that they result in calls to the object's
__getattr__(), __setattr__(), and __delattr__() methods. There is also the hasattr() predicate, which can be used to
determine whether or not a given attribute is present in an object. There is, however, no corresponding __hasattr__()
method. You might wonder what hasattr() does to find out what value to return, and the answer to that question is
complex enough to have received the attention of some of the best minds in Python.

Witho ut go ing to o deeply into the internals, it is fairly easy fo r yo u to determine whether o r no t __getattr__() gets called
by hasattr() under at least so me circumstances. Yo u simply write a class who se instances repo rt calls o f their
__getattr__() metho d, and then call hasattr() o n an instance:

INTERACTIVE SESSION:

>>> class X:
... def __getattr__(self, name):
... print("getattr", name)
... return 0
...
>>> x = X()
>>> hasattr(x, "thing")
getattr thing
True
>>>

hasattr(obj, "__call__") can be used to tell you whether or not obj can be called like a function. Older versions of
Python provide a callable() built-in function, which should have been removed in Python 3.0 because the given test is
now all that is required: everything callable has a __call__ attribute. Its deletion was omitted in error for the 3.0
release, with the result that callable() is available for that release. It was then removed from 3.1 (the version in use when
this course was being written), but has returned in 3.2 because the above test turns out not to be quite as specific as
the version that can be written in C with full access to the object structures. Being able to determine the presence or
absence of a particular attribute is occasionally useful in other contexts.
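
A quick interactive check (assuming a Python version in which callable() is present) shows the two tests agreeing for functions, callable instances, and plain data objects; the Greeter class is invented for this sketch:

>>> def f():
...     pass
...
>>> class Greeter:
...     def __call__(self):
...         return "hello"
...
>>> hasattr(f, "__call__"), callable(f)
(True, True)
>>> g = Greeter()
>>> hasattr(g, "__call__"), callable(g)
(True, True)
>>> hasattr(42, "__call__"), callable(42)
(False, False)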

Note: You should avoid writing code where "too much" (a judgment call) of the logic depends on the presence
or absence of specific attributes, unless you are writing deliberately introspective code as part of a
framework or library.

Of course you can implement whole "virtual namespaces" within your own objects by using getattr() and setattr(), but
remember that these functions can also be used (assuming you can gain access to the required namespaces) to
modify your current environment. Understand that doing so is not recommended except in rather extreme
cases, because it results in "magical" changes: changes whose origin is difficult or impossible to discern by reading
the program code:
'Magical' changes to the module's namespace

>>> import sys


>>> __name__
'__main__'
>>> module = sys.modules[__name__]
>>> a
Traceback (most recent call last):
File "<console>", line 1, in <module>
NameError: name 'a' is not defined
>>> setattr(module, "a", 42)
>>> a
42
>>>

Before the setattr() call, there was no "a" defined in the module's namespace. Since all imported modules are available
under their natural names from sys.modules, you can access the current module's namespace by looking it up.

If it were possible to subclass the module object to change its attribute access methods, we could be faced with some
extremely hard-to-understand code! Fortunately this is not something you need to worry about in practice. Most of the
code you will encounter does not use such tricks (indeed, the Django framework mentioned earlier had a period in its
development devoted to "magic removal," to make the code easier for Python programmers and beginners to
understand and to provide a framework that was less brittle).

What Use is Introspection?


Frameworks use introspection frequently, to discover the capabilities of objects the user has passed; for example,
"does this object's class have a do_something() method? If so, call the object's do_something(); otherwise call the
do_something_similar() framework function with the object as an argument." Some built-in functions also do this
kind of introspection. The dir() built-in mentioned above returns the result of the argument object's __dir__() method if it
has one; otherwise it uses built-in functionality to provide an "interesting" set of names (the result is not defined more
clearly than that anywhere in the code).
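
The kind of test described in the quotation might be coded along the following lines; dispatch(), do_something() and do_something_similar() are hypothetical names used only for this sketch:

def dispatch(obj):
    """Call obj.do_something() if the object provides it,
    otherwise fall back to the framework's generic routine."""
    if hasattr(obj, "do_something"):
        return obj.do_something()
    return do_something_similar(obj)

def do_something_similar(obj):
    """Hypothetical framework fallback used when the object has no do_something()."""
    return "generic handling of " + repr(obj)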

Note: A framework is an environment that provides a wealth of facilities to programmers. You can think of it as
being like an "operating system for a particular type of programming task." The users of frameworks are
generally application programmers, using the framework (for example, Django or Tkinter) to build a
particular type of application (in Django's case, they would be web applications; in Tkinter's case, they
would be windowed applications).

T he Inspect module
This module allows you to dig as deep as you ever need to in terms of introspection. It provides many functions by
which you can determine the properties of objects, including sixteen predicates that allow you to easily determine
whether an object is of a particular type.

T he getmembers() Function
inspect.getmembers(obj[, predicate]) returns a list of two-element (name, value) tuples. If you provide a
second argument, it is called with each value as its only argument, and the item only appears in the resulting list
if the result is True. This makes the predicates mentioned in the last paragraph very useful if you are only
interested in objects of a particular type. Following are some special attributes especially worth knowing
about; different kinds of object (modules, classes, methods, functions and built-ins) expose different subsets of them.

Attribute       Purpose
__doc__         Documentation string
__file__        Path to the file from which the object was loaded
__module__      Name of the module in which the object was defined
__name__        Name of the object
__func__        The implementation of the method
__self__        Instance to which this method is bound (or None)
__code__        Code object containing the function's bytecode
__defaults__    Tuple of default values for the function's arguments
__globals__     Global namespace in which the function was defined

The predicates that you can use with getmembers() are:

Predicate name           Purpose

ismodule(x)              Returns True if x is a module.
isclass(x)               Returns True if x is a class, whether built-in or user-defined.
ismethod(x)              Returns True if x is a bound method written in Python.
isfunction(x)            Returns True if x is a function (including functions created by lambda expressions).
isgeneratorfunction(x)   Returns True if x is a Python generator function.
isgenerator(x)           Returns True if x is a generator.
istraceback(x)           Returns True if x is a traceback object (created when an exception is handled).
isframe(x)               Returns True if x is a stack frame (can be used to debug code interactively).
iscode(x)                Returns True if x is a code object.
isbuiltin(x)             Returns True if x is a built-in function or a bound built-in method.
isroutine(x)             Returns True if x is a user-defined or built-in function or method.
isabstract(x)            Returns True if x is an abstract base class (one meant to be inherited from rather than instantiated).
ismethoddescriptor(x)    Returns True if x is a method descriptor, unless ismethod(x), isclass(x), isfunction(x) or isbuiltin(x) is True.
isdatadescriptor(x)      Returns True if x is a data descriptor (has both a __get__() and a __set__() method).
isgetsetdescriptor(x)    Returns True if x is a getset descriptor; these are used in extension modules.
ismemberdescriptor(x)    Returns True if x is a member descriptor; these are used in extension modules.

The seco nd argument to inspect.getmembers() allo ws yo u to access members o f a particular type easily:
Experimenting with getmembers()

>>> import inspect


>>> from smtplib import SMTP
>>> from pprint import pprint
>>> pprint(inspect.getmembers(SMTP))
[('__class__', <class 'type'>),
('__delattr__', <slot wrapper '__delattr__' of 'object' objects>),
('__dict__', <dict_proxy object at 0x1006b7910>),
('__doc__',
"This class manages a connection to an SMTP or ESMTP server.\n
SMTP Objects:\n
SMTP objects have the following attributes:\n
helo_resp\n
This is the message given by the server in response to the\n
most recent HELO command.\n\n
ehlo_resp\n
This is the message given by the server in response to the\n
most recent EHLO command. This is usually multiline.\n\n
does_esmtp\n
This is a True value _after you do an EHLO command_, if the\n
server supports ESMTP.\n\n
esmtp_features\n
This is a dictionary, which, if the server supports ESMTP,\n
will _after you do an EHLO command_, contain the names of the\n
SMTP service extensions this server supports, and their\n
parameters (if any).\n\n
Note, all extension names are mapped to lower case in the\n
dictionary.\n\n
See each method's docstrings for details. In general, there is a\n
method of the same name to perform each SMTP command. There is also a\n
method called 'sendmail' that will do an entire mail transaction.\n
"),
('__eq__', <slot wrapper '__eq__' of 'object' objects>),
('__format__', <method '__format__' of 'object' objects>),
...
('__str__', <slot wrapper '__str__' of 'object' objects>),
('__subclasshook__',
<built-in method __subclasshook__ of type object at 0x1b002aed0>),
('__weakref__', <attribute '__weakref__' of 'SMTP' objects>),
('_get_socket', <function _get_socket at 0x1007a3d98>),
('close', <function close at 0x116437958>),
...
('verify', <function verify at 0x116437628>),
('vrfy', <function verify at 0x116437628>)]
>>>
>>> pprint(inspect.getmembers(SMTP, inspect.ismethod))
[]
>>> pprint(inspect.getmembers(SMTP, inspect.isfunction))
[('__init__', <function __init__ at 0x1007a3c88>),
('_get_socket', <function _get_socket at 0x1007a3d98>),
('close', <function close at 0x116437958>),
...
('verify', <function verify at 0x116437628>),
('vrfy', <function verify at 0x116437628>)]
>>> smtp = SMTP()
>>> pprint(inspect.getmembers(smtp, inspect.ismethod))
[('__init__',
<bound method SMTP.__init__ of <smtplib.SMTP object at 0x100644c90>>),
('_get_socket',
<bound method SMTP._get_socket of <smtplib.SMTP object at 0x100644c90>>),
('close', <bound method SMTP.close of <smtplib.SMTP object at 0x100644c90>>),
...
('verify',
<bound method SMTP.verify of <smtplib.SMTP object at 0x100644c90>>),
('vrfy', <bound method SMTP.verify of <smtplib.SMTP object at 0x100644c90>>)]
>>>
You will get rather more output than we showed here, and the docstring has been reformatted to make it
easier to read in the listing, but there is no reason to list everything that is output. The detail presented is
sufficient to demonstrate that the SMTP class has many member attributes, including the standard "dunder"
names, many of them inherited from the object type.

Asking for the methods of the class (using the ismethod() predicate as a second argument to
getmembers()) changes it to return the empty list. This is not too surprising, as the predicate is documented
as returning True only for bound methods, methods associated with a particular instance. The isfunction()
predicate used in the third example returns the methods that are specifically defined on the class, but not
those inherited from superclasses (which in practice means the object type). Creating an instance of the
SMTP class and querying that for methods gives a much more interesting result.

Introspecting Functions
There are various attributes of a code object that can be used to discover information about the function to
which it belongs. The inspect module provides some convenience functions to avoid the need to use them
directly under most circumstances, however.

inspect.getfullargspec(f) returns a named tuple FullArgSpec(args, varargs, varkw, defaults,
kwonlyargs, kwonlydefaults, annotations) containing information about the function f passed as its argument:

args is a list of the names of the standard (positional and keyword) arguments.
defaults is a tuple of default values for the trailing entries in args (arguments with defaults
always follow those without).
varargs and varkw are the names of the * and ** arguments, if present. The value None is used
when there are no such arguments.
kwonlyargs is a list of the arguments that must be provided as keyword arguments.
kwonlydefaults is a dict mapping those argument names to their default values.
annotations is a dict that maps argument names to annotations (it will usually be empty,
because we will not cover function annotations in this course).

inspect.formatargspec(args[, varargs, varkw, defaults, kwonlyargs, kwonlydefaults,
annotations]) takes the output from getfullargspec() and re-creates the arguments part of the function
signature.

Here is a little example to sho w yo u ho w they wo rk.

Functio n intro spectio n

>>> import inspect


>>> def f(a, b, c=1, d="one", *args, **kw):
... print('a', a, 'b', b, 'c', c, 'd', d, 'args', args, 'kw', kw)
...
>>> inspect.getfullargspec(f)
FullArgSpec(args=['a', 'b', 'c', 'd'], varargs='args', varkw='kw', defaults=(1,
'one'), kwonlyargs=[], kwonlydefaults=None, annotations={})
>>> inspect.formatargspec(*inspect.getfullargspec(f))
"(a, b, c=1, d='one', *args, **kw)"
>>>

As you can see, getfullargspec() produces a structured description of the function's arguments, and the
formatargspec() function turns that description back into a parenthesized argument list like the one in the
original definition (or something equivalent to it).

There are o ther facilities that co me as part o f the inspe ct mo dule, and yo u can read the do cumentatio n fo r that mo dule when
yo u feel the need to learn mo re. Using the features yo u have learned abo ut in this lesso n, ho wever, yo u sho uld be able to
disco ver what yo ur pro gram needs to kno w abo ut the co de that surro unds it.

When yo u finish the lesso n, do n't fo rget to co mplete the ho mewo rk!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Multi-Threading
Lesson Objectives
When yo u co mplete this lesso n, yo u will be able to :

utilize Threads and Pro cesses.


use the Threading Library Mo dule.

T hreads and Processes


When yo u are new to pro gramming (as so me students were when they started this Certificate Series), yo u do n't
necessarily think to o much abo ut all the o ther things that the co mputer is do ing besides running yo ur pro grams. Yo u
co nnect to a Windo ws system using remo te deskto p pro to co ls, and the same co mputer that is suppo rting yo ur
sessio n may be suppo rting o ther student sessio ns as well. It has to share its attentio n between these different tasks,
as well as handling yo ur keybo ard and mo use input and pro viding o utput in vario us GUIs. There is an eno rmo us
amo unt o f activity go ing o n in a mo dern server co mputer.

Multiprogramming
Early co mputers wo rked o n exactly o ne pro blem at a time. As their reso urces grew and they became faster,
peo ple o bserved that much o f the co mputer's time was spent idling, waiting fo r so me external event (such as
reading an 8 0 -co lumn card punched with data). Techniques were develo ped to allo w several pro grams to
reside in the co mputer at the same time, so that when o ne pro gram was waiting, the pro cesso r co uld be
wo rking o n ano ther. The classic name fo r this technique is multiprogramming.

In a mo dern co mputer, each pro gram is written as tho ugh it had exclusive use o f the machine that it runs o n,
even tho ugh in fact the o perating system will share its available pro cessing po wer amo ng hundreds o r even
thousands of processes. Each process is isolated from the others by running in a special protected mode,
in which it can only access the memory that the operating system has allocated to it. To use storage and
communications features, for example, processes have to make calls to the operating system. Thus the
separate pro cesses are iso lated fro m each o ther. Only the o perating system has the ability to access all
pro cesses' memo ry.

Multiprocessing
No wadays, the engineers who design the chips that go into co mputers are running up against so me fairly
fundamental speed co nstraints. Generally yo u can make things run faster by making them smaller (because
this reduces the travelling time o f the minute almo st-light-speed electrical currents o n which lo gic circuits
rely). The faster a circuit wo rks, the mo re energy it dissipates as heat. But when yo u make the chips to o small
o r to o fast they melt, because to o much energy is being dissipated in to o small a space, leading to
o verheating.

To try and o verco me the speed limitatio ns chip designers have started instead to build co mputers with mo re
than o ne pro cesso r o n the same chip, and co mputer engineers are putting several o f tho se chips o n a single
mo therbo ard to build so -called multi-pro cesso r co mputers. The different pro cesso rs share memo ry and
peripherals but are o therwise independent o f each o ther. As lo ng as there are no co nflicting requirements fo r
resources, each of the processors can be running a different process in parallel: literally, the different
processes are executed at the same time on different processors, and the operating system tries to keep all
the processors as busy as it can. So speed increases today are being achieved by running several
computations in parallel on separate processors. This ability to execute several instruction streams truly
simultaneously is referred to as multiprocessing.

Multi-T hreading
In the same way that the o perating system shares the pro cesso r po wer between lo ts o f pro cesses all
co ntending fo r its use at certain times, so yo u can write pro grams that take a similar appro ach. They manage
lo ts o f separate activities in essentially the same way, but independently o f each o ther. Each independent
activity is usually referred to as a thread, and pro grams that manage multiple threads are said to be multi-
threaded.

For example, around the turn of the century I was asked to help a client send its monthly invoices out by e-
mail. It was impractical to write a program that sent the emails one by one. Firstly, formulating the messages
took a significant amount of time, with waits for data to come in from the database and the networked domain
name system that translates names like holdenweb.com into IP addresses like 174.120.139.138.
Furthermore, there can be significant holdups in communication when a server is no longer present, and a
connection attempt takes minutes to time out. Early experiments established that it would take upwards of two
days to send out the invoices, and that performance would be flaky, with occasional complete hang-ups.

Consequently, I had to take a different approach. Because I had written the code to send an email as a Python
function, it was relatively easy to refactor the code so that the function became the run() method of a Python
threading.Thread subclass. This allowed me to easily create threads to send individual emails. Some
additional plumbing was required, with a thread extracting invoicing tasks from the database, dispatching
threads to send the emails, and finally updating the database with the record of success or failure. The
plumbing code could easily be adjusted to create and use any number of threads, and after a very short time
the client was able to send out almost 50,000 emails in under two hours using 200 parallel threads.

That represented a monthly saving of at least $10,000 to the client in postage, so the time spent
programming was well worthwhile.

T hreading, Multiprocessing, CPython and the GIL


The CPython implementation of Python is currently the only implementation of Python 3, though the
developers of the other major implementations (PyPy, Jython and IronPython) have all expressed a
commitment to support this latest version of Python. The CPython implementation retains a feature from
Python version 2 (which was the basis for development of the Python 3.x code): the so-called Global
Interpreter Lock, better known as the GIL.

Only one thread in a Python program can hold the GIL at any time. In effect this means that multi-threaded
programs in Python find it very difficult to take advantage of more than one processor: the purpose of the GIL is
to allow speed-up of common primitive operations by ensuring that the same object is never being accessed
in incompatible ways at the same time by two processors.

Guido van Ro ssum, Pytho n's invento r, is o n public reco rd as saying that he sees no reaso n to remo ve the
GIL fro m CPytho n. He suggests that peo ple wanting to take advantage o f hardware parallelism sho uld either
write their applicatio ns to run as multiple co o perating pro cesses o r use a Pytho n implementatio n that do es
not rely on a GIL for thread safety. As you will see in a later lesson, once you understand how to use the
threading library, it is not much more effort to use the multiprocessing library to achieve a true multi-
process solution. Since this runs multiple processes rather than multiple threads, each process runs with an
independent interpreter, and can take full advantage of multiprocessing hardware if processes are created in
sufficient numbers.

In essence yo u will o nly see benefits fro m multi-threading if the tasks perfo rmed by each thread require significant
"waiting time" (such as awaiting a respo nse fro m a user, o r fro m a remo te co mputer, o r fro m so me file). In CPytho n
o nly o ne thread at a time can ho ld the GIL, so multiple threads can o nly take advantage o f multiple pro cesso rs if they
use C extensio ns specifically written to release the GIL while perfo rming wo rk that do es no t require access to the
interpreter's reso urces. Multi-threaded so lutio ns are frequently seen as "difficult" to co mmunicate to beginners, but
mo st threading pro blems seem to co me fro m no t retaining strict iso latio n between the namespaces and o bject space
used by different threads. This is no t as simple as it seems, because so me standard library functio ns can alter the
enviro nment o f all threads in a particular process.

T he T hreading Library Module


threading is the primary library for handling threads in Python. In many implementations, you will find there is also an
underlying _thread module, used to access threading facilities from the underlying system. In all cases, the threading
library works in roughly the same way.

When multiple threads are present, the interpreter will share its time between the threads. Threads can beco me blo cked
fo r the same reaso ns that pro cesses can beco me blo cked: they need to wait fo r so mething (inco ming netwo rk data, a
co nnectio n request, data fro m filesto re). In CPytho n, the interpreter runs a certain number o f byteco des o f o ne thread
befo re mo ving o n to the next in a ro und-ro bin between no n-blo cked threads. If a thread is ho lding the GIL, no o ther
threads can be scheduled (except tho se that have explicitly released it, usually in an extensio n mo dule).

Creating T hreads (1)


The simplest way to create a new thread is by instantiating the threading.Thread class. You are expected to
provide a target keyword argument, a callable which will be called in the context of the new thread when it is started.
You can also provide args, a tuple of positional arguments, and kwargs, a dict of keyword arguments. These
arguments will be passed to the target call when the thread is started. Finally, you can give your thread a
name if you want by passing a name keyword argument. Default names for threads are typically of the form
"Thread-N." Create a Python4_Lesson10 project and assign it to the Python4_Lessons working set, and
then, in your Python4_Lesson10/src folder, create thread.py as shown:
thread.py: doing six things in parallel
"""
thread.py: demonstrate creation and parallel execution of threads.
"""

import threading
import time

def run(i, name):
    """Sleep for a given number of seconds, report and terminate."""
    time.sleep(i)
    print(name, "finished after", i, "seconds")

for i in range(6):
    t = threading.Thread(target=run, args=(i, "T"+str(i)))
    t.start()
print("Threads started")

The pro gram defines a functio n that sleeps fo r a while, then prints a message and terminates. It then lo o ps,
creating and starting six threads, each o f which uses the functio n to sleep a seco nd lo nger than the last befo re
repo rting, using its given name. When yo u run this pro gram, yo u see:

Results o f running thread.py


T0 finished after 0 seconds
Threads started
T1 finished after 1 seconds
T2 finished after 2 seconds
T3 finished after 3 seconds
T4 finished after 4 seconds
T5 finished after 5 seconds

As soon as the interpreter has more than one active thread it starts sharing its time between the threads. This,
coupled with the zero wait time for the first task, means that the very first thread created has finished even
before the main thread has completed creating and starting all six threads (which is when it prints the
"Threads started" message). The other threads then report in at one-second intervals.

Note: When a running program is associated with the console window, its "Terminate" and "Terminate
All" icons will be red, indicating that the console is monitoring an active process. As you run the
program, you will see that even though the main thread (the one which started program
execution) terminates, Eclipse still shows the console as containing an active process until the
last thread has terminated.

When Pytho n creates a new thread, that thread is to a degree iso lated fro m the o ther threads in the same
pro cess. Threads can share access to mo dule-glo bal variables, altho ugh yo u must be very careful no t to
change anything that co uld be changed co ncurrently by any o ther thread. There are safe ways fo r threads to
co mmunicate with each o ther (discussed in the next lesso n), and yo u sho uld use tho se. The namespace o f
the functio n call that starts the thread is unique to the thread, ho wever, and any functio ns that are called
similarly have new namespaces created.
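
One standard-library facility worth knowing about in this connection (though it is not used elsewhere in this course) is threading.local, which gives each thread its own independent set of attributes; a minimal sketch:

import threading

context = threading.local()          # each thread sees only its own attributes

def worker(n):
    context.value = n                # private to the current thread
    print(threading.current_thread().name, "sees", context.value)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()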

Waiting for T hreads


Our initial thread.py program just assumed that the threads would all terminate in the end and everything
would come out nicely. If you don't want to make this assumption, you can either monitor the thread count or
you can wait for individual threads. The first approach is rather simpler, but it relies on your main thread being
the only part of the program that is creating threads; otherwise, the thread count would vary apparently
randomly. The function to access the current number of threads is threading.active_count().
Modify thread.py to monitor the number of active threads
"""
thread.py: demonstrate simple monitoring of execution of threads.
"""

import threading
import time

def run(i, name):
    """Sleep for a given number of seconds, report and terminate."""
    time.sleep(i)
    print(name, "finished after", i, "seconds")

bgthreads = threading.active_count()
for i in range(6):
    t = threading.Thread(target=run, args=(i, "Thread-"+str(i)))
    t.start()
print("Threads started")
while threading.active_count() > bgthreads:
    print("Tick ...")
    time.sleep(2)
print("All threads done")

Yo ur o utput lo o ks like this:

Output o f updated thread.py


Thread-0 finished after 0 seconds
Threads started
Tick ...
Thread-1 finished after 1 seconds
Thread-2 finished after 2 seconds
Tick ...
Thread-3 finished after 3 seconds
Thread-4 finished after 4 seconds
Tick ...
Thread-5 finished after 5 seconds
All threads done

The pro gram no w takes a thread co unt befo re starting any threads, and then after starting them waits in a
timed lo o p until the thread co unt returns to what it was befo re. An alternative is to wait fo r each thread to
co mplete by calling its jo in() metho d. This blo cks the current thread until the thread who se jo in() metho d
was called has finished. Generally this wo rks best when the o rder o f the threads is kno wn, o r unimpo rtant:
o nce yo ur thread blo cks o n a jo in() it can do nothing until that thread terminates.
Modify thread.py to wait for each thread using join()
"""
thread.py: demonstrate thread monitoring by awaiting termination.
"""

import threading
import time

def run(i, name):
    """Sleep for a given number of seconds, report and terminate."""
    time.sleep(i)
    print(name, "finished after", i, "seconds")

threads = []
for i in range(6):
    t = threading.Thread(target=run, args=(i, "Thread-"+str(i)))
    t.start()
    threads.append((i, t))
print("Threads started")
for i, t in threads:
    t.join()
    print("Thread", i, "done")
print("All threads done")

The "wo rker" threads actually terminate in the o rder in which the main thread createdand waits fo rthem,
and so the o utput sho ws each thread lo gged as terminated as so o n as it terminates.

Threads finish in the same o rder the main thread waits


Thread-0 finished after 0 seconds
Threads started
Thread 0 done
Thread-1 finished after 1 seconds
Thread 1 done
Thread-2 finished after 2 seconds
Thread 2 done
Thread-3 finished after 3 seconds
Thread 3 done
Thread-4 finished after 4 seconds
Thread 4 done
Thread-5 finished after 5 seconds
Thread 5 done
All threads done

A very simple modification to the source makes the threads started earlier finish later:
thread.py still waits, but the threads started first now finish last
"""
thread.py: demonstrate thread monitoring by awaiting termination.
"""

import threading
import time

def run(i, name):
    time.sleep(i)
    print(name, "finished after", i, "seconds")

threads = []
for i in range(6):
    t = threading.Thread(target=run, args=(6-i, "Thread-"+str(i)))
    t.start()
    threads.append((i, t))
print("Threads started")
for i, t in threads:
    t.join()
    print("Thread", i, "done")
print("All threads done")

This time the threads are all repo rted to gether, because by the time the first thread co mpletes, all o thers have
already co mpleted, and so their jo in() metho ds return immediately. This changes the nature o f the o utput
so mewhat.

Once the first jo in() returns so will all o thers


Threads started
Thread-5 finished after 1 seconds
Thread-4 finished after 2 seconds
Thread-3 finished after 3 seconds
Thread-2 finished after 4 seconds
Thread-1 finished after 5 seconds
Thread-0 finished after 6 seconds
Thread 0 done
Thread 1 done
Thread 2 done
Thread 3 done
Thread 4 done
Thread 5 done
All threads done

Creating T hreads (2)


The second way to create threads is to define a subclass of threading.Thread, overriding its run() method
with the code you want to run in the threaded context. In this case, you are expected to pass any data in
through the __init__() method, which also means making an explicit call to threading.Thread.__init__()
with appropriate arguments. So there is a cost associated with creating threads this way, because the
programming is a little more detailed.

The approach can win if the logic gets complex, however, because other methods can be added to the
subclass and used to implement complex functionality in a reasonably modular way: all the logic is still attached
to a single class. Further, each thread is a separate instance of the class, and so the methods can
communicate via instance variables as well as explicit arguments. When the thread is run as a function, there
is no corresponding "global" namespace that can be used.

First let's try to re-cast the thread.py program to use a threading.Thread subclass. When you use such
subclasses, it is possible to access the thread name, so the only argument required will be the sleep time.
This argument is saved in an instance variable, and any other arguments are passed to the standard thread
initialization routine (though arguments are not normally passed when instantiating subclasses with run()
methods, who knows how the API may change in the future: this way is future-proof). When the thread is
started, its run() method begins to execute and the sleep time is extracted from the instance variable. As
before, the main thread ticks every two seconds and waits for the thread count to go back to its "main thread
only" value.
Modify thread.py to subclass threading.Thread
"""
thread.py: Use threading.Thread subclass to specify thread logic in run() method.
"""
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, sleeptime, *args, **kw):
        threading.Thread.__init__(self, *args, **kw)
        self.sleeptime = sleeptime
    def run(self):
        print(self.name, "started")
        time.sleep(self.sleeptime)
        print(self.name, "finished after", self.sleeptime, "seconds")

bgthreads = threading.active_count()
tt = [MyThread(i+1) for i in range(6)]
for t in tt:
    t.start()
print("Threads started")
while threading.active_count() > bgthreads:
    time.sleep(2)
    print("tick")
print("All threads done")

There sho uld be no surprises in the o utput:

Subclassing threading.thread wo rks to o !


Thread-1 started
Thread-2 started
Thread-3 started
Thread-4 started
Thread-5 started
Thread-6 started
Threads started
Thread-1 finished after 1 seconds
tick
Thread-2 finished after 2 seconds
Thread-3 finished after 3 seconds
Thread-4 finished after 4 seconds
tick
Thread-5 finished after 5 seconds
Thread-6 finished after 6 seconds
tick
All threads done

So far, the threads we've written haven't done very much: simply sleeping and printing a message doesn't
really amount to a convincing computation. The computer is still doing nothing but wait (in our process) for
sleep times to expire. Now let's see what happens when we replace the sleep with some real computation.
Modifying thread.py to compute instead of sleep
"""
thread.py: Use threading.Thread subclass to specify thread logic in run() method.
"""
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, sleeptime, *args, **kw):
        threading.Thread.__init__(self, *args, **kw)
        self.sleeptime = sleeptime
    def run(self):
        print(self.name, "started")
        for i in range(self.sleeptime):
            for j in range(500000):
                k = j*j
            print(self.name, "finished pass", i)
        print(self.name, "finished after", self.sleeptime, "seconds")

bgthreads = threading.active_count()
tt = [MyThread(i+1) for i in range(6)]
for t in tt:
    t.start()
print("Threads started")
while threading.active_count() > bgthreads:
    time.sleep(2)
    print("tick")
print("All threads done")

Yo u can see that this time the o utput fro m the different threads is intermingled, indicating that all active
threads are receiving so me pro cesso r time rather than o ne thread running until it finishes. Witho ut this
"scheduling" behavio r, threading wo uld no t be very po pular.
thread.py no w sho ws threads sharing co mpute reso urce
Threads started
Thread-1 finished pass 0
Thread-1 finished after 1 seconds
Thread-4 finished pass 0
Thread-3 finished pass 0
Thread-2 finished pass 0
Thread-5 finished pass 0
Thread-6 finished pass 0
Thread-4 finished pass 1
Thread-3 finished pass 1
Thread-2 finished pass 1
Thread-2 finished after 2 seconds
Thread-5 finished pass 1
Thread-4 finished pass 2
Thread-5 finished pass 2
Thread-6 finished pass 1
Thread-3 finished pass 2
Thread-3 finished after 3 seconds
Thread-4 finished pass 3
Thread-4 finished after 4 seconds
Thread-6 finished pass 2
Thread-5 finished pass 3
tick
Thread-6 finished pass 3
Thread-6 finished pass 4
Thread-5 finished pass 4
Thread-5 finished after 5 seconds
Thread-6 finished pass 5
Thread-6 finished after 6 seconds
tick
All threads done

Yo ur results will pro bably differ fro m tho se sho wn abo ve, precisely because the way the different threads are
scheduled may well no t be as "equitable" as yo u think. When yo u lo o k at the lo ng-lived threads, yo u can see
that Thread-4 finishes pass 3 befo re Thread-6 has finished pass 2. But ultimately all threads are co mputing
and they are all "pushed alo ng" at ro ughly the same speed.

Multi-threading is o ne way to achieve asynchro no us pro cessing. Fo r the CPytho n implementatio n (and o thers relying o n
single-pro cesso r guarantees to speed pro cessing) this will no t help if the applicatio n is CPU-bo und, as all pro cessing must
take place o n a single pro cesso r, and so the applicatio n canno t benefit fro m multiple pro cesso rs in the co mputer it runs o n.

Next, we will consider how to synchronize multiple threads, and how to pass data safely from one thread to another.

When yo u finish the lesso n, do n't fo rget to co mplete the ho mewo rk!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
More on Multi-Threading
Lesson Objectives
When yo u co mplete this lesso n, yo u will be able to :

synchro nize threads.


access the Queue Standard Library.

T hread Synchronization
threading.Lock Objects
Because attempts to access (and particularly to modify) the same resource from different threads can be
disastrous, the threading library includes Lock objects that allow you to place a lock on resources, stopping
any other thread that tries to access the resource in its tracks (in fact, stopping any thread that attempts to
acquire the same lock). A threading.Lock has two states, locked and unlocked, and it is created in the
unlocked state.

When a thread wants to access the resource associated with a specific Lock, it calls that Lock's acquire()
method. If the Lock is currently locked, the acquiring thread is blocked until the Lock becomes unlocked and
allows acquisition. If the Lock is unlocked, it is locked and acquired immediately. A Lock object becomes
unlocked when its release() method is called.
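
Before modifying thread.py, here is a minimal sketch of the acquire()/release() pattern protecting a shared counter (the names and numbers are arbitrary); without the lock, increments performed by different threads could interleave and be lost:

import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        counter_lock.acquire()       # block until we own the lock
        try:
            counter += 1             # the protected ("critical") operation
        finally:
            counter_lock.release()   # always give the lock back

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                       # reliably 400000 with the lock in place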

In the next example, we'll mo dify the thread.py co de fro m the last lesso n so that the "critical reso urce" is the
ability to sleep. Befo re sleeping fo r a tenth o f a seco nd each thread has to acquire a single lo ck shared
between all threads. Even tho ugh each thread o nly has to sleep fo r a to tal o f a seco nd, because there are six
threads and o nly o ne o f them can be sleeping at a time, it takes the pro gram six seco nds to run.

Modify thread.py to lock while sleeping

"""
thread.py: Use threading.Lock to ensure threads sleep sequentially.
"""
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, lock, *args, **kw):
        threading.Thread.__init__(self, *args, **kw)
        self.lock = lock
    def run(self):
        for i in range(10):
            self.lock.acquire()
            time.sleep(0.1)
            self.lock.release()
        print(self.name, "finished")

lock = threading.Lock()
bgthreads = threading.active_count()
tt = [MyThread(lock) for i in range(6)]
for t in tt:
    t.start()
print("Threads started")
while threading.active_count() > bgthreads:
    time.sleep(2)
    print("tick")
print("All threads done")
Save and run it:

The threads appear to finish deterministically in Eclipse


Threads started
tick
tick
tick
Thread-1 finished
Thread-2 finished
Thread-3 finished
Thread-4 finished
Thread-5 finished
Thread-6 finished
tick
All threads done

In different enviro nments, ho wever, the o utput fro m this pro gram will typically vary each time yo u run it,
because there are eno ugh acquisitio ns and releases to allo w different threads to get an advantage in the
scheduling (which is no t a simple deterministic ro und-ro bin). Here is the o utput fro m a run o f the same
program under Python 3.1.3 on MacOS 10.6:

The threads finish in apparently rando m o rder o ver six seco nds
AirHead:src sholden$ python3 thread.py
Threads started
Thread-3 finished
tick
Thread-6 finished
tick
Thread-1 finished
Thread-4 finished
Thread-5 finished
tick
Thread-2 finished
tick
All threads done
AirHead:src sholden$

The simple expedient o f remo ving the lo ck acquisitio n allo ws the threads to sleep in parallel, and witho ut the
limitatio n that o nly o ne thread can sleep at a time, all threads have terminated befo re the first (and last) tick
fro m the main thread. Because the sleeps are intermingled, and again subject to rando m timing variatio ns, the
o rder o f the threads finishing is unpredictable. [Yo u sho uld verify this assertio n by making several runs o f
yo ur pro gram].
Removing the lock around the sleeps means one thread need not wait for others
"""
thread.py: Without threading.Lock around the sleeps, threads sleep in parallel.
"""
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, lock, *args, **kw):
        threading.Thread.__init__(self, *args, **kw)
        self.lock = lock
    def run(self):
        for i in range(10):
            time.sleep(0.1)
        self.lock.acquire()          # the lock now only serializes the print() calls
        print(self.name, "finished")
        self.lock.release()

lock = threading.Lock()
bgthreads = threading.active_count()
tt = [MyThread(lock) for i in range(6)]
for t in tt:
    t.start()
print("Threads started")
while threading.active_count() > bgthreads:
    time.sleep(2)
    print("tick")
print("All threads done")

No w the six threads are all sleeping pretty much in parallel, and so all terminate after o ne seco nd. The main
thread therefo re ticks o nce and sees all threads already terminated, and so the pro gram ends after two
seco nds. Again yo u sho uld find that the o rder in which the "wo rker" threads terminate is unpredictable,
because o f unco ntro llable timing differences. No w it is much mo re likely that different threads co uld be
printing at the same time, which co uld lead to garbled o utput, so we use the lo cks to ensure this canno t
happen. A typical o utput fo llo ws.

It's all o ver befo re the first tick!


Threads started
Thread-4 finished
Thread-5 finished
Thread-6 finished
Thread-2 finished
Thread-1 finished
Thread-3 finished
tick
All threads done

Note: Interactive threading experiments can be tricky in IDEs: you may find, if you experiment with
threads from the Eclipse interactive console, that output from a thread running in the background does not
always appear immediately. This is because the IDE controls output in an attempt to ensure that your
input is never interspersed with output from running code (which would make sessions extremely difficult
to understand). So frequently you need to press Enter at the ">>> " prompt to allow output to become
visible. A true interactive console session in a terminal window will not generally cause the same issues.

If you are starting to enjoy the possibilities opened up by the threading library, you should definitely look at its
documentation to learn about RLock, Condition, Semaphore and Event objects.
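
As a small taste of those facilities (this sketch is not part of the lesson's examples), a threading.Event lets one thread release many waiting threads at once:

import threading
import time

go = threading.Event()               # starts in the "cleared" state

def wait_for_signal(name):
    go.wait()                        # block until the event is set
    print(name, "released")

for i in range(3):
    threading.Thread(target=wait_for_signal, args=("Thread-%d" % i,)).start()

time.sleep(1)                        # the workers are all blocked here
go.set()                             # release every waiting thread at once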

T he Queue Standard Library


This library was produced to provide programmers of threaded programs with a safe way for their threads to exchange
information. The queue module defines three classes that each have the same interface but queue things in slightly
different ways. queue.Queue is a FIFO (first-in, first-out) queue in which the first objects added to the queue are the
first to be retrieved. This is the most usual type to use for handing out work to worker threads. queue.LifoQueue
objects implement a stack of sorts: the next item retrieved is the most recently-added item. Finally,
queue.PriorityQueue items are always retrieved in natural sort order.
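
The difference between the three classes is easy to see in a short sketch that puts the same three items on each kind of queue and then empties it:

import queue

for cls in (queue.Queue, queue.LifoQueue, queue.PriorityQueue):
    q = cls()
    for item in (3, 1, 2):
        q.put(item)
    print(cls.__name__, [q.get() for _ in range(3)])

# Queue         [3, 1, 2]   first in, first out
# LifoQueue     [2, 1, 3]   last in, first out (a stack)
# PriorityQueue [1, 2, 3]   natural sort order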

When creating a queue, yo u can establish a maximum length fo r it by pro viding that length as an argument. If this
maximum length is no t pro vided, the queue will be o f po tentially infinite length, and further items may always be added
to it. With a maximum length, there are o nly a given number o f free slo ts, and attempts to add to a full queue will either
blo ck the thread that is attempting the add o r raise an exceptio n to sho w that the queue is full (o r a co mbinatio n o f
both). The thread-safety guarantees made by the library mean that the same queue can be accessed by multiple
threads without any need to lock it (any locking that is necessary is taken care of internally by the queue methods).
When a queue is empty, any attempt to extract an item will either block or raise an exception (or both).

We are making only the simplest use of queues here, using the put() and get() methods, to present a way of
writing scalable threaded programs. There are many refinements you can adopt by reading the module documentation
once you understand the basics. In threaded applications, simplest is almost always best, as most of us have brains
that can only conceptualize a limited amount of parallelism and have difficulty predicting situations that cause
problems in practice (such as deadlocks, where Thread A is blocked waiting for Thread B, which is blocked waiting for
Thread A: since neither can progress, the two threads are doomed to wait for each other forever).

Adding Items to Queues: Queue.put()


queue.Queue.put(item, block=True, timeout=None) adds the given item to the queue. If block
evaluates false, either the item is added immediately or an exception is raised. When block is True (the
default case), either the item is added immediately or the putting thread blocks. If timeout remains None, this
could leave the thread blocked indefinitely in a non-interruptible state. If a timeout (in seconds) is given, an
exception will be raised if the item has not been added before the timeout expires.

Removing Items from Queues: Queue.get()


queue.Queue.get(block=True, timeout=None) attempts to remove an item from the queue. If an item is
immediately available, it is always returned. Otherwise, if block evaluates false, an exception is raised. When
block evaluates true, the calling thread blocks either indefinitely (when timeout is None) or until the timeout (in
seconds) has expired, in which case an exception is raised if no item has arrived.
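
A minimal sketch of the non-blocking behaviour just described: with block=False, put() on a full queue raises queue.Full and get() on an empty queue raises queue.Empty.

import queue

q = queue.Queue(maxsize=1)
q.put("only item")
try:
    q.put("one too many", block=False)
except queue.Full:
    print("queue full, put() refused")

print(q.get())                       # "only item"
try:
    q.get(block=False)
except queue.Empty:
    print("queue empty, get() refused")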

Monitoring Completion: Queue.task_done() and Queue.join()


Every time an item is successfully added to a queue with put(), a task count is incremented. Removing an item
with get() does not decrement the counter. To decrement the counter, the removing thread should wait until
processing of the item is complete and then call the queue's task_done() method.

If a queue is expected to end up empty, a thread can declare itself interested in the queue's exhaustion by
calling its join() method. This method blocks the calling thread until all tasks have been recorded as complete.
You should be confident that the worker threads are all going to terminate correctly before using this technique, since it
can lead to indefinite delays.
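
The following sketch (separate from the worker-pool example built later in this lesson) shows the basic task_done()/join() handshake, with a single draining thread and a None sentinel:

import queue
import threading

work = queue.Queue()

def drain():
    while True:
        item = work.get()
        if item is None:             # sentinel: no more work
            work.task_done()
            break
        print("processed", item)
        work.task_done()             # mark this unit as complete

threading.Thread(target=drain).start()
for item in range(5):
    work.put(item)
work.put(None)
work.join()                          # blocks until every put() has a matching task_done()
print("all work accounted for")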

A Simple Scalable Multi-T hreaded Workhorse


We'll finish the lesso n by building a fairly general framewo rk to allo w yo u to run pro grams with "any number"
o f threads (so metimes the system places limits o n the number o f threads yo u can create).

The idea is to have a co ntro l thread that generates "wo rk packets" fo r a given number o f wo rker threads (with
which it co mmunicates by means o f a queue). The wo rker threads co mpute the necessary results, and deliver
them to a final o utput thread (by means o f a seco nd queue) which displays the results. The structure is quite
general: wo rk units can be generated by reading database tables, accepting data fro m web services, and the
like. Co mputatio ns can invo lve no t o nly calculatio n but further database wo rk o r netwo rk co mmunicatio n, all
o f which can invo lve so me (in co mputer terms) fairly extensive waiting.

The control thread is the main thread with which every program starts out (the only thread our programs had
before these lessons). It creates an input and an output queue, starts the worker threads and the output
thread, and thereafter distributes work packets to the worker threads until there is no more work. Since the
worker threads are programmed to terminate when they receive None from the work queue, the control
thread's final act is to queue a None for each worker thread and then wait for the queue to finally empty before
terminating. The worker threads put a None on the output queue before terminating. The output thread counts
these Nones, and terminates when enough None values have been seen to account for all the workers.

T he Output T hread
The output thread simply has to extract output packets from a queue where they are placed by the worker
threads. As each worker thread terminates, it posts a None to the queue. When a None has been received
from each thread, the output thread terminates. The output thread is told on initialization how many worker
threads there are, and each time it receives another None it decrements the worker count until eventually there
are no workers left. At that point, the output thread terminates. Create a new PyDev project named
Python4_Lesson11 and assign it to the Python4_Lessons working set. Then, in your
Python4_Lesson11/src folder, create output.py as shown:

output.py: the output thread definition

"""
output.py: The output thread for the miniature framework.
"""
import threading

identity = lambda x: x

class OutThread(threading.Thread):
    def __init__(self, N, q, sorting=True, *args, **kw):
        """Initialize thread and save queue reference."""
        threading.Thread.__init__(self, *args, **kw)
        self.queue = q
        self.workers = N
        self.sorting = sorting
        self.output = []
    def run(self):
        """Extract items from the output queue and print until all done."""
        while self.workers:
            p = self.queue.get()
            if p is None:
                self.workers -= 1
            else:
                # This is a real output packet
                self.output.append(p)
        print("".join(c for (i, c) in (sorted if self.sorting else identity)(self.output)))
        print("Output thread terminating")

In this particular case, the output thread is receiving (index, character) pairs (because the workers pass
through the position argument they are given, as well as the transformed character, to allow the string to be
reassembled no matter in what order the threads finish). Rather than output each one as it arrives, the output
thread stores them until the workers are all done, then sorts them (unless sorting is disabled with
sorting=False) and the characters are extracted and joined together.

T he Worker T hreads
The worker threads have been cast so as to make interactions easy. The work units received from the input
queue are (index, character) pairs, and the output units are also pairs. The processing is split out into a
separate method to make subclassing easier: simply override the process() method. Create worker.py as
shown:
worker.py: the simple worker thread
"""
worker.py: a sample worker thread that receives input
through one Queue and routes output through another.
"""
from threading import Thread

class WorkerThread(Thread):
    def __init__(self, iq, oq, *args, **kw):
        """Initialize thread and save Queue references."""
        Thread.__init__(self, *args, **kw)
        self.iq, self.oq = iq, oq
    def run(self):
        while True:
            work = self.iq.get()
            if work is None:
                self.oq.put(None)
                print("Worker", self.name, "done")
                self.iq.task_done()
                break
            i, c = work
            result = (i, self.process(c))  # this is the "work"
            self.oq.put(result)
            self.iq.task_done()
    def process(self, s):
        """This defines how the string is processed to produce a result"""
        return s.upper()

Altho ugh this particular wo rker thread is no t do ing particularly interesting pro cessing (merely co nverting a
single character to upper case), yo u can imagine mo re co mplex wo rk units, perhaps with numerical inputs
and the need fo r database lo o kup as well as interactio n with lo cal disk files.

The Control Thread
Everything is started off by the control thread (which imports the output and worker threads from their
respective modules). It first creates the input and output queues. These are standard FIFOs, with a size limit of
50% more than the number of worker threads to avoid locking up too much memory in buffered objects. Then
it creates and starts the output thread, and finally creates and starts as many worker threads as configured by
the WORKERS constant. Worker threads get from the input queue and put to the output queue. The control
thread then simply keeps the input queue loaded as long as it can before sending the None values required to
shut the worker threads down. Once every item in the input queue has been processed (that is, when inq.join()
returns), the control thread terminates.
control.py: The thread that drives everything else
"""
control.py: Creates queues, starts output and worker threads,
and pushes inputs into the input queue.
"""
from queue import Queue
from output import OutThread
from worker import WorkerThread

WORKERS = 10

inq = Queue(maxsize=int(WORKERS*1.5))
outq = Queue(maxsize=int(WORKERS*1.5))

ot = OutThread(WORKERS, outq)
ot.start()

for i in range(WORKERS):
w = WorkerThread(inq, outq)
w.start()
instring = input("Words of wisdom: ")
for work in enumerate(instring):
inq.put(work)
for i in range(WORKERS):
inq.put(None)
inq.join()
print("Control thread terminating")

Running the program causes a prompt for input, which is then split up into individual characters and passed
through the input queue to the worker threads. At present, ten threads operate in parallel, but the number can
easily be varied by changing the definition of WORKERS in the source file. The output from a typical run is
shown below.

A Bizarrely Complex Way to Convert a String to Upper Case?


Words of wisdom: Elemental forces are at work to change the way we live.
Worker Thread-2 done
Worker Thread-3 done
Worker Thread-4 done
Worker Thread-10 done
Worker Thread-9 done
Worker Thread-8 done
Worker Thread-11 done
Worker Thread-7 done
Worker Thread-5 done
Worker Thread-6 done
Control thread terminating
ELEMENTAL FORCES ARE AT WORK TO CHANGE THE WAY WE LIVE.
Output thread terminating

You will appreciate the need for the sorting if you study this output, from a typical run where the output thread
was created with sorting=False:
Why sorting is required
Words of wisdom: Does the string really appear correct?
Worker Thread-7 done
Worker Thread-6 done
Worker Thread-4 done
Worker Thread-2 done
Worker Thread-11 done
Worker Thread-3 done
Worker Thread-9 done
Worker Thread-5 done
Worker Thread-10 done
Worker Thread-8 done
DOES THE STRING PEAELRYA PLAR CORERCT?
Output thread terminating
Control thread terminating

This ends our discussion of the queue.Queue object, and with it our somewhat lengthy study of threading.

Other Approaches
In the last two lessons, we've made use of the threading library module to write classes whose instances
run as separate threads. If enough of these are started, the waiting that each thread has to do can be filled by
useful work for other threads, and so a fairly high-bandwidth network channel can be kept busy and individual
hold-ups can be made to matter much less. There are a number of other schemes that have been developed
to control multiple asynchronous tasks.

The oldest (and the only one currently included in the standard library) is the asyncore module. With asyncore,
each client connection is a "channel," and you program the channels to respond to specific network events in
specific ways. asynchat is layered on top of asyncore and allows you to specify protocol handling by looking
for specific sequences in the incoming data and triggering events when those sequences are detected.

The Twisted library is a system devised by Glyph Lefkowitz that has been used to good effect by many
surprisingly large enterprises (including one business that has since been purchased by Google). Operations
that will potentially block (cause the process to wait) return a Deferred object, which is effectively a promise
of future data. A Deferred object is asked for its result by calling specific methods; if the data is not currently
available, the Twisted scheduler suspends that activity until the Deferred request can be satisfied, and returns
to some other suspended task that can now be restarted.

Stackless Python was an early attempt by Christian Tismer to allow massively parallel computing in Python
by the provision of so-called "micro-threads." It has been used to great effect by a gaming company to
provide a space "shoot-'em-up" environment for over 50,000 simultaneous players. More recent versions
allow advanced capabilities like saving a computation on one computer and restoring it on another. This was
very helpful in running code on a 250-CPU cluster.

A more recent approach to asynchronous networking is the Kamaelia package, initially developed by Michael
Sparks for BBC Research in the UK. Kamaelia, as far as I am aware, pioneered the use of generator functions
to interact with the task scheduling environment. This approach has also been taken in Monocle, another
even more recent development by Raymond Hettinger.

All in all, if you decide to venture beyond the standard library, a wealth of choices awaits you, and not all of
them rely on threading.

Multi-threading is one way to achieve asynchronous processing. For the CPython implementation (and others relying on
a global interpreter lock to simplify processing), this will not help if the application is CPU-bound, as all Python processing
must take place on a single processor at a time, and so the application cannot benefit from multiple processors in the computer it runs on.

Next, we'll go on to consider how to share work between multiple processes, which can be done on different processors and
therefore extract more work from modern multi-processor hardware.

When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Multi-Processing
Lesson Objectives
When you complete this lesson, you will be able to:

use the Multiprocessing Library Module.

create a Multiprocessing Worker Process Pool.

The Multiprocessing Library Module


The multiprocessing module was written specifically to offer features closely parallel to the threading library, but
allowing the individual threads of control to be processes rather than threads within a single process. This allows the
operating system to take advantage of any parallelism inherent in the hardware design, since generally processes can
run completely independently of one another, and on separate processors if they are available.

multiprocessing Objects
The multiprocessing library defines various classes, most of which operate in the same way as similar
classes in the threading and related modules. Whereas in using threading you also imported resources
from other modules, the multiprocessing module tries to put all necessary resources into one convenient
place, simplifying imports. But you will easily recognize the program style from your recent work on multi-
threading.
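For instance, the queue, lock, and event objects you previously imported from the queue and threading modules have direct counterparts in multiprocessing itself. A minimal sketch (the variable names are chosen only for illustration):

import multiprocessing

if __name__ == "__main__":
    work_queue = multiprocessing.JoinableQueue()   # counterpart of queue.Queue with task_done()/join()
    results = multiprocessing.Queue()
    lock = multiprocessing.Lock()                  # counterpart of threading.Lock
    ready = multiprocessing.Event()                # counterpart of threading.Event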

A Simple Multiprocessing Example

Our first multiprocessing example is marked up below as though we were editing the first thread.py example.
This shows how similar the two environments are. Create a new PyDev project named Python4_Lesson12
and assign it to the Python4_Lessons working set. Then, in your Python4_Lesson12/src folder, create
process.py as shown:

CODE TO TYPE: process.py

"""
process.py: demonstrate creation and parallel execution of processes.
"""

import multiprocessing
import time
import sys

def run(i, name):


"""Sleep for a given number of seconds, report and terminate."""
time.sleep(i)
print(name, "finished after", i, "seconds")
sys.stdout.flush()

if __name__ == "__main__":
for i in range(6):
t = multiprocessing.Process(target=run, args=(i, "P"+str(i)))
t.start()
print("Processes started")

Note that this program has been correctly written as a module, so that the action of starting six processes is
only performed by the process that runs this code, and not in any processes that may try to import the
module. This is very important, because the subprocesses have to get their description of the work to be
done from somewhere, and they do that by importing the main module. So in this case the subprocesses will
import the process module (in which the test __name__ == "__main__" is false) to access the run() function.

Note: Not all platforms require that the main module be "importable" in that way. Since it does not hurt
to write your programs this way, however, we recommend that you do so every time. Then,
platform differences are less likely to "bite" you.

The output should not be at all surprising:

Waiting in processes rather than threads


Processes started
P0 finished after 0 seconds
P1 finished after 1 seconds
P2 finished after 2 seconds
P3 finished after 3 seconds
P4 finished after 4 seconds
P5 finished after 5 seconds

A Multiprocessing Worker Process Pool

The lesson on multi-threading concluded with an example that used a pool of worker threads to convert the characters
of a string into upper case. To demonstrate the (at least superficial) similarities between multiprocessing and
threading and friends, we'll now adapt that code.

So first, copy the three programs (output.py, worker.py, and control.py) from your Python4_Lesson11/src folder
to your Python4_Lesson12/src folder.

The Output Process
The following listing shows the code for the multiprocessing version alongside the equivalent threading-
based code. The differences are small enough to be negligible, and to allow anyone who understood the
threaded code to also understand the multi-process version.

Modifying output.py for multi-processor operations


"""
output.py: The output process for the miniature framework.
"""
identity = lambda x: x

import multiprocessing
import sys

class OutThread(multiprocessing.Process):
def __init__(self, N, q, sorting=True, *args, **kw):
"""Initialize process and save queue reference."""
multiprocessing.Process.__init__(self, *args, **kw)
self.queue = q
self.workers = N
self.sorting = sorting
self.output = []

def run(self):
"""Extract items and print until all done."""
while self.workers:
p = self.queue.get()
if p is None:
self.workers -= 1
else:
# This is a real output packet
self.output.append(p)
print("".join(c for (i, c) in (sorted if self.sorting else identity)(sel
f.output)))
print ("Output process terminating")
sys.stdout.flush()
The main difference between the two pieces of code is the use of multiprocessing.Process in place of
threading.Thread, and associated changes to a couple of comments. It is also necessary to flush the
process's standard output stream to make sure that it is captured before the process terminates; otherwise
you will see a confusing lack of output! (Feel free to try running the program with the flush() call commented
out to verify this.)

The Worker Process
The next listing shows the differences in the worker code when processes are being used instead of threads.

Modifying worker.py for multi-processor operations

"""
worker.py: a sample worker process that receives input
through one queue and routes output through another.
"""

from multiprocessing import Process


import sys

class WorkerThread(Process):
def __init__(self, iq, oq, *args, **kw):
"""Initialize process and save Queue references."""
Process.__init__(self, *args, **kw)
self.iq, self.oq = iq, oq
def run(self):
while True:
work = self.iq.get()
if work is None:
self.oq.put(None)
print("Worker", self.name, "done")
self.iq.task_done()
break
i, c = work
result = (i, self.process(c)) # this is the "work"
self.oq.put(result)
self.iq.task_done()
sys.stdout.flush()
def process(self, s):
"""This defines how the string is processed to produce a result."""

return s.upper()

Again the main change is to use Process from multiprocessing instead of Thread from threading, plus the
sys.stdout.flush() call at the end of run(). (Two of the differences are again in comments.)

The Control Process
The control process again needs very little change: queue objects come from the multiprocessing module
rather than the queue module, and in that module if you are going to join() a queue then you must use a
JoinableQueue. The rest of the logic is exactly the same, with the exception that the code must now be
guarded so that it isn't executed when the module is imported by the worker processes. This means you
have to indent the majority of the logic. This is easy in Eclipse: just highlight all the lines of code (making sure
you are selecting whole lines) and then press Tab once.
Modifying control.py for multi-processor operations
"""
control.py: Creates queues, starts output and worker processes,
and pushes inputs into the input queue.
"""
from multiprocessing import Queue, JoinableQueue
from output import OutThread
from worker import WorkerThread

if __name__ == '__main__':
WORKERS = 10

inq = JoinableQueue(maxsize=int(WORKERS*1.5))
outq = Queue(maxsize=int(WORKERS*1.5))

ot = OutThread(WORKERS, outq, sorting=True)


ot.start()

for i in range(WORKERS):
w = WorkerThread(inq, outq)
w.start()
instring = input("Words of wisdom: ")
# feed the process pool with work units
for work in enumerate(instring):
inq.put(work)
# terminate the process pool
for i in range(WORKERS):
inq.put(None)
inq.join()
print("Control process terminating")

This version of control.py does exactly what the threading version did, except that the individual characters are now
being passed to one of a pool of processes rather than one of a pool of threads. The computation is trivial, but the
principle would be the same if the work packets were filenames and the outputs were MD5 checksums of the contents
of the files (which could require substantial computation and I/O in the case of long files). Since the processes run
independently of each other, they can be run on different processors at the same time, allowing programs to take true
advantage of hardware parallelism. The output will seem prosaic for the amount of work that is being done!

Output of the multiprocessing upper-case converter


Words of wisdom: No words of wisdom at all, in fact. Just a rather long and boring line
of text.
Worker Thread-2 done
Worker Thread-3 done
Worker Thread-4 done
Worker Thread-5 done
Worker Thread-6 done
Worker Thread-7 done
Worker Thread-8 done
Worker Thread-9 done
Worker Thread-10 done
Worker Thread-11 done
Control thread terminating
NO WORDS OF WISDOM AT ALL, IN FACT. JUST A RATHER LONG AND BORING LINE OF TEXT.
Output thread terminating.

Do not make the mistake of thinking that this brief treatment has taught you all you need to know about multiprocessing. There
are many more things to learn about it including, for example, limitations on what can be transmitted from process to process
through a multiprocessing.Queue. These restrictions are fairly commonsense, and are the result of having to pickle the
objects to transmit them to the remote process. As long as you stick to Python's basic data objects (and combinations thereof),
you should be fine. Other restrictions are less obvious: when you subclass multiprocessing.Process, the instances should be
pickleable (because the class has to be instantiated in a new process when the instance's start() method is called).
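A minimal sketch of that restriction, separate from the lesson's code: basic data objects pass through a multiprocessing.Queue without trouble, while a lambda fails the pickling step the queue relies on (pickle.dumps() is used here to show the error directly):

import multiprocessing
import pickle

if __name__ == "__main__":
    q = multiprocessing.Queue()
    q.put((1, "text", [2.0, 3.0]))      # basic objects travel happily
    print(q.get())
    try:
        pickle.dumps(lambda x: x)       # the same serialization the queue uses
    except Exception as e:
        print("Cannot be sent between processes:", e)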

As systems evolve, multiprocessor solutions will become more and more common, and it will be necessary to put systems
together to take control of multi-processor machines. This lesson is intended to give you the necessary grounding so that you
can take the next steps with confidence.
When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Functions and Other Objects
Lesson Objectives
When you complete this lesson, you will be able to:

interact with more Functions.

employ more Magic Methods.

A Deeper Look at Functions


Required Keyword Arguments
You already know that the arguments passed to a function call must match the parameter specifications in the
function's definition. Any mismatch can be taken up in the definition, where a parameter of the form *name
associates unmatched positional arguments with a tuple and one of the form **name associates the names
and values of unmatched keyword arguments with the keys and values of a dict.

You have also seen that a positional argument may be associated with a keyword parameter and vice versa.
You currently have no way, however, of requiring that specific arguments be presented as keyword
arguments. You can specify such a requirement by inserting an asterisk on its own as a parameter
specification: any parameters that follow the star (other than the *args and **kwargs arguments, if present)
must be provided as keyword arguments on the call.

Investigating this phenomenon is quite easy in the interactive console:

Investigating function signatures

>>> def f(a, *, b, c=2):
...     print("A", a, "B", b, "C", c)
...
>>> f(1, 2)
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: f() takes 1 positional argument but 2 were given
>>> f(1, c=3)
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: f() missing 1 required keyword-only argument: 'b'
>>> f(1, b=2, c=3)
A 1 B 2 C 3
>>> f(1, b=2)
A 1 B 2 C 2
>>>

Attempting to provide a positional argument for b raises an exception because of the wrong number of
positional arguments. The second test is the most telling one, as it reveals the requirement for a keyword
argument b.

Function Annotations
We mention this feature because you may come across some code that uses it, and wonder what on Earth is
going on. In Python 3, functions and their parameters can be annotated. A parameter is annotated by following
its name with a colon and an expression, and a function is annotated by following its parameter list with "->"
and an expression.

The language definition specifically avoids associating any kind of meaning to annotations. The stated
intention is that if people find ways of using annotations that find general acceptance, specific semantics may
be added to the interpreter at a later date; for now you can access them through the __annotations__
attribute of the function object. This is a dict in which each of the function's annotated parameters is stored
against the parameter name as key. The function's return-value annotation, if present, is stored against the key
"return" which, being a Python keyword, cannot be the name of any parameter.
Just to show you how annotations appear in practice, we'll create an annotated function in an interactive
interpreter session:

INTERACTIVE SESSION:

>>> def f(i: int, x:float=1.2) -> str:
...     return str(i*x)
...
>>> f.__annotations__
{'i': <class 'int'>, 'x': <class 'float'>, 'return': <class 'str'>}
>>>

Although there is no restriction on the expressions used as annotations, in practice most people see them as
being useful for making assertions about the types of arguments and the function's return value. At present,
nothing in the interpreter uses the annotation information at all. You would need to act on such information
with additional code if you don't want your annotation data to be ignored; a hypothetical sketch follows. It is
likely that, as the feature becomes better known, frameworks will emerge to make use of different types of
annotation data.
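As a purely hypothetical sketch of such additional code (check_annotations is our own name, not a standard library facility), the decorator below reads __annotations__ and raises TypeError when an argument does not match an annotated type:

import functools
import inspect

def check_annotations(func):
    """Check annotated arguments against their annotations at call time."""
    sig = inspect.signature(func)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("{} should be {}, not {}".format(
                    name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)
    return wrapper

@check_annotations
def f(i: int, x: float = 1.2) -> str:
    return str(i * x)

print(f(3))       # the string form of 3 * 1.2
# f("3") would raise TypeError: i should be int, not str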

Nested Functions and Namespaces

Although you have seen functions with function definitions inside them, we have not yet formalized the rules
for looking up names within those functions. You already know the general rule for (unqualified) name
resolution in Python: first look in the local namespace, then look in the (module) global namespace, and
finally look in the built-in namespace.

The only additional complexity that nested functions introduce is that the local namespace is effectively
enhanced by names from surrounding functions (unless they are redefined in the contained function).
Remember that a name is only considered local to a function if the name is bound in that function. So when a
function is defined inside a function, a name may resolve to the namespace of the current function call, or to
the namespace of the enclosing function call during which the inner function was defined, and this lookup can
continue outward until the outermost function call is encountered.

Understanding Python as you do now, you will see that it requires some trickery to allow a function to return
another function defined inside the first function. That is because the returned function may contain references
to values defined in the local namespace of the (now completed) function call that returned it! We do not need
to examine the mechanism the interpreter uses to resolve this issue, but since it is a genuine feature of the
language, it is one that every implementation has to solve in its own way.
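A minimal sketch of the situation just described, with made-up names:

def make_adder(n):
    def add(x):
        return x + n        # n is found in the enclosing make_adder() call
    return add

add_five = make_adder(5)    # the call has completed, but its n lives on
print(add_five(3))          # 8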

Python 3 also introduces a second declaration statement, the nonlocal statement. This can be used to force
an apparently local variable to instead be treated as though it came from the containing scope where it is
already defined. This is slightly different from the global statement, in that the interpreter searches the
containing scopes (function namespaces) to locate the one that already contains a definition of the name(s)
listed after the nonlocal keyword. (The global statement always and unambiguously places the name in the
module global namespace, whether it has been defined there or not.)

Create a new PyDev project named Python4_Lesson13 and assign it to the Python4_Lessons working
set. Then, in your Python4_Lesson13/src folder, create nonloc.py as shown:
Difference between global and nonlocal: create this as nonloc.py
a, b, c = "Module a", "Module b", "Module c"

def outer():
    def inner():
        nonlocal b
        global c
        a = "Inner a"
        b = "Inner b"
        c = "Inner c"
        print("inner", a, b, c)
    a = "Outer a"
    b = "Outer b"
    c = "Outer c"
    print("outer", a, b, c)
    inner()
    print("outer", a, b, c)

print("module", a, b, c)
outer()
print("module", a, b, c)

Save and run it:

The result of running nonloc.py


module Module a Module b Module c
outer Outer a Outer b Outer c
inner Inner a Inner b Inner c
outer Outer a Inner b Outer c
module Module a Module b Inner c

Just as the global statement allows the inner() function to refer to the module-global "c" name, so the
nonlocal statement allows it to use the name "b" to refer to the outer function's "b." After the call to outer(),
only the module-global "c" has changed, because only "c" was declared as global in the inner() function.

Partial Functions
You learned about the functools module when we were discussing decorators earlier in this course. The
module contains another useful function that allows you to take a function and define another function that is
the same as the first function, but with fixed values for some arguments. The signature of the function is:

functools.partial(f[, *args[, **kw]]) returns a function-like object g which is the same as f with the positional
arguments args giving values for the initial positional arguments and the keyword arguments kw setting
default values for the given named arguments. The intention is to allow you to fix some arguments of a
function, leaving you with a function-like object to which the remaining arguments can be applied at your
convenience. The resulting partial function objects cannot be called with quite the same abandon as real
functions, however, since certain counterintuitive behaviors can occur.
Partial function examples

>>> import functools
>>> def fp(a, b, c="summat", d="nowt"):
...     print("a b c d", a, b, c, d)
...
>>> fp("ayeup", "geddaht")
a b c d ayeup geddaht summat nowt
>>> fp1 = functools.partial(fp, 1, b=2)
>>> fp1()
a b c d 1 2 summat nowt
>>> fp1("ayeup", "geddaht")
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: fp() got multiple values for argument 'b'
>>> fp1(c="ayeup", d="geddaht")
a b c d 1 2 ayeup geddaht
>>> fp2 = functools.partial(fp, 1, c="two")
>>> fp2("ayeup", "geddaht")
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: fp() got multiple values for argument 'c'
>>> fp2
functools.partial(<function fp at 0x000000000349B2F0>, 1, c='two')
>>> fp2("ayeup", c="geddaht")
a b c d 1 ayeup geddaht nowt
>>>

fp1 is ostensibly a function taking two keyword arguments (a having been fixed positionally and b by keyword
in the creation of the partial). The expression fp1("ayeup", "geddaht"), however, makes it plain that the first
positional argument is being provided to match up with fp()'s b parameter, and that when the same keyword
argument is later applied a duplication is detected.

The simplest solution to this dilemma is to always replace positional parameters with positional arguments
and replace keyword parameters with keyword arguments when using partial(). This rule also has to be
extended to the calls of the partial functions. The first call to fp2() shows that although the partial function has
one positional and one keyword parameter remaining, it is not possible to match a positional argument to the
keyword parameter d as would be possible with a real function. So remember to treat partials carefully when
you encounter them.

One very nice little example from the documentation shows how a default can be applied to a required
argument. The int() built-in type can be called with a number or a string as an argument. When called with a
string, a second argument base can be provided which determines the number system used to interpret the
string. Providing that argument to partial() creates a partial object that will convert base-2 strings to integers.
Partial(int) function converts binary strings

>>> from functools import partial


>>> basetwo = partial(int, base=2)
>>> basetwo.__doc__ = "Convert base-2 string to int."
>>> basetwo("1111")
15
>>> basetwo("1001010")
74
>>> help(basetwo)
Help on partial object:

class partial(builtins.object)
| partial(func, *args, **keywords) - new function with partial application
| of the given arguments and keywords.
|
| Methods defined here:
|
| __call__(self, /, *args, **kwargs)
| Call self as a function.
|
| __delattr__(self, name, /)
| Implement delattr(self, name).
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| __reduce__(...)
|
| __repr__(self, /)
| Return repr(self).
|
| __setattr__(self, name, value, /)
| Implement setattr(self, name, value).
|
| __setstate__(...)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
|
| args
| tuple of arguments to future partial calls
|
| func
| function object to use in future partial calls
|
| keywords
| dictionary of keyword arguments to future partial calls

Beware of the differences between partial objects and true functions, and respect them. While partials can be
very helpful, they are only a shorthand and not a complete replacement.

More Magic Methods

We have explained in the past how certain operations and functions cause the interpreter to invoke various "magic"
methods: methods whose names usually start and end with a double underscore, causing some people to refer
to them as "dunder methods." In particular you should now be aware of the attribute access methods (__getattr__(),
__setattr__(), and __delattr__()) and the indexing methods (__getitem__(), __setitem__(), and __delitem__()), which
parallel the attribute access methods but operate on mappings rather than namespaces (and can also be used to
index lists and other sequences, with slice objects as arguments where necessary).
Now we are going to cover a few more of those magic methods and explain a little more about the interpreter's
interfaces to the various objects you can create. Understanding in this area allows you to take advantage of the natural
operation of the interpreter. It's a little like jiu-jitsu: you write your objects to fit in with the way the interpreter naturally
does things rather than trying to overpower the interpreter.

How Python Expressions Work

This simplified treatment expresses the way that the interpreter works to a first approximation. As always, we
try to be as precise as possible without necessarily providing exact detail of what goes on in the more
complex corner cases.

When you see the expression s = x + y in a program, the interpreter has to decide how to evaluate it. It does
so by looking for specific methods on the x and y objects. For addition, the relevant methods are __add__()
and __radd__(). First the interpreter looks for an x.__add__() method (special/magic methods are always
looked up on the class and its parents, never on the instance). If such a method exists, x.__add__(y) is
called. If this call returns a result, that becomes the value of the expression.

The method may, however, choose to indicate that it is unable to compute a response (for example because
y is incompatible) by returning the special built-in value NotImplemented. In that case, the interpreter next
looks for a y.__radd__() method ("radd" is intended to be a mnemonic for "reflected add"). If such a method
exists, y.__radd__(x) is called and, unless it returns NotImplemented, the return value becomes the value
of the expression. There is one exception to this rule: if the two values are of the same type, the __radd__()
method is not called. The assumption is that if a and b are of the same type and you can't (say) add a to b,
then you shouldn't be able to add b to a either, and there is no point trying.

Try it out in an interactive session:

Verifying use of reflected operators

>>> class mine:
...     def __add__(self, other):
...         print("__add__({}, {})".format(self, other))
...         return NotImplemented
...     def __radd__(self, other):
...         print("__radd__({}, {})".format(self, other))
...         return 42
...     def __repr__(self):
...         return "[Mine {}]".format(id(self))
...
>>> class yours:
...     def __add__(self, other):
...         print("__add__({}, {})".format(self, other))
...         return NotImplemented
...     def __radd__(self, other):
...         print("__radd__({}, {})".format(self, other))
...         return NotImplemented
...     def __repr__(self):
...         return "[Yours {}]".format(id(self))
...
>>> m1 = mine()
>>> m2 = mine()
>>> m1, m2
([Mine 4300644112], [Mine 4300643600])
>>> y1 = yours()
>>> y2 = yours()
>>> y1, y2
([Yours 4300644240], [Yours 4300643728])
>>>
>>> m1+m2
__add__([Mine 4300644112], [Mine 4300643600])
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'mine' and 'mine'
>>> y1+y2
__add__([Yours 4300644240], [Yours 4300643728])
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'yours' and 'yours'
>>> m1+y2
__add__([Mine 4300644112], [Yours 4300643728])
__radd__([Yours 4300643728], [Mine 4300644112])
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'mine' and 'yours'
>>> y1+m2
__add__([Yours 4300644240], [Mine 4300643600])
__radd__([Mine 4300643600], [Yours 4300644240])
42
>>>

As you can see, since both classes' __add__() methods return NotImplemented, attempting to add a
mine to a mine or a yours to a yours will fail, raising an exception. The third case also raises an exception
because the __radd__() method of the yours right-hand operand also returns the value NotImplemented.
The final test works, however, because mine.__radd__() actually returns a value (albeit one that does not
depend on its operands at all).

There is another series of special methods associated with the augmented arithmetic operations (that is,
"+=", "-=" and so on). When you see a statement such as x += y (that is to say, any statement using
augmented assignment operations) in a program, the interpreter evaluates it by looking for a specific method
on the x object. For addition, the relevant method is __iadd__(). If this method does not exist, the statement
is treated as though it read x = x + y. If the x.__iadd__() method is found, however, it is called with y as an
argument, and the result (which may be a modified version of the existing object or a completely new object,
entirely at the option of the implementor of the object in question) is bound to x. Following are the methods
corresponding to the basic Python arithmetic operations.

Operator   Standard Method    Reflected Method    Augmented Method

+          __add__()          __radd__()          __iadd__()
-          __sub__()          __rsub__()          __isub__()
*          __mul__()          __rmul__()          __imul__()
/          __truediv__()      __rtruediv__()      __itruediv__()
//         __floordiv__()     __rfloordiv__()     __ifloordiv__()
%          __mod__()          __rmod__()          __imod__()
divmod()   __divmod__()       __rdivmod__()       (no augmented form)
**         __pow__()          __rpow__()          __ipow__()
<<         __lshift__()       __rlshift__()       __ilshift__()
>>         __rshift__()       __rrshift__()       __irshift__()
&          __and__()          __rand__()          __iand__()
^          __xor__()          __rxor__()          __ixor__()
|          __or__()           __ror__()           __ior__()
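As a minimal sketch of the augmented protocol described above (the Tally class is invented for illustration), __iadd__() may update the object in place, but it must still return the object that is to be rebound to the left-hand name:

class Tally:
    def __init__(self):
        self.total = 0
    def __iadd__(self, other):
        self.total += other
        return self         # the interpreter rebinds the left-hand name to this result

t = Tally()
t += 5
t += 7
print(t.total)              # 12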

So you now understand a little more about functions in Python, and understand more of the role of "magic" methods in Python.

In the next lesson, we consider some of the differences between small projects and large ones.

When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Context Managers
Lesson Objectives
When you complete this lesson, you will be able to:

use another Python Control Structure called the With Statement.
use Decimal Arithmetic and Arithmetic Contexts in Python.

Another Python Control Structure: The With Statement

One of the more recently added control constructs in Python is the with statement. This allows you to create
resources for the duration of an indented suite and have them automatically released when no longer required. The
statement's basic syntax is:

with statement syntax

with object1 [as name1][, object2 [as name2]] ...:
    [indented suite]

The objects are referred to as context managers, and if the indented suite needs to refer to them, they can be named in
the as clause(s) (which can otherwise be omitted). Nowadays, files are context managers in Python, meaning that it is
possible to write file processing code without explicitly closing the files you open.

Using a Simple Context Manager

Create the usual project folder (Python4_Lesson14) and assign it to the Python4_Lessons working set.
In your Python4_Lesson14 folder, create a file named localtextfile. Then, open an interactive console
session and enter commands as shown:

The following interactive console session shows how to use files as context managers.
An Introduction to Context Managers

>>> with open(r"v:\workspace\Python4_Lesson14\src\localtextfile") as f:


... print("f:", f)
... print("closed:", f.closed)
... for line in f:
... print(line, end='')
...
f: <_io.TextIOWrapper name='v:\\workspace\\Python4_Lesson14\\src\\localtextfile'
mode='r' encoding='cp1252'>
closed: False
>>> f
<_io.TextIOWrapper name='v:\\workspace\\Python4_Lesson14\\src\\localtextfile' mo
de='r' encoding='cp1252'>
>>> f.closed
True

>>> f = open(r"v:\workspace\Python4_Lesson14\src\localtextfile", 'r')


>>> 3/0
Traceback (most recent call last):
File "<console>", line 1, in <module>
ZeroDivisionError: division by zero
>>> f
<_io.TextIOWrapper name='v:/workspace/Python4_Lesson14/src/localtextfile' mode='
r' encoding='cp1252'>
>>> f.closed
False
>>> with open(r"v:\workspace\Python4_Lesson14\src\localtextfile") as f:
... 3/0
...
Traceback (most recent call last):
File "<console>", line 2, in <module>
ZeroDivisionError: division by zero
>>> f.closed
True
>>>

You can see that the with statement is a way of controlling the context of execution for the controlled suite.
You might wonder why we didn't simply bind the Python file object (the result of opening the file) using an
assignment statement. The major purpose of using with in this case is to ensure that, if anything goes wrong
inside the context-controlled indented suite, the file will be correctly closed (similarly to the way it might be in
the finally clause of a try ... finally statement).
Files, in and out of context
>>> with open(r"v:\workspace\Python4_Lesson14\src\localtextfile") as f:
... print("f:", f)
... print("closed:", f.closed)
... for line in f:
... print(line, end='')
...
f: <_io.TextIOWrapper name='v:\\workspace\\Python4_Lesson14\\src\\localtextfile'
mode='r' encoding='cp1252'>
closed: False
The open function returns a file object.
This has an __enter__() method that simply
returns self. Its __exit__() method calls
its close() method.
>>> f
<_io.TextIOWrapper name='v:\\workspace\\Python4_Lesson14\\src\\localtextfile' mo
de='r' encoding='cp1252'>
>>> f.closed
True
>>> f = open(r"v:\workspace\Python4_Lesson14\src\localtextfile", 'r')
>>> 3/0
Traceback (most recent call last):
File "<console>", line 1, in <module>
ZeroDivisionError: division by zero
>>> f
<_io.TextIOWrapper name='v:\\workspace\\Python4_Lesson14\\src\\localtextfile' mo
de='r' encoding='cp1252'>
>>> f.closed
False
>>> f.close()
>>> with open(r"v:\workspace\Python4_Lesson14\src\localtextfile") as f:
... 3/0
...
Traceback (most recent call last):
File "<console>", line 2, in <module>
ZeroDivisionError: division by zero
>>> f.closed
True
>>>

In the first with example, we saw that f was a standard _io.TextIOWrapper object (in point of fact, exactly the same
object returned by the open() call, though as you will learn this is not typical of context managers). When the
indented suite is run, the file is initially open. Next we see that the file object (still available after the with) is
closed when the with statement terminates, even though no explicit action was taken to close it. You will
understand this after the next interactive session.

Next you reminded yourself that when an exception occurs during regular file processing the file
remains open unless explicit action is taken to close it. When the exception occurs inside the suite of
the with statement, however, once again we see that the file is magically closed without any explicit action
being taken. The magic is quite easily explained (as usual in Python, where a simple, easy-to-understand
style is preferred) by two file magic methods we have not previously discussed.

The Context Manager Protocol: __enter__() and __exit__()

The with statement has rules for interacting with the object it is given as a context manager. It processes with
expr by evaluating the expression and saving the resulting context manager object. The context manager's
__enter__() method is then called, and if the as name clause is included, the result of the method call is
bound to the given name. Without the as name clause, the result of the __enter__() method is not available.
The indented suite is then executed.

As the execution of the suite progresses, an exception may be raised. If so, the execution of the suite ends
and the context manager's __exit__() method is called with three arguments together referencing detailed
information about the causes and location of the exception.

If no exception is raised and the suite terminates normally (that is, by "dropping off the end"), the context
manager's __exit__() method is called with three None arguments.
There are other ways that the with suite can be exited, all fairly normal; how many ways can you think of? In
those circumstances, the context manager's __exit__() method is called with three None arguments, and
then the normal exit is taken.

The reason for the name "context manager" is that the indented suite in a with statement is surrounded by
calls to the manager's __enter__() and __exit__() methods, which can therefore provide some context to the
execution of the suite. Note carefully that the __exit__() method is always called, even when the suite raises
an exception.

Writing Context Manager Classes

As is so often the case in Python, it is quite easy to write a class that demonstrates exactly how the context
manager objects work with the interpreter as it executes the with statement. Since there are two alternative
strategies for handling the raising of an exception in the indented suite, an __init__() method can record in an
instance variable which strategy the creator (the code calling the class) chooses. If no exception is raised, this
will make no difference.

Besides the very simple __init__() outlined (which is not itself a part of the context manager protocol), you just
need the __enter__() and __exit__() methods. If you are only interested in finding out how the with statement
works, these methods don't have to do a lot except print out useful information. Try this out in an interactive
interpreter session:
Investigating the with Statement

>>> class ctx_mgr:
...     def __init__(self, raising=True):
...         print("Created new context manager object", id(self))
...         self.raising = raising
...     def __enter__(self):
...         print("__enter__ called")
...         cm = object()
...         print("__enter__ returning object id:", id(cm))
...         return cm
...     def __exit__(self, exc_type, exc_val, exc_tb):
...         print("__exit__ called")
...         if exc_type:
...             print("An exception occurred")
...             if self.raising:
...                 print("Re-raising exception")
...         return not self.raising
...
>>> with ctx_mgr(raising=True) as cm:
... print("cm ID:", id(cm))
...
Created new context manager object 4300642640
__enter__ called
__enter__ returning object id: 4300469808
cm ID: 4300469808
__exit__ called
>>> with ctx_mgr(raising=False):
... 3/0
...
Created new context manager object 4300642768
__enter__ called
__enter__ returning object id: 4300469904
__exit__ called
An exception occurred
>>> with ctx_mgr(raising=True) as cm:
... 3/0
...
Created new context manager object 4300642640
__enter__ called
__enter__ returning object id: 4300469744
__exit__ called
An exception occurred
Re-raising exception
Traceback (most recent call last):
File "<console>", line 2, in <module>
ZeroDivisionError: division by zero
>>>

Your context manager object does not get too much of a workout in the above session, but as always you
should feel free to try out other things. You are unlikely to cause a fire or bring the server to a halt by being a
little adventurous: you are now a seasoned Python programmer, and can (we hope) be trusted to flex your
muscles a little. Let's just review the output from that session:
What Just Happened?
>>> with ctx_mgr(raising=True) as cm:
... print("cm ID:", id(cm))
...
Created new context manager object 4300642640
__enter__ called
__enter__ returning object id: 4300469808
cm ID: 4300469808
__exit__ called
>>> with ctx_mgr(raising=False):
... 3/0
...
Created new context manager object 4300642768
__enter__ called
__enter__ returning object id: 4300469904
__exit__ called
An exception occurred
>>> with ctx_mgr(raising=True) as cm:
... 3/0
...
Created new context manager object 4300642640
__enter__ called
__enter__ returning object id: 4300469744
__exit__ called
An exception occurred
Re-raising exception
Traceback (most recent call last):
File "<console>", line 2, in <module>
ZeroDivisionError: division by zero
>>>

In the first example, you can see that this context manager returns an entirely different object as the result of
its __enter__() method. The print statement which forms the indented suite demonstrates that the name cm is
bound in the with statement to the result of the context manager's __enter__() method and not the context
manager itself. (The file open() example earlier is atypical, as a file object's __enter__() method returns self.)
No exception is raised by the indented suite, and so the __exit__() method simply reports it has been called.

The second example raises an exception in the context of a context manager that was created not to re-
raise the exception. So it does report the fact that an exception was raised, but then it again terminates
normally (because its self.raising attribute has the value False, and so the method returns True).

The third example is exactly the same as the second except that the instance is created with its raising
attribute True. This means that once the instance has reported the exception it announces its intention to re-
raise it, and does so by returning False.

Library Support for Context Managers

Although you have just seen it is very easy to write a simple context manager class, it can be even easier to
use context managers if you use the contextlib module. This contains a decorator called
contextmanager that you can use to create context managers really simply. There is no need to declare a
class with __enter__() and __exit__() methods.

You must apply the contextlib.contextmanager decorator to a generator function that contains precisely
one yield expression. When the decorated function is used in a with statement, the (decorated) generator's
__next__() method is called for the first time, so the function body runs right up to the yield. The yielded value is
returned as the result of the context manager's __enter__() method, and the indented suite of the with
statement then runs.

If the indented suite raises an exception, it appears inside the context manager as an exception raised by the
yield. Your context manager can choose to handle the exception (by placing the yield inside the indented
suite of a try statement) or not (in which case the exception must be re-raised after logging or other
actions if the surrounding logic is to see it). So your context manager can trap exceptions raised by the
indented suite and suppress them simply by choosing not to re-raise them.
Experimenting with contextlib.contextmanager

>>> from contextlib import contextmanager
>>> @contextmanager
... def ctx_man(raising=False):
...     try:
...         cm = object()
...         print("Context manager returns:", id(cm))
...         yield cm
...         print("With concluded normally")
...     except Exception as e:
...         print("Exception", e, "raised")
...         if raising:
...             print("Re-raising exception")
...             raise
...
>>> with ctx_man() as cm:
... print("cm from __enter__():", id(cm))
...
Context manager returns: 4300470512
cm from __enter__(): 4300470512
With concluded normally
>>> with ctx_man(False) as cm:
... 3/0
...
Context manager returns: 4300801264
Exception division by zero raised
>>> with ctx_man(True) as cm:
... 3/0
...
Context manager returns: 4300801280
Exception division by zero raised
Re-raising exception
Traceback (most recent call last):
File "<console>", line 2, in <module>
ZeroDivisionError: division by zero
>>>

This interactive session shows that it is possible to create equivalent context managers using this approach.
The same parameterization of the functionality is provided (so you can say when creating the context
manager whether or not it should re-raise exceptions). contextlib.contextmanager provides a nice
compromise between writing a full context manager and using older, less well-controlled methods (such as
try ... except ... finally) of controlling the execution context. You will find that the other members of the
contextlib library can also be useful in creating and supporting context managers.

Nested Context Managers

The statement:

OBSERVE:
with expr1 as name1, expr2 as name2:
    [indented suite]

is equivalent to:

OBSERVE:
with expr1 as name1:
    with expr2 as name2:
        [indented suite]

This shows that the expr1 context wraps the expr2 context. If an exception occurs in the indented suite, it
will present as a call to expr2.__exit__() with the necessary exception-related arguments. As always, that
__exit__() method has the choice of returning True (which suppresses the exception, resulting in a call to
expr1.__exit__() with three None arguments) or False, in which case the exception is automatically re-
raised and expr1.__exit__() is called with the traceback arguments. expr1.__exit__() in turn has the choice of
returning True to suppress the exception or False to re-raise it a second time.

The multi-context form of the with statement is a simple syntactic convenience; no new functionality is
introduced, but it does reduce the indentation level required for the indented suite. This enhances readability
without compromising simplicity.
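A minimal sketch of the multi-context form, assuming the file names are placeholders: both files are closed however the suite exits, with the output file's context nested inside the input file's context.

with open("source.txt") as src, open("copy.txt", "w") as dst:
    for line in src:
        dst.write(line)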

Decimal Arithmetic and Arithmetic Contexts

Decimal arithmetic is quite a large topic, and we don't cover it anywhere near fully in this chapter. The decimal module
was designed to allow easy decimal calculations, which are much more appropriate when accurate answers are
required than the sometimes-slightly-inaccurate floating-point numbers built into the language. This is typically the
case in commerce and accounting, where strict decimal arithmetic has been used for hundreds of years and
inaccuracies in representation cannot be permitted.

Note: Fixed-point vs. floating-point. In fixed-point representations, a digit in a given position always has a
specific value. Thus in the number represented as "3.14159", the digit after the decimal point always
represents some number of tenths, and the given fixed-point representation can represent numbers
between -9.9999 and +9.9999, with the smallest difference between two numbers being 0.0001 (which is
the difference between every pair of "adjacent" numbers). Floating-point representations allow the point
(in this case, the decimal point) to move. This means that the size of the numbers you can represent is
independent of the number of digits of precision you can represent, and depends primarily on the range
of exponents. If we allow exponents to range from -5 to +5, with five digits the smallest positive number
you can represent is 0.00001 * 10^-5 (which is 0.0000000001) and the largest is 0.99999 * 10^5 (or
99999.0). But the gaps between the adjacent larger numbers are much greater than the gaps between the
smaller numbers. The value 0.99999 * 10^5 is conventionally written as 0.99999E5.
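To see the kind of representational inaccuracy the Note describes, compare binary floating point with Decimal in a quick interactive session:

>>> from decimal import Decimal
>>> 0.1 + 0.2
0.30000000000000004
>>> Decimal("0.1") + Decimal("0.2")
Decimal('0.3')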

Decimal Arithmetic Contexts

This section will briefly introduce the decimal module, to whose documentation you are referred for further
information. The context in which decimal arithmetic is performed has several elements:

Attribute  Meaning

prec       Specifies precision: how many digits are retained in calculations (the default is 28 decimal
           digits). The decimal point may occur many places before or after the significant digits, since
           decimal arithmetic can handle a floating decimal point. decimal knows how to maintain
           proper precision through calculations, so for example Decimal("2.50") * Decimal("3.60")
           evaluates to Decimal("9.0000").
rounding   One of a set of constants defined in the decimal module that tells the arithmetic routines how
           to round when precision must be discarded.
flags      A list of signals (discussed below) whose flags are currently set. Flags are usually clear when
           a context is created, and set by abnormal conditions in arithmetic operations, although they
           can be set when the context is created if required.
traps      A list of signals whose setting by an arithmetic operation should cause an exception to be
           raised.
Emin       An integer containing the minimum value the exponent is allowed to take. This sets a lower
           bound on the values that numbers can represent.
Emax       An integer containing the maximum value the exponent is allowed to take. This sets an upper
           bound on the values that the numbers can represent.
capitals   True (the default) to use an upper-case "E" in exponential representations, False to use a
           lower-case "e".
clamp      True to ensure that numbers are represented as ten to the power of the exponent
           times some number in the range 0.1 <= mantissa < 1.0. This ensures easy interchange with
           other computers using standard "IEEE 754" decimal representation. False (the default) allows some
           latitude in representation, allowing a wider range of numbers with fewer digits of actual
           precision at the cost of losing "IEEE normalization" at the extremes of the value range.

The decimal module has been carefully written to ensure that each thread can have an independent decimal
context (because it would be disastrous if one thread could affect another by making changes to a shared
context).

Most of the attributes of the context are fairly esoteric stuff that you really don't need to alter. For many
applications, you can just use the default context. While prec and rounding are fairly frequently adjusted,
capitals and clamp are rarely touched.

Decimal Signals
Certain things can happen during arithmetic operations that cause the results to be imprecise or otherwise
misleading, and the operations raise signals to indicate this. The decimal code responds to these signals by
setting flags in the arithmetic context. If the trap corresponding to a signal is set, an exception is raised after
the flag is set. The following flags are defined:

Signal            Raised when ...

Clamped           A number's representation had to be modified to normalize it to a mantissa
                  range of 0.1 to 0.999999999999...
DecimalException  Not raised: this is simply a base class for the others, and a subclass of the built-in
                  ArithmeticError exception.
DivisionByZero    Either a division or a modulo operation had a divisor (right operand) of zero.
Inexact           Indicates that rounding took place after an operation.
InvalidOperation  This often occurs when operations are performed on decimal infinities or "Not a
                  Number" objects.
Overflow          The result cannot be represented with an exponent of Emax or less.
Rounded           Rounding has occurred. If the digits rounded were all zero, no information has been
                  lost.
Subnormal         The number cannot be represented with an exponent of Emin or larger.
Underflow         The result of an arithmetic operation was so small in magnitude that the most accurate
                  way to represent it is as 0.

The Default Decimal Context

You can access the default decimal context using the getcontext() function from the decimal module.
Contexts know how to present themselves in a fairly readable form, and you can modify the context just by
assigning to its various attributes. You can also create copies of contexts and switch between them. Finally,
of course, you can create instances of the decimal.Context class, providing the non-default required
attributes as keyword arguments. Note that if you modify decimal.DefaultContext, it will change the default
values used to create future contexts. This is useful for setting up defaults before creating multiple threads, but
should not be used casually in non-threaded programs.
Understanding Decimal Contexts

>>> from decimal import *


>>> myothercontext = Context(prec=60, rounding=ROUND_HALF_DOWN)
>>> setcontext(myothercontext)
>>> getcontext()
Context(prec=60, rounding=ROUND_HALF_DOWN, Emin=-999999, Emax=999999, capitals=1
, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857142857142857142857142857')
>>> ExtendedContext
Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1,
clamp=0, flags=[], traps=[])
>>> setcontext(ExtendedContext)
>>> getcontext()
Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1,
clamp=0, flags=[], traps=[])
>>> Decimal(1) / Decimal(7)
Decimal('0.142857143')
>>> Decimal(42) / Decimal(0)
Decimal('Infinity')
>>> setcontext(BasicContext)
>>> getcontext()
Context(prec=9, rounding=ROUND_HALF_UP, Emin=-999999, Emax=999999, capitals=1, c
lamp=0, flags=[], traps=[Clamped, InvalidOperation, DivisionByZero, Overflow, Un
derflow])
>>> Decimal(42) / Decimal(0)
Traceback (most recent call last):
File "<console>", line 1, in <module>
decimal.DivisionByZero: [<class 'decimal.DivisionByZero'>]
>>> with localcontext() as ctx:
... ctx.prec = 42
... s = Decimal(1) / Decimal(7)
... print(s)
...
0.142857142857142857142857142857142857142857
>>> s = +s
>>> print(s)
0.142857143
>>>

You can see that the decimal module provides a number of "ready-made" contexts, which can be modified
easily by attribute assignment. It is easy to make changes to the current context's attributes, but such
changes persist until you explicitly undo them. The decimal.localcontext() function returns a context
manager that sets the active thread's current context to the context provided as an argument or (as in the
case above, where no argument is provided) to a copy of the current context. The with statement provides a
natural way to perform such localised changes. Note that the unary plus sign in "+s" does actually perform a
conversion, because it is an arithmetic operation whose result must be conditioned by the (now restored)
original context.

With context managers and the with statement, Python gives you the chance to closely control the context of execution of your
code. You should consider them whenever you might consider try ... except ... finally.
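As a further illustration (not from the lesson; the context name is made up), you can build a Context of your own and
apply it temporarily with localcontext():

from decimal import Decimal, Context, ROUND_UP, localcontext

money = Context(prec=12, rounding=ROUND_UP)    # a hand-built context

with localcontext(money):                      # active only inside the block
    print(Decimal(10) / Decimal(3))            # 3.33333333334 (12 digits, rounded up)

print(Decimal(10) / Decimal(3))                # the surrounding context applies again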

You are getting close to the end of the Certificate Series in Python! Well done! Keep it up!

When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Memory-Mapped Files
Lesson Objectives
When you complete this lesson, you will be able to:

utilize memory mapping.


work through a memory-mapped example.

Memory Mapping
Files can be so large that it is impractical to load all of their content into memory at once. The mmap.mmap() function
creates a virtual file object. Not only can you perform all the regular file operations on a memory-mapped file, you can
also treat it as a vast object (far larger than any real object could be) that you can address just like any other sequence.

This technique deals with files by mapping them into your process's address space. The mmap module allows you to
treat files as similar to bytearray objects: you can index them, slice them, search them with regular expressions (as the
sketch below shows) and the like. Many of these operations can make it much easier to handle the data in a file:
without memory mapping, you have to read the file in chunks and process the chunks (assuming the files are too
large to read into memory as a single chunk). This makes it very difficult to process strings that overlap the inter-chunk
boundaries. Memory mapping allows you to pretend that all the data is in memory at the same time even when that is
not actually the case. The necessary manipulations to allow this are performed automatically.
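As a quick illustration of the regular-expression point, the sketch below (not from the lesson; the file name is made up)
searches a memory-mapped file with a bytes pattern, just as it would search a bytes object:

import mmap
import re

with open("data.log", "rb") as f:                          # any existing, non-empty file will do
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for match in re.finditer(rb"ERROR: (\w+)", m):     # bytes pattern applied to the map
            print(match.group(1))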

In this lesson, we primarily cover the details of mmap that apply across both Windows and Unix platforms, plus a
few Windows-specific features. You should be aware that there are different additional feature sets available for
Windows and Unix platforms. The documentation on the module is fairly specific about the implementation differences.

Memory-Mapped Files Are Still Files


In standard file operations, there is no difference between a memory-mapped file and one that is opened in
the regular way: all regular file access methods continue to work, and you can also treat the file content pretty
much like a bytearray.

Here's a simple example from the module's documentation to get you started.

Getting Started with Memory-Mapped Files

>>> with open("v:/workspace/Python4_Lesson15/src/hello.txt", "wb") as f:


... f.write(b"Hello Python!\n")
...
14
>>> import mmap
>>> with open("v:/workspace/Python4_Lesson15/src/hello.txt", "r+b") as f:
... mapf = mmap.mmap(f.fileno(), 0)
... print(mapf.readline()) # prints b"Hello Python!\n"
... print(mapf[:5]) # prints b"Hello"
... mapf.tell()
... mapf[6:] = b" world!\n"
... mapf.seek(0)
... print(mapf.readline()) # prints b"Hello world!\n"
... mapf.close()
...
b'Hello Python!\n'
b'Hello'
14
b'Hello world!\n'
>>>

The code above opens a file, then memory maps it. It exercises the readline() method of the mapped file,
demonstrating that it works just as with a standard file. It then reads and writes slices of the mapped file (an
equally valid way to access the mapped file's content, which does not alter the file pointer). Finally the file
pointer is repositioned at the start and the (updated) contents are read in. (The "14" is the return value of the
write() function, which always returns the number of bytes written.)

OBSERVE:
>>> with open("v:/workspace/Python4_Lesson15/src/hello.txt", "wb") as f:
... f.write(b"Hello Python!\n")
...
14
>>> with open("v:/workspace/Python4_Lesson15/src/hello.txt", "r+b") as f:
... mapf = mmap.mmap(f.fileno(), 0)
... print(mapf.readline()) # prints b"Hello Python!\n"
... print(mapf[:5]) # prints b"Hello"
... mapf.tell()
... mapf[6:] = b" world!\n"
... mapf.seek(0)
... print(mapf.readline()) # prints b"Hello world!\n"
... # close the map
... mapf.close()
...
b'Hello Python!\n'
b'Hello'
14
b'Hello world!\n'
>>>

As we observed in an earlier lesson, file objects are context managers, albeit of a slightly degenerate
kind (because they return themselves as the result of their __enter__() method). The first argument to
mmap.mmap is a file number (an internal number used to identify the file to the operating system), which is
obtained by calling the file's fileno() method. The call to readline() demonstrates normal file
handling, but then you see indexed access to the content, which nevertheless demonstrates that the
file pointer is unchanged by such access.

Next you see that the content of the file can also be changed by subscripting, though in this case it is
essential that the new content is the same length as the slice being assigned. Finally, by seeking back to the
beginning and reading again, you observe that the file content really has changed.
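To make the length restriction concrete, here is a small sketch (not part of the lesson; the file is created just for the
demonstration) showing a wrong-sized slice assignment being rejected, and resize() being used, where the platform
supports it, to change the size of the mapping and the underlying file:

import mmap

with open("hello.txt", "wb") as f:
    f.write(b"Hello world!\n")                 # 13 bytes

f = open("hello.txt", "r+b")
m = mmap.mmap(f.fileno(), 0)

m[0:5] = b"HELLO"                              # fine: exactly five bytes replace five bytes
try:
    m[0:5] = b"Hi"                             # wrong length: the map refuses the assignment
except (IndexError, ValueError) as exc:
    print("rejected:", exc)

m.resize(32)                                   # grow the mapping (and the file) to 32 bytes
m[13:19] = b"Again\n"                          # now there is room for more data
m.close()
f.close()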

The difference between using a memory-mapped file and a standard one is that standard files are
independently buffered in each process that uses them, meaning that a write to a file from one program is not
necessarily immediately written to disk, and will not necessarily be seen immediately by a separate program
reading the file using its own buffers.
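Changes made through a map are written back to the file by the operating system in its own time; if you need to be
sure they have reached the file at a particular moment, you can flush them explicitly. A minimal sketch, assuming
mapf is an open, writable map as in the examples above:

mapf[0:5] = b"HELLO"    # modify the mapped region
mapf.flush()            # push the change out to the underlying file
                        # (an offset and size can be given to flush only part of the map)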

The mmap Interface
For calls to mmap.mmap() to be cross-platform compatible they should stick to the following signature:

OBSERVE:
mmap(fileno, length, access=ACCESS_WRITE, offset=0)

The file number is used simply because this mirrors the interface of the underlying C library (not always the
best design decision, but fortunately the file number is easily obtained from an open file's fileno() method).
Using a file number of -1 creates an anonymous map (one that cannot be accessed from the filestore); a
short sketch below shows one in use.

The call above maps length bytes from the beginning of the file, and returns an mmap object that gives both
file- and index-based access to that portion of the file's contents. If length exceeds the current length of the
file, the file is extended to the new length before operations continue. If length is zero, the mmap object will
map the current length of the file, which in turn sets the maximum valid index that can be used.
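As a quick illustration, this sketch (not from the lesson) creates a small anonymous map and treats it as a writable
sequence of bytes:

import mmap

m = mmap.mmap(-1, 1024)      # 1024 bytes of anonymous memory, not backed by a file
m[0:5] = b"hello"            # index and slice it like a bytearray
m.seek(0)
print(m.read(5))             # b'hello' -- file-style access works as well
m.close()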

The optional access argument can take one of three values, all defined in the mmap module:

Access Value   Meaning


ACCESS_READ    Any attempt to assign to the memory map raises a TypeError exception.
ACCESS_WRITE   Assignments to the map affect both the map's content and the underlying file.
ACCESS_COPY    Assignments to the memory map change the map's contents but do not update the file
               on which the map was based (a copy-on-write mapping).
The offset argument, when present, establishes an offset within the file for the starting position of the
memory map. The offset must be a multiple of the constant mmap.ALLOCATIONGRANULARITY (which is
typically the size of a virtual memory block, 4096 bytes on many systems).
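The following sketch (not from the lesson; the file name and contents are made up) maps only the second "granule"
of a file read-only, so that index 0 of the map corresponds to byte ALLOCATIONGRANULARITY of the file:

import mmap

GRAN = mmap.ALLOCATIONGRANULARITY

with open("big.dat", "wb") as f:
    f.write(b"A" * GRAN + b"B" * GRAN)        # two granules of data

with open("big.dat", "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ, offset=GRAN)
    print(m[:4])                              # b'BBBB': the map starts at byte GRAN of the file
    m.close()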

What Use is mmap(), and How Does it Work?


The real benefit of mmap over other techniques is twofold. First, the file is mapped directly into memory (hence
the name). When only one process is using the mapped file, this is a pedestrian application, but remember
that modern computers use virtual memory systems. Each process's memory consists of a list of "memory
pages." The actual address of a memory page does not matter to the process: the process accesses
"virtual memory," and the hardware uses a "memory map" to determine whereabouts in physical memory
each page of the process's address space actually lives.

When a file is memory-mapped, the operating system effectively reserves enough memory to hold the whole
file's contents (or that portion of the file that is being mapped) in memory, and then maps that memory into the
process's address space. Second, if another process comes along and maps the same file, then exactly the same
block of memory is mapped into the second process's address space. This allows the two processes to exchange
information extremely rapidly by writing into the shared memory. Since each is writing into the same memory,
each can see the other's changes immediately.

Note    Be careful with large files. Remember that if you memory map a file it gets mapped into your
        process's virtual address space. If you are using 32-bit Python (either because you are running
        on a 32-bit system or because your system administrators chose to install a 32-bit Python
        interpreter on a system built using 64-bit technology), each process has a 4 GB upper limit on
        the size of its address space. Since there are many other claims on a process's memory, it is
        unlikely you will be able to map all of a file much above 1 GB in size in a 32-bit Python
        environment.

A Memory-Mapped Example
The following example code gives you some idea how memory-mapped files might be used for interprocess
communication. The program creates a file that will hold data (encoded by the struct module) to be passed between
the main program and its subprocesses. The file is split up into "slots," each large enough to hold a byte used to
indicate the status of the slot, a 7-character string, and three double-length floating-point numbers. The status starts as
EMPTY, and is set to the slot number every time new data becomes available. When there is no more data, the status
is set to TERM, which indicates to the subprocess that there is no more work available.

The whole program is given in the listing below. This is a rather larger program than we normally ask you to enter in
one go, but by now you should be able to understand what a lot of the code does as you type it in (explanations follow
the listing).
Enter the following code as mpmmap.py

"""
mpmmap.py: use memory-mapped file as an interprocess communication area
to support multi-processed applications.
"""

import struct
import mmap
import multiprocessing as mp
import os
import time
import sys

FILENAME = "mappedfile"
SLOTFMT = b"B7s3d"
SLOTSIZE = struct.calcsize(SLOTFMT)
SLOTS = 6 # Number of subprocesses
EMPTY = 255
TERM = 254

def unpackslot(byte_data):
"""Return slot data as (slot#, string, float, float, float)."""
return struct.unpack(SLOTFMT, byte_data)

def packslot(slot, s, f1, f2, f3):


"""Generate slot string from individual data elements."""
return struct.pack(SLOTFMT, slot, s, f1, f2, f3)

def run(slot):
"""Implements the independent processes that will consume the data."""
offset = SLOTSIZE*slot
print("Process", slot, "running")
sys.stdout.flush()
f = open(FILENAME, "r+b")
mapf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
while True:
while mapf[offset] == EMPTY:
time.sleep(0.01)
if mapf[offset] == TERM:
print("Process", slot, "done")
sys.stdout.flush()
mapf.close()
return
x, s, f1, f2, f3 = unpackslot(mapf[offset:offset+SLOTSIZE])
print(x, slot, ":", s, f1*f2*f3)
sys.stdout.flush()
mapf[offset] = EMPTY

def numbers():
"""Generator: 0.01, 0.02, 0.03, 0.04, 0.05, ..."""
i = 1
while True:
yield i/100.0
i += 1

if __name__ == "__main__":
f = open(FILENAME, "wb")
f.write(SLOTSIZE*SLOTS*b'\0')
f.close()
f = open(FILENAME, "r+b")
mapf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)

ptbl = []
for slot in range(SLOTS):
offset = slot*SLOTSIZE
mapf[offset] = EMPTY
p = mp.Process(target=run, args=(slot, ))
ptbl.append(p)
print("Starting", p)
p.start()

numseq = numbers()
b = next(numseq)
c = next(numseq)
for i in range(4):
for slot in range(SLOTS):
a, b, c = b, c, next(numseq)
offset = slot*SLOTSIZE
while mapf[offset] != EMPTY:
time.sleep(0.01)
mapf[offset+1:offset+SLOTSIZE] = packslot(slot, b"*******", a, b, c)[1:]
mapf[offset] = slot

for slot in range(SLOTS):


offset = SLOTSIZE*slot
while mapf[offset] != EMPTY:
time.sleep(0.01)
mapf[offset] = TERM

for p in ptbl:
p.join()

mapf.close()
print(f.read())
sys.stdout.flush()
f.close()
os.unlink(FILENAME)

There are a couple of utility functions for packing and unpacking the slot data; these are simple calls to standard
struct functions that you may remember. Next comes the run() function that will be the meat of the subprocesses.
When we call it, we pass the process's slot number, and it uses the computed size of the slot to work out where its
particular portion of the data file begins. It then establishes a mapping onto the standard data file and goes into an
infinite loop (which will be terminated by the logic it contains). It repeatedly looks at the first byte of its slot, until the
EMPTY value it starts with is changed (by the main program). The process sleeps between different looks at the first
byte, to avoid using too much CPU. The sleep should be long enough that the computations in the loop take a
relatively insignificant time. If the value has changed to TERM, the process closes everything down and terminates.
Otherwise it extracts the data from the slot, performs a calculation and prints out the results, and then sets the slot
indicator back to EMPTY so the main program will refill the slot.

The run() function is followed by a simple numbers() generator function that separates the task of generating numbers
from their use inside the main program. It is an infinite generator that yields numbers starting at 0.01 and increasing by
0.01 each call.

Now, we see the logic of the main program. The program first creates a data file large enough to contain the mapped
data for all slots, then maps the file into memory. It then iterates over the slots, setting their status to EMPTY, creates a
new process with the current slot number, saves it in a list and starts it. The newly-started process will wait until its slot
is switched from EMPTY status before taking any action.

Next the program loops four times over all the slots, filling them with data and only then setting the slot indicator to the
slot number. This avoids a potential hazard which might occur if the slot status was set at the same time as the rest of
the data: it is just possible that a subprocess might see its status change and start trying to act before the rest of the
data is copied in. Yes, this would be a low-probability occurrence, but that does not mean you are at liberty to ignore it.

Once the main loop is over, the program waits for each slot to become EMPTY and sets it to TERM to indicate that the
associated process should terminate. Finally, the program waits for all the processes it started to terminate, deletes
the file it created at the start of the run, and itself terminates. When you run the program, you should see the following
output.
Output from mpmmap.py
Starting <Process(Process-1, initial)>
Starting <Process(Process-2, initial)>
Starting <Process(Process-3, initial)>
Starting <Process(Process-4, initial)>
Starting <Process(Process-5, initial)>
Starting <Process(Process-6, initial)>
Process 0 running
0 0 : b'*******' 6e-06
Process 1 running
1 1 : b'*******' 2.3999999999999997e-05
Process 3 running
3 3 : b'*******' 0.00012
Process 2 running
2 2 : b'*******' 5.9999999999999995e-05
Process 5 running
5 5 : b'*******' 0.00033600000000000004
Process 4 running
4 4 : b'*******' 0.00021000000000000004
0 0 : b'*******' 0.0005040000000000001
0 0 : b'*******' 0.0027300000000000002
3 3 : b'*******' 0.00132
1 1 : b'*******' 0.00072
4 4 : b'*******' 0.001716
2 2 : b'*******' 0.00099
5 5 : b'*******' 0.002184
3 3 : b'*******' 0.004896
4 4 : b'*******' 0.005814000000000001
2 2 : b'*******' 0.00408
1 1 : b'*******' 0.00336
0 0 : b'*******' 0.007980000000000001
4 4 : b'*******' 0.0138
5 5 : b'*******' 0.006840000000000001
3 3 : b'*******' 0.012143999999999999
2 2 : b'*******' 0.010626
1 1 : b'*******' 0.00924
5 5 : b'*******' 0.0156
Process 0 done
Process 4 done
Process 3 done
Process 1 done
Process 5 done
Process 2 done
b'\xfe*******R\xb8\x1e\x85\xebQ\xc8?\x9a\x99\x99\x99\x99\x99\xc9?\xe1z\x14\xaeG\xe1\xca
?\xfe*******\x9a\x99\x99\x99\x99\x99\xc9?\xe1z\x14\xaeG\xe1\xca?)\\\x8f\xc2\xf5(\xcc?\x
fe*******\xe1z\x14\xaeG\xe1\xca?)\\\x8f\xc2\xf5(\xcc?q=\n\xd7\xa3p\xcd?\xfe*******)\\\x
8f\xc2\xf5(\xcc?q=\n\xd7\xa3p\xcd?\xb8\x1e\x85\xebQ\xb8\xce?\xfe*******q=\n\xd7\xa3p\xc
d?\xb8\x1e\x85\xebQ\xb8\xce?\x00\x00\x00\x00\x00\x00\xd0?\xfe*******\xb8\x1e\x85\xebQ\x
b8\xce?\x00\x00\x00\x00\x00\x00\xd0?\xa4p=\n\xd7\xa3\xd0?'

Note    The program above is for demonstration purposes only, so you can start to understand the advantages
        of shared memory. The multiprocessing module actually has other ways to keep processes
        synchronized, and you should investigate those for production purposes. But if you understand the logic
        of the code above, you know what mapped files do and how they work, which is a significant piece of
        learning.

Memory-mapped files allow you to treat huge tracts of data as though they were large strings, and they also allow you to
share those large chunks of data between independent processes, making them a simple basis for inter-process
communication.

In the final lesson, we look ahead to your future with Python.

When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
Your Future with Python
Lesson Objectives
When you complete this lesson, you will be able to:

find cool Python conferences.


explore the Python job market and career choices.
explore new developments in Python.
access a few new Python tips and tricks.

Python Conferences
Python is a rapidly growing language that attracts programmers all over the world. In the early 1990s an International
Python Conference was started, which became the principal forum for live discussion of Python's uses and
development (naturally extensive discussions were also held online, as they continue to be, but face-to-face
meetings are still incredibly useful, and usually more productive than mailing list discussions).

In 2002, I was asked by Guido van Rossum to chair a new type of conference, one that involved the community
members far more, and was priced to allow those who didn't have professional budgets to come along and contribute.
That first PyCon, in March 2003, attracted over 250 people, and established community conferences as the preferred
mechanism for meeting up with other Python users (following in the footsteps of EuroPython, which had been held in
Gothenburg, Sweden, a couple of months before). I chaired the first three conferences (by which time attendance had
swollen to 450) and then handed the torch to Andrew Kuchling.

At the same time, PyCons were growing up in other countries, whose Python enthusiasts started to run national
PyCons such as PyCon Ireland, Kiwi PyCon, Python Brasil, PyCon AR (Argentina), PyCon UK, PyCon Italy, and
many others, while other, smaller, conferences are now held regionally in the USA (the first three of these were
PyOhio, PyTexas and PyArkansas). The Asia Pacific region recently started a pan-Asian conference (PyCon Asia
Pacific) to support Python users in that region. It seems as though soon it will be impossible to avoid Python events
clashing with each other simply because there are so many in the worldwide calendar.

In 2011 PyCon had 1200 delegates, and it currently looks set to start capping growth some time in the next two to three
years (there is a general feeling that over-large conventions run the risk of losing the "community" flavor that is such
an important part of conferences like PyCon). PyCon even runs a financial assistance scheme that regularly helps
people who would otherwise not be able to afford to travel and attend PyCon. To learn more about these conferences,
the best starting point is the root PyCon web site.

There are also a growing number of local user groups throughout the world. Some such groups use the popular
MeetUp system to organize their groups, as it allows people to easily sign up for meetings and allows the meeting
administrators reasonable control over attendance and the like.

All these activities are, in essence, run by members of the community (though certainly the larger ones like PyCon US
are assisted by professional conference companies: volunteers cannot have their depth of experience, and must often
make their contributions outside office hours).

T utorials
As a co nference matures and the o rganizers acquire mo re experience, yo u will o ften see tuto rials o ffered at
very reaso nable prices. Wo rld autho rities o n vario us aspects o f Pytho n o ffer tuto rials to help the Pytho n
So ftware Fo undatio n to raise funds thro ugh the co nference.

These tuto rials are an amazing bargain, and an incredible way to learn new Pytho n skills and techniques.
Many o f them canno t be taken anywhere else, and wo uld alo ne be wo rth the price o f co nference registratio n.

T alks
The talks are the "meat" o f mo st co nferences, and Pytho n co nferences are no exceptio n. Any given
co nference might include papers fo r beginners abo ut so me mo re o bscure aspect o f the language,
intermediate papers o n applicatio ns, o r advanced stuff o n ho w a particular framewo rk achieves a certain effect
using aspects o f Pytho n to achieve high efficiency (o r o ther desirable aspects o f their case).

Talks will typically be thirty minutes to an ho ur lo ng, including time fo r questio ns. A lo t o f co nferences are no w
putting o ut live video streams as the talks are presented (tho ugh with mo re than a co uple o f independent
tracks this can get rather demanding in bandwidth). The same video stream will be reco rded, and there is a
huge amo unt o f Pytho n-related material saved and available o n the web. The primary searchable reso urce is
the Pytho n Miro Co mmunity, which tries to o rganize and index the material.

T he Hallway T rack
Much favo red by experienced co nference-go ers, the hallway track is the discussio ns that take place between
peo ple o utside the meeting ro o ms where talks are given. These discussio ns o ften arise co mpletely
spo ntaneo usly, but give better value than the rest o f the co nference. Even if yo u are new to co nference-go ing,
yo u sho uld definitely keep yo ur schedule o pen eno ugh to take in the hallway track. And do n't be surprised if
so me rando m co nversatio n leads yo u to abando n yo ur plans and use the hallway track instead.

Open Space
Many conferences now set aside space for participants to use for activities of their own choice. There are
particular rules traditionally associated with the term "open space," but sometimes (to the annoyance of
purists) the Python community simply interprets it as "rooms you can use for pretty much any conference-
related activity." It is not unusual for speakers to invite interested audience members to an open space
session where their questions can be answered in a more participative framework. You can get to meet some
amazing people in open space.

Lightning T alks
Often the mo st entertaining sessio ns o f the who le co nference, the lightning talk sessio ns use five-minute
slo ts in which speakers, who can o ften o nly sign up in perso n at the co nference, must co mplete their
presentatio n within the slo t o r be cut sho rt by the sessio n chairman.

If yo u are interested in beco ming a co nference speaker, presenting a lightning talk is a go o d way to dip a to e
in the water. Audiences are very fo rgiving to new speakers and tho se who are no t presenting in their first
language. To pics are o ften light-hearted (o ne I particularly remember was "Ho w I replaced myself with a small
Pytho n script"), and quite o ften intro duce yo u to no vel techno lo gies that yo u wo uld o therwise no t have co me
acro ss. Because the talks are sho rt, the sessio ns go by quickly, and every speaker gets a ro und o f applause.

Birds of a Feather Sessions (BOFs)


These are usually evening sessio ns, no t fo rmally o rganized but o ften using ro o ms in the co nference venue,
where peo ple with a co mmo n interest in o ne specific area (testing, Django , numerical co mputing, Twisted
netwo rking, ...) get to gether and just share info rmatio n in any suitable way. The Testing BOF has beco me a
traditio n o n Saturday night at PyCo n US, and runs lightning talks all o f its o wn. In 20 11 Testing BOF speakers
were required to wear a white lab co at.

Sprints: Moving Ahead


Conferences are often followed by sprints: focused efforts on getting some aspect of a project up and
running, by a team that might be scattered around the world when not actually at the same conference.

Sprints are a great place to learn abo ut existing co de bases: yo u can o ften get to talk with and learn fro m the
peo ple who wro te and/o r are maintaining the co de. Once yo u have met so me o f the peo ple who co ntribute to
the develo pment, it is far less intimidating to jo in in and beco me a co ntributo r yo urself. The o pen so urce
wo rld o nly exists because peo ple like us ro ll up their sleeves and start building things.

Whether lo cal, natio nal, o r regio nal, Pytho n co nferences are an amazing way to impro ve yo ur Pytho n kno wledge and
increase yo ur skill level. They are so cial as well as technical events, and when yo u beco me a regular co nference-go er
yo u will do ubtless find, as do I, that there are peo ple yo u lo o k fo rward to meeting again and again, even if yo u o nly
ever meet them at co nferences.

T he Python Job Market and Career Choices


Pytho n is emplo yed in such diverse ways, it is hard to think o f an area o f life that isn't affected by it o ne way o r ano ther.
Go o gle is well-kno wn as an o rganizatio n where Pytho n is used extensively. Many o rganizatio ns, including mo st o f
the USA's 10 0 largest newspapers, use a Pytho n-based web framewo rk called Django to build their web sites and
maintain jo urnalistic co ntent.

In the scientific and engineering wo rld, Pytho n is equally versatile. The SciPy and Numpy packages put blazingly fast
calculatio ns and publicatio n-quality graphics into the hands o f scientists. This is do ne by using Pytho n as a "glue"
language to ho ld to gether high-speed lo gic written in co mpiled languages like Fo rtran and C, with mo st o f the
co mputatio n taking place in the co mpiled co de.
Note    The PyPy Python project is now reliably producing benchmark results that are several times faster than
        those of the CPython interpreter, although at present it is only available for Python 2.7. If this progress
        continues, Python could become a viable language in which to write numerical algorithms!

If yo u enjo y pro gramming and want to carry o n do ing it, yo u will pro bably always find so mething to do . Pro gramming
is a great career if yo u like to find o ut abo ut ho w things are (o r can be) do ne. Of co urse, fo r many peo ple pro gramming
will o nly be a part o f their jo b, but that do es no t mean they can't enjo y it. Pytho n can be used in so -called "embedded
devices," the co mputers that are increasingly built into o ther equipment to act as a co ntro lling element. Technicians o f
all kinds will find themselves thrust into pro gramming as a part o f their jo bs, and having the intro ductio n to Pytho n that
this Certificate Series has pro duced is a great intro ductio n to pro gramming generally (if yo u can pro gram in Pytho n, it
is much easier to learn o ther languages).

If yo u want to kno w what jo bs are currently available, the Internet is as usual yo ur friend. The Pytho n So ftware
Fo undatio n maintains a Jo bs Bo ard o n which emplo yers po st jo bs. Track that page fo r a while to get an idea o f the
range o f jo bs likely to be available, but many emplo yers never find o ut abo ut the Jo bs Bo ard. Ho w do yo u find the
o ther jo bs? Well, the co nventio nal ways all apply. Fo r example, yo u can go to jo b search sites and enter "Pytho n" as a
keywo rd. Yo u will find that many large co mpanies are lo o king fo r Pytho n skills.

In fact, as these wo rds are being written, there is a wo rldwide sho rtage o f Pytho n skills. Clearly there is no guarantee
ho w lo ng this situatio n will last, but as lo ng as it do es, even fairly new pro grammers sho uld be able to find jo bs.

The difference between yo u as an O'Reilly Scho o l student and o ther applicants is that yo u have, o ver the co urse o f
yo ur studies, been required to demo nstrate understanding o f the material and practical skills in applying it. Yo u can
sho w peo ple co de yo u have written, and can pro ve that yo u understand it and can talk sensibly abo ut its structure.
Even if yo u have no t been studying fo r vo catio nal reaso ns, we ho pe that yo u have fo und these metho ds helpful; if
yo u're lo o king fo r wo rk, they will set yo u apart fro m the average applicant. I have had to hire peo ple, and it can be
ho rrifying ho w many applicatio ns co me fro m candidates who are o bvio usly ill-qualified fo r the ro le o r have o nly the
shakiest grasp o f pro gramming co ncepts. So emphasize yo ur practical experience: emplo yers sho uld regard it as
valuable.

Python Development
This lesso n is no t intended to recruit Pytho n co re develo pers, but I am quite happy to enco urage peo ple with a sense
o f adventure to co nsider beco ming o ne. So me beginners feel that they will no t be wanted, o r that their effo rts will be
unappreciated. This can seem so if the new develo per's co ntributio ns are no t reviewed sensitively, but peo ple being
peo ple, this do es no t necessarily always happen.

Altho ugh no t all develo per-specific, mo st o f the lists mentio ned o n the Pytho n Mailing Lists page are co ncerned with
so me aspect o f Pytho n develo pment o r applicatio ns. The o ne exceptio n is the general "co mp.lang.pytho n" list, which
is a bro ad church in which yo u might expect to find anything fro m using a C debugger to whether Schro dinger's cat
really do es exist in two parallel states. It is fairly eclectic, and threads can ramble all o ver the place.

There is a new co re-mento rship mailing list started specifically so that tho se with an interest in beco ming a develo per
co uld interact with a gentler gro up than the who le develo pers list, and get a mo re welco ming receptio n. Once they have
been inducted into the necessary pro cesses, they are intro duced to the rest o f the develo per co mmunity. Intro ductio ns
are easier o nce so meo ne has made an initial co ntributio n.

Do no t make the mistake o f assuming that because the CPytho n interpreter is written in C yo u have to kno w the C
language befo re yo u beco me a co re co ntributo r. The standard library and its test suite have lo ts o f co de written entirely
in Pytho n, and it needs maintenance just like everything else. Yo ur Pytho n skills are needed if yo u want to jo in the
o pen so urce co mmunity!

There is quite a bit o f material intended to help and enco urage the new o r wo uld-be Pytho n develo per, co ncentrated in
the pytho n.o rg site. Altho ugh there may be a steep learning curve, co ntributing to Pytho n's develo pment can give yo u
aweso me rewards in terms o f self-respect, and will also earn yo u kudo s in the o pen so urce wo rld that sho uld transfer
into o ther areas to o .

T ips and T ricks


There is no sto red co llectio n o f tips and tricks fo r yo u to rummage in (well, there is, it's called "Go o gle" and it's
accessible o n the web). Our tips and tricks have been passed o n as yo u have pro ceeded thro ugh yo ur co urse wo rk, in
email discussio ns with yo ur mento r, and thro ugh the training materials yo u have used.

Yo u may even by no w have begun to develo p so me sense o f what is and is no t "Pytho nic," which sho uld have
impro ved the quality o f yo ur co de so mewhat. The simple rules co ntinue to apply: as yo u write, express yo ur co de in
the simplest way yo u can. Co de that is easy to write is easy to read, and co de that is easy to read is easy to maintain.
Co de that is easy to maintain saves mo ney in the lo ng run because co mputer co sts no wadays tend to be do minated
by the co sts o f the peo ple to pro gram and run the systems.

Thus, if yo u stick with what yo u have learned, yo u sho uld be able to get Pytho n to help yo u do pretty much whatever
yo u want it to do . That practical skill is the added value behind these classes.

Congratulations! You've just finished the final lesson in the fourth course of our Python Certificate Series! How cool are you?
We sincerely hope that you've enjoyed these courses, and that you're a confident Python programmer. You've earned it!

When you finish the lesson, don't forget to complete the homework!

Copyright 1998-2014 O'Reilly Media, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
See http://creativecommons.org/licenses/by-sa/3.0/legalcode for more information.
