
The Weather of the Century:
Data Visualization With MongoDB And Python

A. Jesse Jiryu Davis


Senior Engineer, MongoDB
@jessejiryudavis
Serious MongoDB Talk

Database
This Talk
Where's the data from?
How Much Is There?

2.5 billion documents


4 TB (about 1.6 KB per document)
Medium data
What Does It Look Like?
0303725053947282013060322517+40779-073969FM-15+0048KNYC
V0309999C00005030485MN0080475N5+02115+02005100975
ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999
GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...

{
    "st" : "u725053",    // station identifier (NYC Central Park)
    "ts" : ISODate("2013-06-03T22:51:00Z"),
    "airTemperature" : {
        "value" : 21.1,
        "quality" : "5"
    },
    "atmosphericPressure" : {
        "value" : 1009.7,
        "quality" : "5"
    }
}
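The raw line and the document above describe the same observation, and the mapping between them can be sketched in a few lines. This is a minimal sketch, not the loader used for the talk: the fixed-width offsets are an assumption based on the published NOAA ISD record layout, `parse_header` is a hypothetical helper name, and the `"u"` prefix simply mirrors the `st` value shown above.

```python
from datetime import datetime

# First line of the raw record, copied from the slide.
RAW = "0303725053947282013060322517+40779-073969FM-15+0048KNYC"


def parse_header(line):
    """Slice the fixed-width header. Offsets are assumed, not from the talk."""
    return {
        "st": "u" + line[4:10],                              # station id
        "ts": datetime.strptime(line[15:27], "%Y%m%d%H%M"),  # UTC timestamp
        "position": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude];
            # the raw values are stored in thousandths of a degree.
            "coordinates": [int(line[34:41]) / 1000,
                            int(line[28:34]) / 1000],
        },
    }


doc = parse_header(RAW)
```

The resulting station id and timestamp match the document on the slide, and the coordinates fall on Central Park.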
{
    ts: ISODate("1991-01-01T00:00:00Z"),
    position: {    // GeoJSON
        type: "Point",
        coordinates: [
            -94.6,
            39.117
        ]
    },
    airTemperature: {
        value: 27,
        quality: "1"
    }
}
Visualization
Visualization Pipeline

MongoDB -> PyMongo -> Python dicts -> NumPy -> SciPy -> Matplotlib
{
    ts: ISODate("1991-01-01T00:00:00Z"),
    position: {
        type: "Point",
        coordinates: [
            -94.6,
            39.117
        ]
    },
    airTemperature: {
        value: 45,
        quality: "1"
    }
}
import numpy
import pymongo

data = []
db = pymongo.MongoClient().my_database

# `query` is a MongoDB filter document defined elsewhere.
for doc in db.collection.find(query):
    # One (longitude, latitude, temperature) tuple per document.
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)
# NumPy column access syntax.
lons = arrays[:, 0]
lats = arrays[:, 1]
temps = arrays[:, 2]
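The dict-to-array step can be tried without a database. The sketch below assumes a small hard-coded list, `docs`, as a stand-in for the cursor that `find()` returns; the two sample documents reuse values from the slides and are otherwise illustrative.

```python
import numpy

# Stand-in for the cursor returned by find(); values are illustrative.
docs = [
    {"position": {"type": "Point", "coordinates": [-94.6, 39.117]},
     "airTemperature": {"value": 45, "quality": "1"}},
    {"position": {"type": "Point", "coordinates": [-73.969, 40.779]},
     "airTemperature": {"value": 21.1, "quality": "5"}},
]

# Same extraction as the loop above, written as a list comprehension.
data = [(d["position"]["coordinates"][0],
         d["position"]["coordinates"][1],
         d["airTemperature"]["value"]) for d in docs]

arrays = numpy.array(data)    # shape (n_docs, 3), dtype float64
lons, lats, temps = arrays.T  # transpose, then unpack one 1-D array per column
```

Unpacking the transposed array is equivalent to the three `arrays[:, i]` slices.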
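from scipy.interpolate import griddata
from matplotlib import pyplot

# Regular 1-degree grid covering the globe.
xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
grid_x, grid_y = numpy.meshgrid(xs, ys)

# Magic! Interpolate the scattered station readings onto the grid.
zs = griddata((lons, lats), temps,
              (grid_x, grid_y),
              method='linear')

# Also magic! Draw contour lines through the gridded temperatures.
pyplot.contour(xs, ys, zs)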
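To see what the interpolation step does without real weather data, here is a minimal sketch on synthetic points. The sample plane `2x + 3y`, the random seed, and the grid sizes are all assumptions chosen for illustration. Linear interpolation over a triangulation reproduces a plane, so the gridded result can be checked against the formula.

```python
import numpy
from scipy.interpolate import griddata

rng = numpy.random.default_rng(0)

# Four corners guarantee that the grid lies inside the convex hull
# of the scattered points, so no grid cell comes back as NaN.
corners = numpy.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
points = numpy.vstack([corners, rng.uniform(0, 10, size=(50, 2))])

# Synthetic "temperature": a plane, which linear interpolation reproduces.
values = 2 * points[:, 0] + 3 * points[:, 1]

xs = numpy.linspace(1, 9, 9)
ys = numpy.linspace(1, 9, 5)
grid_x, grid_y = numpy.meshgrid(xs, ys)

# zs has one row per y value and one column per x value.
zs = griddata(points, values, (grid_x, grid_y), method="linear")
```

Grid points outside the convex hull of the input would be filled with NaN by default, which is worth keeping in mind for sparse station coverage.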
Triangulation

What temperature?
Barycentric Interpolation
[Figure: barycentric interpolation inside a triangle of stations, computed as a weighted average. Labels on the slide: 48, 54, 53, and 51.1.]
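The weighted average can be written out directly. The sketch below is one standard way to compute barycentric weights for a point inside a triangle; the vertex coordinates are hypothetical, only the three temperatures are borrowed from the slide labels, and the function names are illustrative.

```python
def barycentric_weights(p, a, b, c):
    """Return the weights of point p relative to triangle vertices a, b, c."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb  # the three weights always sum to 1


def interpolate(p, vertices, temps):
    """Weighted average of the vertex temperatures at point p."""
    weights = barycentric_weights(p, *vertices)
    return sum(w * t for w, t in zip(weights, temps))


# Hypothetical station positions; temperatures taken from the slide labels.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
temps = [48, 54, 53]

at_vertex = interpolate((0.0, 0.0), vertices, temps)
at_centroid = interpolate((1 / 3, 1 / 3), vertices, temps)
```

At a vertex the result equals that station's reading, and at the centroid it is the plain mean of the three. Any other point inside the triangle gets a blend weighted toward the nearer stations.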
Interpolation

51.1
Contours
import numpy
import pymongo

data = []
db = pymongo.MongoClient().my_database

# Not terrifically fast
for doc in db.collection.find(query):
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)
MongoDB-to-NumPy Performance

Querying: 109k documents per second


(On localhost)
Can we go faster?
Enter Monary
Monary
by David Beach

With PyMongo: MongoDB -> PyMongo -> Python dicts -> NumPy -> Matplotlib

With Monary: MongoDB -> Monary -> NumPy -> Matplotlib


import monary

connection = monary.Monary()

# Returns one NumPy array per requested field; no Python dicts in between.
arrays = connection.query(
    db='my_database',
    coll='collection',
    query=query,
    fields=[
        'position.coordinates.0',
        'position.coordinates.1',
        'airTemperature.value'],
    types=[
        'float32',
        'float32',
        'float32'])
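Monary hands back one column per field, as NumPy masked arrays in which a document lacking a field shows up as a masked entry. Before interpolating, it may help to drop rows that are incomplete. The sketch below assumes that masked-array behaviour and uses hand-built arrays in place of a live query; `drop_masked` is a hypothetical helper, not part of Monary.

```python
import numpy


def drop_masked(columns):
    """Keep only the rows in which every column has a value."""
    bad = numpy.zeros(len(columns[0]), dtype=bool)
    for col in columns:
        bad |= numpy.ma.getmaskarray(col)  # True where the value is missing
    return [numpy.ma.getdata(col)[~bad] for col in columns]


# Hand-built stand-ins for the three arrays a query would return.
lons = numpy.ma.masked_array([-94.6, -73.969, 10.0], dtype="float32")
lats = numpy.ma.masked_array([39.117, 40.779, 50.0], dtype="float32")
temps = numpy.ma.masked_array([45.0, 21.1, 0.0],
                              mask=[False, False, True], dtype="float32")

lons, lats, temps = drop_masked([lons, lats, temps])
```

After this step the three arrays are plain `ndarray` columns of equal length, ready to pass to the interpolation code.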
Monary

PyMongo: 109k documents per second


Monary: 817k documents per second
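For a sense of scale, the two rates can be applied to the full data set. This is back-of-the-envelope arithmetic only: it assumes the localhost rates quoted above would hold for a scan of all 2.5 billion documents, which the talk does not claim.

```python
TOTAL_DOCS = 2.5e9    # documents in the data set
PYMONGO_RATE = 109e3  # documents per second via PyMongo
MONARY_RATE = 817e3   # documents per second via Monary

speedup = MONARY_RATE / PYMONGO_RATE
pymongo_hours = TOTAL_DOCS / PYMONGO_RATE / 3600
monary_hours = TOTAL_DOCS / MONARY_RATE / 3600

print(f"speedup: {speedup:.1f}x")
print(f"full scan: {pymongo_hours:.1f} h vs {monary_hours:.2f} h")
```

Under those assumptions Monary is roughly 7.5 times faster, turning a scan of more than six hours into one of under an hour.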
Visualization
Monary
Author:
David Beach
Contributors from MongoDB, Inc.:
Kyle Suarez
Matt Cotter
Anna Herlihy
Mentors:
A. Jesse Jiryu Davis
Jason Carey
Monary

Recent features:
Easy installation
Nested field access
Aggregation
Python 3
Monary

Future:
Insert, update, remove
SSL and authentication mechanisms
Improved API and logging
parallelCollectionScan
MongoDB
Python
Monary
NumPy
SciPy
Matplotlib
Thanks
Thank you

A. Jesse Jiryu Davis


Senior Python Engineer, MongoDB

#MongoDBWorld
Presents
1. http://bit.ly/century-links

2. October MongoDB certification exams!



price *= 0.8

Code MongoDBBoston20
university.mongodb.com

3. Ask The Experts!
