
911 Calls Capstone Project

Data and Setup

Import numpy and pandas

In [2]: import pandas as pd
import numpy as np

Import visualization libraries and set %matplotlib inline.

In [3]: import seaborn as sns
%matplotlib inline

Read in the csv file for 911 calls over a year for Montgomery County, PA via Kaggle

In [4]: df = pd.read_csv('911.csv')

Check the info() of the df

In [5]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
lat 99492 non-null float64
lng 99492 non-null float64
desc 99492 non-null object
zip 86637 non-null float64
title 99492 non-null object
timeStamp 99492 non-null object
twp 99449 non-null object
addr 98973 non-null object
e 99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB

Check the head of df

In [6]: df.head()

Out[6]:
   lat        lng         desc                                               zip      title                    timeStamp            twp                addr                        e
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY   2015-12-10 17:40:00  NEW HANOVER        REINDEER CT & DEAD END      1
1  40.258061  -75.264680  BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...   19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP  BRIAR PATH & WHITEMARSH LN  1
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK      2015-12-10 17:40:00  NORRISTOWN         HAWS AVE                    1
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY   2015-12-10 17:40:01  NORRISTOWN         AIRY ST & SWEDE ST          1
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS           2015-12-10 17:40:01  LOWER POTTSGROVE   CHERRYWOOD CT & DEAD END    1

What are the top 5 zip codes for 911 calls?

In [7]: df['zip'].value_counts().head(5)

Out[7]: 19401.0 6979
19464.0 6643
19403.0 4854
19446.0 4748
19406.0 3174
Name: zip, dtype: int64
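One thing worth keeping in mind here: the info() output above shows that zip has only 86637 non-null values out of 99492 rows, and value_counts() leaves missing values out by default. The sketch below uses a small made-up Series (not the real data) to show the difference between the default behaviour and dropna=False.

```python
import numpy as np
import pandas as pd

# Toy zip column with one missing value (values chosen only for illustration)
zips = pd.Series([19401.0, 19401.0, 19464.0, np.nan, 19401.0])

# Default: NaN is excluded, so the counts add up to the non-null rows only
top = zips.value_counts().head(2)

# dropna=False gives missing values their own row in the result
with_missing = zips.value_counts(dropna=False)

print(top)
print(with_missing)
```

If the number of calls with no zip code matters for the analysis, dropna=False makes that group visible in the same table.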

What are the top 5 townships (twp) for 911 calls?

In [8]: df['twp'].value_counts().head(5)
Out[8]: LOWER MERION 8443
ABINGTON 5977
NORRISTOWN 5890
UPPER MERION 5227
CHELTENHAM 4575
Name: twp, dtype: int64

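I want to break out the reason for each 911 call into a new column. Each title starts with the reason followed by a colon (for example, 'EMS: BACK PAINS/INJURY'), so splitting on ':' and keeping the first piece gives the reason.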

In [9]: df['Reason'] = df['title'].apply(lambda x: x.split(':')[0])
df['Reason'].head(5)

Out[9]: 0 EMS
1 EMS
2 Fire
3 EMS
4 EMS
Name: Reason, dtype: object
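As an aside, the same split can be written with the pandas string accessor instead of apply with a lambda. This is only a sketch on three titles copied from the head() output above, not a change to the notebook's approach.

```python
import pandas as pd

# Three titles taken from the df.head() output
titles = pd.Series(['EMS: BACK PAINS/INJURY',
                    'Fire: GAS-ODOR/LEAK',
                    'EMS: DIZZINESS'])

# .str.split(':') splits each string; .str[0] keeps the part before the colon
reason = titles.str.split(':').str[0]
print(reason.tolist())
```

The .str methods also pass missing values through as NaN rather than raising, which the lambda version would not do if a title were missing.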

What is the most common Reason for a 911 call based off of this new column?

In [10]: df['Reason'].value_counts().head(3)
Out[10]: EMS 48877
Traffic 35695
Fire 14920
Name: Reason, dtype: int64

Next, I'll plot a count of the 911 calls by Reason.

In [27]: sns.countplot(data=df, x='Reason')

Out[27]: <matplotlib.axes._subplots.AxesSubplot at 0x2ab44bce048>

What is the data type of the objects in the timeStamp column?

In [11]: type(df['timeStamp'][0])

Out[11]: str

I want to break out the timestamp into hour, month, and day of week. The values are strings, so they first have to be converted to Timestamp objects.

In [12]: df['timeStamp'] = pd.to_datetime(df['timeStamp'])
type(df['timeStamp'][0])
Out[12]: pandas.tslib.Timestamp

In [13]: df['Hour'] = df['timeStamp'].apply(lambda x: x.hour)
df['Month'] = df['timeStamp'].apply(lambda x: x.month)
df['Day of Week'] = df['timeStamp'].apply(lambda x: x.dayofweek)
df.head()
Out[13]:
   lat        lng         desc                                               zip      title                    timeStamp            twp                addr                        e  Reason  Hour  Month  Day of Week
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY   2015-12-10 17:40:00  NEW HANOVER        REINDEER CT & DEAD END      1  EMS     17    12     3
1  40.258061  -75.264680  BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...   19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP  BRIAR PATH & WHITEMARSH LN  1  EMS     17    12     3
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK      2015-12-10 17:40:00  NORRISTOWN         HAWS AVE                    1  Fire    17    12     3
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY   2015-12-10 17:40:01  NORRISTOWN         AIRY ST & SWEDE ST          1  EMS     17    12     3
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS           2015-12-10 17:40:01  LOWER POTTSGROVE   CHERRYWOOD CT & DEAD END    1  EMS     17    12     3
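For reference, once a column holds datetime values, the same pieces can also be pulled out with the .dt accessor rather than apply. A minimal sketch on two timestamps copied from the table above:

```python
import pandas as pd

# Two timestamps from the head() output, parsed the same way as in the notebook
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01']))

hour = ts.dt.hour        # hour of day, 0-23
month = ts.dt.month      # month number, 1-12
dow = ts.dt.dayofweek    # Monday=0 ... Sunday=6

print(hour.tolist(), month.tolist(), dow.tolist())
```

Both timestamps fall on a Thursday, which is why the Day of Week column shows 3 for every row in the head of the DataFrame.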

Now I want to convert the numeric day-of-week values into day-name strings.

In [14]: dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)
df.head()

Out[14]:
   lat        lng         desc                                               zip      title                    timeStamp            twp                addr                        e  Reason  Hour  Month  Day of Week
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY   2015-12-10 17:40:00  NEW HANOVER        REINDEER CT & DEAD END      1  EMS     17    12     Thu
1  40.258061  -75.264680  BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...   19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP  BRIAR PATH & WHITEMARSH LN  1  EMS     17    12     Thu
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK      2015-12-10 17:40:00  NORRISTOWN         HAWS AVE                    1  Fire    17    12     Thu
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY   2015-12-10 17:40:01  NORRISTOWN         AIRY ST & SWEDE ST          1  EMS     17    12     Thu
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS           2015-12-10 17:40:01  LOWER POTTSGROVE   CHERRYWOOD CT & DEAD END    1  EMS     17    12     Thu

Now I can plot the type of 911 call by day of the week.

In [15]: import matplotlib.pyplot as plt
sns.countplot(data=df, x='Day of Week', hue='Reason')
# Place the legend outside the axes so it does not cover the bars
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))

Out[15]: <matplotlib.legend.Legend at 0x203261f1ef0>

It's easy to see that EMS has the highest volume of calls regardless of the day of the week, closely followed by Traffic calls. Fire calls are very consistent throughout the week.

Maybe if I group by Month I can see a different trend.

In [16]: sns.countplot(data=df, x='Month', hue='Reason')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))

Out[16]: <matplotlib.legend.Legend at 0x2032658c6a0>

Again, we get roughly the same outcome, although I noticed that some months are missing. A line plot might help fill in this information.

I can group by Month and use the count() method so that each row is a month. Any fully populated column then yields the number of calls for that month (columns with missing values, such as zip, come out slightly lower), and I can draw a line plot.

In [17]: byMonth = df.groupby('Month').count()
byMonth.head()

Out[17]:
         lat    lng   desc    zip  title  timeStamp    twp   addr      e  Reason   Hour  Day of Week
Month
1      13205  13205  13205  11527  13205      13205  13203  13096  13205   13205  13205        13205
2      11467  11467  11467   9930  11467      11467  11465  11396  11467   11467  11467        11467
3      11101  11101  11101   9755  11101      11101  11092  11059  11101   11101  11101        11101
4      11326  11326  11326   9895  11326      11326  11323  11283  11326   11326  11326        11326
5      11423  11423  11423   9946  11423      11423  11420  11378  11423   11423  11423        11423
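The table above shows why the choice of column matters a little: count() reports non-null values per column, so zip, twp, and addr come out lower than lat for the same month. As a small illustration on made-up rows (the toy DataFrame below is an assumption, not part of the dataset), size() counts rows per group regardless of missing values:

```python
import numpy as np
import pandas as pd

# Toy data: month 1 has two calls, one of them without a zip code
toy = pd.DataFrame({
    'Month': [1, 1, 2],
    'zip': [19401.0, np.nan, 19464.0],
    'twp': ['NORRISTOWN', 'NORRISTOWN', 'POTTSTOWN'],
})

per_column = toy.groupby('Month').count()   # non-null values in each column
per_group = toy.groupby('Month').size()     # rows in each group, NaN included

print(per_column)
print(per_group)
```

Using size(), or a column known to be complete such as lat, avoids undercounting calls that happen to have a missing field.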

In [18]: byMonth['addr'].plot()
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0x203265aa400>

This graph tells us that the most calls were made in January, with a spike interrupting the downward trend around July. Notice that the y-axis starts at 8000, so there is still a large number of calls even though the graph suggests a large drop. I'll need to reset the index so that Month becomes a column before creating a linear fit on the number of calls per month. Again, any column will work for "y"; twp is missing only a handful of values, so it is a close approximation of the number of calls.

In [19]: sns.lmplot(data= byMonth.reset_index(), x='Month', y='twp')


Out[19]: <seaborn.axisgrid.FacetGrid at 0x20327ba9da0>

This plot supports what I suggested from the line plot. The trend is downward, with the outlier being the month of July, where we saw a peak in calls.
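To put a rough number on that trend without a plot, a straight line can be fitted with np.polyfit. The sketch below uses only the five monthly counts visible in the byMonth.head() output (the lat column for months 1 to 5), so it is an illustration of the technique and not the fit that lmplot draws over all months.

```python
import numpy as np

# Months 1-5 and their call counts, copied from the 'lat' column of byMonth.head()
months = np.array([1, 2, 3, 4, 5])
calls = np.array([13205, 11467, 11101, 11326, 11423])

# Degree-1 least-squares fit; polyfit returns the slope first, then the intercept
slope, intercept = np.polyfit(months, calls, deg=1)

print(f"calls per month change by about {slope:.1f} each month")
```

A negative slope here agrees with the downward trend in the plot, though five points are far too few to say much about the full year.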

Let's take a look at calls by date.

In [20]: df['Date'] = df['timeStamp'].apply(lambda x: x.date())
df.head()

Out[20]:
   lat        lng         desc                                               zip      title                    timeStamp            twp                addr                        e  Reason  Hour  Month  Day of Week  Date
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY   2015-12-10 17:40:00  NEW HANOVER        REINDEER CT & DEAD END      1  EMS     17    12     Thu          2015-12-10
1  40.258061  -75.264680  BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...   19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP  BRIAR PATH & WHITEMARSH LN  1  EMS     17    12     Thu          2015-12-10
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK      2015-12-10 17:40:00  NORRISTOWN         HAWS AVE                    1  Fire    17    12     Thu          2015-12-10
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY   2015-12-10 17:40:01  NORRISTOWN         AIRY ST & SWEDE ST          1  EMS     17    12     Thu          2015-12-10
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS           2015-12-10 17:40:01  LOWER POTTSGROVE   CHERRYWOOD CT & DEAD END    1  EMS     17    12     Thu          2015-12-10

In [21]: byDate = df.groupby('Date').count()
byDate['lat'].plot()
plt.tight_layout()

Hmmm... I'd rather look at this by the type of incident

In [187]: df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
Out[187]: <matplotlib.text.Text at 0x2ab6b1312e8>

The graph above suggests that more traffic incidents are reported in the winter months, which makes sense given that driving conditions are at their worst.

In [188]: df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
Out[188]: <matplotlib.text.Text at 0x2ab6b1bfb38>

I would have expected more fire calls during the summer months, and although we see a spike around July, we know July to be a high-volume month overall. This graph suggests more incidents in the winter.

In [22]: df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
Out[22]: <matplotlib.text.Text at 0x203275e5518>

EMS seems to be fairly consistent throughout the year, with few spikes.
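The three per-reason series above were built by filtering the DataFrame once per Reason. A possible alternative, sketched here on a few made-up rows, is to group by both Date and Reason and unstack, which gives one column of daily counts per reason in a single table:

```python
import pandas as pd

# Toy stand-in for the Date and Reason columns (rows are made up for illustration)
toy = pd.DataFrame({
    'Date': ['2015-12-10', '2015-12-10', '2015-12-10', '2015-12-11'],
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic'],
})

# One row per date, one column per reason; combinations with no calls become 0
daily = toy.groupby(['Date', 'Reason']).size().unstack(fill_value=0)
print(daily)
```

Calling .plot() on a table shaped like this would draw one line per reason on shared axes, which makes the three series easier to compare than separate figures.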

A heatmap could be useful to determine what time of day most 911 calls were made. I'll
first have to arrange the data frame into a matrix

In [23]: dayHour = df.groupby(['Day of Week','Hour']).count()['Reason'].unstack(level=-1)

In [24]: dayHour.head()
Out[24]:
Hour           0    1    2    3    4    5    6    7    8    9  ...   14   15    16    17   18   19   20   21   22   23
Day of Week
Fri          275  235  191  175  201  194  372  598  742  752  ...  932  980  1039   980  820  696  667  559  514  474
Mon          282  221  201  194  204  267  397  653  819  786  ...  869  913   989   997  885  746  613  497  472  325
Sat          375  301  263  260  224  231  257  391  459  640  ...  789  796   848   757  778  696  628  572  506  467
Sun          383  306  286  268  242  240  300  402  483  620  ...  684  691   663   714  670  655  537  461  415  330
Thu          278  202  233  159  182  203  362  570  777  828  ...  876  969   935  1013  810  698  617  553  424  354

5 rows × 24 columns
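Note that the rows come out in alphabetical order (Fri, Mon, Sat, Sun, Thu) because the day names are plain strings. If calendar order is preferred, one option is to reindex the matrix with an explicit list of days. A small sketch on made-up rows:

```python
import pandas as pd

# Calendar order for the day labels used in dmap
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Toy stand-in for the 'Day of Week' and 'Hour' columns
toy = pd.DataFrame({
    'Day of Week': ['Fri', 'Mon', 'Mon', 'Sun'],
    'Hour': [0, 0, 1, 0],
})

mat = toy.groupby(['Day of Week', 'Hour']).size().unstack(fill_value=0)

# Keep only the days that are present, but in calendar order
mat = mat.reindex([d for d in day_order if d in mat.index])
print(mat)
```

This only affects how a plain heatmap is laid out; a clustermap reorders rows and columns on its own.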

In [219]: sns.heatmap(dayHour,cmap='coolwarm')

Out[219]: <matplotlib.axes._subplots.AxesSubplot at 0x2ab738155f8>

This shows that most calls are made between 7am and 7pm. The highest volume of calls is made around 4 or 5pm on nearly every day of the week. It is notable that the weekend has the lowest volume of calls in general.

Now create a clustermap using this DataFrame.

In [220]: sns.clustermap(dayHour, cmap='coolwarm')

C:\Users\jman0\Anaconda3\lib\site-packages\matplotlib\cbook.py:136: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
Out[220]: <seaborn.matrix.ClusterGrid at 0x2ab7130a1d0>

This cluster map supports what I determined from the last graph. The weekend days are grouped together, showing that they have the lowest volume, and the normal sleeping hours are at the left of the x-axis, showing that 911 calls were less likely to be made then. That all makes sense.

Now I want to manipulate the DataFrame to show the Month as the column.

In [25]: dayMonth = df.groupby(['Day of Week','Month']).count()['Reason'].unstack(level=-1)
dayMonth.head()

Out[25]:
Month           1     2     3     4     5     6     7     8    12
Day of Week
Fri          1970  1581  1525  1958  1730  1649  2045  1310  1065
Mon          1727  1964  1535  1598  1779  1617  1692  1511  1257
Sat          2291  1441  1266  1734  1444  1388  1695  1099   978
Sun          1960  1229  1102  1488  1424  1333  1672  1021   907
Thu          1584  1596  1900  1601  1590  2065  1646  1230  1266
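A day-by-month table like this can also be built with pd.crosstab, which counts co-occurrences of two columns directly. The sketch below compares the two approaches on a made-up three-row frame; the main visible difference is that crosstab fills empty combinations with 0, while the groupby/unstack route leaves them as NaN.

```python
import pandas as pd

# Toy stand-in for the columns used above (rows are made up for illustration)
toy = pd.DataFrame({
    'Day of Week': ['Sat', 'Sat', 'Mon'],
    'Month': [1, 1, 12],
    'Reason': ['EMS', 'Fire', 'EMS'],
})

# Same pattern as the notebook cell
via_groupby = toy.groupby(['Day of Week', 'Month']).count()['Reason'].unstack(level=-1)

# Frequency table of Day of Week against Month
via_crosstab = pd.crosstab(toy['Day of Week'], toy['Month'])

print(via_groupby)
print(via_crosstab)
```

Either table can be passed to a heatmap; the zero-filled version avoids blank cells for combinations that never occur.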

In [223]: sns.heatmap(dayMonth, cmap='coolwarm')


Out[223]: <matplotlib.axes._subplots.AxesSubplot at 0x2ab70c90320>

In [224]: sns.clustermap(dayMonth, cmap='coolwarm')

C:\Users\jman0\Anaconda3\lib\site-packages\matplotlib\cbook.py:136: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
Out[224]: <seaborn.matrix.ClusterGrid at 0x2ab7370dd30>

This is interesting. Saturdays in January had the highest volume of calls, despite the weekend having the lowest number of 911 calls overall. We can also surmise that the summer months had the lowest volume of calls.
