Professional Documents
Culture Documents
and Cloudera
Josh Wills, Senior Director of Data Science
Cloudera
One Definition
versus Another
What I Think I Do
What I Actually Do
10
11
12
13
14
Web index
Recommendation
systems
Sensor data
Market basket analysis
Online advertising
Enter Hadoop
15
16
Map Stage
Embarrassingly parallel
Like a DATA Step
Like PROC SORT
Reduce Stage
Process all of the values that have the same key in a single
step
Like PROC MEANS with a BY statement
17
Apache Hive
SQL-based query
language
18
19
20
21
22
Going Supernova
23
24
Cloudera Impala
25
SAS LASR
26
27
28
Iterative Algorithms
29
30
31
32
33
34
K-Means Clustering
35
36
K-Means++
37
38
39
40
41
42
Operational Analytics
43
44
Question-driven
Interactive
Ad-hoc, post-hoc
Fixed data
Output is embedded into a
report or in-database
scoring engine
Operational Analytics
Metric-driven
Automated
Systematic
Fluid data
Output is a production
system that makes
customer-facing decisions
45
Thank you!
@josh_wills