
Hadoop

Solution Design & Target Architecture for Hadoop as an Enterprise-Wide Shared Service
What data goes into Hadoop?
New data types
• Documents
• Email
• Voice to Text
• Web Logs

Content development: Jürgen Urbanski

How is Hadoop consumed?


SQL Access through Hive
1. Excel
2. Established BI and EDW
   • Teradata
   • SAP BusinessObjects
   • SAP HANA
   • Tableau
   • MicroStrategy
   • Pentaho
3. Emerging BI for Hadoop
   • Datameer
   • Platfora

Analytic App Accessing Resources through YARN
Apps are certified on YARN as the Hadoop Operating System, which grants direct access to HDFS, MapReduce, etc.

Data Science and Machine Learning Applications
Apps are certified on YARN as the Hadoop Operating System or directly generate MapReduce code.

Listed applications:
• SAS
• Mahout
• Some new custom applications
   • For unstructured data
   • Sometimes using Pig
• R statistical libraries
• Spark
• Custom applications, notably for structured data
• Click Streams

What is Hadoop?

Data Integration & Governance

Data Access
• Social Networks

Batch: MapReduce
Script: Pig
SQL: Hive
Online DB: HBase, Accumulo
Real-Time Streams: Storm, Spark Streaming
In-Memory: Spark
Search: Solr, Elasticsearch
Graph: Giraph, GraphX
Others: Apache Drill

• Sensors

Data Workflow
Data Lifecycle
Falcon

Impala

Flume
Sqoop
WebHDFS
NFS

Metadata Management (HCatalog)

• Machine Generated

Operations

Authentication, Authorization, Accountability, Data Protection across:
Storage: HDFS
Data Protection: Snapshots, Disaster Recovery/Business Continuity
Resources: YARN
Access: Hive, Drill, Spark SQL, Impala
Pipeline: Falcon
Cluster: Knox

Real-time and Batch Ingest

Spark SQL

Security

Provision, Manage & Monitor
Ambari
Microsoft System Center and other third-party tools

Scheduling
Oozie

Data Management
Multitenant Processing: YARN (Hadoop Operating System)
Storage: HDFS (Hadoop Distributed File System)

• Geolocation

Existing data types
• ERP
• CRM
• SCM etc.

Where does Hadoop run?
Environment: Linux, Windows
Deployment Model: On Premise, Appliance, Virtualize, Commodity HW, Cloud/Hosted

How much does Hadoop cost? (1)
Fully Loaded Cost per Raw TB Deployed of Hadoop versus Alternatives (US$ 000s, Min to Max)
Cloud Storage: 0.1 to 0.3
Other categories charted: Hadoop, NAS, Engineered System (2), EDW / MPP, SAN
Other ranges charted: 0.250 to 1, 10 to 20, 12 to 18, 20 to 80, 36 to 180
(1) Hardware, software, installation
(2) E.g., Oracle Exadata
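To make the chart's unit concrete (US$ 000s per raw TB deployed, given as a minimum and maximum), the following minimal Python sketch converts a per-TB range into a total cost range for a given capacity. The function name and the 500 TB capacity are hypothetical; the only chart value used is the Cloud Storage range of 0.1 to 0.3.

```python
from typing import Tuple


def total_cost_range_usd(
    raw_tb: float, min_k_per_tb: float, max_k_per_tb: float
) -> Tuple[float, float]:
    """Return (min, max) total cost in US$ for raw_tb terabytes.

    The per-TB inputs are in thousands of US$, matching the chart's unit.
    """
    if raw_tb < 0:
        raise ValueError("raw_tb must be non-negative")
    if min_k_per_tb > max_k_per_tb:
        raise ValueError("min_k_per_tb must not exceed max_k_per_tb")
    return raw_tb * min_k_per_tb * 1000, raw_tb * max_k_per_tb * 1000


if __name__ == "__main__":
    # Hypothetical capacity of 500 raw TB at the Cloud Storage range.
    low, high = total_cost_range_usd(500, 0.1, 0.3)
    print(f"US$ {low:,.0f} to US$ {high:,.0f}")
```

The same call works for any other category once its range is read off the chart.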
