
Cisco Confidential 1 2010 Cisco and/or its affiliates. All rights reserved.

Cisco Data Lake


March 3, 2014
Data Lake Definition
Current Hadoop Landscape
Why to Build Data Lake
Benefits
Data Lake Design



Data Lake - a place to store practically unlimited amounts of data of any format, schema and
type that is relatively inexpensive and massively scalable. Data processing software like
Hadoop can transform the data from its raw state to a finished product.

--Revelytix

If you think of a datamart as a store of bottled water, cleansed and packaged and structured
for easy consumption, the data lake is a large body of water in a more natural state. The
contents of the data lake stream in from a source to fill the lake, and various users of the lake
can come to examine, dive in, or take samples.

--Pentaho
The difference between a data lake and a data warehouse is that in a data warehouse, the
data is pre-categorized at the point of entry, which can dictate how it's going to be analyzed.

--Forbes
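
The contrast drawn in these definitions (data categorized at entry versus data kept raw and shaped later) can be illustrated with a minimal Python sketch. It assumes raw events are kept as JSON lines exactly as they arrived and that each consumer applies its own schema only when reading. The record fields and the names RAW_EVENTS and read_with_schema are hypothetical and do not come from the deck.

```python
import json

# Hypothetical raw records, stored as they arrived (no schema enforced at write time).
RAW_EVENTS = [
    '{"source": "web", "user": "u1", "page": "/home", "ms": 120}',
    '{"source": "case", "case_id": 42, "severity": "high"}',
    '{"source": "web", "user": "u2", "page": "/docs"}',
]


def read_with_schema(raw_lines, source, fields):
    """Schema-on-read: select one source and project only the fields a consumer needs.

    Missing fields become None instead of causing the record to be rejected.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") == source:
            rows.append({name: record.get(name) for name in fields})
    return rows


# Two consumers read the same raw data with different schemas.
clickstream = read_with_schema(RAW_EVENTS, "web", ["user", "page", "ms"])
cases = read_with_schema(RAW_EVENTS, "case", ["case_id", "severity"])
```

In a warehouse-style design the third record would typically be rejected or padded at load time; here the decision is deferred to whoever reads it.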
Current Hadoop Landscape

[Diagram with three columns: Data Sources, Hadoop Platform, Data Consumption]
Data Sources: Databases (ERP, SFDC, Database N); Unstructured Data (Docs, Cases, Content; IoE, Machine Data, Clickstream)
Hadoop Platform: separate project areas for CPAI, CSTG, Marketing and the Data Science Program, each holding its own copy of data sets such as IB, Contracts, Cases, Hierarchies, Customer, Bookings, Network Logs, Collab and Cisco.com logs
Data Consumption: Service Renewal Opportunities; Marketing Campaigns; Customer Network Config, Product Quality; Data Science Program Exercises

Every project team spends resources bringing in its own data
Difficult to track which data elements are available in the platform
Redundant use of platform resources for data acquisition and maintenance
Data quality and reliability issues
Project teams develop their data acquisition flows manually

Data Lake

[Diagram with three columns: Data Sources, Hadoop Platform, Data Consumption]
Data Sources: Databases (ERP, SFDC, Database N); Unstructured Data (Docs, Cases, Content; IoE, Machine Data, Clickstream)
Hadoop Platform: Data Lake (EDS) holding IB, Contracts, Cases, Hierarchies, Bookings, Customers, Supply Chain, etc., plus Network Logs, Cisco.com logs, Documents, etc.; project areas CPAI, Marketing, Data Science, CSTG
Data Consumption: Service Renewal Opportunities; Marketing Campaigns; Customer Network Config, Product Quality; Data Science Program Exercises
Data reuse: data is brought in once and consumed by multiple projects
Data stored in raw format can be used by a variety of apps and tools
Automated framework: can be quickly configured to get data from any source
Better resource utilization: frees resources in the source systems and the Hadoop platform
Quicker project delivery
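
The first and third benefits can be sketched in Python as a configuration-driven loader: adding a source means adding a configuration entry, and a data set is fetched from its source system once and then reused by every project. This is a minimal sketch under stated assumptions. The class names SourceConfig and DataLake, the data set names, and the pairing of data sets with ERP and SFDC are hypothetical; the deck does not describe the actual framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SourceConfig:
    """Hypothetical description of one source feed."""
    name: str                          # data set name in the lake
    system: str                        # originating system
    fetch: Callable[[], List[dict]]    # returns the raw records


@dataclass
class DataLake:
    """In-memory stand-in for the lake, with a record of how often each source was hit."""
    datasets: Dict[str, List[dict]] = field(default_factory=dict)
    fetch_count: Dict[str, int] = field(default_factory=dict)

    def ingest(self, cfg: SourceConfig) -> List[dict]:
        """Load a source once; later requests reuse the stored copy."""
        if cfg.name not in self.datasets:
            self.datasets[cfg.name] = cfg.fetch()
            self.fetch_count[cfg.name] = self.fetch_count.get(cfg.name, 0) + 1
        return self.datasets[cfg.name]

    def available(self) -> List[str]:
        """List the data sets currently held, so teams can see what already exists."""
        return sorted(self.datasets)


# Assumed example feeds (illustrative only).
bookings_cfg = SourceConfig("bookings", "ERP", lambda: [{"order": 1, "amount": 250.0}])
cases_cfg = SourceConfig("cases", "SFDC", lambda: [{"case_id": 42}])

lake = DataLake()
for cfg in (bookings_cfg, cases_cfg):
    lake.ingest(cfg)

# Two projects ask for the same data set; the source system is queried only once.
marketing_view = lake.ingest(bookings_cfg)
data_science_view = lake.ingest(bookings_cfg)
```

The fetch_count field makes the resource saving visible: however many projects consume bookings, the count for that source stays at one.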


High Level Data Lake Architecture

[Diagram with two columns: Data Sources, Hadoop Platform]
Data Sources: Databases (ERP, SFDC, Database N); Unstructured Data (Docs, Cases, Content; IoE, Machine Data, Clickstream)
Hadoop Platform: Data Lake (EDS) holding IB, Contracts, Cases, Hierarchies, Bookings, Customers, Supply Chain, etc., plus Network Logs, Cisco.com logs, Documents, etc.; consumers CPAI, Marketing, Data Science, ETL Offload
Loading components: Tidal; Data Lake Load Process; Hadoop Edge Node; Data Lake Metadata (TD)
Data Lake Population and Consumption

[Diagram: HADOOP]
Sources: Any Source; Structured Sources; Unstructured Sources (Docs, Cases, Content; IoE, Machine Data, Clickstream)
Data Lake (Source Like Structure)
Transformed Layer: ETL Offload (3NF Model)
Other labels: T, L, F1, F2, F3, F4, F5, F6, SOR, CG1, TD

What data model to use?
Data Lake: Source Like Structure
Processed Data: 3NF Model

What are the sources to the Data Lake?
Any structured or unstructured data source

Do we build a transformed layer?
Yes

SSOT to be computed in one place, consumed by many platforms
Functional Areas can consume from the Data Lake, but are not allowed to share with other Functional Areas
EDS governs the Data Lake and the Transformed Layer
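
The consumption rule on this slide (functional areas read from the centrally governed layers but do not share data with each other) can be expressed as a small access check. This is a sketch only: the layer identifiers, the function name can_read, and the assumption that the Transformed Layer is readable by every functional area are illustrative and are not stated in the deck, which only labels the areas F1 to F6.

```python
# Hypothetical identifiers for the centrally governed layers (governed by EDS in the deck).
SHARED_LAYERS = {"data_lake", "transformed_layer"}

# The deck's diagram labels six areas F1..F6; their real names are not given.
FUNCTIONAL_AREAS = {"F1", "F2", "F3", "F4", "F5", "F6"}


def can_read(consumer: str, owner: str) -> bool:
    """Return True if functional area `consumer` may read data owned by `owner`."""
    if consumer not in FUNCTIONAL_AREAS:
        raise ValueError(f"unknown functional area: {consumer}")
    if owner in SHARED_LAYERS:
        return True           # every area may consume from the shared layers
    return owner == consumer  # an area's own data is visible only to that area
```

For example, can_read("F1", "data_lake") is allowed while can_read("F1", "F2") is not, which keeps the shared layers as the only place where data is exchanged.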
Thank you.
