Overview
Martin Pavlík
+420 731 435 691
martin_pavlik@cz.ibm.com
Billing
ERP
CRM
Network Switches
RFID
© 2012 IBM Corporation
BIG DATA is not just HADOOP
Understand and navigate federated big data sources
Federated Discovery and Navigation
No copying of data
Value statement
Get up and running quickly
Solution
Vivisimo Velocity renamed to
IBM InfoSphere DataDiscovery
Value statement
Gain new insight
Solution
IBM InfoSphere BigInsights
Business Users: Determine what question to ask
IT: Structures the data to answer that question
IT: Delivers a platform to enable creative discovery
Business: Explores what questions could be asked
HDFS
Map/Reduce
management
Integration
Connectivity to Netezza, DB2, JDBC databases, etc.
Breadth of capabilities
Spreadsheet-style Analysis
Web-based analysis and visualization
Spreadsheet-like interface
Define and manage long-running data collection jobs
Ad-Hoc analysis
Text analytics (BigInsights Text Analytics)
Machine learning (SystemML)
Statistical analysis (R module)
Ad-hoc analysis (BigSheets)
Integration (DB2, Netezza, Streams)
Jaql
Integration point for various data sources
Jaql Core Operators
Jaql Modules
Jaql I/O
Relational sources (warehouses, operational databases)
BigInsights and the data warehouse
Traditional analytic tools: Data warehouse
Big Data analytic applications: BigInsights
Value statement
Speed: 10-100x better performance
Simplicity: Administration costs reduced by 75% - 90%
Scalability
Smart system
In-database analytics
Out-of-the box integration with SPSS
Solution
IBM Netezza renamed to
PureData System for Analytics
Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost
Proof-Of-Concept Project
New Enterprise Data Warehouse platform selection
Comparison of existing and other platforms
Selection Criteria
Performance
Operational Savings
Operational Benefits
Storage savings (no data replicas)
Administration cost reduction (DBA)
Infrastructure Simplification
Lower environment complexity
Workflow Reporting: 2 hours (original platform), 1 minute (Netezza)
Value statement
Leverage the architecture of parallel processing in Hadoop
Solution
IBM InfoSphere BigInsights
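The parallel processing model that Hadoop provides can be illustrated with a small, self-contained sketch. This is not BigInsights or Hadoop code: it is plain Python that mimics the map, shuffle, and reduce phases on one machine, with threads standing in for cluster nodes. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk is mapped independently of the others, which is what
    # lets a real cluster spread the work over many nodes.
    # Threads stand in for nodes in this single-machine sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is not just hadoop"]))
```

Because the map step has no shared state, adding more workers (or, in a real cluster, more nodes) scales the work without changing the logic.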
Value statement
React in real time to take an opportunity
before it expires
Solution
IBM InfoSphere Streams
Volume / rate very high => scalability required: multiple processing nodes
Store & mine approach doesn't work because of the very high volume of data (and its rates)
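The contrast with store-and-mine can be shown with a minimal sketch of in-flight analysis. This is plain Python, not InfoSphere Streams code. It assumes a hypothetical `detect_spikes` function that inspects each reading as it arrives and keeps only a small fixed-size window in memory, so the full stream is never stored.

```python
from collections import deque


def detect_spikes(stream, size=3, threshold=2.0):
    """Yield readings larger than `threshold` times the average of the
    previous `size` readings. Only the window is kept in memory."""
    window = deque(maxlen=size)
    total = 0.0
    for value in stream:
        # Compare the new reading with the average of the recent window.
        if window and value > threshold * (total / len(window)):
            yield value
        # Slide the window: drop the oldest reading once it is full.
        if len(window) == size:
            total -= window[0]
        window.append(value)
        total += value


if __name__ == "__main__":
    readings = iter([10, 11, 9, 30, 10, 12, 50])
    for spike in detect_spikes(readings):
        print("spike:", spike)
```

Since `detect_spikes` is a generator over any iterable, it works the same on an unbounded source; memory use stays constant regardless of how much data flows through.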
InfoSphere Streams: Data ingest, preparation, online analysis, model validation
InfoSphere BigInsights, Database & Warehouse: Data integration, data mining, machine learning, statistical modeling
Flows between the two: Data, Control flow
1. Data Ingest
2. Bootstrap/Enrich
3. Adaptive Analytics Model
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform
Visualization & Discovery, Application Development, Systems Management
Accelerators
Hadoop System, Stream Computing, Data Warehouse
Information Integration & Governance
BENEFITS IN DETAIL
Increase over time: By moving from entry to a 2nd and 3rd project
Lowering deployment costs: Shared components, Integration
Points of leverage:
Shared text analytics for Streams and BigInsights
HDFS connectors (data integration (ETL, ...), Streams)
Accelerators build across multiple engines
THINK