
Big Data Architectures

Beyond the Elephant Ride


June 29, 2012, 10:00am PT / 1:00pm ET

2012 Impetus Technologies

Recorded version available at

http://www.impetus.com/webinar_registration?event=archived&eid=60

Outline
The Hadoop ecosystem and challenges
Big Data solutions beyond Hadoop
How and where to use them?
Some use cases
Big Data Architecture Strategy

Disclaimer
Not advising you to discard Hadoop
Will discuss Big Data technologies that complement and supplement Hadoop


What is Hadoop?
Scalable data processing engine

DFS: Scalable, fault-tolerant distributed filesystem
MapReduce: Parallel processing programming model
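To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and the shuffle step that Hadoop performs across the cluster is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (the framework's job in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big ideas", "data beyond Hadoop"])
```

In a real cluster the map and reduce calls run in parallel on different nodes; only the two user-supplied functions carry over from this sketch.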


Hadoop Ecosystem


Where to use Hadoop?


Risk Analysis
Intrusion detection, credit scoring

Recommendation
"Customers who purchased this also liked"


Where to use Hadoop?


Sentiment Analysis

Positive, Negative or Neutral sentiment in sentences

Targeted Ads
Display ads based on user behavior and preferences


Where to use Hadoop?


Machine Learning
Spam vs. Not Spam

And a lot of other areas


Challenges with Hadoop


Limitations
Data security
Dependence on OS/Language
MapReduce programming
Batch processing only

Beyond Hadoop


Faster Hadoop
MapR

Simple to manage (NFS)

MapR Express Lane


Handles real-time data flows

Dominant players: Hortonworks, Cloudera, Hadapt



Transactional Systems
E-commerce websites
ATMs
Traditional solutions: MySQL, Oracle, MSSQL
Go NewSQL: VoltDB, Clustrix

Explaining VoltDB


Real Time Computation


Continuous computation - Trending topics

Stream processing - Twitter Firehose

Try Storm, Esper, S4, CloudScale!


Explaining Storm
Spouts: Data sources
Bolts: Data processors
Topologies: Combinations of spouts and bolts
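The spout, bolt and topology roles can be sketched with plain Python generators. This is only a conceptual illustration, not Storm's API: the names `tweet_spout`, `hashtag_bolt`, `count_bolt` and `run_topology` are hypothetical, and everything runs in one process, whereas Storm distributes spouts and bolts across a cluster.

```python
from collections import Counter


def tweet_spout(tweets):
    """Spout: the data source; emits one tweet at a time."""
    for tweet in tweets:
        yield tweet


def hashtag_bolt(stream):
    """Bolt: extracts lower-cased hashtags from each incoming tweet."""
    for tweet in stream:
        for token in tweet.split():
            if token.startswith("#"):
                yield token.lower()


def count_bolt(stream):
    """Bolt: keeps a running count and emits the updated total per hashtag."""
    counts = Counter()
    for tag in stream:
        counts[tag] += 1
        yield tag, counts[tag]


def run_topology(tweets):
    """Topology: wires the spout into the chain of bolts."""
    latest = {}
    for tag, count in count_bolt(hashtag_bolt(tweet_spout(tweets))):
        latest[tag] = count
    return latest


trending = run_topology(["Loving #BigData", "#bigdata beyond #Hadoop"])
```

Because each stage consumes its input lazily, counts are updated as every tweet arrives, which is the continuous-computation pattern behind a trending-topics feature.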


Real-time Traffic


Doing it Right


Graph Computation
PageRank
Shortest path
Friends of my friends' friends
We suggest Giraph, Pregel
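As a rough illustration of the vertex-centric style that Pregel and Giraph use, the sketch below computes PageRank in repeated rounds ("supersteps"): each vertex sends its rank, split evenly, along its outgoing links, then recomputes its rank from what it received. It is a single-machine approximation, not either system's API. The function name, the damping factor of 0.85, the number of supersteps and the sample graph are all assumptions, and the code assumes every vertex has at least one outgoing link.

```python
def pagerank(links, damping=0.85, supersteps=30):
    """Iterative PageRank over a dict of vertex -> list of linked vertices.

    Assumes every vertex has at least one outgoing link and that every
    link target is itself a key of `links`.
    """
    n = len(links)
    rank = {vertex: 1.0 / n for vertex in links}
    for _ in range(supersteps):
        # "Message passing": each vertex shares its rank with its neighbours.
        inbox = {vertex: 0.0 for vertex in links}
        for vertex, outgoing in links.items():
            share = rank[vertex] / len(outgoing)
            for dest in outgoing:
                inbox[dest] += share
        # Each vertex updates its own value from the messages it received.
        rank = {
            vertex: (1 - damping) / n + damping * inbox[vertex]
            for vertex in links
        }
    return rank


# Hypothetical three-page web graph.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In a distributed graph engine the inner loops run in parallel, one per vertex partition, with the framework delivering the messages between supersteps.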

LinkedIn
Degrees of Separation
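Degrees of separation is a shortest-path question on the connection graph. A minimal single-machine sketch using breadth-first search is shown here; the function name and the sample network are hypothetical, and a production system at this scale would distribute the traversal with a graph engine such as Giraph rather than run it in one process.

```python
from collections import deque


def degrees_of_separation(graph, source, target):
    """Return the number of hops between two people, or None if unconnected."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return distance + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, distance + 1))
    return None


# Hypothetical connection graph: ann - bob - cat - dan, with eve isolated.
network = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],
}
hops = degrees_of_separation(network, "ann", "dan")
```

Breadth-first search visits people in order of distance, so the first time the target is reached the hop count is the minimum.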


Fast Key-Value Access


Show the latest items listing on your homepage

Caching
Explore NoSQL - Redis and Riak!


How does it work?
Latest comments posted by user
Traditional Approach: Query at Runtime
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10

Redis Live Cache Approach

FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start+num_items-1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
    END
    RETURN id_list
END
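The pseudocode above can be turned into a runnable sketch under a few stated assumptions. To stay self-contained it does not talk to a real Redis server: `FakeRedis` is a hypothetical in-memory stand-in that mimics only the inclusive-range behaviour of Redis `LRANGE` for non-negative indexes, and the SQL fallback uses an in-memory SQLite table named `comments` whose schema is invented for illustration.

```python
import sqlite3


class FakeRedis:
    """Hypothetical stand-in for a Redis client; supports LRANGE only."""

    def __init__(self):
        self.lists = {}

    def lrange(self, key, start, stop):
        # Like Redis LRANGE, `stop` is inclusive (non-negative indexes only).
        return self.lists.get(key, [])[start:stop + 1]


def get_latest_comments(cache, db, start, num_items):
    """Serve comment ids from the cache; fall back to SQL on a short read."""
    ids = cache.lrange("latest.comments", start, start + num_items - 1)
    if len(ids) < num_items:
        rows = db.execute(
            "SELECT id FROM comments ORDER BY time DESC LIMIT ? OFFSET ?",
            (num_items, start),
        ).fetchall()
        ids = [row[0] for row in rows]
    return ids


# Demo data: five comments in SQL, only the newest three ids cached.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, time INTEGER)")
db.executemany(
    "INSERT INTO comments VALUES (?, ?)", [(i, i * 10) for i in range(1, 6)]
)
cache = FakeRedis()
cache.lists["latest.comments"] = [5, 4, 3]

from_cache = get_latest_comments(cache, db, 0, 2)  # fully served by the cache
from_db = get_latest_comments(cache, db, 2, 3)     # cache too short, uses SQL
```

The common case (the first page of recent items) never touches the database; only requests that reach past the cached window pay for the SQL query.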

Good to Know


Recap
Already invested in Hadoop; explore Faster Hadoop: Hortonworks, Cloudera, MapR, Hadapt
Alternatives to Hadoop: HPCC, Disco
Complex business queries, online transaction processing: New Gen SQL (VoltDB, Clustrix, Hadapt)
Real Time Analytics: CloudScale, Storm, Esper
Fast Key-Value Access: Redis, Riak


Bleeding Edge
A peek into the future...
High performance supercomputing: Open MPI, BSP
Highly efficient, large scale graph computing: Pregel, Giraph
Low latency queries over very large data sets: Dremel
Incremental updates on massive datasets: Percolator (Caffeine)


Architecture Strategy


Recommendations
Think Beyond the Warehouse

Time for Real-time

Not Only Hadoop


Hadoop is an enabler for better data warehouse solutions, not a replacement


Recommendations
Back To SQL?
SQL is not bad
Hadoop and SQL complement each other

Integrations & Visualizations

Realtime


Recommendations
Big Data in upstream operational systems

Forecasting Systems
Supply Chains
CRMs


Our Architecture Strategy


About Impetus


Strategic partners for software product engineering and R&D

Thought leaders in cutting-edge technologies

Mature processes and practices that are methodical, yet flexible

Diverse domain expertise



Q&A


Thank You!
For more information, write to us at bigdata@impetus.com
Follow us on Twitter @impetustech

Legal
2012 Impetus Technologies. All rights reserved. You are prohibited from copying, modifying, redistributing, rebroadcasting, or re-encoding this content without the prior written consent of Impetus Technologies. Hadoop is a trademark of the Apache Software Foundation.

The logos for Hadoop, Mahout, Hive, HBase, Pig, Whirr and ZooKeeper are also trademarks of the Apache Software Foundation.

