What's Inside
Introduction to big data systems
Real-time processing of web-scale data
Tools like Hadoop, Cassandra, and Storm
Extensions to traditional database skills
SUMMARY
Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware
along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand
approach to big data systems that can be built and run by a small team. Following a realistic example, this
book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy
and operate them once they're built.
/dtechpress
/dreamtechpress
dreamtechpress.wordpress.com
TABLE OF CONTENTS
1 A new paradigm for Big Data
1.1 How this book is structured
1.2 Scaling with a traditional database
1.3 NoSQL is not a panacea
1.4 First principles
1.5 Desired properties of a Big Data system
1.6 The problems with fully incremental architectures
1.7 Lambda Architecture
1.8 Recent trends in technology
1.9 Example application: SuperWebAnalytics.com
1.10 Summary
PART 1 BATCH LAYER
2 Data model for Big Data
2.1 The properties of data
2.2 The fact-based model for representing data
2.3 Graph schemas
2.4 A complete data model for SuperWebAnalytics.com
2.5 Summary
3 Data model for Big Data: Illustration
3.1 Why a serialization framework?
3.2 Apache Thrift
3.3 Limitations of serialization frameworks
3.4 Summary
4 Data storage on the batch layer
4.1 Storage requirements for the master dataset
4.2 Choosing a storage solution for the batch layer
4.3 How distributed filesystems work
4.4 Storing a master dataset with a distributed filesystem
4.5 Vertical partitioning
4.6 Low-level nature of distributed filesystems
4.7 Storing the SuperWebAnalytics.com master dataset on a distributed filesystem
4.8 Summary
5 Data storage on the batch layer: Illustration
5.1 Using the Hadoop Distributed File System
5.2 Data storage in the batch layer with Pail
5.3 Storing the master dataset for SuperWebAnalytics.com
5.4 Summary
6 Batch layer
6.1 Motivating examples
6.2 Computing on the batch layer
6.3
Published by:
DREAMTECH PRESS
19-A, Ansari Road, Daryaganj
New Delhi-110 002, INDIA
Tel: +91-11-2324 3463-73, Fax: +91-11-2324 3078
Email: feedback@dreamtechpress.com
Website: www.dreamtechpress.com
Distributed by:
Regional Offices: Bangalore: Tel: +91-80-2313 2383, Fax: +91-80-2312 4319, Email: blrsales@wiley.com
Mumbai: Tel: +91-22-2788 9263, 2788 9272, Telefax: +91-22-2788 9263, Email: mumsales@wiley.com