BIG DATA

ABOUT THE BOOK


Web-scale applications like social networks, real-time
analytics, or e-commerce sites deal with a lot of data,
whose volume and velocity exceed the limits of
traditional database systems. These applications
require architectures built around clusters of
machines to store and process data of any size or
speed. Fortunately, scale and simplicity are not
mutually exclusive.
This book requires no previous exposure to large-scale
data analysis or NoSQL tools. Familiarity with
traditional databases is helpful.

What's Inside
Introduction to big data systems
Real-time processing of web-scale data
Tools like Hadoop, Cassandra, and Storm
Extensions to traditional database skills

₹ 599 / ISBN: 9789351198062 / Pages: 328

Authors: Nathan Marz with James Warren

SUMMARY
Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware
along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand
approach to big data systems that can be built and run by a small team. Following a realistic example, this
book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy
and operate them once they're built.

ABOUT THE AUTHORS


Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture
for big data systems. James Warren is an analytics architect with a background in machine
learning and scientific computing.
/dtechpress

/dtechpress

/dreamtechpress

dreamtechpress.wordpress.com

TABLE OF CONTENTS
1 A new paradigm for Big Data
1.1 How this book is structured
1.2 Scaling with a traditional database
1.3 NoSQL is not a panacea
1.4 First principles
1.5 Desired properties of a Big Data system
1.6 The problems with fully incremental
architectures
1.7 Lambda Architecture
1.8 Recent trends in technology
1.9 Example application:
SuperWebAnalytics.com
1.10 Summary
PART 1 BATCH LAYER
2 Data model for Big Data
2.1 The properties of data
2.2 The fact-based model for representing
data
2.3 Graph schemas
2.4 A complete data model for
SuperWebAnalytics.com
2.5 Summary
3 Data model for Big Data: Illustration
3.1 Why a serialization framework?
3.2 Apache Thrift
3.3 Limitations of serialization frameworks
3.4 Summary
4 Data storage on the batch layer
4.1 Storage requirements for the master
dataset
4.2 Choosing a storage solution for the batch
layer
4.3 How distributed filesystems work
4.4 Storing a master dataset with a
distributed filesystem
4.5 Vertical partitioning
4.6 Low-level nature of distributed
filesystems
4.7 Storing the SuperWebAnalytics.com
master dataset on a distributed
filesystem
4.8 Summary
5 Data storage on the batch layer: Illustration
5.1 Using the Hadoop Distributed File System
5.2 Data storage in the batch layer with Pail
5.3 Storing the master dataset for
SuperWebAnalytics.com
5.4 Summary
6 Batch layer
6.1 Motivating examples
6.2 Computing on the batch layer
6.3 Recomputation algorithms vs.
incremental algorithms
6.4 Scalability in the batch layer
6.5 MapReduce: a paradigm for Big Data
computing
6.6 Low-level nature of MapReduce
6.7 Pipe diagrams: a higher-level way of
thinking about batch computation
6.8 Summary
7 Batch layer: Illustration
7.1 An illustrative example
7.2 Common pitfalls of data-processing tools
7.4 Composition
7.5 Summary
8 An example batch layer: Architecture and
algorithms
8.1 Design of the SuperWebAnalytics.com
batch layer
8.2 Workflow overview
8.3 Ingesting new data
8.4 URL normalization
8.5 User-identifier normalization
8.6 Deduplicate pageviews
8.7 Computing batch views
8.8 Summary
9 An example batch layer: Implementation
9.1 Starting point
9.2 Preparing the workflow
9.3 Ingesting new data
9.4 URL normalization
9.5 User-identifier normalization
9.6 Deduplicate pageviews
9.7 Computing batch views
9.8 Summary
PART 2 SERVING LAYER
10 Serving layer
10.1 Performance metrics for the serving layer
10.2 The serving layer solution to the
normalization/denormalization problem
10.3 Requirements for a serving layer
database
10.4 Designing a serving layer for
SuperWebAnalytics.com
10.5 Contrasting with a fully incremental
solution
10.6 Summary
11 Serving layer: Illustration
11.1 Basics of ElephantDB
11.2 Building the serving layer for
SuperWebAnalytics.com
11.3 Summary
PART 3 SPEED LAYER
12 Realtime views
12.1 Computing realtime views
12.2 Storing realtime views
12.3 Challenges of incremental computation
12.4 Asynchronous versus synchronous
updates
12.5 Expiring realtime views
12.6 Summary
13 Realtime views: Illustration
13.1 Cassandra's data model
13.2 Using Cassandra
13.3 Summary
14 Queuing and stream processing
14.1 Queuing
14.2 Stream processing
14.3 Higher-level, one-at-a-time stream
processing
14.4 SuperWebAnalytics.com speed layer
14.5 Summary
15 Queuing and stream processing: Illustration
15.1 Defining topologies with Apache Storm
15.2 Apache Storm clusters and deployment
15.3 Guaranteeing message processing
15.4 Implementing the
SuperWebAnalytics.com
uniques-over-time speed layer
15.5 Summary
16 Micro-batch stream processing
16.1 Achieving exactly-once semantics
16.2 Core concepts of micro-batch stream
processing
16.3 Extending pipe diagrams for micro-batch
processing
16.4 Finishing the speed layer for
SuperWebAnalytics.com
16.5 Another look at the bounce-rate-analysis
example
16.6 Summary
17 Micro-batch stream processing: Illustration
17.1 Using Trident
17.2 Finishing the SuperWebAnalytics.com
speed layer
17.3 Fully fault-tolerant, in-memory, micro-batch
processing
17.4 Summary
18 Lambda Architecture in depth
18.1 Defining data systems
18.2 Batch and serving layers
18.3 Speed layer
18.4 Query layer
18.5 Summary

Published by:
DREAMTECH PRESS
19-A, Ansari Road, Daryaganj
New Delhi-110 002, INDIA
Tel: +91-11-2324 3463-73, Fax: +91-11-2324 3078
Email: feedback@dreamtechpress.com
Website: www.dreamtechpress.com

Distributed by:
WILEY INDIA PVT. LTD.
4435-36/7, Ansari Road, Daryaganj
New Delhi-110 002, INDIA
Tel: +91-11-4363 0000, Fax: +91-11-2327 5895
Email: csupport@wiley.com
Website: www.wileyindia.com

Regional Offices: Bangalore: Tel: +91-80-2313 2383, Fax: +91-80-2312 4319, Email: blrsales@wiley.com
Mumbai: Tel: +91-22-2788 9263, 2788 9272, Telefax: +91-22-2788 9263, Email: mumsales@wiley.com