
FACILITATING THE SPREAD OF KNOWLEDGE AND INNOVATION IN PROFESSIONAL SOFTWARE DEVELOPMENT

The Current State of NoSQL Databases
eMag Issue 47 - Nov 2016

ARTICLE: Using Redis as a Time-Series Database
ARTICLE: Building a Mars-Rover Application with DynamoDB
INTERVIEW: Current State of NoSQL Databases

Highly Distributed Computations Without Synchronization

Synchronization of data across systems is expensive and impractical when running systems
at scale. Traditional approaches for performing computations or information dissemination are not viable. In this article Basho Sr. Software Engineer Chris Meiklejohn explores
the basic building blocks for crafting deterministic applications that guarantee convergence
of data without synchronization.

Using Redis as a
Time-Series Database

In this article, Dr. Josiah Carlson, author of


the book Redis in Action, explains how to
use Redis and sorted sets with hashes for time
series analysis.

Building a Mars-Rover
Application with DynamoDB
DynamoDB is a NoSQL database service that
aims to be easily managed, so you don't have to
worry about administrative burdens such as operating and scaling. This article shows how to use
Amazon DynamoDB to create a Mars Rover application. You can use the same concepts described
in this post to build your own web application.

Key Lessons from Transition to NoSQL at a Gambling Website

In this article, author Dan Macklin discusses the transition to a Riak NoSQL and Erlang-based architecture coupled with Convergent Replicated Data Types (CRDTs) and lessons learned with the transition.

Virtual Panel: Current State of NoSQL Databases

NoSQL databases have been around for several years now and have become a choice of data storage for managing semi-structured and unstructured data. These databases offer a lot of advantages in terms of linear scalability and better performance for both data writes and reads. InfoQ spoke with four panelists to get different perspectives on the current state of NoSQL databases.

FOLLOW US

CONTACT US
GENERAL FEEDBACK feedback@infoq.com
ADVERTISING sales@infoq.com
EDITORIAL editors@infoq.com

facebook.com/InfoQ
@InfoQ
google.com/+InfoQ
linkedin.com/company/infoq

SRINI PENCHIKALA

currently works as Senior Software Architect in Austin,


Texas. He is also the Lead Editor for Big Data and
NoSQL Database communities at InfoQ (http://www.
infoq.com/author/Srini-Penchikala). Srini has over 22
years of experience in software architecture, design
and development. He has presented at conferences
like JavaOne, SEI Architecture Technology Conference
(SATURN), IT Architect Conference (ITARC), No Fluff
Just Stuff, NoSQL Now, Enterprise Data World, and
Project World Conference.

A LETTER FROM
THE EDITOR
NoSQL databases have been around for several years
now and have become the preferred choice of data
management for a variety of business use cases.
With the emergence of other trends like distributed
systems, cloud computing, social media, mobile devices, and Internet of Things (IoT), the need for NoSQL
database solutions has only become more critical in
the recent years.
NoSQL databases offer a lot of advantages compared
to the traditional relational databases, in terms of linear scalability, better performance and cost effectiveness for managing the data.
They provide features like data partitioning and replication out of the box, which are critical for running
applications in distributed system environments.
NoSQL databases also offer built-in integration with
big data technologies like Hadoop and Spark.
It's important to take a look at the current state of NoSQL databases and learn about what's happening now in the NoSQL space and what's coming up in the future for these database technologies.
This eMag focuses on the current state of NoSQL
databases. It includes articles, a presentation and a
virtual panel discussion covering a variety of topics
ranging from highly distributed computations, time
series databases to what it takes to transition to a
NoSQL database solution.
The Highly Distributed Computations without Synchronization article covers the concept of Conflict-Free Replicated Data Types (CRDTs) to help with consistency and with developing applications that guarantee convergence in the event of concurrent operations.
In the article Using Redis as a Time Series Database: Why and How, author Josiah Carlson discusses time-series databases with use cases, advanced data analytics, and how to store and analyze events using the Redis NoSQL database.
A sample Mars Rover application architecture is discussed in Building a Mars Rover Application with
DynamoDB article to demonstrate the capabilities of
DynamoDB key value NoSQL database.
Erlang programming language and Riak Key Value
NoSQL data store are the highlights of the architecture of an online gaming web application discussed
in the article Key Lessons Learned from Transition to
NoSQL at an Online Gambling Website.
In the Virtual Panel: Current State of NoSQL Databases article, four panelists from different NoSQL database backgrounds discuss the current state of NoSQL
databases and how to use NoSQL databases and big
data technologies together.
The goal of this eMag is to bring our readers up to date on where NoSQL database technologies are today and, more importantly, where they are going in the future.

Read online on InfoQ

Highly Distributed Computations


Without Synchronization

Christopher Meiklejohn is a senior software engineer with Basho Technologies and a contributing member of the European research project SyncFree. Christopher also frequently blogs about distributed systems on his blog.

Synchronization of data across systems is expensive and impractical, especially when running systems at the scale seen by institutions that deploy applications on mobile devices or provide Internet of Things services. Not only does the cost increase with the number of clients, but it is also not possible to synchronize operations when clients have limited access to connectivity. This makes traditional approaches, such as Paxos or state-machine replication, unviable for coordinating replicated state.
High availability and fault tolerance of these systems are also major concerns given that, for most of these companies, downtime is linked directly to revenue, as exemplified by Amazon in its work on Dynamo, where it popularized the concept of eventual consistency as one solution to the problem. However, there's a minimum to just how much state can be reduced while still performing useful distributed computations.

Consider a large mobile-gaming company that needs to share client state across user devices: for example, a shared virtual wallet across all devices owned by a particular user or a shared list of items across all team members' devices.

In the ideal situation, we would like operations performed on this shared, replicated data to succeed when clients are offline. However, allowing operations to be performed on shared data structures while avoiding synchronization is vacuous and a recipe for incorrect programs. Therefore, we aim to create deterministic applications that, when operating over data structures that guarantee convergence in the event of concurrent operations, guarantee convergence of the applications themselves.

Conflict-free replicated
data types

Conflict-free replicated data types (CRDTs) provide one solution to the semantic-resolution problem described in Amazon's Dynamo paper. A problem exists where concurrent additions and removals of items to its replicated, highly available shopping cart can result in divergence of the shopping cart: in this case, causality-tracking mechanisms such as version vectors or vector clocks, which can determine the order of events in a system, can only determine that the operations occurred concurrently. Dynamo addresses this by storing both copies of the divergent item and returning both to the user next time they attempt to retrieve the key. (Several of the Dynamo clones that surfaced after the publication of the original Dynamo paper also take this strategy, such as LinkedIn's Project Voldemort and Basho's Riak.) At this point, the user is supposed to resolve these conflicting writes and write back the resolved object. In the shopping-cart example, the two shopping carts are joined using a set-union operation to perform the resolution; however, depending on how the items in the set are modeled, deleted items may be resurrected under this resolution logic.

Figure 1

In Conflict-free replicated data types, Shapiro et al. formulate a model of strong eventual consistency (SEC) in which an object meets these criteria if the object is both eventually consistent and has a strong convergence property. The strong convergence property is defined as "correct replicas that have delivered the same updates have equivalent state."

Under this model, objects are no longer susceptible to these concurrency anomalies because objects that observe SEC are designed to converge correctly under both concurrency and failure. This property makes these data types very powerful for ensuring correctness in distributed systems, especially distributed databases that use optimistic replication techniques.

These data types come in two flavors: state-based, which rely on the properties of semilattices, and operation-based, which are more space efficient and rely on the commutativity of all operations. These data types take a principled approach to eventual consistency: the data structures by design encode information about the events used to create them and, through this metadata, can resolve concurrent operations deterministically. This article will focus on state-based CRDTs.
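To make the state-based flavor concrete, the following is a small Python sketch of a grow-only counter, one of the simplest state-based CRDTs; the class and method names are invented for illustration and this is not the implementation used by Lasp or Riak.

# Illustrative sketch of a state-based grow-only counter (G-Counter).
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # per-replica increment totals

    def increment(self, amount=1):
        # Each replica only ever increases its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        # The observed value is the sum over all replicas.
        return sum(self.counts.values())

    def merge(self, other):
        # Join: element-wise max. Associative, commutative, and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        merged = GCounter(self.replica_id)
        for rid in set(self.counts) | set(other.counts):
            merged.counts[rid] = max(self.counts.get(rid, 0), other.counts.get(rid, 0))
        return merged

# Two replicas accept concurrent increments while disconnected...
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
# ...and converge to the same value once they exchange state, in either order.
assert a.merge(b).value() == b.merge(a).value() == 5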
So, how can we compose CRDTs
into programs while ensuring
that the strong convergence
properties of individual CRDTs
are preserved through this composition?

Distributed
deterministic dataflow

To solve this problem, we turn to


deterministic dataflow programming, a form of functional programming in which a series of
agents, or processes, synchronize
on the binding of variables in a
shared single-assignment variable store. The following figure
shows an example of processes
communicating with the shared
constraint store (for more detail
on this model, see Chapter 4 of
Concepts, Techniques, and Models of Computer Programming):
(Figure 1)
In this model, the central element of Figure 1 represents a shared variable store, and P1 and P2 represent processes. Our store provides two primitive operations: read and bind. Read is a blocking operation against the store to read a variable; this operation will block until the variable is bound. The bind operation allows assignment of a variable in the store to a particular value or to the value of another variable.

What's a join-semilattice?

We can extend this model to state-based CRDTs, as previously discussed. Recall that state-based CRDTs rely on the monotonicity properties of join-semilattices. A join-semilattice is a partially ordered set that has a binary operation called the join. This join operation is associative, commutative, and idempotent and computes a least upper bound with respect to the partial order.

To give an example, the natural numbers form a lattice where the join operation is the max operation.
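As a quick illustration of those properties, consider max as the join over the natural numbers (a minimal Python sketch):

# max is a join for the natural numbers ordered by <=:
join = max

a, b, c = 3, 7, 5
assert join(a, join(b, c)) == join(join(a, b), c)  # associative
assert join(a, b) == join(b, a)                    # commutative
assert join(a, a) == a                             # idempotent
# The join computes a least upper bound: it never moves "down" the order.
assert join(a, b) >= a and join(a, b) >= b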

Generalizing to join-semilattices

Let's start by looking at the single-assignment case as a lattice. Consider the following: (Figure 2)

Figure 2

In this example, for simplicity, assume that the single-assignment version of our dataflow language allows variables to be bound to natural numbers. Here, we represent the unbound state as bottom (⊥), while we represent the error state as top (⊤). This lattice serves as a state chart: it shows what directions the variable's state is allowed to travel. In this case, we can change an unbound variable to have the value 1 (or 2, or 3, and so on); however, if we attempt to change its value once the value is bound, we move to the error state.

If we generalize this model, we can allow variables to re-bind as long as the update is an inflation that will trigger a bind of a new state that is higher in the lattice. Let's walk through an example to see how this works. Consider the following natural-number lattice that computes the maximum observed value: in this example, subsequent bind operations compute the join between the arguments passed to the operation and the current value; the result of this join is then used as the value of the variable. Similar to before, think of this as a state chart: as long as the number keeps increasing, we can continue to change the value, whereas before any subsequent change would trigger an error.

Additionally, we extend the model to provide an additional primitive that's similar and related to the threshold read operation described by Kuper and Newton in LVars: lattice-based data structures for deterministic parallelism. This additional read primitive takes an activation value, which prevents the read operation from completing until the value of the variable being read is equal to or higher in the semilattice order.
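A rough Python sketch of the idea, assuming a single max-lattice variable in one process; the names LatticeVar, bind, and read_threshold are invented for illustration and are not Lasp's API:

import threading

class LatticeVar:
    """A variable over the max lattice of natural numbers: binds must inflate."""
    def __init__(self):
        self.value = None            # None stands in for the unbound (bottom) state
        self.cond = threading.Condition()

    def bind(self, new_value):
        with self.cond:
            # The new state is the join (max) of the old and new values,
            # so updates can only move the variable up the lattice.
            self.value = new_value if self.value is None else max(self.value, new_value)
            self.cond.notify_all()

    def read_threshold(self, activation_value):
        # Block until the variable is at or above the activation value.
        with self.cond:
            while self.value is None or self.value < activation_value:
                self.cond.wait()
            return self.value

counter = LatticeVar()
counter.bind(2)
counter.bind(1)   # absorbed by the join: 1 is below the current value 2
counter.bind(5)
assert counter.read_threshold(5) == 5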

Distribution

Distribution is also important for both high availability and fault tolerance. In this model, we assume either replication of each variable in the data store or replication of entire applications.

Model

The model assumes Dynamo-style partitioning and replication of data. It uses hash-space partitioning and consistent hashing to break up the hash space into a group of disjoint replication sets, each of which has a group of replicas responsible for full replication of the data within that set. This is shown in Figure 3.

Figure 3

Replication of variables

When partitioning and replicating variables, we assume the client application will run outside of the cluster or spread across a series of nodes internal or external to the cluster. Each operation, such as bind or read, is turned into a request and sent across the network to the cluster responsible for managing the constraint store, and it either succeeds or fails based on whether a quorum of replicas can be contacted. This is shown in figure b.

Replication of applications

The model also provides the ability to run an entirely replicated application by introducing two new primitive operations: register, to remotely load a program, and execute, to remotely execute the program.

Consider the case in which a program is going to operate on data stored in one replica set: rather than run the entire application remotely and perform round-trip quorum operations against a replica set, we can push the entire application to the replica set. To execute the application and get the results, we can simply select one of the replicas' results to return to the user. This is shown in figure c of the above image.

What are the applications?

Let's look at an example of an application that requires communication between a series of clients and servers: an eventually consistent advertisement counter. We're going to look at Erlang code written using a library called Lasp, which implements the programming model we've been discussing.

Advertisement counter

Here's an example of an advertisement counter written in our prototype programming language, called Lasp, which supports the programming model discussed in this article. It is made up of two sets of coordinating processes:

servers, responsible for tracking advertisement impressions for all clients, and
clients, responsible for incrementing the advertisement impressions.
This example uses a grow-only counter, a counter that can handle concurrent increment operations in a safe and convergent manner but cannot track decrements. (Code 1)


In this snippet, we initialize a series of clients, each of which is
given the list of advertisements
they are responsible for displaying to the user. These clients represent processes running at the
client near the end user.
Each client process handles three
things: returning the list of ac-

tive advertisements, viewing


advertisements, and removing
advertisements. We use a simple
recursive process that blocks on
receiving messages to perform
each of these operations.
When a request to view an advertisement arrives, we choose
an advertisement to display and
increment the counter for this
particular advertisement.

%% @doc Client process; standard recursive looping server.
client(Id, Ads) ->
    %% Receive messages from server processes.
    receive
        %% Respond to the view advertisement message.
        view_ad ->
            %% Choose an advertisement to display; we simply choose
            %% the first item in a list.
            Ad = hd(Ads),

            %% Update ad by incrementing value; issue an update
            %% to increment the counter.
            {ok, _} = lasp:update(Ad, increment, Id),

            client(Id, tl(Ads) ++ [Ad]);
        {remove_ad, Ad} ->
            %% Remove ad.
            client(Id, Ads -- [Ad])
    end.
Code 1
%% @doc Server functions for the advertisement counter.
server(Ad, Clients) ->
    %% Perform a blocking read, which will only unblock
    %% once the counter reaches at least 5.
    {ok, _, _} = lasp:read(Ad, 5),

    %% For each client, send a message telling the client
    %% to disable the advertisement from being displayed again.
    lists:map(fun(Client) ->
        %% Tell clients to remove the advertisement.
        Client ! {remove_ad, Ad}
    end, Clients),

    %% Print a message to the screen that advertisement
    %% limit has been reached.
    io:format("Advertisement ~p reached display limit!", [Ad]).
Code 2


This bind operation succeeds because, in this case, the value we are pushing back to the constraint store is an inflation of the lattice; the counter is only ever going to grow.

Next, we initialize one server process per advertisement. Here's what that code looks like: (Code 2)

Each of these server processes performs a threshold read against the counter for the advertisement it's tracking; this threshold read operation will block, thereby suspending execution of the server process until the counter has reached at least five impressions. Once the threshold has been reached, the server process will unblock and notify all clients to stop displaying the advertisement.

Where do we go from here?

Our programming model for eventually consistent computations is still very much in an early stage of development; it continues to be ongoing research driven by the requirements of our industry partners and feedback from our reference implementation. In terms of features, we have identified a series of work that we plan to explore over the next year of development on the programming model. Some examples of this work include the following:

Causal+ consistency

What changes are needed to both the programming model and the distribution model to support causal+ consistency? Is it possible for the programming model to detect when causal+ consistency is required and when a weaker consistency model will suffice given program requirements?

Different distribution models

Can we rewrite applications that operate on a particular set of data into smaller applications operating on disjoint subsets that can be executed in a parallel, fault-tolerant manner? Is it possible to break programs up between hierarchical sets of clients transparently in the programming model, in order to support offline, and correct, operation?

Feedback

We would love to hear your feedback, given that a large part of our evaluation is based on whether or not the programming model makes it easier to reason about program behavior and correctness.

Read online on InfoQ

Using Redis as a Time-Series Database

Josiah Carlson is a seasoned database professional and an active contributor to the Redis community. As a startup veteran, Josiah recognized the value and purpose of Redis after being introduced to Salvatore Sanfilippo's work in 2010. Josiah currently resides as VP of technology at OpenMail, an early-stage startup in Los Angeles, and is happy to tell you how Redis could be your answer to some of your company's problems.

Redis has been used for storing and analyzing time-series data since
its creation. Initially intended as a buffer and destination for logging,
Redis has grown to include five explicit and three implicit structures/
types that offer different methods for analyzing data. This article
intends to introduce the reader to the most flexible method of using
Redis for time-series analysis.
A note on race
conditions and
transactions

While individual commands in


Redis are atomic, multiple commands executed sequentially are
not necessarily atomic, and may
have data race conditions that
could lead to incorrect behavior.
To address this limitation, this ar-


ticle will use transactional pipelines and Lua scripts to prevent


data race conditions.
With Redis and the Python client
we are using to access Redis, a
transactional pipeline (usually
called a transaction or MULTI/
EXEC transaction in the context
of other clients) is construct-

ed by calling the .pipeline()


method on a Redis connection
without arguments or with a
Boolean True argument. Under
the covers, the pipeline will collect all of the commands that
are passed until the .execute()
method is called, at which point
the client sends Redis the MULTI
command, followed by all of the


collected commands, and finally EXEC. When Redis


executes this group of commands, it does so without
being interrupted by any other commands, thus ensuring atomic execution.
As an additional option for providing atomic operations over a series of commands, Redis offers server-side Lua scripting. Generally speaking, Lua scripting behaves much like stored procedures in relational
databases, limited to using Lua and an explicit Redis
API from Lua for execution. Much like transactions,
scripts in Lua generally cannot be interrupted during
execution1, though an unhandled error will cause a
Lua script to terminate prematurely. Syntax-wise, we
load a Lua script into Redis by calling the .register_script() method on a Redis connection object. That will return an object that can be used like
a function to call the script inside Redis, instead of
another method on the Redis connection, and uses a
combination of the SCRIPT LOAD and EVALSHA commands to load and execute the script.
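Here is a brief sketch of both mechanisms with the redis-py client used in this article; the keys and the tiny Lua script are invented for illustration:

import redis

conn = redis.Redis()  # assumes a Redis server on localhost:6379

# Transactional pipeline: commands are queued client-side, then sent as
# MULTI ... EXEC when .execute() is called, so they run without interruption.
pipe = conn.pipeline(True)
pipe.incr('page:views')
pipe.lpush('recent:pages', '/home')
pipe.execute()

# Server-side Lua: register_script() returns a callable that uses
# SCRIPT LOAD and EVALSHA under the covers.
incr_and_get = conn.register_script('''
return redis.call('INCRBY', KEYS[1], ARGV[1])
''')
print(incr_and_get(keys=['page:views'], args=[10]))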

Use cases

Initial questions about Redis and its use as a time-series database concern the use or purpose of a time-series database itself. Its use cases relate to the data involved, specifically that the data is structured as a series of events or samples of one or more values or metrics over time. A few examples include (but are not limited to):

sell price and volume of a traded stock,


total value and delivery location of an order
placed at an online retailer,
actions of a user in a video game, or
data gathered from sensors embedded inside IoT
devices.
Basically, any time something happens or we make a
measurement, we can record that with a timestamp.
Once we have collected some events, we can analyze
those events either as they are collected in real time
or after the fact as part of a more complex query.

Advanced analysis using a sorted set


with hashes

The most flexible method for storing and analyzing


data as a time series combines two different structures in Redis: the sorted set and the hash.
The sorted set is a structure that combines the features of a hash table with those of a sorted tree (internally, Redis uses a skiplist, but you can ignore that
detail for now). Briefly, each entry in a sorted set is

a combination of a string member and a double


score. The member acts as a key in the hash, with
the score acting as the sorted value in the tree. With
this combination, we can access members and scores
directly by member or score value, and there are several methods for accessing the members and scores
by order based on score value.2
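A quick sketch of that member/score behavior with redis-py (key and member names are made up; the zadd call uses the older argument style that the rest of this article's examples use):

import redis

conn = redis.Redis()

# Members are unique like hash keys; scores order them like a sorted tree.
conn.zadd('login:times', **{'user:1': 1478000000, 'user:2': 1478000060})
conn.zadd('login:times', **{'user:1': 1478000120})   # re-adding updates the score

print(conn.zscore('login:times', 'user:1'))            # direct access by member
print(conn.zrangebyscore('login:times',                # access by score range,
                         1478000000, 1478000200,       # ordered by score
                         withscores=True))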

Storing events

Storing time-series data as a combination of one or


more sorted sets and some hashes is one of the most
common uses of Redis. It represents an underlying
building block used to implement a wide variety
of applications; from social networks like Twitter to
news sites like Reddit and Hacker News and all the
way to an almost-complete relational-object mapper
on top of Redis itself.
For this example, lets say that we are receiving
events that represent user actions on a website
with four shared properties among all events and a
variable number of other properties depending on
the event type. Our known properties are going to
be: id, timestamp, type, and user. To store each
individual event, we are going to use a Redis hash,
whose key is derived from the event id. To generate each event id, we can use one of a number of sources, but for now we will generate it using a counter in Redis. Using 64-bit Redis on a 64-bit platform will allow for 2^63 - 1 unique events, primarily limited by available memory.
When we have our data ready for recording/insertion, we will store our data as a hash then insert a
member/score pair into our sorted set that will map
our event id (member) to the event timestamp
(score). The code for recording an event is as follows.
def record_event(conn, event):
    id = conn.incr('event:id')
    event['id'] = id
    event_key = 'event:{id}'.format(id=id)

    pipe = conn.pipeline(True)
    pipe.hmset(event_key, event)
    pipe.zadd('events', **{str(id): event['timestamp']})
    pipe.execute()
In the record_event() function, we receive an
event, get a new calculated id from Redis, assign it to
the event, and generate the key where the event will
be stored by concatenating the event string and
the new id, separated by a colon.3 We then create


a pipeline and prepare to set all of the data for the


event, as well as prepare to add the event id/timestamp pair to the sorted set. After the transactional
pipeline has finished executing, our event is recorded and stored in Redis.

Event analysis

We now have many options for analyzing our time series. We can scan the newest or oldest event ids with ZRANGE,4 maybe later pulling the events themselves for analysis. We can get the 10 or even 100 events immediately before or after a timestamp with ZRANGEBYSCORE combined with the LIMIT argument. We can count the number of events that occurred in a specific time period with ZCOUNT. We can even implement our analysis as a Lua script. See the following for an example that counts the number of different types of events over a provided time range with a Lua script.

The function defined as count_types() prepares arguments to pass to the wrapped Lua script, and decodes the JSON-encoded mapping of event types to their counts. The Lua script first sets up a table of results (the counts variable) and then reads the list of event ids in the desired time range with ZRANGEBYSCORE. After getting the ids, the script reads the type property from each event one at a time, incrementing the event count table as it goes along, finally returning a JSON-encoded mapping when finished.
import json

def count_types(conn, start, end):
    counts = count_types_lua(keys=['events'], args=[start, end])
    return json.loads(counts)

count_types_lua = conn.register_script('''
local counts = {}
local ids = redis.call('ZRANGEBYSCORE', KEYS[1], ARGV[1], ARGV[2])
for i, id in ipairs(ids) do
    local type = redis.call('HGET', 'event:' .. id, 'type')
    counts[type] = (counts[type] or 0) + 1
end

return cjson.encode(counts)
''')


Performance considerations and data


modeling

As written, our method to count different event


types in the specified time range works but requires
actively reading the type attribute from every event
in the time range. For time ranges with only a few
hundred or a few thousand events, this analysis will
be reasonably fast, but what happens when our time
range includes tens of thousands or even millions of
events? Quite simply, Redis will block while calculating the result.
One way to address performance issues resulting
from long script execution when analyzing event
streams is to consider the queries that need to be
answered in advance. In particular, if we know that
we need to query for event counts of each type
over arbitrary time ranges, we can keep an additional sorted set for each event type, each of which
would only store id/timestamp pairs for events for
that specific type. Then when we need to count the
number of events of each type, we can perform a
series of ZCOUNT or equivalent calls5 and return that result instead. Let's look at what a record_event() function would look like if it also wrote to sorted sets based on event type.
def record_event_by_type(conn, event):
    id = conn.incr('event:id')
    event['id'] = id
    event_key = 'event:{id}'.format(id=id)
    type_key = 'events:{type}'.format(type=event['type'])
    ref = {str(id): event['timestamp']}

    pipe = conn.pipeline(True)
    pipe.hmset(event_key, event)
    pipe.zadd('events', **ref)
    pipe.zadd(type_key, **ref)
    pipe.execute()
Many of the same things in the new record_event_by_type() function act as they did in the old record_event() function, though there are some new operations. In the new function, we also calculate a type_key, which is where we will store the index entry for this event in the type-specific sorted set. After preparing to add the id/timestamp pair to the events sorted set, we also prepare to add the id/timestamp pair to the type_key sorted set, and then perform all of our data insertions as before.
To now count events of a single type between two time ranges, we only need to call the ZCOUNT command with the specific key for the event type we


want to count along with the start and ending timestamps.


def count_type(conn, type, start, end):
    type_key = 'events:{type}'.format(type=type)
    return conn.zcount(type_key, start, end)
If we knew all of the possible event types in advance,
we could call the above count_type() function for
each different type and construct the table we did
earlier in count_types(). For those cases where we
don't know all of the possible event types in advance
or may be adding event types in the future, we can
add each type to a set structure, then later use the
set to discover all unique event types. Our updated
record-event function would read as follows.


def record_event_types(conn, event):
    id = conn.incr('event:id')
    event['id'] = id
    event_key = 'event:{id}'.format(id=id)
    type_key = 'events:{type}'.format(type=event['type'])
    ref = {str(id): event['timestamp']}

    pipe = conn.pipeline(True)
    pipe.hmset(event_key, event)
    pipe.zadd('events', **ref)
    pipe.zadd(type_key, **ref)
    pipe.sadd('event:types', event['type'])
    pipe.execute()
The only change from earlier is that we have added
the event type to the set named event:types.
And now we must update our count_types() function from earlier.
def count_types_fast(conn, start, end):
    event_types = conn.smembers('event:types')
    counts = {}
    for event_type in event_types:
        counts[event_type] = conn.zcount(
            'events:{type}'.format(type=event_type), start, end)
    return counts
For more than small numbers of events in a time range, this new count_types_fast() function is going to be faster than the earlier count_types() function simply because ZCOUNT is faster than fetching each event type from a hash.

Redis as data storage

While the analytics tools built


into Redis, with its commands
and Lua scripting, are flexible
and perform well, some types of
time-series analysis benefit from
specialized computational methods, libraries, or tools. For those
cases, storing data in Redis can
still make a lot of sense as it is
incredibly fast at storing and retrieving data.
As an example, there are only 5.3
million per-minute samples in
10 years of pricing data for a single stock, which is easily stored
in Redis. But to calculate almost
any nontrivial function over that
data with a Lua script inside Redis could require porting pre-existing optimized libraries to Redis
and debugging them. Instead,
with Redis merely storing the
data, you can fetch time ranges of
data (using the sorted set as your
index to get keys, then using the
keys to get hash data like before)
from Redis and drop those into
your existing optimized kernels
for moving averages, price volatility, etc.

Why not use a relational database instead? Speed. Redis stores


everything in RAM and in optimized data structures (as in our
sorted-set example). This combination of in-memory-optimized
data structures not only performs three orders of magnitude
faster than even SSD-backed
databases, but can also perform
one to two orders of magnitude
faster than simple in-memory
key-value data storage or storing
data serialized in memory.

Next steps

When using Redis for time-series analytics (and, really, any sort of analytics) it can make sense to record certain common attributes and values among different events in a common location to aid in searching for events that share those attributes and values. We did this above with per-event-type sorted sets and just started talking about using sets. While this article primarily addresses the use of sorted sets, there are more structures to Redis, and there are many more options for using Redis in analysis. Other commonly used structures for analytics, in addition to sorted sets and hashes, include bitmaps, array-indexed byte strings, HyperLogLogs, lists, and geo-indexed sorted-set commands.6

Adding related data structures


for more specific data-access
patterns is a subject that we
will periodically revisit when using Redis. Almost invariably, the
form in which we choose to store
data will be both a function and
a limiting factor of the types of
queries we can perform. This is
important because unlike those
in a typical, familiar relational
database, the queries and operations that are available in Redis
are restricted based on the types
used to store data.
Moving on from these few examples of analyzing time-series
data, you can read more about
methods of building indexes for
finding related data in chapter
seven of Redis in Action in the
eBooks section at RedisLabs.
com. Chapter eight offers an almost complete implementation
of a social network like Twitter,
including followers, lists, timelines, and a streaming server, all
of which are good starting points
for understanding how Redis can
be used to store and answer queries about timelines and events
in a time series.

1. Read-only scripts can be interrupted if we have enabled the lua-time-limit configuration option and the script has been executing for longer than the configured limit.

2. When scores are equal, items are sub-ordered by the lexicographic ordering of the members themselves.

3. While we generally use colons as name/namespace/data separators when operating with Redis data in this article, you can feel free to use whatever character you like. Other users of Redis use periods, semicolons, and more. Picking some character that doesn't usually appear in your keys or data is a good idea.

4. ZRANGE and ZREVRANGE offer the ability to retrieve elements from a sorted set based on their sorted position, indexed 0 from the minimum score in the case of ZRANGE and indexed 0 from the maximum score in the case of ZREVRANGE.

5. ZCOUNT as a command does count the values in a range of data in a sorted set, but does so by starting at one endpoint and incrementally walking through the entire range. For ranges with many items, this can be quite expensive. As an alternative, we can use ZRANGEBYSCORE and ZREVRANGEBYSCORE to discover the members at both the starting and ending points of the range. By using ZRANK on the members of both ends, we can discover the indices of those members in the sorted set. And with both indices, a quick subtraction of the two (plus one) will return the same answer with far less computational overhead, even if it may take a few more calls to Redis.

6. Much like the Z*LEX commands introduced in Redis 2.8.9, which use sorted sets to provide limited prefix searching, Redis 3.2 offers limited geographic searching and indexing with GEO* commands.


Read online on InfoQ

Building a Mars-Rover Application


with DynamoDB

Daniela Miao is currently a software-development engineer at Amazon Web Services, working on the DynamoDB developer-ecosystem team. The team is dedicated to improving the customer experience of using DynamoDB through writing libraries and tools to ease the writing of applications on DynamoDB. She hopes to help lower the barrier to using DynamoDB through developer education via walkthrough examples, sample applications, blog posts, etc. If you have questions, suggestions or are simply seeking more information on DynamoDB, please reach out to dynamodb-feedback@amazon.com.

Kenta Yasukawa is a senior solutions architect for Amazon Web Services. He has mainly focused on designing cloud-based solutions for gaming customers, mobile-application back ends, social-network services, and so on. He is passionate about designing scalable and reliable architectures that take full advantage of the capabilities of the AWS cloud. Amazon DynamoDB is a key component in the architecture design and he has seen many of his customers successfully build highly scalable and reliable architecture with Amazon DynamoDB. If you would like to hear such success stories, please feel free to reach out.

Amazon DynamoDB is a fast and flexible NoSQL database service that you can easily manage, so you don't have to worry about administrative burdens such as operating and scaling your databases. Instead, you can focus on designing your application, and launch it on DynamoDB with a few simple steps.

The sample application in this article demonstrates the capabilities of the DynamoDB database. The web application showcases data that NASA has made publicly available: images that the Curiosity rover has been sending back from Mars along with their metadata in JSON format. A short snippet of the NASA JSON data is shown below the snapshot of the demo application; you can try out the live demo itself! We call it the MSL Image Explorer.


Figure 1: A screenshot of the MSL Image Explorer demo application.

Figure 2: A snippet of JSON image data from NASA.

Prior to launching the demo


application, we collected all of
NASA's JSON image data and imported it into a DynamoDB table
for later querying. Once the data
was in DynamoDB, we could
perform various queries and updates on the tables to generate
the MSL Image Explorer app,
which displays beautiful image
galleries as shown in the demo.
The default view of the application is a timeline of all images


received from one of Curiosity's


cameras or instruments, displayed in reverse chronological
order. Users can vote for their
favorite pictures and a real-time
vote count is maintained for each
image. In addition, users can
open the Mission Control side
menu to choose a different instrument, change the date range,
or sort images by vote count instead. Finally, users can view images they have voted for under a
My Favorites option.

All these features are enabled


through querying the DynamoDB table that stores the image data. In order to build such
an application, we would normally need to consider various
functional components such as
access control, user tracking, serializing/de-serializing data, etc.
We want to show you how DynamoDB simplifies this by explaining how we built our demo and
how you can build your own application with DynamoDB.


Before we deep dive into the demo, let's go through a quick primer on DynamoDB.

Data model

The DynamoDB data-model concepts include tables, items, and attributes. A table is a collection of items and each item is a collection of attributes. DynamoDB is a schema-less NoSQL database. Individual items in a DynamoDB table can have any number of attributes. Each attribute in an item is a name/value pair. An attribute can be single-valued or a multi-valued set (details of data types will follow, below). In addition, the newly released JSON document support allows JSON objects to be stored directly as items in DynamoDB, up to 400 kB per item. For example, NASA provides each rover image as a JSON object, so each image can be stored as a single item in DynamoDB and we can directly import its attributes such as location and time.

Consider storing a set of photos from a Mars rover in DynamoDB. We can create a table, marsDemoImages, with a unique imageid attribute assigned to each image (called its primary hash key) in the format of marsDemoImages (imageid, etc.). Each item in this table could have several other attributes, as shown below.

Table 1: Sample items of the marsDemoImages table.

Note that imageid is the only required attribute here. All other attributes could be automatically imported from NASA's JSON image data and, in this case, imageid 101 doesn't have a camera_model attribute. Mission+InstrumentID is a composite attribute, which will be explained later.

Primary key

When we create a table, we must specify the primary key of the table. DynamoDB supports two types of primary keys.

The hash type of primary key is made of a single attribute: a hash attribute. In the preceding example, the hash attribute for the marsDemoImages table is imageid.

Table 2: The marsDemoImages table with hash primary key shaded in red.

The hash and range primary key, the other type, is made of two attributes: the first attribute is the hash attribute and the second is the range attribute. In the demo, if we want to group items first by imageid then by votes, the hash attribute will be imageid and the range attribute will be votes.

Table 3: Sample items of the marsDemoImages table with hash and range primary keys shaded in red.

Queries, updates, and scans

In addition to using primary keys to access and manipulate specified items, DynamoDB also allows us to search for specific data with query, update, and scan:

Query: A query operation finds items in a table using only primary-key attribute values. We must provide a hash key attribute/value pair and optionally a range key attribute/value pair. For instance, in the MSL Image Explorer app, we can query for a specific picture by setting imageid = 201.

Update: An update operation is similar to a query, except we can also modify attributes of the item. A conditional update allows us to modify an item only when certain pre-specified conditions are met. We will see an example of this later when we want to update the vote count of images in the MSL Image Explorer app.

Scan: A scan operation examines every item in the table. By default, a scan returns all of the data attributes for every item.
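Here is a hedged Python/boto3 sketch of those three operations; the demo uses the JavaScript SDK, and the key and attribute names follow the tables described above:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource('dynamodb', region_name='us-east-1').Table('marsDemoImages')

# Query: find items by primary-key attribute values, e.g. imageid = 201.
pictures = table.query(KeyConditionExpression=Key('imageid').eq(201))

# Conditional update: only bump the vote count if the item already exists.
table.update_item(
    Key={'imageid': 201},
    UpdateExpression='SET votes = votes + :inc',
    ConditionExpression=Attr('imageid').exists(),
    ExpressionAttributeValues={':inc': 1},
)

# Scan: examine every item in the table (returns all attributes by default).
everything = table.scan()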


Secondary indexes

Instead of scanning the entire table, which can sometimes be inefficient, we can create secondary indexes
to help the querying process. Secondary indexes on
a table will help optimize querying on non-key attributes. DynamoDB supports two kinds of secondary
indexes: a local secondary index that has the same
hash key as the table but a different range key and
a global secondary index that has a hash and range
keys that can differ from those on the table.
Secondary indexes can be thought of as separate tables that are first grouped by the index hash key then
by the range hash key. For example, in the marsDemoImages table, we might want to look up images
from a specific mission and instrument, filtered by a
time range. To do this, we could create a secondary
index grouped first by the Mission+Instrument attribute (hash key) then by the TimeStamp attribute
(range key).
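As a sketch of what defining such an index could look like with Python and boto3 (the demo defines its tables in CoffeeScript; index and attribute names follow the article, but the call below is illustrative only):

import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

client.create_table(
    TableName='marsDemoImages',
    AttributeDefinitions=[
        {'AttributeName': 'imageid', 'AttributeType': 'N'},
        {'AttributeName': 'Mission+InstrumentID', 'AttributeType': 'S'},
        {'AttributeName': 'TimeStamp', 'AttributeType': 'N'},
    ],
    KeySchema=[{'AttributeName': 'imageid', 'KeyType': 'HASH'}],
    GlobalSecondaryIndexes=[{
        'IndexName': 'date-gsi',
        # Group first by mission+instrument, then by photo timestamp.
        'KeySchema': [
            {'AttributeName': 'Mission+InstrumentID', 'KeyType': 'HASH'},
            {'AttributeName': 'TimeStamp', 'KeyType': 'RANGE'},
        ],
        'Projection': {'ProjectionType': 'ALL'},
        'ProvisionedThroughput': {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    }],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)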

Table 4: Example of a secondary index for the marsDemoImages table.

We will go into more details of secondary indexes for the marsDemoImages table in the next section.

Data types and JSON support

Amazon DynamoDB supports a newly expanded set of data types:

scalar types of number, string, binary, Boolean, and null;
multi-valued types of string set, number set, and binary set; and
document types of list and map.

For example, in the marsDemoImages table, imageid is a number attribute and camera_model is a string attribute.

Most noteworthy here are the new list and map types, which are ideal for storing JSON documents. The list data type is similar to a JSON array and the map data type resembles a JSON object. There are no restrictions on the data types that can be stored in list or map elements other than maximums of 400 kB per item and 32 levels of nested attributes. In addition, DynamoDB lets you access individual elements within lists and arrays, even if those elements are deeply nested. This is an exciting feature of DynamoDB that makes developing web applications with JSON data easy and intuitive.

Under the hood

DynamoDB's support for JSON documents has made building the MSL Image Explorer easy and intuitive. We built our application using AngularJS, a popular JavaScript web-application framework, but the concepts in this demo should apply to any other language. Our source code is openly available under the Amazon Web Services Labs GitHub account.

To understand how the application operates, let's take a look at the overall architecture of the application and go through each component step by step. (Figure 3)

Browser retrieves app code from Amazon S3


Whenever users visit the demo website, the browser


fetches the application code, which contains HTML,
CSS, and JavaScript from Amazon S3. Using DynamoDB and S3, we are able to run this application entirely on the client side so we need no servers that we
have to manage ourselves.

Application authenticates user via


Amazon Cognito

During this step, the application grants users access


to the DynamoDB table by using Amazon Cognito,
a simple user-identity and data-synchronization service that helps establish unauthenticated guest connections to DynamoDB. This allows any user to query
only the DynamoDB tables associated with the application and update a limited set of attributes in the
tables.
(If you want to launch the demo locally for development or testing purposes, you can run it on your own machine with DynamoDB Local. Instructions for launching the app locally can be found in the app source code's README on GitHub.)
We used Cognito both to manage guest access to
our DynamoDB tables and to collect relevant statistics like the number of visitors, etc. (Figure 4)
With Cognito, we can create unique user identifiers
for accessing AWS cloud services by using public
login providers such as Amazon, Facebook, and Google or by using your own user-identity system. Users
can also start using your app as unauthenticated


guests. We use unauthenticated guest access to provide AWS credentials to web browsers and to uniquely identify each visitor.

Figure 4: Screenshot of the Cognito statistics interface.

We deployed our application into production by following these steps, and you can do the same with your own application:

1. Create an Amazon Cognito identity pool for the application. This can be done on the Amazon Cognito management console, using the default options while making sure only Enable Access to Unauthenticated Identities is checked.

2. Configure AWS Identity and Access Management (IAM) to give the minimum permissions required for the demo application:
Read from the marsDemoImages table: Query with date-gsi and votes-gsi, and GetItem.
Write to the marsDemoImages table: update the votes field.
Read from the userVotes table: Query on the user's own item, but not on others'.
Write to the userVotes table: PutItem on the user's own item, but not on others'.

3. Modify configurations in the app to use Amazon Cognito. The MSL Image Explorer application instantiates the DynamoDB client in viewer/app/scripts/services/AWS.js and provides it with AWS credentials according to pre-specified configurations. You can edit these configurations before you launch the demo application locally or before you create a distribution package with Grunt. The configuration for switching to the Cognito identity pool can be found in viewer/lib/mynconf.coffee.

Queries and updates to DynamoDB

Figure 3: Architecture of the Mars-rover application.

Users can make custom image selections based on dates, votes, or their favorites. All selections trigger queries to the DynamoDB tables and indexes. To understand how this process works, we need to dive into several aspects of DynamoDB:
1. table schema and global secondary index (GSI) setup,
2. query execution, and
3. update execution.

Table schema and GSI


setup

Let's start by creating a DynamoDB table. You can do this via the AWS Management Console or with the AWS SDKs. In our case, we used JavaScript, written in CoffeeScript, available under /viewer/lib/prepare_tables.coffee. Most important are the schema and GSI setup for the DynamoDB table used to store the image data, represented below. (Table 5)
Table 5: Table schema for marsDemoImages.

We decided to combine the Mission and InstrumentID data fields to allow querying on multiple attributes at once. Since each view in the application is always specific to one instrument of one particular mission, it makes sense to concatenate Mission and InstrumentID, to use this combined attribute as the GSI hash key, and then to make a third attribute the GSI range key. For instance, users can view all images from the front hazcam instrument of the Curiosity Mars rover, filtered by date. GSIs facilitate this type of querying. The GSIs for the table are shown below: (Table 6)

Table 6: Global secondary index schema for marsDemoImages.

We created the date GSI to allow users to filter images by photo creation date, based on a specific instrument and mission. GSIs group items together by index hash and range key; this means the date GSI contains image data grouped first by Mission+Instrument then by TimeStamp. This allows the application to quickly find images of a specific date, such as pictures taken on 10/04/2014 from the Curiosity+Front Hazcam mission-instrument combination.

Similarly, the vote GSI powers the Top Voted Mars Images view in the application. In this case, the index hash key is still Mission+Instrument but the range key is votes. Remember that this index will group items first by Mission+Instrument then by votes, meaning it optimizes querying of images of a specific mission and instrument combination, sorted by their vote counts.

Next, we need a separate table to keep track of which users have voted for which photos to prevent a user from voting for the same photo multiple times. This table has a simple schema and no GSIs: (Table 7)

Table 7: Table schema for userVotes.

Finally, we invoked the createTable method to create all tables and secondary indexes in DynamoDB. This is completed as a part of the /viewer/lib/prepare_tables.coffee script, which executes automatically when we follow the instructions in the source code's README.

Query execution

MSL Image Explorer operates on the popular AngularJS web-development framework. Essentially, each view of the web application is generated by its respective controller: the timeline view has a timeline controller, the favorites view has a favorites controller, etc. These controllers all use a common Amazon DynamoDB service to communicate with the DynamoDB table. This MarsPhotoDBAccess service can be found in viewer/app/scripts/services and contains all query and update operations used in the application. In particular, the queryWithDateIndex function uses the document-level JavaScript SDK to make accessing items simple and intuitive: (Code 1)

Similarly, users can query the vote GSI to view images sorted by number of votes received in descending order: (Code 2)
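A rough Python/boto3 equivalent of such a query against the date GSI might look like the following; the demo itself uses the JavaScript document SDK, and the index and attribute names below follow the article's schema while the values are made up:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb', region_name='us-east-1').Table('marsDemoImages')

# Query the date GSI: the hash key is the mission+instrument combination,
# the range key is the photo timestamp.
response = table.query(
    IndexName='date-gsi',
    KeyConditionExpression=(
        Key('Mission+InstrumentID').eq('Curiosity+Front Hazcam')
        & Key('TimeStamp').between(1412380800, 1412467200)
    ),
    ScanIndexForward=False,   # newest first, as in the timeline view
)
for item in response['Items']:
    print(item['imageid'], item['TimeStamp'])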

Update execution

Voting for a photo works similarly to querying except that we


need to update existing items in
the table. Before that, however,
we need to check to see if a user
has already voted for any given photo. We can do this with a
conditional write to a second DynamoDB table, userVotes, which
we create to keep track of users
who have previously voted for
photos. The condition can be set
using the Expected parameter:
(Code 3)
In this code snippet, we specify our expectation that the item with the given imageid and userid should not exist, since this should be the first time this user votes for this photo. Next, we try to put the item into the userVotes table, with the condition in place: (Code 4)

Once the checking process completes, we can update the total vote count in the marsDemoImages table using the JSON-document SDK, which allows individual JSON fields to be updated in a simple and intuitive way. Let's take a look at how the incrementVotesCount function works: (Code 5)
Note that the parameters UpdateExpression and ExpressionAttributeValues are introduced in the JSON-document SDK, which provides much more support for JSON data access. For full details, please refer to the repo on the GitHub account and the specific documentation on modifying item attributes.

Retrieve thumbnail images from DynamoDB

Upon querying the DynamoDB table, the browser receives the JSON results, at which point the thumbnail images can be retrieved from Amazon S3. While this is the current implementation on the demo website, we have also enabled the storage of all thumbnail binary data in DynamoDB under the data attribute for each item. DynamoDB can store binary data without the need to specify a type, constraint, or schema, as long as it is not a hash or range key attribute and respects the 400 kB item size limit. We chose to load images from S3 rather than directly from DynamoDB in the public live demo to conserve read-throughput costs. However, the DynamoDB Free Tier does provide 25 GB of storage space at up to 25 capacity units for both reads and writes. It's a great way to get some hands-on experience with your own web application and DynamoDB.


Conclusions

You can use the concepts described in this post to build your own web application. Let's recap the process:

1. Design your DynamoDB table, including the schema, hash and optional range primary keys, and secondary indexes.

2. Create the table and indexes via the AWS Management Console or using one of our AWS SDKs. We used the JavaScript SDK in our demo.

3. Choose the language and web-development framework you want to use. We chose JavaScript and the AngularJS framework.

4. Code your application by writing functions that query or update your DynamoDB table. This is made especially easy if you use our document-level SDK when working with JSON.

5. Launch your application!


Read online on InfoQ

Key Lessons from Transition to NoSQL


at a Gambling Website

Dan Macklin is a hands-on technical manager who loves to learn, make things happen, and get things done. After running his own business for 10 years, Dan is now the head of Research and Development at Bet365.

As one of the world's biggest online gambling websites, serving around 19 million customers in nearly 200 countries, everything about Bet365 and its IT operates on a huge scale.
It has to. At peak times, our betting platform runs at more than
half a million transactions per
second and must serve in excess
of 2.5 million concurrent users.
Downtime is expensive, both
in terms of loss of income and
brand perception. Customers are
quickly frustrated by any dip in
service and are very vocal on social media in expressing that disappointment! Clearly, in an environment such as this, availability
is a primary requirement.

But large and demanding as our


systems already are, maintaining
our leadership position calls for
more than just a flawless service.
In addition to ensuring that we
can meet the growing demand,
we are also under pressure to introduce new services that move
the customer experience forward.
As you'd expect, introducing new services places even more demand on an already highly complex system, and there is only so hard you can push.
As it was, our SQL architecture was being pushed to its limits. Having already scaled our databases as far as it was cost-effective to go, it was clear we needed to find a new way of working.
The R&D division in which I work
had been set up a few years earlier to deal with challenges of this
nature and it fell to my team to
find the solution.


We found it in open-source software. By moving to the Erlang


programming language and
the Riak KV key-value NoSQL data
store coupled with convergent
replicated data types (CRDTs), we
can now quickly build new systems to handle the load.

Erlang

We chose Erlang because we


found that it makes it easier to
build reliable, simple systems
that scale horizontally. These attributes come from Erlang's concurrency semantics, "let it crash" error-handling philosophy, and
functional-programming nature.
Erlang concurrency is designed
around the actor model and
encourages an elegant style of
programming in which problems
are modelled by many isolated
processes (actors) that communicate through immutable message passing.
Each process has its own heap
and is lightweight (512 bytes)
by default, making it practical to
spin up many hundreds of thousands of processes on commodity-type servers. These individual
processes are scheduled by a
virtual machine over all available processor cores in a soft real-time manner, making sure that
each process gets a fair share of
processing time.
The fact that each Erlang process
has its own heap means that it
can crash independently without
corrupting shared memory. This
makes it far easier for programmers to deal with failure and
build software in which individual subsystems can fail without
bringing down the entire system. If something goes wrong at
a process level, we simply let it
crash and rely on Erlang's supervision functionality to automatically restart it.


Finally, Erlang's functional heritage makes it easy (with a little


discipline) to compose complex
functionality from many small,
side-effect-free functions that
you can easily test. The number
of lines of code needed to solve
a problem is reduced and on the
whole, once a programmer gets
used to the syntax, the code becomes far simpler and easier to
maintain.
By adopting Erlang, we could
greatly improve the parallelism,
throughput, and reliability of
our software, whilst writing fewer lines of simpler code. In our
experience, simple code is good
code.

NoSQL databases

When we first started investigating NoSQL databases, we looked


at many types of systems from
many different companies. We
became interested in using a
key-value store because it was
a good fit for our use cases and
there are several key-value databases on the market.
We were drawn to Amazon's well-known Dynamo system, a reliable, distributed, masterless key-value store used to maintain shopping carts. The Dynamo paper lists several companies that have implemented open-source versions of the architecture. We studied these and chose the Riak KV database because we liked its features to support data consistency at scale.
Let's look at some of these features that are critical to our requirements and use cases.
The problem for us is that as soon
as we've got data in a distributed
system, managing the consistency of that data becomes much
more difficult, particularly if you
value performance and avail-

ability as we do. In the event of a network partition (a temporary failure or network split between datacentres), some members of the distributed system can't talk to each other for a period of time.
Even if we have the most expensive kit in the world, failures will
happen, servers will go down,
and networks will break. Our
distributed system runs many
nodes (our infrastructure uses
hundreds of nodes) among
many datacentres, which makes
it imperative that systems can
deal with failures without losing
or corrupting data.
The CAP theorem says that in
the event of a network partition
where a quorum of nodes is unable to communicate, we have
two options. We can go for consistency with a CP system or we
can go for availability with an AP
system. If we choose consistency,
it effectively means that our system will sacrifice availability to
preserve the consistency of our
data. For us, this would be unacceptable.
In an AP system, the kind that's essential for us, the system will carry on working. The problem here
is that if a system keeps working
in a network-partitioned environment or in the face of many
concurrent unsynchronised updates, we can't hope to maintain
data consistency. Data is bound
to get out of sync eventually
and we will need to repair it. The
most that we can hope for is that
it will become consistent, eventually.
Eventual consistency throws up
a whole host of new challenges. For example, an eventually
consistent system can't even
guarantee that a client will be
able to read its own writes. So
how do we build systems that
can cope?


The answer is we have to have a


deep understanding of our use
cases and take a probabilistic approach: how likely is it that our
data is inconsistent; how long
will it take to repair; how does
the user interact with the application; what is the probability of
that happening; and, if it does
happen, can the business live
with the risk? We need to adopt
mechanisms that detect inconsistency and repair it without introducing a burden on developer
productivity.
Fortunately, Riak KV has a number of features in this area that
differentiate it from the competition.
Firstly, Riak uses a mechanism
called vector clocks (a causality-tracking mechanism based
on tuples of actors and counters)
to detect concurrent updates to
data without the need for a central clock (which is very hard if not
impossible to keep completely in
sync in a distributed system).
If Riak detects a problem (for example, the same data has been
updated on two sides of a network partition), it stores both
copies of the data (known as siblings) and returns them to the
client application for resolution
on the next read.
This means we may have two,
three, five, or even more copies
of our data saved and we've got to write a merge function that makes sense of it. This is a great
step forward and is much better
than simply using timestamps
to keep the last written value
(an approach taken by other systems) but still leaves developers
with the difficult task of writing
bespoke merge functions to repair data.
When we initially looked into this
issue, we concluded that while

Riak alone is great technology,


writing merge functions was
likely to confuse the hell out of
our developers and slow development something that is
also not going to be acceptable
for a company like ours, which is
all about getting things done at
pace. So we did more research
and that's when we discovered CRDTs.
A state-based CRDT is a relatively
new class of algorithm that typically operates over sets and has
the mathematical properties of
associativity, commutativity, and
idempotence (which just happen
to be the algorithmic characteristics required to produce a deterministic merge function over a
number of sets).
After some analysis, we fortunately found that much of our
data could be modelled within
sets, so by deploying CRDTs, our developers don't have to worry
about writing bespoke merge
functions for 95% of carefully selected use cases. This gave us the
confidence to push ahead with
the project as it gives the best of
both worlds.
We've got an available system with which we deal with eventual consistency and if something goes wrong or something happens in parallel that we're not expecting to happen, we've got
a way of maintaining a level of
consistency (eventually). Basho,
the developers of Riak, have built
numerous types of CRDT into
Riak as of version 2.0.
CRDTs can be initially quite confusing and mysterious. Even the
name is confusing, as there are
two slightly different types of
CRDT. Operation-based CRDTs
are known as commutative replicated data types and state-based CRDTs are known as convergent replicated data types.

In operation-based CRDTs, commutative operations are propagated and applied to all data
replicas in situ, and as such their
implementation resembles a
distributed log-ship. Whilst this
approach reduces transmission
volumes, the fact that these operations are not idempotent
means that practical systems require additional network protocol guarantees that are not easy
to implement. Therefore, most
state-of-the-art production systems end up going with state-based CRDTs, in which full state
updates are sent to all replicas.
Upon receiving a new state, the
replica queries its own state, runs
an update function (which must
monotonically increase the state; for example, adding a value
to an ordered set is a monotonic
operation), and finally calls a
merge function (which must be
commutative, associative, and
idempotent) to bring causal consistency to the new and previous
states.
Some examples will hopefully clarify this. At Bet365, we
ended up using the ORSWOT
(observed-remove set without
tombstones) CRDT as it facilitated add and remove operations in
a relatively space-efficient manner. Each ORSWOT is stored as a
value in a key-value store. (Figure
1)
The grey text represents
our data. In this case, it's a
set of Erlang binary strings
([<<Data1>>, <<Data2>>,
<<Data3>>]). The green text
represents a version vector (a
type of server-side vector clock)
whose job is to keep track of the
entire set's top-level causal history. The blue text next to each
element represents its dotted
version vector (or dot). The dot
stores the actor and its associated count for the last mutation.


Figure 1
So let's look at what happens during an add operation. For simplicity, let's set our initial state:
{[{x,1}], [{<<Data1>>, [{x,1}]}]}
Adding <<Data2>> to this set using the unique
actor y results in:
{[{x,1},{y,1}], [{<<Data1>>,[{x,1}]}, {<<Data2>>,[{y,1}]}]}
Note that the new actor y has been added to the version vector and <<Data2>> has been added to the set with a birth dot of [{y,1}].
That was easy enough, but what happens if we need
to merge two concurrently updated ORSWOTs (ORSWOT A and ORSWOT B)?
First, we'll set up some new data for our example.
Here is ORSWOT A:
ORSWOT A = {[{x,1},{y,2}], [{<<Data1>>,[{x,1}]}, {<<Data2>>,[{y,1}]}, {<<Data3>>,[{y,2}]}]}
ORSWOT A has seen the addition of:
1. element <<Data1>> via actor x,
2. element <<Data2>> via actor y, and
3. element <<Data3>> via actor y.
We will merge ORSWOT A with the following ORSWOT B:
ORSWOT B = {[{x,1},{y,1},{z,2}], [{<<Data2>>,[{y,1}]}, {<<Data3>>,[{z,1}]}, {<<Data4>>,[{z,2}]}]}
ORSWOT B has seen:
1. actor x add element <<Data1>>,
2. actor y add element <<Data2>>,
3. actor z add element <<Data3>>,
4. actor z add element <<Data4>>, and
5. actor z remove element <<Data1>>.

This results in a merged result of:


{[{x,1},{y,2},{z,2}], [{<<Data2>>,[{y,1}]}, {<<Data3>>,[{y,2},{z,1}]}, {<<Data4>>,[{z,2}]}]}
This reduces to [<<Data2>>, <<Data3>>,
<<Data4>>] once all of the ORSWOT metadata is
removed.
I will now try to demystify the merge function. At a high level, this involves looking for common elements and establishing a "happens before" relationship between each element's dot and the version vector in the other set.
First, we'll look at the common elements in both sets,
as they are the easiest to understand.
Data2 is retained as it exists in both sets with the
same dot: {y,1}.

But Data3 is more complicated. Its element in ORSWOT A has a dot {y,2}, which has a greater count than {y,1} from ORSWOT B's version vector of [{x,1},{y,1},{z,2}]. Its element in ORSWOT B has a dot {z,1}, which does not exist in ORSWOT A's version vector of [{x,1},{y,2}], so it is implicitly greater. Therefore, the dots {y,2} and {z,1} are merged to give a new dot of [{y,2},{z,1}], thereby maintaining the element's causal history for subsequent operations.
Next, let's look at the elements that appear solely in ORSWOT A.
<<Data1>> is deleted because its dot of [{x,1}] is dominated by (less than or equal to) ORSWOT B's version vector of [{x,1},{y,1},{z,2}]. This suggests that ORSWOT B has seen the addition of <<Data1>> by actor x and its subsequent removal by actor z.


Finally, let's look at the elements solely from ORSWOT B:
<<Data4>> is included because its dot of [{z,2}] dominates (is greater than) ORSWOT A's version vector of [{x,1},{y,2}].
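To make those rules concrete, here is a small illustrative sketch of the merge logic in plain JavaScript (chosen purely for readability); it mirrors the description above and is not the riak_dt implementation, and every name in it is made up:

function covered(actor, count, vv) {           // is the dot <= the version vector?
  return (vv[actor] || 0) >= count;
}

function mergeVV(a, b) {                       // pointwise maximum of two version vectors
  var out = Object.assign({}, a);
  Object.keys(b).forEach(function (actor) {
    out[actor] = Math.max(out[actor] || 0, b[actor]);
  });
  return out;
}

function mergeOrswot(A, B) {
  var elements = {};
  function keep(name, actor, count) {
    elements[name] = elements[name] || {};
    elements[name][actor] = count;
  }
  function mergeSide(from, other) {
    Object.keys(from.elements).forEach(function (name) {
      var dots = from.elements[name];
      var otherDots = other.elements[name];
      Object.keys(dots).forEach(function (actor) {
        if (otherDots && otherDots[actor] === dots[actor]) {
          keep(name, actor, dots[actor]);      // dot seen by both sides
        } else if (!covered(actor, dots[actor], other.vv)) {
          keep(name, actor, dots[actor]);      // the other side has not seen this add yet
        }
        // Otherwise the other side has seen the add but no longer holds the
        // dot, meaning the element was removed there, so the dot is dropped.
      });
    });
  }
  mergeSide(A, B);
  mergeSide(B, A);
  return { vv: mergeVV(A.vv, B.vv), elements: elements };
}

// The worked example from the article:
var A = { vv: { x: 1, y: 2 },
          elements: { Data1: { x: 1 }, Data2: { y: 1 }, Data3: { y: 2 } } };
var B = { vv: { x: 1, y: 1, z: 2 },
          elements: { Data2: { y: 1 }, Data3: { z: 1 }, Data4: { z: 2 } } };
console.log(mergeOrswot(A, B));
// Data1 is dropped; Data2 keeps {y:1}; Data3 gets the merged dot {y:2, z:1};
// Data4 keeps {z:2}; the merged version vector is {x:1, y:2, z:2}.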
Hopefully, this has given a taste
of how the ORSWOT state-based
CRDT works. For more details, I
recommend the riak_dt GitHub
page, which contains a full Erlang
implementation of the algorithm.

Conclusions

Given the probabilistic nature of


eventually consistent systems
(even those that use CRDTs), one
of our main insights is that it's important that development teams
really understand their use cases.
We approached this by building many proofs of concept so
that we could demonstrate the
different failure scenarios and
see where eventual consistency
could safely be applied to our
business.
Talking about something theoretically is one thing; showing exactly how this failure or that failure could happen is something
else altogether. In R&D, we take a
very activist attitude, and rely on
proofs of concept to make sure
things work as expected and to
demonstrate trade-offs rather
than pore over the theoretical
aspects.

It's natural for development


teams to resist radical change, to
be cautious and stick to what is
known. But if an R&D team like
ours wants to lead the market
then we must use technology to
get a business edge. To do this
we need to innovate and take
calculated risks; it's crucial to think differently and, most importantly, we must bring the
wider development teams along
with the change.
A lot of the marketing hype
about NoSQL suggests that you
can just implement it and it will
work straight out of the box. Our experience is that if you're solving large and interesting real-world problems and/or dealing with existing systems, this is
rarely the case.


Some of the techniques that we


now use to structure our data are very alien (if we applied them to our SQL databases we would get fired) but they work well
in a NoSQL environment. Sometimes you really have to think in
a totally different way about how
you structure the system and,
sometimes, non-standard approaches work.

The environment in which we


operate has been changing rapidly and the move to distributed
systems and technologies like
Erlang and Riak NoSQL has been progressing concurrently,
so our developers have had to
deal with multiple changes in the
way they work.


Read online on InfoQ

Virtual Panel:
Current State of NoSQL Databases

by Srini Penchikala

THE PANELISTS
Seema Jethani was until recently the director of product management at Basho Technologies for Basho's flagship products, the distributed NoSQL databases Riak KV and Riak TS. Prior to
joining Basho, she held product management and strategy positions at Dell, Enstratius, and IBM.
She can be found on Twitter as @seemaj.

Perry Krug is a principal solutions architect and customer advocate for Couchbase. Perry has worked with hundreds of users and companies to deploy and maintain Couchbase's NoSQL
database technology. He has over 10 years of experience in high-performance caching and
database systems.

Jim Webber is chief scientist with Neo Technology, the company behind the popular open-source graph database Neo4j, where he works on graph-database server technology and writes
open-source software. Jim is interested in using big graphs like the Web for building distributed
systems, which led him to co-write the book REST in Practice. He previously wrote Developing
Enterprise Web Services: An Architect's Guide.

Tim Berglund is a teacher, author, and technology leader with DataStax, where he serves as the
director of training. He can frequently be found speaking at conferences in the United States and
all over the world. He is the co-presenter of various O'Reilly training videos on topics ranging from Git to distributed systems, and is the author of Gradle Beyond the Basics. He tweets as @tlberglund, blogs very occasionally, and lives in Littleton, Colo. with his wife and their youngest
child.


NoSQL databases have been around for several years and have become
the preferred choice of data storage for managing semi-structured and
unstructured data.
These databases offer a lot of advantages in terms of linear scalability and better performance
for both data writes and reads.
With the emergence of Internet of Things (IoT) devices and
sensors and their generation of
time-series data, it's important to take a look at the current state of NoSQL databases and learn about what's happening now and what's coming up in the future for these databases.
InfoQ spoke with four panelists
from different NoSQL-database
organizations to get different
perspectives on the current state
of NoSQL databases.

InfoQ: NoSQL databases have


been around now for more
than 10 years. What is the
state of NoSQL databases in
terms of industry adoption?
Seema Jethani: We are at the
point where every industry has
some NoSQL deployment. Web
scale, social, and mobile apps
drove the first wave of adoption;
IoT will drive the next big wave to
mass adoption.
Perry Krug: We have typically looked at NoSQL adoption
as taking place in three broad
phases. Phase one refers to grassroots developer adoption. Organizations are typically trying out
and/or deploying NoSQL under
non-mission-critical apps if in
production at all. Phase two refers to broader adoption where
NoSQL is playing a much stronger
role for mission/business-critical
applications but is not yet a standard part of the organization's portfolio. Phase three signifies a
strategic initiative in an organization and broad re-platforming
to make NoSQL a standard within their organization. Depending
on the organization, Phase three
may see exclusive use of NoSQL
or simply a well-understood
balance between NoSQL and
RDBMSs.
Our view of the industry has been
that organizations move through
these phases at their own paces.
Companies like Google, Facebook, PayPal, LinkedIn, etc. have
obviously been in phase three
for many years now, whereas
other companies (without naming names) are still progressing
through phases one and two.
Overall, there is no denying that
RDBMSs still hold the vast majority of market share, but they are
growing at a much slower rate
than NoSQL. This rate is driven in
part by the relative size between
the two, but also by the fact that
the need for NoSQL is growing at
a much faster rate.
Jim Webber: Broadly, I'd say that
NoSQL databases have moved
from a position of curious technology for early adopters and
Web giants into a category that
is quite accepted, at least by the early majority. That's compounded by the presence of many
NoSQL databases (of all flavors)
in the top 20 of the DB-Engines
rankings of database popularity.
Anecdotally, it feels like NoSQL
is well known in the developer
and OSS community and its related applications like big data
are somewhat understood by the
business community.

Tim Berglund: Even five years


ago, NoSQL was cool. It was
something worth talking about
at a conference all by itself; I had
a very popular talk back then
called NoSQL Smackdown that
compared a few popular products. That talk could fill rooms
just because of the buzzword.
Today, it is still the case that
NoSQL adoption is interesting
to developers but it has become
more commonplace. Developers who work at companies in
seemingly ordinary industries
like finance, retail, and hospitality are building real systems using
non-relational databases. Corporate IT decision makers no longer
need to be particularly progressive to commit to NoSQL. We
have a long way to go before the
category reaches maturity and
competes on a completely level technology-selection playing
field with relational databases,
but the trend line seems obvious
to me.

InfoQ: What are some of the


best practices for data modeling in NoSQL database projects?
Jethani:
• Denormalize all the things: using denormalization, one can group all data that is needed to process a query in one place. This speeds up queries.
• Deterministic materialized keys: combine keys into composites and fetch data deterministically so that you can avoid searching for data (see the short sketch after this list).
• Application-side joins: as joins are not universally supported in NoSQL solutions, joins need to be handled at design time.
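To make the first two points concrete, a tiny, product-neutral sketch (every name here, including the store API, is made up):

// A deterministic composite key: given the same inputs you always know the
// key, so the record is fetched directly instead of searched for.
function orderKey(customerId, orderId) {
  return 'order:' + customerId + ':' + orderId;
}

// A denormalized record: everything needed to answer "show this order"
// lives in one value, so the read needs no joins.
var orderRecord = {
  customer: { id: 'c42', name: 'Ada' },   // copied in from the customer record
  items: [{ sku: 'rover-cam', qty: 1 }],
  total: 99.0
};

store.put(orderKey('c42', 'o1001'), orderRecord);  // store: a hypothetical key-value client
var order = store.get(orderKey('c42', 'o1001'));   // one deterministic lookup, no searching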
Krug: The flexibility of data modeling and management is one of
the more important driving factors for NoSQL adoption (the other being the need for operationability in terms of performance,
scale, availability, etc.). When
talking about data modeling, the
overarching best practice is to allow a much closer alignment of
data model/structure between
the application's objects and the
database. The idea of an ORM
layer, which involves taking the
application's objects and breaking them out into rigid rows and
tables and then joining those
back together, is quickly eroding.
Webber: That's rather a broad
term given the range of data
models supported under the
NoSQL umbrella! In some data
models, the key is to understand
how to denormalize your data
into keys and values, columns, or
documents, including any necessary user-level tricks to make it
perform, then to ponder how
to support that model by indexing and so on. In graphs, which
is my area of expertise, modeling
is rather different but altogether
more pleasant because of the
data model (nodes, relationships,
and labels) and the processing
model (graph traversal).
In a native graph database
like Neo4j, the engine natively
supports joins. These aren't set joins as we're used to in relational databases, but the ability to
efficiently reconcile two related
records based on a relationship
between them. Because of that
join performance (many millions
of joins per second, even on my
laptop), we can traverse large
graphs very quickly. In Neo4j,


such joins are implemented as


the ability to traverse relationships between nodes in a graph
by cheap pointer chasing. This
is an aspect of native graph databases known as index free adjacency that allows O(n) cost for
accessing n graph elements as
opposed to O(n log n) or worse
for non-native graph tech.
Then, the power to traverse
large connected data structures
cheaply and quickly actually
drives modeling. Given a typical
domain model of circles and lines
on a whiteboard, we find that it is
often the same as the data model
in the database: what you draw
is what you store. Further, as we
expand the questions we want to
ask of that data, it leads us to add
more relationships and layers,
expand subgraphs, and refine
names and properties. We think
of this as being query-driven
modeling (QDD, if you will). This
means the data model is open
to domain experts rather than
just database specialists and
supports high-quality collaboration for modeling and evolution.
Graph modeling gives us such
freedom: draw what you store,
with a small set of affordances for
making sure you're mechanically
sympathetic to your underlying
stack.
Berglund: If we were to confine
our discussion to what I have
traditionally seen as the heavies (Cassandra, MongoDB, and Neo4j), we can see that there is no one answer to this question. Each of these databases is
as different from each other as
they are from relational databases. So there really is no one approach to NoSQL data modeling;
instead, we must approach each
database on its own terms and
learn data-modeling techniques
appropriate to it.

At present, my work is dedicated to producing educational resources for Cassandra. Some Cassandra applications, like storing
time-series data, have specific
data models that work best and
can be learned as canned solutions. More general business-domain modeling, such as is intuitive for many developers using
a relational database, requires a
specific methodology that differs
from the received relational tradition. At one point, we were all
new to relational data modeling,
and we had to learn how to do it
well. It's the same thing with the
NoSQL databases. Someone has
to sit us down and explain to us
how they work and how best to
represent the world using the
data models they expose.

InfoQ: What should developers


choose when there is a conflict between data-modeling
requirements that call for a specific NoSQL database (for example, a document store) but
performance requirements
may require a different type
of database (like a key-value
store)?
Jethani: Developers should
always choose performance.
Data models can be modified to
meet the needs of the use case.
Granted, additional work may be
needed in the application but
you cant get performance out of
thin air. Always try to get better
performance.
Krug: In my opinion, document
and key-value are too similar to
see as options for this sort of decision. A better example would
be to compare key-value/document versus graph versus columnar....


This is certainly one of the major


challenges facing developers and
architects. In one sense, there is a
growing convergence of capabilities and fitness between these
different types, with some vendors providing multi-model or
just expanding the features and
functionality of one type so that
it can handle more and more use
cases. On the other hand, the difference between these different
types of technologies is rooted
in the idea that NoSQL is not just
another sledgehammer for every nail in the way that RDBMSs
are. It's a double-edged sword
of added choice and complexity coupled with being able to
choose the right tool for the job.
In the not-too-distant future, we
expect to see more and more
consolidation of these types of
technologies so that customers
don't necessarily have to choose
between such widely different
choices but can still tune an individual system to meet the needs
of their different applications
(without getting too far down
the line of not being good at anything).
Webber: I've been pondering
this a lot lately as various graph
libraries have appeared on top
of non-native graph databases.
I think it's important to understand what your database is native for and what it is non-native
for. For example, Neo4j is native
for graphs, and is optimized for
graph traversals (those cheap
joins via pointer chasing) and
ACID transaction for writes. Layering a (linked) document store
on top of Neo4j could make
sense because linked documents
can benefit from the graph model; the two compose well, as
systems like Structr demonstrate.
The reverse isn't true, though.
If you have, say, a document or
column database that doesn't

understand joins, then grafting


relationships onto them (in the
application or via a graph-wrapper library) is going against
the grain. The joins that Neo4j
natively eats for breakfast via
low-level pointer chasing have to
be done at the application or library level. Retrieve a document
over the network, examine its
content, resolve another address,
and retrieve the document over
the network, and so on. This places substantial practical limits on
efficiency and traversal performance of non-native approaches.
For balance, I'd point out that Neo4j isn't, for example, a native
time-series database. If you want
to do time series in Neo4j, you
probably end up encoding a time
tree (a kind of indexing pattern
that looks somewhat like a B+
tree) into your model and explicitly querying against that tree.
A native time-series database
would automate much of that
work for you. So the only time
you'd choose Neo4j as a non-native time-series database is when
you want to mix in other (graph)
data to accompany the time
points (e.g. transaction history,
geospatial, etc.). At that point,
you tip the balance and choose
graph even though it isn't native
for one of your dimensions.
Berglund: Performance requirements always have to win. You
can't turn a failing latency SLA
into a success if you are asking
the underlying database to outdo its best-case performance.
You can, however, bend a Cassandra data model (for example)
to support document-like storage and access patterns if you
need to. This is not to say that
there aren't systems that are best
represented in a document store,
but ultimately even a tabular
data model can represent any real-world state of affairs, however

inelegantly in some corner cases.


Performance is not as flexible.

InfoQ: Can you discuss some


of the tools that will help
improve the developer productivity when working on
NoSQL-based applications?
Jethani: There are two key aspects of developer productivity
that need to be addressed: ease of
use and features that allow them
to easily do powerful things with
the database. Ease of use can be
enhanced by providing out-of-the-box clustering, which would
save developers valuable time
during the on-boarding process.
Rich features such as support for
higher-level languages and client libraries in various languages
enable developers to run complex queries without having to
do a lot of heavy lifting in the application. Finally, tracing and debugging tools allow developers
to quickly identify the root cause
of a problem, freeing them from
time spent debugging.
Krug: This is definitely an area
that is both lacking today as
well as rapidly growing. From a
deployment and provisioning
perspective, there is fairly good
standardization across technologies and integration into the
common tool chains. However,
for the most part, each technology currently provides a siloed set
of tools for their own developers.
I think some degree of standardization of language/API across a
few different technologies will
be very interesting to watch out
for over the next few years. Its a
pretty big unknown at this point
how soon that will happen, if at
all.
Combined with that, there are a
host of new languages (e.g. Node.js, Go) that are changing the way


applications are designed, developed, and deployed.
The most useful tools we see out
there today are around reference
architectures/implementations
that can provide copy/paste examples to build upon. This also
includes training and hands-on
workshop-style engagements
from the experts in each technology.
Webber: It's clear to me that the
relational databases are more
mature in their integration with
developer tooling than the
NoSQL databases; that's just a
function of time. But that is rapidly changing as the NoSQL market
shakes out and the database and
tooling vendors begin to consolidate around a small number of
front-runners, supported by an
enthusiastic OSS community.
In Neo4j specifically, we've been
working hard over the last five
years to produce a very productive query language called Cypher that provides humane and
expedient access to the graph.
That language is now in the early stages of standardization as
openCypher, and will appear as
the API to other graph technology over time (e.g. there is an initiative to port Cypher to Spark).
In our recent 3.0 release, we
worked hard to make access to
the database boringly familiar.
We now have a binary RPC protocol that is used by idiomatic
drivers in several popular languages (C#, Java, Python, PHP, Javascript,...) that feels like the kind
of drivers that RDBMSs use.
That same network stack can
also be used to invoke server-side procedures written in any
JVM language. While initially I
thought procedures might be an


interesting footnote condemned


by a sorry history of stored procs
to irrelevance, it turns out they're
actually amazing. Neo4j procedures are just code. That code
can be rigorously tested in your
IDE (TDD'd even) way before deployment. Brilliantly, while we
intended those procedures to
be used for iterative graph algorithms, we now see theyre being used to bring in data from
other systems (including nongraph databases, web services,
and even spreadsheets) and mix
that data into the same Cypher
queries that operate on the local
graph. The productivity this enables is simply amazing.
Atop all of that, we and others in
the graph world are busy working on visualizations for graphs
so that non-experts can interact
with the model. We saw how this
played out recently where the
Panama Papers were exposed
by a combination of Neo4j for
graph query and Linkurious for
visualization. In working with sophisticated connected data sets,
this kind of tooling is becoming
increasingly important to developers, too.
Berglund: Depending on your
tooling preference, either NoSQL
looks like a tooling wasteland
or looks just fine. If you want a
simple visual data-model exploration tool and command-line
query capability, we're pretty
much there today. Any of the major databases will give you that.
Most of them have connectors
in all of the major (and many of
the minor) data-integration tools
as well.
But if you want, say, round-trip visual-modeling support, you will
still be disappointed. I am hopeful the next five years will see this
tooling gap close for those databases for which it is appropriate.

InfoQ: What do you think


about multi-model databases?
What are the pros and cons
of the multi-model database
option versus polyglot persistence?
Jethani: Polyglot persistence
advocates using multiple databases to store data, based upon
the way individual applications
or components of a single application use the data, i.e., you
pick the right database for the
right use case. However, this
approach presents operational
and skills challenges. In contrast,
multi-model databases are designed to support multiple data
models against a single, integrated back end. These databases are
designed to offer the data-modeling advantages of polyglot persistence without the complexity
of operating disparate databases
and the need to be proficient in
multiple databases as opposed
to one.
Krug: I think I covered a little
bit of this above. Polyglot persistence is a sliding bar between
the right tool for the job and
too many tools! Multi-model is
also a sliding bar between good
for many things and not good
at anything. The legacy users of
NoSQL have generally preferred
polyglot because each individual
technology excelled at a limited
set of use cases and there was a
need for many of them underneath a single broad application.
However, newcomers to NoSQL
generally prefer a much smaller
set of technologies and are looking to leverage each of them for
a broader set of use cases.
Personally, I am fearful that
multi-model databases will end
up conflicting with their own
feature sets and will not be really good at anything. I have seen

The Current State of NoSQL Databases // eMag Issue 47 - Nov 2016

better results with a relatively


small set of one to three technologies that can handle all or
the majority of an organization's
needs. There will always be the
need for super-specialized technologies; that's not unique to databases.
Webber: In a world dominated by systems composed from (micro)services, I think that developers
choose the right database or databases for their service and then
compose those services to deliver functionality. As such, I think
that polyglot has strong credibility.
I also think the jury is out on
multi-model. Anecdotally, it's
hard to swallow that any single
database can be all things to all
people. We saw that in the era of
the RDBMS when we shoehorned
all things into the relational data
model. That lesson was what
spawned NoSQL!
But where my thinking is at, as I
mentioned earlier, is the notion
of a database being native for
something and non-native for
other things, and whether the
native model can be sympathetically composed into the non-native model. Graph happens to
be a good native position there
because it is the richest model; narrowing its affordances to other models is therefore plausible
(should you choose to do that).
Berglund: I have never been
too persuaded by polyglot
persistence as an architectural strategy. It may well emerge
in some particular system that
is composed of the integration
of several legacy systems, but
it's probably an anti-pattern for
green-field work. My reason for
this is partly operational and
partly due to design considerations. Operationally speaking,

it is more difficult to manage uptime and performance SLAs for


multiple complex pieces of software infrastructure than just one.
In terms of the code itself, it is
also difficult to juggle many data
models in a single project. The
value of the different models has
to exceed these two costs for it to
be a rational choice, and I think
this situation is rare.
That said, real use cases do exist for the different models. An
architect might prefer to model
part of her system as a graph, do
ad hoc SQL queries over another,
and meet extremely aggressive
performance SLAs with a part
of the system that can be modeled more simply. Multi-model is
a good solution to this problem,
since it answers the operational
challenge by putting all of the
models in the same database
and it has the potential of simplifying the API problem by making
the different models' interfaces
share as much API surface area
as possible. I think, in the future,
all of the major NoSQL databases will tend to share features of
one another's native models as much as they can. I'm excited to
see what the next decade brings
in this area.

InfoQ: Gartner says the leading database vendors will offer


multiple data models, relational and NoSQL, in a single
platform. What do you think
about this assessment?
Jethani: Some leading database
vendors offer NoSQL today with
little market traction. They are a
long way off from being able to
offer multi-model, RDBMS, and
NoSQL from a single platform.
And even if they do, there are
many functional and performance tradeoffs that will limit its attractiveness. For now, we
see the world as moving towards
multi-model NoSQL alongside
RDBMS.
Krug: I think this may be true
in terms of what vendors like
Oracle and IBM would like to
provide, but I think it is false in
terms of what the market really
wants/needs. There will certainly be some degree of overlap,
but there are fundamental design and architecture differences
between relational and NoSQL.
Simple laws of physics (not to
mention CAP) dictate that certain capabilities around transactions, replication, distribution,
etc. cannot be mixed between
the purest of needs of relational
and NoSQL. In the end, a single
vendor may provide multiple
choices, but I believe they will
have to be dealt with as very different products.
Webber: At this point, it seems
that Gartner is the main proponent of this message, unsurprisingly. While I see some databases
starting to offer multiple models,
I'm not totally impressed with the notion. Like I said, we tried the "all things to all people" approach with relational databases.
But pragmatically, I think it again
comes back to what your database is native for. If you're native for graph, you can probably offer a reasonable document view of your data. Conversely, if you're native for columns, it's difficult to deliver native graph performance when your underlying engine can't process joins.
On the multi-model versus
polyglot persistence, I wonder
whether the fault lines run along
CIO and delivery responsibilities.
As a CIO, I'd like to rationalize the
number of databases that I have
running my business. Whereas


as someone who builds and operates software, I've long since


grown used to using the right
tool for the right job (management permitting). It's obvious
which community Gartner addresses and at some point it
does make sense to play to your
crowd.
Berglund: I think that sentence
calls to mind the image of a developer API that may never materialize, but apart from that, I
think the broader trend is already
happening. Any non-relational
database for which a Spark integration exists already offers
a relational and non-relational
data model through SparkSQL.
After having initially discounted the importance of relational
databases' features for any new
database we create (as all NoSQL
advocates have done at some
point!), we tend to re-implement
the relational algebra on top of
that database over time. This lets
us explore the space of different
operational and performance
characteristics (e.g. elastic scalability, low-latency writes, schema-less data model, etc.) while
still retaining the utility of the
relational model over time.

InfoQ: Can you talk about


using NoSQL databases and
big-data technologies (like
Hadoop and Spark) together
to solve big-data problems?
Seema Jethani: NoSQL databases and other big-data technologies like Hadoop, Spark, and
Kafka are used to build various
data-analysis pipelines for large
data sets. One such example involves using Riak for short-term
operational data that is queried
or updated often, using Hadoop
for long-term storage as a data
warehouse, while using Spark for


ingestion, real-time analysis on


Riak, and batch analysis on Hadoop.
Krug: This goes a little bit to
the above comments around
polyglot persistence. Almost always, there has been a separation between technologies for
operational databases and those
for analytics databases. Even
if the same technology can be
used for both in some places, it
is usually deployed differently to
meet those different needs.
Big data is the very broad buzzword that encompasses both
NoSQL (operations) and the traditional big-data technologies
like Hadoop and Spark (analytics). For an entire application
(imagine Facebook or LinkedIn),
combining NoSQL and Hadoop
technologies is absolutely critical to meeting overall needs. The
idea of a lambda architecture
with data being handled both
in real time as well as in batch is
becoming fairly well established.
This can also be looked at in the
light of NoSQL versus RDBMS:
the designs, architectures, and
resource requirements of NoSQL
systems are very different from
Hadoop and batch-processing
systems. This is for good reason
as the goals of each are very different, and there is usually relatively little overlap between the
two. Spark starts to blur the lines
between batch and real time, but
it's still not an operational-database technology.
I expect we will continue to see
convergence between operational/online and batch/offline
technologies, but I expect that
there will always be a separation
of the two requirements within
an application.
Webber: That bifurcates easily in
Neo4j's world view. Neo4j is by far

the leading technology in graph


storage and query, but thats
only half the story. The other half
is graph compute and the leader
there is clearly Spark. In numerous use cases, we see Neo4j as
the repository of the authoritative graph, feeding downstream
systems and running graph query workloads. But we also see
graph-processing infrastructure
like Spark taking projected subgraphs from Neo4j, parallel-processing them, and returning the
results to the graph, enriching
the model in a virtuous cycle.
Berglund: First, it's important to note that NoSQL doesn't always
mean scale. Some NoSQL databases choose to innovate in performance and data model while
not fundamentally scaling any
differently than relational databases. However, for those NoSQL
databases that also belong to
the big-data category, integration with distributed computation tools is a key architectural
differentiator. In particular, integrating Spark with databases like
Cassandra or Riak adds the flexibility of ad hoc analysis on top of
data models that otherwise do
not support ad hoc queries very
well. This architecture offers the
promise of doing analytics on
top of an operational data store
with zero ETL between the two
systems. This is a new approach
that architects are just starting
to build out, but it's a successful
approach that will win over traditional analytics systems at least
some of the time.

InfoQ: Microservices are getting a lot of attention recently


for developing modular and
scalable enterprise applications. Can you talk about how
microservices can work with
NoSQL databases?


Jethani: There are many different ways microservices can work


with NoSQL databases. At one
end of the spectrum, each service
may have its own database instance. This is operationally challenging and not a recommended
approach. A use-case-oriented
approach, in which services that
address a particular problem
share a database cluster, is a
better fit with the microservice
architecture. Riak is very popular
and a good fit for solutions that
use microservices architecture.
Specifically, as Riak is masterless, with each node running the same code, a Riak cluster
can be scaled horizontally up
and down without dependency on other services. Riak has
well-defined HTTP and PBC APIs
for data reads, writes, updates,
and searches, provides a REST
endpoint for remote monitoring
and has a ready-to-use Ansible
playbook and Chef recipe for automated deployment and configuration.

Krug: Microservices work great with NoSQL databases. The main advantage that NoSQL provides (usually, depending on the vendor) is the ease of setting up and running many small instances/clusters while still allowing each of those to scale quickly as the needs of the application/microservice grow. From a hardware, cost, and setup perspective this is less true with RDBMSs, but in theory it could be.

Webber: In general, it makes sense for each microservice to use the best database for its own case (if it needs one). But managing systems composed from microservices is itself a demanding challenge. Not only are distributed systems hard from a computing-science point of view (in particular handling failures) but managing the evolution of a network of many mutually dependent microservices demands tooling.

Fortunately, Neo4j is well known for microservices management where we consider the system as a whole to be a graph of interacting services. There's an excellent video from our recent GraphConnect conference where the folks from Lending Club talk through their approach to managing their microservice estate with Neo4j.

Having a graph view of your system enables you to have a predictive and reactive analysis of faults and contention. You can ascribe value to points of failure and reason about their costs and roll all of this up to the end user. You can locate single points of failure, too, and you can keep your model up to date with live monitoring data that then gives an empirically tempered view of the dependability of your whole system and the risks to which it is subjected.

And if you're feeling particularly plucky (as was one large telco I talked to a couple of years back), you can think about making the graph the authoritative description of your microservices deployment. As a side effect of traversing the graph, you can create deployment scripts that actually build your system atop your PaaS: graph as the ultimate configuration-management database.

Berglund: Relational databases grew up in a world in which a single schema supported a number of small client/server applications. NoSQL databases grew up in a world in which a single large website was served by one database. Features like programmatic transactional boundaries and granular security, which are often lacking or immature in NoSQL databases, are less important in the latter architectural form.

But as NoSQL adoption moves from large and rare web properties to more commonplace corporate IT applications, the single-application architectural assumption is less likely to hold: many different applications and services inside the company may want access to the data in the NoSQL database.

Microservices are an excellent solution to this problem. They allow the architect to stand up a single piece of code to talk to the database, which holds to the original assumption under which NoSQL databases were designed, yet also to make that service available to other consumers in the corporate IT application stack. If microservices are adopted enterprise-wide, this method of integration becomes the native approach, and internal tooling and expertise grow up around it. The missing features that the databases of the 1990s gave us seem less important under the new paradigm.

InfoQ: Container technologies provide the mechanism


to deploy software applications in isolated deployment
environments. What are the
advantages and limitations of
running NoSQL databases in a
container like Docker?
Jethani: Containers are easy to
set up and their lightweight nature allows more efficient use
of hardware. At the same time,
challenges around discovery,
networking, and ephemeral storage remain:
• Discovery: It's possible to stand up multiple clusters inside of Docker containers. But how do we connect to the one we need? How do we keep track of which container holds the right cluster? There are tools to help with this, like Weave, but the issue of discovering the host:port to connect to can be a problem.
• Ephemeral data storage: Unless you take operational care to start a database cluster on the same node and using the same data directory, you'll get a fresh cluster. In some cases, this is exactly what you want. But not all. This is especially problematic in cloud environments where nodes could flap often. You don't want to pay the penalty of convergence and re-allocating data partitions.
• Networking: Internal Docker IPs that the database binds to are not necessarily accessible outside the Docker daemon. Thus, when the database must respond to a client with a coverage plan that indicates which nodes the data resides on, it needs to supply addresses that the client will understand.
Krug: While running databases
in containers is still fairly nascent,
I think that it holds a lot of promise and will quickly become well
adopted. The advantages for
NoSQL+Docker are essentially
the same for anything+Docker: removing the performance
overhead of a hypervisor while
allowing for even more flexible
deployment than VMs provide
today.
In my opinion, there are a few
disadvantages, but they are more
factors of the maturity of running
these two technologies together
rather than inherent limitations
between them:


1. Security and resource segregation is a big one, but will be resolved through technology improvements and best practices.
2. At the moment, containers
are typically seen as being
very stateless whereas databases tend to want persistent
storage. This is also something that is being improved
upon at the container level.
Jim Webber: Neo4j, like most
NoSQL databases, happily lives
in a container (or indeed in other
virtualization schemes). In fact,
there is an official Docker image
supported by both Neo4j and
Docker here.
The only real issue with virtualization of databases is the uncertainty about the behavior of your
neighbors. Databases (including
Neo4j) love RAM, and failing that
they appreciate a fast, uncontended channel to disk. If there is
contention for either of those because of unpredictable or greedy
neighbors, then the performance
of your instance will suffer. But if
you have a good handle on that,
everything should be okay.
Berglund: Again, it helps to
think of the distributed NoSQL
databases here. These databases
expect to be deployed to many
servers, and their clusters may
potentially be scaled up and
down elastically.
The advantages of deployment
automation and immutable infrastructure apply to any computer program you might want
to deploy, and databases are no
exception. However, Docker is
probably slightly less valuable
for deploying a NoSQL database
than it is in deploying individual
instances of, say, a given microservice. The case for containers in
a microservice architecture relies

on the fact that the deployed


image changes as often as you
change the code. Ideally, the deployed image of each database
node does not change nearly as
often.
This is not to say that Docker is
the wrong idea when it comes
to NoSQL. If you're already using
Docker elsewhere in your system, it might be smart to include
your database in the fun, but
NoSQL by itself will probably not
convince you to switch to a container-based approach.

InfoQ: What is the current security and monitoring support


in NoSQL databases?
Jethani: Monitoring is often provided through integration with
monitoring providers such as
New Relic, Datadog, etc. or set up
in house using Nagios, for example, both using metrics gathered
and provided by the database.
Security features such as authentication, data governance, and
encryption are generally provided out of the box. Each database
provides varying degrees of support for each.
Riak, for example, supports access control and authentication with support for multiple
auth sources as well as group/
user roles. They can be audited
and we have access logging. For
monitoring, we have calculated and raw stats available from
command line and HTTP interfaces. Enterprise customers also
get access to JMX monitoring
and SNMP stats and traps.
Krug: These two should really
be separated or maybe clarified
further. Whereas monitoring has
always been a critical part of running and managing NoSQL databases, security has not been until more recently. That's not to say that some technologies didn't
provide better or worse capabilities in each area, but monitoring
has been a topic of discussion
and improvement for much longer and I think is in a much better
state overall.
I think security is the more interesting topic to talk about. The
early adopters of NoSQL technology didn't place a high value or
have a great need for very robust
security capabilities. When faced
with an endless list of possible
features/improvements, the creators of NoSQL technologies followed what was most important
to their consumers. Over the last
few years, that level of value/importance on security has shifted
directly in line with the kinds of
applications and organizations
adopting NoSQL and the creators of those technologies have
followed suit.
In my opinion, it is not valid to
compare NoSQL to RDBMSs in
terms of security. RDBMSs have
had 30 to 40 years of history to
build those features. Looking
back into their history will show
that they made similar use-case-driven decisions to those NoSQL
has made in its early years. I have
no doubt that security will play
a more and more important role
for NoSQL and that the leading
technologies will continue to
build the features that their users
require.
Webber: I actually don't know what the general case is, but I imagine it's reasonable since NoSQL databases underpin lots of production systems. In Neo4j's
case, we have long had security
and monitoring baked into the
product and have a team whose
entire responsibility is these kinds
of operability opportunities. In

future, we'll have LDAP, Kerberos,


and AD integration out of the
box (some of that code is already
visible on our GitHub repo, of
course), and refine our monitoring surface. I'd like to think we'll
also expose system-monitoring
things to client apps through our
binary protocol, too, since that
would make monitoring apps
just like normal apps.
Berglund: I can speak most readily about Cassandra in this area, since it's where I specialize. Open-source Cassandra has very basic security, and can be monitored through other open-source tools like Nagios. The commercial version, DataStax Enterprise, has more sophisticated features like integration with LDAP and Kerberos (rather than storing security credentials in the database itself), and has a custom-built management tool optimized for the needs of a production Cassandra cluster.
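To make that distinction concrete, here is a minimal sketch of what the basic, out-of-the-box security looks like from an application's point of view. It assumes a cluster configured with the built-in PasswordAuthenticator and the open-source Python driver (cassandra-driver); the host, role names, keyspace, and passwords are placeholders.

# Minimal sketch: connect to a Cassandra cluster that has the built-in
# PasswordAuthenticator enabled. Host, username, and password are placeholders.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth_provider = PlainTextAuthProvider(username="app_user", password="app_password")
cluster = Cluster(["127.0.0.1"], auth_provider=auth_provider)
session = cluster.connect()

# With CassandraAuthorizer enabled as well, roles and permissions are managed
# in CQL, for example (run as a superuser; names here are hypothetical):
#   CREATE ROLE analyst WITH PASSWORD = 'placeholder' AND LOGIN = true;
#   GRANT SELECT ON KEYSPACE metrics TO analyst;

print(session.execute("SELECT release_version FROM system.local").one())
cluster.shutdown()

Anything beyond this, such as the LDAP or Kerberos integration Berglund describes, is where the commercial tooling comes in.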

Performance requirements always have to win. You can't turn a failing latency SLA into a success if you are asking the underlying database to outdo its best-case performance.

InfoQ: What do you see as the new features and innovations coming up in the NoSQL space?
Jethani: An interesting area of research, for example, is how coordination may be avoided even in the case of concurrent transactions while still maintaining correctness, which would make transactions not only possible but well-performing in distributed databases. You can find the details of this research here.
Over the years, NoSQL databases have been closing the gap between the advantages of relational databases and the flexibility and scalability offered by NoSQL. Research and innovations such as the above make it possible for us to enjoy the best of both worlds.
Krug: It's a very hard topic to discuss at the broadest level of NoSQL since each technology is fairly different and evolving along a different path. I think we will see more and more features that attempt to address or mimic features found in RDBMSs. I don't think it is a good idea simply to copy, but we also can't deny that some applications do need the same type of feature in order to meet requirements (I'm thinking specifically about transactions here, but could easily expand that to others).
There will also be continuing expansion of, and overlap across, the different technologies, which will lead them to differentiate more on performance and reliability. I think this will also lead to a consolidation of technologies and vendors, with the vast majority of NoSQL databases (more than a hundred) fading away.
In terms of true innovation, I expect there will be more real-time analytical capabilities built into NoSQL and perhaps the emergence of a standard language or API.
Webber: My first job out of grad
school was in transaction processing (under the auspices of
InfoQ editor Mark Little no less).
That whole area became deeply
uncool with the advent of NoSQL
and the popularization of eventual consistency. But around
2012 that changed: researchers like Peter Bailis (HA transactions), Diego Ongaro (Raft),
Emin Gun Sirer (linear transactions), and many others started
to reconsider transactions and
consensus protocols in the light
of high-throughput, coordination-avoiding scalable systems.

This resurgence of interest in strong consistency for a highly available world is profoundly exciting, and I expect to see this thinking impact the NoSQL world generally. It has already impacted the way Neo4j works: it very much shapes some of the fault-tolerance and scale aspects of our future product roadmap. Some days I can't believe my boss actually pays me to work on this stuff. Sucker!
Berglund: There are a few things I expect to see: better support for real-time analytics over operational data in horizontally scalable databases, improved tooling, and more and more of the relational algebra available natively.

Some panelists provided additional comments on NoSQL databases.

Krug: At a very high level, NoSQL is really about providing two things: data/development flexibility, and better operability (performance, scale, HA, etc.). This panel seemed to focus more on the first one, which is primarily aimed at developers, but didn't spend as much time on the ops discussion, which is critical for an organization or application to rely upon. Historically, different technologies in the NoSQL space have also focused on one or the other, but customers are demanding both. The choices being made by organizations, whether to use NoSQL at all and which NoSQL technologies to choose, need to take both of these into consideration.
There is also the growing importance of mobile computing and applications. The processing power at the edge is rapidly increasing, as is the need for applications to work in an offline or semi-connected fashion. Whether or not NoSQL databases will play a role on the mobile device, and what that role might be, is an interesting discussion.

Webber: Databases remain an exciting field in which to be involved. But I wonder for how much longer we'll keep the NoSQL umbrella. Column, KV, document, and graph all have their own strong identities now and it'll be interesting to see how those categories forge ahead. It's interesting times indeed.
