Architecting the Event-Driven Future:
Building a Scalable Stream Analytics Platform
Mark Madsen
Third Nature Inc.
TABLE OF CONTENTS
Executive summary
The Value of Being Reactive
Stream Analytics Applications Are Different and More Difficult
Building a Stream Analytics Application
Operational Requirements
How Realtime Stream Analytics Applications Have Been Built
Using the BI environment
Integrating Products to Build a Complete Application
Custom Application Development
Evolving to a Realtime Stream Analytics Platform
Key Takeaways
About Mark Madsen
About Third Nature
Executive summary
Stream analytics applications are not like traditional transaction processing applications (OLTP) or business intelligence (BI) tools, both of which are human-initiated, on-demand architectures. Event-driven ecosystems operate continuously, with people acting to govern and coordinate tasks and activities, rather than initiate them.
Since initial use cases for stream analytics applications are often very focused, building a first application
is often deceptively simple. However, as applications become more sophisticated (and useful), complexity
skyrockets as developers add features and attempt to correlate streams. Stream analytics also place a
burden on operations, which is often unprepared for the type of monitoring and management these
applications require. To escape this complexity spiral, development and operations must abstract away
the repetitive, low-value work through use of a platform.
Key findings include:
- Stream analytics platforms put insights directly in the hands of business users, bypassing scarce, expensive gatekeepers, but those solutions can be brittle, slow to evolve, and expensive to modify.
- The speed of data flow and the emergence of non-human consumption models have fundamentally changed application development.
- The ongoing need for caching and persistence of context data adds complexity to realtime stream analytics applications.
- To mitigate increasing code complexity and maintenance costs, developers should build their efforts around a realtime platform that encapsulates repeatable, reusable elements.
- This platform will not address all needs. Many realtime use cases are simpler, requiring a different toolset.
The Value of Being Reactive
The last major technology generation (BI, data warehouses and analytics) enabled organizations to see and analyze across processes. It created a capability to see and understand what had happened, and to monitor the business daily or even hourly. It improved on the older style of siloed views from single-department application reporting.
The design of these systems presumed that humans would be the endpoint of data delivery. That is no
longer the case, as work moves into the network and enables processes to be automated and managed by,
rather than driven by, people. The speed of processing and networks enabled a shift in how we can
approach the management and execution of business processes.
Relatively recent additions to the technology stack are streaming databases and similar realtime engines, which process realtime, record-based data as it flows. Today we are seeing the development of
realtime application platforms that permit parallel processing over many different types of data. This
allows algorithms to be placed in the flow of data and connected back to the sources of that data.
The new model for applications is stream monitoring, detection and actuation. It's event-driven rather than triggered purely on demand, and it creates a new set of technical requirements. The true challenge of stream analytics is latency, not data volume. Scalability is much less of a problem in the software platforms that support realtime stream analytics applications. The IT market is evolving, reaching a stage where a rethinking of application design and infrastructure is mandated.
anomaly. This sort of information is checked to see what is happening at the moment or to look at
the context surrounding a problem one has been alerted to.
- The information people look at when there's an exception. If an alert is sent, or someone spots a problem on a dashboard, it's necessary to look at the context surrounding the event. This is more detailed data than is displayed on a dashboard, and possibly different data that can be used as a sanity check to avoid reacting to a spurious alert.
The primary purpose of this information is to help isolate and diagnose a problem. For example, comparing the stream of events that generated an alert to other streams of events might indicate a broader problem. It might be useful to replay events that occurred before and after the alert, and similar alerts in the past, to spot a pattern. This older, more detailed or different data is likely not going to be seen except when there is a reason to seek it out.
- The information people never see. Some stream analytics processes are managed entirely by the system with little to no human intervention. In these processes, the application monitors data continuously in realtime, executes code that generates new data or events from the stream, and then takes action or sends data to another application. An example of this would be a recommendation system on a retail website, or a next-best-action system in a bank's call center.
The information used to monitor a stream analytics process can change frequently during the
building of an application. At first, little is known about the patterns and variability of the process,
so the information used is narrow. People start by monitoring basic data, devise key performance
indicators (KPIs) and adjust the process to reduce variation, after which they refine their metrics.
Over time, dealing with some problems becomes routine, at which point the actions taken can be
programmed into the system. They might be fully automated, or more likely, set up so that a person can
approve a system-recommended action. In this manner, actions that were once human-directed are
replaced by programmed and automated responses. Eventually, the application evolves to a more stable state, and the rate of change in both the data required and the actions to take slows.
This style of building and refining realtime applications requires tools to make the application easy to
change. It should be easy and fast to adjust data sources, both to add new data and metrics and to remove
metrics that are no longer useful. The ease of removing data and metrics is as important as the ease with
which they can be added. Otherwise, realtime data displays become cluttered with excess information,
making problems harder to see and diagnose.
Most traditional applications encounter difficulty with this cycle of change because they are designed with
an assumption of stable requirements, and because they are deployed in a hard-to-change system design.
They assume permanence. Stream analytics applications, by contrast, expect the business conditions
they're designed to address will change over time.
Stream analytics applications are not like traditional transaction processing applications (OLTP), nor are
they like business intelligence (BI) tools. They differ in several key areas:
- Due to the requirement to react as streaming events occur, they must live on the network, always on and processing the streams of data.
- Stream analytics applications need access to data as it arrives, but they also need access to past history. Without the historical context, they can't define baselines from which to detect deviations, determine new actions, or provide replays of what happened so a person can analyze the problem.
- Realtime stream analytics applications need to present continuous data to people visually for monitoring, as well as send events back into the network where other machines can act on them. This is a mix of features from both OLTP and BI applications, and no OLTP or BI stack can address this mix when it involves realtime data.
- Stream analytics applications tend to evolve over time, much like dashboards and reports in a BI system evolve as people learn how to use the new information they are given. Unlike BI, in which people use data documents authored in a tool, these are applications created by developers. That makes them harder to maintain over time as changes are required.
Current software architectures do not meet these requirements. Despite the fact that both OLTP and BI
stacks are often provided by a single vendor, they do not provide an integrated suite that can meet the
needs of realtime applications. As a result, businesses are required to build applications by assembling
and integrating a variety of products that perform some portion of the required functions, or by
developing custom code on top of basic infrastructure and tools.
Another piece that is rarely thought through at the beginning of realtime application development is the need for contextual data. Simple trending and display of realtime data doesn't require history. But the first question most users have is, "What do I compare this to? What's normal, and what are the exception thresholds?"
Comparing a trend to a baseline requires data that precedes current events. This might be weeks or
months of history in the case of cyclic or seasonal trends, or it might be a few minutes worth. In the latter
case, adding an event cache is sufficient. If more history is needed, then the system must be able to read
from files or a database and turn them back into a form consumable by stream processing, adding data
access and management components that were not originally in the design.
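As a sketch of the simpler case, an in-memory event cache over a short rolling window can supply the baseline for deviation detection. The class name, window size, and three-sigma threshold below are illustrative assumptions, not elements of any particular product:

```python
import statistics
import time
from collections import deque

class BaselineDetector:
    """Cache recent events in memory and flag values that deviate
    from the rolling baseline built over that window."""

    def __init__(self, window_seconds=300, threshold_sigmas=3.0):
        self.window_seconds = window_seconds
        self.threshold_sigmas = threshold_sigmas
        self.cache = deque()  # (timestamp, value) pairs, oldest first

    def observe(self, value, now=None):
        """Record a new event; return True if it deviates from the baseline."""
        now = time.time() if now is None else now
        # Evict events that have aged out of the baseline window.
        while self.cache and self.cache[0][0] < now - self.window_seconds:
            self.cache.popleft()
        is_anomaly = False
        if len(self.cache) >= 30:  # need enough history to call it a baseline
            values = [v for _, v in self.cache]
            mean = statistics.fmean(values)
            stdev = statistics.stdev(values)
            if stdev > 0 and abs(value - mean) > self.threshold_sigmas * stdev:
                is_anomaly = True
        self.cache.append((now, value))
        return is_anomaly
```

If the baseline must span weeks of seasonal history instead of minutes, the deque above would have to be replaced by reads from files or a database, which is exactly the added data access and management machinery described here.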
The need for external data goes beyond history. Most events carry narrow data payloads. This is fine for processing and feeding output to other computers. When there's a need to display data to a person, enrichment data is required so the values or labels are meaningful. This information is usually looked up in a database using identifiers in the event data.
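As an illustration, such a lookup is often wrapped in a cache so the database is not hit once per event. The in-memory SQLite table, column names, and cache size here are hypothetical:

```python
import sqlite3
from functools import lru_cache

# Hypothetical reference table mapping event identifiers to display labels.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])

@lru_cache(maxsize=10_000)
def product_name(product_id):
    """Look up the label for an identifier, caching the result."""
    row = conn.execute("SELECT name FROM products WHERE product_id = ?",
                       (product_id,)).fetchone()
    return row[0] if row else "unknown"

def enrich(event):
    """Attach a human-readable label to a narrow event payload."""
    return {**event, "product_name": product_name(event["product_id"])}

print(enrich({"product_id": 1, "quantity": 3}))
# -> {'product_id': 1, 'quantity': 3, 'product_name': 'Widget'}
```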
The bottom line is that even though a realtime application works with streams, it must also deal with
caching and persistence of context data. This adds complexity to the application.
Adding more applications drives a different sort of complexity. Because most applications start small and
grow, the design is generally bottom to top, from the event stream through custom code to the front end
(or the output of new events that flow to other systems).
Even a well-designed realtime application suffers from the single-purpose nature of its construction.
Addressing changes over time is difficult, as was discussed above, and their single-purpose nature limits
reusability of the code, data and interfaces.
Operational Requirements
One element that is rarely taken into consideration until too late is the operation of these applications in production. Operation of the system is most often handled by another group in the organization, and maintainability is not at the forefront of a developer's mind. The developer's most urgent work is to complete functionality.
A realtime system drives operational requirements that may not be supported. Many times, the monitoring of systems is not up to the task, because a realtime system must be monitored in realtime as well. The health of the data streams, the data processing execution, the output and the user applications must all be logged and monitored. If one of these applications should fail, the effects will likely be felt much more quickly than in traditional systems.
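A minimal sketch of what monitoring the streams themselves involves is a per-stream liveness check that flags feeds that have gone silent. The class, stream names, and silence threshold are assumptions for illustration only:

```python
import time

class StreamHealthMonitor:
    """Track the last-seen timestamp of each stream and report streams
    that have gone silent, so a stalled feed is detected in realtime
    rather than discovered later from its downstream effects."""

    def __init__(self, max_silence_seconds=30):
        self.max_silence = max_silence_seconds
        self.last_seen = {}  # stream name -> last event timestamp

    def record_event(self, stream_name, now=None):
        self.last_seen[stream_name] = time.time() if now is None else now

    def silent_streams(self, now=None):
        """Return the streams with no events inside the silence threshold."""
        now = time.time() if now is None else now
        return [name for name, ts in self.last_seen.items()
                if now - ts > self.max_silence]
```

In practice the same idea has to cover processing stages and outputs as well as inputs, which is part of why the operational burden is larger than for batch systems.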
Building complex applications like these is challenging. Building platform elements to enable reusability and easy changes to data is harder, yet this is what organizations really need. The desire is ease of use for developers so they can focus on the hard but valuable parts: the metric calculations, algorithms, detection and correlation that provide tangible business value.
When realtime applications are custom-assembled or built without abstraction over the base layers of technology, the ability to iterate quickly decreases as the applications grow more complex and as more applications are added. If you take away the repetitive, reusable, boring code components that every application needs, the cycle time for both development and maintenance will be shorter.
!
aggregate value of abandoned carts across all shoppers can easily add up to a significant revenue loss in
2.5 minutes.
In a data warehouse, reducing latency isn't as easy as it sounds. The first problem is that the data models presume all the data will be loaded and consistent at a point in time. For example, all of the customer data will be loaded for all of the orders that have been loaded. With a low-latency feed, some parts of the model will have newer data and be inconsistent with other parts of the model, creating potential problems in the rest of the BI environment.
The second and more important challenge is the latency that a store-then-query architecture introduces.
Events are arriving in realtime. They must be formatted, cleaned and stored before they can be used. This
introduces delay between the time an event is visible and the time it can be queried.
There is added delay because a database stores data, but doesn't send data back out. An application must periodically query the database for new data, adding another delay. This repeated polling, in combination with the continuous loading of data, creates contention for resources and slows the database. Depending on the frequency and volume of streaming data, the database may have difficulty storing it quickly enough while simultaneously supporting indexing and queries over that data. The database can't easily be tuned for this because the workloads are at odds. This is one of the reasons OLTP and BI architectures diverged in the early 1990s.
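The combined delay of a store-then-query design can be bounded with simple arithmetic: an event can just miss a load cycle, and the freshly loaded row can then just miss a poll cycle. The interval figures below are hypothetical, chosen only to make the point concrete:

```python
def worst_case_latency(load_interval_s, poll_interval_s, processing_s):
    """Worst-case delay from an event occurring to the application seeing it:
    the event just misses a micro-batch load, then the loaded row just
    misses a polling query, plus per-batch formatting and cleaning time."""
    return load_interval_s + poll_interval_s + processing_s

# Hypothetical figures: loads every 60 s, the application polls every 30 s,
# and 5 s of formatting/cleaning per batch.
print(worst_case_latency(60, 30, 5))  # -> 95
```

Even with aggressive tuning of both intervals, the floor on latency remains the sum of the cycles, which is why this architecture cannot deliver true realtime behavior.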
There is a fundamental mismatch between a BI architecture and a stream analytics application. A data
warehouse puts data into a form suitable for query, while a realtime stream analytics application needs to
put data into a form suitable for monitoring, alerting and actuation. The goal is to create a system that
solves the problem of access to data without a delay, such that it can analyze, act on, and display it,
essentially automating the process from event to its effect in a continuous cycle.
Scalability in an application assembled from discrete products is also difficult. Many products were
designed to scale up with larger hardware, rather than scale out across multiple servers. This is a
weakness of many CEP servers. Additionally, data passing through multiple components and custom code
can add latency, and the resulting delay can increase the resources required to meet processing deadlines.
Messaging, middleware, caching, CEP and stream visualization products were not designed to form a
coherent platform for building realtime applications. They were designed to solve problems in specific
areas. Building realtime applications like this is an expensive route in both software and the time it takes
to integrate all the components into a reliable framework. We should be focusing our efforts on value-added work, not the integration of parts that are almost what is required for the task.
    On the other hand, although Linux was free, installing it was a royal pain that the vast majority of people had no desire to experience. The price of freedom included the privilege of choosing which Linux you would pick from dozens of different packages, called distros, and then attempting to install your choice on your hardware. This was made more interesting by the fact that although the core Linux operating system was usually (though not always) the same from distro to distro, the various Linux bundles often used different install procedures, had different user interfaces, looked for key files in different places, included different utilities, and so on, and so on.[1]
It's hard to be agile when you build and integrate all the components from scratch for an application. Just managing the versions of the many components is difficult, because they change and are updated at different times.
The challenge extends beyond building the application. Deployment, operation and maintenance are often not taken into account. Beyond the integration and configuration management of the application and its dependencies, there are elements like data and application security, how to scale the system up and down, and monitoring of the system when it's in production.
A realtime application is not like existing OLTP or BI systems, with their well-understood architectures and layers of infrastructure to address what an organization needs. The do-it-yourself coding of realtime applications is where we've been for the last ten years. The problem we face today is the complexity of building and deploying these applications.
[1] Merrill R. Chapman, In Search of Stupidity: Over Twenty Years of High Tech Marketing Disasters.
Evolving to a Realtime Stream Analytics Platform
We are at a stage in the market maturity of streaming application technology that is similar to the early days of application servers, or the early days of data integration platforms. In both of these examples, the market evolved from component assembly and custom coding by customers, to narrow single-purpose products that replaced some of the custom work, and then to integrated platforms that enabled higher-order systems to be built on top.
New technology gives us new tools for new workloads, and distributed stream analytics applications are
one of those growing workloads. There is enough new technology in use, and enough streaming data
available, that more organizations are seeing the value of realtime applications. This creates a market for
platform technologies to support that effort. A number of open source projects, as well as commercial products, have appeared in this space in the last few years.
This is still an early market, one that demands careful evaluation of the alternatives. Sometimes data every 15 minutes is good enough, so store-first, query-later models will meet the requirements. Sometimes data must be processed in realtime, but the processing is a simple algorithm in a pipeline and the output is never seen by a person. Any number of approaches might suffice for this. When the need is for realtime, continuous monitoring, event detection and alerting, or requires both machine processing and human coordination, it's likely that a platform would be of benefit.
The key to success is to align the speed of insights to the business goal. This means understanding the difference between business objectives that must be met with a realtime solution and those that need not be. Use BI for BI problems, use OLTP for OLTP problems, and use realtime application architectures for streaming data problems. Don't try to force one technology and architecture into another domain. They are different for a reason.
Key Takeaways
- The speed of data flow and the emergence of non-human consumption models have fundamentally changed application development.
- Even though a realtime stream analytics application primarily works with data-in-motion, it must also deal with caching and persistence of context data, or data-at-rest. This adds more complexity to the application.
- The way to escape the spiral of increasing code complexity and maintenance costs is to redesign around the idea of a realtime stream analytics platform that encapsulates repeatable, reusable elements and isolates highly changeable elements.
- There is no single silver bullet. Many realtime monitoring, event detection and alerting applications will benefit from an underlying platform, but many use cases are simpler, requiring a different toolset.