

Architecting the Event-Driven Future:
Building a Scalable Stream Analytics Platform

Mark Madsen
Third Nature Inc.


TABLE OF CONTENTS
Executive summary
The Value of Being Reactive
Stream Analytics Applications Are Different and More Difficult
Building a Stream Analytics Application
Operational Requirements
How Realtime Stream Analytics Applications Have Been Built
Using the BI environment
Integrating Products to Build a Complete Application
Custom Application Development
Evolving to a Realtime Stream Analytics Platform
Key Takeaways
About Mark Madsen
About Third Nature


Executive summary
Stream analytics applications are not like traditional transaction processing (OLTP) applications or
business intelligence (BI) tools, both of which are human-initiated, on-demand architectures.
Event-driven ecosystems operate continuously, with people acting to govern and coordinate tasks and
activities rather than initiate them.
Since initial use cases for stream analytics applications are often very focused, building a first
application is often deceptively simple. However, as applications become more sophisticated (and
useful), complexity skyrockets as developers add features and attempt to correlate streams. Stream
analytics also places a burden on operations, which is often unprepared for the type of monitoring
and management these applications require. To escape this complexity spiral, development and
operations must abstract away the repetitive, low-value work through the use of a platform.
Key findings include:
• Stream analytics platforms put insights directly in the hands of business users, bypassing
scarce, expensive gatekeepers, but those solutions can be brittle, slow to evolve, and expensive
to modify.
• The speed of data flow and the emergence of non-human consumption models have fundamentally
changed application development.
• The ongoing need for caching and persistence of context data adds complexity to realtime stream
analytics applications.
• To mitigate increasing code complexity and maintenance costs, developers should build their
efforts around a realtime platform that encapsulates repeatable, reusable elements.
• This platform will not address all needs. Many realtime use cases are simpler, requiring a
different toolset.


The Value of Being Reactive


Closed-loop decision making and realtime, continuous, stream analytics processes are a technical
necessity. Organizations that can keep up will outperform those that can't react quickly enough to
changing conditions. This requires technical capabilities beyond what is currently available in BI and
analytics products.
The flow of data has been accelerating for decades: from generated reports, to online-accessible,
on-demand databases, to the distributed client/server model of the 90s, to today's distributed,
asynchronous, realtime continuous processing. Along the way, we invented technologies to manage the
data. First were file-based batch mechanisms, which Hadoop has modernized into a reliable,
distributed file system that can be used by distributed processors. Then came databases, which gave
up some of the flexibility of files for the efficiency and performance of tables and more structure.
The two primary axes of data management are shown in the following figure. On one axis is static
data-at-rest, which must be stored before it can be queried or used, versus flowing data-in-motion,
which can be used off the network without physical storage. On the other is a range of explicit to
implicit structuring, from strict table-based databases to the free-form nature of file system-based
technologies.

Figure: Four Classes of Data Management and Processing Technologies, Designed for Intersections of
Latency, Structure, and Use

The last major technology generation (BI, data warehouses and analytics) enabled organizations to
see and analyze across processes. It created a capability to see and understand what had happened
and to monitor the business daily or even hourly. It improved the ability to react over the older
style of siloed views from single-department application reporting.
The design of these systems presumed that humans would be the endpoint of data delivery. That is no
longer the case, as work moves into the network and enables processes to be automated and managed by,
rather than driven by, people. The speed of processing and networks enabled a shift in how we can
approach the management and execution of business processes.
Relatively recent additions to the technology stack are streaming databases and similar realtime engines,
which could process realtime, record-based data as it flowed. Today we are seeing the development of
realtime application platforms that permit parallel processing over many different types of data. This
allows algorithms to be placed in the flow of data and connected back to the sources of that data.
The new model for applications is stream monitoring, detection and actuation. It's event-driven
rather than triggered purely on-demand, and it creates a new set of technical requirements. The true
challenge for stream analytics is latency, not data volume; scalability is much less of a problem in
the software platforms that support realtime stream analytics applications. The IT market is
evolving, reaching a stage where a rethinking of application design and infrastructure is mandated.


Stream Analytics Applications Are Different (and More Difficult to Build)
A stream analytics application workflow entails a shift in approach to business processes. In a traditional
process, people enter transactions into a system. A person initiates all activities. The system might
sequence activities (for example, queuing purchase orders for human approval), but the system doesn't
initiate tasks. Traditional business applications have run in this manner for decades. A few industries
have done more to automate processes and decisions, but the practice is still infrequent and usually done
with complex event processing (CEP) tools or custom applications.
The role of people can change in stream analytics workflows where an application is carrying out some
of the work. In such a scheme, it is more important to have people deal with things the computer
can't: exceptions, coordination across departments or processes, and problems requiring judgment or
involving risk. Using people to make simple, rule-based decisions that are automatable adds cost and
time to a low-value task. This sort of repetitive task work is also dull and therefore more prone to
errors.
Decisions that can be defined by simple rules can be performed by a machine, provided there is an ability
to intercede when necessary. People add value in their ability to deal with exceptions and to understand
when actions need to be coordinated. Anyone who has experienced bad automated call center systems can
understand the necessity of having humans in the decision loop.
In a stream analytics ecosystem, people still initiate activities, but it's possible for a machine to
initiate activities as well. The actions that are taken, whether a human-entered transaction or a
system-generated event, may trigger other actions. Stream analytics ecosystems operate continuously,
with people acting to govern and coordinate tasks and activities. Contrast this with the common
scenario today, where almost everything starts and stops with a human at the controls. This changes
what we need in order to manage and run a business process.
The information needed for realtime process management falls into three categories:
• The information people monitor as part of a daily routine. People often use dashboards as a way to
monitor the health of a process throughout the day, for example checking customer call volumes and
service times in a call center. It's a form of passive detection that could be done by a computer,
with alerts sent when there is a deviation so a human can look at and address the anomaly. This sort
of information is checked to see what is happening at the moment or to look at the context
surrounding a problem one has been alerted to.
• The information people look at when there's an exception. If an alert is sent, or someone spots a
problem on a dashboard, it's necessary to look at the context surrounding the event. This is more
detailed data than is displayed on a dashboard, and possibly different data that can be used as a
sanity check to avoid reacting to a spurious alert.
The primary purpose of this information is to help isolate and diagnose a problem. For example,
comparing the stream of events that generated an alert to other streams of events might indicate a
broader problem. It might be useful to replay events that occurred before and after the alert, and
similar alerts in the past, to spot a pattern. This older, more detailed or different data is likely
not going to be seen, except when there is a reason to seek it out.
• The information people never see. Some stream analytics processes are managed entirely by the
system with little to no human intervention. In these processes, the application monitors data
continuously in realtime, executes code that generates new data or events from the stream, and then
takes action or sends data to another application. An example of this would be a recommendation
system on a retail website, or a next-best-action system in a bank's call center.
The information used to monitor a stream analytics process can change frequently during the building
of an application. At first, little is known about the patterns and variability of the process, so
the information used is narrow. People start by monitoring basic data, devise key performance
indicators (KPIs) and adjust the process to reduce variation, after which they refine their metrics.
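The passive detection described in the first category above, where a computer watches a metric and
alerts a person only on deviation, can be sketched in a few lines. This is an illustrative sketch,
not any product's implementation; the window size, threshold, and call-volume figures are assumptions
chosen for the example.

```python
# Hypothetical deviation monitor: alert when a metric drifts more than
# k standard deviations from its recent rolling baseline.
from collections import deque
from statistics import mean, stdev

class DeviationMonitor:
    def __init__(self, window: int = 60, k: float = 3.0):
        self.history = deque(maxlen=window)  # recent observations only
        self.k = k

    def observe(self, value: float) -> bool:
        """Record one observation; return True if it looks anomalous."""
        alert = False
        if len(self.history) >= 10:  # wait for some baseline first
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) > self.k * sigma:
                alert = True
        self.history.append(value)
        return alert

monitor = DeviationMonitor(window=60, k=3.0)
for call_volume in [100, 102, 98, 101, 99, 103, 97, 100, 102, 99, 250]:
    if monitor.observe(call_volume):
        # fires once, for the spike to 250
        print(f"alert: call volume {call_volume} deviates from baseline")
```

A real monitor would account for cyclic patterns (time of day, day of week), which is exactly why
historical context data becomes necessary, as discussed later.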

Over time, dealing with some problems becomes routine, at which point the actions taken can be
programmed into the system. They might be fully automated, or more likely, set up so that a person can
approve a system-recommended action. In this manner, actions that were once human-directed are
replaced by programmed and automated responses. Eventually, the application evolves to a more stable
state and the rate of change to the data required and the actions to take slows.
This style of building and refining realtime applications requires tools to make the application easy to
change. It should be easy and fast to adjust data sources, both to add new data and metrics and to remove
metrics that are no longer useful. The ease of removing data and metrics is as important as the ease
with which they can be added. Otherwise, realtime data displays become cluttered with excess
information, making problems harder to see and diagnose.
Most traditional applications encounter difficulty with this cycle of change because they are designed with
an assumption of stable requirements, and because they are deployed in a hard-to-change system design.
They assume permanence. Stream analytics applications, by contrast, expect the business conditions
they're designed to address will change over time.
Stream analytics applications are not like traditional transaction processing applications (OLTP), nor are
they like business intelligence (BI) tools. They differ in several key areas:
• Due to the requirement to react as streaming events occur, they must live on the network, always
on and processing the streams of data.
• Stream analytics applications need access to data as it arrives, but they also need access to past
history. Without the historical context, they can't define baselines from which to detect
deviations, determine new actions, or provide replays of what happened so a person can analyze the
problem.
• Realtime stream analytics applications need to present continuous data to people visually for
monitoring, as well as send events back into the network where other machines can act on them. This
is a mix of features from both OLTP and BI applications, and no OLTP or BI stack can address this
mix when it involves realtime data.
• Stream analytics applications tend to evolve over time, much like dashboards and reports in a BI
system evolve as people learn how to use the new information they are given. Unlike BI, in which
people use data documents authored in a tool, these are applications created by developers. That
makes them harder to maintain over time, as changes are required.
Current software architectures do not meet these requirements. Despite the fact that both OLTP and BI
stacks are often provided by a single vendor, they do not provide an integrated suite that can meet the
needs of realtime applications. As a result, businesses are required to build applications by assembling
and integrating a variety of products that perform some portion of the required functions, or by
developing custom code on top of basic infrastructure and tools.


Building a Stream Analytics Application


It's deceptively easy to build a stream analytics application because its initial use case is often
simple. Monitoring a stream of data, calculating or aggregating a metric, and displaying it on a
visual interface can be done with a small amount of code. The ease of simple monitoring makes it seem
as though all stream analytics applications can be created this quickly. The reality is that
event-by-event processing on a single stream isn't that interesting or useful, and useful
applications are far more complex.
Complexity creeps in from two sources: the metrics and analysis become more sophisticated over time as
usage grows, and success with one application proves the value and creates the desire for more
applications.
Increasingly sophisticated metrics drive internal application complexity because they impose
technical requirements that were not addressed at the beginning of the project. For example,
monitoring several event streams for deviation from a baseline is simple. Combining several event
streams to calculate and output a single metric is far more complicated. The problems multiply when
you want to join or correlate additional streams.
Unlike a database, event streams are often independent. This can create situations in which data
arrives later in one stream than another, or a delay causes events to arrive out of order. Sometimes
a stream will restart, sending the same data twice. The simple approach of watching timestamps to
match events won't work. The application must now cache data for each stream for some period and add
code to address problems like matching out-of-order or duplicate events and catching up on processing
of late events.
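The per-stream caching just described can be sketched as a small buffer that de-duplicates events by
identifier and releases them in timestamp order only after a grace period for late arrivals. This is
a minimal illustration under assumed names and a made-up grace window, not a production design.

```python
# Hypothetical per-stream buffer: reorder and de-duplicate one stream's
# events within a grace period for late arrivals.
import heapq

class StreamBuffer:
    def __init__(self, grace: float):
        self.grace = grace   # seconds to wait for late events
        self.heap = []       # (timestamp, event_id, payload), a min-heap
        self.seen = set()    # ids already accepted (dedup after a restart)

    def push(self, ts: float, event_id: str, payload: dict):
        if event_id in self.seen:   # the stream restarted and re-sent this
            return
        self.seen.add(event_id)
        heapq.heappush(self.heap, (ts, event_id, payload))

    def pop_ready(self, now: float):
        """Release, in timestamp order, every event older than the grace window."""
        out = []
        while self.heap and self.heap[0][0] <= now - self.grace:
            out.append(heapq.heappop(self.heap))
        return out

buf = StreamBuffer(grace=5.0)
buf.push(10.0, "a", {})
buf.push(12.0, "b", {})
buf.push(11.0, "c", {})   # arrived late, out of order
buf.push(10.0, "a", {})   # duplicate after a restart: ignored
print([e[1] for e in buf.pop_ready(now=20.0)])  # ['a', 'c', 'b']
```

The grace window is the trade-off the text alludes to: a longer window tolerates later events but
adds latency to every result downstream.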
If the application is displaying multiple metrics simultaneously (or sending multiple events based on
different streams), the failure scenarios are more complex. If one stream driving one metric halts, the
other streams and displays should continue unless they are also dependent on that data. More code is
needed to ensure that dependencies are managed and the application behaves correctly.
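The dependency management described above might be sketched as a declared mapping from each metric to
the streams it consumes, so that a halted stream suspends only the metrics that depend on it while
everything else keeps updating. The metric and stream names here are hypothetical.

```python
# Hypothetical metric-to-stream dependency map for a web retailer.
DEPENDENCIES = {
    "checkout_rate": {"web_events"},
    "payment_errors": {"payment_events"},
    "funnel_health": {"web_events", "payment_events"},
}

def active_metrics(halted_streams: set) -> list:
    """Return the metrics whose input streams are all still flowing."""
    return [metric for metric, deps in DEPENDENCIES.items()
            if not (deps & halted_streams)]

# If the payment stream halts, only the payment-dependent metrics suspend.
print(active_metrics({"payment_events"}))  # ['checkout_rate']
```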
This simple change, from single-event to multi-event metrics, has added the requirement for
independent caching for each stream, program logic to address event delays, logic for joining events,
logic for dealing with exception conditions that can occur, and more logic for dependency management
and for suspending data flow and display or output processes. All of this is important, but it takes
away from what the core focus should be: the use of metrics and their context, not delivery of basic
data.

Another piece that is rarely thought through at the beginning of realtime application development is
the need for contextual data. Simple trending and display of realtime data doesn't require history,
but the first question most users have is "What do I compare this to? What's normal, and what are the
exception thresholds?"
Comparing a trend to a baseline requires data that precedes current events. This might be weeks or
months of history in the case of cyclic or seasonal trends, or it might be a few minutes worth. In the latter
case, adding an event cache is sufficient. If more history is needed, then the system must be able to read
from files or a database and turn them back into a form consumable by stream processing, adding data
access and management components that were not originally in the design.
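The history-replay step just described might look like the following sketch: stored rows are read
back in timestamp order and yielded in the same (timestamp, value) form as live events, so the
stream processor can consume history and live data identically. The table and column names are
assumptions, with an in-memory SQLite database standing in for the warehouse.

```python
# Hypothetical replay of stored history into a stream-consumable form.
import sqlite3

def replay_history(conn: sqlite3.Connection, metric: str):
    """Yield stored history as (timestamp, value) pairs, oldest first."""
    cur = conn.execute(
        "SELECT ts, value FROM metric_history WHERE name = ? ORDER BY ts",
        (metric,),
    )
    yield from cur

# Demo with an in-memory database standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric_history (name TEXT, ts REAL, value REAL)")
conn.executemany(
    "INSERT INTO metric_history VALUES (?, ?, ?)",
    [("checkout_rate", 1.0, 95.0), ("checkout_rate", 2.0, 97.0)],
)

baseline = [v for _, v in replay_history(conn, "checkout_rate")]
print(baseline)  # [95.0, 97.0]
```

This is the data access and management component the text says gets bolted on late: it did not exist
in the original stream-only design, yet any baseline longer than the event cache requires it.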
The need for external data goes beyond history. Most events carry narrow data payloads. This is fine for
processing and feeding output to other computers. When there's a need to display data to a person,
enrichment data is required so the values or labels are meaningful. This information is usually looked up
in a database using identifiers in the event data.
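The enrichment lookup described above can be sketched as follows: an event carrying only an
identifier is joined to human-readable labels, cached after the first lookup so the database is not
hit on every event. The product names and identifiers are invented for the example; a real system
would look them up in the database mentioned in the text.

```python
# Hypothetical enrichment of a narrow event payload for display.
# PRODUCT_NAMES stands in for a database table keyed by identifier.
PRODUCT_NAMES = {"sku-1042": "Espresso Machine", "sku-2077": "Burr Grinder"}
_cache: dict = {}

def enrich(event: dict) -> dict:
    """Attach a display label to an event using its product identifier."""
    sku = event["sku"]
    if sku not in _cache:  # one lookup per identifier, then cached
        _cache[sku] = PRODUCT_NAMES.get(sku, "unknown product")
    return {**event, "product_name": _cache[sku]}

print(enrich({"sku": "sku-1042", "qty": 2}))
# {'sku': 'sku-1042', 'qty': 2, 'product_name': 'Espresso Machine'}
```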
The bottom line is that even though a realtime application works with streams, it must also deal with
caching and persistence of context data. This adds complexity to the application.
Adding more applications drives a different sort of complexity. Because most applications start small and
grow, the design is generally bottom to top, from the event stream through custom code to the front end
(or the output of new events that flow to other systems).
Even a well-designed realtime application suffers from the single-purpose nature of its construction.
Addressing changes over time is difficult, as was discussed above, and their single-purpose nature limits
reusability of the code, data and interfaces.

Operational Requirements
One element that is rarely taken into consideration until too late is the operation of these
applications in production. Operation of the system is most often handled by another group in the
organization, and maintainability is not at the forefront of a developer's mind; the developer's most
urgent work is to complete functionality.

A realtime system drives operational requirements that may not be supported. Often, the monitoring of
systems is not up to the task, because a realtime system must itself be monitored in realtime. The
health of the data streams, the data processing execution, the output and the user applications must
all be logged and monitored. If one of these applications should fail, the effects will likely be
felt much more quickly than in traditional systems.
Building complex applications like these is challenging. Building platform elements to enable
reusability and easy changes to data is harder, yet this is what organizations really need. The
desire is ease of use for developers so they can focus on the hard but valuable parts: the metric
calculations, algorithms, detection and correlation that provide tangible business value.
When realtime applications are custom-assembled or built without abstraction over the base layers of
technology, the ability to iterate quickly decreases as applications grow more complex and as more
applications are added. If you take away the repetitive, reusable, boring code components that every
application needs, the cycle time for both development and maintenance will be faster.


How Realtime Stream Analytics Applications Have Been Built
The first place many people look for a solution to streaming applications and faster analytics is the tools
that are already in use in IT. The choice of how to build a stream analytics application has been limited to
three basic approaches.

Using the BI environment


With BI tools already available that present data to users, it's logical to conclude that businesses
could approximate realtime analytics by refreshing the database every few minutes rather than daily.
This will work as long as the time allowed for detecting a problem and reacting to it is longer than
the cycle of data processing, storage, retrieval, and analysis. Otherwise, the latency introduced by
the store-then-query approach will reduce the usefulness of the application. For example,
notification of an impending mechanical failure or a product recommendation based on the contents of
a shopping cart must happen immediately. A realtime stream analytics application can't wait for data
to be recorded, extracted, processed, re-stored in a database, and made available via a query tool,
even if that database resides entirely in-memory.
The figure on the following page shows a comparison of reaction time between streaming and polling
mechanisms. The example data is taken from application monitoring for a web retailer. In this
instance, the top portion shows that a stream analytics application detects a failed web checkout and
alerts 4 seconds after the event. It takes several minutes for someone to diagnose the problem and
take action to resolve it, with the problem improving in 2.5 minutes and resolved at 4 minutes.
The bottom portion shows the latency if the data warehouse were used. Events are collected in a
micro-batch that executes every 2 minutes. It takes approximately 30 seconds for the process to
complete and data to be ready in the database, and between 0 and 2.5 minutes for the polling process
to query the data, process it, and send an alert. This introduces a minimum delay in visibility of
2.5 minutes.
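The arithmetic behind those figures can be made explicit. Under the stated assumptions (a 2-minute
micro-batch, roughly 30 seconds of load time, and a polling cycle of up to 2.5 minutes), an event
that arrives just after a micro-batch begins waits a full interval plus load time before it is even
queryable, and polling can add up to another cycle on top of that.

```python
# Back-of-the-envelope bound on store-then-query visibility delay,
# using the example figures from the text.
BATCH_INTERVAL = 120.0   # micro-batch runs every 2 minutes
LOAD_TIME = 30.0         # ~30 seconds for data to land in the database
POLL_INTERVAL = 150.0    # polling adds between 0 and 2.5 minutes

floor = BATCH_INTERVAL + LOAD_TIME   # 150 s: the 2.5-minute floor cited above
ceiling = floor + POLL_INTERVAL      # 300 s if the poll has just missed
print(f"visibility delay: {floor/60:.1f} to {ceiling/60:.1f} minutes")
# visibility delay: 2.5 to 5.0 minutes
```

Compare either bound with the roughly 4-second streaming alert in the same example.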
This may not seem like a lot of time, but imagine a shopper experiencing problems with the checkout
page. Most customers are willing to retry once or twice, with 10-30 second timeout windows. The
aggregate value of abandoned carts across all shoppers can easily add up to a significant revenue
loss in 2.5 minutes.

Figure: Stream Latency versus BI Polling Latency

In a data warehouse, reducing latency isn't as easy as it sounds. The first problem is that the data models
presume all the data will be loaded and consistent at a point in time. For example, all of the customer
data will be loaded for all of the orders that have been loaded. With a low latency feed, some parts of the
model will have newer data and be inconsistent with other parts of the model, creating potential
problems in the rest of the BI environment.

The second and more important challenge is the latency that a store-then-query architecture introduces.
Events are arriving in realtime. They must be formatted, cleaned and stored before they can be used. This
introduces delay between the time an event is visible and the time it can be queried.
There is added delay because a database stores data but doesn't send data back out. An application
must periodically query the database for new data, adding another delay. This repeated polling, in
combination with the continuous loading of data, creates contention for resources and slows the
database. Depending on the frequency and volume of streaming data, the database may have difficulty
storing it quickly enough while simultaneously supporting indexing and queries over that data. The
database can't easily be tuned for this because the workloads are at odds. This is one of the reasons
OLTP and BI architectures diverged in the early 1990s.
There is a fundamental mismatch between a BI architecture and a stream analytics application. A data
warehouse puts data into a form suitable for query, while a realtime stream analytics application needs to
put data into a form suitable for monitoring, alerting and actuation. The goal is to create a system that
solves the problem of access to data without a delay, such that it can analyze, act on, and display it,
essentially automating the process from event to its effect in a continuous cycle.

Integrating Products to Build a Complete Application


Many organizations have built stream analytics applications by using discrete products as components of the
larger application. There are multiple challenges to overcome when building from isolated products in
this manner.
First, realtime systems need to connect to different data streams, filter and process those streams,
cache data from any of them, join data streams together, and perform calculations on them. This can
mean the use of half a dozen or more different products in a complex application, plus custom code to
connect it all together and to perform functions that are not native to the selected products.
There are a few large vendors that provide all of the components, but they are not integrated, primarily
because they are independent products, with each designed for a specific purpose and standalone use. For
example, a CEP engine could be used to filter and correlate events, but it exists primarily for this purpose.
It is unlikely to provide visualization of the data to an end user, or work with both streaming data and
history, or to address caching needs of different streams.

Scalability in an application assembled from discrete products is also difficult. Many products were
designed to scale up with larger hardware rather than scale out across multiple servers; this is a
weakness of many CEP servers. Additionally, data passing through multiple components and custom code
can add latency, and the resulting delay can increase the resources required to meet processing
deadlines.
Messaging, middleware, caching, CEP and stream visualization products were not designed to form a
coherent platform for building realtime applications. They were designed to solve problems in
specific areas. Building realtime applications like this is an expensive route, in both software and
the time it takes to integrate all the components into a reliable framework. We should be focusing
our efforts on value-added work, not the integration of parts that are almost what is required for
the task.

Custom Application Development


It's possible (and becoming more common) for enterprises to custom-build stream analytics
applications. There are many open source projects available to build the components needed in a
realtime application, from realtime publish-subscribe software to distributed stream processing
systems to streaming data visualization libraries. The strength of this option is a purpose-built
application without the license cost or the overhead of a solution assembled from vendor products and
custom code. The weakness is that enterprises must integrate and manage all of the pieces that make
up the stream analytics application. There can be an overwhelming rate of change in some of these
open source projects; it isn't uncommon for one project to be deprecated in favor of another,
requiring changes to a large amount of code. Furthermore, the complexity of all the parts in the
system requires a lot of up-front design. Custom building entails a similar level of work as
assembling vendor products, and even more programming expertise.
Expertise is always in short supply. The skills required to build a custom application of this type,
or to integrate a number of products into an application, are always in demand. This, in turn, adds
time to project schedules to train internal developers or hire the additional staff needed.
This quote from Merrill Chapman describes the challenge one faces when integrating many disparate
components:

On the other hand, although Linux was free, installing it was a royal pain that the vast majority of
people had no desire to experience. The price of freedom included the privilege of choosing which
Linux you would pick from dozens of different packages, called distros, and then attempting to
install your choice on your hardware. This was made more interesting by the fact that although the
core Linux operating system was usually (though not always) the same from distro to distro, the
various Linux bundles often used different install procedures, had different user interfaces, looked
for key files in different places, included different utilities, and so on, and so on.¹
It's hard to be agile when you build and integrate all the components from scratch for an
application. Just managing the versions of the many components is difficult, because they change and
are updated at different times.
The challenge extends beyond building the application. Deployment, operation and maintenance are
often not taken into account. Beyond the integration and configuration management of the application
and its dependencies, there are elements like data and application security, how to scale the system
up and down, and how to monitor the system when it's in production.
A realtime application is not like existing OLTP or BI systems, with their well-understood
architectures and layers of infrastructure to address what an organization needs. The do-it-yourself
coding of realtime applications is where we've been for the last ten years. The problem we face today
is the complexity of building and deploying these applications.

¹ In Search of Stupidity: Over Twenty Years of High Tech Marketing Disasters, Merrill R. Chapman


Evolving to a Realtime Stream Analytics Platform


The way to escape the spiral of increasing code complexity and maintenance costs is to redesign around
the idea of a realtime stream analytics platform that encapsulates repeatable, reusable elements and
isolates highly changeable elements. A platform has the benefit of taking away the repetitive, low-value
work so developers can focus on the work that has an impact in the organization.
The purpose of a platform is to enable more than one application to reuse the same core infrastructure
code, thereby speeding development, shortening change iterations, and improving the reliability and
performance of applications. Any improvement in the platform is immediately shared by all applications,
a situation unlike one-off applications, where each must be rewritten to take advantage of improvements
in components.
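To make this reuse idea concrete, here is a minimal sketch (not from the paper; all names are illustrative) of a platform layer that owns the shared infrastructure, a single ingest/dispatch loop and shared metrics, while each application contributes only its own handler logic:

```python
# Hypothetical sketch: a shared platform runtime that multiple stream
# applications plug into. Improving the run() loop (batching, retries,
# scaling) immediately benefits every registered application.
from collections import defaultdict
from typing import Callable, Iterable

Event = dict  # in this sketch an event is just a dict of fields

class StreamPlatform:
    """Shared runtime: applications register handlers per event type."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self.metrics: dict[str, int] = defaultdict(int)  # shared monitoring

    def register(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def run(self, stream: Iterable[Event]) -> None:
        # One ingest/dispatch loop serves every registered application.
        for event in stream:
            self.metrics["events_in"] += 1
            for handler in self._handlers.get(event.get("type", ""), []):
                handler(event)
                self.metrics["events_handled"] += 1

# Two "applications" reuse the same platform instance and event feed:
platform = StreamPlatform()
alerts: list[str] = []
platform.register("temp", lambda e: alerts.append(f"hot: {e['value']}") if e["value"] > 30 else None)
platform.register("temp", lambda e: None)  # a second app sharing the feed

platform.run([{"type": "temp", "value": 25}, {"type": "temp", "value": 35}])
```

The point of the sketch is the division of labor: version management, monitoring, and dispatch live in one place, while application code stays small and changeable.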
Building a platform is something that takes time. Since most organizations start with a single application and grow from there, it often requires building several realtime applications to show where the common components lie. With this understanding, they can redesign applications to get the components and layers right. Platforms evolve over a matter of years, not months.
If, on the other hand, an enterprise creates a stream analytics system by first building a platform, the challenge is the time it takes and the knowledge it requires. There is no value to a platform until the first application is deployed. This can be months of effort, which may be wasted without the right expertise to know what components belong at which layers and what should be under the application developer's control. It ends up as a high-risk endeavor, albeit one that takes less time than evolving one-off applications over several years.
The figure on the next page shows the high-level components of a realtime platform needed to support multiple stream analytics applications. A single stream analytics application is somewhat simpler than this: depending on its processing requirements, it can forego some components, as well as those that are necessary only in a multi-application environment. The complexity of maintaining several stream analytics applications built on varying components quickly outstrips the overhead of using a platform.


We are at a stage in the market maturity of streaming application technology that is similar to the early days of application servers, or the early days of data integration platforms. In both of these examples, the market evolved from component assembly and custom coding by customers, to narrow single-purpose products that replaced some of the custom work, and then to integrated platforms that enabled higher-order systems to be built on top.

Figure: The Components of a Stream Analytics Application Platform Architecture

New technology gives us new tools for new workloads, and distributed stream analytics applications are one of those growing workloads. There is enough new technology in use, and enough streaming data available, that more organizations are seeing the value of realtime applications. This creates a market for platform technologies to support that effort. A number of open source projects, as well as commercial products, have appeared in this space in the last few years.


This is still an early market, one that demands careful evaluation of the alternatives. Sometimes data every 15 minutes is good enough, so store-first, query-later models will meet the requirements. Sometimes data must be processed in realtime, but the processing is a simple algorithm in a pipeline and the output is never seen by a person; any number of approaches might suffice for this. When the need is for realtime, continuous monitoring, event detection and alerting, or requires both machine processing and human coordination, it's likely that a platform would benefit.
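As an illustration of the continuous-monitoring case, here is a small sketch (my own, with made-up thresholds) of an event-detection rule that fires as events arrive, rather than storing data and querying it later, the kind of logic a platform would host:

```python
# Illustrative sketch: alert when more than THRESHOLD error events
# arrive within a sliding time window. Thresholds are hypothetical.
from collections import deque

WINDOW_SECONDS = 60
THRESHOLD = 3

def detect(events):
    """Yield an alert timestamp whenever the window holds > THRESHOLD errors."""
    window = deque()  # timestamps of recent error events
    for ts, kind in events:          # events arrive ordered by timestamp
        if kind != "error":
            continue
        window.append(ts)
        while window and ts - window[0] > WINDOW_SECONDS:
            window.popleft()         # expire events outside the window
        if len(window) > THRESHOLD:
            yield ts                 # continuous alerting, no store-then-query

stream = [(0, "error"), (10, "ok"), (20, "error"), (30, "error"),
          (40, "error"), (130, "error")]
alerts = list(detect(stream))
```

Even this toy rule needs windowed state, ordering assumptions, and an alert channel; a platform supplies those shared pieces so only the rule itself changes per application.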
The key to success is to align the speed of insights to the business goal. This means understanding the difference between business objectives that must be solved with a realtime solution and those that need not be. Use BI for BI problems, use OLTP for OLTP problems, and use realtime application architectures for streaming data problems. Don't try to force one technology and architecture into another domain. They are different for a reason.

Key Takeaways
• The speed of data flow and the emergence of non-human consumption models have fundamentally changed application development.
• Even though a realtime stream analytics application primarily works with data-in-motion, it must also deal with caching and persistence of context data, that is, data-at-rest. This adds more complexity to the application.
• The way to escape the spiral of increasing code complexity and maintenance costs is to redesign around the idea of a realtime stream analytics platform that encapsulates repeatable, reusable elements and isolates highly changeable elements.
• There is no single silver bullet. Many realtime monitoring, event detection, and alerting applications will benefit from an underlying platform, but many use cases are simpler, requiring a different toolset.


About Mark Madsen


Mark Madsen, president of Third Nature, is a researcher and consultant on analytics and data strategy. He is a former CTO and CIO with experience working for both businesses and vendors, including a company used as a Harvard Business School case study. Over the past decade Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institution and industry events. His focus is on applying analytics and decision support, and on the technology needed to support them. Mark frequently speaks internationally at conferences and seminars on these topics.

About Third Nature


Third Nature is a research and consulting firm focused on new and emerging technology and practices in
analytics, information strategy and data management. Our goal is to help organizations solve problems
using data. We offer education, consulting and research services to support business and IT organizations
as well as technology vendors.

© 2015 Third Nature, Inc. All Rights Reserved.


This publication may be used only as expressly permitted by license from Third Nature and may not be accessed, used, copied,
distributed, published, sold, publicly displayed, or otherwise exploited without the express prior written permission of Third
Nature. For licensing information, please contact us.
