
SQL Server 2008 R2 StreamInsight

Speaker: Mark Simms, Microsoft SQLCAT (masimms@microsoft.com)

Silicon Valley SQL Server User Group, May 2010
Mark Ginnebaugh, User Group Leader (mark@designmind.com)
Load barrier is dictated by current choices of the solution, e.g., loading into databases or persisting into files. This is intrinsic because in current approaches no processing can be done until the data is loaded.

[Figure: facts/sec (100 to 100,000) plotted against time of interest (years, months, days, hrs, min, sec, up to the present). Regions: Traditional DW Analytics; Active DW analytics; custom-built solutions that carry huge development and customization costs. The span just before the present is labeled with ET time in ETL and load time in ETL.]
Analytical results need to reflect important changes in
business reality immediately and enable responses to them
with minimal latency
                   Database Applications             Event-driven Applications
Query paradigm     Ad-hoc queries or requests        Continuous standing queries
Latency            Seconds, hours, days              Milliseconds or less
Data rate          Hundreds of events/sec            Tens of thousands of events/sec or more
Query semantics    Declarative relational analytics  Declarative relational and temporal analytics

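[Figure: a database application takes a request and returns a response; an event-driven application takes an input stream of events and produces an output stream.]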
StreamInsight Target Scenarios

[Figure: latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) plotted against aggregate data rate (0 to ~1 million events/sec). Regions: Relational Database Applications; Data Warehousing Applications; Operational Analytics Applications, e.g., Logistics, etc.; Web Analytics Applications; Monitoring Applications; Manufacturing Applications; Financial trading Applications.]
Manufacturing:
• Sensor on plant floor
• React through device controllers
• Aggregated data
• 10,000 events/sec

Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec

Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec

Power Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec

[Figure: asset instrumentation for data acquisition and subscriptions to data feeds send data streams to the StreamInsight Engine, which is supported by asset specs & parameters, a stream data store & archive, and lookup.]

StreamInsight Engine
• Threshold queries
• Event correlation from multiple sources
• Pattern queries

Outputs
• Visual trend-line and KPI monitoring
• Batch & product management
• Automated anomaly detection
• Real-time customer segmentation
• Algorithmic trading
• Proactive condition-based maintenance
Industry trends
• Data acquisition costs are negligible
• Raw storage costs

Manage business via KPI-triggered actions
StreamInsight Application Development
StreamInsight Application at Runtime

[Figure: event sources feed input adapters, which feed the StreamInsight Engine (standing queries, each with its own query logic), which feeds output adapters, which deliver to event targets.]

Event sources
• Devices, Sensors
• Web servers
• Event stores & Databases
• Stock ticker, news feeds

Event targets
• Pagers & Monitoring devices
• KPI Dashboards, SharePoint UI
• Trading stations
• Event stores & Databases


SELECT COUNT(*) FROM ParkingLot
WHERE type = 'AUTO'
AND color = 'RED'
red cars, last hour

Doesn't seem like a great solution…
This is the streaming data paradigm in a nutshell –
ask questions about data in flight.
[Slides: StreamInsight components: engine, adapters, queries, extensions, host, visual debugger, API.]

[Figure: how a question is expressed against data: question and data, data and question.]
Tell me just the color of each car that passes.

var result = from car in carStream
             select new
             {
                 car.Color
             };
Give me only trucks.

var result = from car in carStream
             where car.Type == "Truck"
             select car;
Tell me the number of cars that passed every 10 seconds.

var result = from win in carStream.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 count = win.Count()
             };
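To make the tumbling-window semantics concrete, here is a minimal sketch in plain LINQ to Objects. It does not use the StreamInsight API: the tuple shape, the class name `TumblingWindowSketch`, and the method `CountPerWindow` are all hypothetical, and the sketch works on a finite in-memory list rather than a live stream. It only illustrates the idea that each event falls into exactly one fixed-size, non-overlapping window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: plain LINQ to Objects, not the StreamInsight API.
public static class TumblingWindowSketch
{
    // Returns (window start, event count) for every non-empty window, in time order.
    public static List<(DateTime WindowStart, int Count)> CountPerWindow(
        IEnumerable<(DateTime Timestamp, string Make)> cars, TimeSpan windowSize)
    {
        return cars
            // Align each timestamp down to the start of the window that contains it.
            .GroupBy(c => new DateTime(c.Timestamp.Ticks - c.Timestamp.Ticks % windowSize.Ticks))
            .OrderBy(g => g.Key)
            .Select(g => (WindowStart: g.Key, Count: g.Count()))
            .ToList();
    }
}
```

With a 10-second window, cars seen at 12:00:03 and 12:00:07 land in the window starting at 12:00:00, and a car seen at 12:00:12 lands in the window starting at 12:00:10.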
Count the number of cars for each make separately every 10 seconds.

var result = from car in carStream
             group car by car.make into eachGroup
             // Window each group, not the whole stream, so counts are per make.
             from win in eachGroup.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 make = eachGroup.Key,
                 count = win.Count()
             };
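The group-then-window idea can be sketched the same way over an in-memory list. Again this is plain LINQ to Objects, not StreamInsight; the names `GroupedWindowSketch` and `CountPerMakeAndWindow` and the tuple shape are assumptions made for illustration. The point is that the window is applied inside each group, so every make gets its own count per window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: plain LINQ to Objects, not the StreamInsight API.
public static class GroupedWindowSketch
{
    // Returns one (make, window start, count) row per make and non-empty window.
    public static List<(string Make, DateTime WindowStart, int Count)> CountPerMakeAndWindow(
        IEnumerable<(DateTime Timestamp, string Make)> cars, TimeSpan windowSize)
    {
        return cars
            // The grouping key combines the make with the aligned window start.
            .GroupBy(c => (Make: c.Make,
                           WindowStart: new DateTime(c.Timestamp.Ticks - c.Timestamp.Ticks % windowSize.Ticks)))
            .OrderBy(g => g.Key.WindowStart)
            .ThenBy(g => g.Key.Make, StringComparer.Ordinal)
            .Select(g => (g.Key.Make, g.Key.WindowStart, g.Count()))
            .ToList();
    }
}
```

If the window were applied to the whole list instead of to each group, every make would report the same total, which is the mistake the per-group window avoids.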
application time

Current Time Indicators


public void EnqueueEvent(SourceData d)
{
    var ev = CreateInsertEvent();

    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;

    Enqueue(ref ev);
}
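public void EnqueueEvent(SourceData d)
{
    // The slide's condition was lost in extraction; this reconstruction
    // assumes events may only be enqueued while the adapter is running.
    if (AdapterState != AdapterState.Running)
        return;

    var ev = CreateInsertEvent();

    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;

    Enqueue(ref ev);
}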
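public void EnqueueEvent(SourceData d)
{
    // The slide's condition was lost in extraction; this reconstruction
    // assumes events may only be enqueued while the adapter is running.
    if (AdapterState != AdapterState.Running)
        return;

    var ev = CreateInsertEvent();
    if (ev == null) return;   // no event could be allocated

    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;

    Enqueue(ref ev);
}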
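public void EnqueueEvent(SourceData d)
{
    // The slide's condition was lost in extraction; this reconstruction
    // assumes events may only be enqueued while the adapter is running.
    if (AdapterState != AdapterState.Running)
        return;

    var ev = CreateInsertEvent();
    if (ev == null) return;   // no event could be allocated

    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;

    if (Enqueue(ref ev) == EnqueueOperationResult.Full)
    {
        // Assumed from the usual adapter pattern (the slide shows a gap here):
        // release the event that was not accepted before signaling readiness.
        ReleaseEvent(ref ev);
        Ready();
        return;
    }
}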
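The guarded enqueue above is a back-pressure pattern: check the state, try to enqueue, and back off when the engine's queue is full. The following toy model shows that control flow in isolation. None of these types belong to StreamInsight; `ToyInputAdapter`, `ToyAdapterState`, `ToyEnqueueResult`, and the fixed-capacity buffer are hypothetical stand-ins, and the real engine's suspend and resume behavior may differ in detail.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-ins for illustration only; these are NOT StreamInsight types.
public enum ToyAdapterState { Running, Suspended }
public enum ToyEnqueueResult { Success, Full }

public sealed class ToyInputAdapter
{
    private readonly Queue<int> engineBuffer = new Queue<int>();
    private readonly int capacity;

    public ToyInputAdapter(int capacity) { this.capacity = capacity; }

    public ToyAdapterState State { get; private set; } = ToyAdapterState.Running;
    public int Buffered => engineBuffer.Count;

    // Mirrors the guarded enqueue: check state, try to enqueue, back off when full.
    public bool EnqueueEvent(int value)
    {
        if (State != ToyAdapterState.Running) return false;   // not allowed to enqueue now

        if (TryEnqueue(value) == ToyEnqueueResult.Full)
        {
            State = ToyAdapterState.Suspended;                // stop producing until resumed
            return false;
        }
        return true;
    }

    // Called by the toy engine after it has consumed some buffered events.
    public void Resume(int consumed)
    {
        for (int i = 0; i < consumed && engineBuffer.Count > 0; i++) engineBuffer.Dequeue();
        State = ToyAdapterState.Running;
    }

    private ToyEnqueueResult TryEnqueue(int value)
    {
        if (engineBuffer.Count >= capacity) return ToyEnqueueResult.Full;
        engineBuffer.Enqueue(value);
        return ToyEnqueueResult.Success;
    }
}
```

With a capacity of 2, the third call to `EnqueueEvent` is rejected and the adapter stays suspended until `Resume` is called, after which enqueueing succeeds again.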
Use them wisely!
public class TimeWeightedAverage :
    CepTimeSensitiveAggregate<double, double>
{
    public override double GenerateOutput(
        IEnumerable<IntervalEvent<double>> events,
        WindowDescriptor windowDescriptor)
    {
        // Sum each payload weighted by the event's duration in ticks.
        double avg = 0;
        foreach (IntervalEvent<double> ev in events)
        {
            avg += ev.Payload *
                   (ev.EndTime - ev.StartTime).Ticks;
        }
        // Divide by the window's duration to get the time-weighted average.
        return avg / (windowDescriptor.EndTime -
                      windowDescriptor.StartTime).Ticks;
    }
}
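As a worked check of the arithmetic, the same computation can be run on plain tuples without the StreamInsight types. This is a sketch under stated assumptions: `TimeWeightedAverageSketch` and `Compute` are hypothetical names, and the intervals are assumed to lie entirely inside the window (no clipping is applied).

```csharp
using System;
using System.Collections.Generic;

// Illustration only: the same arithmetic on plain tuples, not the StreamInsight API.
public static class TimeWeightedAverageSketch
{
    // Assumes every interval lies inside [windowStart, windowEnd).
    public static double Compute(
        IEnumerable<(DateTime Start, DateTime End, double Value)> intervals,
        DateTime windowStart, DateTime windowEnd)
    {
        double weighted = 0;
        foreach (var iv in intervals)
        {
            // Weight each value by how long it was in effect.
            weighted += iv.Value * (iv.End - iv.Start).Ticks;
        }
        return weighted / (windowEnd - windowStart).Ticks;
    }
}
```

For a 10-second window in which the value is 10 for the first 4 seconds and 20 for the remaining 6 seconds, the result is (10 × 4 + 20 × 6) / 10 = 16, whereas a plain average of the two values would give 15.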
To learn more or inquire about speaking opportunities, please contact:

Mark Ginnebaugh, User Group Leader
mark@designmind.com