Engineer to Engineer
Architecting Big Data Solutions
John Doe

Training & Practice Building for Solution Architects

Twitter: @JohnDoe | Blog: http://johndoe.com | Email: johndoe@microsoft.com

MICROSOFT CONFIDENTIAL

Agenda

Big Data as part of every project


Lambda Architecture
Cortana Analytics Suite
Bringing it all together

Big Data in every project

Data Processing
Data without compute has limited value
Complex problems require massive compute resources and a variety of tools
Cloud makes this possible
Data paired with the right tools can help companies explore data like never before

Diverse Data Processing Landscape

[Diagram: two data processing paths from source systems to consumers]
Sources: OLTP, ERP, LOB, streaming data
ETL path: Extract original data -> Transform with an ETL tool (SSIS, etc.) -> Load transformed data -> EDW (SQL Server, Teradata, etc.) -> Data Marts
Data lake path: Ingest (EL) original data -> Data Lake(s) on scale-out storage & compute (HDFS, Blob Storage, etc.) -> Transform and Load
Consumers: BI Tools, Dashboards, Apps


When Do I Do Big Data?

Every project can now benefit
Lowered barriers to entry
Don't just move on-prem to cloud
Choose projects that make sense

Complex Data In Everyday Life

Marketing

Predictive Maintenance

Demo
Web Log Processing and Big Data

Lambda Architecture

Lambda Architecture High Level View

[Diagram: a new data stream feeds two layers whose views are merged when serving queries]
Batch layer: Batch ingest -> All data -> Precompute views -> Batch views
Speed layer: Buffered ingest -> Real-time data -> Process stream -> Increment views -> Real-time views
Serving layer: Partial aggregates from the batch and real-time views -> Merged view
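
The three layers named on this slide can be illustrated with a minimal Python sketch. It is not taken from the deck: the page-hit events, the function names, and the use of in-memory counters are all assumptions chosen to show how a batch view and a real-time view are combined into a merged view.

```python
from collections import Counter


def precompute_batch_view(all_events):
    """Batch layer: recompute page-hit counts from the full dataset."""
    return Counter(event["page"] for event in all_events)


def increment_realtime_view(realtime_view, event):
    """Speed layer: fold one newly arrived event into the real-time view."""
    realtime_view[event["page"]] += 1
    return realtime_view


def merged_view(batch_view, realtime_view):
    """Serving layer: combine both partial aggregates to answer a query."""
    return batch_view + realtime_view


# Hypothetical events already absorbed by the last batch run
historical = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
# Hypothetical events that arrived after that batch run started
recent = [{"page": "/home"}, {"page": "/checkout"}]

batch_view = precompute_batch_view(historical)
realtime_view = Counter()
for event in recent:
    increment_realtime_view(realtime_view, event)

result = merged_view(batch_view, realtime_view)
print(dict(result))
```

In a real deployment the real-time view would be discarded once the next batch run has absorbed the same events, so the two views never count an event twice.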

Lambda Architecture Detailed View

[Diagram components]
Inbound Data
Buffered Ingestion (message bus)
Event Processing Logic: Event Decoration, scoring
Hot Store
Spooling/Archiving
Curation
Data Movement / Sync
Analytical Store
Consumption

Lambda in Azure
Area

Service Offerings

Roll Your Own Offerings

Buffered Ingestion
Event Hubs

Real-Time Processing

Service Bus

Azure Stream Analytics

Storage Queues

Storm on HDInsight

Storm on IaaS
Spark Streaming

Azure Data Factory

Options are Limitless


Azure Marketplace
3rd Party Web Services
Custom Code

Enrichment
Azure Machine Learning

Hot Path Storage

Options are Limitless


Elasticsearch
HBase on IaaS

HBase on HDInsight

Cold Path Storage

SQL Server in IaaS


Azure Blob/Table Storage

Batch Processing /
Curation

Kafka
RabbitMQ

HDInsight

Azure SQL Database

Azure Data Factory

Consumption
Power BI

Options are Limitless


HDP on Windows in IaaS
HDP or CDH on Linux in IaaS
Excel
SSRS
Kibana (over Elasticsearch)
Tableau

Cortana Analytics Suite

Cortana Analytics Suite

[Diagram: DATA -> INTELLIGENCE -> ACTION]
DATA: Business apps, Custom apps, Sensors and devices
INTELLIGENCE:
  Information Management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
  Big Data Stores: Azure Data Lake, Azure SQL Data Warehouse
  Machine Learning and Analytics: Azure Machine Learning, Azure HDInsight (Hadoop), Azure Stream Analytics
  Dashboards and Visualizations: Power BI
  Personal Digital Assistant: Cortana
  Perceptual Intelligence: Face, vision, Speech, text
  Business Scenarios: Recommendations, customer churn, forecasting, etc.
ACTION: People, Automated Systems

Cortana Analytics Scenarios

EXAMPLE SOLUTIONS
Sales and marketing: Customer acquisition, Cross-sell and upsell, Loyalty programs, Marketing mix optimization
Finance and risk: Fraud detection, Credit risk management
Customer and channel: Lifetime customer value, Personalized offers, Product recommendation
Operations and workforce: Pay for performance, Operational efficiency, Smart buildings, Predictive maintenance, Supply chain management

Cortana Analytics Key Verticals

Retail
Industry Overview
$22 trillion global revenue ($4.7 trillion U.S.)
4% annual growth rate
3.7 million businesses
42 million employees
Industry Trends
Commerce anywhere, anytime
Personal and in context
Scenarios
Sales & Marketing: Demand Forecasting
Customer & Channel: Personalized Offers

Healthcare
Industry Overview
$2 trillion spent in U.S./year on healthcare
5,754 hospitals with >36 million patients
Hospital readmissions cost ~$41 billion/year
ACA new regulations for Medicare/Medicaid
Industry Trends
Payments based on patient outcome
Government monitoring hospital readmissions and number of hospital-acquired conditions
Scenarios
Customer & Channel: Patient Outcomes
Operations & Workforce: Operational Efficiency

Financial Services
Industry Overview
7.9% of US Economy
$1.2 trillion revenue
12% revenue growth rate
5.9 million employees
Industry Trends
Understanding customer behavior
Cybersecurity
Taking banking mobile
Proactive regulatory compliance
Scenarios
Finance & Risk: Risk & Compliance
Sales & Marketing: Personalization

Azure Data Factory

A managed cloud service for building & operating data pipelines (a.k.a. data flows)

1. Orchestrate, monitor & schedule
   compose data processing, storage & movement services (on premises & cloud)
2. Automatic infrastructure mgmt
   combine pipeline intent w/ resource allocation & mgmt
   data movement as a service (global footprint & on premises)
3. Single pane of glass
   one place to manage your network of data flows
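
The orchestration idea in point 1 can be sketched generically in Python. This is not the Data Factory API: the activity names and the `run_pipeline` helper are hypothetical, and the sketch only shows how a pipeline of dependent activities can be run in dependency order. It assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical activities, each mapped to the activities it depends on
pipeline = {
    "copy_raw_to_blob": set(),
    "prepare_with_hive": {"copy_raw_to_blob"},
    "score_with_ml": {"prepare_with_hive"},
    "load_warehouse": {"score_with_ml"},
}


def run_pipeline(dependencies, run_activity):
    """Run every activity after its dependencies and return the order used."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        run_activity(name)
    return order


executed = []  # stands in for real work; each activity just records its name
order = run_pipeline(pipeline, executed.append)
print(" -> ".join(order))
```

A managed service adds scheduling, monitoring, and resource allocation on top of this ordering step, which is the part the slide attributes to Data Factory.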

Azure SQL Data Warehouse

Cloud data warehouse that can grow, shrink, and pause in seconds

Petabyte scale with massively parallel processing
Independent scale of compute and storage in seconds
Transact-SQL queries across relational and non-relational data
Full enterprise-class SQL Server experience
Works seamlessly with Power BI, Machine Learning, HDInsight, and Data Factory

Azure HDInsight
Hadoop in Azure

HBase as a columnar NoSQL transactional database running on Azure Blobs
Storm as a streaming service for near-real-time processing
Hadoop 2.4 support for 100x query gains on Hive queries
Mahout support for machine learning + Hadoop
Graphical User Interface for Hive queries

[Diagram labels: HMaster, Coordination, Name Node, Job Tracker, and four each of Region Server, Data Node, Task Tracker]

Azure Stream Analytics

Complex Event Processing of Stream Data in Azure

[Diagram: Input Adapter -> Complex Event Processor -> Output Adapter]

Small number of high volume queries
Complex Event Processing (aggregation, reduction, cleanup)
Predictable & repeatable results
SQL-like queries
Ingress from Azure blobs and Event Hub
Egress to Event Hub, blobs, tables, Service Bus topics, Service Bus queues, Azure SQL Database, and Power BI
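
The kind of aggregation listed above can be illustrated with a small Python sketch of a tumbling-window count, where events are grouped into fixed, non-overlapping time windows. Stream Analytics expresses such logic in its SQL-like query language; the code below is not that language, and the event tuples, device ids, and function name are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Integer division maps each timestamp to the start of its window
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, device id) readings
events = [(0, "car-1"), (3, "car-1"), (7, "car-2"),
          (11, "car-1"), (14, "car-2"), (19, "car-2")]

window_counts = tumbling_window_counts(events, window_seconds=10)
for (start, device), n in sorted(window_counts.items()):
    print(f"window [{start}, {start + 10}) {device}: {n}")
```

Because each event falls into exactly one window, rerunning the same input gives the same counts, which matches the "predictable & repeatable results" point on the slide.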

Power BI

[Diagram: data sources feeding the Power BI service]
Data sources:
  SaaS solutions, e.g. Marketo, Salesforce, GitHub, Google Analytics (content packs)
  On-premises data, e.g. Analysis Services
  Organizational content packs: corporate data sources or external data services
  Azure services: Azure SQL, Stream Analytics
  Excel files: workbook data / data models
  Power BI Desktop files: data from files, databases, Azure, and other sources
Power BI service: natural language query, live dashboards, visualizations, reports, datasets, data refresh, sharing & collaboration

Bringing it all together
Connected cars scenario

Connected Car Market


45% compound annual growth rate over 5 years
10x faster than overall car market
75% of cars shipped globally by 2020 will have necessary hardware to connect to the Internet
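
To make the growth figure concrete, the short Python calculation below compounds the stated 45% annual rate over the stated 5 years. The function name is an assumption; the rate and period come from the slide, and the result is simple arithmetic rather than a market forecast.

```python
def growth_multiple(annual_rate, years):
    """Return the total growth factor after compounding annual_rate for years."""
    return (1 + annual_rate) ** years


multiple = growth_multiple(0.45, 5)
print(f"{multiple:.2f}x")  # prints "6.41x"
```

In other words, a market growing at 45% per year is roughly 6.4 times its starting size after five years.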

Opportunity
75% of the cars shipped globally by 2020 will be built with the necessary hardware to connect to the Internet

Solution Architecture

[Diagram: pipeline stages Data -> Ingest -> Prepare -> Analyze -> Publish -> Consume]
Data in motion: Event Hub, Stream Analytics, Azure ML, Event Hub, Custom Dashboard Application
Data at rest: Blob storage (vehicle catalog import), HDInsight, Azure ML, SQL Data Warehouse, Power BI Dashboards
Data Factory: Move data, orchestrate, schedule, and monitor

Power BI Dashboard

Demo
Connected Cars and Cortana Analytics Suite

Summary
Comprehensive offer for our Big Data and AA cloud portfolio
Enables a broad class of business-specific scenarios
Integrates with investments in Perceptual Intelligence & Cortana
Customers can get going in individual services today
Available to transact this fall as a simple monthly subscription
Pricing and included quantities are still TBD

Data paired with the right tools can help companies explore data like never before

© 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

From Data to Decisions and Actions

[Diagram labels: Data -> Decision -> Action; Manual process, Decision support, Decision automation; axis: Value]
Static reports: What happened?
Interactive dashboards: Why did it happen?
Predictions: What will happen?
Recommendations: What should I do?

Cortana Analytics Suite

Transform data into intelligent action

[Diagram: DATA -> INTELLIGENCE -> ACTION]
DATA: Business apps, Custom apps, Sensors and devices
INTELLIGENCE: Cortana Analytics
ACTION: People, Automated Systems

Cortana Analytics Suite

Transform data into intelligent action

Preconfigured Solutions
Dashboards and Visualizations
Machine Learning and Analytics
Big Data Store
Information Management
Personal Digital Assistant: Cortana
Perceptual Intelligence

Background
Dartmouth-Hitchcock is a leading healthcare provider in New England with 1.2 million patients
DH chose Microsoft for ML-led IoT solution, over IBM, Google, Amazon, Accenture, and Cognizant

Problem
High re-admit rates, costing hospitals thousands per occurrence ($Ms/year)
Treatment plans (pathways) treat to the average and are static based on decades-old data
Poor engagement between patients and medical providers outside the hospital

Solution
ImagineCare, a 24/7 remote medical sensing solution backed by a live nurse-led clinic
Several MSFT products/services used, including ML, HDInsight, Power BI, Event Hub, ASA, Band
ML is the core of the solution, predicting occurrences before they happen based on "patients just like you"
(i.e., micro-segmenting the population), laying the foundation for dynamic, patient-specific pathways

Better customer experience. Better outcomes. Lower costs.

Personalized care delivers healthier populations

CHALLENGE
Dartmouth-Hitchcock Medical Center is a leading regional medical center in New England with 1.2 million patients and more than 1,000 physicians in virtually all areas of medicine. It wanted to replace static treatment plans based on outdated, generic data with highly personalized treatment plans or pathways that would evolve based on the patient's own data and near-real-time information such as vital signs, moods, and exercise habits from similar people in the same region.

SOLUTION
Dartmouth-Hitchcock created ImagineCare, a new consumer-focused solution based on the Microsoft Cortana Analytics Suite and Microsoft Dynamics CRM. ImagineCare collects and analyzes real-time and historical data from medical and health devices and electronic health records, and then surfaces metrics including predictive analytics in clinical dashboards and mobile apps. Clinicians can adjust treatment plans and immediately alert patients to changes, and patients and doctors can collaborate in real time.

BENEFITS
o Empowered, healthier patients with personalized, evidence-based treatment plans and collaborative care
o Millions of dollars saved in readmission costs, unnecessary ER and doctor visits, and missed work
o Improved quality of life with 360-degree view of patient health, mood, and behavior
o More effective population health management

"This system is really transforming how we can deliver health and wellness to the population. Despite all of the technology involved, ImagineCare does not lose that human touch, which is so important."
Nathan Larson
Director of Remote Medical Sensing
Dartmouth-Hitchcock