
Big Data is changing BI and analytics

“… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.”
– Gartner, “The State of Data Warehousing”*

[Diagram of traditional data warehousing. Labels: Data sources, ETL, Data warehouse]

* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012)
Big Data definition

“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
– Gartner, Big Data Definition*

* Gartner, Big Data (Stamford, CT.: Gartner, 2016), URL: http://www.gartner.com/it-glossary/big-data/


Big Data is driving transformative changes
Data characteristics
  Traditional: Relational data with highly modeled schema
  Big Data: All data with schema agility

Costs
  Traditional: Specialized HW
  Big Data: Commodity HW

Culture
  Traditional: Operational reporting; focus on rear-view analysis
  Big Data: Experimentation leading to intelligent action, with machine learning, graph, A/B testing
Big Data introduces a culture of experimentation
Tangerine instantly adapts to customer feedback to offer customers what they want, when they want it

Scenario
• Lack of insight for targeted campaigns
• Inability to support data growth

Solution
Azure HDInsight (Hadoop-as-a-service) with the Analytics Platform System (APS) enables instant analysis of social sentiment and customer feedback across digital, face-to-face and phone.

Result
• Reduced time to customer insight
• Ability to make changes to campaigns or adjust product rollouts based on real-time customer reactions
• Ability to offer incentives and new services to retain, and grow, its customer base

“I can see us…creating predictive, context-aware financial services applications that give information based on the time and where the customer is.”
Billy Lo
Head of Enterprise Architecture
However, there are challenges to Big Data…

• Obtaining skills and capabilities
• Determining how to get value
• Integrating with existing IT investments

*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
But, Microsoft has done it before
We needed to better leverage data and analytics to do more experimentation

So we:
• Designed a data lake for everyone to put their data into
• Built tools approachable by any developer
• Created machine learning tools for collaborating across large experiment models

Result:
• Across Microsoft, ten thousand developers doing experimentation leading to better insights
• Leading to growth in our Microsoft businesses:
  • Office productivity revenue (45% YoY)*
  • Intelligent Cloud (100% YoY)*
  • Bing search share doubles

[Chart: Growth of data @ Microsoft, 2010 to 2015, from petabytes to exabytes. Sources shown: Yammer, Xbox Live, Office365, LCA, Live, Bing, SMSG, CRM/Dynamics, Skype, Exchange, Windows, Malware Protection, Microsoft Stores, Commerce Risk]

* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
Microsoft is now taking everything we’ve learned on this journey and bringing it to our customers

Technology. Cost. Culture.


Big Data as a cornerstone of Cortana Intelligence
[Diagram: flow from Data to Intelligence to Action]
• Data Sources: Apps, Sensors and devices
• Information Management: Data Factory, Data Catalog, Event Hubs
• Big Data Stores: Data Lake Store, SQL Data Warehouse
• Machine Learning and Analytics: Machine Learning, Data Lake Analytics, HDInsight (Hadoop and Spark), Stream Analytics
• Intelligence: Cognitive Services, Bot Framework, Cortana
• Dashboards & Visualizations: Power BI
• People: Web, Mobile, Apps, Bots
• Automated Systems


Introducing Azure Data Lake

• Data Lake Store: No limits Data Lake (HDFS)
• Data Lake Analytics: Analytics job service (YARN)
• HDInsight: Managed Clusters (YARN)


Azure HDInsight
A Cloud Spark and Hadoop service for the Enterprise
• Reliable with an industry leading SLA
• Enterprise-grade security and monitoring
• Productive platform for developers and scientists
• Cost effective cloud scale
• Integration with leading ISV applications
• Easy for administrators to manage
• 63% lower TCO than deploying your own Hadoop on-premises*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Azure Data Lake Store
A No limits Data Lake that powers Big Data Analytics
• Petabyte size files and Trillions of objects
• Scalable throughput for massively parallel analytics
• HDFS for the cloud
• Always encrypted, role-based security & auditing
• Enterprise-grade support
Azure Data Lake Analytics
A No limits Analytics Job Service to power intelligent action
• Start in seconds, scale instantly, pay per job
• Develop massively parallel programs with simplicity
• Debug and optimize your big data programs with ease
• Virtualize your analytics
• Enterprise-grade security, auditing and support
Azure Data Lake
Big Data made easy
• Analytics on any data, any size
• Easier and more productive for all users
• Enterprise-ready
Petabyte size files and Trillions of objects
• Store data in its native format
• PB sized files, 200x larger than anyone else
• Scalable throughput for massively parallel analytics
• No need to redesign application or repartition data at higher scale
[Diagram: Store, scaling from TBs to EBs]
Any type of analytics
• Batch, interactive, streaming, machine learning
• Allows for exploratory analytics over data

• Analyze with Hadoop and Microsoft solutions
[Diagram: Cortana Intelligence Suite. Analytics layer: HDInsight (Hive, R Server) and U-SQL, running on YARN. Store layer: HDFS]


Start in seconds, Scale instantly, Pay per job
• Process big data jobs in 30 seconds
• No infrastructure to worry about (no servers, no VMs, no clusters)
• Instantly scale analytic units up or down (processing power)
• Architected for cloud scale and performance
• Frees you up to focus only on your business logic
Azure Data Lake
Big Data made easy
• Analytics on any data, any size
• Easier and more productive for all users
• Enterprise-ready
Easy for administrators to spin up quickly
• Deploy big data projects in minutes
• No hardware to install, tune, configure or deploy
• No infrastructure or software to manage
• Scale to tens to thousands of machines instantly
Debug and Optimize your Big Data programs with ease
• Deep integration with Visual Studio, Visual Studio Code, Eclipse, & IntelliJ
• Easy for novices to write simple queries
• Integrated with U-SQL, Hive, Storm, and Spark
• Actively offers recommendations to improve performance and reduce cost
• Playback visually displays job run
Develop massively parallel programs with simplicity
• U-SQL: a simple and powerful language that’s familiar and easily extensible
• Unifies the declarative nature of SQL with expressive power of C#
• Leverage existing libraries in .NET languages, R and Python
• Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)
Easy notebook experience for data scientists
• Most popular notebooks, Jupyter and Zeppelin out-of-the-box
• Combine code, statistical equations and visualizations
• Worked with the Jupyter community to enhance the kernel to allow Spark execution through a REST endpoint
Easy for data scientists with familiar R language
R Server for HDInsight
• Largest portable R parallel analytics library
• Terabyte-scale machine learning, 1,000x larger than in open source R
• Up to 100x faster performance using Spark and optimized vector/math libraries
• Enterprise-grade security and support
*Applies to HDInsight only
Easy for business analysts with interactive reports over big data
• Interactive BI with big data
• Spark 2.0 integration
• Interactive Hive with LLAP keeps data compressed and running in-memory, 25x faster
• ODBC driver to use Power BI or third party tools (Tableau, SAP, Qlik, etc.)
Azure Data Lake
Big Data made easy
• Analytics on any data, any size
• Easier and more productive for all users
• Enterprise-ready
Highest availability guarantee in the industry for peace of mind
• Managed, monitored and supported by Microsoft
• Enterprise-leading SLA: 99.9% uptime
• No IT resources needed for upgrades and patching
• Microsoft monitors your deployment so you don’t have to
99.9% SLA
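To make the 99.9% figure concrete, the arithmetic below converts an uptime percentage into the downtime it still permits. This is a minimal sketch: the function name and the 30-day and 365-day periods are assumptions chosen for illustration, not terms taken from the SLA itself.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an uptime SLA permits over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period * (1 - sla_percent / 100)


# Assumed measurement periods (hypothetical, for illustration only).
monthly = allowed_downtime(99.9, timedelta(days=30))
yearly = allowed_downtime(99.9, timedelta(days=365))

print(f"30-day month: {monthly.total_seconds() / 60:.1f} minutes")
print(f"365-day year: {yearly.total_seconds() / 3600:.2f} hours")
```

Under these assumptions, 99.9% uptime leaves roughly 43 minutes of permitted downtime per month, or a little under 9 hours per year.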
Runs in the most datacenters worldwide
• North Central US: Illinois
• Central US: Iowa
• South Central US: Texas
• East US: Virginia
• East US 2: Virginia
• West US: California
• Brazil South: Sao Paulo State
• North Europe: Ireland
• West Europe: Netherlands
• China North*: Beijing
• China South*: Shanghai
• India Central: Pune
• Japan East: Tokyo, Saitama
• Japan West: Osaka
• East Asia: Hong Kong
• SE Asia: Singapore
• Australia East: New South Wales
• Australia South East: Victoria

Azure doubling compute and storage every 6 months

*Applies to HDInsight only


Always encrypted, Role-based security & Auditing
• Always encrypted; in motion using SSL, and at rest using keys in Azure Key Vault
• Single sign-on, multi-factor authentication and seamless integration of on-premises identities with Active Directory
• Fine-grained POSIX-based ACLs for role-based access controls
• Auditing every access / configuration change
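As an illustration of what a POSIX-style ACL check involves, the sketch below evaluates read/write/execute bits for an owner, named users, groups and everyone else, with a mask that caps named-user and group entries. It is a simplified model of the general POSIX ACL idea, not the Data Lake Store API: the `Acl` class, `is_allowed` function and all user and group names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1  # classic rwx permission bits


@dataclass
class Acl:
    """Hypothetical, simplified POSIX-style ACL for one file or folder."""
    owner: str
    owning_group: str
    owner_perms: int
    group_perms: int
    other_perms: int
    mask: int = READ | WRITE | EXECUTE  # caps named-user and group entries
    named_users: dict[str, int] = field(default_factory=dict)
    named_groups: dict[str, int] = field(default_factory=dict)


def is_allowed(acl: Acl, user: str, groups: set[str], wanted: int) -> bool:
    """Return True if every permission bit in `wanted` is granted."""
    if user == acl.owner:
        return (acl.owner_perms & wanted) == wanted
    if user in acl.named_users:
        return (acl.named_users[user] & acl.mask & wanted) == wanted
    # Collect every group entry that applies to this user.
    matched = []
    if acl.owning_group in groups:
        matched.append(acl.group_perms)
    matched.extend(p for g, p in acl.named_groups.items() if g in groups)
    if matched:
        return any((p & acl.mask & wanted) == wanted for p in matched)
    return (acl.other_perms & wanted) == wanted


# Hypothetical folder: owner rwx, group r-x, others none, mask r-x.
acl = Acl(
    owner="alice", owning_group="analysts",
    owner_perms=READ | WRITE | EXECUTE,
    group_perms=READ | EXECUTE,
    other_perms=0,
    mask=READ | EXECUTE,
    named_users={"bob": READ | WRITE},
)

print(is_allowed(acl, "bob", set(), READ))         # named user, read
print(is_allowed(acl, "bob", set(), WRITE))        # write is cut by the mask
print(is_allowed(acl, "carol", {"analysts"}, READ))
print(is_allowed(acl, "dave", set(), READ))        # falls through to "other"
```

In this model the mask is what makes the controls fine-grained: an administrator can grant a named user broad rights and still cap what is effective on a given folder.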
Lower total cost of ownership
• No hardware
• Hadoop support included with Azure support
• Pay only for what you use
• Independently scale storage and compute
• No need to hire specialized operations team
• 63% lower total cost of ownership than on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Recognized by top analysts
Forrester Wave for Big Data Hadoop Cloud
• Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms*
• Recognized for its cloud-first strategy that is paying off*

*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.


Get started now

Learn more on the Data Lake website:
http://azure.com/datalake

Watch videos on Azure Data Lake:
https://channel9.msdn.com/Series/AzureDataLake

Take courses and read documentation on Azure Data Lake:
http://aka.ms/hditraining
http://aka.ms/adlanalytics
http://aka.ms/adlstore
© 2016 Microsoft Corporation. All rights reserved.
Bringing Big Data to everybody
Accelerate the pace of innovation through a state-of-the-art cloud platform

[Diagram: offerings arranged along an axis from CONTROL to EASE OF USE, plotted against User Adoption]

BIG DATA ANALYTICS
• IaaS Hadoop: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology
• Managed Hadoop: Azure HDInsight. Workload optimized, managed clusters
• Big Data as-a-service: Azure Data Lake Analytics. Specific apps in a multi-tenant form factor

BIG DATA STORAGE
• Azure Storage
• Azure Data Lake Store
Big Data has new data characteristics
[Chart: data volume (up to petabytes) against data complexity: variety and velocity]


Big Data leverages new hardware cost dynamics

Technology advances:
• Low cost memory
• High speed networking
• Flash and SSD storage
• CPU processing doubling

Commodity Hardware Cloud


Big Data introduces new culture of experimentation
• Historical campaign effectiveness → Understand customer patterns to uncover cross-sell opportunities
• Generate year-end financial reports → Financial monitoring with real-time recommendations to increase revenue
• Generate year-end financial reports → Real-time product offers and promotions based on behavior
• Collect historical data on equipment performance → Real-time monitoring to identify proactive maintenance
• Shipping features without understanding success → Building successful features correlating user action with product experience
The Business Value and TCO of HDInsight*
• 418% 5-year ROI
• Four-month payback period
• 63% lower 5-year TCO than on-premises
• 66% better staff efficiency than on-premises

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
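To show how the percentages above translate into amounts, here is a small worked sketch. The baseline and investment figures are hypothetical placeholders, not numbers from the IDC study, and the ROI formula (net benefit divided by investment) is an assumption about how the 418% figure is defined.

```python
def cloud_tco(on_prem_tco: float, reduction: float = 0.63) -> float:
    """TCO after applying the quoted 63% reduction to an on-premises baseline."""
    return on_prem_tco * (1.0 - reduction)


def net_benefit(investment: float, roi: float = 4.18) -> float:
    """Net benefit implied by an ROI, assuming ROI = net benefit / investment."""
    return investment * roi


# Hypothetical inputs for illustration only; not taken from the study.
baseline = 1_000_000.0   # assumed 5-year on-premises TCO
investment = 100_000.0   # assumed 5-year investment

print(f"Cloud TCO:   {cloud_tco(baseline):,.0f}")
print(f"TCO savings: {baseline - cloud_tco(baseline):,.0f}")
print(f"Net benefit: {net_benefit(investment):,.0f}")
```

Substituting an organization's own baseline gives a first-order estimate only; the study's figures are averages across the customers it interviewed.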
From data to decisions and actions

Action

Value
