Talend
Table of Contents
Survey Results
Big Data Company Strategy
Big Data Business Drivers and Benefits Received
Big Data Integration
Big Data Implementation Challenges
Big Data Implementation Technologies
About Talend
Big data represents a significant paradigm shift in enterprise technology and stands to transform much of what the modern enterprise is today. Digital data is everywhere and global data is growing at 40% per year. Companies capture trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedded in the physical world in devices such as mobile phones, energy meters and automobiles, sensing, creating, and communicating data.[1] By collecting and analyzing all this information, companies gain insight into new business opportunities and threats.

But what exactly is big data? Big data encompasses a complex and large set of diverse structured and unstructured datasets that are difficult to process using traditional data management practices and tools. There is an increasing desire to collect call detail records, web logs, data from sensor networks, financial transactions, social media and Internet text, and then analyze them alongside existing data sources. Conventional data management tools fail when trying to integrate, search and analyze big datasets, which (for now) range from terabytes to multiple petabytes of information. As an example, Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.[2]

New technologies based on the Apache Hadoop Big Data Platform have emerged as a way to analyze large data sets through a technique called massively parallel processing (MPP) of information.

As with any new successful paradigm, there is a technology adoption curve from innovators and early adopters, to the early majority, to the late majority, to laggards. Early adopters are driven by competitive advantage and innovation, take the biggest risks for success, typically use primitive tools, and build solutions themselves.
Conversely, the late majority and laggards strive for the productivity gains others have received and take less risk by adopting proven technologies backed by robust products and services.

In the information arms race, companies that can collect and analyze more information should be able to make faster, better-informed decisions than their competitors, for example by maximizing customer wallet share, by knowing when and why customers may leave, by efficiently creating and targeting new markets, or by deterring fraud.

To date, most of the big data discussion has been about big data technology. The goal of this survey and whitepaper is to highlight big data adoption challenges, business objectives and benefits, as well as the big data technologies being used.
[1] Big data: The next frontier for innovation, competition, and productivity. McKinsey & Company, May 2011.
[2] http://en.wikipedia.org/wiki/Big_data
Survey Results
In the summer of 2012, Talend conducted a big data adoption survey of 231 professionals involved in the delivery of data solutions for their company. Survey respondents were closely split between North America (49%) and EMEA (51%), with 60% of respondents in IT and 36% having business titles. The 95 respondents who did have a big data strategy were then asked a series of questions about their experience.

Figure 1: Survey Demographics (chart labels include EMEA 51% and Other 4%)
Key findings from the survey are:
41% of companies have a strategy for dealing with big data, indicating the growing adoption of big data.
48% of big data initiatives are driven by the business, 39% by IT, and 13% cross-functionally.
For those without a big data strategy, the main reason (76%) is that they do not distinguish big data from existing corporate data.
Increasing the depth and accuracy of predictive analytics was the number one driver for big data, reported by 68% of those who have a big data strategy.
Using today's definition of big data (> 10 terabytes), 71% of respondents have big data to manage.
62% indicated that they have achieved big data business benefits, with the primary benefits being business process optimization (28%) and improvements in marketing and sales (24%).
However, 24 companies reported not receiving a business benefit, which may indicate the need for improved big data skillsets, governance and management.
The types (inputs) of big data being used today include web and social media (57% of respondents), followed by sales data (54% of respondents).
61% replied that their primary big data challenge was allocating sufficient time, budget and resources, with just over half (52%) reporting a lack of in-house big data expertise.
Open source Apache Hadoop and Hadoop-based distributions represented over 60% of big data implementation technologies in use or considered for use.
Figure 2: Does your organization have a strategy for dealing with big data? (n=231)
Response categories: Yes, No.
For companies that do have a big data strategy (n=95), it is being driven by several company functions (Figure 4), which indicates that big data as a core strategy has moved past the early adopter stage. 39% indicated that big data initiatives are being driven by IT, a bottom-up approach aimed at collecting and analyzing large data sets more efficiently. However, 48% of big data strategy is being driven by lines of business or executives, which indicates there are compelling business reasons for big data adoption, such as increased revenue, improved customer satisfaction, or faster time to market.
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
Figure 4 chart labels include IT 39% and Board of Directors 2%.
Early results show that big data is part of larger corporate data initiatives, and is being driven by business more than IT.
For those that have implemented big data projects, 62% indicated that they have achieved business benefits (Figure 6), with the primary benefits being business process optimization (28%) and improvements in marketing and sales (24%). It is a concern that 38% responded No or Unknown; this may be due to a lack of project governance, data quality, big data skillsets and/or available tooling, which is typical for new paradigms.

Figure 6: To date, have you realized any business benefit to big data? (n=95)
Yes, particularly in Crime Prevention and Fraud: 5%
Yes, particularly in Customer Retention: 5%
No: 25%
Unknown: 13%
Big data business benefits include business process optimization and marketing/sales improvement; however, some projects have failed to deliver a benefit, and for others it is too early to tell.
Furthermore, the types of big data being used today or considered for the future reinforced the previous response. Web and social media are being used by 57% of respondents today, with 23% considering them for the future. Sales data, used for analyzing buying patterns and sales incentives, was the second highest in use today (54%), with 32% considering it for the future. Biometric data was the lowest on the list.

Figure 8: What type of big data are you involved in or considering making part of your Business Intelligence?
Chart categories:
Web and social media (web logs, Twitter feeds, JSON)
Machine generated data (RFID, GPS, phone apps and other machine generated data)
Sales data (buying patterns and sales incentives)
Biometric (fingerprint, voice/face recognition, DNA)
Human interactions (e-mails, voice mails, call centers)
Legend includes: Not Considering
Web and social media are big data's primary inputs, with sales data a close second.
Big data processing varies by organization, industry and the types of tools available to process the data. It may involve collecting and analyzing terabytes of information or many petabytes of data, and it is assumed that the definition of big data will grow over time. Using today's definition of big data (> 10 terabytes), 71% of respondents (Figure 10) have big data to manage. 46% of respondents have over 100 terabytes of data to manage and 12% have more than 2 petabytes to manage.

Figure 10: What is the total amount of data that exists within your organization?
< 10 Terabytes: 29%
10 to 99 Terabytes: 25%
100 to 499 Terabytes and 500 Terabytes to 1 Petabyte: 34% combined (chart values 23% and 11%)
2 to 5 Petabytes and > 5 Petabytes: 12% combined (6% each)
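The cumulative figures quoted above can be cross-checked against the Figure 10 breakdown. The short Python sketch below is illustrative only. It assumes the pairing of percentages to ranges that the text implies (the extracted chart does not pair them explicitly), and it combines the two middle ranges because their individual values cannot be told apart. The names `shares` and `share_at_or_above` are hypothetical and not part of the survey.

```python
# Share of respondents per data-volume range, in percent.
# ASSUMPTION: the pairing of values to ranges is inferred from the text
# (71% above 10 TB, 46% above 100 TB, 12% above 2 PB), not read from the chart.
shares = [
    ("< 10 TB", 29),
    ("10 to 99 TB", 25),
    ("100 TB to 1 PB", 34),  # two chart ranges combined: 23% + 11%
    ("2 to 5 PB", 6),
    ("> 5 PB", 6),
]


def share_at_or_above(first_range: str) -> int:
    """Sum the shares of `first_range` and every larger range."""
    labels = [label for label, _ in shares]
    start = labels.index(first_range)
    return sum(pct for _, pct in shares[start:])


total = sum(pct for _, pct in shares)
over_10_tb = share_at_or_above("10 to 99 TB")
over_100_tb = share_at_or_above("100 TB to 1 PB")
over_2_pb = share_at_or_above("2 to 5 PB")

print(total, over_10_tb, over_100_tb, over_2_pb)  # prints: 100 71 46 12
```

Under these assumptions the ranges sum to 100% and reproduce the 71%, 46% and 12% figures stated in the text.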
A large majority of companies have over 10 terabytes of data to manage, but the biggest barriers to big data adoption are a shortage of time, budget, expertise and resources.
Chart categories: Amazon Web Services, Apache Hadoop (own installation), Cloudera (CDH), Greenplum, Hortonworks Data Platform, MapR, Other.
Open source Apache Hadoop and Hadoop-based distributions represented over 60% of big data implementations in use or considered for use.
About Talend
Talend is the recognized leader in open source integration solutions. The company's holistic integration platform helps organizations minimize costs and maximize the value of data integration, ETL, data quality, master data management, application integration and business process management - while supporting their shift toward big data. More than 3,500 paying customers worldwide, including eBay, ING, The Weather Channel, Deutsche Post and Allianz, subscribe to Talend's solutions and services. With over 20 million downloads, Talend's products are the most trusted integration solutions in the world. The company has major offices in North America, Europe and Asia, and a global network of technical and services partners.

Talend's open source approach and flexible integration platform for big data enable users to easily connect and analyze data from disparate systems to help drive and improve business performance. Talend's big data capabilities integrate with today's big data market leaders such as Cloudera, Hortonworks, Google, Greenplum, MapR, Teradata and Vertica, positioning Talend as a leader in the management of big data. Talend's goal is to democratize the big data market just as it has with data integration, data quality, master data management, enterprise service bus and business process management.

Visit www.talend.com to learn more and download your free copy of Talend Open Studio for Big Data.