
Editor's Note

When to Use Hadoop, and When Not To

In-Memory Finds a Place in Big Data's Universe

Training, Planning Needed to Put Hadoop Into Play

SEPTEMBER 2013

Business Information
INSIGHT ON MANAGING AND USING DATA

SPECIAL ISSUE

Breaking Big
Besieged by endless big data plugging and knee-deep in Hadoop hoopla, many businesses are confused, and it's no wonder. To make the right technology decisions and tap into real value, a keen-eyed look is needed.

EDITOR'S NOTE | CRAIG STEDMAN


Beyond the Idle Talk on Big Data


Big data is well into the IT hype cycle, to the point where even some vendors and consultants looking to capitalize on big data deployments are getting tired of the term. At the 2013 gathering of the Pacific Northwest BI Summit, an annual meeting of about 20 vendor executives and technology consultants in Grants Pass, Ore., fun was had at the expense of the moniker "big data" during a session on the topic. For example, Shawn Rogers of consulting and research company Enterprise Management Associates said that the classic "three Vs" definition (volume, velocity and variety) has been beaten to death. "It just defines big data as an analyst hobby," he said, to general approval.
But there are real projects going on out there, in lots of companies. And by now, efforts to deploy big data technologies such as Hadoop and NoSQL databases may be putting your organization through the wringer. If so, it's high time to rinse out some of the hype and look at big data management and analytics applications with a more considered eye.
There is value to be had; at least, that's the expectation. The Pacific Northwest BI Summit attendees weren't down on the potential benefits of using big data, just on the term itself. And in a reader survey on business intelligence, analytics and data warehousing topics, conducted earlier this year by TechTarget, which publishes Business Information magazine, interest levels in big data analytics were relatively high, and high-minded. Forty-one percent of 540 respondents said they had active programs or planned to add one in the next 12 months. And the goals of those respondents primarily revolved around driving new business: A combined 66% cited gaining competitive advantages, better understanding customers or increasing revenue. By comparison, 27% opted for improving organizational efficiency.
The three articles in this special edition of Business
Information offer insight and advice to help point the way
forward. First we look at the capabilities, and limitations,
of Hadoop. Next we report on the relationship between
big data and in-memory analytics tools, and issues to consider before joining them at the hip. We close with tips on making Hadoop work in corporate applications from a panel of IT and BI professionals who spoke at the Hadoop Summit 2013.

CRAIG STEDMAN is executive editor of TechTarget's SearchDataManagement.com and SearchBusinessAnalytics.com websites. Email him at cstedman@techtarget.com.

STRATEGIES | ED BURNS

WHEN TO USE HADOOP, AND WHEN NOT TO
Hadoop has become everyone's big data darling. For now, at least, it can only do so much, and savvy businesses shouldn't buy into the hype.

In the past few years, Hadoop has earned a lofty reputation as the go-to big data analytics engine. To many, it's synonymous with big data technology. But the open source distributed processing framework isn't the right answer to every big data problem, and companies looking to deploy it need to carefully evaluate when to use Hadoop and when to look elsewhere.
For example, Hadoop has ample power for processing
large amounts of unstructured or semi-structured data.
But it isn't known for its speed in dealing with smaller
data sets. That has limited its application at Metamarkets
Group, a San Francisco provider of real-time marketing
analytics services for online advertisers.
Metamarkets CEO Michael Driscoll said the company uses Hadoop for large, distributed data processing tasks where time isn't a constraint. That includes running end-of-the-day reports to review daily transactions or scanning historical data dating back several months.
But when it comes to running the real-time analytics processes that are at the heart of what Metamarkets offers to its clients, Hadoop isn't involved. Driscoll said that's because it's optimized to run batch jobs that look at every file in a database. It comes down to a tradeoff: In order to make deep connections between data points, the technology sacrifices speed. "Using Hadoop is like having a pen pal," he said. "You write a letter and send it and get a response back. But it's very different than [instant messaging] or email."
Because of the time factor, Hadoop has limited value in online environments where fast performance is crucial, said Kelly Stirman, director of product marketing at NoSQL database developer MongoDB Inc. For example, analytics-fueled online applications, such as product recommendation engines, rely on processing small amounts of information quickly. But Hadoop can't do that efficiently, Stirman said.

No Replacement Plan
Some businesses might be tempted to try scrapping their
traditional data warehouses in favor of Hadoop clusters,
because technology costs are so much lower with the
open source technology. But Carl Olofson, an analyst at
market research company IDC, said that weighing the
two is an apples-and-oranges comparison.
Olofson said the relational databases that power most
data warehouses are used to accommodating trickles of
data that come in at a steady rate over a period of time,
such as transaction records from day-to-day business
processes. Conversely, he added, Hadoop is best suited to
processing vast stores of accumulated data.
And because Hadoop is typically used in large-scale
projects that require clusters of servers and employees
with specialized programming and data management
skills, implementations can become expensive, even
though the cost-per-unit of data may be lower than with
relational databases. "When you start adding up all the
costs involved, it's not as cheap as it seems," Olofson said.
Specialized development skills are needed because Hadoop uses the MapReduce software programming framework, which relatively few developers are familiar with. That can make it difficult to access data in Hadoop from SQL databases, according to Todd Goldman, vice president of enterprise data integration at software vendor Informatica Corp.
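
To give a sense of why the MapReduce model asks developers to think differently than they do in SQL, here is a minimal, single-process Python sketch of the map, shuffle and reduce steps. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `total_sales_by_store`) and the sample records are hypothetical, and production Hadoop jobs are typically written against Hadoop's own APIs and run across a cluster rather than in one process.

```python
from collections import defaultdict

# Hypothetical sample records: (store, sale amount). Not taken from the article.
TRANSACTIONS = [
    ("boston", 20.0),
    ("denver", 5.0),
    ("boston", 7.5),
    ("denver", 12.5),
    ("austin", 3.0),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for store, amount in records:
        yield store, amount


def shuffle(pairs):
    """Group all values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_sales_by_store(records):
    """Chain the three steps: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(total_sales_by_store(TRANSACTIONS))
```

In SQL, the same result would be a single GROUP BY query. Spelling it out as separate map and reduce functions hints at the extra programming work the sources quoted here are describing.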
Various vendors have developed connector software
that can help move data between Hadoop systems and
relational databases. But Goldman thinks that for many organizations, too much work is needed to accommodate the open source technology. "It doesn't make sense to revamp your entire corporate data structure just for Hadoop," he said.

Helpful, Not Hype-Full


One viable use that Goldman sees for Hadoop is as a staging area and data integration platform for running extract, transform and load (ETL) functions. That may not
be as exciting an application as all the hype over Hadoop
seems to warrant, but Goldman said it particularly makes
sense when an IT department needs to merge large files.
In such cases, the processing power of Hadoop can come
in handy.
Driscoll said Hadoop is good at handling ETL processes because it can split up the integration tasks among numerous servers in a cluster. He added that using Hadoop to integrate data and stage it for loading into a data warehouse or other database could help justify investments in the technology, getting its foot in the door for larger projects that take more advantage of Hadoop's scalability.
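
The idea of splitting an ETL job into pieces that are processed side by side can be sketched in a few lines of Python. This is a simplified stand-in, not a Hadoop program: worker threads play the role of cluster servers, and the names (`partition`, `transform_chunk`, `run_etl`) and sample rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw rows, as they might arrive from a messy source file.
RAW_ROWS = [
    " Alice ,  42.50",
    "BOB,17",
    "carol , 8.25 ",
    "  dave,100",
]


def partition(rows, num_workers):
    """Split rows into num_workers roughly equal chunks (round-robin)."""
    return [rows[i::num_workers] for i in range(num_workers)]


def transform_chunk(chunk):
    """Clean one chunk: trim whitespace, lowercase names, parse amounts."""
    cleaned = []
    for row in chunk:
        name, amount = row.split(",")
        cleaned.append((name.strip().lower(), float(amount)))
    return cleaned


def run_etl(rows, num_workers=2):
    """Transform the chunks in parallel, then merge them into one staged list."""
    chunks = partition(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(transform_chunk, chunks))
    staged = [record for chunk in results for record in chunk]
    return sorted(staged)  # sorted so the staged output has a stable order


if __name__ == "__main__":
    for record in run_etl(RAW_ROWS):
        print(record)
```

Because each chunk is transformed independently, adding workers (or, in a real cluster, servers) lets larger files be processed in roughly the same elapsed time, which is the scalability argument Driscoll makes.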
Of course, leading-edge Internet companies such as Google, Yahoo, Facebook and Amazon.com have been big Hadoop users for years. And new technologies aimed at eliminating some of Hadoop's limitations are becoming available. For example, several vendors have released tools designed to enable real-time analysis of Hadoop data. A Hadoop 2.0 release that is in the works will make MapReduce an optional element and enable Hadoop systems to run other types of applications.
Ultimately, it's important for IT and business executives to cut through all the hype and understand for themselves where Hadoop could fit in their operations.

Stirman said there's no doubt it's a powerful tool that can support many useful analytical functions. But it's still taking shape as a technology, he added.
"There's so much hype around it now that people think it does pretty much anything," Stirman said. "The reality is that it's a very complex piece of technology that is still raw and needs a lot of care and handling to make it do something worthwhile and valuable."

TECHNOLOGIES | BETH STACKPOLE

IN-MEMORY FINDS A PLACE IN BIG DATA'S UNIVERSE
Big data plus memory-based analytics software can form a mutually beneficial relationship, if that's the kind of power business users really need.

In-memory processing can serve as a high-octane fuel for supercharging big data analytics applications. But organizations should weigh factors such as additional systems infrastructure costs and the readiness of their business processes before gassing up with in-memory analytics technology.
Another key step in greasing the deployment skids is
identifying big data analytics problems that have proven
unsolvable or that could benefit from the performance
boost typically provided by in-memory applications.
"The integration of in-memory capabilities and big data boils down to use case and benefits," said Paul Barth, co-founder of data management and analytics consultancy NewVantage Partners. "You need to consider the business value of accelerating time to answer: Is it a matter of convenience, or is it a case when rapid turnaround and rapid analysis really benefits the decision-making process?"
Detecting patterns in large stockpiles of data is one application where using in-memory analytics tools makes sense, Barth said, as are scenarios in which traditional business intelligence (BI) tools hit their limits on data volumes and processing speeds. Another example favoring in-memory technology: building an online recommendation engine that can be accelerated by running its business rules engine and analytics algorithms in memory.

A Data Flood
At ContactLab, an email marketing services provider in
Milan, Italy, the need for in-memory analytics capabilities became apparent when its business model shifted
from broad-based marketing campaigns to a more individualized outreach approach, said Massimo Fubini,
the company's founder and director. ContactLab, which
manages an average of 60,000 to 70,000 email and outbound SMS messages daily, faced a big data challenge
as it tried to sort through hundreds of millions of data
points on click-throughs, website visits and other actions
to analyze customer behavior and serve up relevant marketing messages on the fly.
Conventional BI tools worked fine up to that point,
Fubini said. But the change in business strategy changed
the analytics game and opened the door to the deployment of a Hadoop system that captures the data and
feeds it into in-memory analytics software: in this case,
SAS Visual Analytics from SAS Institute Inc.
As part of the big data environment, ContactLab also collects data from a variety of other sources, including mobile apps, social media sites, transactional systems and external marketing information services. The plethora of data makes it harder for marketing managers and other executives at the company's clients to know what questions to ask. Fortunately, Fubini said, the SAS tool's combination of in-memory analytics and data visualization capabilities lets ContactLab's analysts explore the data and come up with insights nearly instantaneously.
"This world is really changing," he said. "In the past, people knew what data was available and would ask for specific analytics. Now the amount of data we're collecting is huge, and the requirements around analysis are much more interactive. You can't give someone an answer in a day or two."

Know Your People


Knowing your user base is another gauge for determining if in-memory analytics tools are the right fit for a big data initiative. "It's a bit of a judgment call, so you need to understand if your users can take advantage of the additional performance," said William McKnight, president of McKnight Consulting Group. "If you have data scientists on staff, you don't want them sitting there drilling and drilling into data only to get frustrated [by slow response times] and walk away. With super-fast performance, you can give them the advanced analytics capabilities they need."


Business process maturity is another issue. Tapping in-memory technology to deliver self-service capabilities to analytics users or as a means to accelerate the performance of big data analytics processes is an admirable goal, but it's a lost opportunity if business users can't quickly initiate actions based on the analytical insights that the software produces.
"The question is, are your business systems ready to take the results from the data mining exercise?" said Tapan Patel, global product marketing manager for predictive analytics and data mining at SAS. "If the end goal is to make quicker, better decisions and you're getting insights quickly, but your CRM system is not ready to execute on near-real-time alerts with price changes or customer offers, the value [of in-memory analytics] may not be achieved."
Cindi Howson, founder of BI Scorecard, a research and consulting company that publishes technical evaluations of BI and analytics software, said in-memory tools have a range of potential uses, from speeding up the performance of existing databases to enabling the addition of new visual data discovery capabilities. "In-memory should be part of everyone's analytical environment," she said. "The question is where and how?"

RECOMMENDATIONS | JACK VAUGHAN

TRAINING, PLANNING NEEDED TO PUT HADOOP INTO PLAY
Experimenting with the vaunted open source distributed framework is one thing; using it in enterprise applications is another entirely.


While a lot of ground has to be covered to deploy the Hadoop Distributed File System and associated technologies to support enterprise uses, a roadmap outlining the path to that destination is starting to emerge.
At the Hadoop Summit 2013 in San Jose, Calif., a panel of IT leaders from various industries offered guidance for companies that want to move from experimenting with Hadoop to using it in actual applications. They said it's easy to get started with open source Hadoop clusters, but taking the technology to the next level is more difficult.
Implementers should start small and be prepared to
bring in outside training help and think up front about
how Hadoop-processed data will become part of operational and analytical processes, according to summit
participants.
The general rush to try out Hadoop brings its own issues, said Ratnakar Lavu, senior vice president of digital innovation at retailer Kohl's Corp. in Menomonee Falls, Wis. "You hear about all the things that Hadoop can solve," he said. "You get all this data, then you go off and try to solve everything that you can think of."
But Lavu's team learned early on that small projects were good starting points with Hadoop. "It's a whole new way of doing things," he said. "Start with something small that you can actually manage. It's about learning."
Lavu also told would-be enterprise Hadoop users to be
careful not to solve problems that are already solved.
For example, existing reports that are being produced
and distributed effectively don't need to be redone in Hadoop just for the sake of changing platforms.
Hadoop first gained attention based on the efforts of
systems programmers at Internet companies such as
Yahoo, Google, Facebook and Twitter. But incorporating
the technology into mainstream business and analytics
applications takes different skills. Even Web stalwarts
such as Salesforce.com have learned lessons while moving Hadoop into a support role for business decision
makers.
"When Hadoop comes to mind, too often it's only the data: how big it is. But as you add more and more users, you have to think in terms of the compute [requirements] also. It's not just the storage," said Ramesh Koteshwar, a business intelligence architect at Salesforce.
Koteshwar anticipates that a sizable part of the company's workforce will ask questions about data collected in Hadoop. "We expect hundreds and thousands of users on the Hadoop cluster," he said.
Developing robust security capabilities is another part of the process of bringing Hadoop to wider use, he said. Hadoop use at Salesforce is very much still at an exploratory stage, and end-user access and authentication are barriers that must be hurdled on the track to broader deployment. "When you really want to bring it into the enterprise, you want to make sure there are security policies and processes in place in front of the Hadoop [cluster]," Koteshwar said.

Lavu concurred that the way you fit Hadoop systems into the overall organization is important. "It's about building the right processes and the right kind of systems and the data feeds as well as the user training and adoption," he said. "Those are the pieces that enable us to be successful."
While there has been a lot to learn in Hadoop's early days, at least some of the frontier work has been done, said Neeraj Kumar, vice president of information management and analytics at Cardinal Health in Dublin, Ohio. That points to a benefit in moving to Hadoop now that more pieces of the related data infrastructure have been put into place.
"The starters of today are going to have a leg up on us," Kumar said. "We had to build a lot of ad hoc processes and solutions just because the previous versions of Hadoop lacked those features."
Kumar agreed that Hadoop deployment teams should
start small and should find an initial application that provides a net-new capability for their companies.
"You need to also understand the talent base of your own organization," he said, adding that in many cases Hadoop creates a need to bring in new skills. As a result, he advised IT managers to start thinking about Hadoop training issues early in the project planning process. Consultants can help, Kumar said, but they aren't the ultimate answer: "You do need talent on-site, on the ground."

ABOUT THE AUTHORS


ED BURNS is site editor of SearchBusinessAnalytics.com; in that position, he covers business intelligence, analytics and data visualization technologies and topics. He previously was a news writer for TechTarget's SearchHealthIT.com website, and he has also written for a variety of daily and weekly newspapers in eastern Massachusetts. Email him at eburns@techtarget.com.

BETH STACKPOLE is a freelance writer who has been covering the intersection of technology and business for more than 25 years for a variety of publications and websites, including SearchBusinessAnalytics.com, SearchDataManagement.com and other TechTarget sites. Email her at bstack@stackpolepartners.com.

Business Information is an e-publication of TechTarget's Business Applications and Architecture Media Group. The websites featured in this special issue are SearchDataManagement.com and SearchBusinessAnalytics.com.
Scot Petersen, Editorial Director
Jason Sparapani, Managing Editor, E-Publications
Joe Hebert, Associate Managing Editor, E-Publications
Craig Stedman, Executive Editor
Melanie Luna, Managing Editor
Linda Koury, Director of Online Design
Doug Olender, Publisher, dolender@techtarget.com

JACK VAUGHAN is SearchDataManagement.com's news and site editor. He covers topics such as data warehousing, big data management, databases, data integration and data quality. Vaughan previously worked as an editor for TechTarget's SearchSOA.com, SearchVB.com, TheServerSide.net and SearchDomino.com websites. Email him at jvaughan@techtarget.com.

Annie Matthews, Director of Sales, amatthews@techtarget.com

TechTarget, 275 Grove Street, Newton, MA 02466


www.techtarget.com
© 2013 TechTarget Inc. No part of this publication may be transmitted or
reproduced in any form or by any means without written permission from the
publisher. TechTarget reprints are available through The YGS Group.
About TechTarget: TechTarget publishes media for information technology
professionals. More than 100 focused websites enable quick access to a deep
store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access
to independent expert commentary and advice. At IT Knowledge Exchange,
our social community, you can get advice and share solutions with peers and
experts.
COVER PHOTOGRAPH: FOTOLIA/FRESHIDEA
