The Next Wave of Data Management

Is Big Data The New Normal?

Table of Contents
Introduction
Separating Reality and Hype
  Why Are Firms Making IT Investments In Big Data?
Trends In Data Management Technologies
  The Technology Maturity Cycle
IT Challenges Meeting New Data Demands
  Managing Increasing Data Volume and Variety
  Ensuring Data Governance and Data Quality
Strategies For Managing Evolving Big (and Enterprise) Data
  A Total Data Management Approach For All Data
  Benefits of Total Data Management for All Your Data
Talend Total Data Management
About Talend
  Contact Us

Introduction
Much has been written about the promise of big data. With the growth of corporate data, it is
clear that big data may become the new normal. In analytics, for example, companies are able to
make more informed decisions by analyzing bigger and more diverse sets of data. A recent Gartner
report states that through 2015, organizations integrating high-value, diverse, new information
types and sources into a coherent information management infrastructure will outperform their
industry peers financially by more than 20% (Source).

However, this new data management journey will bring many instances of failed big data
experiments requiring re-architecture in a few years. Effectively incorporating big data technologies
like Hadoop and NoSQL into your current information architecture is not a simple exercise.

This paper reviews the business drivers that are impacting this new wave of data management,
highlights the new tools companies have for consideration in their next-generation data
architecture, and outlines a total data management blueprint for consideration in your
information strategy.

Separating Reality and Hype
Is there anything big data cannot solve? Is big data overhyped? Well, many would say yes, as big
data is being used pervasively in vendor presentations to adorn company information strategies.
Big data use cases range from marketing campaign analysis and recommendation engines to
predictive analytics, sentiment analysis, and fraud detection. Some were even proposing the use of
big data to predict the next Pope as leader of the Catholic Church. What's next, you might ask?
Well, we are at the tip of the big data iceberg in finding all the value big data can deliver.

What is undeniable is that the digital universe is rapidly growing. For example:

The total worldwide volume of data is growing at 59% per year, with the number of files
growing at 88% per year. (Source)

IDC estimates that by 2020, business transactions on the internet, business-to-business
and business-to-consumer, will reach 450 billion per day. (Source)

Akamai analyzes 75 million events per day to better target advertisements. (Source)

Talend Inc.
800 Bridge Parkway, Suite 200, Redwood City, California 94065 US
Tel: +1 (650) 539 3200

Walmart handles more than 1 million customer transactions every hour, which are imported
into databases estimated to contain more than 2.5 petabytes of data. (Source)

More than 5 billion people are calling, texting, tweeting and browsing on mobile phones
worldwide. (Source)

100 terabytes of data are uploaded daily to Facebook. (Source)

Why Are Firms Making IT Investments In Big Data?
A recent survey by Talend highlights the top business drivers for big data, with increasing the
accuracy and depth of predictive analytics being the primary reason. With so much unstructured
data not being analyzed, firms recognize the value in using big data.

What are the business drivers for big data in your organization? (multiple responses, n=95)

Trends In Data Management Technologies
Whether big data is in production or on the horizon for your organization, there are considerations
in several areas of data management that require strategy and planning:

Real-time Data Processing: the requirement to have information on demand or in
real time is increasing, as firms do not want to make decisions with last month's or even
yesterday's information.

Data Services: to address the demand for information accessibility and availability from
many consumers, data management features such as data integration and data quality are
becoming available as on-premise or cloud-based services.

Open Source Data Management: to overcome the huge investment typically made in
enterprise software, open source makes data management fiscally possible for smaller
organizations and improves total cost of ownership for big ones.

Data Governance: an evolving approach to set forth policies and procedures for the use,
access and management of data. Particularly important as business users play an
ever-increasing role in managing data and budgeting.

Big Data and NoSQL: providing a bridge between familiar legacy database technologies
and the demands of rapidly storing, retrieving, processing and analyzing big data.

The Technology Maturity Cycle
There is a typical maturity cycle as these new technologies are introduced. First, a technology
needs to integrate with existing systems; next, it needs to become hardened to address scalability,
reliability and security concerns; then it may incorporate related technologies, including software
lifecycle management features like project management and testing, or in the example of data
management, capabilities to expose a function as a consumable service. The technology maturity
timeline may look like:

Each one of the aforementioned trends is going through the hype-to-productivity timeline,
offering greater benefits to end users. If we look at big data (Hadoop) for example, it started as an
implementation of MapReduce and has been extended into a massive operating system for
distributed parallel processing of huge amounts of data. Other technologies have been added,
including HBase, an open source database; HCatalog for metadata; Apache Hive as a data
warehouse infrastructure; and Apache Pig as a programming language.

However, Hadoop by itself, although well equipped for storing and processing large amounts of
information across nodes, does not yet include software lifecycle management functionality, such
as project management, or related data management functionality such as data integration,
metadata management, governance, profiling, cleansing and matching. Data governance is
particularly important for successful firm-wide adoption, since it incorporates business processes,
organizational roles and best practices.
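
To make the MapReduce model mentioned above more concrete, the following is a minimal
single-process sketch in Python. It is not Hadoop code: the function names and the word-count
task are illustrative assumptions, and a real cluster would run the map and reduce phases in
parallel across many nodes.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs: one (word, 1) pair per word in each record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into one result (here, a sum)."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

For example, word_count(["big data", "Big deal"]) returns {"big": 2, "data": 1, "deal": 1}.
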
Is big data a replacement for a data warehouse? A European telecommunications provider
needs to store all call transactions in a database for 2 years to comply with regulations. One
alternative is to create a large data warehouse to house this information; however, a less
expensive and faster-performing alternative is to install a Hadoop cluster or a NoSQL database
to store the information. Sometimes called data hoarding, this strategy is used by many
companies as a means to store information for subsequent processing.

IT Challenges Meeting New Data Demands
To meet the new business demands and the appetite to process and analyze more
information, IT looks to modernize its data management processes. Conventional data
management tools fail when trying to integrate, search and analyze very large data sets. In the
past, adding hardware and software was a solution to make information available and scalable, but
that is no longer the case. Information needs to be available 24x7 across a variety of channels from
web to mobile devices.

Managing Increasing Data Volume and Variety
Much has been written (including this paper so far) about the increasing volume of data.
IT needs to be concerned, since the volume of global data generated per year is projected to grow
at 40%, whereas IT budgets are increasing at only 5% (Source). But what is of greater
concern is the increasing variety of data, or the information that is not in a structured database or
data warehouse, e.g. content management repositories, streaming data (audio, video), images,
blogs, customer comment forums, Twitter, network sensors, transactional data and more. By
2015, IDC estimates that 90% of data in the digital universe will be unstructured (Source). It is
clear that firms that focus on volume alone and not data characteristics like variety will need to
make additional investments.
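
The gap between data growth and budget growth compounds quickly. As a rough illustration,
the Python sketch below assumes that the 40% and 5% figures are both annual rates that stay
constant and compound yearly, which the cited source does not necessarily claim. The function
names are hypothetical.

```python
def compound(start, annual_rate, years):
    """Value of `start` after growing at `annual_rate` for `years` years."""
    return start * (1 + annual_rate) ** years


def data_per_budget_unit(years, data_rate=0.40, budget_rate=0.05):
    """Ratio of data volume to IT budget, both normalized to 1.0 at year 0.

    Assumption: both rates are constant and compound once per year.
    """
    return compound(1.0, data_rate, years) / compound(1.0, budget_rate, years)
```

Under those assumptions, data_per_budget_unit(5) is a little over 4, meaning each budget
dollar would have to cover roughly four times as much data after five years.
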
The following survey from Forrester highlights the issue in trying to perform better customer
analytics, a task that involves integrating many sources of structured and unstructured data.

Source: Forrester Webinar with Talend

Ensuring Data Governance and Data Quality
Depending on the goal of a big data project, poor data quality can have a big impact on
effectiveness. It can be argued that inconsistent or invalid data could have an exponential impact on
analysis in the big data world. If a record enters a system as duplicate, incomplete or incorrect,
many other systems are impacted. As analysis on big data grows, so too will the need for
validation, standardization, enrichment and resolution of data. Even identification of linkages can
be considered a data quality issue that needs to be resolved for big data.
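
As a minimal sketch of what validation, standardization and duplicate resolution can mean in
practice, the Python below cleanses a list of customer records. The field names (name, email),
the validation rule and the choice of email as the match key are assumptions made for
illustration; they do not describe any particular product.

```python
def standardize(record):
    """Normalize fields so that equivalent values compare equal."""
    return {
        # Collapse repeated whitespace and use consistent capitalization.
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
    }


def is_valid(record):
    """Minimal validation: a name is present and email looks like user@host.tld."""
    local, sep, domain = record["email"].partition("@")
    return bool(record["name"] and local and sep and "." in domain)


def cleanse(records):
    """Return (clean, rejected).

    clean holds standardized, valid records with duplicates dropped;
    rejected holds the raw records that failed validation.
    """
    clean, rejected, seen = [], [], set()
    for raw in records:
        record = standardize(raw)
        if not is_valid(record):
            rejected.append(raw)
        elif record["email"] not in seen:  # email is the assumed match key
            seen.add(record["email"])
            clean.append(record)
    return clean, rejected
```

Real matching is usually fuzzier than an exact key comparison, but the flow of standardize,
validate and then resolve duplicates is the same.
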
As with any corporate data management discipline, big data will eventually need to comply with
established corporate standards and accepted project management norms for organization,
deployment and sharing of project artifacts.

Not too many years ago the terms terabyte (or 10^12 bytes) and petabyte (or 10^15 bytes) were
seen as very large. An RFP (request for proposal) for a data warehouse may have been to
reliably handle a petabyte of information. Now terms such as exabyte (or 10^18 bytes) and
zettabyte (or 10^21 bytes) may soon be the norm.

In a recent report, Gartner states that through 2015, organizations integrating high-value,
diverse, new information types and sources into a coherent information management
infrastructure will outperform their industry peers financially by more than 20%. However,
through 2015, 85% of enterprises will fail to adapt their information infrastructure to "big
data," socially mediated content and new connected devices. This means that all systems, even
the largest integration platform in IT (the data warehouse), will be overwhelmed in the next
three to four years (Source).

Strategies For Managing Evolving Big (and Enterprise) Data
Many companies have built their information architecture department by department to solve
specific data management problems. Enterprise data, or the data in business systems such as
CRM and ERP applications, is then linked together by data integration and data quality tools into
a data warehouse for further processing, analysis, viewing and management. This is an example of
total data management for enterprise data.

With the requirement to now include vast amounts of unstructured data for analysis (2 to 10 times
the amount of structured data), one could take an approach to create a separate big data
architecture as another silo. Unfortunately, many of the existing data vendors prescribe this
approach since their tools cannot be adapted to support total data management across structured
and unstructured data. The problem is that you are building an information architecture where
there is a lot of duplication across silos with limited reuse; there are inconsistent data governance
and data quality practices across the company; and there are high costs for operations,
maintenance and software licenses.

Big data should be treated like non-big data, i.e. you need to store it, you need to cleanse and
enrich it, and you need to use it, which could be retrieving it for a query or analyzing it in a system.
The more your systems are integrated, the bigger the potential for data quality and data
governance issues. Big data management is not just about deploying a Hadoop cluster, deploying
data into a Hadoop file system across distributed nodes and then running MapReduce processing.
It is making sure your entire information environment incorporates tools that not only handle the
increasing volume, velocity, variety and complexity of data, but also provide the software lifecycle
management, service enablement and data quality functions as well as data governance
procedures.

A Total Data Management Approach For All Data
Instead of managing silos of structured, unstructured and semi-structured data and linking them
together (bottom up), a recommended approach is to think of managing data holistically (top
down). Your company needs to manage big data just as it does enterprise data, and let's not forget
about small data such as spreadsheets. The data can be located on premise, in the cloud, or in a
SaaS app. It can be transactional in-flight data or clickstream data from your website. You need to
profile all of the data, you need to deduplicate and match data, and you need to service-enable it
for consumption. And often information needs to be processed and available in real time. Total
data management is applying the same best practices and tools to all of your data.
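
Profiling can be applied in the same way whether records come from a spreadsheet, a database
table or a clickstream log. The sketch below shows one simple, hypothetical way to do it in
Python: for each field it reports completeness, the number of distinct values and the most
common value. The function name and report layout are assumptions for illustration only.

```python
from collections import Counter


def profile(records):
    """Per-field completeness and distinct-value counts for a list of dicts.

    Assumes field values are hashable (strings, numbers, None).
    """
    fields = sorted({field for record in records for field in record})
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat missing keys, None and empty strings as absent values.
        present = [value for value in values if value not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report
```

Because the function only needs a list of dictionaries, the same check can be pointed at any
source once its rows have been read into that shape.
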
Fortunately for consumers today, there are good alternatives for creating this total data
management approach that will not break the bank. Most data management functions have
become commodity items, so the buyer can cost-efficiently achieve total data management for
both structured and unstructured data.

Benefits of Total Data Management for All Your Data
With a total data management approach for all your data, firms can expect to see the following
benefits:

Improved Data Quality: by profiling, matching, deduping and monitoring all your data,
you can improve the completeness, accuracy and integrity of your data and therefore the
value it will provide to the organization.

Improved Accountability: having set data governance processes in place for all your data,
firms can better measure the use of data, ensure the appropriate access, and set internal
policies and procedures.

Lower Costs: with a decrease in the number of silos created, there will be a decrease in
the number of duplicate processes. With data services reuse, projects can be completed
more quickly and at a lower cost. Firms can also leverage commodity data management tools for
most data management functions.

Talend Total Data Management
Talend provides a unified platform for managing data, no matter if data resides in text files,
enterprise databases, spreadmarts or big data clusters like Hadoop. The product set incorporates
tools for all your data management needs including: big data integration, data quality and data
governance, master data management, business process management and application integration.

Talend recognizes that data management is about supporting a wide range of data needs,
regardless of scale. Users need not learn to use different environments to manage big data and
enterprise data. This saves on infrastructure costs, licensing, user training, project development
time and the wide range of costs associated with managing different data scales. For example, to
integrate big data sources, you use Talend data management graphical tools to integrate big data
and NoSQL sources as well as all your enterprise data and small data, and then generate code in
Hive, Pig and other native code bases. With big data quality that takes advantage of the massively
parallel environment of Hadoop, you can improve the completeness, accuracy and integrity of
data as well as remove duplicates. Talend provides big data governance through a simple, intuitive
environment for implementing and deploying a big data program with the ability to schedule,
monitor and deploy any big data job.

About Talend
Talend provides integration that truly scales. From small projects to enterprise-wide
implementations, Talend's highly scalable data, application and business process integration
platform maximizes the value of an organization's information assets and optimizes return on
investment through a usage-based subscription model. Ready for big data environments, Talend's
flexible architecture easily adapts to future IT platforms. And a common set of easy-to-use tools
implemented across all Talend products enables teams to scale developer skill sets, too.

More than 1,800 active subscribers worldwide leverage Talend's solutions and services. The
company has major offices in North America, Europe and Asia, and a global network of technical
and services partners. For more information, please visit www.talend.com.

Contact Us
www.talend.com/contact
info@talend.com
partners@talend.com
sales@talend.com

WP173.1-EN
