Data Management
Is Big Data The New Normal?
Table of Contents
Introduction
Separating Reality and Hype
Why Are Firms Making IT Investments In Big Data?
Trends In Data Management Technologies
The Technology Maturity Cycle
IT Challenges Meeting New Data Demands
Managing Increasing Data Volume and Variety
Ensuring Data Governance and Data Quality
Strategies For Managing Evolving Big (and Enterprise) Data
A Total Data Management Approach For All Data
Benefits of Total Data Management for All Your Data
Talend Total Data Management
About Talend
Contact Us
Introduction
Much has been written about the promise of big data. With the growth of corporate data, big data may well become the new normal. In analytics, for example, companies are able to make more informed decisions by analyzing bigger and more diverse sets of data. A recent Gartner report states that through 2015, organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20% (Source).
However, this new data management journey will also produce many failed big data experiments that require re-architecture within a few years. Effectively incorporating big data technologies like Hadoop and NoSQL into your current information architecture is not a simple exercise.
This paper reviews the business drivers behind this new wave of data management, highlights the new tools companies can consider for their next-generation data architecture, and outlines a total data management blueprint for consideration in your information strategy.
Separating Reality and Hype
Is there anything big data cannot solve? Is big data overhyped? Many would say yes, as big data is used pervasively in vendor presentations to adorn company information strategies. Big data use cases range from marketing campaign analysis and recommendation engines to predictive analytics, sentiment analysis and fraud detection. Some have even proposed using big data to predict the next Pope as leader of the Catholic Church. What's next, you might ask? We are only at the tip of the big data iceberg in finding all the value big data can deliver.
What is undeniable is that the digital universe is growing rapidly. For example:
• The total worldwide volume of data is growing at 59% per year, with the number of files growing at 88% per year. (Source)
• IDC estimates that by 2020, business transactions on the internet, business-to-business and business-to-consumer, will reach 450 billion per day. (Source)
• Akamai analyzes 75 million events per day to better target advertisements. (Source)
Talend Inc.
800 Bridge Parkway, Suite 200, Redwood City, California 94065 US
Tel: +1 (650) 539 3200
• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data. (Source)
• More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide. (Source)
• 100 terabytes of data are uploaded daily to Facebook. (Source)
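To put the 59% and 88% annual growth rates quoted above into perspective, the following minimal Python sketch computes how long a quantity takes to double at a constant compound rate. It assumes the rates stay constant year over year, which the cited figures do not claim; the function name is illustrative.

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a quantity to double under constant compound growth.

    annual_growth_rate is a fraction, e.g. 0.59 for 59% per year.
    Solves (1 + r) ** t == 2 for t.
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log1p(annual_growth_rate)

# Rates quoted above: total data volume (59%/yr) and number of files (88%/yr).
for label, rate in [("data volume", 0.59), ("number of files", 0.88)]:
    print(f"{label}: doubles roughly every {doubling_time_years(rate):.2f} years")
```

Under that assumption, data volume would double roughly every 1.5 years and the number of files roughly every 1.1 years.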
Why Are Firms Making IT Investments In Big Data?
A recent survey by Talend highlights the top business drivers for big data, with increasing the accuracy and depth of predictive analytics as the primary reason. With so much unstructured data not being analyzed, firms recognize the value in using big data.
Figure: What are the business drivers for big data in your organization? (multiple responses, n=95)
Trends In Data Management Technologies
Whether big data is in production or on the horizon for your organization, there are several areas of data management that require strategy and planning:
• Real-time Data Processing: the requirement to have information on demand or in real time is increasing, as firms do not want to make decisions with last month's or even yesterday's information.
• Data Services: to address the demand for information accessibility and availability from many consumers, data management features such as data integration and data quality are becoming available as on-premise or cloud-based services.
• Open Source Data Management: by avoiding the huge investment typically made in enterprise software, open source makes data management fiscally possible for smaller organizations and improves total cost of ownership for large ones.
• Data Governance: an evolving approach to setting policies and procedures for the use, access and management of data. It is particularly important as business users play an ever-increasing role in managing data and budgeting.
• Big Data and NoSQL: technologies that provide a bridge between familiar legacy database technologies and the demands of rapidly storing, retrieving, processing and analyzing big data.
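As an illustration of the real-time processing trend, the sketch below keeps a running count of events over a sliding time window, updating the result as each event arrives rather than in a later batch. The class name, the 60-second window and the in-order arrival of events are assumptions made for illustration, not features of any particular product.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events in the window (now - window_seconds, now], updated per event."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times, oldest on the left

    def add(self, timestamp: float) -> int:
        """Record an event and return the count within the current window.

        Assumes events arrive in non-decreasing timestamp order.
        """
        self._timestamps.append(timestamp)
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70, 200]:
    print(t, counter.add(t))
```

With the timestamps shown, the counts returned are 1, 2, 3, 3, 3 and 1: older events drop out of the window as newer ones arrive, so the answer is always current without reprocessing history.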
The Technology Maturity Cycle
There is a typical maturity cycle as these new technologies are introduced. First, a technology needs to integrate with existing systems; next, it needs to be hardened to address scalability, reliability and security concerns; then it may incorporate related technologies, including software lifecycle management features like project management and testing or, in the case of data management, capabilities to expose a function as a consumable service. The technology maturity timeline therefore runs from integration, to hardening, to related capabilities and service enablement.
Each of the trends above is moving along the hype-to-productivity timeline, offering greater benefits to end users. Big data (Hadoop), for example, started as an implementation of MapReduce and has been extended into a large-scale platform for distributed parallel processing of huge amounts of data. Other technologies have been added, including HBase, an open source database; HCatalog for metadata; Apache Hive as a data warehouse infrastructure; and Apache Pig as a programming language.
However, Hadoop by itself, although well equipped for storing and processing large amounts of information across nodes, does not yet include software lifecycle management functionality such as project management, or related data management functionality such as data integration, metadata management, governance, profiling, cleansing and matching. Data governance is particularly important for successful firm-wide adoption, since it incorporates business processes, organizational roles and best practices.
Is big data a replacement for a data warehouse? Consider a European telecommunications provider that needs to store all call transactions in a database for two years to comply with regulations. One alternative is to create a large data warehouse to house this information; however, a less expensive and faster-performing alternative is to install a Hadoop cluster or a NoSQL database to store the information. Many companies are using this strategy, sometimes called data hoarding, as a means to store information for subsequent processing.
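A retention requirement like the one in this example is straightforward to express once records are partitioned by date, a common layout in Hadoop-style storage. The sketch below assumes daily partitions keyed by call date and approximates "two years" as 730 days; both are illustrative assumptions, and actual regulatory definitions vary.

```python
from datetime import date, timedelta

RETENTION_DAYS = 730  # assumption: "two years" approximated as 730 days

def partitions_to_keep(partition_dates, today, retention_days=RETENTION_DAYS):
    """Split daily partitions into those inside and outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    keep = sorted(d for d in partition_dates if d >= cutoff)
    expired = sorted(d for d in partition_dates if d < cutoff)
    return keep, expired

today = date(2013, 6, 1)
partitions = [date(2011, 5, 1), date(2011, 6, 2), date(2012, 1, 15), date(2013, 5, 31)]
keep, expired = partitions_to_keep(partitions, today)
print("must retain:", keep)
print("outside the mandated window:", expired)
```

Partitioning by date keeps the check cheap: whole partitions are compared against the cutoff instead of scanning individual call records.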
IT Challenges Meeting New Data Demands
To meet new business demands and the appetite to process and analyze more information, IT is looking to modernize its data management processes. Conventional data management tools fail when trying to integrate, search and analyze very large data sets. In the past, adding hardware and software was a way to make information available and scalable, but that is no longer the case. Information needs to be available 24x7 across a variety of channels, from the web to mobile devices.
Managing Increasing Data Volume and Variety
Much has been written (including in this paper so far) about the increasing volume of data. IT needs to be concerned, since the projected growth of global data generated per year is 40%, while IT budgets are increasing at only 5% (Source). Of even greater concern is the increasing variety of data, that is, information that is not in a structured database or data warehouse: content management repositories, streaming data (audio, video), images, blogs, customer comment forums, Twitter, network sensors, transactional data and more. IDC estimates that by 2015, 90% of data in the digital universe will be unstructured (Source). Firms that focus on volume alone, and not on data characteristics like variety, will need to make additional investments.
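The gap between data growth and budget growth compounds year over year. A short sketch, assuming the 40% and 5% annual rates quoted above hold constant (an illustrative assumption, not a forecast):

```python
def growth_gap(data_growth: float, budget_growth: float, years: int):
    """Return (data multiple, budget multiple, data per budget dollar) after `years`."""
    data = (1 + data_growth) ** years
    budget = (1 + budget_growth) ** years
    return data, budget, data / budget

for years in (1, 3, 5):
    data, budget, ratio = growth_gap(0.40, 0.05, years)
    print(f"year {years}: data x{data:.2f}, budget x{budget:.2f}, data per budget x{ratio:.2f}")
```

After five years, data would grow about 5.4 times while budgets grow about 1.28 times, leaving roughly 4.2 times as much data to manage per budget dollar.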
The following survey from Forrester highlights the difficulty of performing better customer analytics, a task that involves integrating many sources of structured and unstructured data.
Source: Forrester Webinar with Talend
Ensuring Data Governance and Data Quality
Depending on the goal of a big data project, poor data quality can have a big impact on effectiveness. It can be argued that inconsistent or invalid data could have an exponential impact on analysis in the big data world: if a duplicate, incomplete or incorrect record enters one system, many other systems are affected. As analysis of big data grows, so too will the need for validation, standardization, enrichment and resolution of data. Even the identification of linkages between records can be considered a data quality issue that needs to be resolved for big data.
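Validation, standardization and resolution of duplicates can be sketched in a few lines. The example below is a simplified illustration, not Talend's implementation: the field names, the normalization rules and the use of an exact email match as the duplicate key are all assumptions; production matching usually also relies on fuzzy comparison.

```python
import re

def standardize(record):
    """Normalize fields so equivalent values compare equal (illustrative rules)."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        # Keep digits only so "(650) 555-0100" and "650.555.0100" match.
        "phone": re.sub(r"\D", "", record.get("phone", "")),
    }

def is_valid(record):
    # Minimal validity check: an email with exactly one "@" is required.
    return record["email"].count("@") == 1

def deduplicate(records):
    """Standardize, drop invalid records, and keep the first record per email."""
    seen = {}
    for raw in records:
        rec = standardize(raw)
        if is_valid(rec) and rec["email"] not in seen:
            seen[rec["email"]] = rec
    return list(seen.values())

rows = [
    {"name": "jane  doe", "email": "Jane.Doe@Example.com ", "phone": "(650) 555-0100"},
    {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": "650.555.0100"},
    {"name": "John Roe", "email": "not-an-email", "phone": ""},
]
print(deduplicate(rows))
```

Standardizing before matching is what lets the first two rows resolve to one customer; without it, the differing case, spacing and phone punctuation would leave two "different" records.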
As with any corporate data management discipline, big data will eventually need to comply with established corporate standards and accepted project management norms for the organization, deployment and sharing of project artifacts.
Not too many years ago, the terms terabyte (10¹² bytes) and petabyte (10¹⁵ bytes) were seen as very large. An RFP (request for proposal) for a data warehouse may have required the ability to reliably handle a petabyte of information. Now terms such as exabyte (10¹⁸ bytes) and zettabyte (10²¹ bytes) may soon be the norm.
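Each decimal unit above is 1,000 times the previous one. A small conversion helper makes the scale concrete; the unit table mirrors the definitions in the text, and the function name is illustrative.

```python
# Decimal (SI) byte units as defined above: each step is a factor of 1,000.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

print(convert(1, "zettabyte", "terabyte"))   # terabytes in one zettabyte
print(convert(2.5, "petabyte", "terabyte"))  # the Walmart figure cited earlier
```

One zettabyte is a billion terabytes, and 2.5 petabytes is 2,500 terabytes. Some storage tools report binary units instead (a tebibyte is 2^40 bytes, about 10% more than a terabyte), so it is worth checking which convention a given figure uses.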
A recent Gartner report states that through 2015, organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%. However, through 2015, 85% of enterprises will fail to adapt their information infrastructure to "big data," socially mediated content and new connected devices. This means that all systems, even the largest integration platform in IT (the data warehouse), will be overwhelmed in the next three to four years (Source).
Strategies For Managing Evolving Big (and Enterprise) Data
Many companies have built their information architecture department by department to solve specific data management problems. Enterprise data, meaning the data in business systems such as CRM and ERP applications, is then linked together by data integration and data quality tools into a data warehouse for further processing, analysis, viewing and management. This is an example of total data management for enterprise data.
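The enterprise pattern described above, linking CRM and ERP data on a shared key into a warehouse-style table, can be sketched as follows. The record layouts and the customer_id key are hypothetical, and a real pipeline would also apply data quality rules before loading.

```python
def integrate(crm_rows, erp_rows):
    """Join CRM contacts with ERP order totals on a shared customer_id.

    Produces one warehouse-style row per CRM customer; customers with no
    ERP orders get a total of 0.0.
    """
    totals = {}
    for order in erp_rows:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    return [
        {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "region": c["region"],
            "order_total": totals.get(c["customer_id"], 0.0),
        }
        for c in crm_rows
    ]

crm = [
    {"customer_id": 1, "name": "Acme", "region": "EMEA"},
    {"customer_id": 2, "name": "Globex", "region": "AMER"},
]
erp = [{"customer_id": 1, "amount": 1200.0}, {"customer_id": 1, "amount": 300.0}]
for row in integrate(crm, erp):
    print(row)
```

The join only works because both systems agree on the key; when they do not, matching the records becomes the data quality problem described in the previous section.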
With the requirement to now include vast amounts of unstructured data for analysis (2 to 10 times the volume of structured data), one could take the approach of creating a separate big data architecture as another silo. Unfortunately, many of the existing data vendors prescribe this approach, since their tools cannot be adapted to support total data management across structured and unstructured data. The result is an information architecture in which there is a lot of duplication across silos with limited reuse; data governance and data quality practices are inconsistent across the company; and costs for operations, maintenance and software licenses are high.
Big data should be treated like non-big data: you need to store it, you need to cleanse and enrich it, and you need to use it, whether by retrieving it for a query or by analyzing it in a system. The more your systems are integrated, the bigger the potential for data quality and data governance issues. Big data management is not just about deploying a Hadoop cluster, loading data into the Hadoop file system across distributed nodes and then running MapReduce processing. It is about making sure your entire information environment incorporates tools that not only handle the increasing volume, velocity, variety and complexity of data, but also provide software lifecycle management, service enablement and data quality functions, as well as data governance procedures.
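The principle that big data should pass through the same store, cleanse-and-enrich, and use steps as other data can be sketched with one set of functions applied to both a structured row and semi-structured JSON events. The field names, the country reference table and the "store" step (here just parsing into an in-memory dictionary) are simplifying assumptions.

```python
import json

def to_record(item):
    """Store step: accept a dict (structured row) or a JSON string (semi-structured event)."""
    return json.loads(item) if isinstance(item, str) else dict(item)

def cleanse_and_enrich(record, country_names):
    """Cleanse a country code, then enrich it with a full name from a reference table."""
    code = str(record.get("country", "")).strip().upper()
    record["country"] = code
    record["country_name"] = country_names.get(code, "UNKNOWN")
    return record

def use(records):
    """Use step: a simple query counting records per country."""
    counts = {}
    for r in records:
        counts[r["country_name"]] = counts.get(r["country_name"], 0) + 1
    return counts

COUNTRIES = {"US": "United States", "FR": "France"}  # hypothetical reference data
inputs = [
    {"order_id": 7, "country": "us "},      # row from an enterprise system
    '{"event": "click", "country": "fr"}',  # clickstream event as JSON
    '{"event": "view", "country": "US"}',
]
records = [cleanse_and_enrich(to_record(x), COUNTRIES) for x in inputs]
print(use(records))
```

Because the cleansing and enrichment rules are shared, the enterprise row and the clickstream events end up consistent and can be queried together.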
A Total Data Management Approach For All Data
Instead of managing silos of structured, unstructured and semi-structured data and linking them together (bottom-up), a recommended approach is to think of managing data holistically (top-down). Your company needs to manage big data just as it does enterprise data, and let's not forget about small data such as spreadsheets. The data can be located on premise, in the cloud or in a SaaS application. It can be transactional in-flight data or clickstream data from your website. You need to profile all of the data, you need to deduplicate and match data, and you need to service-enable it for consumption. And often, information needs to be processed and available in real time. Total data management means applying the same best practices and tools to all of your data.
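Profiling is usually the first of these steps. The sketch below measures field completeness across records gathered from different sources, assuming records are dictionaries and treating None, an empty string or a missing key as "not filled"; the field names and sources are hypothetical.

```python
def profile(records, fields):
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    report = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report

data = [
    {"email": "a@example.com", "phone": "6505550100", "source": "crm"},
    {"email": "", "phone": "6505550101", "source": "spreadsheet"},
    {"email": "c@example.com", "phone": None, "source": "web"},
    {"email": "d@example.com", "source": "saas"},
]
print(profile(data, ["email", "phone", "source"]))
```

Here email is 75% complete and phone only 50%, which tells you where cleansing or enrichment effort should go before the data is matched and service-enabled.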
Fortunately for buyers today, there are good alternatives for creating this total data management approach that will not break the bank. Most data management functions have become commodity items, so the buyer can cost-efficiently achieve total data management for both structured and unstructured data.
Benefits of Total Data Management for All Your Data
With a total data management approach for all your data, firms can expect to see the following benefits:
• Improved Data Quality: by profiling, matching, deduplicating and monitoring all your data, you can improve the completeness, accuracy and integrity of your data and therefore the value it provides to the organization.
• Improved Accountability: with data governance processes in place for all your data, firms can better measure the use of data, ensure appropriate access, and set internal policies and procedures.
• Lower Costs: with fewer silos, there are fewer duplicate processes. By reusing data services, projects can be completed more quickly and at a lower cost. Firms can also leverage commodity data management tools for most data management functions.
Talend Total Data Management
Talend provides a unified platform for managing data, whether the data resides in text files, enterprise databases, spreadmarts or big data clusters like Hadoop. The product set incorporates tools for all your data management needs, including big data integration, data quality and data governance, master data management, business process management and application integration.
Talend recognizes that data management is about supporting a wide range of data needs, regardless of scale. Users need not learn different environments to manage big data and enterprise data. This saves on infrastructure costs, licensing, user training, project development time and the wide range of other costs associated with managing different data scales. For example, to integrate big data sources, you use Talend's graphical data management tools to integrate big data and NoSQL sources as well as all your enterprise data and small data, and then generate code in Hive, Pig and other native code bases. With big data quality that takes advantage of the massively parallel environment of Hadoop, you can improve the completeness, accuracy and integrity of data as well as remove duplicates. Talend provides big data governance through a simple, intuitive environment for implementing and deploying a big data program, with the ability to schedule, monitor and deploy any big data job.
About Talend
Talend provides integration that truly scales. From small projects to enterprise-wide implementations, Talend's highly scalable data, application and business process integration platform maximizes the value of an organization's information assets and optimizes return on investment through a usage-based subscription model. Ready for big data environments, Talend's flexible architecture easily adapts to future IT platforms. And a common set of easy-to-use tools implemented across all Talend products enables teams to scale developer skill sets, too.
More than 1,800 active subscribers worldwide leverage Talend's solutions and services. The company has major offices in North America, Europe and Asia, and a global network of technical and services partners. For more information, please visit www.talend.com.
Contact Us
www.talend.com/contact
info@talend.com
partners@talend.com
sales@talend.com
WP173.1-EN