PureData System for Hadoop
Big Data Appliance

Dipl.-Ing. Wolfgang Nimführ
Business Development Executive, IBM Software Group, Big Data

© 2013 International Business Machines Corporation
IBM Information Management Forum, 11 September 2013, Vienna

Every company has a Big Data & Analytics Opportunity

The power of data coming together with the power of technology to deliver improved outcomes:
* Enrich your information base with Big Data Exploration
* Prevent crime with Security and Intelligence Extension
* Optimize operations with Operations Analysis
* Gain IT efficiency and scale with Data Warehouse Augmentation
* Improve customer interaction with Enhanced 360° View of the Customer
To generate Business Value

How do you harness the elephant in the room?

Challenges of Building a Hadoop Cluster
* Open source is not really free!
  Requires many resources to manage a Hadoop cluster.
* Apache Hadoop consists of multiple subprojects
  Ambiguity on which components to use in which situations.
* Lack of integrated cluster management
  No integrated tools to manage the hardware and Hadoop.
* Commodity hardware based Hadoop clusters pose significant challenges
  Several discrete components (open source, third-party tools) mean no guaranteed efficiencies or pretesting done among the components.

Let's simplify Big Data...
[Diagram labels: Hive, Pig, MapReduce, HDFS, HCatalog, Visualization, Development Tools]

Designed to:
* Simplify the building, deploying and management of a Hadoop cluster
* Speed the time-to-value for Hadoop and unstructured data
* Maximize the overall analytic ecosystem
* Provide enterprise security and platform management

From custom and complex... to organized simplicity

1 Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.

Announcing the new PureData System for Hadoop
* Accelerate time to value
* Accelerate time to insight
* Simplify big data adoption and consumption
* Extend the value of the data warehouse
* Implement enterprise class big data
* Minimize system setup and administration

IBM PureData System for Hadoop
Accelerate Hadoop analytics with appliance simplicity

Speed
* Speed to insight with built-in analytics
* Speed to value with accelerated deployment

Simplicity
* Ready to load data in hours
* Integrated system management
* Appliance approach reduces complexity
* Single point of support

Smart
* Establish a cost efficient online data archive
* Easily leverage data across the big data platform
* Enterprise security, governance and high availability

Benefits of IBM PureData System for Hadoop

Accelerate Big Data Time to Value / Simplify Big Data Adoption & Consumption / Implement Enterprise Class Big Data
* Deploy 8x faster than custom-built solutions [1]
* Built-in visualization to accelerate insight
* Built-in analytic accelerators [2] unlike big data appliances on the market
* Single system console for full system administration
* Rapid maintenance updates with automation
* No assembly required, data load ready in hours
* Only integrated Hadoop system with built-in archiving tools [2]
* Delivered with more robust security than open source software
* Architected for high availability

1 Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
2 Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.

InfoSphere BigInsights
Entry Point for Hadoop

BigInsights Value Above and Beyond Hadoop

Analytic Accelerators
• Social Media Accelerator
• Machine Data Accelerator
• BigSheets spreadsheet and visualization
• Advanced Text Analytics Accelerator
• JAQL query language

Performance and Optimization
• Adaptive MapReduce
• Advanced Scheduler
• BigIndex for large scale indexing
• Fast, splittable compression

Security
• Role based authorization

Optim Development Studio
• Eclipse based IDE for Java

Big Data Integration
• Information Server, InfoSphere Streams, Netezza, DB2

Enterprise Enablement
• Big SQL
• GPFS-FPO

"IBM's distribution is based on Apache Hadoop and utilizes many of the capabilities included in that distribution, but IBM is focused on making its distribution more of an enterprise class offering."

Key enterprise capabilities on top of an unmodified open source foundation
[Layer diagram. Legend: Open source components, Additional enterprise capabilities]
* IBM-certified Apache Hadoop
* Administration & Security
* Workload Optimization
* Integrated Development Environment
* Connectors (Data, Analytics, Integration)
* Advanced Text Analytics Engine
* Visualization & Exploration

BigInsights 2.1 features a variety of enhancements that deliver key Enterprise Hadoop capabilities

High Availability
* Out of the box High Availability
* Seamless, automatic and transparent failover for HDFS NameNode
* Eliminates admin intervention
* Reduces downtime for recovery of the cluster
* Hardware fencing to guarantee data integrity

GPFS-FPO support
* No single point of failure
* Built-in High Availability
* POSIX compliance
* Enhanced Security with ACL support
* Support for Storage Pools
* SnapShot capability

Big SQL
* Comprehensive Standard ANSI SQL support to access data stored in BigInsights
* Standards compliant JDBC / ODBC drivers
* Leverages MapReduce parallelism in complex data sets
* Direct access for low-latency in small queries, e.g. sub-second response to HBase queries

Cognos v10, ...

BigInsights Enterprise Edition 2.1
[Architecture diagram. Legend: Open Source, IBM]

Connectivity and Integration: Streams, Netezza, JDBC, Flume
Text processing engine and library
Infrastructure: Jaql, Hive, Pig, HBase, MapReduce, HDFS, ZooKeeper, Indexing, Lucene, Adaptive MapReduce, Oozie, Text compression, Enhanced security, Flexible scheduler
Analytics and discovery ("Apps"): BigSheets, Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, ...

Administrative and development tools
Web console
* Monitor cluster health, jobs, etc.
* Add / remove nodes
* Start / stop services
* Inspect job status
* Inspect workflow status
* Deploy applications
* Launch apps / jobs
* Work with distrib file system
* Work with spreadsheet interface
* Support REST-based API
* ...
Eclipse tools
* Text analytics
* MapReduce programming
* Jaql, Hive, Pig development
* BigSheets plug-in development
* Oozie workflow generation

Integrated installer
Optional IBM and partner offerings
Other labels in the diagram: DB2, IBM Cognos BI, Big SQL, Accelerator for machine data analysis, Accelerator for social data analysis, Guardium, DataStage, Data Explorer, Sqoop, HCatalog, GPFS-FPO

IBM Big Data & Analytics Reference Architecture
[Architecture diagram; recoverable labels follow]

All Data Sources
Information Ingestion and Operational Information: Stream Processing, Data Integration, Master Data, Streams
Real-time Analytic Zone: Video/Audio, Network/Sensor, Entity Analytics, Predictive
Landing Area, Analytics Zone and Archive: Raw Data, Structured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
Exploration, Integrated Warehouse, and Mart Zones: Discovery, Deep Reflection, Operational, Predictive
Information Governance, Security and Business Continuity
Other labels: Case Management, Analytics, Applications, Alerts, Big Data Ecosystem, Analytic Applications, Cloud Services, IBM Watson

Analytics levels:
* Cognitive: Learn Dynamically?
* Prescriptive: Best Outcomes?
* Predictive: What Could Happen?
* Descriptive: What Has Happened?
* Exploration and Discovery: What Do You Have?

PureData System for Hadoop (overlay labels): Active Archive, Big Data Exploration, Pre-Processing Hub

Use case: Big Data Exploration
Use Cases
* Explore new data and previously untapped sources
* Visualize and gain new insight with easy to use spreadsheet-style analysis
* Identify useful information that would add value when integrated
* Used for data profiling to understand data before moving to other systems

Use case: Big Data Exploration
Spreadsheet-style analysis process with BigSheets
* Ad-hoc analytics for data scientists
* Analyze a variety of data, unstructured and structured
* Browser-based
* Spreadsheet metaphor for exploring/visualizing data

Process steps:
* Gather: Crawl (gather statically); Adapter (gather dynamically)
* Extract: Document-level info; Cleanse, normalize
* Explore: Analyze, annotate, filter; Visualize results
* Iterate: Iterate through any prior step

Use case: Active Archive
Use Cases
* Immediate storage alternative of cold data
* Cost savings for cold data
* Compliance requirements
* Simple analytics / exploration
[Diagram labels: PureData System for Analytics, PureData System for Hadoop]

Use case: Pre-Processing Hub
Use Cases
* Aggregation of data
* Pre-process cleansing
* Compliance requirements
* Simple analytics / exploration

PureData System for Hadoop
Bringing Big Data to the enterprise
* Simplify the delivery of unstructured data to the enterprise
* Integrate Hadoop with the data warehouse
* Leverage Hadoop for data archive
* Provide best in class security
* Provide data exploration across structured and unstructured data
* Accelerate insight with machine data
* Accelerate insight with social data
Simplify Big Data for the enterprise!

Meeting Big Data Challenges - Fast and Easy!
IBM PureData System Family
* System for Transactions: For apps like E-commerce... Database cluster services optimized for transactional throughput and scalability
* System for Analytics: For apps like Customer Analysis... Data warehouse services optimized for high-speed, peta-scale analytics and simplicity
* System for Operational Analytics: For apps like Real-time Fraud Detection... Operational data warehouse services optimized to balance high performance analytics and real-time operational throughput
* System for Hadoop: For Exploratory Analysis & Queryable Archive. Hadoop data services optimized for big data analytics and online archive with appliance simplicity

www.ibm.com/software/data/puredata/hadoop/

IBM Big Data & Analytics Reference Architecture
[Same architecture diagram as before, with overlay label: 7D%ise Data Explorer]

Part of the IBM Big Data Platform
Workload Optimized Solutions for all your analytic needs
[Platform diagram; recoverable labels follow]
* Analytics & Decision Management Solutions
* IBM Big Data Platform: Accelerators, Information Integration & Governance, Visualization & Discovery, Application Development, Systems Management, Stream Computing, Hadoop System, Data Warehouse
* Big Data Infrastructure
* PureData System for Analytics, PureData System for Hadoop