
Informatica Map/Session Tuning
Covers basic, intermediate, and advanced tuning practices.

(by: Dan Linstedt)


Table of Contents

Basic Guidelines
Intermediate Guidelines
Advanced Guidelines

INFORMATICA BASIC TUNING GUIDELINES


The following points are high-level issues on where to go to perform "tuning" in Informatica's products. These are NOT permanent instructions, nor are they the end-all solution. They are just some items which, if tuned first, might make a difference. The level of skill available for certain items will cause the results to vary.

To test performance throughput it is generally recommended that the source set of data produce about 200,000 rows to process. Beyond this, the performance problems / issues may lie in the database - partitioning tables, dropping / recreating indexes, striping RAID arrays, etc... Without such a large set of results to deal with, your average timings will be skewed by other users on the database, processes on the server, or network traffic. This seems to be an ideal test set size for producing mostly accurate averages.

Try tuning your maps with these steps first. Then move to tuning the session, and iterate this sequence until you are happy, or cannot achieve better performance by continued efforts. If the performance is still not acceptable, then the architecture must be tuned (which can mean changes to what maps are created). In this case, you can contact us - we tune the architecture and the whole system from top to bottom.

KEEP THIS IN MIND: In order to achieve optimal performance, it's always a good idea to strike a balance between the tools, the database, and the hardware resources. Allow each to do what it does best. Varying the architecture can make a huge difference in speed and optimization possibilities.

1. Utilize a database (like Oracle / Sybase / Informix / DB2 etc...) for significant data handling operations (such as sorts, groups, aggregates). In other words, staging tables can be a huge benefit to parallelism of operations. Parallel design, simply by the mathematics, nearly always cuts your execution time. Staging tables have many benefits. Please see the staging table discussion in the methodologies section for full details.

2. Localize. Localize all target tables on to the SAME instance of Oracle (same SID), or same instance of Sybase. Try not to use Synonyms (remote database links) for anything (including: lookups, stored procedures, target tables, sources, functions, privileges, etc...). Utilizing remote links will most certainly slow things down. For Sybase users, remote mounting of databases can definitely be a hindrance to performance.

3. If you can - localize all target tables, stored procedures, functions, views, sequences in the SOURCE database. Again, try not to connect across synonyms. Synonyms (remote database tables) could potentially affect performance by as much as a factor of 3 times or more.

4. Remove external registered modules. Perform pre-processing / post-processing utilizing PERL, SED, AWK, GREP instead. The Application Programmers Interface (API) which calls externals is inherently slow (as of: 1/1/2000). Hopefully Informatica will speed this up in the future. The external module which exhibits speed problems is the regular expression module (Unix: Sun Solaris E450, 4 CPU's, 2 GIGS RAM, Oracle 8i and Informatica). It dropped speed from 1500+ rows per second without the module to 486 rows per second with the module. No other sessions were running. (This was a SPECIFIC case - with a SPECIFIC map - it's not like this for all maps.)

5. Remember that Informatica suggests that each session takes roughly 1 to 1 1/2 CPU's. In keeping with this - Informatica plays well with RDBMS engines on the same machine, but does NOT get along (performance wise) with ANY other engine (reporting engine, java engine, OLAP engine, java virtual machine, etc...).

6. Remove any database based sequence generators. These require a wrapper function / stored procedure call. Utilizing these stored procedures has caused performance to drop by a factor of 3 times. This slowness is not easily debugged - it can only be spotted in the Write Throughput column. Copy the map, replace the stored proc call with an internal sequence generator for a test run - this is how fast you COULD run your map. If you must use a database generated sequence number, then follow the instructions for the staging table usage. If you're dealing with GIG's or Terabytes of information - this should save you lots of hours of tuning.
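For the staging-table route that item 6 points to, the core idea is to assign sequence numbers to all staged rows in one set-based statement inside the database, rather than through one stored-procedure call per row. The sketch below illustrates that idea only. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the table stage_customer, the column sequence_id, and the function assign_sequences are hypothetical names, not part of Informatica or of any real schema.

```python
import sqlite3


def assign_sequences(conn, last_used):
    """Fill sequence_id for every staged row that lacks one, in a single UPDATE.

    last_used is the highest sequence value already handed out (an assumption
    of this sketch; a real database would take it from its own sequence).
    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        # rowid is SQLite's built-in row number. For a freshly loaded staging
        # table it runs 1..n, so it stands in for load order here.
        "UPDATE stage_customer SET sequence_id = rowid + ? "
        "WHERE sequence_id IS NULL",
        (last_used,),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical staging table, as if just loaded from a flat file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_customer (name TEXT, sequence_id INTEGER)")
conn.executemany(
    "INSERT INTO stage_customer (name) VALUES (?)",
    [("alpha",), ("beta",), ("gamma",)],
)
updated = assign_sequences(conn, last_used=1000)
rows = conn.execute(
    "SELECT name, sequence_id FROM stage_customer ORDER BY rowid"
).fetchall()
print(updated, rows)
```

The point of the design is that the database is entered once for the whole batch; the per-row cost of a wrapper call disappears from the write path.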
IF YOU MUST have a shared sequence generator, then build a staging table from the flat file, add a SEQUENCE ID column, and call a POST TARGET LOAD stored procedure to populate that column. Place the post target load procedure in the flat file to staging table load map. A single call inside the database, followed by a batch operation to assign sequences, is the fastest method for utilizing shared sequence generators.

7. TURN OFF VERBOSE LOGGING. The session log has a tremendous impact on the overall performance of the map. Force an over-ride in the session, setting it to NORMAL logging mode. Unfortunately the logging mechanism is not "parallel" in the internal core; it is embedded directly in to the operations.

8. Turn off "collect performance statistics". This also has an impact - although minimal at times - because it writes a series of performance data to the performance log. Removing this operation reduces reliance on the flat file operations. However, it may be necessary to have this turned on DURING your tuning exercise. It can reveal a lot about the speed of the reader and writer threads.

9. If your source is a flat file - utilize a staging table (see the staging table slides in the presentations section of this web site). This way you can also use SQL*Loader, BCP, or some other database Bulk-Load utility. Place basic logic in the source load map, and remove all potential lookups from the code. At this point - if your reader is slow, then check two things: 1) if you have an item in your registry or configuration file which sets "ThrottleReader" to a specific maximum number of blocks, it will limit your read throughput (this only needs to be set if the sessions have demonstrated problems with constraint based loads); 2) move the flat file to local internal disk (if at all possible). Try not to read a file across the network, or from a RAID device. Most RAID arrays are fast, but Informatica seems to top out, where internal disk continues to be much faster. Here a link will NOT work to increase speed - it must be the full file itself, stored locally.

10. Try to eliminate the use of non-cached lookups. By issuing a non-cached lookup, your performance will be impacted significantly, particularly if the lookup table is also a "growing" or "updated" target table - this generally means the indexes are changing during operation, and the optimizer loses track of the index statistics. Again - utilize staging tables if possible. In utilizing staging tables, views in the database can be built which join the data together; or Informatica's joiner object can be used to join data together - either one will help dramatically increase speed.

11. Separate complex maps - try to break the maps out in to logical threaded sections of processing. Re-arrange the architecture if necessary to allow for parallel processing. There may be more, smaller components doing individual tasks, however the throughput will be proportionate to the degree of parallelism that is applied. A discussion on HOW to perform this task is posted on the methodologies page; please see this discussion for further details.

12. BALANCE. Balance between Informatica and the power of SQL and the database. Try to utilize the DBMS for what it was built for: reading/writing/sorting/grouping/filtering data en masse.
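As a small illustration of letting the database do the set-based work named in item 12, the sketch below filters, groups, and sorts in a single SQL statement and hands back only the summarized rows. It uses Python's built-in sqlite3 module purely as a stand-in for Oracle or Sybase; the table stage_sales, its columns, and the function totals_by_region are assumptions made for the example.

```python
import sqlite3


def totals_by_region(conn, min_amount=0):
    """Return (region, total) pairs; filter, group and sort all happen in SQL."""
    return conn.execute(
        "SELECT region, SUM(amount) AS total "
        "FROM stage_sales "
        "WHERE amount > ? "
        "GROUP BY region "
        "ORDER BY region",
        (min_amount,),
    ).fetchall()


# Hypothetical staging table with a handful of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO stage_sales VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7), ("west", 0)],
)
for region, total in totals_by_region(conn):
    print(region, total)
```

Only the grouped result crosses the connection, which is the balance the item argues for: bulk row handling in the database, and the remaining logic in the tool.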
Use Informatica for the more complex logic, outside joins, data integration, multiple source feeds, etc... The balancing act is difficult without DBA knowledge. In order to achieve a balance, you must be able to recognize which operations are best in the database, and which ones are best in Informatica. This does not detract from the use of the ETL tool, rather it enhances it - it's a MUST if you are performance tuning for high-volume throughput.

13. TUNE the DATABASE. Don't be afraid to estimate: small, medium, large, and extra large source data set sizes (in terms of: numbers of rows, average number of bytes per row), expected throughput for each, turnaround time for load, and whether it is a trickle feed. Give this information to your DBA's and ask them to tune the database for "worst case". Help them assess which tables are expected to be high read/high write, which operations will sort (order by), etc... Moving disks, or assigning the right table to the right disk space, could make all the difference. Utilize a PERL script to generate "fake" data for small, medium, large, and extra large data sets. Run each of these through your mappings - in this manner, the DBA can watch or monitor throughput as a real load size occurs.

14. Be sure there is enough SWAP and TEMP space on your PMSERVER machine. Not having enough disk space could potentially slow down your entire server during processing (in an exponential fashion). Sometimes this means watching the disk space while your session runs. Otherwise you may not get a good picture of the space available during operation, particularly if your maps contain aggregates, or lookups that flow to the disk Cache directory - or if you have a JOINER object with heterogeneous sources.

15. Place some good server load monitoring tools on your PMServer in development - watch it closely to understand how the resources are being utilized, and where the hot spots are. Try to follow the recommendations - it may mean upgrading the hardware to achieve throughput. Look in to EMC's disk storage array - while expensive, it appears to be extremely fast. I've heard (but not verified) that it has improved performance in some cases by up to 50%.

16. SESSION SETTINGS. In the session, there is only so much tuning you can do. Balancing the throughput is important - by turning on "Collect Performance Statistics" you can get a good feel for what needs to be set in the session - or what needs to be changed in the database. Read the performance section carefully in the Informatica manuals. Basically what you should try to achieve is: OPTIMAL READ, OPTIMAL THROUGHPUT, OPTIMAL WRITE. Over-tuning one of these three pieces can ultimately slow down your session. For example: your write throughput is governed by your read and transformation speed; likewise, your read throughput is governed by your transformation and write speed. The best method to tune a problematic map is to break it in to components for testing: 1) Read Throughput: tune for the reader, see what the settings are, and send the write output to a flat file for less contention. Check the "ThrottleReader" setting (which is not configured by default), and increase the Default Buffer Size by a factor of 64k each shot - ignore the warning above 128k. If the reader's throughput still appears to increase during the session and then stabilizes (after a few thousand rows), try increasing the Shared Session Memory from 12MB to 24MB.
If the reader still stabilizes, then you have a slow source, slow lookups, or your CACHE directory is not on internal disk. If the reader's throughput continues to climb above where it stabilized, make note of the session settings. Check the Performance Statistics to make sure the writer throughput is NOT the bottleneck - you are attempting to tune the reader here, and don't want the writer threads to slow you down. Change the map target back to the database targets - run the session again. This time, make note of how much the reader slows down; its optimal performance was reached with the flat file(s). This time - slow targets are the cause. NOTE: if your reader session to flat file just doesn't ever "get fast", then you've got some basic map tuning to do. Try to merge expression objects, set your lookups to unconnected (for re-use if possible), and check your Index and Data cache settings if you have aggregation or lookups being performed, etc... If you have a slow writer, change the map to a single target table at a time - see which target is causing the "slowness" and tune it. Make copies of the original map, and break down the copies. Once the "slower" of the N targets is discovered, talk to your DBA about partitioning the table, updating statistics, removing indexes during load, etc... There are many database things you can do here.

17. Remove all other "applications" on the PMServer, except for the database / staging database or Data Warehouse itself. PMServer plays well with an RDBMS (relational database management system) - but doesn't play well with application servers, particularly JAVA Virtual Machines, Web Servers, Security Servers, application servers, and Report servers. All of these items should be broken out to other machines. This is critical to improving performance on the PMServer machine.

INFORMATICA INTERMEDIATE TUNING GUIDELINES

The following numbered items are for intermediate level tuning. If you have gone through all the pieces above and are still having trouble, these are some things to look for. These are items within a map which make a difference in performance (we've done extensive performance testing of Informatica to be able to show these effects). Keep in mind - at this level, the performance isn't affected unless there are more than 1 Million rows (average size: 2.5 GIG of data).

ALL items are Informatica MAP items, and Informatica Objects - none are outside the map. Also remember, this applies to PowerMart/PowerCenter (4.5x, 4.6x / 1.5x, 1.6x) - other versions have NOT been tested. The order of these items is not relevant to speed. Each one has its own impact on the overall performance. Again, throughput is also gauged by the number of objects constructed within a map/maplet. Sometimes it's better to sacrifice a little readability for a little speed. It's the old paradigm, weighing readability and maintainability (true modularity) against raw speed. Make sure the client agrees with the approach, or that the data sets are large enough to warrant this type of tuning.

BE AWARE: The following tuning tips range from "minor" cleanup to "last resort" types of things - only when data sets get very large should these items be addressed. Otherwise, start with the BASIC tuning list above, then work your way in to these suggestions.

To understand the intermediate section, you'll need to review the memory usage diagrams (also available on this web site).

1. Filter Expressions - try to evaluate them in a port expression. Try to create the filter (true/false answer) inside a port expression upstream.
Complex filter expressions slow down the mapping. Again, expressions/conditions operate fastest in an Expression Object with an output port for the result. It turns out that the longer or more complex the expression, the more severe the speed degradation. Place the actual expression (complex or not) in an EXPRESSION OBJECT upstream from the filter. Compute a single numerical flag: 1 for true, 0 for false, as an output port. Pump this in to the filter - you should see the maximum performance ability with this configuration.

2. Remove all "DEFAULT" value expressions where possible. Having a default value - even the "ERROR(xxx)" command - slows down the session. It causes an unnecessary evaluation of values for every data element in the map. The only time you want to use a "DEFAULT value" is when you have to provide a default value for a specific port. There is another method: placing a variable with an IIF(xxxx, DEFAULT VALUE, xxxx) condition within an expression. This will always be faster (if assigned to an output port) than a default value.

3. Variable Ports are "slower" than Output Expressions. Whenever possible, use output expressions instead of variable ports. The variables are good for "static - and state driven" logic, but do slow down the processing time - as they are allocated/reallocated on each pass of a row through the expression object.

4. Datatype conversion - perform it in a port expression. Simply mapping a string to an integer, or an integer to a string, will perform the conversion, however it will be slower than creating an output port with an expression like to_integer(xxxx) and mapping an integer to an integer. This is because PMServer is left to decide if the conversion can be done mid-stream, which seems to slow things down.

5. Unused Ports. Surprisingly, unused output ports have no effect on performance. This is a good thing. However, in general it is good practice to remove any unused ports in the mapping, including variables. Unfortunately - there is no "quick" method for identifying unused ports.

6. String Functions. String functions definitely have an impact on performance, particularly those that change the length of a string (substring, ltrim, rtrim, etc...). These functions slow the map down considerably; the operations behind each string function are expensive (de-allocate and re-allocate memory within a READER block in the session). String functions are a necessary and important part of ETL. We do not recommend removing their use completely, only try to limit them to necessary operations. One of the ways we advocate tuning these is to use "varchar/varchar2" data types in your database sources, or to use delimited strings in source flat files (as much as possible). This will help reduce the need for "trimming" input. If your sources are in a database, perform the LTRIM/RTRIM functions on the data coming in from a database SQL statement; this will be much faster than operationally performing it mid-stream.

7. IIF Conditionals are costly. When possible - arrange the logic to minimize the use of IIF conditionals. This is not particular to Informatica; it is costly in ANY programming language.
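Items 6 and 7 both come down to moving per-row work out of the mapping and in to the SQL that reads the source. A minimal sketch of that idea follows, assuming a hypothetical src_customer table and using Python's built-in sqlite3 module in place of the real source database. The trimming and a simple conditional are done by the SELECT itself, so the rows arrive already cleaned. The names SOURCE_SQL and read_clean_rows are made up for the example; on Oracle the CASE expression could be written with DECODE instead.

```python
import sqlite3

# Source query that trims and flags inside the database, so no string
# function or conditional is needed downstream for these two columns.
SOURCE_SQL = (
    "SELECT LTRIM(RTRIM(name)) AS name, "
    "CASE status WHEN 'A' THEN 1 ELSE 0 END AS is_active "
    "FROM src_customer "
    "ORDER BY id"
)


def read_clean_rows(conn):
    """Return (name, is_active) rows as produced by the source query."""
    return conn.execute(SOURCE_SQL).fetchall()


# Hypothetical source table with padded names and a status code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?, ?)",
    [(1, "  Ann  ", "A"), (2, "Bob   ", "X")],
)
for name, is_active in read_clean_rows(conn):
    print(repr(name), is_active)
```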
It introduces "decisions" within the tool, and it also introduces multiple code paths across the logic (thus increasing complexity). Therefore - when possible, avoid utilizing an IIF conditional - again, the only possibility here might be (for example) an ORACLE DECODE function applied to a SQL source.

8. Sequence Generators slow down mappings. Unfortunately there is no "fast" and easy way to create sequence generators. The cost is not that high for using a sequence generator inside of Informatica, particularly if you are caching values (caching at around 2000 seems to be the sweet spot). However - if at all avoidable, this is one "card" up a sleeve that can be played. If you don't absolutely need the sequence number in the map for calculation reasons, and you are utilizing Oracle, then let SQL*Loader create the sequence generator for all Insert Rows. If you're using Sybase, don't specify the Identity column as a target - let the Sybase Server generate the column. Also - try to avoid "reusable" sequence generators - they tend to slow the session down further, even with cached values.

9. Test Expressions slow down sessions. Expressions such as IS_SPACES tend to slow down the mappings. This is a data validation expression which has to run through the entire string to determine if it is spaces, much the same as IS_NUMBER has to validate an entire string. These expressions (if at all avoidable) should be removed in cases where it is not necessary to "test" prior to conversion. Be aware however, that direct conversion without testing (conversion of an invalid value) will kill the transformation. If you absolutely need a test expression for numerics, try this: IIF(<field> * 1 >= 0,<field>,NULL), preferably where you don't care if it's zero. An alpha in this expression should return a NULL to the computation. Yes - the IIF condition is slightly faster than IS_NUMBER - because IS_NUMBER parses the entire string, whereas the multiplication operator is the actual speed gain.

10. Reduce Number of OBJECTS in a map. Frequently, the idea of these tools is to make the "data translation map" as easy as possible. All too often, that means creating one expression for each throughput/translation (taking it to an extreme of course). Each object adds computational overhead to the session and timings may suffer. Sometimes if performance is an issue / goal, you can integrate several expressions in to one expression object, thus reducing the "object" overhead. In doing so - you could speed up the map.

11. Update Expressions - Session set to Update Else Insert. If you have this switch turned on - it will definitely slow the session down - Informatica performs 2 operations for each row: an update (w/PK), then, if it returns ZERO rows updated, an insert. The way to speed this up is to "know" ahead of time if you need to issue a DD_UPDATE or DD_INSERT inside the mapping, then tell the update strategy what to do. After which you can change the session setting to: INSERT and UPDATE AS UPDATE or UPDATE AS INSERT.

12. Multiple Targets are too slow. Frequently maps are generated with multiple targets, and sometimes multiple sources. This (despite first appearances) can really burn up time. If the architecture permits change, and the users support re-work, then try to change the architecture -> 1 map per target is the general rule of thumb. Once you reach one map per target, the tuning gets easier. Sometimes it helps to reduce it to 1 source and 1 target per map. But - if the architecture allows more modularization, 1 map per target usually does the trick. Going further, you could break it up: 1 map per target per operation (such as insert vs update). Doing this will provide a few more cards to the deck with which you can "tune" the session, as well as the target table itself. Going this route also introduces parallel operations. For further info on this topic, see my architecture presentations on Staging Tables, and 3rd normal form architecture (Corporate Data Warehouse Slides).

13. Slow Sources - Flat Files. If you've got slow sources, and these sources are flat files, you can look at some of the following possibilities. If the sources reside on a different machine, and you've opened a named pipe to get them across the network - then you've opened (potentially) a can of worms. You've introduced the network speed as a variable on the speed of the flat file source. Try to compress the source file, FTP PUT it on the local machine (local to PMServer), decompress it, then utilize it as a source. If you're reaching across the network to a relational table - and the session is pulling many, many rows (over 10,000) - then the source system itself may be slow. You may be better off using a source system extract program to dump it to file first, then following the above instructions. However, there is something your SA's and Network Ops folks could do (if necessary) - this is covered in detail in the advanced section. They could backbone the two servers together with a dedicated network line (no hubs, routers, or other items in between the two machines). At the very least, they could put the two machines on the same sub-net. Now, if your file is local to PMServer but is still slow, examine the location of the file (which device is it on). If it's not on an INTERNAL DISK then it will be slower than if it were on an internal disk (C drive for you folks on NT). This doesn't mean a unix file LINK exists locally while the file is remote - it means the actual file is local.

14. Too Many Aggregators. If your map has more than 1 aggregator, chances are the session will run very, very slowly - unless the CACHE directory is extremely fast, and your drive seek/access times are very fast. Even still, placing aggregators end-to-end in mappings will slow the session down by factors of at least 2. This is because of all the I/O activity being a bottleneck in Informatica. What needs to be known here is that Informatica's products, PM / PC up through 4.7x, are NOT built for parallel processing. In other words, the internal core doesn't put the aggregators on threads, nor does it put the I/O on threads - therefore, being a single-strung process, it becomes easy for a part of the session/map to become a "blocked" process by I/O factors. For I/O contention and resource monitoring, please see the database/datawarehouse tuning guide.

15. Maplets containing Aggregators. Maplets are a good source for replicating data logic. But just because an aggregator is in a maplet doesn't mean it won't affect the mapping. The reason maplets by themselves don't affect the speed of the mappings is that they are treated as a part of the mapping once the session starts - in other words, if you have an aggregator in a maplet, followed by another aggregator in a mapping, you will still have the problem mentioned above in #14. Reduce the number of aggregators in the entire mapping (included maplets) to 1 if possible.
If necessary, split the map up in to several different maps, and use intermediate tables in the database if required to achieve processing goals.

16. Eliminate "too many lookups". What happens and why? Well - with too many lookups, your cache is eaten in memory - particularly on the 1.6 / 4.6 products. The end result is there is no memory left for the sessions to run in. The DTM reader/writer/transformer threads are not left with enough memory to be able to run efficiently. PC 1.7 and PM 4.7 solve some of these problems by caching some of these lookups out to disk when the cache is full. But you still end up with contention - in this case, with too many lookups, you're trading Memory Contention for Disk Contention. The memory contention might be worse than the disk contention, because the OS ends up thrashing (swapping in and out of TEMP/SWAP disk space) with small block sizes to try to find your lookup row, and as the row goes from lookup to lookup, the swapping / thrashing gets worse.

17. Lookups & Aggregators Fight. The lookups and the aggregators fight for memory space as discussed above. Each requires Index Cache and Data Cache, and they "share" the same HEAP segments inside the core. See the Memory Layout document for more information. Particularly in the 4.6 / 1.6 products and prior - these memory areas become critical, and when dealing with many, many rows - the session is almost certain to cause the server to "thrash" memory in and out of the OS Swap space. If possible, separate the maps - perform the lookups in the first section of the maps, position the data in an intermediate target table - then have a second map read the target table and perform the aggregation (this also provides the option for a group by to be done within the database)... Another speed improvement...

INFORMATICA ADVANCED TUNING GUIDELINES

The following numbered items are for advanced level tuning. Please proceed cautiously, one step at a time. Do not attempt to follow these guidelines if you haven't already made it through all the basic and intermediate guidelines first. These guidelines may require a level of expertise which involves System Administrators, Database Administrators, and Network Operations folks. Please be patient. The most important aspect of advanced tuning is to be able to pinpoint specific bottlenecks, then have the funding to address them.

As usual - these advanced tuning guidelines come last, and are pointed at suggestions for the system. There are other advanced tuning guidelines available for Data Warehousing Tuning. You can refer to those for questions surrounding your hardware / software resources.

1. Break the mappings out. 1 per target. If necessary, 1 per source per target. Why does this work? Well - eliminating multiple targets in a single mapping can greatly increase speed... Basically it's like this: one session per map/target. Each session establishes its own database connection. Because of the unique database connection, the DBMS server can now handle the insert/update/delete requests in parallel against multiple targets. It also helps to allow each session to be specified for its intended purpose (no longer mixing a data driven session with INSERTS only to a single target). Each session can then be placed in to a batch marked "CONCURRENT" if preferences allow. Once this is done, parallelism of mappings and sessions becomes obvious. Studies of parallel processing have shown again and again that the operations can sometimes be completed in half the time of their original counterparts merely by streaming them at the same time. With multiple targets in the same mapping, you're telling a single database connection to handle multiple, diverse database statements - sometimes hitting this target, other times hitting that target. Think - in this situation it's extremely difficult for Informatica (or any other tool for that matter) to build BULK operations... even though "bulk" is specified in the session. Remember that "BULK" means this is your preference, and that the tool will revert to NORMAL load if it can't provide a BULK operation on a series of consecutive rows. Obviously, data driven then forces the tool down several other layers of internal code before the data can actually reach the database.

2. Develop maplets for complex business logic. It appears as if Maplets do NOT cause any performance hindrance by themselves. Extensive use of maplets means better, more manageable business logic. The maplets allow you to better break the mappings out.

3. Keep the mappings as simple as possible. Bury complex logic (if you must) in a maplet. If you can avoid complex logic altogether - then that would be the key. The old rule of thumb applies here (common sense): the straighter the path between two points, the shorter the distance... Translated as: the shorter the distance between the source qualifier and the target - the faster the data loads.

4. Remember the TIMING is affected by READER/TRANSFORMER/WRITER threads. With complex mappings, don't forget that each ELEMENT (field) must be weighed - in this light a firm understanding of how to read performance statistics generated by Informatica becomes important. In other words - if the reader is slow, then the rest of the threads suffer; if the writer is slow, same effect. A pipe is only as big as its smallest diameter.... A chain is only as strong as its weakest link. Sorry for the metaphors, but it should make sense.

5. Change Network Packet Size (for Sybase, MS-SQL Server & Oracle users). Maximum network packet size is a Database Wide Setting, which usually defaults to 512 bytes or 1024 bytes. Setting the maximum database packet size doesn't necessarily hurt any of the other users; it does however allow the Informatica database setting to make use of the larger packet sizes - thus transferring more data in a single packet, faster. The typical 'best' settings are between 10k and 20k. In Oracle: you'll need to adjust the Listener.ORA and TNSNames.ORA files. Include the parameters SDU and TDU. SDU = Session Data Unit size (in bytes), TDU = Transport Data Unit size (in bytes). The SDU and TDU should be set equally. See the Informatica FAQ page for more information on setting these up.

6. Change to IPC Database Connection for Local Oracle Database. If PMServer and Oracle are running on the same server, use an IPC connection instead of a TCP/IP connection. Change the protocol in the TNSNames.ORA and Listener.ORA files, and restart the listener on the server. Be careful - this protocol can only be used locally, however the speed increases from using Inter Process Communication can be between 2x and 6x. IPC is utilized by Oracle, but is defined as a Unix System V standard specification.
You can find more information on IPC by reading about it in Unix System V manuals.

7. Change Database Priorities for the PMServer Database User. Prioritizing the database login that any of the connections use (setup in Server Manager) can assist in changing the priority given to the Informatica executing tasks. These tasks, when logged in to the database, can then over-ride others. Sizing memory for these tasks (in shared global areas, and server settings) must be done if priorities are to be changed. If BCP or SQL*Loader or some other bulk-load facility is utilized, these priorities must also be set. This can greatly improve performance. Again, it's only suggested as a last resort method, and doesn't substitute for tuning the database, or the mapping processes. It should only be utilized when all other methods have been exhausted (tuned). Keep in mind that this should be reserved for the production machines, and only in certain instances where the Load cycle that Informatica is utilizing is NOT impeding other users.

8. Change the Unix User Priority. In order to gain speed, the Informatica Unix User must be given a higher priority. The Unix SA should understand what it takes to rank the Unix logins, and grant priorities to particular tasks. Or - simply have the pmserver executed under a super user (SU) command; this will take care of reprioritizing Informatica's core process. This should only be used as a last resort - once all other tuning avenues have been exhausted, or if you have a dedicated Unix machine on which Informatica is running.

9. Try not to load across the network. If at all possible, try to co-locate the PMServer executable with a local database. Not having the database local means: 1) the repository is across the network (slow), 2) the sources / targets are across the network, also potentially slow. If you have to load across the network, at least try to localize the repository on a database instance on the same machine as the server. The other thing is: try to co-locate the two machines (pmserver and Target database server) on the same sub-net, even the same hub if possible. This eliminates unnecessary routing of packets all over the network. Having a localized database also allows you to set up a target table locally - which you can then "dump" following a load, ftp to the target server, and bulk-load in to the target table. This works extremely well for situations where append or complete refresh is taking place.

10. Set Session Shared Memory Settings between 12MB and 24MB. Typically I've seen folks attempt to assign a session large heaps of memory (in hopes it will increase speed). All it tends to do is slow down the processing. See the memory layout document for further information on how this affects Informatica and its memory handling, and why simply giving it more memory doesn't necessarily provide speed.

11. Set Shared Buffer Block Size around 128k. Again, this is covered in the memory layout document. This seems to be a "sweet spot" for handling blocks of rows inside the Informatica process.

12. MEMORY SETTINGS: The settings above are for an average configured machine; any machine with less than 10 GIG's of RAM should abide by the above settings.
If you've got 12+ GIGs, and you're running only 1 to 3 sessions concurrently, go ahead and specify the Session Shared Memory size at 1 or 2 GIGs. Keep in mind that the Shared Buffer Block Size should be set in relative size to the Shared Memory setting. If you set Shared Mem to 124 MB, set the Buffer Block Size to 12MB; keep them in relative sizes. If you don't - the result will be more memory "handling" going on in the background, so less actual work will be done by Informatica. Also - this holds true for the simpler mappings. The more complex the mapping, the less likely you are to see a gain by increasing either buffer block size or shared memory settings - because Informatica potentially has to process cells (ports/fields/values) inside of a huge memory block, thus resulting in a potential re-allocation of the whole block.

13. Use SNAPSHOTS with your Database. If you have dedicated lines, DS3/T1, etc... between servers, use a snapshot or Advanced Replication to get data out of the source systems and in to a staging table (duplicate of the source). Then schedule the snapshot before running processes. The RDBMS servers are built for this kind of data transfer - and have optimizations built in to the core to transfer data incrementally, or as a whole refresh. It may be to your advantage, particularly if your sources contain 13 Million + rows. Place Informatica processes to read from the snapshot; at that point you can index any way you like - and increase the throughput speed without affecting the source systems. Yes - snapshots only work if your sources are homogeneous to your targets (on the same type of system).

14. INCREASE THE DISK SPEED. One of the most common fallacies is that a Data Warehouse RDBMS needs only 2 controllers and 13 disks to survive. This is fine if you're running less than 5 Million rows total through your system, or your load window exceeds 5 hours. I recommend at least 4 to 6 controllers, and at least 50 disks - set on a Raid 0+1 array, spinning at 7200 RPM or better. If it's necessary, plunk the money down and go get an EMC device. You should see a significant increase in performance after installing or upgrading to such a configuration.

15. Switch to Raid 0+1. Raid Level 5 is great for redundancy, horrible for Data Warehouse performance, particularly on bulk loads. Raid 0+1 is the preferred method for data warehouses out there, and most folks find that the replication is just as safe as a Raid 5, particularly since the hardware is now nearly all hot-swappable, and the software to manage this has improved greatly.

16. Upgrade your Hardware. On your production box, if you want Gigabytes per second throughput, or you want to create 10 indexes in 4 hours on 34 million rows, then add CPU power, RAM, and the disk modifications discussed above. A 4 CPU machine just won't cut the mustard today for this size of operation. I recommend a minimum of 8 CPU's as a starter box, and increase to 12 as necessary. Again, this is for huge Data Warehousing systems - GIG's per hour/MB per hour. A box with 4 CPU's is great for development, or for smaller systems (totalling less than 5 Million rows in the warehouse). However, keep in mind that Bus Speed is also a huge factor here. I've heard of a 4 CPU Dec-Alpha system outperforming a 6 CPU system... So what's the bottom line? Disk RPM's, Bus Speed, RAM, and # of CPU's. I'd say potentially in that order. Both Oracle and Sybase perform extremely well when given 6+ CPU's and 8 or 12 GIG's RAM set up on an EMC device at 7200 RPM with a minimum of 4 controllers.

Sorting - performance issues


You can improve Aggregator transformation performance by using the Sorted Input option. When the Sorted Input option is selected, the Informatica Server assumes all data is sorted by group. As the Informatica Server reads rows for a group, it performs aggregate calculations as it reads. When necessary, it stores group information in memory. To use the Sorted Input option, you must pass sorted data to the Aggregator transformation. You can gain added performance with sorted ports when you partition the session.

When Sorted Input is not selected, the Informatica Server performs aggregate calculations as it reads. However, since data is not sorted, the Informatica Server stores data for each group until it reads the entire source to ensure all aggregate calculations are accurate.

For example, one Aggregator has the STORE_ID and ITEM Group By ports, with the Sorted Input option selected. When you pass the following data through the Aggregator, the Informatica Server performs an aggregation for the three records in the 101/battery group as soon as it finds the new group, 201/battery:

STORE_ID   ITEM        QTY   PRICE
101        'battery'   3     2.99
101        'battery'   1     3.19
101        'battery'   2     2.59
201        'battery'   4     1.59
201        'battery'   1     1.99
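The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Informatica Server is implemented: the function names are made up for this sketch, and SUM(QTY) is an assumed aggregate, since the example above does not say which calculation is performed.

```python
from itertools import groupby
from operator import itemgetter

# Rows from the example above: (STORE_ID, ITEM, QTY, PRICE), already sorted
# by the group-by ports STORE_ID and ITEM.
rows = [
    (101, "battery", 3, 2.99),
    (101, "battery", 1, 3.19),
    (101, "battery", 2, 2.59),
    (201, "battery", 4, 1.59),
    (201, "battery", 1, 1.99),
]


def aggregate_sorted(rows):
    """Yield (store_id, item, total_qty) one group at a time.

    groupby() only compares adjacent rows, so a group is emitted as soon as
    the key changes and only the current group is held in memory.
    """
    for (store_id, item), group in groupby(rows, key=itemgetter(0, 1)):
        yield store_id, item, sum(qty for _, _, qty, _ in group)


def aggregate_unsorted(rows):
    """Keep a running total for every group until the input is exhausted."""
    totals = {}
    for store_id, item, qty, _ in rows:
        totals[(store_id, item)] = totals.get((store_id, item), 0) + qty
    return [(s, i, q) for (s, i), q in totals.items()]


sorted_result = list(aggregate_sorted(rows))
unsorted_result = aggregate_unsorted(rows)
print(sorted_result)
```

Both functions return the same totals for this input. The sorted version can release each group as soon as the next one starts, while the unsorted version has to keep every group until the last row has been read.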

If you use the Sorted Input option and do not presort data correctly, the session fails.

Sorted Input Conditions

Do not use the Sorted Input option if any of the following conditions are true:
- The aggregate expression uses nested aggregate functions.
- The session uses incremental aggregation.
- Input data is data-driven. You choose to treat source data as data driven in the session properties, or the Update Strategy transformation appears before the Aggregator transformation in the mapping.
- The mapping is upgraded from PowerMart 3.5.

If you use the Sorted Input option under these circumstances, the Informatica Server reverts to default aggregate behavior, reading all values before performing aggregate calculations.

Pre-Sorting Data

To use the Sorted Input option, you pass sorted data through the Aggregator. Data must be sorted as follows:
- By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
- Using the same sort order configured for the session.

If data is not in strict ascending or descending order based on the session sort order, the Informatica Server fails the session. For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.

If the session uses file sources, you can use an external utility to sort file data before starting the session. If the session uses relational sources, you can use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group By columns must be in the exact same order in both the Aggregator and Source Qualifier transformations. For details on sorting data in the Source Qualifier, see Sorted Ports.

Indexes - Make sure indexes are in place and tables have been analyzed. Might be able to use index hints in source qualifier.
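As a rough illustration of the pre-sorting requirement for Sorted Input described earlier in this section, the Python sketch below checks file-style rows before a load and sorts them if needed. The helper name first_unsorted_row and the sample rows are hypothetical, and the comparison uses Python's default ordering rather than a session sort order such as French, so treat it as a sanity check only.

```python
# Group-by ports, in the order they appear in the Aggregator (assumed here).
GROUP_BY_PORTS = ("STORE_ID", "ITEM")


def sort_key(row):
    """Build the comparison key from the group-by ports, in port order."""
    return tuple(row[port] for port in GROUP_BY_PORTS)


def first_unsorted_row(rows):
    """Return the index of the first row that breaks ascending order, or None."""
    previous = None
    for index, row in enumerate(rows):
        key = sort_key(row)
        if previous is not None and key < previous:
            return index
        previous = key
    return None


# Hypothetical input that arrives out of order.
rows = [
    {"STORE_ID": 201, "ITEM": "battery", "QTY": 4},
    {"STORE_ID": 101, "ITEM": "battery", "QTY": 3},
    {"STORE_ID": 101, "ITEM": "battery", "QTY": 1},
]

bad_index = first_unsorted_row(rows)

# sorted() is stable, so rows within a group keep their original order.
presorted = sorted(rows, key=sort_key)
still_bad = first_unsorted_row(presorted)
print(bad_index, still_bad)
```

A check like this is cheap compared with a failed session, but it does not replace sorting in the source database or with an external utility that honors the configured sort order.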
