Professional Documents
Culture Documents
ResourceId=57
http://infaexplore.blogspot.in/p/informatica-qa.html
Tuning
Tuning a PowerCenter 8 ETL environment is not that straightforward. A chain is only as strong as the
weakest link. There are four crucial domains that require attention: system, network, database and the
PowerCenter 8 installation itself. It goes without saying that without a well performing infrastructure
the tuning of the PowerCenter 8 environment will not make much of a difference.
As the first three domains are located in the realms of administrators, this article will only briefly touch
these subjects and will mainly focus on the items available to developers within PowerCenter 8.
Tuning is an iterative process: at each iteration the largest bottleneck is removed, gradually improving
performance. Bottlenecks can occur on the system, on the database (either source or target), or within
the mapping or session ran by the Integration Service. To identify bottlenecks, run test sessions,
monitor the system usage and gather advanced performance statistics while running. Examine the
session log in detail as it provides valuable information concerning session performance. From the
perspective of a developer, search for performance problems in the following order:
source / target
mapping
session
system
Bear in mind that major changes were done in the underlying architecture between PowerCenter
version 7 and PowerCenter version 8. If possible use the latest available version of PowerCenter
(currently 8.6).
If tuning the mapping and session still proves to be inadequate, the underlying system will need to be
examined closer. This extended examination needs to be done in close collaboration with the system
administrators and database administrators (DBA). They have several options to improve performance
without invoking hardware changes. Examples are distributing database files over different disks,
improving network bandwidth and lightening the server workload by moving other applications.
However if none of this helps, only hardware upgrades will bring redemption to your performance
problems.
Session logs
The PowerCenter 8 session log provides very detailed information that can be used to establish a
baseline and will identify potential problems.
Very useful for the developer are the detailed thread statistics that will help benchmarking your
actions. The thread statistics will show if the bottlenecks occur while transforming data or while
reading/writing. Always focus attention on the thread with the highest busy percentage first. For every
thread, detailed information on the run and idle time are presented. The busy percentage is calculated:
(run time idle time) / (run time * 100).
Each session has a minimum of three threads:
reader thread
transformation thread
writer thread
An example:
*****
RUN
INFO
FOR
TGT
LOAD
ORDER
GROUP
[1],
CONCURRENT
SET
[1]
*****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has
completed: Total Run Time = [31.984171] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL]
has completed: Total Run Time = [0.624996] secs, Total Idle Time = [0.453115] secs, Busy Percentage =
[27.501083].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has completed: Total
Run Time = [476.668825] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
In this particular case it is obvious that the database can be considered as the main bottleneck. Both
reading and writing use most of the execution time. The actual transformations only use a very small
amount of time. If a reader or writer thread is 100% busy, consider partitioning the session. This will
allow the mapping to open several connections to the database, each reading/writing data from/to a
partition thus improving data read/write speed.
Severity
INFO
Timestamp
23/Dec/2008
Node
09:02:22
Thread
Message
Code
node_etl02
MANAGER
Message
PETL_24031
*****
RUN
INFO
FOR
TGT
LOAD
ORDER
GROUP
[1],
CONCURRENT
SET
[1]
*****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has completed.
The total run time was insufficient for any meaningful statistics.
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has
completed: Total Run Time = [22.765478] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has completed: Total
Run Time = [30.937302] secs, Total Idle Time = [20.345600] secs, Busy Percentage = [34.23602355].
In the example above, the transformation thread poses the largest bottleneck and needs to be dealt
with first. The reader thread finished so quickly no meaningful statistics were possible. The writer
thread spends the majority of time in the idle state, waiting for data to emerge from the transformation
thread. Perhaps an unsorted aggregator is used, causing the Integration Service to sort all data before
releasing any aggregated record ?
The number of threads can increase if the sessions will read/write to multiple targets, if sessions have
multiple execution paths, if partitioning is used
Establishing a baseline
To be able to benchmark the increase or decrease of performance following an action it is important to
establish a baseline to compare with. It is good practice to log in detail every iteration in the tuning
process. This log will enable the developer to clearly identify the actions that enhanced or decreased
performance and serve as later reference for future tuning efforts. Never assume that an action will
improve performance because the action is a best practice or worked before: always test and compare
with hard figures. The thread statistics are used to build this log.
This log file could look like this:
click to enlarge
Figure 1: An example log file
Whatever source you are trying to read, always try to limit the incoming data stream maximally. Place
filters as early as possible in the mapping, preferably in the source qualifier. This will ensure only data
needed by the Integration Services is picked up from the database and transported over the network.
If you suspect the performance of reading relational data is not optimal, replace the relational source
with a flat file source containing the same data. If there is a difference in performance, the path
towards the source database should be investigated more closely, such as execution plan of the query,
database performance, network performance, network package sizes,
When using homogeneous relational sources, use a single source qualifier with a user defined join
instead of a joiner transformation. This will force the join being executed on the database instead of
the PowerCenter 8 platform. If a joiner transformation is used instead, all data is first picked up from
the database server, then transported to the PowerCenter 8 platform, sorted and only as a last step
joined
by
the
Integration
server.
Consider pre-sorting the data in the database, this will make further sorting for later aggregators,
joiners, by the Integration Service unnecessary. Make sure the query executed on the database has a
favourable execution plan. Use the explain plan (Oracle) facility to verify the query's execution plan if
indexes are optimally used. Do not use synonyms or database links unless really needed as these will
slow down the data stream.
In general it is good practice to always generate keys for primary key and foreign key fields. If no key is
available or known a dummy key should be used. In the reference table an extra dummy record should
be inserted. This method will improve join performance when using homogenous joins. In general three
dummy rows should be included:
999997 Missing
When using heterogeneous sources there is no alternative but to use a joiner transformation. To ease
up matters for the Integration service, ensure that all relational sources are sorted and joined in
advance in the source qualifier. Flat file sources need to be sorted before joining, using a sorter
transformation. When joining the 2 sorted sources, check the sorted input property at the joiner
transformation. The sorted input option allows the joiner transformation to start passing records to
subsequent transformations as soon as the key value changes. Normal behaviour would be to hold
passing data until all data is sorted and processed in the joiner transformation.
By matching the session property Line Sequential buffer length to the size of exactly one record
overhead is minimized. If possible stage flat files in a staging table. Joining and filtering can then be
done in the database.
user defined commit: slowest , avoid using user defined commit when not really necessary
Remember to use a very large commit interval when using bulk loads with Oracle and Microsoft to
avoid unnecessary overhead. Dropping and recreating indexes can be done by using pre -and post
session tasks or by calling a stored procedure within the mapping.
Transformation Mechanics
Every transformation has its specifics related to performance. In the section below the most important
items are discussed.
A joiner transformation should be used to join heterogeneous data sources. Homogeneous sources
should always be joined in the database by using the user defined join in the source qualifier
transformation.
If not sorted at database level, always use a sorter transformation to sort the data before entering the
joiner transformation. Make sure the sorter transformation has sufficient cache to enable a 1-pass sort.
Not having sufficient cache will plummet performance. The data could be sorted in the joiner, but there
are three advantages of using the sorter transformation:
The use of sorted input enables the joiner transformation to start passing data to subsequent
transformations before all data was passed in the joiner transformation. Consequently, the
transformations following the joiner transformation start receiving data nearly immediately and
do not have to wait until all the data was sorted and joined in the joiner transformation. This
logic is only valid when the source can be sorted in the database: for example when joining
SQL-Server and Oracle. Both sources can be sorted in the database, making additional sorting
using sorters superfluous. When a sorter is needed, for example when joining Oracle and a flat
file, the sorter will have to wait until all data is read from the flat file before records to the
joiner transformation can be passed.
The sorting algorithm used in the sorter is faster then the algorithm used in joiners or
aggregators.
The use of sorted input in the joiner transformation, allows for a smaller cache size, leaving
more memory for other transformations or sessions. Again, when a flat file is used, a sorter will
be needed prior to the joiner transformation. Although the joiner transformation uses less
cache, the sorter cache will need to be sufficiently large to enable sorting all input records.
click to enlarge
Figure 2: Example of an unnecessary sort. The data should be sorted in advance at database level.
As outer joins are far more expensive than inner joins, try to avoid them as much as possible. The
master source should be designated as the source containing fewer rows then the detail source. Join as
early as possible in the pipeline as this limits the number of pipes and decreases the amount of data
being sent to other transformations.
Only use a filter transformation for non-relational sources. When using relational sources, filter in the
source qualifier. Filter as early as possible in the data stream. Try filtering by using numeric lookup
conditions. Numeric matching is considerably faster then the matching of strings. Avoid complex logic
in the filter condition. Be creative in rewriting complex expressions to the shortest possible length.
When multiple filters are needed, consider using a router transformation as this will simplify mapping
logic.
A lookup transformation is used to lookup values from another table. Clever use of lookup caches can
make a huge difference in performance.
By default, lookup transformations are cached. The selected lookup fields from the lookup table are
read into memory and a lookup cache file is built every time the lookup is called. To minimize the
usage of lookup cache, only retrieve lookup ports that are really needed.
However, to cache or not to cache a lookup really depends on the situation. An uncached
lookup makes perfect sense if only a small percentage of lookup rows will be used. For example if we
only need 200 rows in a 10 000 000 rows table. In this particular case, building the lookup cache would
require an extensive amount of time. A direct select to the database for every lookup row will be much
faster on the condition that the lookup key in the database is indexed.
Sometimes a lookup is used multiple times in an execution path of a mapping or workflow. Re-caching
the lookup every time would be time consuming and unnecessary, as long as the lookup source table
remains unchanged. The persistent lookup cache property was created to handle this type of situation.
Only when calling the lookup the first time, the lookup cache file is refreshed. All following lookups
reuse the persistent cache file. Using a persistent cache can improve performance considerably
because the Integration Service builds the memory cache from the cache files instead of the database.
Use dynamic lookup cache when the lookup source table is a target in the mapping and updated
dynamically throughout the mapping. Normal lookups caches are static. The records that were inserted
in the session are not available to the lookup cache. When using dynamic lookup cache, newly inserted
or updated records are update in the lookup cache immediately.
Ensure sufficient memory cache is available for the lookup. If not, the Integration server will have to
write to disk, slowing down.
By using the Additional Concurrent Pipelines property at session level, lookup caches will start building
concurrently at the start of the mapping. Normal behaviour would be that a lookup cache is created
only when the lookup is called. Pre-building caches versus building caches on demand can increase the
total session performance considerably, but only when the pre-build lookups will be used for sure in the
session. Again, the performance gain of setting this property will depend on the particular situation.
An aggregator transformation is an active transformation, used to group and aggregate data. If the
input was not sorted already, always use a sorter transformation in front of the aggregator
transformation. As with the joiner transformation the aggregator transformation will accumulate data
until the dataset is complete and only starts processing and sending output records from there on.
When sorted input is used, the aggregator will process and sent output records as soon as the first set
of records is complete. This will allow for much faster processing and smaller caches. Use as little as
possible functions and difficult nested conditions. Especially avoid the use of complex expressions in
the group by ports. If needed use an expression transformation to build these expressions in advance.
When using change data capture, incremental aggregation will enhance performance considerably.
Sometimes, simple aggregations can be done by using an expression transformation that uses
variables. In certain cases this could be a valid alternative for an aggregation transformation.
Expression transformations are generally used to calculate variables. Try not to use complex nested
conditions, use decode instead. Functions are more expensive than operators; avoid using functions if
the same can be achieved by using operators. Implicit data type conversion is expensive. Try to
convert data types as little as possible. Working with numbers is generally faster then working with
strings. Be creative in rewriting complex expressions to the shortest possible length.
The use of a sequence generator transformation versus a database sequence depends on the load
method of the target table. If using bulk loading, database sequences cannot be used. The sequence
generator transformation can overcome this problem. Every row is given a unique sequence number.
Typically a number of values are cached for performance reasons.
There is however a big catch. Unused sequence numbers at the end of the session are lost. The next
time the session is run, the sequence generator will cache a new batch of numbers.
For example: a sequence generator caches 10000 values. 10000 rows are loaded, using the cached
values. At row 10001, a new batch of sequence values is cached: from 10000 to 20000. However the
last row in the session is 10002. All values between 10002 and 20000 are lost. Next time the session is
ran, the first inserted row will have a key of 20001.
To avoid these gaps use a sequence generator in combination with an unconnected lookup. First look
up the latest key value in the target table. Then use an expression that will call the sequence
generator and add one to the key value that was just retrieved. The sequence generator should restart
numbering at every run. There are some advantages to this approach:
if rows are deleted, gaps between key values will be caused by deletes and not by caching
issues.
As an added advantage, the use of this method will prevent migration problems with persistent values
between repositories. This method is easy to implement and does not imply a performance penalty
while running the session.
In general, it is better to capture all data quality issues that could cause transformation errors in
advance and write flawed records into an error table.
readfromdisk/writetodisk: indicates not enough cache memory is available. Increase the cache
size until this counter is no longer shown.
click to enlarge
Figure 3: Session statistics while using Collect Performance Data
Memory Optimization
Memory plays an important role when the Integration Service is running sessions. Optimizing cache
sizes can really make a huge difference in performance.
Buffer memory is used to hold source and target data while processing and is allocated when the
session is initialized. DTM Buffer is used to create the internal data structures. Buffer blocks are used
to bring data in and out of the Integration Service. Increasing the DTM buffer size will increase the
amount of blocks available to the Integration Service. Ideally a buffer block can contain 100 rows at
the
same
time.
You can configure the amount of buffer memory yourself or you can configure the Integration Service
to automatically calculate buffer settings at run time. In stead of calculating all values manually or by
trial and error, run the session once on auto and retrieve the correct values from the session log:
Severity
Timestamp
Node
Thread
Message
Code
Message
INFO 12/30/2008 1:23:57 AM INFO8_ASPMIS003 MAPPING TM_6660 Total Buffer Pool size is 90000000 bytes and
Block size is 65536 bytes.
The Integration server uses the index and data caches for mostly active transformations: aggregator,
joiner, sorter, lookup, rank
Configuring the correct amount of cache is really necessary as the Integration Server will write and
read from disk if not properly sized. The index cache should be about half of the data cache. Cache
files should be stored on a fast drive and surely not on a network share.
The easiest way of calculating the correct cache sizes is by keeping the defaults on auto and
examining the session log. In the session log a line is written for every lookup that looks like this:
Copy these values into the session properties. Be sure to verify the performance counters to validate
no disk reads/writes are done.
Sorter transformations need special attention concerning cache sizes as well. If not enough cache is
available, the sorter will require a multi pass sort, dropping the session performance. If so, a warning
will be displayed in the session log:
TRANSF_1_1_1> SORT_40427 Sorter Transformation [srt_PRESTATIE] required 4-pass sort (1-pass temp I/O:
19578880 bytes). You may try to set the cache size to 27 MB or higher for 1-pass in-memory sort.
The maximum amount of memory used by transformation caches is set by two properties:
The smaller of the two is used. When the value is 0, the automatic memory attributes are disabled. If
this value is set too low, an error will occur if a lookup with manually configure cache wants to allocate
more memory then available. Keep in mind that sessions can run in parallel: every session will try to
allocate RAM-memory.
Ensure plenty of RAM-memory is available for the Integration Service. Do not assume that adding
cache memory will increase performance, at a certain point optimum performance is reached and
adding further memory will not be beneficial.
Concurrent sessions
Depending on the available hardware, sessions can be run concurrently instead of sequential. At
Integration Service level the number of concurrent sessions can be set. This value is set default to 10.
Depending on the number of CPU's at the PowerCenter 8 server and at source and target database this
value can be increased or decreased. The next step is in designing a workflow that will launch a
number of sessions concurrently. By trial and error an optimal sampling can be achieved.
Session logging
The amount detail in a session log is determined by the tracing level. This level ranges from Terse' to
Verbose Data'. For debugging or testing purposes the Verbose Data' option will trace every row that
passes in the mapping in the session log. At terse, only initialization information, error messages, and
notification of rejected data are logged. It is quite clear the Verbose Data' option causes a severe
performance penalty.
For lookups, use the Additional Concurrent Pipelines for Lookup Creation' to start building lookups as
soon as the session is initialized. By the time the lookups are needed in the session, the cache creation
hopefully is already finished.
Partioning
If a transformation thread is 100% busy, consider adding a partition point in the segment. Pipeline
partitioning will allow for parallel execution within a single session. A session will have multiple threads
for processing data concurrently. Processing data in pipeline partitions can improve performance, but
only if enough CPU's are available. As a rule of thumb, 1.5 CPU's should be available per partition.
Adding a partition point will increase the number of pipeline stages. This means a transformation will
logically be used a number of times, so remember to multiply the cache memory of transformations,
session,.. by the number of partitions. Partitioning can be specified on sources / targets and the
mapping transformations.
Using partitioning requires the Partition Option' in the PowerCenter 8 license.
click to enlarge
Figure 4: The execution time of this mapping was reduced from 400 to 90 seconds using partitioning
and advanced mapping tuning.
Pushdown Optimization
Pushdown optimization will push the transformation processing to the database level without
extracting the data. This will reduce the movement of data when source and target are in the same
database instance. Possibly, more optimal specific database processing can be used to even further
enhancing performance. The metadata en lineage however is kept in PowerCenter.
Partial pushdown optimization to source: one or more transformations can be processed in the
source
Partial pushdown optimization to target: one or more transformations can be processed in the
target
Full pushdown optimization: all transformations can be processed in the database A number of
transformations are not supported for pushdown: XML, ranker, router, Normalizer, Update
Strategy,
Pushdown optimization can be used with sessions with multiple partitions, if the partition types are
pass-through of key range partitioning. You can configure a session for pushdown optimization in the
session properties. Use the Pushdown Optimization Viewer to examine the transformations that can be
pushed to the database. Using pushdown requires the Pushdown Optimization Option' in the
PowerCenter 8 license.
Architecture
64-Bit PowerCenter 8 versions will allow better memory usage as the 2GB limitation is removed. When
PowerCenter 8 is run on a grid, the workflows can be configured to use resources efficiently and
maximize scalability. Within a grid, tasks are distributed to nodes. To improve performance on a grid,
the network bandwidth between the nodes is of importance as a lot of data is transferred between the
nodes. This data should always be stored on local disks for optimal performance. This includes the
caches and any source and target file. Of course even 64-bit computing will not help if the system is
not properly setup. Make sure plenty of disk space is available at the PowerCenter 8 server.
For optimal performance, consider running the Integration service in ASCII data movement mode when
all sources and targets use 7 or 8-bit ASCII as UNICODE can take up to 16 bits.
The repository database should be located on the PowerCenter 8 machine. If not, the repository
database should be physically separated from any target or source database. This will prevent the
same database machine is writing to a target while reading from the repository. Always use native
connections over ODBC connections as they are a lot faster. Maximize the use of parallel operations on
the database. The use of parallelism will cut execution times considerably. Remove any other
application from the PowerCenter 8 server apart from the repository database installation.
Increase the database network packet size to further improve performance. For Oracle this can be
done in the listener.ora and tnsnames.ora. Each database vender has some specific options that can
be beneficial for performance. For Oracle, the use of IPC protocol over TCP will result in a performance
gain of at least by a factor 2 to 6. Inter Process Control (IPC) will remove the network layer between
the client and Oracle database-server. This can only be used if the database is residing on the same
machine as the PowerCenter 8 server. Check the product documentation for further documentation.
By careful load monitoring of the target/source databases and the servers of PowerCenter 8 and
databases while running a session, potential bottlenecks at database or system level can be identified.
Perhaps the database memory is insufficient? Perhaps too much swapping is occurring on the
PowerCenter
8
server
?
Perhaps
the
CPU's
are
overloaded
?
The tuning of servers and database is just as important as delivering an optimized mapping and should
not be ignored. Tuning a system for a datawarehouse poses different challenges then tuning a system
for an OLTP-application. Try to involve DBA's and admins as soon as possible in this process so they
fully understand the sensitivities involved with data warehousing.
Conclusion
Increased competition and tighter budgets force IT managers to look for alternatives to decrease data
centre costs while providing more services to a growing and demanding user base. One way in
accomplishing this is by reducing the load on existing servers by optimizing the execution of
workflows. The reduced execution times of optimized sessions increase the amount of free space in an
often crowded ETL load window.
Using a number of simple guidelines, sessions can be tuned to perform at their best. In typical
situations, the total ETL time can be reduced by 10 to 40% without making any structural investments
in hardware. Performance tuning of sessions however is not an exact science and often needs trial and
error to find an optimum. Tuning sessions can be a labour intensive, even tedious, as the tuning
process tends to repeat itself as time passes by.
Tuning is not a job solely done by the developer. Tuning is a team effort. To ensure optimal
performance, developers, administrators and DBA's need to be actively and continuously involved.
Care for optimization and tuning is not something that needs to be taken care off when development
has completed. The guidelines above should be used as soon as the project starts, to ensure optimal
performance of the PowerCenter 8 system from the start to end of the system lifecycle.