AGENDA
Mapplets and types of mapplets
Reusable transformations
User-defined functions
Types of batch processing
Link conditions
Tasks and types of tasks (reusable, non-reusable)
Worklets and types of worklets
Scheduling workflows
Constraint-based load ordering
Target Load Plan (TLP)
Importing and exporting objects
PMCMD utility
PMREP utility
SCD Type II implementation using start date and end date
Lookup caches
Performance optimization (source, transformation, session, system)
ETL unit testing
ETL performance testing
Caches
Mapping debugger
Pushdown optimization
PowerCenter 8 enhancements
Session recovery
Mapping parameters
Mapping variables
Parameterization of sessions
Difference between normal and bulk loading
Session partitions and types
Re-usable Transformations
A reusable transformation is a reusable object that contains business logic built with a single transformation. A reusable transformation can be created in two different ways:
a. Using the Transformation Developer tool
b. Converting a non-reusable transformation into a reusable transformation
Limitation: a Source Qualifier transformation cannot be used as a reusable transformation.
Link Conditions
A link condition controls the execution of sessions during a workflow run. A link condition is defined using a predefined task variable called Status.
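As an illustration, a link placed between two sessions might carry a condition like the following sketch (the session name is hypothetical); the downstream session runs only when the condition evaluates to true:

```
$s_load_stage.Status = SUCCEEDED
```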
Scheduling of workflow
A schedule is an administrative task that specifies the date and time to run the workflow; it automates running the workflow. There are two types of schedules:
a. Reusable schedule
b. Non-reusable schedule
PMCMD Utility
The PMCMD is a command-line client program that communicates with the Integration Service. Use PMCMD to start workflows on the Integration Service. Issue the following commands to work with PMCMD:
a. pmcmd> connect
b. pmcmd> startworkflow
c. pmcmd> setfolder
d. pmcmd> unsetfolder
e. pmcmd> disconnect
f. pmcmd> exit
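A typical interactive PMCMD session might look like the following sketch (the service, domain, user, folder, and workflow names are hypothetical):

```
pmcmd> connect -sv IntSvc_Dev -d Domain_Dev -u Administrator -p Administrator
pmcmd> setfolder SrcFolder
pmcmd> startworkflow wf_load_emp
pmcmd> disconnect
pmcmd> exit
```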
PMREP Utility
The PMREP is a command-line client program that connects to the Repository Service to perform administrative tasks. It connects to the Repository Service with the following syntax:
a. pmrep> connect -r repository -d domain -n user -x password
Ex. pmrep> connect -r nipuna_rep -d domain_nipuna -n administrator -x administrator
Other commands:
b. createfolder
c. deletefolder
d. deleteobject
e. backup
f. restore
g. exit
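For example, a short administrative session might look like this sketch (the folder name and backup file name are hypothetical):

```
pmrep> connect -r nipuna_rep -d domain_nipuna -n administrator -x administrator
pmrep> createfolder -n SrcFolder
pmrep> backup -o rep_backup.rep
pmrep> exit
```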
Lookup Caches
There are two types of cache memory: index cache and data cache.
The index cache contains all port values from the lookup table where the port is specified in the lookup condition.
The data cache contains all port values from the lookup table that are not part of the lookup condition and that are specified as output ports.
After the cache is loaded, the values from the lookup input ports that are part of the lookup condition are compared to the index cache. Upon a match, the corresponding rows from the cache are included in the stream.
Following are the types of lookup caches:
a. Static lookup cache
b. Dynamic lookup cache
c. Persistent lookup cache
Performance Optimization
Source: Use the following techniques to improve the performance of data extraction:
a. Create source filters
b. Create indexes
Transformation: Filter, Joiner, Aggregator, Expression, Router, Update Strategy, Lookup, Sequence Generator, Sorter
Session: Tune parameters, create partitions
System: Increase CPU performance, increase network speed
Caches
The following transformations need cache memory to process the data:
a. Joiner transformation
b. Lookup transformation
c. Aggregator transformation
d. Rank transformation
e. Sorter transformation
Mapping Debugger
The mapping debugger is used to debug mappings while doing data validations.
Ex. Create a mapping to load the employees whose ENAME starts with 'S' and calculate tax as 20% of salary.
Procedure to use the debugger:
Session Recovery
When we stop a session, or an error causes the session to stop, identify the reasons for the failure and start the session again using one of the following options:
a. Restart the session: if the IS has not issued at least one commit.
b. Perform session recovery: if the IS has issued at least one commit.
When we start the session in recovery mode, the IS reads the row ID of the last committed record from the repository table OPB_SRVR_RECOVERY (repository table 522). The IS then starts processing the data records from the next row ID.
Mapping Parameters
A mapping parameter represents a constant value that is defined before the mapping runs. A mapping parameter is created with a name, type, data type, precision, and scale. The values for mapping parameters are defined in a parameter file; save the parameter file with the extension .prm or .pst. A mapping parameter is referenced with the $$ prefix.
Syntax of a parameter-file section header: [FolderName.WF:WorkflowName.ST:SessionName]
Mapping parameters reduce development overhead (they avoid creating multiple mappings when you only want to change a constant value).
A mapping parameter is specific to the mapping in which it is declared.
Mapping parameters are created to standardize the business logic.
Mapping parameters and variables can also be used in a Source Qualifier transformation.
Mapping parameters can also be defined while creating a mapplet.
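A minimal parameter file following the section-header syntax above might look like this sketch (the folder, workflow, session, and parameter names are hypothetical):

```
[Nipuna_Folder.WF:wf_load_emp.ST:s_m_load_emp]
$$TAX_RATE=0.20
$$DEPTNO=30
```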
Mapping Variables
A mapping variable represents a value that can change during the mapping run. After each successful completion of the session, the IS stores the variable with its current value in the repository, and uses the current value for the next run. A mapping variable can be set using the following variable functions: SETVARIABLE(), SETCOUNTVARIABLE(), SETMAXVARIABLE(), SETMINVARIABLE().
Typical uses: mapping variables for sequence numbers, and mapping variables for incremental extraction (reading only new rows).
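For example, incremental extraction can be sketched as follows (the variable and column names are hypothetical): declare a date variable $$LAST_RUN_DATE, filter the source on it, and advance it each run:

```
-- Source Qualifier filter condition: read only rows newer than the last run
LOAD_DATE > '$$LAST_RUN_DATE'

-- Expression transformation (variable port): record the newest date seen this run
SETMAXVARIABLE($$LAST_RUN_DATE, LOAD_DATE)
```

After a successful session, the IS saves the new $$LAST_RUN_DATE value to the repository, so the next run extracts only rows loaded since then.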
Parameterization of Sessions
A connection object defines the path to the database or file system. Session attributes such as connections can be parameterized with session parameters (for example, parameters of the form $DBConnection<Name>) whose values are supplied in the parameter file, so the same session can point at different environments without being edited.
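As a sketch, relational-connection session parameters might be defined in the parameter file like this (the folder, workflow, session, parameter, and connection names are hypothetical):

```
[Nipuna_Folder.WF:wf_load_emp.ST:s_m_load_emp]
$DBConnection_Source=Oracle_Dev
$DBConnection_Target=Oracle_QA
```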
Bulk Loading: When we configure the session with target load type bulk, the IS improves session performance when inserting large amounts of data into the target database. We can enable bulk loading for the following database types:
a. Oracle
b. SQL Server
c. Sybase
d. DB2
When we enable bulk loading for other database types, the IS reverts to normal loading. Bulk loading cannot be performed into an indexed target table.
Advantage: faster loads, because the database log is bypassed.
Disadvantage: because the database log is bypassed, the database cannot perform rollback, so session recovery is not possible.
Session Partitions
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage. By default the IS creates one partition in every pipeline stage. Partition points mark the boundaries between threads in a pipeline; the IS redistributes rows of data at partition points. We can add partition points to increase the number of transformation threads and improve session performance.
Types of partitions: Key range, Pass-through, Round-robin, Hash, Database partitioning.