INFORMATICA POWERCENTER ADVANCED

AGENDA
Mapplets and types of mapplets
Reusable transformations
User-defined functions
Types of batch processing
Link conditions
Tasks and types of tasks (reusable, non-reusable)
Worklets and types of worklets
Scheduling workflows
Constraint-based load ordering
Target Load Plan (TLP)
Importing and exporting objects
PMCMD utility
PMREP utility
SCD Type 2 implementation using start date and end date
Lookup caches
Performance optimization (source, transformation, session, system)
ETL unit testing
ETL performance testing
Caches
Mapping debugger
Pushdown optimization
PowerCenter 8 enhancements
Session recovery
Mapping parameters
Mapping variables
Parameterization of sessions
Difference between normal and bulk loading
Session partitions and types of partitions

Mapplets and Types of Mapplets


A mapplet is a reusable object that encapsulates business logic built from a set of transformations. A mapplet is created using the Mapplet Designer tool. There are two types of mapplets:
a. Active mapplet
b. Passive mapplet
Limitations of mapplets: A Stored Procedure transformation can be used inside a mapplet only when its type is Normal.

Re-usable Transformations
A reusable transformation is a reusable object created with business logic using a single transformation. A reusable transformation is created in two different ways:
a. Using the Transformation Developer tool
b. Converting a non-reusable transformation into a reusable transformation
Limitation: A Source Qualifier transformation cannot be used as a reusable transformation.

User Defined Functions


A user-defined function (UDF) is a PowerCenter object created using the PowerCenter transformation language. The transformation language is a set of built-in functions (nearly 74 in number) used to define business logic; a UDF combines these built-in functions into a reusable expression, as sketched below.
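For illustration, a minimal UDF that trims leading and trailing spaces might look like the following; the function and port names (REMOVE_SPACES, input_text, ENAME) are hypothetical examples, not from the original notes:

    -- UDF definition (Designer > Tools > User-Defined Functions)
    -- Name: REMOVE_SPACES, Argument: input_text (string)
    LTRIM(RTRIM(input_text))

    -- Calling the UDF from an Expression transformation port:
    :UDF.REMOVE_SPACES(ENAME)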

Batch processing and types of batch processing


When multiple sessions are run in a single workflow, this is called batch processing. A workflow can execute one or more sessions. The execution of the sessions, or batch processing, can be of two types:
a. Parallel batch processing
b. Sequential batch processing

Link Conditions
A link condition controls the execution of sessions during a workflow run. A link condition is defined using a predefined task variable called Status, as shown in the example below.
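For example, a link condition that runs the next task only when the previous session succeeds might look like this (the session name s_m_load_stage is a hypothetical placeholder):

    $s_m_load_stage.Status = SUCCEEDED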

Tasks and Types of Tasks


A task is defined as a set of executable actions, commands and functions. The task types are Session, Command, Email, Worklet, Event-Wait, Event-Raise, Timer, Decision, Assignment and Control. There are two types of tasks:
a. Reusable tasks, created using the Task Developer tool (Session, Command and Email tasks can be made reusable)
b. Non-reusable tasks, created using the Workflow Designer tool

Worklets and types of worklets


A worklet is defined as a group of tasks. There are two types of worklets:
a. Reusable worklet
b. Non-reusable worklet
Business purpose: A worklet is used to simplify complex workflow designs and to meet the required operational order of the process. A workflow which contains a worklet is known as a parent workflow.

Scheduling of workflow
A schedule is an administrative task which specifies the date and time to run the workflow. A schedule is an automation of running the workflow. There are two types of schedules:
a. Reusable schedule
b. Non-reusable schedule

Constraint Based Load Ordering


Constraint-based load ordering defines the order in which data loads into multiple targets, based on their primary key and foreign key relationships: parent tables are loaded before the child tables that reference them.
Business purpose: Use constraint-based load ordering to load data into snowflake dimensions which are related through primary and foreign key relationships (recall the Joiner transformation).

Target Load Plan


A target load plan defines the order in which the Integration Service extracts data from the Source Qualifier transformations when a mapping contains multiple pipelines.

Importing and Exporting Objects


Repository objects such as mappings, sessions, workflows and worklets can be exported to .xml files (backup files) and imported back into a repository. Objects can be exported and imported from the Designer and Repository Manager clients, or from the command line as sketched below.
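A minimal pmrep sketch for exporting and importing a mapping; the folder, object and file names are hypothetical placeholders, and the exact flags should be verified against the pmrep reference for your PowerCenter version:

    pmrep> objectexport -n m_load_emp -o mapping -f Nipuna_Folder -u m_load_emp.xml
    pmrep> objectimport -i m_load_emp.xml -c import_control.xml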

PMCMD Utility
The pmcmd utility is a command line client program that communicates with the Integration Service. Use pmcmd to start workflows on the Integration Service. Issue the following commands to work with pmcmd (an example session follows the list):
a. pmcmd> connect
b. pmcmd> setfolder
c. pmcmd> startworkflow
d. pmcmd> unsetfolder
e. pmcmd> disconnect
f. pmcmd> exit
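An illustrative interactive session; the service, domain, folder and workflow names are placeholders:

    pmcmd> connect -sv IS_nipuna -d domain_nipuna -u administrator -p administrator
    pmcmd> setfolder Nipuna_Folder
    pmcmd> startworkflow wf_load_emp
    pmcmd> unsetfolder
    pmcmd> disconnect
    pmcmd> exit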

PMREP Utility
The pmrep utility is a command line client program which connects to the Repository Service to perform administrative tasks. It connects to the Repository Service with the following syntax (a worked sketch follows the list):
a. pmrep> connect -r repository_name -d domain_name -n user_name -x password
   Ex. pmrep> connect -r nipuna_rep -d domain_nipuna -n administrator -x administrator
b. createfolder
c. deletefolder
d. deleteobject
e. backup
f. restore
g. exit
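A short sketch of the administrative commands listed above; the folder and file names are placeholders, and the flags should be checked against the pmrep reference for your version:

    pmrep> connect -r nipuna_rep -d domain_nipuna -n administrator -x administrator
    pmrep> createfolder -n Staging_Folder
    pmrep> backup -o c:\backups\nipuna_rep.rep
    pmrep> deletefolder -n Staging_Folder
    pmrep> exit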

SCD type 2 implementation using start date and end date


Example: An employee starts his career on 26/04/2011 with the designation SE; the dimension row is inserted with start date 26/04/2011 and end date NULL. When the employee is later promoted to SSE, the NULL end date of the current SE row is updated to the promotion date, and a new SSE row is inserted with a fresh start date and a NULL end date. The row with the NULL end date is always the current version. See the SQL sketch below.
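A minimal SQL sketch of this close-and-insert pattern; the table and column names (emp_dim, emp_id, designation) are hypothetical:

    -- Close the current version of the changed employee
    UPDATE emp_dim
       SET end_date = SYSDATE
     WHERE emp_id = 101
       AND end_date IS NULL;

    -- Insert the new version with an open-ended end date
    INSERT INTO emp_dim (emp_id, designation, start_date, end_date)
    VALUES (101, 'SSE', SYSDATE, NULL);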

Lookup Caches
There are two types of cache memory: index cache and data cache. The index cache contains all port values from the lookup table where the port is part of the lookup condition. The data cache contains all port values from the lookup table that are not part of the lookup condition and that are specified as output ports. After the cache is loaded, values from the lookup input ports that are part of the lookup condition are compared to the index cache; upon a match, the corresponding rows from the cache are included in the stream. Following are the types of lookup caches:
a. Static lookup cache
b. Dynamic lookup cache
c. Persistent lookup cache

Performance Optimization
Source: Use the following techniques to improve the performance of data extraction (see the SQL sketch below):
a. Create source filters
b. Create indexes on the columns used in filters and joins
Transformation: Tune the Filter, Joiner, Aggregator, Expression, Router, Update Strategy, Lookup, Sequence Generator and Sorter transformations.
Session: Tune session parameters and create partitions.
System: Increase CPU performance and increase network speed.
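An illustrative sketch of the source-side techniques; the table, column and filter values are hypothetical:

    -- Source filter, entered as the Source Filter attribute of the Source
    -- Qualifier: the database applies it instead of a downstream Filter t/r
    DEPTNO = 10

    -- Index on the filtered column, created in the source database
    CREATE INDEX idx_emp_deptno ON emp (deptno);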

ETL Unit Testing


A unit test for a data warehouse is a white box test. We should check the ETL specification, mappings and ETL procedures. Following are the test cases (a sample query for the data loss test case is sketched after this list):
a. Test case 1: Data availability
b. Test case 2: Data load - insert
c. Test case 3: Data load - update
d. Test case 4: Incremental data load
e. Test case 5: Data accuracy
f. Test case 6: Data loss
g. Test case 7: Column mappings
h. Test case 8: Naming standards
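A minimal SQL sketch for the data loss check, comparing source and target row counts; the table names src_emp and tgt_emp are hypothetical:

    -- Row counts should match after a full load; a difference indicates data loss
    SELECT (SELECT COUNT(*) FROM src_emp) AS source_count,
           (SELECT COUNT(*) FROM tgt_emp) AS target_count
    FROM dual;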

ETL Performance Testing


Most performance issues are encountered when the Integration Service writes the data into the target. The first step in performance tuning is to identify the performance bottlenecks, in the following order:
a. Test case 1: Identify the target bottleneck
b. Test case 2: Identify the source bottleneck
c. Test case 3: Identify the mapping bottleneck
d. Test case 4: Identify the session bottleneck
e. Test case 5: Identify the system bottleneck

Caches
The following transformations need cache memory to process the data:
a. Joiner transformation
b. Lookup transformation
c. Aggregator transformation
d. Rank transformation
e. Sorter transformation

Mapping Debugger
The mapping debugger is used to debug mappings while doing data validations; it is run from the Mapping Designer against an existing session or a debug session instance.
Ex. Create a mapping to load the employees whose ename starts with 'S' and calculate tax as 20% of salary. The key expressions are sketched below.
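The two expressions this example mapping needs, written in the PowerCenter transformation language; the port names ENAME and SAL follow the classic EMP table:

    -- Filter transformation condition: keep names starting with 'S'
    SUBSTR(ENAME, 1, 1) = 'S'

    -- Expression transformation output port TAX: 20% of salary
    SAL * 0.20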

Pushdown Optimization


Pushdown optimization is a session property that analyzes the mapping and determines which transformation logic can be pushed to the source or target database. When we configure a session for pushdown optimization, the Integration Service analyzes the transformations, converts the transformation logic into SQL and sends the SQL to the source or target database. It improves the performance of the session. Configure the session to perform pushdown optimization in the following ways (an illustration of the generated SQL follows):
a. Source-side pushdown optimization
b. Target-side pushdown optimization
c. Full pushdown optimization
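For example, with full pushdown a mapping consisting of a filter and an expression might collapse into a single statement like the following; the table and column names are hypothetical, and the actual SQL is generated by the Integration Service:

    INSERT INTO tgt_emp (empno, ename, tax)
    SELECT empno, ename, sal * 0.20
    FROM   src_emp
    WHERE  SUBSTR(ename, 1, 1) = 'S';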

PowerCenter 8 Enhancements


1. Pushdown optimization
2. User-defined functions
3. Service Oriented Architecture (SOA)
4. SQL transformation
5. Java transformation

Session Recovery
When we stop a session, or an error causes the session to stop, identify the reasons for the failure and then start the session using one of the following options:
a. Restart the session: if the Integration Service has not issued at least one commit
b. Perform session recovery: if the Integration Service has issued at least one commit
When we start the session in recovery mode, the Integration Service reads the row ID of the last committed record from the table OPB_SRVR_RECOVERY (repository table 522) and starts processing the data records from the next row ID.

Mapping Parameters
A mapping parameter represents a constant value that is defined before the mapping runs. A mapping parameter is created with a name, type, data type, precision and scale. The values for mapping parameters are defined in a parameter file; save the parameter file with the extension .prm or .pst. A mapping parameter is referenced with the $$ prefix.
Syntax of a parameter file section header: [FolderName.WF:WorkflowName.ST:SessionName]
Mapping parameters are used to reduce development overhead: they avoid the creation of multiple mappings when only a constant value changes. A mapping parameter is specific to the mapping in which it is declared. Mapping parameters are created to standardize the business logic. The mapping parameters and variables can also be used in a Source Qualifier transformation, and mapping parameters can also be defined while creating a mapplet. A sample parameter file is sketched below.
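A minimal parameter file sketch; the folder, workflow, session and parameter names are hypothetical:

    [Nipuna_Folder.WF:wf_load_emp.ST:s_m_load_emp]
    $$DeptNo=10
    $$TaxRate=0.20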

Mapping Variables
A mapping variable represents a value that can change during the mapping run. After each successful completion of the session, the Integration Service stores the variable with its current value in the repository and uses that current value for the next run. A mapping variable is updated using the following variable functions: SETVARIABLE(), SETCOUNTVARIABLE(), SETMAXVARIABLE(), SETMINVARIABLE().
Typical uses: mapping variables for sequence numbers, and mapping variables for incremental extraction or reading (sketched below).
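A common incremental-extraction sketch; $$LastExtractDate is a hypothetical mapping variable, and the column name LAST_UPDATED_DATE is a placeholder:

    -- Source Qualifier source filter: read only rows changed since the last run
    LAST_UPDATED_DATE > TO_DATE('$$LastExtractDate', 'MM/DD/YYYY HH24:MI:SS')

    -- Expression transformation: advance the variable to the newest date seen
    SETMAXVARIABLE($$LastExtractDate, LAST_UPDATED_DATE)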

Parameterization of Sessions
A connection object defines the path to a database or file system. Session attributes such as relational connections and source or target file names can be parameterized with session parameters (prefixed with a single $), whose values are supplied in the parameter file as sketched below.
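A sketch of session parameters in the parameter file; the connection and file names are placeholders:

    [Nipuna_Folder.WF:wf_load_emp.ST:s_m_load_emp]
    $DBConnection_Src=Oracle_Source_Conn
    $DBConnection_Tgt=Oracle_Target_Conn
    $InputFile_Emp=c:\data\emp.csv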

Difference Between Normal and Bulk Loading


Normal load: When we configure the session with the target load type Normal, the Integration Service writes the transaction details to the database log. The target database server maintains the database log and enters the records into the target database via the database log. Since the database log is maintained by the database server, the Integration Service can perform a rollback on transaction errors; as a result, the session can perform recovery.
Advantage: the session is recoverable. Disadvantage: loading is slower than bulk loading.

Bulk load: When we configure the session with the target load type Bulk, the Integration Service improves session performance by inserting large amounts of data into the target database while bypassing the database log. We can enable bulk loading for the following database types:
a. Oracle
b. SQL Server
c. Sybase
d. DB2
When we enable bulk loading for other database types, the Integration Service reverts to normal loading. Bulk loading cannot be performed into an indexed target table.
Advantage: faster loading of large data volumes. Disadvantage: without the database log, the session cannot perform recovery.

Session Partitions
A partition is a pipeline stage that executes in a single reader, transformation or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage; by default, the Integration Service creates one partition in every pipeline stage. Partition points mark the boundaries between threads in a pipeline, and the Integration Service redistributes rows of data at partition points. We can add partition points to increase the number of transformation threads and increase session performance.
Types of partitions: key range, pass-through, round-robin, hash, database partitioning.
