
PowerCenter Data Validation Option Overview

January 2011

© 2007 Informatica. Company Confidential. Forward-looking information is based upon multiple assumptions and uncertainties and does not necessarily represent the company's outlook.

Three Kinds of Data Validation


Source to Target
At the end of ETL development

Production to Development
ETL version upgrade
ETL Migration
Application retirement
Reconciliation / Audit Balance Control
Runtime validation and reconciliation
Embedded in production ETL process

Current Data Validation Approach


Currently Data Validation is done manually by
writing SQL scripts
Customers estimate Data Validation should take
~ 30% of all hours spent on Data Integration
Most customers admit they do not do enough data validation,
resulting in poorer data quality and higher project risk

PowerCenter upgrades take up to 6 months


It takes one day to upgrade the ETL software

Problems with Manual Validation


Takes a long time and is expensive
Time is spent writing queries and waiting for them to run

Error-prone manual process


Cannot perform thorough testing
Time/cost pressure leads to a "try it here and there" approach
The tester runs out of time/money before testing is done

Usual problems associated with writing custom code


No audit trail
No reuse
No methodology

Data Validation Option Overview


Tool built on top of Informatica PowerCenter
Users define data rules using easy-to-understand GUI
Data is processed and evaluated using PowerCenter
Results are displayed in the GUI and stored for later
retrieval and reporting

Data Validation Option Architecture


DVO is used to define Test Rules
PowerCenter mappings are generated (PC API)
Session is executed
All results are stored in the Results DB
DVO is used to display results
Comprehensive Reporting on all Tests and Results

Customer Example #1
SQL Server to DB2 ETL project
10 tables
Manual testing: 4-5 days
DataValidator, incl. Setup: 3 hours

Customer Example #1
Source-to-target testing of two Oracle databases
50 table pairs, 968 tests
Manual testing: 3 weeks
DVO: 1 week
Regression testing is significantly faster with
DVO: tested fact tables and 20 dimensions in one
day

Customer Example #2
PowerCenter Upgrade Testing
We used DataValidator to compare 14 tables and about 30M rows (setup
time and training someone included) in less than 5 hours. The largest of
the tables was 94 columns. When I asked our QA people how long it
would take them to run scripts and test this amount of data, they
mentioned months between the two of them.
Tom Kato

http://www.linkedin.com/groupAnswers?discussionID=6289235&viewQuestionAndAnswers=&goback=%2Eanh_39237&gid=39237

Data Validation Option Benefits

Increased likelihood of project success, lower project risk

Significant cost savings, faster time to market


50% source-to-target testing
80% regression testing
90% upgrade testing

Ability to test all data, not just a small sample

Ability to test in heterogeneous environments

No need to know SQL

Complete audit trail and comprehensive reporting of all testing activities

No need to acquire additional server technology: leverage PowerCenter's scalability, platform support and data access

SQL Views: Problem


srcOrderHeader
OrderID
CurrencyName
ExchangeRate
srcOrders
LineID
OrderID
ProductID
ProductName
Quantity
UnitPrice (local currency)

FactOrders
LineID
CurrencyName
ExchangeRate
DollarAmount

DollarAmount = UnitPrice * Quantity * ExchangeRate


SQL Views: Solution

Create View
Name: MyView
SQL: Select LineID, CurrencyName, Quantity*UnitPrice*ExchangeRate
From srcOrderHeader Join srcOrders On srcOrderHeader.OrderID=srcOrders.OrderID

3 Outputs
MyLineID: Integer
MyCurrency: String
MyAmount: Decimal

Create Table Pair
TableA: MyView
TableB: FactOrders
Join: MyLineID to LineID

Create Outer Value Test
MyAmount vs DollarAmount
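
The view-based comparison can be sketched outside the tool to show what is being checked. The following is a minimal illustration using Python's built-in sqlite3 module, not how DVO itself runs the test. The sample rows, the 0.005 tolerance, and the reading of "Outer Value Test" as a left outer join that also reports unmatched view rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE srcOrderHeader (OrderID INTEGER, CurrencyName TEXT, ExchangeRate REAL);
CREATE TABLE srcOrders (LineID INTEGER, OrderID INTEGER, ProductID INTEGER,
                        ProductName TEXT, Quantity INTEGER, UnitPrice REAL);
CREATE TABLE FactOrders (LineID INTEGER, CurrencyName TEXT, ExchangeRate REAL,
                         DollarAmount REAL);

-- Made-up sample rows (assumption, not from the slides)
INSERT INTO srcOrderHeader VALUES (1, 'USD', 1.0), (2, 'EUR', 1.5);
INSERT INTO srcOrders VALUES (10, 1, 100, 'Widget', 2, 5.0),
                             (11, 2, 101, 'Gadget', 4, 10.0);
-- Line 11 is deliberately wrong: 4 * 10.0 * 1.5 = 60.0, not 40.0
INSERT INTO FactOrders VALUES (10, 'USD', 1.0, 10.0), (11, 'EUR', 1.5, 40.0);

-- The view from the slide, with the three outputs named explicitly
CREATE VIEW MyView AS
SELECT srcOrders.LineID AS MyLineID,
       srcOrderHeader.CurrencyName AS MyCurrency,
       srcOrders.Quantity * srcOrders.UnitPrice * srcOrderHeader.ExchangeRate AS MyAmount
FROM srcOrderHeader
JOIN srcOrders ON srcOrderHeader.OrderID = srcOrders.OrderID;
""")

# Table pair MyView / FactOrders joined on MyLineID = LineID.
# A row is reported when it has no match or when the amounts differ.
mismatches = conn.execute("""
SELECT v.MyLineID, v.MyAmount, f.DollarAmount
FROM MyView v
LEFT JOIN FactOrders f ON v.MyLineID = f.LineID
WHERE f.LineID IS NULL OR ABS(v.MyAmount - f.DollarAmount) > 0.005
ORDER BY v.MyLineID
""").fetchall()

for line_id, expected, actual in mismatches:
    print(f"LineID {line_id}: expected {expected}, found {actual}")
```

With this sample data only LineID 11 is reported, because the fact table holds 40.0 where the source calculation gives 60.0.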

Lookups: Problem
dimProducts
ProductID
ProductName

SourceOrders
LineID
ProductName
Amount

Validation

factOrders
LineID
ProductID
DollarAmount

Need to compare ProductName to ProductID



Lookups: Solution

Lookup Relationship

Lookup View
factOrders
LineID
ProductID
DollarAmount

SourceOrders
LineID
ProductName
Amount
dimProducts
ProductID
ProductName

Validation


Advanced Features
Ability to define and reuse complex SQL expressions
Ability to validate lookups
Rapid test generation through scripting or Excel
spreadsheets
Ability to run a DVO test from a scheduler, workflow
or any other process
Performing aggregations in the database instead of
PowerCenter

Why Not Check / Reverse Engineer the ETL Mappings?
Majority of ETL errors are due to:
Misunderstanding among business users, analysts and ETL
developers
Source data being different from what is actually expected

Example:
VP of Sales: "I want a field that shows increase over last year"
Business Analyst: (this_year - last_year) / last_year
ETL developer creates: (this_year - last_year) / this_year
Checking the mapping input and output: mapping is correct
Data is not correct
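
A small worked example makes the gap concrete. The two functions below restate the analyst's formula and the formula the developer built. The input figures (120 this year, 100 last year) are made up for illustration.

```python
def increase_over_last_year(this_year: float, last_year: float) -> float:
    """Analyst's specification: growth relative to last year."""
    return (this_year - last_year) / last_year


def increase_as_built(this_year: float, last_year: float) -> float:
    """What the ETL developer built: divides by this_year instead."""
    return (this_year - last_year) / this_year


# Hypothetical sales figures
spec = increase_over_last_year(120.0, 100.0)   # 20 / 100
built = increase_as_built(120.0, 100.0)        # 20 / 120

print(f"specified: {spec:.4f}, built: {built:.4f}")
```

The mapping faithfully computes the second formula, so inspecting the mapping alone finds nothing wrong. Only comparing the loaded value against the specified calculation exposes the difference (20% versus roughly 16.7% here).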

Automating Integration Tasks


A typical data integration project has several distinct phases:
Requirements gathering and specification
Analysis of source data
Logical and physical modeling
Building the data movement and transformation layer (mappings or code)
Testing and validation
Maintenance and upgrades

Over the years, various tools have been introduced to improve / automate these tasks:
Analyst tools
Profiling ~2001
ER tools ~1990s
ETL tools ~1990s
DataValidator automates and improves these two phases

Lookups: Solution

Create Lookup
Name: MyLookup
Source Table: SourceOrders
Lookup Table: dimProducts
Relationship: Source Table ProductName to Lookup Table ProductName
Lookup Outputs: fields in both tables

Create Table Pair
TableA: MyLookup
TableB: FactOrders
Join: LineID to LineID

Create Outer Value Test
ProductID vs ProductID
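
The lookup check can be illustrated the same way. This is a minimal sketch using Python's built-in sqlite3 module, with made-up rows; it models the lookup as a left join on ProductName, which is an assumption about the semantics rather than a description of how the tool implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SourceOrders (LineID INTEGER, ProductName TEXT, Amount REAL);
CREATE TABLE dimProducts (ProductID INTEGER, ProductName TEXT);
CREATE TABLE factOrders (LineID INTEGER, ProductID INTEGER, DollarAmount REAL);

-- Made-up sample rows (assumption, not from the slides)
INSERT INTO dimProducts VALUES (1, 'iPod'), (2, 'CD');
INSERT INTO SourceOrders VALUES (10, 'iPod', 100.0), (11, 'CD', 10.0);
-- Line 11 is deliberately wrong: 'CD' should resolve to ProductID 2
INSERT INTO factOrders VALUES (10, 1, 100.0), (11, 1, 10.0);

-- MyLookup: source rows enriched with the ProductID found by name
CREATE VIEW MyLookup AS
SELECT s.LineID, s.ProductName, s.Amount, d.ProductID
FROM SourceOrders s
LEFT JOIN dimProducts d ON s.ProductName = d.ProductName;
""")

# Table pair MyLookup / factOrders joined on LineID, comparing ProductID.
bad_lookups = conn.execute("""
SELECT l.LineID, l.ProductName, l.ProductID, f.ProductID
FROM MyLookup l
LEFT JOIN factOrders f ON l.LineID = f.LineID
WHERE f.LineID IS NULL
   OR l.ProductID IS NULL
   OR l.ProductID <> f.ProductID
ORDER BY l.LineID
""").fetchall()

for line_id, name, expected_id, actual_id in bad_lookups:
    print(f"LineID {line_id} ({name}): expected ProductID {expected_id}, found {actual_id}")
```

Here LineID 11 is flagged because the fact row carries ProductID 1 although the name 'CD' resolves to ProductID 2.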


CLI / Workflow Integration Example


cliHeader
HeaderID
CustomerName
cliDetail
DetailID
HeaderID
ProductName
ProductAmount

Join+Agg

cliStage
CustomerName
CustomerAmount

Validate with DVO


Sum(cliDetail.ProductAmount) =
Sum(cliStage.CustomerAmount)

Pass

cliTarget
CustomerName
CustomerAmount

Fail

cliError
CustomerName
CustomerAmount

Workflow Scenario 1
cliHeader
HeaderID  CustomerName
1         Smith
2         Jones
3         Green

cliDetail
DetailID  HeaderID  ProductName  ProductAmount
1         1         iPod         $100
2         1         CD           $10
3         2         iPod         $100
4         2         Battery      $50
5         3         iPod         $100
6         3         CD           $10
7         3         Battery      $50

Join+Agg

cliStage
CustomerName  CustomerAmount
Smith         $110
Jones         $150
Green         $160

Validation: $420 = $420 Pass



Workflow Scenario 2
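cliHeader
HeaderID  CustomerName
1         Smith
2         Jones
3         Green

cliDetail
DetailID  HeaderID  ProductName  ProductAmount
1         1         iPod         $100
2         1         CD           $10
3         2         iPod         $100
4         2         Battery      $50
5         3         iPod         $100
6         3         CD           $10
7         3         Battery      $50

Join+Agg

cliStage
CustomerName  CustomerAmount
Jones         $150
Green         $160

Validation: $420 <> $310 Fail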
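
The two scenarios reduce to one comparison of totals followed by a routing decision. The sketch below reproduces that arithmetic in plain Python with the rows from the slides. The function name validate_and_route is hypothetical, and the sketch only models the check; it is not the DVO test itself.

```python
# cliDetail rows: (DetailID, HeaderID, ProductName, ProductAmount)
CLI_DETAIL = [
    (1, 1, "iPod", 100),
    (2, 1, "CD", 10),
    (3, 2, "iPod", 100),
    (4, 2, "Battery", 50),
    (5, 3, "iPod", 100),
    (6, 3, "CD", 10),
    (7, 3, "Battery", 50),
]


def validate_and_route(detail_rows, stage_rows):
    """Compare Sum(ProductAmount) with Sum(CustomerAmount) and pick a destination."""
    detail_total = sum(row[3] for row in detail_rows)
    stage_total = sum(amount for _name, amount in stage_rows)
    destination = "cliTarget" if detail_total == stage_total else "cliError"
    return detail_total, stage_total, destination


# Scenario 1: all three customers reach cliStage
scenario_1 = validate_and_route(
    CLI_DETAIL, [("Smith", 110), ("Jones", 150), ("Green", 160)]
)

# Scenario 2: the Smith row is missing from cliStage
scenario_2 = validate_and_route(CLI_DETAIL, [("Jones", 150), ("Green", 160)])

print(scenario_1)
print(scenario_2)
```

Scenario 1 balances at 420 on both sides and routes to cliTarget; Scenario 2 compares 420 with 310 and routes to cliError.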



Command Line Interface


Ability to integrate DataValidator Tests into
PowerCenter or any other Workflow
Ability to schedule DataValidator Tests
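
One way a scheduler or workflow step might call a command-line test is sketched below. The slides do not give the actual DataValidator command name, its arguments, or its exit-code convention, so the wrapper takes any command and assumes that exit code 0 means Pass. The stand-in commands simply start a Python process that exits with a chosen code.

```python
import subprocess
import sys


def run_validation_step(command):
    """Run an external validation command and map its exit code to Pass/Fail.

    Assumption: exit code 0 means every test passed. The real CLI's
    convention is not stated in the slides.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return "Pass" if completed.returncode == 0 else "Fail"


# Stand-in commands (hypothetical): replace with the real CLI invocation.
passing_command = [sys.executable, "-c", "raise SystemExit(0)"]
failing_command = [sys.executable, "-c", "raise SystemExit(1)"]

pass_result = run_validation_step(passing_command)
fail_result = run_validation_step(failing_command)

print(pass_result, fail_result)
```

A workflow could branch on the returned value, for example loading the target on Pass and writing to an error table on Fail.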


Questions

&

Answers
