You are on page 1of 18

1

Data Validation Overview

What is Data Validation? Identifying errors in data sets that have been Moved or Transformed to ensure they are Complete and Accurate and meet Expectations or Requirements.

How most companies test today


Many companies perform testing manually by writing SQL scripts, using Excel, or hand coding testing logic into their integration processes Reconciliation is done manually (if done at all)
basic SQL scripts (row counts, aggregates) or manually written mappings/logic

Customers estimate data testing SHOULD take 25-30% of all hours spent on Data Integration
Most customers admit they do not do enough data validation, resulting in poorer data quality and higher project risk

PowerCenter upgrades can take up to weeks or months to complete due to manual testing effort
It takes one day to upgrade the ETL software

Problems with Manual Testing


Takes a long time and is expensive
Time is spent writing queries and waiting for them to run and then searching through the results

Error-prone manual process


Stare and compare

Cannot perform thorough testing


Time/Cost pressure leads to try it here and there approach Testing ends when the deadline is reached, done or not

Usual problems associated with writing custom code


No audit trail No reuse No methodology
5

Current Approach: Like a Photo Hunt

Current Approach: Stare and Compare


Data Set #1 Data Set #2

What is the Data Validation Option?

DVO is a independent (black box) testing solution that provides


automation, repeatability and auditability to virtually any data

testing or reconciliation process.

Some DVO use cases


Data Being Transformed
ETL Reconciliation Data Masking ETL Testing Application Migration

Data is Identical
ETL version upgrade ETL Migration Database migration Application Retirement
9

Two Value Propositions for DVO


Ensure the integrity of data as it moves through the IT environment.
Development & Test
Provide automation for unit and regression testing of integration logic. Ensure that data produced by DI code meets requirements and expectations

Production Reconciliation
Protect the integrity of data that is loaded into production systems. Erroneous data due to failed loads, faulty logic or operational issues is caught in a proactive automated manner and can be addressed as needed

10

Two Value Propositions for DVO


Ensure the integrity of data as it moves through the IT environment.

11

How DVO works with PowerCenter


Data Validation Option
DVO Clients
Define Tests

Repository & Warehouse

Database Views
V_Summary Id: name name: string V_Tests Price: integer DateId: name in: date V_Results Datename: string out: date Salary: float Price: integer Date Id: name in: date Date name: string out: date Price: integer Salary: float Date in: date Date out: date Salary: float

Reports

Execute Tests

Results Data Accessed Repository and Integration Services

Repository PowerCenter

Enterprise Data

12

Key Features of DVO


Broad data connectivity
DBMS (Oracle, SQL Server, DB2, Sybase, Teradata, Netezza) Mainframe (DB2 z/OS, DB2 AS/400, IMS, Adabas, MF Flat files, VSAM) SalesForce.com , SAP transparent tables, SAS, ODBC and Flat files

Numerous built-in tests


COUNT, COUNT_DISTINCT, COUNT_ROWS, MIN, MAX, AVG, SUM SET AinB, SET BinA, SET AeqB VALUE, OUTER VALUE, Expressions

Model ETL constructs


LOOKUPs, Arbitrary SQL Relationships

Other
Run from GUI or CLI (DVOCmd) Built-in reporting

13

Comparing DVO with Manual Testing

14

Development and Test


KEY BUSINESS IMPERATIVE AND IT INITIATIVE

Technology Company Reduced data testing time by 80% with Data Validation Option

SAAS provider of Sales Compensation and analytics Data absolutely has to be correct as it affects peoples paychecks Very high visibility of the data with users Trust in the data is key
THE CHALLENGE New release every ~1 month 1 Full week of data testing by QA team per release Developers wrote SQL for testing the data Testers would execute the SQL, track errors and work with Developers to resolve And who was testing the SQL to make sure it was correct? INFORMATICA ADVANTAGE With DVO they are able to test 100,000s rows of data in regression tests Developers no longer required to write SQL Testers are now empowered and independent of developers RESULTS/BENEFITS Have created a test suite of over 1000 Tests Testers can manage the testing environment Can test large volumes of data Testing time reduced from 1 week to 1 day (80% less) Spend free time on higher level tasks

Informatica Confidential Under NDA

15

Production Reconciliation

Financial Services Company Ensures DW is Complete and Accurate with Data Validation Option
Good data is essential to good business decisions. Their calculations of portfolio risk and value must be correct. Spends hundreds of millions purchasing troubled debt in the USA The data and risk calculations on those assets must be correct. Bad data could cost them millions and put them out of business.

KEY BUSINESS IMPERATIVE AND IT INITIATIVE

THE CHALLENGE
Business users were complaining about missing data in the systems. Data errors can lead to very costly bad business decisions. They were doing manual testing via developer-written mappings and PL/SQL Other products available today could not meet their requirements

INFORMATICA ADVANTAGE
With DVO they are able to perform detailed reconciliations across source and target systems. With DVO, they have a complete audit trail.

RESULTS/BENEFITS
DVO found where data was missing Found thousands of missing records due to bad coding, & improperly rerun failed jobs Reloaded all missing data in two weeks They are looking to implement ongoing incremental validation for all new data loaded into tables

Informatica Confidential Under NDA

16

Production Reconciliation
KEY BUSINESS IMPERATIVE AND IT INITIATIVE

Mid-size Technology Company Reconciling MDM data using Data Validation Option

Customer and contact hub is pivotal to efficient business operations Millions of records processed across various systems Ensure BAs, line managers and customers had access to accurate and complete data based on their needs
THE CHALLENGE INFORMATICA ADVANTAGE RESULTS/BENEFITS

No easy way to reconcile data in systems to identify bad data or identify extent of errors

Incorrectly augmented data in systems Gold record data didnt always match across systems Faulty records propagated downstream.

DVO reconciled data across systems (e.g. SalesForce and Hub) and found:
1000s of missing records between systems Incorrectly augmented D&B data Improperly coded golden records

Identified errors due to faulty DI logic, and error handling process Ensured incorrect records no longer being use in marketing campaigns Bad customer data no longer reaching customer in portal

Informatica Confidential Under NDA

17

18

You might also like