
DATA QUALITY 9.1 Training

IDQ 9.1 Labs


Lab 1 - Content Management Service
Lab 2 - New Reference Table Capabilities
Lab 3 - Content Sets
Lab 4 - Tags
Lab 5 - Match Enhancements
Lab 6 - New Exception Transform
Lab 7 - Data Quality for MS Excel
Lab 8 - Profiling Labs

Informatica Data Quality 9.1


Lab 1 - Content Management Service


Create and Configure CMS
Objective: Configure AddressDoctor options and check AV reference file status from Developer

Steps

Open Administrator Console


Select Action/New/Content Management Service


Follow Wizard to create and start the CMS

Open the CMS and select the Processes tab


Edit AV Options
o AV Licence: S0PCF4MN94L7ZXEZ635NCSM90NZKR0NJUTWA
o Set No Pre-Load to ALL for all types
o Set AV Reference data path to:
C:\Informatica\9.1.0\services\DQContent\INFA_Content\av\default

Recycle CMS Service


Recycle DIS Service
Open Developer
o Select Window / Preferences
o Select Content Status
o Check Status is displayed correctly (expected view below)


Lab 2 - New Reference Table Capabilities


Create Managed Reference Table from database
Objective: Create new reference table using a database source

Steps

Open Analyst Tool


Select Create New Reference Table
Select Connect to a Relational Table
Select DQ_Tables Connection
Select fname table
Select Column1 as valid value
Save as fname in Your Project

Create Unmanaged Reference Table from database


Objective: Create new unmanaged reference table

Steps

Open Analyst Tool


Select Create New Reference Table
Select Connect to a Relational Table
Make sure Unmanaged Table is ticked
Select DQ_Tables Connection
Select us_states table
Select Column1 as valid value
Save as us_states in Your Project


End of Exercise


Lab 3 - Content Sets


Create and configure new Content Set
Objective: Create new content set and content set expressions

Steps

Open Developer
Select File / New / Content Set
Create new Content Set called ContentSet_91 in Your project
Open your content set
Add a new Character Set:
o Name: Char
o Label: C
o Range: a-z and A-Z
Add a new Regular Expression:
o Name: num
o Number of Outputs: 1
o RegEx: ^[0-9]+$
Add a new Token Set (RegEx):
o Name: date
o Label: date
o RegEx: ^\d{1,2}\/\d{1,2}\/\d{4}$
o Description: matches dates of the form XX/XX/YYYY, where XX can be 1 or 2 digits long and YYYY is always 4 digits long.
Save Content Set
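The two expressions defined in this content set can be tried outside the tool before they are used in a Parser or Labeler. The following is a minimal Python sketch, not a description of how Developer evaluates them: it assumes Python's `re` module treats these simple patterns the same way, and the helper names and sample values are made up for illustration.

```python
import re

# Patterns copied from the content set definitions (num and date).
NUM = re.compile(r"^[0-9]+$")
DATE = re.compile(r"^\d{1,2}\/\d{1,2}\/\d{4}$")


def is_num(token: str) -> bool:
    """True when the whole token consists of digits only."""
    return NUM.match(token) is not None


def is_date(token: str) -> bool:
    """True for XX/XX/YYYY with a 1-2 digit XX and a 4-digit YYYY."""
    return DATE.match(token) is not None


def house_numbers(address: str) -> list[str]:
    """Split on whitespace and keep the tokens that match num.

    Hypothetical helper: it only mimics the idea of parsing a house
    number out of address1 with the num expression.
    """
    return [tok for tok in address.split() if is_num(tok)]


if __name__ == "__main__":
    for value in ["1/2/2011", "12/31/2011", "2011-12-31", "1/2/11"]:
        print(value, is_date(value))
    print(house_numbers("221 Baker Street Apt 4B"))
```

Note that the date expression checks the shape of the value only: a string such as 99/99/2011 also matches, so it labels date-like tokens rather than validating calendar dates.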

Use a Content Set


Parse, Cleanse and Standardize Data
Objective: Prepare data source for upload to Warehouse and matching scenarios

Steps

Create New Mapping: m_process_customer_data


Add customer Flat file source from c:\DQ_DATA directory


Add Parser - Token Parser
o Add Input Port contact_name from source
o Create new strategy
Name: parse_names
Operation 1:
Operation: Parse Using Reference Table
Name: parse_fname
Reference Table: fnames (Enablement_91 project)
Output: fname, string, 25
Operation 2:
Operation: Parse Using Reference Table
Name: parse_sname
Reference Table: usa_surnames_infa (Informatica_DQ_Content/Dictionaries/North America/USA)
Output: sname, string, 25
o Add Input port address1 from source
o Create new strategy
Name: parse_housenum
Operation 1:
Operation: Parse Using Token Set
Select Regular Expression
o Choose RegEx num from ContentSet_91
Create new output house_num
o Run data viewer and examine your results
Add Labeler
o Add Input Port address3
o Create New Strategy
Name: label_state
Mode: Token
Operation: Label with Reference Table
Reference Table: us_states (Your project)
Label: state
Add Labeler (or use the existing one)
o Add Input Port cust_start_date from source
o Create new strategy
Name: lbl_date
Mode: Token
Operation 1:
Operation: Label Tokens with Token Set
Name: lbl_date
Select Token Set date from ContentSet_91
o Add Input Port currency from source
o Create new strategy
Name: lbl_currency
Mode: Token
Operation 1:
Operation: Label Tokens with Reference Table
Reference Table: Informatica_DQ_Content/dictionaries/general/currency_codes_infa
Name: lbl_currency


o Run the data viewer and examine your results


Should look something like this:


Lab 4 - Tags
Create and associate new tags - Developer
Objective: Create new tags and associate to objects in Developer

Create Tag Steps

Open Developer
Open Window / Preferences
Select Tags
View Out of the Box Tags (These will appear when you install 9.1 accelerators,
which this image does not have)

Create new tags:
o Customer
o Product
o Content

Associate Tag Steps

Open Developer
Apply Tags to Data Sources
o Open Source
o Navigate to Tags View
o Select Edit
o Apply Tag
Apply Tags to content set expressions
o Open Content Set
o Navigate to Tags View
o Select Edit
o Apply Content Tag to all elements

Create and associate new tags - Analyst


Objective: Create new tags and associate to objects in Analyst

Create Tag Steps

Open Analyst
Select Actions / Show Tags
Create new tag:
o Order
o Address

Associate Tag Steps


Open Profile_order
(You probably have not already profiled the order table. Create a data object using the
flat file order in the c:\DQ_DATA directory and profile it, columns only.)
Apply Tags to data columns
o Show Tags view
o Select Address related columns
Apply Address Tag
o Select Order related columns
Apply Order Tag
o Select Customer name related columns
Apply Customer Tag
Go to project view
o Select RTM fnames
Apply Customer and Content Tags


Lab 5 - Match Enhancements


Pre-req for image
Copy the 3 .ysp files from C:\Informatica\9.1.0\services\DQContent\INFA_Content\identity\
to a newly created folder called default at
C:\Informatica\9.1.0\services\DQContent\INFA_Content\identity\default

Key Gen and Match Analysis


Objective: Identify potential duplicates

Steps

Create New Mapping: m_match_customer


Add c:\DQ_Data\aml_demo_data source
Add Key Generator
o Use String strategy on iso_ctry_code
Right-click the Key Generator and select Analyse Detail from the menu
o Review the following information:
Estimated processing time
Groups above the recommended threshold
o Edit the desired throughput value and observe how estimated processing
time changes
o Edit min and max group size values
o Select groups above the threshold from the dropdown list and drilldown to the
record level
Re-configure Key Generator to zip_or_postcode port
Run GroupKey Analysis again and observe the results
Add Match transform and configure as follows
o Field Matching (Single Source)
o Edit Distance on contact_name
o Threshold 0.6
Select Runtime Analysis by MatchType from the right click menu
o Review results
Select Output Analysis of Clusters from the right click menu
o Review results (There is a bug in this Beta release so you may get funny
results)
Repeat the above steps for a Match transform configured for Identity Matching
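The reason the group key matters can be illustrated outside the tool: records are only compared within their group, so the key decides how many pairwise comparisons are made, and the threshold decides which pairs count as potential duplicates. The following is a minimal Python sketch under stated assumptions. It uses the whole field value as the group key and scores pairs as 1 - distance / longer length; how the product's String strategy and Edit Distance strategy actually compute keys and scores is not stated in this lab. The sample records are invented.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def group_by_key(records, key_field):
    """Group records on the full value of key_field (an assumption)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_field]].append(rec)
    return dict(groups)


def comparisons(groups):
    """Pairwise comparisons needed: n * (n - 1) / 2 per group."""
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())


def match_pairs(groups, field, threshold=0.6):
    """Return pairs within each group whose score reaches the threshold."""
    pairs = []
    for members in groups.values():
        for left, right in combinations(members, 2):
            score = similarity(left[field], right[field])
            if score >= threshold:
                pairs.append((left[field], right[field], round(score, 2)))
    return pairs


# Invented sample data using the lab's port names.
RECORDS = [
    {"contact_name": "John Smith", "zip_or_postcode": "10001"},
    {"contact_name": "Jon Smith", "zip_or_postcode": "10001"},
    {"contact_name": "Mary Jones", "zip_or_postcode": "10001"},
    {"contact_name": "John Smith", "zip_or_postcode": "94105"},
]

if __name__ == "__main__":
    groups = group_by_key(RECORDS, "zip_or_postcode")
    print(comparisons(groups))
    print(match_pairs(groups, "contact_name"))
```

In this sample the two John Smith records in different postcodes are never compared, which is the trade-off the group analysis is meant to expose: smaller groups cut processing time but can hide duplicates that fall into different groups.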


Lab 6 - New Exception Transform


Objective: Identify and manually correct exception records
Think of this as a point further down a mapping where you have already run data cleansing and
address validation. You have also run a data quality check on the phone field. Now that
you've done that, you need to decide which records pass and which need manual
intervention.

Steps

Create New Mapping: m_exception_records


Add c:\DQ_data\cleansed_customer source
Add Decision
o Assign a score of 60 to records with AddressStatus 'Incomplete Address' or
'Invalid Address Line', or PhoneStatus 'Incomplete Phone'
o Assign a score of 90 to all remaining records
if AddressStatus= 'Incomplete Address'
or AddressStatus='Invalid Address Line'
or PhoneStatus='Incomplete Phone'
then
score:=60
else
score:=90
endif
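The scoring rule above can be checked against a few sample status combinations before it is typed into the Decision transform. This is a minimal Python sketch that mirrors the three conditions and the two scores from the lab; the function name and the "Valid Address" and "Valid Phone" values are hypothetical and used only to represent records that pass.

```python
LOW_SCORE = 60   # records that need a closer look
HIGH_SCORE = 90  # all remaining records


def score_record(address_status: str, phone_status: str) -> int:
    """Return 60 if any of the three listed statuses is present, else 90."""
    if (address_status == "Incomplete Address"
            or address_status == "Invalid Address Line"
            or phone_status == "Incomplete Phone"):
        return LOW_SCORE
    return HIGH_SCORE


if __name__ == "__main__":
    samples = [
        ("Incomplete Address", "Valid Phone"),
        ("Valid Address", "Incomplete Phone"),
        ("Valid Address", "Valid Phone"),
    ]
    for address_status, phone_status in samples:
        print(address_status, phone_status,
              score_record(address_status, phone_status))
```

Because the conditions are joined with or, a single bad status is enough to give the record the lower score.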

Add Exception transform and configure as follows


o Bad Records Exception
o Table BAD RECORDS in Staging DB
o Connect data ports
o Add AddressStatus and PhoneStatus ports to Labels input
o Connect the score port to Inputs >> Control >> Score.
o Records with a score between 40 and 90 to be reviewed manually
o Send good records to standard output and bad records to Bad Records table
o Map AddressStatus and PhoneStatus ports to respective issues in the priority
tab
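The effect of the 40 to 90 review band on the two scores produced by the Decision step can be sketched as follows. This is an illustration only: it assumes the band is lower <= score < upper, which is what the lab's intent suggests (60 is reviewed, 90 passes), and it assumes anything below the lower bound is simply rejected. The product's exact boundary handling is not stated here, and the route names are hypothetical.

```python
def route(score: int, lower: int = 40, upper: int = 90) -> str:
    """Classify a record by score.

    Assumed boundaries: [lower, upper) goes to manual review,
    upper and above is good, below lower is rejected.
    """
    if score < lower:
        return "rejected"
    if score < upper:
        return "bad_records"
    return "good"


def split_records(records):
    """Partition records (dicts with a 'score' key) by route."""
    routed = {"good": [], "bad_records": [], "rejected": []}
    for rec in records:
        routed[route(rec["score"])].append(rec)
    return routed


if __name__ == "__main__":
    sample = [{"id": 1, "score": 60}, {"id": 2, "score": 90}]
    for name, recs in split_records(sample).items():
        print(name, [r["id"] for r in recs])
```

With only the scores 60 and 90 in play, every record lands either in the Bad Records table or on the standard output, and nothing falls below the lower bound.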

Run the mapping (Data Viewer)


o Run the data viewer on the Exception transform


In Analyst, add the newly created table to the project view
Review available filters
Select Edit mode
Correct a number of records and select Save All Corrected Records
Open the audit trail and review the changes made
Hover over the new value to see the old value


Lab 7 - Data Quality for MS Excel


Install DQ for Excel
Objective: Perform base install of DQ for Excel

Steps

Extract lwp3.zip (from lab files provided) to Desktop


Close Excel
Run setup.exe
o Note: this connects to the internet to download the base Excel DLLs required to run
the Add-In
Once complete, open Excel
Check Informatica Ribbon is available

Use DQ for Excel


Objective: Add new service to DQ for Excel and use with sample data

Steps

Export WSDL from service created in DQ/WS lab


Open Excel
Select Informatica Ribbon and Add Service
Point to the WSDL file exported in the first step
Open customer.xlsx (see Desktop on 9.1 TTT VM, folder DQ_data_enablement91)
Use imported service to parse Name column into First and Last names


Lab 8 - Profiling Labs


Profiling
1. Add 3 Data Objects (Location: c:\DQ_DATA) Tool Matters (make sure you click
the because there are embedded commas in the data)
a. Customer
b. Orders
c. Product
2. Profile Customer - Columns only
3. Delete profile
4. Profile All the tables at once
a. 2 ways
i. Select all the data objects and profile
ii. Profile one, then add the other two
1. Add a Prefix of DW_ to the objects
5. Profile Customer - Column, Primary Key and Dependency
a. Take all the defaults (but keep clicking Next, not Finish, to see them).
Remember to select the columns for PK and Functional Dependency
b. View PK Results
c. Select Cust_number verify
i. What happened to the display
d. View Violations
e. Select cust_Number 15952672 and Drill down
i. What is the difference between the records?
f. View Functional Dependencies
i. Why are there blank determinants?
g. Select a column and verify
i. What happened in the Display?
6. Profile Orders - all three (Column, Primary Key and Dependency) and take defaults
a. View column profile
i. What is the Key to this table?
b. View Primary Key inference
i. What is the key?
c. View the Functional Dependencies
i. Can you identify any potential Sub-tables
7. Delete the orders profile
8. Re-profile orders - this time override and change the default options
Hint: Be careful in changes to these options otherwise you will be here till Saturday
a. Primary Key Minimum Percent = 100
b. Dependency - Minimum Percent = 100
i. View results
ii. Sort on the determinant column
iii. Can you identify a key?
iv. Can you identify a potential sub-table?
9. Go back and modify the profile definition (PD) to change the minimum to 75
i. What does this show you?
10. Delete the profile
11. Profile orders - dependency only


a. Exclude all columns from determinants except sales_id
b. Verify all fields that show 100%
12. Go back and modify the profile definition to profile all three tasks
a. Did it work?
13. Profile orders - all three tasks
a. Verify the primary key
b. Look at dependencies
i. Why are form, ingredient_list, on_hand and segment not determined by
the primary key?
ii. How can I fix segment and form to show they are determined by the
primary key?
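The Minimum Percent options used in the profiling steps above are easier to reason about with a small worked example. The following Python sketch shows one plausible way to express "how close is this column to being a key" and "how well does column A determine column B" as percentages. The formulas are assumptions made for illustration and are not taken from the product; of the column names, only sales_id appears in the lab, while order_id, region and the sample rows are invented.

```python
from collections import Counter, defaultdict


def key_percent(rows, columns):
    """Percentage of rows whose value combination in `columns` is unique.

    Assumed definition, used only to illustrate a minimum-percent setting.
    """
    if not rows:
        return 0.0
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    unique = sum(1 for r in rows
                 if counts[tuple(r[c] for c in columns)] == 1)
    return 100.0 * unique / len(rows)


def dependency_percent(rows, determinant, dependent):
    """Percentage of rows that agree with the most common dependent value
    for their determinant value (assumed definition)."""
    if not rows:
        return 0.0
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[determinant]][row[dependent]] += 1
    conforming = sum(c.most_common(1)[0][1] for c in by_value.values())
    return 100.0 * conforming / len(rows)


# Invented sample rows.
ROWS = [
    {"order_id": 1, "sales_id": "S1", "region": "East"},
    {"order_id": 2, "sales_id": "S1", "region": "East"},
    {"order_id": 3, "sales_id": "S2", "region": "West"},
    {"order_id": 4, "sales_id": "S2", "region": "East"},
]

if __name__ == "__main__":
    print(key_percent(ROWS, ["order_id"]))
    print(key_percent(ROWS, ["sales_id"]))
    print(dependency_percent(ROWS, "sales_id", "region"))
```

In this sample, sales_id determines region for 3 of 4 rows, so the dependency would be reported with a minimum of 75 but dropped with a minimum of 100. A true key trivially determines every other column, which is one reason low thresholds produce long result lists.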

Filters
1. Delete Customer profile
2. Profile customer adding filters
a. Address3 = NY
b. Address3 = CA
c. Address3 = NY and address2 = NEW YORK
3. Run each profile, view results, modify the profile definition and run the next one.

Drilldowns
1. Open the Customer Profile in the Analyst tool
2. Remove any active filters and re-run the profile
3. Go to address3 and drill down on NY
4. Edit the drill down filter and add address3 = NEW YORK (yes, case matters)
5. Add another field: Zip_or_postcode < 10020
6. Add to the filter: iso_ctry_code is not null or != USA or != U.S.A.
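Two details in these drilldown filters are easy to trip over: string equality is case-sensitive, and chaining != tests with or accepts every value, because no value can equal both USA and U.S.A. at once. The following minimal Python sketch demonstrates both with made-up rows. It is not a model of how the Analyst tool evaluates filters, and its null handling in particular may differ.

```python
# Invented rows using the lab's column names.
ROWS = [
    {"address3": "NEW YORK", "iso_ctry_code": "USA"},
    {"address3": "New York", "iso_ctry_code": "U.S.A."},
    {"address3": "NEW YORK", "iso_ctry_code": "US"},
    {"address3": "NEW YORK", "iso_ctry_code": None},
]


def exact(rows, column, value):
    """Case-sensitive equality filter."""
    return [r for r in rows if r[column] == value]


def not_usa_with_or(code):
    """!= tests joined with or: true for every value."""
    return code != "USA" or code != "U.S.A."


def not_usa_with_and(code):
    """!= tests joined with and, plus a not-null check."""
    return code is not None and code != "USA" and code != "U.S.A."


if __name__ == "__main__":
    print(len(exact(ROWS, "address3", "NEW YORK")))
    print([r["iso_ctry_code"] for r in ROWS
           if not_usa_with_or(r["iso_ctry_code"])])
    print([r["iso_ctry_code"] for r in ROWS
           if not_usa_with_and(r["iso_ctry_code"])])
```

When building the filter, compare the row counts you get from or and from and to see which combination actually narrows the drilldown.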

Modeling
1. Create a Profile model called Whatever with orders, customer and product
a. Select the customer object, right click and data object profile
b. Profile all 3 steps setting options you think are appropriate
c. View results
d. In Primary Key inference, right click and add cust_number to model
2. Go back to the default view and create and run a data object profile for orders
a. View results
b. Add item+order to the model
3. Go back to the default view and create and run a data object profile for product
a. Make sure you verify all your keys
b. Add product_id to the model
4. Select customer and orders and profile foreign keys
a. Verify and add relationship to the model
b. Re-select the relationship and view the Venn diagram

c. Double click on the non-overlapping orders.


i. See orders without a valid customer ID
ii. Double click on customer and see all the deadbeat customers.
5. Go back to the default view and do the same for orders and product changing the
inference options (Trim values and case sensitivity)
a. Verify the relationship (if it makes sense)
b. Re-select the relationship and view the Venn diagram.
i. Find the products that don't have open orders.
ii. Find the most ordered products.
6. FINALLY (well almost), select customer and do a join profile
a. Add orders
b. Add a join on cust_number and GCLOC

Generating a Mapping from a Profile


1. Open the Customer Profile
2. Add an OOTB rule to validate cust_start_date
a. You may want to copy it and change the date format if you want it to actually
work correctly.
3. Add an OOTB rule to validate Last_Order_Date
a. See above
4. Create a rule to validate iso_ctry_code (US is the only valid value)
a. IIF(iso_ctry_code ='US','Valid','Invalid')
5. Add an OOTB rule to remove punctuation from contact_name
6. Add more as appropriate or stop to get out early.
7. Run the profile (anything interesting?)
8. Right click on profile and generate a mapping
9. View the mapping and see the results and behavior.
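The custom rule and the punctuation rule in the list above can be approximated in a few lines to predict what the profile should report. This is a minimal Python sketch: the first function mirrors the IIF expression from the lab, while the second is only a guess at what a "remove punctuation" rule does (it strips ASCII punctuation) and is not the OOTB rule's actual logic. Function names and sample values are hypothetical.

```python
import string


def validate_iso_ctry_code(code):
    """Mirror of IIF(iso_ctry_code = 'US', 'Valid', 'Invalid').

    In this sketch a missing value (None) is reported as 'Invalid'.
    """
    return "Valid" if code == "US" else "Invalid"


# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def remove_punctuation(name: str) -> str:
    """Assumed behaviour: drop ASCII punctuation, keep everything else."""
    return name.translate(_PUNCTUATION)


if __name__ == "__main__":
    for code in ["US", "USA", "us", None]:
        print(repr(code), validate_iso_ctry_code(code))
    print(remove_punctuation("O'Brien, Pat."))
```

Because the comparison is exact, values such as USA or us are reported as Invalid, which is worth keeping in mind when reviewing the rule's results in the profile.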
