
Where (and How) to Use the Token Parser Transformation in Informatica Data Quality


IDQ (Informatica Data Quality) is a superset of PowerCenter; to use IDQ you need a separate license. Alternatively, you can create a custom transformation in IDQ which in turn can be used in PowerCenter. The Informatica Data Quality tool is different from the profiling feature of Informatica. IDQ gives many built-in features to streamline data, and it gives the business tremendous capability to review data, which ensures greater business involvement and helps achieve a single version of the truth. So if your organization is considering addressing big data challenges, this tool can help get the job done easily.

Parsing refers to an activity where a system analyzes and breaks down information into meaningful chunks. That is exactly what we are going to achieve with this Data Quality transformation.
The IDQ Parser is meant to identify one or more data elements in an input field and to write each element to a different output field.
Parsing allows you to have greater control over the information in each column. For example:

Your source file name includes a generic file name, date, country and option type (say
coseorder_xxxxsecurity.20121016113056.EU.CNC.dat); you may have to preserve them as
independent columns in an audit table.

A data field that contains a person's full name, e.g. William Shakespeare. You can use the Parser
transformation to split the full name into separate data columns for the first name and last name.

Any data field that has a structure that can be described by a regular expression: VAT numbers,
SSN, PAN, credit card numbers etc. are all valid candidates.
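
To make the full-name example concrete, here is a minimal Python sketch of the same idea, entirely outside IDQ: one input field split into two output columns by a regex with capture groups, and anything that does not fit the expected shape flagged as a reject candidate. The field names are purely illustrative.

```python
import re

# Minimal sketch of what the Parser transformation does with a full name:
# one input field, two output fields, driven by regex capture groups.
full_name = "William Shakespeare"

match = re.fullmatch(r"([A-Za-z]+) ([A-Za-z]+)", full_name)
if match:
    first_name, last_name = match.groups()
    print(first_name, last_name)   # -> William Shakespeare, as two separate columns
else:
    print("Unexpected format - candidate for the reject file")
```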

After the data is parsed into new columns, one can create custom data quality operations for each column.
There are primarily two types of parser transformations:

TOKEN BASED: Parses the input based on matching token sets, expressions or reference
table entries in the Token Parser transformation.
This parser transformation can be configured in two ways:

Predefined token sets, which parse data columns into component strings. For example, ZIP
codes, phone numbers and Social Security numbers all follow particular formats,
and IDQ has them all in place, so just pick the token you need.

Custom token sets or expressions, which parse data that matches reference table entries or custom
regular expressions that you enter. Example 1 from above falls in this category.
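
To illustrate the difference, the sketch below mimics a handful of predefined token sets with simplified regular expressions (these patterns are stand-ins for illustration, not the definitions IDQ actually ships with) and checks which one a value conforms to; a custom token set or expression is simply a pattern you supply yourself in the same way.

```python
import re

# Simplified stand-ins for predefined token sets; IDQ's built-in definitions
# are richer, these patterns are for illustration only.
token_sets = {
    "zip_code": r"\d{5}(?:-\d{4})?",        # 10001 or 10001-1234
    "ssn":      r"\d{3}-\d{2}-\d{4}",       # 123-45-6789
    "us_phone": r"\(\d{3}\) \d{3}-\d{4}",   # (212) 555-0199
}

def matching_token(value):
    """Return the name of the first token set the value matches, else None."""
    for name, pattern in token_sets.items():
        if re.fullmatch(pattern, value):
            return name
    return None

print(matching_token("123-45-6789"))   # ssn
print(matching_token("10001-1234"))    # zip_code
print(matching_token("hello"))         # None -> would need a custom expression
```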

PATTERN BASED: Parses patterns made of multiple strings; you can define custom
patterns for this in the parser transformation.
This kind of parser has to be designed and configured along with a Labeler
transformation, which I will discuss later.
Let us discuss a case pertaining to this. I want to load data from a source file to a target, applying some
business rules (which are not the point of discussion now). Two things are to be achieved:

R1. The full name is to be parsed into first and last names. Don't process the record if the full name format is anything other
than First_Name Last_Name; such records need to be written to the reject file;

R2. The file name is to be parsed and the data preserved in an audit file

Token Parser Transformation Sample Mapping


Before we proceed with the solution: why regex? Because it allows you to output the desired number of fields.

10 STEPS to configure a simple parser transformation with a regular expression:

Create a parser transformation

Go to Properties -> Strategies -> Add New Parser

Enter parser name, description, choose input fields. Go to Next

Choose Parse using Token set; Go to Next

Select Regular Expression and click Choose (this opens the Regex editor window)

Name it, go to next

Enter the desired number of output fields; for this example:

Number of outputs: 5

Regular expression: ([A-Za-z._%-]+)\.([0-9]+)\.([A-Z]{2,3})\.([A-Z]{3})\.([A-Za-z]{3}) (validated in the sketch after these steps)

Enter sample data to validate expression

Enter output fields

Click Finish

Select the newly added regular expression and go to Next.

Done
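
Before wiring this into the mapping, it is worth sanity-checking the expression against the sample file name from earlier (this also covers R2). A quick Python sketch follows; the variable names below are labels I am assuming for the five output fields, not IDQ port names.

```python
import re

# Quick check that the expression from the steps above yields the five expected
# output fields for the sample file name; field labels are assumed, not IDQ ports.
FILENAME_REGEX = re.compile(
    r"([A-Za-z._%-]+)\.([0-9]+)\.([A-Z]{2,3})\.([A-Z]{3})\.([A-Za-z]{3})"
)

sample = "coseorder_xxxxsecurity.20121016113056.EU.CNC.dat"
match = FILENAME_REGEX.fullmatch(sample)

if match:
    generic_name, timestamp, country, option_type, extension = match.groups()
    print(generic_name)   # coseorder_xxxxsecurity
    print(timestamp)      # 20121016113056
    print(country)        # EU
    print(option_type)    # CNC
    print(extension)      # dat
else:
    print("Sample does not match - revisit the expression")
```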

Token Parser Transformation


This can either be exported to PowerCenter as a transformation, or you can create a simple mapping (as shown) and
export it as a mapplet; I chose the latter here.
In Data Quality, define the format and preserve it in the Model Repository so that the format is available across projects,
then deploy it to PowerCenter to use in your mapping. In addition, it can be profiled and shared across your systems.
