sensitive data
What is Data Masking
Why Data Masking
DM Transformation - Informatica
Different Masking Rules
Key Masking
Random Masking
Inbuilt Masking Rules
Substitution Masking
Procedure followed – DM requirement in DICoE
Challenges faced in AASI Data Masking
Transformation of sensitive information into de-identified, realistic-looking
data
Enterprises need production data in non-production environments for purposes such as:
Development
Test
Data Analysis and training
Organizations take extensive measures to secure private data in production environments. As a result, non-production environments become an attractive target for malicious users.
Production data therefore needs to be used in test environments in such a way that the sensitive data is masked yet remains realistic.
The Informatica PowerCenter Data Masking option protects sensitive information by masking it while maintaining the original nature of the data and preserving referential integrity.
The prerequisite for the Data Masking transformation is Informatica (Infa) 8.5.1. In AMP, the DM server components are installed on Infa 8.6.1.
The Data Masking feature is used by adding one new transformation, the Data Masking transformation, to the mapping.
Non-Deterministic Randomization
Deterministic and Repeatable masking
Blurring – adding variance value to the original data
Substitution – replace original data with false but realistic data
Different Masking Rules
Key Masking:
Produces deterministic data.
Maintains referential integrity through the use of a seed value.
The DM transformation requires a seed value to return deterministic data. It creates a default seed value, which is a random number between 1 and 1,000 and can be edited.
Numeric Masking – a field in the source file or table can be configured for numeric key masking to generate repeatable output.
Date Masking – use this rule when a date column must be masked in a way that maintains referential integrity.
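The idea behind seed-based key masking can be sketched in Python. This is only an illustration of deterministic, repeatable masking, not the algorithm the DM transformation uses internally; the function name `key_mask_number`, the keyed-hash approach, and the sample tables are all assumptions made for the example.

```python
import hashlib
import hmac


def key_mask_number(value: int, seed: int, digits: int = 9) -> int:
    """Return a repeatable masked number for a (value, seed) pair (illustrative only)."""
    # Keyed hash: the seed acts as the key, the source value as the message.
    digest = hmac.new(str(seed).encode(), str(value).encode(), hashlib.sha256).digest()
    # Reduce the digest to a number with at most `digits` digits.
    return int.from_bytes(digest[:8], "big") % (10 ** digits)


# Hypothetical data: the same key column appears in two tables.
customers = [{"cust_id": 1001}, {"cust_id": 1002}]
orders = [{"cust_id": 1001}, {"cust_id": 1001}, {"cust_id": 1002}]
SEED = 190  # hypothetical seed; the same seed must be used for both tables

masked_customers = [key_mask_number(r["cust_id"], SEED) for r in customers]
masked_orders = [key_mask_number(r["cust_id"], SEED) for r in orders]

# Same source value + same seed -> same masked value, so the join still works.
print(masked_customers)
print(masked_orders)
```

Because the masked value depends only on the source value and the seed, the masked key in one table still matches the masked key in the other, which is what preserves referential integrity.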
Random Masking:
Generates non-deterministic data.
The Data Masking transformation returns different values when the same source value occurs in different rows.
Numeric Masking
Rules that can be applied for numeric random masking:
Range – define the range of the masked value.
Blurring – generate masked values that are within a fixed or percentage variance of the source data.
String Masking
The rules are similar to those for string key masking. In addition, there is an option to specify the range of the string length.
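The range and blurring rules for numeric random masking can be illustrated with a short Python sketch. The function names and the use of a uniform distribution are assumptions for the example; the source does not state how the DM transformation draws its random values.

```python
import random


def random_mask_range(low: float, high: float, rng: random.Random) -> float:
    """Range rule: ignore the source value and pick any value in [low, high]."""
    return rng.uniform(low, high)


def blur_fixed(value: float, low_var: float, high_var: float, rng: random.Random) -> float:
    """Blurring with a fixed variance: result lies in [value - low_var, value + high_var]."""
    return rng.uniform(value - low_var, value + high_var)


def blur_percent(value: float, low_pct: float, high_pct: float, rng: random.Random) -> float:
    """Blurring with a percentage variance around the source value."""
    a = value * (1 - low_pct / 100)
    b = value * (1 + high_pct / 100)
    return rng.uniform(min(a, b), max(a, b))


rng = random.Random()  # unseeded: the same source value masks differently per row
salary = 50000.0
print(random_mask_range(30000, 90000, rng))  # any value in the configured range
print(blur_fixed(salary, 1000, 1000, rng))   # within +/- 1000 of the source
print(blur_percent(salary, 10, 10, rng))     # within +/- 10% of the source
```

Unlike key masking, repeated calls with the same source value return different results, which matches the non-deterministic behaviour described above.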
Date Random Masking
Masking rules that can be applied:
Range – upper and lower bound of the masked date value.
The default datetime format is MM/DD/YYYY HH24:MI:SS.
Blur – mask the date based on a variance applied to a unit of the date.
The blur unit can be year, month, day, or hour. The default is year.
DM applies the variance to the selected blur unit and substitutes random numbers for the other units.
For example, to restrict the masked date to a date within two years of the source date, select year as the unit and enter two as both the low and the high bound.
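The two-year example can be sketched in Python as follows. This is one possible reading of the blur rule, assuming the year is shifted by a random amount within the low/high bounds and every other unit is replaced by a random valid value; the function name `blur_date_by_year` is hypothetical.

```python
import calendar
import random
from datetime import datetime


def blur_date_by_year(source: datetime, low: int, high: int, rng: random.Random) -> datetime:
    """Blur on the year unit: year stays within [year - low, year + high]."""
    year = source.year + rng.randint(-low, high)
    # Other units are replaced with random values (assumed interpretation).
    month = rng.randint(1, 12)
    day = rng.randint(1, calendar.monthrange(year, month)[1])  # valid day for that month
    return datetime(year, month, day,
                    rng.randint(0, 23), rng.randint(0, 59), rng.randint(0, 59))


rng = random.Random()
source = datetime(2010, 6, 15, 10, 30, 0)
masked = blur_date_by_year(source, low=2, high=2, rng=rng)

# %m/%d/%Y %H:%M:%S is the Python equivalent of MM/DD/YYYY HH24:MI:SS.
print(masked.strftime("%m/%d/%Y %H:%M:%S"))
```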
Inbuilt Masking Rules
Inbuilt masking rules that can be applied
SSN
Credit card
URL / IP address
Phone
Email address
Substitution Masking:
Example: FirstNames.dic
This file contains an SNo column and a FirstNames column. In the mapping, the DM transformation generates a random number, which is passed as input to a lookup. The lookup matches the number against SNo and returns the corresponding first name from the lookup file. If the dic file holds 100 names, the random number range can be specified as 1 to 100.
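The lookup-based substitution described above can be sketched in Python. The dictionary contents and the function name `substitute_first_name` are invented for illustration; in the mapping itself this is done with the DM transformation and a lookup, not with code.

```python
import random

# Hypothetical contents of FirstNames.dic: SNo -> FirstNames
FIRST_NAMES = {
    1: "Alice",
    2: "Brian",
    3: "Carmen",
    4: "Deepak",
    5: "Elena",
}


def substitute_first_name(dictionary: dict, rng: random.Random) -> str:
    """Pick a random SNo within the dictionary's range and look up its name."""
    sno = rng.randint(1, len(dictionary))  # range 1..N, where N is the number of names
    return dictionary[sno]


rng = random.Random()
source_rows = ["John", "Mary", "John"]
# The source name is discarded; each row gets a name from the dictionary.
masked_rows = [substitute_first_name(FIRST_NAMES, rng) for _ in source_rows]
print(masked_rows)
```

The random number range must match the number of rows in the dictionary file, otherwise the lookup can miss.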
Identify sensitive fields
Document the DM requirement in a proper format. Ideally it should list table/file name, attribute/field name, DM required (Y/N), PK/FK relation, rule type, and description.
If the requirement has common fields to be masked across files/tables, creating a mapplet with the “to be masked” fields is helpful.
Coding/Testing
A few challenges faced in AASI Data Masking
In the AASI DM requirement, the source and target were MF files. To ensure that our DM mappings had no impact on the fields that do not require masking, we kept only the “fields to be masked” in the Data Map and declared all others as Filler with a binary data type.
The masking format defined for an attribute must match the actual characters fed from the source, as in the case of String Masking. For example, if the mask format is defined as “alphabets – A” but the source value contains special characters (@, $, etc.), the error “Invalid input mask format” is raised.
SSN Masking accepts only a valid TAX_ID as input, for example XXX-XX-XXXX. So when planning to use the inbuilt SSN masking, we need to decide whether to transform the source value into the proper format first or simply use numeric/string key masking.
Maintaining Data Quality – the DM transformation produces masked output based on inbuilt algorithms, so even if NULL or 0 values are passed as input, DM generates a masked output value. However, downstream teams using the masked data may need to check for NULL or 0 values in the source. To maintain data quality in such cases, we may have to apply a transformation that retains the source value when it is 0 or NULL.
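The pass-through for NULL and 0 values can be sketched as a small wrapper in Python. The names `mask_preserving_empty` and `fake_mask` are hypothetical, and `fake_mask` is only a stand-in for whatever masking rule is configured; in the mapping this logic would sit in a transformation placed around the DM transformation.

```python
from typing import Callable, Optional


def mask_preserving_empty(value: Optional[int],
                          mask: Callable[[int], int]) -> Optional[int]:
    """Return NULL (None) and 0 unchanged; mask every other value."""
    if value is None or value == 0:
        return value
    return mask(value)


def fake_mask(value: int) -> int:
    """Stand-in for the real masking rule (illustrative only)."""
    return value + 1000


source_values = [None, 0, 42]
masked_values = [mask_preserving_empty(v, fake_mask) for v in source_values]
print(masked_values)
```

Downstream checks for NULL or 0 then behave the same on masked data as they would on the source.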
Thank
You