
Slowly Changing Dimension Implementation & Target Updates with MD5/CRC32 Function (CHECKSUM)

The purpose of this document is to describe how the checksum functions MD5 and
CRC32 can be used to implement SCD Type 2 and target updates on tables with a
large number of columns.

In general, when a requirement comes up to implement SCD Type 2, we look for a date field
in the target. If there is no date field, we go for SCD Type 2 with a flag column, or
alternatively SCD Type 2 with versioning. All of these scenarios work well when the table has
a date column or a flag column, which makes SCD Type 2 easy for a developer to implement.
Suppose we come across a situation where there is no date column or flag column. How do
we implement SCD Type 2 then? This usually happens when dealing with a very old legacy
system that doesn't have such fields in its tables. The solution is to generate a hash code
for the entire row and use it as an identifier column for reference. For this, Informatica
provides the hash functions MD5() (Message Digest) and CRC32() (Cyclic Redundancy Check).
From an ETL perspective, the hash code is referred to as a checksum value.
Let us try this with a simple example to see how it works. I am considering a simple flat
file and a target table for this scenario.


The above is the mapping for this scenario. To begin with, create two new output ports in
the Expression transformation, as below.



Then edit the expressions of the two new ports as below:
MD5_CHECKSUM      MD5(COL1||COL2||COL3||..||COLn)
CRC32_CHECKSUM    CRC32(COL1||COL2||COL3||..||COLn)
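So how does it work? We pass the concatenated value of all the fields to these hash functions.
Each returns a hash value that is, for practical purposes, unique to that combination of
column values, so it can be used as a key for detecting changes. Note that CRC32 produces
only a 32-bit value, so it is more prone to collisions than MD5.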
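To make the idea concrete outside Informatica, here is a minimal Python sketch of the same calculation. It is only an illustration: the function name row_checksums and the sample column values are made up for this example, and Python's hashlib and zlib modules stand in for the MD5() and CRC32() expression functions.

```python
import hashlib
import zlib


def row_checksums(row):
    """Return (md5_hex, crc32_int) for a row given as a list of column values."""
    # Concatenate all columns, mirroring COL1||COL2||..||COLn.
    concatenated = "".join(str(value) for value in row)
    data = concatenated.encode("utf-8")
    md5_checksum = hashlib.md5(data).hexdigest()  # 32 hexadecimal characters
    crc32_checksum = zlib.crc32(data)             # unsigned 32-bit integer
    return md5_checksum, crc32_checksum


# Hypothetical sample rows: the same key with one changed column.
original = row_checksums(["1001", "John", "Austin"])
changed = row_checksums(["1001", "John", "Dallas"])
repeated = row_checksums(["1001", "John", "Austin"])
```

The same row always yields the same pair of checksums, while a change in any column yields a different MD5 value, which is what the later comparison relies on.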
NOTE: It is good practice to handle NULL values in the columns being concatenated, using the
NVL2() function, to avoid data discrepancies. This is shown in a later section.
Next, in the Lookup transformation, look up the target on the primary key to detect changes.
Pass the primary key, MD5_CHECKSUM and CRC32_CHECKSUM to a Router transformation, and
populate the remaining fields from the source.







In the Router, create groups for insert and update rows as below:
INSERT    ISNULL(LKP_PK)
UPDATE    NOT ISNULL(LKP_PK) AND (LKP_MD5_CHECKSUM != MD5_CHECKSUM OR
          LKP_CRC_CHECKSUM != CRC32_CHECKSUM)

Here you can use either the MD5 or the CRC32 function; I have used both just for illustration.
When both are compared, they are joined with OR so that a difference in either checksum marks
the row as changed. Once the records have been routed into the INSERT and UPDATE groups,
perform SCD Type 2 on them.
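The routing logic can be sketched in Python as follows. This is a simplified stand-in for the Lookup and Router transformations, not the mapping itself: the names md5_of and route, the dictionary used as the lookup, and the sample rows are all assumptions made for this example, and only the MD5 checksum is compared.

```python
import hashlib


def md5_of(values):
    """MD5 of the concatenated column values (all values are strings here)."""
    return hashlib.md5("".join(values).encode("utf-8")).hexdigest()


def route(source_rows, target_checksums):
    """Split source rows into insert and update groups.

    source_rows:      dict of primary key -> list of column values
    target_checksums: dict of primary key -> checksum stored in the target
    """
    inserts, updates = [], []
    for pk, values in source_rows.items():
        lkp_checksum = target_checksums.get(pk)  # None plays the role of ISNULL(LKP_PK)
        if lkp_checksum is None:
            inserts.append(pk)                   # key not in target: INSERT group
        elif lkp_checksum != md5_of(values):
            updates.append(pk)                   # key exists, row changed: UPDATE group
        # Rows with a matching checksum are unchanged and are dropped.
    return inserts, updates


# Hypothetical data: key 1 changed, key 2 unchanged, key 3 is new.
target = {
    "1": md5_of(["1", "John", "Austin"]),
    "2": md5_of(["2", "Mary", "Boston"]),
}
source = {
    "1": ["1", "John", "Dallas"],
    "2": ["2", "Mary", "Boston"],
    "3": ["3", "Ravi", "Pune"],
}
inserts, updates = route(source, target)
```

With this data, key 3 lands in the insert group and key 1 in the update group, while key 2 is left alone.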


This way you can implement SCD Type 2 without depending on DATE or FLAG columns in
the table.
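As a rough illustration of the final step, the sketch below keeps history by version number, one of the SCD Type 2 variants mentioned earlier. The document does not spell out this step, so everything here is an assumption: the function apply_scd2, the in-memory list used as the target, and the row layout with pk, version, values and checksum keys are all hypothetical.

```python
import hashlib


def md5_of(values):
    """MD5 of the concatenated column values (all values are strings here)."""
    return hashlib.md5("".join(values).encode("utf-8")).hexdigest()


def apply_scd2(target, pk, values):
    """Append a new version of pk to target if the row is new or has changed.

    target is a list of dicts with the keys pk, version, values, checksum.
    Returns True when a row was added, False when nothing changed.
    """
    checksum = md5_of(values)
    versions = [row for row in target if row["pk"] == pk]
    latest = max(versions, key=lambda row: row["version"], default=None)
    if latest is not None and latest["checksum"] == checksum:
        return False  # same checksum as the latest version: no change
    next_version = 1 if latest is None else latest["version"] + 1
    target.append(
        {"pk": pk, "version": next_version, "values": values, "checksum": checksum}
    )
    return True


# Hypothetical run: first load, a repeated load, then a changed row.
target = []
first = apply_scd2(target, "1", ["1", "John", "Austin"])
repeat = apply_scd2(target, "1", ["1", "John", "Austin"])
change = apply_scd2(target, "1", ["1", "John", "Dallas"])
```

Because the checksum is stored with each version, no date or flag column is needed to decide whether a new version has to be written.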









Handling NULLs while using CRC32/MD5 functions:
Handling NULLs while using the CRC32/MD5 functions in Informatica can be tricky. Care should
be taken when using these functions, especially when they drive updates to SCDs.
Approach:
Here is an approach that works well for handling NULLs with the CRC32/MD5 functions in
Informatica. The table below shows how the results become unreliable when NULLs are not
handled properly.



The above table shows that when the MD5/CRC32 of the concatenated column values is calculated,
NULL values are ignored. As a result, different sets of data can produce the same MD5/CRC32 value.
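The effect can be reproduced with a small Python sketch. The two sample rows are invented for this example, and skipping None values during concatenation is used here to mimic the behaviour described above, where NULLs contribute nothing to the concatenated string.

```python
import hashlib


def md5_ignoring_nulls(row):
    """MD5 of the concatenated columns, with NULL (None) values skipped."""
    concatenated = "".join(value for value in row if value is not None)
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()


# Two different rows: the NULL sits in a different column in each.
row_a = ["A", None, "B"]
row_b = ["A", "B", None]

# Both rows concatenate to "AB", so their checksums are identical.
same_checksum = md5_ignoring_nulls(row_a) == md5_ignoring_nulls(row_b)
```

Here same_checksum is True even though the rows differ, so a change from row_a to row_b would go undetected.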
If instead the NULL values are replaced with a substitute character before concatenation, the
MD5/CRC32 functions work as before, and changed rows are detected correctly when updating
the values of an SCD.
Here is how the approach works:



The above table shows that different sets of data now produce different MD5/CRC32 values.
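A minimal Python sketch of the substitution approach follows. The placeholder character "~" and the sample rows are assumptions chosen for this example; in practice the placeholder should be a character that does not occur in the real data.

```python
import hashlib

NULL_PLACEHOLDER = "~"  # hypothetical substitute for NULL values


def md5_with_placeholder(row):
    """MD5 of the concatenated columns, with NULL (None) replaced by a placeholder."""
    concatenated = "".join(
        NULL_PLACEHOLDER if value is None else value for value in row
    )
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()


# The same kind of rows that collide when NULLs are simply ignored.
row_a = ["A", None, "B"]  # concatenates to "A~B"
row_b = ["A", "B", None]  # concatenates to "AB~"

different_checksum = md5_with_placeholder(row_a) != md5_with_placeholder(row_b)
```

Because the placeholder keeps each NULL in its own column position, the two rows now concatenate to different strings and get different checksums.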
Result:
This approach prevents incorrect behavior when functions like MD5/CRC32 are used to handle
SCD inserts and updates.
