Informatica PowerCenter Level II Developer Lab Guide Version 8.1 September 2006
Table of Contents
Preface (page vii)
  About This Guide (page viii)
    Document Conventions (page viii)
  Other Informatica Resources (page ix)
    Obtaining Informatica Documentation (page ix)
    Visiting Informatica Customer Portal (page ix)
    Visiting the Informatica Web Site (page ix)
    Visiting the Informatica Developer Network (page ix)
    Visiting the Informatica Knowledge Base (page ix)
    Obtaining Informatica Professional Certification (page ix)
    Providing Feedback (page x)
    Obtaining Technical Support (page x)
Preface
Welcome to PowerCenter, Informatica's software product that delivers an open, scalable data integration solution addressing the complete life cycle of data integration projects, including data warehouses and data marts, data migration, data synchronization, and information hubs. PowerCenter combines the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable, and efficient manner. The PowerCenter metadata repository coordinates and drives a variety of core functions, including extracting, transforming, loading, and managing data. The Integration Service can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads. PowerCenter can simplify and accelerate the process of moving data warehouses from development to test to production.
Document Conventions
This guide uses the following formatting conventions:
If you see: >
It means: Indicates a submenu to navigate to.
Example: Click Repository > Connect. In this example, you should click the Repository menu or button and choose Connect.

If you see: boldfaced text
It means: Indicates text you need to type or enter.
Example: Click the Rename button and name the new source definition S_EMPLOYEE.

If you see: UPPERCASE text
It means: Database tables and column names are shown in all UPPERCASE.
Example: T_ITEM_SUMMARY

If you see: italicized text
It means: Indicates a variable you must replace with specific information.
Example: Connect to the Repository using the assigned login_id.

If you see: Note:
It means: The following paragraph provides additional facts.
Example: Note: You can select multiple objects to import by using the Ctrl key.

If you see: Tip:
It means: The following paragraph provides suggested uses or a Velocity best practice.
Example: Tip: The m_ prefix for a mapping name is ...
Other Informatica resources include: Informatica Documentation, the Informatica Customer Portal, the Informatica web site, the Informatica Developer Network, the Informatica Knowledge Base, Informatica Professional Certification, and Informatica Technical Support.
The site contains information on how to create, market, and support customer-oriented add-on solutions based on interoperability interfaces for Informatica products.
Providing Feedback
Email any comments on this guide to aconlan@informatica.com.
support@informatica.com for technical inquiries
support_admin@informatica.com for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
North America / South America
Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, California 94063, United States
Toll Free: 877 463 2435
Standard Rate: United States: 650 385 5800

Europe / Middle East / Africa
Informatica Software Ltd.
6 Waltham Park, Waltham Road, White Waltham
Maidenhead, Berkshire SL6 3TN, United Kingdom
Toll Free: 00 800 4632 4357
Standard Rate: Belgium: +32 15 281 702; France: +33 1 41 38 92 26; Germany: +49 1805 702 702; Netherlands: +31 306 022 797; United Kingdom: +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd.
301 & 302 Prestige Poseidon, 139 Residency Road
Bangalore 560 025, India
Toll Free: Australia: 00 11 800 4632 4357; Singapore: 001 800 4632 4357
Standard Rate: India: +91 80 5112 5738
Objectives
Use a dynamic lookup cache to update and insert rows in a customer table
Use a Router transformation to route rows based on the NewLookupRow value
Use an Update Strategy transformation to flag rows for update or insert
Duration
45 minutes
Mapping Overview
Update the existing customer list with new and updated information. Runs on demand. Unique key: CUST_ID.
Sources
Files
File Name: updated_customer_list.txt (create shortcut from DEV_SHARED folder)
File Location: the Source Files directory on the Integration Service process machine
Targets
Tables
Table Name: CUSTOMER_LIST (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Keys: PK_KEY, CUST_ID
Detailed Overview
Repository objects, with descriptions and instructions:

m_DYN_update_customer_list_xx (Mapping)

Shortcut_to_updated_customer_list (Source Definition): Flat file in the $PMSourceFileDir directory. Create shortcut from the DEV_SHARED folder.

SQ_Shortcut_to_updated_customer_list (Source Qualifier): Connect to the input/output ports of the Lookup transformation, LKP_CUSTOMER_LIST.

LKP_CUSTOMER_LIST (Lookup): Lookup transformation based on the target definition Shortcut_to_CUSTOMER_LIST and the target table CUSTOMER_LIST.
- Change the input/output port names: prepend them with IN_.
- Use dynamic caching.
- Define the lookup condition using the customer ID ports.
- Configure the Lookup properties so it inserts new rows and updates existing rows (Insert Else Update).
- Ignore NULL inputs for all lookup/output ports except CUST_ID and PK_KEY.
- For each lookup/output port, associate the input/output port with the similar name.
- PK_KEY must be an integer in order to specify Sequence-ID as the Associated Port.
- Connect the NewLookupRow port and all lookup/output ports to RTR_Insert_Update.

RTR_Insert_Update (Router): Create two output groups with the following names:
- UPDATE_EXISTING: Condition is NewLookupRow=2. Connect output ports to UPD_Update_Existing.
- INSERT_NEW: Condition is NewLookupRow=1. Connect output ports to UPD_Insert_New.
Do not connect any of the NewLookupRow ports to any transformation. Do not connect the Default output group ports to any transformation.

UPD_Insert_New (Update Strategy): Update Strategy Expression DD_INSERT. Connect all input/output ports to CUSTOMER_LIST_Insert.

UPD_Update_Existing (Update Strategy): Update Strategy Expression DD_UPDATE. Connect all input/output ports to CUSTOMER_LIST_Update.

CUSTOMER_LIST_Insert (Target Definition): First instance of the target table definition in the EDWxx schema. Create shortcut from the DEV_SHARED folder of the CUSTOMER_LIST target definition. In the mapping, rename the target instance to CUSTOMER_LIST_Insert.

CUSTOMER_LIST_Update (Target Definition): Second instance of the target table definition in the EDWxx schema. Create shortcut from the DEV_SHARED folder of the CUSTOMER_LIST target definition. In the mapping, rename the target instance to CUSTOMER_LIST_Update.
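For reference, the NewLookupRow port that dynamic caching adds follows standard PowerCenter behavior: 0 means the Integration Service made no change to the cache for the row, 1 means it inserted the row into the cache, and 2 means it updated the row in the cache. The Router groups above key off these values; rows with NewLookupRow=0 fall into the unconnected Default group and are dropped.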
Instructions
Step 1: Create Mapping
1. Connect to the PC8A_DEV repository using Developerxx as the user name and developerxx as the password.
2. Create a mapping called m_DYN_update_customer_list_xx, where xx is your student number. Use the mapping details described in Detailed Overview on page 3 for guidelines.

Figure 1-1 shows an overview of the mapping you must create:
Figure 1-1. m_DYN_update_customer_list_xx Mapping
1. In the m_DYN_update_customer_list_xx mapping, preview the target data to view the rows that exist in the table.
2. Use the ODBC_EDW ODBC connection to connect to the target database. Use EDWxx as the user name and password.
Figure 1-2. Preview Target Data for CUSTOMER_LIST Table Before Session Run
Open updated_customer_list.txt in a text editor. The updated_customer_list.txt source file contains the following data:
CUST_ID,FIRSTNAME,LASTNAME,ADDRESS,CITY,STATE,ZIP 67001,Thao,Nguyen,1200 Broadway Ave,Burlingame,CA,94010 67002,Maria,Gomez,390 Stelling Ave,Cupertino,CA,95014 67003,Jean,Carlson,555 California St,Menlo Park,CA,94025 67004,Chris,Park,13450 Saratoga Ave,Santa Clara,CA,95051 55002,Anish,Desai,400 W Pleasant View Ave,Hackensack,NJ,07601 55006,Bianco,Lo,900 Seville Dr,Clarkston,GA,30021 55003,Janice,MacIntosh,,,, 67003,Jean,Carlson,120 Villa St,Mountain View,CA,94043
3. Notice that the row for customer ID 55003 contains some NULL values. You do not want to insert the NULL values into the target; you only want to update the other column values in the target.
4. Notice that the file contains two rows with customer ID 67003. Because of this, you must use a dynamic cache for the Lookup transformation.
5. Close the file.
Open the Workflow Manager and open your ~Developerxx folder. Create a workflow named wf_DYN_update_customer_list_xx. Create a session named s_m_DYN_update_customer_list_xx using the m_DYN_update_customer_list_xx mapping. In the session:
- Verify that the target connection is EDWxx.
- Verify that the Target load type is set to Normal and the Truncate target table option is not checked.
- Verify that the specified source file name is updated_customer_list.txt and the specified location is $PMSourceFileDir.
Preview the target data from the mapping to verify the results. Figure 1-3 shows the Preview Data dialog box for the CUSTOMER_LIST table:
Figure 1-3. Preview Target Data for CUSTOMER_LIST Table After Session Run
Look at customer ID 55003. It should not contain any NULLs. Look at customer ID 67003. It should contain data from the last row for customer ID 67003 in the source file.
Technical Description
A worklet will be created with a worklet variable that defines the time the workflow started plus one hour. A Timer task will be created in the worklet to wait for one hour before sending an email. If the session runs for less than an hour, a Control task will be issued to stop the timer.
Objectives
Create a Workflow
Create a Worklet
Create a Timer Task
Create an Email Task
Create a Control Task
Create a condition to control the Email Task
Duration
30 minutes
Worklet Overview
Workflow Overview
Instructions
Step 1: Setup
Connect to the PC8A_DEV repository in the Designer and Workflow Manager.
Mappings: m_DIM_CUSTOMER_ACCT_xx, m_DIM_CUSTOMER_ACCT_STATUS_xx
Sessions: s_m_DIM_CUSTOMER_ACCT_xx, s_m_DIM_CUSTOMER_ACCT_STATUS_xx
Create a Worklet called wl_DIM_CUSTOMER_ACCT_LOAD_xx. Open the Worklet and create the following tasks.
Create a Timer task and name it tim_SESSION_RUN_TIME. Edit the Timer task and click the Timer tab. Select the Relative time: radio button.
4. Select Start after 1 Hour from the start time of this task.
Create an Email task and name it eml_SESSION_RUN_TIME. Click the Properties tab. For the Email User Name, type administrator@anycompany.com. For the Email Subject, type: session s_m_DIM_CUSTOMER_ACCT_xx ran an hour or longer. For the Email Text, type an appropriate message.
Create a Control task and name it ctrl_STOP_SESS_TIMEOUT. Edit the Control task and click the Properties tab. Set the Control Option attribute to Stop parent.
Add s_m_DIM_CUSTOMER_ACCT_xx to wl_DIM_CUSTOMER_ACCT_LOAD_xx:
- Verify source connections are ODS and the source file name is customer_type.txt.
- Verify target connections are EDWxx.
- Verify lookup connections are valid: DIM tables to EDWxx, ODS tables to ODS.
- Truncate the target table.
- Ensure the Target Load Type is Normal.
Link Start to tim_SESSION_RUN_TIME and s_m_DIM_CUSTOMER_ACCT_xx.
Link tim_SESSION_RUN_TIME to eml_SESSION_RUN_TIME.
Link s_m_DIM_CUSTOMER_ACCT_xx to ctrl_STOP_SESS_TIMEOUT.
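The objectives call for a condition to control the Email task. A minimal sketch, assuming the standard predefined task variables (the exact condition in the solution may differ): on the link from tim_SESSION_RUN_TIME to eml_SESSION_RUN_TIME, set the link condition

$s_m_DIM_CUSTOMER_ACCT_xx.Status != SUCCEEDED

so the email is sent only if the session is still running when the timer expires.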
Add s_m_DIM_CUSTOMER_ACCT_STATUS_xx to wf_DIM_CUSTOMER_ACCT_LOAD_xx:
- Verify source connections are ODS and the source file name is customer_type.txt.
- Verify target connections are EDWxx.
- Verify lookup connections are valid: DIM tables to EDWxx, ODS tables to ODS.
- Truncate the target table.
- Ensure the Target Load Type is Normal.
In the Workflow Monitor, click the Filter Tasks button in the toolbar, or select Filters > Tasks from the menu. Make sure all of the tasks are shown, then run your workflow and observe the tasks in the Task View.
Technical Description
Use workflow variables to calculate when the session starts. The session must start at the top of an hour, on or after 6 a.m. and before 6 p.m. To accomplish this, the workflow will run continuously.
Objectives
Create and use workflow variables
Create an Assignment Task
Create a Timer Task
Duration
30 minutes
Workflow Overview
Instructions
Step 1: Setup
Connect to PC8A_DEV Repository in the Designer and Workflow Manager.
Mapping: m_SALES_DEPARTMENT_xx
Session: s_m_SALES_DEPARTMENT_xx
2. Add the reusable session s_m_SALES_DEPARTMENT_xx to the workflow.
3. Verify that the Source Database Connection is ODS, the Target Database Connection is EDWxx, and the Target Load Type is Normal.
5. Create a Timer task called tim_SALES_DEPARTMENT_START. Edit the Timer task and click the Timer tab. Select the Absolute time radio button, then select the "Use this workflow date-time variable to calculate the wait" radio button. Click the ellipsis to browse variables, double-click wf_SALES_DEPARTMENT_xx, and select $$NEXT_START_TIME as the workflow variable. Save.
Note: The above functions could be nested together in one assignment expression if desired.
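The Assignment task asgn_SALES_DEPARTMENT_START_TIME (referenced in the next step) sets the three workflow variables. A sketch of expressions that would produce the log values shown later in this lab, using standard PowerCenter functions (the exact solution may differ):

$$TRUNC_START_TIME = TRUNC(SYSDATE, 'HH')
$$HOUR_STARTED = GET_DATE_PART(SYSDATE, 'HH')
$$NEXT_START_TIME = IIF($$HOUR_STARTED >= 17,
    ADD_TO_DATE(TRUNC(SYSDATE, 'DD'), 'HH', 30),
    IIF($$HOUR_STARTED < 5,
        ADD_TO_DATE(TRUNC(SYSDATE, 'DD'), 'HH', 6),
        ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 1)))

The 30-hour offset from midnight yields 6 a.m. the following day whenever the next top of the hour would fall on or after 6 p.m.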
2. Create a link from asgn_SALES_DEPARTMENT_START_TIME to tim_SALES_DEPARTMENT_START.
3. Create a link from tim_SALES_DEPARTMENT_START to s_m_SALES_DEPARTMENT_xx.
4. Save the repository.
5. Edit the workflow wf_SALES_DEPARTMENT_xx. Click the Scheduler tab and verify that the scheduler is Non Reusable.
6. Edit the schedule. Click the Schedule tab and click Run Continuously.
7. Click OK.
8. Click OK.
9. Save the repository. This will start the workflow.
Note: Notice that the assignment task has already executed and the timer task is running.
2. Browse the workflow log.
3. Verify the results of the assignment expressions in the log file. Examples:
Variable [$$TRUNC_START_TIME], Value [05/23/2004 16:00:00]. Variable [$$HOUR_STARTED], Value [16]. Variable [$$NEXT_START_TIME], Value [05/23/2004 17:00:00].
4. Verify the Load Manager message that tells when the timer task will complete. An example message:
INFO : LM_36606 [Sun May 23 16:05:02 2004] : (2288|2004) Timer task instance [TM_SALES_DEPARTMENT_START]: The timer will complete at [Sun May 23 17:00:00 2004].
5. At or near the top of the hour, open the Workflow Monitor to check the status of the session.
6. Verify that the session started at the desired time.
7. After the session completes, notice that the workflow automatically starts again.
8. If the workflow starts after 5 p.m., the timer message in the workflow log will show that the timer will end at 6 a.m. the following morning. For example:
INFO : LM_36608 [Sun May 23 17:00:25 2004] : (2288|2392) Timer task instance [TM_SALES_DEPARTMENT_START]: Timer task specified to wait until absolute time [Mon May 24 06:00:00 2004], specified by variable [$$NEXT_START_TIME].
INFO : LM_36606 [Sun May 23 17:00:25 2004] : (2288|2392) Timer task instance [TM_SALES_DEPARTMENT_START]: The timer will complete at [Mon May 24 06:00:00 2004].

9. Stop or abort the workflow at any time.
10. Afterwards, edit the workflow scheduler and select Run On Demand. Save the repository.
Objectives
Configure a mapping, session, and workflow for recovery.
Recover a suspended workflow.
Duration
30 minutes
Instructions
Step 1: Copy the Workflow
1. Open the Repository Manager.
2. Copy the wkf_Stage_Customer_Contacts_xx workflow from the SOLUTIONS_ADVANCED folder to your folder.
3. In the Workflow Manager, open the wkf_Stage_Customer_Contacts_xx workflow.
4. Rename the workflow to replace xx with your student number.
5. Rename the session in the workflow to replace xx with your student number.
6. Save the workflow.
Step 2: Configure the Workflow and Session for Recovery

1. Open the wkf_Stage_Customer_Contacts_xx workflow.
2. Edit the workflow, and on the General tab, select Suspend on Error.
3. Edit the s_m_Stage_Customer_Contacts_xx session and click the Properties tab.
4. Scroll to the end of the General Options settings and select Resume from last checkpoint for the Recovery Strategy.
5. Click the Mapping tab and change the target load type to Normal.
Note: When you configure a session for bulk load, the session is not recoverable using the resume recovery strategy. You must use normal load.
Step 3: Make the Session Fail

1. Edit the s_m_Stage_Customer_Contacts_xx session, and click the Mapping tab. The source in the mapping uses a file list, customer_list.txt. To make the session encounter an error, you will change the value in the Source Filename session property.
2. On the Sources node, change the source file name to customer_list1234.txt.
3. Click the Config Object tab.
4. In the Error Handling settings, configure the session to stop on one error.
Step 4: Run the Workflow, Fix the Session, and Recover the Workflow
1. Run the workflow. The Workflow Monitor shows that the Integration Service suspends the workflow and fails the session.
3. Notice that the Integration Service failed the session. Next, you will fix the session.
4. In the Workflow Manager, edit the session.
5. On the Mapping tab, enter customer_list.txt as the source file name.
6. Save the workflow.
7. In the Workflow Manager, right-click the workflow and choose Recover Workflow. The Workflow Monitor shows that the Integration Service is running the workflow and that the session is running as a recovery run.
When the session and workflow complete, the Workflow Monitor shows that the session completed successfully as a recovery run.
Open the session log. Search for the message "session run completed with failure".
Notice that the Integration Service continues to write log events to the same session log.
Technical Description
A flag will be created to tell PowerCenter when a new set of invoice numbers is found. A Transaction Control transformation will be created to tell the database when to issue a commit.
Objectives
Create a flag to check for new INVOICE_NOs
Commit upon seeing a new set of INVOICE_NOs
Duration
45 minutes
Mapping Overview
Sources
Tables
Table Name: ODS_LINE_ITEM (create shortcut from DEV_SHARED folder)
Schema/Owner: ODS
Selection/Filter: none
Targets
Tables
Table Name: DIM_LINE_ITEM (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Key: LINE_ITEM_NO
Detailed Overview
Mapping: m_DIM_LINE_ITEM_xx

ODS_LINE_ITEM (Source Definition): Table source definition in the ODS schema. Create shortcut from the DEV_SHARED folder.

Shortcut_to_sq_ODS_LINE_ITEM (Source Qualifier): Send to srt_DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST.

srt_DIM_LINE_ITEM (Sorter): Sort by INVOICE_NO. Send INVOICE_NO to exp_DIM_LINE_ITEM. Send to tc_DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST.

exp_DIM_LINE_ITEM (Expression): Uncheck the 'O' (output) on INVOICE_NO so that it is input-only.
- Create a variable called v_PREVIOUS_INVOICE_NO as a decimal (10,0) to house the value of the previous row's INVOICE_NO. Expression: INVOICE_NO
- Create a variable called v_NEW_INVOICE_NO_FLAG as an integer to flag whether the current row's INVOICE_NO is the same as the previous row's INVOICE_NO. Expression: IIF(INVOICE_NO=v_PREVIOUS_INVOICE_NO, 0, 1)
- Move v_NEW_INVOICE_NO_FLAG above v_PREVIOUS_INVOICE_NO. (Variable ports evaluate top to bottom, so the flag must be computed before v_PREVIOUS_INVOICE_NO is overwritten with the current row's value.)
- Create an output port called NEW_INVOICE_NO_FLAG_out as an integer to hold the value of the flag. Expression: v_NEW_INVOICE_NO_FLAG
- Send NEW_INVOICE_NO_FLAG_out to tc_DIM_LINE_ITEM.

tc_DIM_LINE_ITEM (Transaction Control): On the Ports tab, delete the _out from NEW_INVOICE_NO_FLAG_out. On the Properties tab, enter the following Transaction Control Condition:
IIF(NEW_INVOICE_NO_FLAG=1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
Send to DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST.

Shortcut_to_DIM_LINE_ITEM (Target Definition): Target definition in the EDWxx schema. Create a shortcut from the DEV_SHARED folder.
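To see how the flag drives commits, trace a sorted stream of invoice numbers (illustrative values):

INVOICE_NO   v_PREVIOUS_INVOICE_NO   NEW_INVOICE_NO_FLAG   Transaction Control action
100          (initial)               1                     TC_COMMIT_BEFORE
100          100                     0                     TC_CONTINUE_TRANSACTION
101          100                     1                     TC_COMMIT_BEFORE
101          101                     0                     TC_CONTINUE_TRANSACTION

Each time the invoice number changes, the rows for the previous invoice are committed as one transaction. The commit-before on the very first row is harmless because nothing has been written yet.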
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_LINE_ITEM_xx, where xx is your student number. Use the mapping details described in the previous pages for guidelines.
Open your ~Developerxx folder. Create a workflow named wf_DIM_LINE_ITEM_xx. Create a session named s_m_DIM_LINE_ITEM_xx. In the session, edit the Mapping tab and expand the Sources node. Under Connections, verify that the Connection Value is ODS. Expand the Targets node and verify that the Connection value is correct, the Target load type is set to Normal, and the Truncate target table option is checked.
Technical Description
Records will be committed when a new group of VENDOR_IDs comes in. This will require a flag to be set to determine whether a VENDOR_ID is new or not. Rows will need to be rolled back if an error occurs. An error flag will be set when a business rule is violated.
Objectives
Use a Transaction Control transformation to commit based upon vendor IDs and issue a rollback based upon errors.
Duration
60 minutes
Mapping Overview
Sources
Files
File Name: PRODUCT.txt (create shortcut from DEV_SHARED folder)
File Location: the Source Files directory on the Integration Service process machine
Targets
Tables
Table Name: DIM_VENDOR_PRODUCT (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Lookups
Description: VENDOR_NAME, FIRST_CONTACT, and VENDOR_STATE are needed to populate DIM_VENDOR_PRODUCT.
Lookup Condition: ODS.VENDOR_ID = PRODUCT.VENDOR_ID
Selection/Filter: N/A
Return Ports: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE
Detailed Overview
Mapping: m_DIM_VENDOR_PRODUCT_TC_xx

PRODUCT.txt (Source Definition): Drag in the shortcut from DEV_SHARED.

Sq_Shortcut_To_PRODUCT (Source Qualifier): Source Qualifier for the flat file. Send to exp_SET_ERROR_FLAG: PRODUCT_CODE, VENDOR_ID, CATEGORY, PRODUCT_NAME, MODEL, PRICE.

exp_SET_ERROR_FLAG (Expression): Output port: ERROR_FLAG. Expression: IIF(ISNULL(PRODUCT_CODE) OR ISNULL(CATEGORY), TRUE, FALSE). Send all output ports to srt_VENDOR_ID.

srt_VENDOR_ID (Sorter): Sort data ascending by VENDOR_ID and ERROR_FLAG. This puts any error records at the end of each group. Send all ports to exp_SET_TRANS_TYPE. Send VENDOR_ID to lkp_ODS_VENDOR.
exp_SET_TRANS_TYPE (Expression):
1. Create a variable called v_PREV_VENDOR_ID as a decimal with a precision of 10 to house the value of the previous vendor. Expression: VENDOR_ID
2. Create a variable port called v_NEW_VENDOR_ID_FLAG as an integer to check whether the current VENDOR_ID is new. Expression: IIF(VENDOR_ID != v_PREV_VENDOR_ID, TRUE, FALSE)
Variables can be used to remember values across rows. v_PREV_VENDOR_ID must always hold the value of the previous VENDOR_ID, so it must be placed after v_NEW_VENDOR_ID_FLAG.
3. Create an output port as a string(8) called TRANSACTION_TYPE to tell the Transaction Control transformation whether to CONTINUE, COMMIT, or ROLLBACK. Expression: IIF(ERROR_FLAG = TRUE, 'ROLLBACK', IIF(v_NEW_VENDOR_ID_FLAG = TRUE, 'COMMIT', 'CONTINUE'))
Since we sorted to put error records at the end of each group, when we roll back, we roll back the whole group.
4. Send all output ports to tc_DIM_VENDOR_PRODUCT.
lkp_ODS_VENDOR (Lookup): Create a connected lookup to ODS.ODS_VENDOR. Create an input port for the source data field VENDOR_ID. Rename VENDOR_ID1 to VENDOR_ID_in. Set the lookup condition: VENDOR_ID = VENDOR_ID_in. Send to tc_DIM_VENDOR_PRODUCT: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE.

tc_DIM_VENDOR_PRODUCT (Transaction Control): Transaction Control Condition:
DECODE(TRANSACTION_TYPE, 'COMMIT', TC_COMMIT_BEFORE, 'ROLLBACK', TC_ROLLBACK_AFTER, 'CONTINUE', TC_CONTINUE_TRANSACTION)
If we are starting a new group, we need to commit the prior group. If we hit an error, we need to roll back the current group, including the current record.
Send to DIM_VENDOR_PRODUCT: all ports except TRANSACTION_TYPE.

Shortcut_To_DIM_VENDOR_PRODUCT (Target Table): All data without errors will be routed here. Create shortcut from the DEV_SHARED folder.
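As an illustration of the flag logic (made-up values), a sorted stream might evaluate like this:

VENDOR_ID   ERROR_FLAG   TRANSACTION_TYPE   Effect
10          FALSE        COMMIT             commits the prior group, starts group 10
10          FALSE        CONTINUE           stays in the open transaction
10          TRUE         ROLLBACK           rolls back group 10, including this row
20          FALSE        COMMIT             starts group 20

Because the Sorter places ERROR_FLAG = TRUE rows at the end of each VENDOR_ID group, a ROLLBACK discards the entire group before any of it is committed.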
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_VENDOR_PRODUCT_TC_xx, where xx is your student number. Use the mapping details described in the previous pages for guidelines.
Open your ~Developerxx folder. Create a workflow named wf_DIM_VENDOR_PRODUCT_TC_xx. Create a session named s_m_DIM_VENDOR_PRODUCT_TC_xx:
- The source file is found in the Source Files directory on the Integration Service machine.
- Verify that the source file name is PRODUCT.txt (extension required).
- Verify the target database connection value is EDWxx.
- Verify the target load type is Normal.
- Select Truncate for DIM_VENDOR_PRODUCT.
- Set the Lookup connection to ODS.
Technical Description
Instead of using a Transaction Control Transformation, route the Fatal Errors off to a Fatal Error table and route the Nonfatal Errors off to a Nonfatal table. All good data will be sent to the EDW.
Objectives
Trap all database errors and load them to a table called ERR_FATAL.
Trap the dirty data coming through from the CATEGORY field and write it to a table called ERR_NONFATAL.
Write all data without fatal or nonfatal errors to DIM_VENDOR_PRODUCT.
Duration
60 minutes
Mapping Overview
Sources
Files
File Name: PRODUCT.txt (create shortcut from DEV_SHARED folder)
File Location: the Source Files directory on the Integration Service process machine
Targets
Tables
Table Name: DIM_VENDOR_PRODUCT (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes

Table Name: ERR_NONFATAL (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Key: ERR_ID
Table Name: ERR_FATAL (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Key: ERR_ID
Lookups
Description: VENDOR_NAME, FIRST_CONTACT, and VENDOR_STATE are needed to populate DIM_VENDOR_PRODUCT.
Lookup Condition: ODS.VENDOR_ID = PRODUCT.VENDOR_ID
Selection/Filter: N/A
Return Ports: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE
Target Table: DIM_VENDOR_PRODUCT (all columns listed below)
Source Columns: derived values from lkp_ODS_VENDOR (VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE), plus PRODUCT_NAME, CATEGORY, MODEL, and PRICE from the source.
Rule (applies to every column): Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
Detailed Overview
Mapping: m_DIM_VENDOR_PRODUCT_xx

PRODUCT.txt (Flat File Source Definition): Drag in the shortcut from DEV_SHARED.

Shortcut_To_sq_PRODUCT (Source Qualifier): Source Qualifier for the flat file. Create shortcut from the DEV_SHARED folder.

exp_ERROR_TRAPPING (Expression):
- Check whether PRODUCT_CODE is NULL. Derive ISNULL_PRODUCT_CODE_out by creating an output port. Code: IIF(ISNULL(PRODUCT_CODE), 'FATAL', 'GOOD DATA')
- Check whether CATEGORY is NULL. Derive INVALID_CATEGORY_out by creating an output port. Code: IIF(ISNULL(CATEGORY), 'NONFATAL', 'GOOD DATA')
- Derive ERR_RECORD_out by creating an output port that concatenates the entire record. Use a TO_CHAR function to convert all non-strings to strings.
- Send to lkp_ODS_VENDOR: VENDOR_ID
- Send to rtr_PRODUCT_DATA: PRODUCT_CODE, ISNULL_PRODUCT_CODE_out, VENDOR_ID, CATEGORY, INVALID_CATEGORY_out, PRODUCT_NAME, MODEL, PRICE, REC_NUM, ERR_RECORD_out
lkp_ODS_VENDOR (Lookup): Create a connected lookup to ODS.ODS_VENDOR. Create an input port for the source data field VENDOR_ID. Rename VENDOR_ID1 to VENDOR_ID_in. Set the lookup condition: VENDOR_ID = VENDOR_ID_in. Send to rtr_PRODUCT_DATA: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE.

rtr_PRODUCT_DATA (Router): Create groups to route the data to different paths:
- Group NONFATAL_ERRORS. Code: INVALID_CATEGORY_out='NONFATAL'
- Group FATAL_ERRORS. Code: ISNULL_PRODUCT_CODE_out='FATAL'
- The default group will contain rows that do not match the above conditions, hence all good rows.
Ports to send to exp_ERR_NONFATAL: NONFATAL_ERRORS.PRODUCT_CODE
Ports to send to ERR_NONFATAL: NONFATAL_ERRORS.REC_NUM, NONFATAL_ERRORS.ERR_RECORD
Ports to send to exp_ERR_FATAL: FATAL_ERRORS.PRODUCT_CODE
Ports to send to ERR_FATAL: FATAL_ERRORS.REC_NUM, FATAL_ERRORS.ERR_RECORD
Ports to send to DIM_VENDOR_PRODUCT: DEFAULT.PRODUCT_CODE, DEFAULT.VENDOR_ID, DEFAULT.VENDOR_NAME, DEFAULT.VENDOR_STATE, DEFAULT.PRODUCT_NAME, DEFAULT.CATEGORY, DEFAULT.MODEL, DEFAULT.PRICE, DEFAULT.FIRST_CONTACT

exp_ERR_FATAL (Expression): Derive ERR_DESCRIPTION_out by creating an output port. Code: 'NULL VALUE IN KEY'. Derive LOAD_DATE_out by creating an output port. Code: SESSSTARTTIME. Ports to send to ERR_FATAL: LOAD_DATE_out, ERR_DESCRIPTION_out.

exp_ERR_NONFATAL (Expression): Derive ERR_DESCRIPTION_out by creating an output port. Code: 'INVALID CATEGORY'. Derive LOAD_DATE_out by creating an output port. Code: SESSSTARTTIME. Ports to send to ERR_NONFATAL: LOAD_DATE_out, ERR_DESCRIPTION_out.

Sequence Generators: one generates the ERR_ID for ERR_FATAL; another generates the ERR_ID for ERR_NONFATAL.

Targets: ERR_FATAL traps all of the FATAL errors. ERR_NONFATAL traps all NONFATAL errors. DIM_VENDOR_PRODUCT receives all good data to be loaded into the target table.
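The exact ERR_RECORD_out expression is left to you. A minimal sketch, assuming the ports listed above and simple comma delimiting (formats and delimiters may need adjusting):

ERR_RECORD_out = PRODUCT_CODE || ',' || TO_CHAR(VENDOR_ID) || ',' || CATEGORY || ',' ||
                 PRODUCT_NAME || ',' || MODEL || ',' || TO_CHAR(PRICE)

TO_CHAR converts the numeric ports so the whole record fits in one string column; PowerCenter's || operator treats a NULL operand like an empty string, so NULL columns simply appear empty in the record.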
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_VENDOR_PRODUCT_xx, where xx is your student number. Use the mapping details described in the previous pages for guidelines.
1. Open your ~Developerxx folder.
2. Create a workflow named wf_DIM_VENDOR_PRODUCT_xx.
3. Create a session named s_m_DIM_VENDOR_PRODUCT_xx. The source file is found in the Source Files directory on the Integration Service process machine.
4. Verify the source file name is PRODUCT, all uppercase, with an extension of .txt.
5. Verify the target database connection is EDWxx.
6. Change the target load type to Normal.
7. Truncate DIM_VENDOR_PRODUCT.
8. Set the Lookup connection to ODS.
Objectives
Duration
15 minutes
Instructions
Step 1: Create a Query to Search for Targets with Customer
First, you will create a query that searches for target objects with the string customer in the target name.
1. In the Designer, choose Tools > Queries. The Query Browser appears.
2. Click New to create a new query. Figure 8-4 shows the Query Editor:
Figure 8-4. Query Editor
3. In the Query Name field, enter targets_customer.
4. In the Parameter Name column, select Object Type.
5. In the Operator column, select Is Equal To.
6. In the Value 1 column, select Target Definition.
7. Click the New Parameter button. Notice that the Query Editor automatically adds an AND operator for the two parameters.
8. Edit the new parameter to search for object names that contain the text customer.
The PowerCenter Client displays a dialog box stating whether the query is valid. If the query is not valid, fix the error and validate it again.
2. Click Save. The PowerCenter Client saves the query to the repository.
3. Click Run. The Query Results window shows the results of the query you created. Your query results might include additional objects.
Some columns only apply to objects in a versioned repository, such as Version Comments, Label Name, and Purged By User.
1. Close the Query Editor, and create a new query.
2. Enter product_inventory_mapping_dependents as the query name.
3. Edit the first parameter so the object name contains product.
4. Add another parameter, and choose Include Children and Parents for the parameter name.
Note: When you search for children and parents, you enter the following information in the value columns:
Value 1: the object type(s) for the dependent objects (the children and parents).
Value 2: the object type(s) for the object(s) you are querying.
Value 3: the reusable status of the dependent object(s).
5. Click the arrow in the Value 1 column, select the following objects, and click OK:
The query returned objects in all folders in the repository. Next, you will modify the query so it only returns objects in your folder.
1. In the Query Editor, place the cursor in the last parameter and then add a new parameter.
2. Modify the parameter so it searches for folders equal to the SOLUTIONS_ADVANCED folder.
3. Validate and save the query.
4. Run the query.
Notice that even though the query says to include parent and child objects, it does not display any parent objects of the mapping. Parent objects of a mapping include sessions, worklets, and workflows. When you run a query from the Designer, the query results only display Designer objects. Similarly, when you run a query from the Workflow Manager, the query results only display Workflow Manager objects. In the next step, you will run the same query from the Repository Manager.
1. Open the Repository Manager and connect to the repository.
2. Open the Query Browser. For details on how to do this, see Create a Query to Search for Targets with Customer on page 50.
3. Select the product_inventory_mapping_dependents query, and run it by clicking Execute.
Notice that the query results show all parent (and child) objects, including Workflow Manager objects, such as workflows.
Technical Description
The session that needs to be optimized is in the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow. This session runs a mapping that reads in a flat file of order data, finds the customer market segment information, aggregates the orders, and writes the values out to a relational table. The support group needs to find the bottleneck(s), determine the cause of the bottleneck(s), and then reduce the bottleneck(s). The reduction in run time must be at least 30%.
Objectives
Use learned techniques to determine and reduce the bottleneck(s) that exist.
Duration
120 minutes
Object Locations
ProjectX folder
Workshop Details
Overview
This workshop is designed to assist the developers with the task at hand. It does not give detailed instructions on how to identify a bottleneck, determine the cause of a bottleneck or how to optimize the session/mapping. The approach to take is left entirely up to the discretion of the developers. The optimization techniques to use are also left up to the developers. The workshop will provide instructions on establishing a typical read baseline and on running the original session. The suggested steps to follow are:
1. Establish a typical read baseline
2. Run the original session
3. Identify and reduce the bottlenecks
Important: For detailed information on identifying and reducing bottlenecks, see the Performance Tuning Guide in the PowerCenter online help. To access the online help, press the F1 key in any of the PowerCenter Client tools. In the online help, click the Contents tab and expand the section for the Performance Tuning Guide.
Workshop Rules
The rules of the workshop are:
Developers must work in teams of two.
Partitioning cannot be used to optimize the session.
Data results must match the initial session run.
Think out of the box.
Ask the instructor any questions that come to mind.
1. In the Repository Manager, copy the wf_Source_Baseline_xx workflow from the ProjectX folder to your folder.
2. In the Workflow Manager, open the wf_Source_Baseline_xx workflow in your folder.
3. Edit the session named s_m_Source_Baseline_xx, and click the Mapping tab:
a. Edit the Sources node and ensure the database connection is ODS.
b. Edit the Targets node and change the Writer from Relational Writer to File Writer.
c. Change the Targets Properties for the Output and Reject filenames to include your assigned student number.
4. Save, start, and monitor the workflow.
5. Document the results in the table provided in Documented Results on page 65.
1. In the Repository Manager, copy the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow from the ProjectX folder to your folder.
2. In the Workflow Manager, edit the session named s_m_FACT_MKT_SEGMENT_ORDERS_xx located in the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow in your folder.
3. In the Mapping tab, edit the Sources node:
a. Ensure the ORDER_LINE_ITEM source filename value is daily_order_line_item.dat.
b. Ensure the ODS_INVOICE_SUMMARY database connection is ODS.
4. Edit the Targets node. Ensure the database connection is EDWxx, the Target load type is set to Normal, and the Truncate target table option is checked.
5. Save, start, and monitor the workflow.
6. Document the results in the table provided in Documented Results on page 65.
Calculates totals for quantity, revenue, and cost for market segments. Values are summarized by customer, date, market segment, region, and item. Runs on demand.
SOURCES
Tables

Table Name: daily_order_line_item
Schema/Owner: Flat File
Description: This is a daily order line item file that contains order information for customers. The file contains 1,328,667 rows of order data for August 29, 2003 and is sorted by order id. This file is joined to the ODS_INVOICE_SUMMARY relational table in order to retrieve the payment type that the customer uses. It is assumed that the customer uses the same payment type each time. The payment types are CREDIT CARD, DEBIT CARD, CASH, and CHECK. The source file is called daily_order_line_item.dat. The location for the file can be found by checking the service variable $PMSourceFileDir.

Table Name: ODS_INVOICE_SUMMARY
Schema/Owner: ODS
Description: This is a monthly summary of customer invoice data. The table contains invoice number, customer, order date, payment type, and amount. The primary key is Invoice Number. The table contains 2,686,668 rows.
TARGETS
Tables
Table Name: FACT_MKT_SEGMENT_ORDERS
Schema/Owner: EDWxx
Insert: yes
Unique Key: ORDER_KEY (system generated)
LOOKUPS
Lookup Name: lkp_ITEM_ID (Table: DIM_ITEM, Location: EDWxx)
Description: The FACT_MKT_SEGMENT_ORDERS fact table needs to have the ITEM_KEY stored on it as a foreign key. The item id contained in the source will be matched with the item id in the DIM_ITEM table to retrieve the ITEM_KEY. The cost of each item needs to be obtained from this table and used in the calculation of item costs for each row written to the target. This table contains 27 rows.
Lookup Condition: DIM_ITEM.ITEM_ID = ORDER_LINE_ITEM.ITEM_ID
Selection/Filter: N/A
Return Ports: ITEM_KEY, COST

Lookup Name: lkp_CUSTOMER_INFO (Table: DIM_CUSTOMER_PT, Location: EDWxx)
Description: The FACT_MKT_SEGMENT_ORDERS fact table needs to have the customer key stored on it as a foreign key. The CUSTOMER_ID contained in the source will be matched with the CUSTOMER_ID in the DIM_CUSTOMER_PT table to retrieve the customer key (C_CUSTKEY). The market segment of each customer is also retrieved and used in aggregate groupings. This table contains 1,000,000 rows.
Lookup Condition: DIM_CUSTOMER_PT.C_CUST_ID = ORDER_LINE_ITEM.CUSTOMER_ID
Selection/Filter: N/A
Return Ports: C_CUSTKEY, C_CUST_ID, C_MKTSEGMENT
Target Table: FACT_MKT_SEGMENT_ORDERS. All of the following columns are derived values:

MKTSEGMENT: The market segment that the customer belongs in. Obtained via a lookup to the DIM_CUSTOMER_PT dimension table.

REGION: Derived based on customer id. If the customer id is < 50000, the region is 'WEST'; >= 50000 and < 95000, 'CENTRAL'; >= 95000 and < 120000, 'SOUTH'; >= 120000 and < 200501, 'EAST'; >= 200501, 'UNKNOWN'.

ITEM_KEY: Foreign key referencing the DIM_ITEM table. Obtained via a lookup to the DIM_ITEM dimension table on the ITEM_ID column.

ORDER_COST: SUM of (COST * QUANTITY). COST is obtained via a lookup to the DIM_ITEM dimension table.
DETAILED OVERVIEW
Mapping: m_FACT_MKT_SEGMENT_ORDERS_xx

Shortcut_to_ORDER_LINE_ITEM (Source Definition): Flat file containing daily order information for each customer. Contains orders for August 29, 2003. This file contains 1,328,667 rows.

Sq_Shortcut_to_ORDER_LINE_ITEM (Source Qualifier): Flat file Source Qualifier. Sent to jnr_PAYMENT_TYPE: all ports.

Shortcut_to_ODS_INVOICE_SUMMARY (Source Definition): Relational table containing a summary of the invoices for the month. This table contains data from August 1, 2003 through August 29, 2003. The key is INVOICE_NO and the table contains 2,686,668 rows.

Sq_Shortcut_To_ODS_INVOICE_SUMMARY (Source Qualifier): Relational Source Qualifier. Sent to jnr_PAYMENT_TYPE: all ports.

jnr_PAYMENT_TYPE (Joiner): Joiner transformation that joins the ORDER_LINE_ITEM file to the ODS_INVOICE_SUMMARY table.
Master Source: ORDER_LINE_ITEM. Detail Source: ODS_INVOICE_SUMMARY.
Join Condition: ORDER_DATE = ORDER_DATE, CUSTOMER_ID = CUSTOMER_ID
Sent to lkp_ITEM_ID: ORDER_LINE_ITEM.ITEM_ID
Sent to lkp_CUSTOMER_INFO: ORDER_LINE_ITEM.CUSTOMER_ID
Sent to exp_SET_UNKNOWN_KEYS: ORDER_LINE_ITEM.ORDER_DATE, QUANTITY, PRICE; ODS_INVOICE_SUMMARY.PYMT_TYPE

lkp_ITEM_ID (Lookup): Lookup transformation that obtains item keys from the DIM_ITEM table. The DIM_ITEM table is located in the EDWxx schema. Lookup condition: ITEM_ID from DIM_ITEM = ITEM_ID from ORDER_LINE_ITEM. Sent to exp_SET_UNKNOWN_KEYS: ITEM_KEY, COST.
lkp_CUSTOMER_INFO (Lookup): Lookup transformation that obtains customer keys from the DIM_CUSTOMER_PT table. The DIM_CUSTOMER_PT table is located in the EDWxx schema. Lookup condition: CUSTOMER_ID from DIM_CUSTOMER_PT = CUSTOMER_ID from ORDER_LINE_ITEM. Sent to exp_SET_UNKNOWN_KEYS: C_CUSTKEY, C_CUST_ID, C_MKTSEGMENT.

exp_SET_UNKNOWN_KEYS (Expression): Expression transformation that sets values for missing columns (item key, mktsegment). It also defines the region the customer belongs in.
Output ports:
MKTSEGMENT_out. Formula: IIF(ISNULL(MKTSEGMENT), 'UNKNOWN', MKTSEGMENT)
ITEM_ID_out. Formula: IIF(ISNULL(ITEM_KEY), 0.00, ITEM_COST)
REGION_OUT. Formula: IIF(C_CUST_ID > 0 AND C_CUST_ID < 50000, 'WEST', IIF(C_CUST_ID >= 50000 AND C_CUST_ID < 95000, 'CENTRAL', IIF(C_CUST_ID >= 95000 AND C_CUST_ID < 120000, 'SOUTH', IIF(C_CUST_ID >= 120000 AND C_CUST_ID < 200501, 'EAST', 'UNKNOWN'))))
Sent to agg_VALUES: all output ports.
agg_VALUES (Aggregator): Aggregator transformation that calculates the revenue, quantity, and cost.
Group by ports: C_CUSTKEY, ORDER_DATE, MKTSEGMENT, REGION, ITEM_KEY
Output ports:
ORDER_QUANTITY. Formula: SUM(QUANTITY)
ORDER_REVENUE. Formula: SUM(PRICE * QUANTITY)
ORDER_COST. Formula: SUM(ITEM_COST * QUANTITY)
Sent to FACT_MKT_SEGMENT_ORDERS: all output ports.
(Sequence Generator): Sequence Generator transformation that populates the system-generated ORDER_KEY. Sent to FACT_MKT_SEGMENT_ORDERS: NEXTVAL.

Shortcut_to_FACT_MKT_SEGMENT_ORDERS (Target Definition)
Documented Results
For each of the following session runs, record the Rows Processed, Rows Failed, Start Time, End Time, Elapsed Time (Secs), and Rows Per Second:

ETL Read Baseline
Original Session
Write to Flat File Test (Target)
Filter Test (Source)
Read Mapping Test (Source or Mapping)
Filter Test (Mapping)
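Rows Per Second is Rows Processed divided by Elapsed Time. For example (illustrative numbers only), 1,328,667 rows processed in 600 seconds is roughly 2,214 rows per second. Comparing this figure across the runs above is how you quantify whether a change reduced a bottleneck.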
Technical Description
The sessions/mappings that are in need of analysis are:
s_m_Target_Bottleneck_xx. This session reads in a relational source that contains customer account balances for the year.
s_m_Items_Bottleneck_xx. This mapping reads a large flat file of item-sold data, filters out last year's stock, applies some row-level manipulation, performs a lookup to get cost information, and then loads the data into an Oracle table.
Note: The s_m_Items_Bottleneck_xx mapping is a hypothetical example. It does not exist in the repository.
s_m_Source_Bottleneck_xx. This mapping reads in one relational source that contains customer account balances and another relational source that contains customer demographic information. The two tables are joined at the database side. s_m_Mapping_Bottleneck_xx. This mapping reads in a flat file of order data, finds the customer market segment information, filters out rows that haven't sold more than one item, aggregates the orders and writes the values out to a relational table.
The support group needs to review each of these sessions to determine whether it makes sense to partition the session.
Objectives
Review the sessions and, based on knowledge gained from the presentations, determine what partitioning, if any, should be done.
Duration
60 minutes
Object Locations
ProjectX folder
Workshop Scenarios
Scenario 1
The session in question, s_m_Target_Bottleneck_xx, has been optimized already, but it is felt that more can be done. The machine that the session is running on has 32 GB of memory and 16 CPUs. The mapping takes account data from a relational source, calculates various balances, and then writes the data out to the BalanceSummary table. The BalanceSummary table is an Oracle table that the DBA has partitioned by the account_num column. Answer the following questions:
I. How many pipeline stages does this session contain?
II. What default partition points does this session contain?
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient?
IV. What partition types should be used, and where?
V. In what way will this increase performance?

(The answers are discussed in the Answers section at the end of this lab.)
2. Click the Mapping > Partitions tab to see the partition points.
3. Select each transformation and look at the window at the bottom of the screen to see what partition type is being used for that particular partition point.
Partition Test
The purpose of this section is to implement partitioning on the s_m_Target_Bottleneck_xx session.
1. Copy the wf_Target_Bottleneck_xx workflow and rename it to wf_Target_Bottleneck_Partition_xx.
2. Edit the s_m_Target_Bottleneck_xx session located in the wf_Target_Bottleneck_Partition_xx workflow and rename it to s_m_Target_Bottleneck_Partition_xx.
3. Click the Mapping tab, and then click the Partitions tab.
4. On the Partitions tab, select the Shortcut_to_BalanceSummary transformation, click the Edit Partition Point icon, and add two new partitions.
5. Select Key Range from the drop-down box and click OK.
6. Leave <**All**> selected in the Key Range drop-down menu.
7. Click Edit Keys. This allows you to define the columns that are going to be in the key range.
8. Add the Account_num column to the Key Range and select OK.
9. Input the following ranges for the 3 partitions:
Partition #1 - start range 1, end range 3500
Partition #2 - start range 3500, end range 7000
Partition #3 - start range 7000
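Although consecutive ranges share a boundary value (3500 appears as both an end and a start), they do not overlap: in PowerCenter key-range partitioning the start of a range is inclusive and the end is non-inclusive, so an account number of 3500 falls only into Partition #2. Partition #3's open end catches all values from 7000 up.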
10. Select the SQ_Shortcut_to_Source2 partition point and edit the partition point.
11. Select Key Range from the drop-down box.
12. Add the Account_num column to the Key Range and select OK.
13. Input the following ranges for the 3 partitions:
Partition #1 - start range 1, end range 3500
Partition #2 - start range 3500, end range 7000
Partition #3 - start range 7000
14. Save, start, and monitor the workflow.
15. Compare the results against the original session results and against the indexed session results. Is there a performance gain?
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 2
Note: The mapping shown in this scenario is a hypothetical example. It does not exist in the repository.
The session in question, s_m_Items_Bottleneck_xx, has been running slowly, and the project manager wants it optimized. The machine that this is running on has 8 GB of memory and 4 CPUs. The mapping takes items-sold data from a large flat file, transforms it, and writes out to an Oracle table. The flat file comes from one location and splitting it up is not an option. The second Expression transformation is very complex and takes a long time to push the rows through.
Mapping Overview
Answer the same five questions (I-V) as in Scenario 1.
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 3
The session in question, s_m_Source_Bottleneck_xx, has been running slowly, and the project manager wants it optimized. The machine that this is running on has 2 GB of memory and 2 CPUs. The mapping reads one relational source that contains customer account balances and another relational source that contains customer demographic information. The tables are joined on the database side; the rows are then pushed through an Expression transformation and loaded into an Oracle table.
Mapping Overview
Answer the same five questions (I-V) as in Scenario 1.
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 4
The session in question, s_m_Mapping_Bottleneck_Sorter_xx, is still not running quite as fast as is needed. The machine that this is running on has 24 GB of memory and 16 CPUs. The mapping reads a flat file source that is really 3 region-specific flat files being read from a file list. The rows are then passed through two lookups to obtain item costs and customer information. The data is then sorted and aggregated before being loaded into an Oracle table. The customer is part of the sort key, and the DBA has partitioned the Oracle table by customer_key. What can be done to further optimize this session/mapping?
Mapping Overview
Answer the same five questions (I-V) as in Scenario 1.
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Answers
Scenario 1
I. How many pipeline stages does this session contain? 3.
II. What default partition points does this session contain? Source Qualifier and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? Yes.
IV. What partition types should be used, and where? Key Range at both the source and the target.
V. In what way will this increase performance? This will add multiple connections to the source and target, which will result in data being read concurrently. This will be faster.
Scenario 2
I. How many pipeline stages does this session contain? 3.
II. What default partition points does this session contain? Source Qualifier and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? Yes.
IV. What partition types should be used, and where? An additional pass-through partition point at the exp_complex_calculations transformation.
V. In what way will this increase performance? This will add one more pipeline stage, which in turn will give you an additional buffer to move data.
Scenario 3
I. How many pipeline stages does this session contain? 3.
II. What default partition points does this session contain? Source Qualifier and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? No. Each partition takes between 1 and 2 CPUs, and this machine has only 2 CPUs.
IV. What partition types should be used, and where? N/A.
V. In what way will this increase performance? N/A.
Scenario 4
I. How many pipeline stages does this session contain? 4.
II. What default partition points does this session contain? Source Qualifier, Aggregator, and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? Yes.
IV. What partition types should be used, and where? 3 partitions: Key Range at the target; split the source into the 3 region-specific files and read each one into one of the partitions; and Hash Auto Keys at the Sorter transformation, which also allows you to remove the partition point at the Aggregator if you like.
V. In what way will this increase performance? Additional connections at the target will load faster. You need to split the source flat file into the 3 region-specific files because you can have only one connection open to a flat file. The Hash Auto Keys partition point is required to make sure that there is no overlap at the Aggregator. If the flat files vary significantly in size, you may want to add a round-robin partition point somewhere, but in this particular mapping that would not make sense.