Informatica PowerCenter Level II Developer Lab Guide Version 8.1 September 2006
Table of Contents
Preface (page vii)
  About This Guide (page viii)
    Document Conventions (page viii)
  Other Informatica Resources (page ix)
    Obtaining Informatica Documentation (page ix)
    Visiting Informatica Customer Portal (page ix)
    Visiting the Informatica Web Site (page ix)
    Visiting the Informatica Developer Network (page ix)
    Visiting the Informatica Knowledge Base (page ix)
    Obtaining Informatica Professional Certification (page ix)
    Providing Feedback (page x)
    Obtaining Technical Support (page x)
Preface
Welcome to PowerCenter, Informatica's software product that delivers an open, scalable data integration solution addressing the complete life cycle of data integration projects, including data warehouses and data marts, data migration, data synchronization, and information hubs. PowerCenter combines the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable, and efficient manner. The PowerCenter metadata repository coordinates and drives a variety of core functions, including extracting, transforming, loading, and managing data. The Integration Service can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads. PowerCenter can simplify and accelerate the process of moving data warehouses from development to test to production.
Document Conventions
This guide uses the following formatting conventions:
If you see: >
It means: Indicates a submenu to navigate to.
Example: Click Repository > Connect. In this example, you should click the Repository menu or button and choose Connect.

If you see: boldfaced text
It means: Indicates text you need to type or enter.
Example: Click the Rename button and name the new source definition S_EMPLOYEE.

If you see: UPPERCASE text
It means: Database tables and column names are shown in all UPPERCASE.
Example: T_ITEM_SUMMARY

If you see: italicized text
It means: Indicates a variable you must replace with specific information.
Example: Connect to the Repository using the assigned login_id.

If you see: Note:
It means: The following paragraph provides additional facts.
Example: Note: You can select multiple objects to import by using the Ctrl key.

If you see: Tip:
It means: The following paragraph provides suggested uses or a Velocity best practice.
Example: Tip: The m_ prefix for a mapping name is ...
Other Informatica resources include: Informatica Documentation, the Informatica Customer Portal, the Informatica web site, the Informatica Developer Network, the Informatica Knowledge Base, Informatica Professional Certification, and Informatica Technical Support.
The site contains information on how to create, market, and support customer-oriented add-on solutions based on interoperability interfaces for Informatica products.
Providing Feedback
Email any comments on this guide to aconlan@informatica.com.
support@informatica.com for technical inquiries
support_admin@informatica.com for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
North America / South America
Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, California 94063, United States
Toll Free: 877 463 2435
Standard Rate: United States: 650 385 5800

Europe / Middle East / Africa
Informatica Software Ltd.
6 Waltham Park, Waltham Road, White Waltham
Maidenhead, Berkshire SL6 3TN, United Kingdom
Toll Free: 00 800 4632 4357
Standard Rate: Belgium: +32 15 281 702; France: +33 1 41 38 92 26; Germany: +49 1805 702 702; Netherlands: +31 306 022 797; United Kingdom: +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd.
301 & 302 Prestige Poseidon, 139 Residency Road
Bangalore 560 025, India
Toll Free: Australia: 00 11 800 4632 4357; Singapore: 001 800 4632 4357
Standard Rate: India: +91 80 5112 5738
Objectives
Use a dynamic lookup cache to update and insert rows in a customer table
Use a Router transformation to route rows based on the NewLookupRow value
Use an Update Strategy transformation to flag rows for update or insert
Duration
45 minutes
Mapping Overview
Update the existing customer list with new and updated information. Runs on demand. Unique key: CUST_ID.
Sources
Files
File Name: updated_customer_list.txt (create shortcut from DEV_SHARED folder)
File Location: the Source Files directory on the Integration Service process machine
Targets
Tables
Table Name: CUSTOMER_LIST (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Keys: PK_KEY, CUST_ID
Detailed Overview
Repository objects, with descriptions and instructions:

m_DYN_update_customer_list_xx (Mapping)

Shortcut_to_updated_customer_list (Source Definition): Flat file in the $PMSourceFileDir directory. Create shortcut from the DEV_SHARED folder.

SQ_Shortcut_to_updated_customer_list (Source Qualifier): Connect to the input/output ports of the Lookup transformation, LKP_CUSTOMER_LIST.

LKP_CUSTOMER_LIST (Lookup): Lookup transformation based on the target definition Shortcut_to_CUSTOMER_LIST and the target table CUSTOMER_LIST.
- Change the input/output port names: prepend them with IN_.
- Use dynamic caching.
- Define the lookup condition using the customer ID ports.
- Configure the Lookup properties so it inserts new rows and updates existing rows (Insert Else Update).
- Ignore NULL inputs for all lookup/output ports except CUST_ID and PK_KEY.
- For each lookup/output port, associate the input/output port with the similar name.
- PK_KEY must be an integer in order to specify Sequence-ID as the Associated Port.
- Connect the NewLookupRow port and all lookup/output ports to RTR_Insert_Update.

RTR_Insert_Update (Router): Create two output groups with the following names:
- UPDATE_EXISTING: Condition is NewLookupRow=2. Connect output ports to UPD_Update_Existing.
- INSERT_NEW: Condition is NewLookupRow=1. Connect output ports to UPD_Insert_New.
Do not connect any of the NewLookupRow ports to any transformation. Do not connect the Default output group ports to any transformation.

UPD_Insert_New (Update Strategy): Update Strategy Expression DD_INSERT. Connect all input/output ports to CUSTOMER_LIST_Insert.

UPD_Update_Existing (Update Strategy): Update Strategy Expression DD_UPDATE. Connect all input/output ports to CUSTOMER_LIST_Update.

CUSTOMER_LIST_Insert (Target Definition): First instance of the target table definition in the EDWxx schema. Create shortcut from the DEV_SHARED folder of the CUSTOMER_LIST target definition. In the mapping, rename the target instance to CUSTOMER_LIST_Insert.

CUSTOMER_LIST_Update (Target Definition): Second instance of the target table definition in the EDWxx schema. Create shortcut from the DEV_SHARED folder of the CUSTOMER_LIST target definition. In the mapping, rename the target instance to CUSTOMER_LIST_Update.
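For reference, the NewLookupRow port that dynamic caching adds follows standard PowerCenter behavior: 0 means the Integration Service made no change to the cache for the row, 1 means it inserted the row into the cache, and 2 means it updated the row in the cache. The Router groups above key off these values; rows with NewLookupRow=0 fall into the unconnected Default group and are dropped.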
Instructions
Step 1: Create Mapping
1. Connect to the PC8A_DEV repository using Developerxx as the user name and developerxx as the password.
2. Create a mapping called m_DYN_update_customer_list_xx, where xx is your student number. Use the mapping details described in Detailed Overview on page 3 for guidelines.

Figure 1-1 shows an overview of the mapping you must create:
Figure 1-1. m_DYN_update_customer_list_xx Mapping
1. In the m_DYN_update_customer_list_xx mapping, preview the target data to view the rows that exist in the table.
2. Use the ODBC_EDW ODBC connection to connect to the target database. Use EDWxx as the user name and password.
Figure 1-2. Preview Target Data for CUSTOMER_LIST Table Before Session Run
Open updated_customer_list.txt in a text editor. The updated_customer_list.txt source file contains the following data:
CUST_ID,FIRSTNAME,LASTNAME,ADDRESS,CITY,STATE,ZIP 67001,Thao,Nguyen,1200 Broadway Ave,Burlingame,CA,94010 67002,Maria,Gomez,390 Stelling Ave,Cupertino,CA,95014 67003,Jean,Carlson,555 California St,Menlo Park,CA,94025 67004,Chris,Park,13450 Saratoga Ave,Santa Clara,CA,95051 55002,Anish,Desai,400 W Pleasant View Ave,Hackensack,NJ,07601 55006,Bianco,Lo,900 Seville Dr,Clarkston,GA,30021 55003,Janice,MacIntosh,,,, 67003,Jean,Carlson,120 Villa St,Mountain View,CA,94043
3. Notice that the row for customer ID 55003 contains some NULL values. You do not want to insert the NULL values into the target; you only want to update the other column values in the target.
4. Notice that the file contains two rows with customer ID 67003. Because of this, you must use a dynamic cache for the Lookup transformation.
5. Close the file.
Open the Workflow Manager and open your ~Developerxx folder. Create a workflow named wf_DYN_update_customer_list_xx. Create a session named s_m_DYN_update_customer_list_xx using the m_DYN_update_customer_list_xx mapping. In the session:
- Verify that the target connection is EDWxx.
- Verify that the Target load type is set to Normal and the Truncate target table option is not checked.
- Verify that the specified source file name is updated_customer_list.txt and the specified location is $PMSourceFileDir.
Preview the target data from the mapping to verify the results. Figure 1-3 shows the Preview Data dialog box for the CUSTOMER_LIST table:
Figure 1-3. Preview Target Data for CUSTOMER_LIST Table After Session Run
Look at customer ID 55003. It should not contain any NULLs. Look at customer ID 67003. It should contain data from the last row for customer ID 67003 in the source file.
Technical Description
A worklet will be created with a worklet variable that defines the time the workflow started plus one hour. A Timer task will be created in the worklet to wait for one hour before sending an email. If the session runs for less than an hour, a Control task will be issued to stop the timer.
Objectives
Create a Workflow
Create a Worklet
Create a Timer Task
Create an Email Task
Create a Control Task
Create a condition to control the Email Task
Duration
30 minutes
Worklet Overview
Workflow Overview
Instructions
Step 1: Setup
Connect to the PC8A_DEV repository in the Designer and Workflow Manager.
Mappings: m_DIM_CUSTOMER_ACCT_xx, m_DIM_CUSTOMER_ACCT_STATUS_xx
Sessions: s_m_DIM_CUSTOMER_ACCT_xx, s_m_DIM_CUSTOMER_ACCT_STATUS_xx
Create a Worklet called wl_DIM_CUSTOMER_ACCT_LOAD_xx. Open the Worklet and create the following tasks.
Create a Timer task and name it tim_SESSION_RUN_TIME. Edit the Timer task and click the Timer tab. Select the Relative time: radio button.
4. Select Start after 1 Hour from the start time of this task.
Create an Email task and name it eml_SESSION_RUN_TIME. Click the Properties tab. For the Email User Name, type administrator@anycompany.com. For the Email Subject, type: session s_m_DIM_CUSTOMER_ACCT_xx ran an hour or longer. For the Email Text, type an appropriate message.
Create a Control task and name it ctrl_STOP_SESS_TIMEOUT. Edit the Control task and click the Properties tab. Set the Control Option attribute to Stop parent.
Add s_m_DIM_CUSTOMER_ACCT_xx to wl_DIM_CUSTOMER_ACCT_LOAD_xx:
- Verify source connections are ODS and the source file name is customer_type.txt.
- Verify target connections are EDWxx.
- Verify lookup connections are valid: DIM tables to EDWxx, ODS tables to ODS.
- Truncate the target table.
- Ensure the Target Load Type is Normal.
Link Start to tim_SESSION_RUN_TIME and s_m_DIM_CUSTOMER_ACCT_xx.
Link tim_SESSION_RUN_TIME to eml_SESSION_RUN_TIME.
Link s_m_DIM_CUSTOMER_ACCT_xx to ctrl_STOP_SESS_TIMEOUT.
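The objectives call for a condition to control the Email task. A minimal sketch, assuming the standard predefined task variables (the exact condition in the solution may differ): on the link from tim_SESSION_RUN_TIME to eml_SESSION_RUN_TIME, set the link condition

$s_m_DIM_CUSTOMER_ACCT_xx.Status != SUCCEEDED

so the email is sent only if the session is still running when the timer expires.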
Add s_m_DIM_CUSTOMER_ACCT_STATUS_xx to wf_DIM_CUSTOMER_ACCT_LOAD_xx:
- Verify source connections are ODS and the source file name is customer_type.txt.
- Verify target connections are EDWxx.
- Verify lookup connections are valid: DIM tables to EDWxx, ODS tables to ODS.
- Truncate the target table.
- Ensure the Target Load Type is Normal.
In the Workflow Monitor, click the Filter Tasks button in the toolbar, or select Filters > Tasks from the menu. Make sure all of the tasks are shown, then run your workflow and observe the tasks in the Task View.
Technical Description
Use workflow variables to calculate when the session starts. The session must start at the top of an hour, on or after 6 a.m. and before 6 p.m. To accomplish this, the workflow will run continuously.
Objectives
Create and use workflow variables
Create an Assignment Task
Create a Timer Task
Duration
30 minutes
Workflow Overview
Instructions
Step 1: Setup
Connect to PC8A_DEV Repository in the Designer and Workflow Manager.
Mapping: m_SALES_DEPARTMENT_xx
Session: s_m_SALES_DEPARTMENT_xx
2. Add the reusable session s_m_SALES_DEPARTMENT_xx to the workflow.
3. Verify that the Source Database Connection is ODS, the Target Database Connection is EDWxx, and the Target Load Type is Normal.
5. Create a Timer task called tim_SALES_DEPARTMENT_START. Edit the Timer task and click the Timer tab. Select the Absolute time radio button, then select the "Use this workflow date-time variable to calculate the wait" radio button. Click the ellipsis to browse variables, double-click wf_SALES_DEPARTMENT_xx, and select $$NEXT_START_TIME as the workflow variable. Save.
Note: The above functions could be nested together in one assignment expression if desired.
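The Assignment task asgn_SALES_DEPARTMENT_START_TIME (referenced in the next step) sets the three workflow variables. A sketch of expressions that would produce the log values shown later in this lab, using standard PowerCenter functions (the exact solution may differ):

$$TRUNC_START_TIME = TRUNC(SYSDATE, 'HH')
$$HOUR_STARTED = GET_DATE_PART(SYSDATE, 'HH')
$$NEXT_START_TIME = IIF($$HOUR_STARTED >= 17,
    ADD_TO_DATE(TRUNC(SYSDATE, 'DD'), 'HH', 30),
    IIF($$HOUR_STARTED < 5,
        ADD_TO_DATE(TRUNC(SYSDATE, 'DD'), 'HH', 6),
        ADD_TO_DATE($$TRUNC_START_TIME, 'HH', 1)))

The 30-hour offset from midnight yields 6 a.m. the following day whenever the next top of the hour would fall on or after 6 p.m.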
2. Create a link from asgn_SALES_DEPARTMENT_START_TIME to tim_SALES_DEPARTMENT_START.
3. Create a link from tim_SALES_DEPARTMENT_START to s_m_SALES_DEPARTMENT_xx.
4. Save the repository.
5. Edit the workflow wf_SALES_DEPARTMENT_xx. Click the Scheduler tab and verify that the scheduler is Non Reusable.
6. Edit the schedule. Click the Schedule tab and click Run Continuously.
7. Click OK.
8. Click OK.
9. Save the repository. This will start the workflow.
Note: Notice that the assignment task has already executed and the timer task is running.
2. Browse the workflow log.
3. Verify the results of the assignment expressions in the log file. Examples:
Variable [$$TRUNC_START_TIME], Value [05/23/2004 16:00:00]. Variable [$$HOUR_STARTED], Value [16]. Variable [$$NEXT_START_TIME], Value [05/23/2004 17:00:00].
4. Verify the Load Manager message that tells when the timer task will complete. An example message:
INFO : LM_36606 [Sun May 23 16:05:02 2004] : (2288|2004) Timer task instance [TM_SALES_DEPARTMENT_START]: The timer will complete at [Sun May 23 17:00:00 2004].
5. At or near the top of the hour, open the Workflow Monitor to check the status of the session.
6. Verify that the session started at the desired time.
7. After the session completes, notice that the workflow automatically starts again.
8. If the workflow starts after 5 p.m., the timer message in the workflow log will show that the timer will end at 6 a.m. the following morning. For example:
INFO : LM_36608 [Sun May 23 17:00:25 2004] : (2288|2392) Timer task instance [TM_SALES_DEPARTMENT_START]: Timer task specified to wait until absolute time [Mon May 24 06:00:00 2004], specified by variable [$$NEXT_START_TIME].
INFO : LM_36606 [Sun May 23 17:00:25 2004] : (2288|2392) Timer task instance [TM_SALES_DEPARTMENT_START]: The timer will complete at [Mon May 24 06:00:00 2004].

9. Stop or abort the workflow at any time.
10. Afterwards, edit the workflow scheduler and select Run On Demand. Save the repository.
Objectives
Configure a mapping, session, and workflow for recovery.
Recover a suspended workflow.
Duration
30 minutes
Instructions
Step 1: Copy the Workflow
1. Open the Repository Manager.
2. Copy the wkf_Stage_Customer_Contacts_xx workflow from the SOLUTIONS_ADVANCED folder to your folder.
3. In the Workflow Manager, open the wkf_Stage_Customer_Contacts_xx workflow.
4. Rename the workflow to replace xx with your student number.
5. Rename the session in the workflow to replace xx with your student number.
6. Save the workflow.
Step 2: Configure the Workflow and Session for Recovery

1. Open the wkf_Stage_Customer_Contacts_xx workflow.
2. Edit the workflow, and on the General tab, select Suspend on Error.
3. Edit the s_m_Stage_Customer_Contacts_xx session and click the Properties tab.
4. Scroll to the end of the General Options settings and select Resume from last checkpoint for the Recovery Strategy.
5. Click the Mapping tab and change the target load type to Normal.
Note: When you configure a session for bulk load, the session is not recoverable using the resume recovery strategy. You must use normal load.
Step 3: Make the Session Fail

1. Edit the s_m_Stage_Customer_Contacts_xx session, and click the Mapping tab. The source in the mapping uses a file list, customer_list.txt. To make the session encounter an error, you will change the value in the Source Filename session property.
2. On the Sources node, change the source file name to customer_list1234.txt.
3. Click the Config Object tab.
4. In the Error Handling settings, configure the session to stop on one error.
Step 4: Run the Workflow, Fix the Session, and Recover the Workflow
1. Run the workflow. The Workflow Monitor shows that the Integration Service suspends the workflow and fails the session.
3. Notice that the Integration Service failed the session. Next, you will fix the session.
4. In the Workflow Manager, edit the session.
5. On the Mapping tab, enter customer_list.txt as the source file name.
6. Save the workflow.
7. In the Workflow Manager, right-click the workflow and choose Recover Workflow. The Workflow Monitor shows that the Integration Service is running the workflow and that the session is running as a recovery run.
When the session and workflow complete, the Workflow Monitor shows that the session completed successfully as a recovery run.
Open the session log. Search for the message "session run completed with failure".
Notice that the Integration Service continues to write log events to the same session log.
Technical Description
A flag will be created to tell PowerCenter when a new set of invoice numbers is found. A Transaction Control transformation will be created to tell the database when to issue a commit.
Objectives
Create a flag to check for new INVOICE_NOs
Commit upon seeing a new set of INVOICE_NOs
Duration
45 minutes
Mapping Overview
Sources
Tables
Table Name: ODS_LINE_ITEM (create shortcut from DEV_SHARED folder)
Schema/Owner: ODS
Selection/Filter: none
Targets
Tables
Table Name: DIM_LINE_ITEM (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Key: LINE_ITEM_NO
Detailed Overview
Mapping: m_DIM_LINE_ITEM_xx

ODS_LINE_ITEM (Source Definition): Table source definition in the ODS schema. Create shortcut from the DEV_SHARED folder.

Shortcut_to_sq_ODS_LINE_ITEM (Source Qualifier): Send to srt_DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST.

srt_DIM_LINE_ITEM (Sorter): Sort by INVOICE_NO. Send INVOICE_NO to exp_DIM_LINE_ITEM. Send to tc_DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST.

exp_DIM_LINE_ITEM (Expression): Uncheck the 'O' (output) on INVOICE_NO so that it is input-only.
- Create a variable called v_PREVIOUS_INVOICE_NO as a decimal (10,0) to house the value of the previous row's INVOICE_NO. Expression: INVOICE_NO
- Create a variable called v_NEW_INVOICE_NO_FLAG as an integer to flag whether the current row's INVOICE_NO is the same as the previous row's INVOICE_NO. Expression: IIF(INVOICE_NO=v_PREVIOUS_INVOICE_NO, 0, 1)
- Move v_NEW_INVOICE_NO_FLAG above v_PREVIOUS_INVOICE_NO. (Variable ports evaluate top to bottom, so the flag must be computed before v_PREVIOUS_INVOICE_NO is overwritten with the current row's value.)
- Create an output port called NEW_INVOICE_NO_FLAG_out as an integer to hold the value of the flag. Expression: v_NEW_INVOICE_NO_FLAG
- Send NEW_INVOICE_NO_FLAG_out to tc_DIM_LINE_ITEM.

tc_DIM_LINE_ITEM (Transaction Control): On the Ports tab, delete the _out from NEW_INVOICE_NO_FLAG_out. On the Properties tab, enter the following Transaction Control Condition:
IIF(NEW_INVOICE_NO_FLAG=1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
Send to DIM_LINE_ITEM: LINE_ITEM_NO, INVOICE_NO, PRODUCT_CODE, QUANTITY, DISCOUNT, PRICE, COST.

Shortcut_to_DIM_LINE_ITEM (Target Definition): Target definition in the EDWxx schema. Create a shortcut from the DEV_SHARED folder.
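To see how the flag drives commits, trace a sorted stream of invoice numbers (illustrative values):

INVOICE_NO   v_PREVIOUS_INVOICE_NO   NEW_INVOICE_NO_FLAG   Transaction Control action
100          (initial)               1                     TC_COMMIT_BEFORE
100          100                     0                     TC_CONTINUE_TRANSACTION
101          100                     1                     TC_COMMIT_BEFORE
101          101                     0                     TC_CONTINUE_TRANSACTION

Each time the invoice number changes, the rows for the previous invoice are committed as one transaction. The commit-before on the very first row is harmless because nothing has been written yet.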
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_LINE_ITEM_xx, where xx is your student number. Use the mapping details described in the previous pages for guidelines.
Open your ~Developerxx folder. Create a workflow named wf_DIM_LINE_ITEM_xx. Create a session named s_m_DIM_LINE_ITEM_xx. In the session, edit the Mapping tab and expand the Sources node. Under Connections, verify that the Connection Value is ODS. Expand the Targets node and verify that the Connection value is correct, the Target load type is set to Normal, and the Truncate target table option is checked.
Technical Description
Records will be committed when a new group of VENDOR_IDs comes in. This will require a flag to be set to determine whether a VENDOR_ID is new or not. Rows will need to be rolled back if an error occurs. An error flag will be set when a business rule is violated.
Objectives
Use a Transaction Control transformation to commit based upon vendor IDs and issue a rollback based upon errors.
Duration
60 minutes
Mapping Overview
Sources
Files
File Name: PRODUCT.txt (create shortcut from DEV_SHARED folder)
File Location: the Source Files directory on the Integration Service process machine
Targets
Tables
Table Name: DIM_VENDOR_PRODUCT (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Lookups
Description: VENDOR_NAME, FIRST_CONTACT, and VENDOR_STATE are needed to populate DIM_VENDOR_PRODUCT.
Lookup Condition: ODS.VENDOR_ID = PRODUCT.VENDOR_ID
Selection/Filter: N/A
Return Ports: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE
Detailed Overview
Mapping: m_DIM_VENDOR_PRODUCT_TC_xx

PRODUCT.txt (Source Definition): Drag in the shortcut from DEV_SHARED.

Sq_Shortcut_To_PRODUCT (Source Qualifier): Source Qualifier for the flat file. Send to exp_SET_ERROR_FLAG: PRODUCT_CODE, VENDOR_ID, CATEGORY, PRODUCT_NAME, MODEL, PRICE.

exp_SET_ERROR_FLAG (Expression): Output port: ERROR_FLAG. Expression: IIF(ISNULL(PRODUCT_CODE) OR ISNULL(CATEGORY), TRUE, FALSE). Send all output ports to srt_VENDOR_ID.

srt_VENDOR_ID (Sorter): Sort data ascending by VENDOR_ID and ERROR_FLAG. This puts any error records at the end of each group. Send all ports to exp_SET_TRANS_TYPE. Send VENDOR_ID to lkp_ODS_VENDOR.
exp_SET_TRANS_TYPE (Expression):
1. Create a variable called v_PREV_VENDOR_ID as a decimal with a precision of 10 to house the value of the previous vendor. Expression: VENDOR_ID
2. Create a variable port called v_NEW_VENDOR_ID_FLAG as an integer to check whether the current VENDOR_ID is new. Expression: IIF(VENDOR_ID != v_PREV_VENDOR_ID, TRUE, FALSE)
Variables can be used to remember values across rows. v_PREV_VENDOR_ID must always hold the value of the previous VENDOR_ID, so it must be placed after v_NEW_VENDOR_ID_FLAG.
3. Create an output port as a string(8) called TRANSACTION_TYPE to tell the Transaction Control transformation whether to CONTINUE, COMMIT, or ROLLBACK. Expression: IIF(ERROR_FLAG = TRUE, 'ROLLBACK', IIF(v_NEW_VENDOR_ID_FLAG = TRUE, 'COMMIT', 'CONTINUE'))
Since we sorted to put error records at the end of each group, when we roll back, we roll back the whole group.
4. Send all output ports to tc_DIM_VENDOR_PRODUCT.
lkp_ODS_VENDOR (Lookup): Create a connected lookup to ODS.ODS_VENDOR. Create an input port for the source data field VENDOR_ID. Rename VENDOR_ID1 to VENDOR_ID_in. Set the lookup condition: VENDOR_ID = VENDOR_ID_in. Send to tc_DIM_VENDOR_PRODUCT: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE.

tc_DIM_VENDOR_PRODUCT (Transaction Control): Transaction Control Condition:
DECODE(TRANSACTION_TYPE, 'COMMIT', TC_COMMIT_BEFORE, 'ROLLBACK', TC_ROLLBACK_AFTER, 'CONTINUE', TC_CONTINUE_TRANSACTION)
If we are starting a new group, we need to commit the prior group. If we hit an error, we need to roll back the current group, including the current record.
Send to DIM_VENDOR_PRODUCT: all ports except TRANSACTION_TYPE.

Shortcut_To_DIM_VENDOR_PRODUCT (Target Table): All data without errors will be routed here. Create shortcut from the DEV_SHARED folder.
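As an illustration of the flag logic (made-up values), a sorted stream might evaluate like this:

VENDOR_ID   ERROR_FLAG   TRANSACTION_TYPE   Effect
10          FALSE        COMMIT             commits the prior group, starts group 10
10          FALSE        CONTINUE           stays in the open transaction
10          TRUE         ROLLBACK           rolls back group 10, including this row
20          FALSE        COMMIT             starts group 20

Because the Sorter places ERROR_FLAG = TRUE rows at the end of each VENDOR_ID group, a ROLLBACK discards the entire group before any of it is committed.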
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_VENDOR_PRODUCT_TC_xx, where xx is your student number. Use the mapping details described in the previous pages for guidelines.
Open your ~Developerxx folder. Create a workflow named wf_DIM_VENDOR_PRODUCT_TC_xx. Create a session named s_m_DIM_VENDOR_PRODUCT_TC_xx:
- The source file is found in the Source Files directory on the Integration Service machine.
- Verify that the source file name is PRODUCT.txt (extension required).
- Verify the target database connection value is EDWxx.
- Verify the target load type is Normal.
- Select Truncate for DIM_VENDOR_PRODUCT.
- Set the Lookup connection to ODS.
Technical Description
Instead of using a Transaction Control Transformation, route the Fatal Errors off to a Fatal Error table and route the Nonfatal Errors off to a Nonfatal table. All good data will be sent to the EDW.
Objectives
Trap all database errors and load them to a table called ERR_FATAL.
Trap the dirty data coming through from the CATEGORY field and write it to a table called ERR_NONFATAL.
Write all data without fatal or nonfatal errors to DIM_VENDOR_PRODUCT.
Duration
60 minutes
Mapping Overview
Sources
Files
File Name: PRODUCT.txt (create shortcut from DEV_SHARED folder)
File Location: the Source Files directory on the Integration Service process machine
Targets
Tables
Table Name: DIM_VENDOR_PRODUCT (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes

Table Name: ERR_NONFATAL (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Key: ERR_ID
Table Name: ERR_FATAL (create shortcut from DEV_SHARED folder)
Schema/Owner: EDWxx
Insert: yes
Unique Key: ERR_ID
Lookups
Description: VENDOR_NAME, FIRST_CONTACT, and VENDOR_STATE are needed to populate DIM_VENDOR_PRODUCT.
Lookup Condition: ODS.VENDOR_ID = PRODUCT.VENDOR_ID
Selection/Filter: N/A
Return Ports: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE
Target Table: DIM_VENDOR_PRODUCT (all columns listed below)
Source Columns: derived values from lkp_ODS_VENDOR (VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE), plus PRODUCT_NAME, CATEGORY, MODEL, and PRICE from the source.
Rule (applies to every column): Rows must have a non-null PRODUCT_CODE and a valid CATEGORY.
Detailed Overview
Mapping: m_DIM_VENDOR_PRODUCT_xx

PRODUCT.txt (Flat File Source Definition): Drag in the shortcut from DEV_SHARED.

Shortcut_To_sq_PRODUCT (Source Qualifier): Source Qualifier for the flat file. Create shortcut from the DEV_SHARED folder.

exp_ERROR_TRAPPING (Expression):
- Check whether PRODUCT_CODE is NULL. Derive ISNULL_PRODUCT_CODE_out by creating an output port. Code: IIF(ISNULL(PRODUCT_CODE), 'FATAL', 'GOOD DATA')
- Check whether CATEGORY is NULL. Derive INVALID_CATEGORY_out by creating an output port. Code: IIF(ISNULL(CATEGORY), 'NONFATAL', 'GOOD DATA')
- Derive ERR_RECORD_out by creating an output port that concatenates the entire record. Use a TO_CHAR function to convert all non-strings to strings.
- Send to lkp_ODS_VENDOR: VENDOR_ID
- Send to rtr_PRODUCT_DATA: PRODUCT_CODE, ISNULL_PRODUCT_CODE_out, VENDOR_ID, CATEGORY, INVALID_CATEGORY_out, PRODUCT_NAME, MODEL, PRICE, REC_NUM, ERR_RECORD_out
lkp_ODS_VENDOR (Lookup): Create a connected lookup to ODS.ODS_VENDOR. Create an input port for the source data field VENDOR_ID. Rename VENDOR_ID1 to VENDOR_ID_in. Set the lookup condition: VENDOR_ID = VENDOR_ID_in. Send to rtr_PRODUCT_DATA: VENDOR_NAME, FIRST_CONTACT, VENDOR_STATE.

rtr_PRODUCT_DATA (Router): Create groups to route the data to different paths:
- Group NONFATAL_ERRORS. Code: INVALID_CATEGORY_out='NONFATAL'
- Group FATAL_ERRORS. Code: ISNULL_PRODUCT_CODE_out='FATAL'
- The default group will contain rows that do not match the above conditions, hence all good rows.
Ports to send to exp_ERR_NONFATAL: NONFATAL_ERRORS.PRODUCT_CODE
Ports to send to ERR_NONFATAL: NONFATAL_ERRORS.REC_NUM, NONFATAL_ERRORS.ERR_RECORD
Ports to send to exp_ERR_FATAL: FATAL_ERRORS.PRODUCT_CODE
Ports to send to ERR_FATAL: FATAL_ERRORS.REC_NUM, FATAL_ERRORS.ERR_RECORD
Ports to send to DIM_VENDOR_PRODUCT: DEFAULT.PRODUCT_CODE, DEFAULT.VENDOR_ID, DEFAULT.VENDOR_NAME, DEFAULT.VENDOR_STATE, DEFAULT.PRODUCT_NAME, DEFAULT.CATEGORY, DEFAULT.MODEL, DEFAULT.PRICE, DEFAULT.FIRST_CONTACT

exp_ERR_FATAL (Expression): Derive ERR_DESCRIPTION_out by creating an output port. Code: 'NULL VALUE IN KEY'. Derive LOAD_DATE_out by creating an output port. Code: SESSSTARTTIME. Ports to send to ERR_FATAL: LOAD_DATE_out, ERR_DESCRIPTION_out.

exp_ERR_NONFATAL (Expression): Derive ERR_DESCRIPTION_out by creating an output port. Code: 'INVALID CATEGORY'. Derive LOAD_DATE_out by creating an output port. Code: SESSSTARTTIME. Ports to send to ERR_NONFATAL: LOAD_DATE_out, ERR_DESCRIPTION_out.

Sequence Generators: one generates the ERR_ID for ERR_FATAL; another generates the ERR_ID for ERR_NONFATAL.

Targets: ERR_FATAL traps all of the FATAL errors. ERR_NONFATAL traps all NONFATAL errors. DIM_VENDOR_PRODUCT receives all good data to be loaded into the target table.
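The exact ERR_RECORD_out expression is left to you. A minimal sketch, assuming the ports listed above and simple comma delimiting (formats and delimiters may need adjusting):

ERR_RECORD_out = PRODUCT_CODE || ',' || TO_CHAR(VENDOR_ID) || ',' || CATEGORY || ',' ||
                 PRODUCT_NAME || ',' || MODEL || ',' || TO_CHAR(PRICE)

TO_CHAR converts the numeric ports so the whole record fits in one string column; PowerCenter's || operator treats a NULL operand like an empty string, so NULL columns simply appear empty in the record.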
Instructions
Step 1: Create Mapping
Create a mapping called m_DIM_VENDOR_PRODUCT_xx, where xx is your student number. Use the mapping details described in the previous pages for guidelines.
1. Open your ~Developerxx folder.
2. Create a workflow named wf_DIM_VENDOR_PRODUCT_xx.
3. Create a session named s_m_DIM_VENDOR_PRODUCT_xx. The source file is found in the Source Files directory on the Integration Service process machine.
4. Verify the source file name is PRODUCT, all uppercase, with an extension of .txt.
5. Verify the target database connection is EDWxx.
6. Change the target load type to Normal.
7. Truncate DIM_VENDOR_PRODUCT.
8. Set the Lookup connection to ODS.
Objectives
Duration
15 minutes
Instructions
Step 1: Create a Query to Search for Targets with Customer
First, you will create a query that searches for target objects with the string customer in the target name.
1. In the Designer, choose Tools > Queries. The Query Browser appears.
2. Click New to create a new query. Figure 8-4 shows the Query Editor:
Figure 8-4. Query Editor
3. In the Query Name field, enter targets_customer.
4. In the Parameter Name column, select Object Type.
5. In the Operator column, select Is Equal To.
6. In the Value 1 column, select Target Definition.
7. Click the New Parameter button. Notice that the Query Editor automatically adds an AND operator for the two parameters.
8. Edit the new parameter to search for object names that contain the text customer.
The PowerCenter Client displays a dialog box stating whether the query is valid. If the query is not valid, fix the error and validate it again.
2. Click Save. The PowerCenter Client saves the query to the repository.
3. Click Run. The Query Results window shows the results of the query you created. Your query results might include additional objects.
Some columns only apply to objects in a versioned repository, such as Version Comments, Label Name, and Purged By User.
1. Close the Query Editor, and create a new query.
2. Enter product_inventory_mapping_dependents as the query name.
3. Edit the first parameter so the object name contains product.
4. Add another parameter, and choose Include Children and Parents for the parameter name.
Note: When you search for children and parents, you enter the following information in the value columns:
Value 1: the object type(s) for the dependent objects (the children and parents).
Value 2: the object type(s) for the object(s) you are querying.
Value 3: the reusable status of the dependent object(s).
5. Click the arrow in the Value 1 column, select the following objects, and click OK:
The query returned objects in all folders in the repository. Next, you will modify the query so it only returns objects in your folder.
1. In the Query Editor, place the cursor in the last parameter and then add a new parameter.
2. Modify the parameter so it searches for folders equal to the SOLUTIONS_ADVANCED folder.
3. Validate and save the query.
4. Run the query.
Notice that even though the query says to include parent and child objects, it does not display any parent objects of the mapping. Parent objects of a mapping include sessions, worklets, and workflows. When you run a query from the Designer, the query results only display Designer objects. Similarly, when you run a query from the Workflow Manager, the query results only display Workflow Manager objects. In the next step, you will run the same query from the Repository Manager.
1. Open the Repository Manager and connect to the repository.
2. Open the Query Browser. For details on how to do this, see Create a Query to Search for Targets with Customer on page 50.
3. Select the product_inventory_mapping_dependents query, and run it by clicking Execute.
Notice that the query results show all parent (and child) objects, including Workflow Manager objects, such as workflows.
Technical Description
The session that needs to be optimized is in the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow. This session runs a mapping that reads in a flat file of order data, finds the customer market segment information, aggregates the orders, and writes the values out to a relational table. The support group needs to find the bottleneck(s), determine the cause of the bottleneck(s), and then reduce the bottleneck(s). The reduction in run time must be at least 30%.
Objectives
Use learned techniques to determine and reduce the bottleneck(s) that exist.
Duration
120 minutes
Object Locations
ProjectX folder
Workshop Details
Overview
This workshop is designed to assist the developers with the task at hand. It does not give detailed instructions on how to identify a bottleneck, determine the cause of a bottleneck or how to optimize the session/mapping. The approach to take is left entirely up to the discretion of the developers. The optimization techniques to use are also left up to the developers. The workshop will provide instructions on establishing a typical read baseline and on running the original session. The suggested steps to follow are:
1. Establish a typical read baseline
2. Run the original session
3. Identify and reduce the bottlenecks
Important: For detailed information on identifying and reducing bottlenecks, see the Performance Tuning Guide in the PowerCenter online help. To access the online help, press the F1 key in any of the PowerCenter Client tools. In the online help, click the Contents tab and expand the section for the Performance Tuning Guide.
Workshop Rules
The rules of the workshop are:
Developers must work in teams of two.
Partitioning cannot be used to optimize the session.
Data results must match the initial session run.
Think out of the box.
Ask the instructor any questions that come to mind.
1. In the Repository Manager, copy the wf_Source_Baseline_xx workflow from the ProjectX folder to your folder.
2. In the Workflow Manager, open the wf_Source_Baseline_xx workflow in your folder.
3. Edit the session named s_m_Source_Baseline_xx, and click the Mapping tab:
a. Edit the Sources node and ensure the database connection is ODS.
b. Edit the Targets node and change the Writer from Relational Writer to File Writer.
c. Change the Targets Properties for the Output and Reject filenames to include your assigned student number.
4. Save, start, and monitor the workflow.
5. Document the results in the table provided in Documented Results on page 65.
1. In the Repository Manager, copy the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow from the ProjectX folder to your folder.
2. In the Workflow Manager, edit the session named s_m_FACT_MKT_SEGMENT_ORDERS_xx located in the wf_FACT_MKT_SEGMENT_ORDERS_xx workflow in your folder.
3. In the Mapping tab, edit the Sources node:
a. Ensure the ORDER_LINE_ITEM source filename value is daily_order_line_item.dat.
b. Ensure the ODS_INVOICE_SUMMARY database connection is ODS.
4. Edit the Targets node. Ensure the database connection is EDWxx, the Target load type is set to Normal, and the Truncate target table option is checked.
5. Save, start, and monitor the workflow.
6. Document the results in the table provided in Documented Results on page 65.
Calculates totals for quantity, revenue, and cost for market segments. Values are summarized by customer, date, market segment, region, and item. Runs on demand.
SOURCES
Tables

Table Name: daily_order_line_item
Schema/Owner: Flat File
Description: This is a daily order line item file that contains order information for customers. The file contains 1,328,667 rows of order data for August 29, 2003 and is sorted by order id. This file is joined to the ODS_INVOICE_SUMMARY relational table in order to retrieve the payment type that the customer uses. It is assumed that the customer uses the same payment type each time. The payment types are CREDIT CARD, DEBIT CARD, CASH, and CHECK. The source file is called daily_order_line_item.dat. The location for the file can be found by checking the service variable $PMSourceFileDir.

Table Name: ODS_INVOICE_SUMMARY
Schema/Owner: ODS
Description: This is a monthly summary of customer invoice data. The table contains invoice number, customer, order date, payment type, and amount. The primary key is Invoice Number. The table contains 2,686,668 rows.
TARGETS
Tables
Table Name: FACT_MKT_SEGMENT_ORDERS
Schema/Owner: EDWxx
Insert: yes
Unique Key: ORDER_KEY (system generated)
LOOKUPS
Lookup Name: lkp_ITEM_ID (Table: DIM_ITEM, Location: EDWxx)
Description: The FACT_MKT_SEGMENT_ORDERS fact table needs to have the ITEM_KEY stored on it as a foreign key. The item id contained in the source will be matched with the item id in the DIM_ITEM table to retrieve the ITEM_KEY. The cost of each item needs to be obtained from this table and used in the calculation of item costs for each row written to the target. This table contains 27 rows.
Lookup Condition: DIM_ITEM.ITEM_ID = ORDER_LINE_ITEM.ITEM_ID
Selection/Filter: N/A
Return Ports: ITEM_KEY, COST

Lookup Name: lkp_CUSTOMER_INFO (Table: DIM_CUSTOMER_PT, Location: EDWxx)
Description: The FACT_MKT_SEGMENT_ORDERS fact table needs to have the customer key stored on it as a foreign key. The CUSTOMER_ID contained in the source will be matched with the CUSTOMER_ID in the DIM_CUSTOMER_PT table to retrieve the customer key (C_CUSTKEY). The market segment of each customer is also retrieved and used in aggregate groupings. This table contains 1,000,000 rows.
Lookup Condition: DIM_CUSTOMER_PT.C_CUST_ID = ORDER_LINE_ITEM.CUSTOMER_ID
Selection/Filter: N/A
Return Ports: C_CUSTKEY, C_CUST_ID, C_MKTSEGMENT
Target Table: FACT_MKT_SEGMENT_ORDERS. All of the following columns are derived values:

MKTSEGMENT: The market segment that the customer belongs in. Obtained via a lookup to the DIM_CUSTOMER_PT dimension table.

REGION: Derived based on customer id. If the customer id is < 50000, the region is 'WEST'; >= 50000 and < 95000, 'CENTRAL'; >= 95000 and < 120000, 'SOUTH'; >= 120000 and < 200501, 'EAST'; >= 200501, 'UNKNOWN'.

ITEM_KEY: Foreign key referencing the DIM_ITEM table. Obtained via a lookup to the DIM_ITEM dimension table on the ITEM_ID column.

ORDER_COST: SUM of (COST * QUANTITY). COST is obtained via a lookup to the DIM_ITEM dimension table.
DETAILED OVERVIEW
Mapping: m_FACT_MKT_SEGMENT_ORDERS_xx

Shortcut_to_ORDER_LINE_ITEM (Source Definition): Flat file containing daily order information for each customer. Contains orders for August 29, 2003. This file contains 1,328,667 rows.

Sq_Shortcut_to_ORDER_LINE_ITEM (Source Qualifier): Flat file Source Qualifier. Sent to jnr_PAYMENT_TYPE: all ports.

Shortcut_to_ODS_INVOICE_SUMMARY (Source Definition): Relational table containing a summary of the invoices for the month. This table contains data from August 1, 2003 through August 29, 2003. The key is INVOICE_NO and the table contains 2,686,668 rows.

Sq_Shortcut_To_ODS_INVOICE_SUMMARY (Source Qualifier): Relational Source Qualifier. Sent to jnr_PAYMENT_TYPE: all ports.

jnr_PAYMENT_TYPE (Joiner): Joiner transformation that joins the ORDER_LINE_ITEM file to the ODS_INVOICE_SUMMARY table.
Master Source: ORDER_LINE_ITEM. Detail Source: ODS_INVOICE_SUMMARY.
Join Condition: ORDER_DATE = ORDER_DATE, CUSTOMER_ID = CUSTOMER_ID
Sent to lkp_ITEM_ID: ORDER_LINE_ITEM.ITEM_ID
Sent to lkp_CUSTOMER_INFO: ORDER_LINE_ITEM.CUSTOMER_ID
Sent to exp_SET_UNKNOWN_KEYS: ORDER_LINE_ITEM.ORDER_DATE, QUANTITY, PRICE; ODS_INVOICE_SUMMARY.PYMT_TYPE

lkp_ITEM_ID (Lookup): Lookup transformation that obtains item keys from the DIM_ITEM table. The DIM_ITEM table is located in the EDWxx schema. Lookup condition: ITEM_ID from DIM_ITEM = ITEM_ID from ORDER_LINE_ITEM. Sent to exp_SET_UNKNOWN_KEYS: ITEM_KEY, COST.
lkp_CUSTOMER_INFO (Lookup): Lookup transformation that obtains customer keys from the DIM_CUSTOMER_PT table. The DIM_CUSTOMER_PT table is located in the EDWxx schema. Lookup condition: CUSTOMER_ID from DIM_CUSTOMER_PT = CUSTOMER_ID from ORDER_LINE_ITEM. Sent to exp_SET_UNKNOWN_KEYS: C_CUSTKEY, C_CUST_ID, C_MKTSEGMENT.

exp_SET_UNKNOWN_KEYS (Expression): Expression transformation that sets values for missing columns (item key, mktsegment). It also defines the region the customer belongs in.
Output ports:
MKTSEGMENT_out. Formula: IIF(ISNULL(MKTSEGMENT), 'UNKNOWN', MKTSEGMENT)
ITEM_ID_out. Formula: IIF(ISNULL(ITEM_KEY), 0.00, ITEM_COST)
REGION_OUT. Formula: IIF(C_CUST_ID > 0 AND C_CUST_ID < 50000, 'WEST', IIF(C_CUST_ID >= 50000 AND C_CUST_ID < 95000, 'CENTRAL', IIF(C_CUST_ID >= 95000 AND C_CUST_ID < 120000, 'SOUTH', IIF(C_CUST_ID >= 120000 AND C_CUST_ID < 200501, 'EAST', 'UNKNOWN'))))
Sent to agg_VALUES: all output ports.
agg_VALUES (Aggregator): Aggregator transformation that calculates the revenue, quantity, and cost.
Group by ports: C_CUSTKEY, ORDER_DATE, MKTSEGMENT, REGION, ITEM_KEY
Output ports:
ORDER_QUANTITY. Formula: SUM(QUANTITY)
ORDER_REVENUE. Formula: SUM(PRICE * QUANTITY)
ORDER_COST. Formula: SUM(ITEM_COST * QUANTITY)
Sent to FACT_MKT_SEGMENT_ORDERS: all output ports.
(Sequence Generator): Sequence Generator transformation that populates the system-generated ORDER_KEY. Sent to FACT_MKT_SEGMENT_ORDERS: NEXTVAL.

Shortcut_to_FACT_MKT_SEGMENT_ORDERS (Target Definition)
Documented Results
For each of the following session runs, record the Rows Processed, Rows Failed, Start Time, End Time, Elapsed Time (Secs), and Rows Per Second:

ETL Read Baseline
Original Session
Write to Flat File Test (Target)
Filter Test (Source)
Read Mapping Test (Source or Mapping)
Filter Test (Mapping)
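Rows Per Second is Rows Processed divided by Elapsed Time. For example (illustrative numbers only), 1,328,667 rows processed in 600 seconds is roughly 2,214 rows per second. Comparing this figure across the runs above is how you quantify whether a change reduced a bottleneck.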
Technical Description
The sessions/mappings that are in need of analysis are:
s_m_Target_Bottleneck_xx. This session reads in a relational source that contains customer account balances for the year.
s_m_Items_Bottleneck_xx. This mapping reads a large flat file of item-sold data, filters out last year's stock, applies some row-level manipulation, performs a lookup to get cost information, and then loads the data into an Oracle table.
Note: The s_m_Items_Bottleneck_xx mapping is a hypothetical example. It does not exist in the repository.
s_m_Source_Bottleneck_xx. This mapping reads in one relational source that contains customer account balances and another relational source that contains customer demographic information. The two tables are joined at the database side. s_m_Mapping_Bottleneck_xx. This mapping reads in a flat file of order data, finds the customer market segment information, filters out rows that haven't sold more than one item, aggregates the orders and writes the values out to a relational table.
The support group needs to review each of these sessions to determine whether it makes sense to partition the session.
Objectives
Review the sessions and, based on knowledge gained from the presentations, determine what partitioning, if any, should be done.
Duration
60 minutes
Object Locations
ProjectX folder
Workshop Scenarios
Scenario 1
The session in question, s_m_Target_Bottleneck_xx, has been optimized already, but it is felt that more can be done. The machine that the session is running on has 32 GB of memory and 16 CPUs. The mapping takes account data from a relational source, calculates various balances, and then writes the data out to the BalanceSummary table. The BalanceSummary table is an Oracle table that the DBA has partitioned by the account_num column. Answer the following questions:
I. How many pipeline stages does this session contain?
II. What default partition points does this session contain?
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient?
IV. What partition types should be used, and where?
V. In what way will this increase performance?

(The answers are discussed in the Answers section at the end of this lab.)
2. Click the Mapping > Partitions tab to see the partition points.
3. Select each transformation and look at the window at the bottom of the screen to see what partition type is being used for that particular partition point.
Partition Test
The purpose of this section is to implement partitioning on the s_m_Target_Bottleneck_xx session.
1. Copy the wf_Target_Bottleneck_xx workflow and rename it to wf_Target_Bottleneck_Partition_xx.
2. Edit the s_m_Target_Bottleneck_xx session located in the wf_Target_Bottleneck_Partition_xx workflow and rename it to s_m_Target_Bottleneck_Partition_xx.
3. Click the Mapping tab, and then click the Partitions tab.
4. On the Partitions tab, select the Shortcut_to_BalanceSummary transformation, click the Edit Partition Point icon, and add two new partitions.
5. Select Key Range from the drop-down box and click OK.
6. Leave <**All**> selected in the Key Range drop-down menu.
7. Click Edit Keys. This allows you to define the columns that are going to be in the key range.
8. Add the Account_num column to the Key Range and select OK.
9. Input the following ranges for the 3 partitions:
Partition #1 - start range 1, end range 3500
Partition #2 - start range 3500, end range 7000
Partition #3 - start range 7000
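Although consecutive ranges share a boundary value (3500 appears as both an end and a start), they do not overlap: in PowerCenter key-range partitioning the start of a range is inclusive and the end is non-inclusive, so an account number of 3500 falls only into Partition #2. Partition #3's open end catches all values from 7000 up.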
10. Select the SQ_Shortcut_to_Source2 partition point and edit the partition point.
11. Select Key Range from the drop-down box.
12. Add the Account_num column to the Key Range and select OK.
13. Input the following ranges for the 3 partitions:
Partition #1 - start range 1, end range 3500
Partition #2 - start range 3500, end range 7000
Partition #3 - start range 7000
14. Save, start, and monitor the workflow.
15. Compare the results against the original session results and against the indexed session results. Is there a performance gain?
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 2
Note: The mapping shown in this scenario is a hypothetical example. It does not exist in the repository.
The session in question, s_m_Items_Bottleneck_xx, has been running slowly, and the project manager wants it optimized. The machine that this is running on has 8 GB of memory and 4 CPUs. The mapping takes items-sold data from a large flat file, transforms it, and writes out to an Oracle table. The flat file comes from one location and splitting it up is not an option. The second Expression transformation is very complex and takes a long time to push the rows through.
Mapping Overview
Answer the same five questions (I-V) as in Scenario 1.
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 3
The session in question, s_m_Source_Bottleneck_xx, has been running slowly, and the project manager wants it optimized. The machine that this is running on has 2 GB of memory and 2 CPUs. The mapping reads one relational source that contains customer account balances and another relational source that contains customer demographic information. The tables are joined on the database side; the rows are then pushed through an Expression transformation and loaded into an Oracle table.
Mapping Overview
Answer the same five questions (I-V) as in Scenario 1.
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Scenario 4
The session in question, s_m_Mapping_Bottleneck_Sorter_xx, is still not running quite as fast as is needed. The machine that this is running on has 24 GB of memory and 16 CPUs. The mapping reads a flat file source that is really 3 region-specific flat files being read from a file list. The rows are then passed through two lookups to obtain item costs and customer information. The data is then sorted and aggregated before being loaded into an Oracle table. The customer is part of the sort key, and the DBA has partitioned the Oracle table by customer_key. What can be done to further optimize this session/mapping?
Mapping Overview
Answer the same five questions (I-V) as in Scenario 1.
Conclusion
The instructor will discuss the answers to the questions in the lab wrap-up.
Answers
Scenario 1
I. How many pipeline stages does this session contain? 3.
II. What default partition points does this session contain? Source Qualifier and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? Yes.
IV. What partition types should be used, and where? Key Range at both the source and the target.
V. In what way will this increase performance? This will add multiple connections to the source and target, which will result in data being read concurrently. This will be faster.
Scenario 2
I. How many pipeline stages does this session contain? 3.
II. What default partition points does this session contain? Source Qualifier and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? Yes.
IV. What partition types should be used, and where? An additional pass-through partition point at the exp_complex_calculations transformation.
V. In what way will this increase performance? This will add one more pipeline stage, which in turn will give you an additional buffer to move data.
Scenario 3
I. How many pipeline stages does this session contain? 3.
II. What default partition points does this session contain? Source Qualifier and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? No. Each partition takes between 1 and 2 CPUs, and this machine has only 2 CPUs.
IV. What partition types should be used, and where? N/A.
V. In what way will this increase performance? N/A.
Scenario 4
I. How many pipeline stages does this session contain? 4.
II. What default partition points does this session contain? Source Qualifier, Aggregator, and Target.
III. Can partitions be added or deleted, or can the partition types be changed to make this more efficient? Yes.
IV. What partition types should be used, and where? 3 partitions: Key Range at the target; split the source into the 3 region-specific files and read each one into one of the partitions; and Hash Auto Keys at the Sorter transformation, which also allows you to remove the partition point at the Aggregator if you like.
V. In what way will this increase performance? Additional connections at the target will load faster. You need to split the source flat file into the 3 region-specific files because you can have only one connection open to a flat file. The Hash Auto Keys partition point is required to make sure that there is no overlap at the Aggregator. If the flat files vary significantly in size, you may want to add a round-robin partition point somewhere, but in this particular mapping that would not make sense.