February 1, 2006 INFORMATION INTEGRATION SOLUTIONS
version 2.5 DataStage Enterprise Edition DB2 Configuration Page 2 of 19
1 Preface

This document is intended for those who are planning for and implementing IBM WebSphere DataStage Enterprise Edition (DS/EE) requiring connectivity to DB2 Enterprise Server Edition (DB2). It is intended to replace product documentation and provide guidance when implementing systems where the database server is distinct from the DS/EE server. Such an implementation is referred to as a remote server implementation. In the following sections, we discuss the issues that should be resolved prior to installation, installation requirements, and configuration.
To demonstrate the processes discussed in this document, we will implement the DB2 Enterprise stage and use screen shots and actual files from this implementation.
Our example system will run DS/EE version 7.5.1a.

1.1 Organization

This document contains the following sections:

1 PREFACE
    1.1 Organization
    1.2 Documentation Conventions
    1.3 Goals and Target Audience
2 BACKGROUND
    2.1 DB2 Stage Types within DataStage EE
    2.2 DB2 Enterprise Stage Architecture
3 PREREQUISITES
4 HOW-TO SET UP DB2 CONNECTIVITY FOR REMOTE SERVERS
5 USING THE DB2 ENTERPRISE STAGE
6 CONFIGURING MULTIPLE INSTANCES IN ONE JOB
7 TROUBLESHOOTING
8 PERFORMANCE NOTES
9 SUMMARY OF SETTINGS
1.2 Documentation Conventions

This document uses the following conventions:

Convention | Usage
Bold | In syntax, bold indicates commands, function names, keywords, and options that must be input exactly as shown. In text, bold indicates keys to press, function names, and menu selections.
Italic | In syntax, italic indicates information that you supply. In text, italic also indicates UNIX commands and options, file names, and pathnames.
Plain | In text, plain indicates prompts, commands and options, file names, and pathnames.
Bold Italic | Indicates important information.
Courier | Courier indicates examples of source code and system output and prompts.
Tahoma Bold | In examples, Tahoma bold indicates characters that the user types or keys the user presses (for example, <Return>).
-> | A right arrow between menu commands indicates you should choose each command in sequence. For example, "Choose File -> Exit" means you should choose File from the menu bar, and then choose Exit from the File pull-down menu.
This line (continuation character) continues | The continuation character is used in source code examples to indicate a line that is too long to fit on the page, but must be entered as a single line on screen.
The following are also used:

- Syntax definitions and examples are indented for ease in reading.
- All punctuation marks included in the syntax (for example, commas, parentheses, or quotation marks) are required unless otherwise indicated.
- Syntax lines that do not fit on one line in this manual are continued on subsequent lines. The continuation lines are indented. When entering syntax, type the entire syntax entry, including the continuation lines, on the same input line.
- Text enclosed in parentheses and underlined (like this) following the first use of a proper term will be used instead of the proper term.
Interaction with our example system will usually include the system prompt and the command, most often on two or more lines. For example:

    /home/dsadm @ database_server > /bin/tar -cvf /dev/rmt0 /usr/dsadm/Ascential/DataStage/Projects

1.3 Goals and Target Audience

This document presents a detailed set of instructions for configuring connectivity from DS/EE to a remote DB2 instance using the native parallel DB2 Enterprise stage.
The primary audience for this document is DataStage administrators and DB2 DBAs. Information in certain sections may also be relevant for Technical Architects and System Administrators.

For additional tips and best practices: The Ascential Developer Net (ADN) is a set of online services designed to help project managers, architects, analysts and developers with their data integration tasks using IBM Information Integration products and technologies.

The Ascential Developer Net allows you to share ideas, ask questions among your peers at other companies around the world, share files, tips & tricks, and search the archive. Ascential Developer Net provides interactive forums with subscription capabilities that automatically build a valuable knowledgebase from which Ascential can build better products and lasting customer partnerships.

The Ascential Developer Net will additionally let you share documents, configuration files, code samples, and more. Ascential is committed to making ADN the premier location for developer resources for data integration.

A link to the Ascential Developer Net can be found in the Help->About dialog on the DataStage clients, or you can access it at the following URL: http://developernet.ascential.com/
2 Background

2.1 DB2 Stage Types within DataStage EE

There are five stages available on the DS/EE Designer canvas that can access DB2:

- DB2 API: plug-in data access for read, insert, update-insertion (upsert) and delete.
- DB2 Load: plug-in data access for load.
- Dynamic RDBMS: plug-in data access for read, insert, upsert and delete.
- Enterprise ODBC: native non-parallel data access for read, insert, upsert and delete.
- DB2 Enterprise: native parallel data access for read, insert, upsert, delete and load.

The plug-in stages are designed for lower-volume access to DB2 databases without the DPF option installed (prior to DB2 UDB v8, "DB2 EE"). These stages also provide connectivity to non-UNIX DB2 databases, databases on UNIX platforms that differ from the platform of the DataStage ETL server, or DB2 databases on Windows or Mainframe platforms (except for the "Load" stage against a mainframe DB2 instance, which is not supported).

Figure 1: DB2 stages available on the DS/EE Parallel Job design palette
By facilitating flexible connectivity to multiple types of remote DB2 database servers, the use of DataStage plug-in stages expands the range of options available to the designer. However, this flexibility limits overall performance and scalability. Furthermore, when used as data sources, plug-in stages cannot read from DB2 in parallel.
Using the DB2 API stage or the Dynamic RDBMS stage, it is possible to access a DB2 database with the Data Partitioning Facility (DPF) in parallel by manually partitioning data and stages on the canvas for each partition of the database. Because each plug-in invocation will open a separate connection to the same target DB2 database table, the ability to function in parallel may be limited by the table and index configuration set by the DB2 database administrator. This document does not provide any further discussion of this technique.

The capabilities of each DB2 stage are summarized in the following table. For specific details on the stage capabilities, consult the DataStage documentation (DataStage Parallel Job Developers Guide, DataStage Plug-In guides).
DataStage EE Stage Name | Stage Type | DB2 Requirement | Supports Partitioned DB2? | Parallel Read? | Parallel Write? | Parallel Sparse Lookup | SQL Open / Close
DB2 Enterprise | Native Parallel | DPF, Homogeneous Hardware and Operating System [1] | Yes / directly to each DB2 node | Yes | Yes | Yes | Yes
DB2 API | Plug-In | Any DB2 via DB2 Client or DB2-Connect | Yes / through DB2 node 0 | No | Possible Limitations | No | No
Dynamic RDBMS | Plug-In | Any DB2 via DB2 Client or DB2-Connect | Yes / through DB2 node 0 | No | Possible Limitations | No | No
Enterprise ODBC | Native | Any DB2 via DB2 Client or DB2-Connect | Yes / through DB2 node 0 | No | No | No | No
DB2 Load | Plug-In | Subject to DB2 Loader Limitations | No | No | No | No | No

Figure 2: DS/EE DB2 Communication Options and Capabilities

[1] It is possible to connect the DB2 UDB stage to a remote database by cataloging the remote database in the local instance and then using it as if it were a local database. This will only work when the authentication mode of the database on the remote instance is set to "client authentication". If you use the stage in this way, you may experience data duplication when working in partitioned instances, since the node configuration of the local instance may not be the same as the remote instance. For this reason, the "client authentication" configuration of a remote instance is not recommended.

2.2 DB2 Enterprise Stage Architecture

As a native, parallel component, the DB2 Enterprise stage is designed for maximum performance and scalability. These goals are achieved through tight integration with DB2, including direct communication with each DB2 database node, and reading from or writing to DB2 in parallel (where appropriate), using the same data partitioning as the referenced DB2 tables.
This section outlines the high-level architecture of the native parallel DB2 Enterprise stage providing relevant background to understand its configuration as detailed in the remaining sections of this document.
Prior to v7, DS/EE required the primary DataStage ETL server (aka "conductor node") to be installed on the DB2 coordinator server. Starting with v7, DS/EE provides a "remote DB2" configuration, separating the primary ETL server ("conductor node") from the primary DB2 server (coordinator node or "node zero") using the native parallel DB2 Enterprise stage. Because DS/EE is tightly integrated with the DB2 servers and routes data to individual nodes based on DB2 table partitioning, configuration is provided by a combination of DB2 client and DS/EE clustered processing.

As outlined in Figure 3, the primary ETL server ("conductor node") must have the 32-bit DB2 client installed and configured to connect to the remote DB2 server instance. This is the same DB2 client that DataStage uses to connect to DB2 databases through the DB2 plug-in stages (DB2 API, DB2 Load, Dynamic RDBMS) for reading, writing, and import of metadata.
Figure 3: DS/EE DB2 Communication Architecture (diagram components: primary ("conductor node") DataStage EE server with DSEE engine and 32-bit DB2 client; DB2 DPF node 0, node 1, ... node n, each with a DSEE engine)
The native parallel DB2 Enterprise stage of DataStage EE uses the DB2 client connection to "pre-query" the DB2 instance and determine partitioning of the source or target table. This partitioning information is then used to read/write/load data directly from/to the remote DB2 nodes based on the actual table configuration. This tight integration is provided by routing data within the DS/EE engine to DS/EE engine nodes configured on the DB2 instance server(s), which requires a clustered configuration of the DS/EE engine.

As with any clustered DS/EE configuration, the DS/EE engine and libraries must be installed in the same location on all ETL and DB2 servers in the cluster. This is most easily achieved by creating a shared mount point on the remote DSEE and DB2 nodes through NFS or similar directory sharing methods.

The DB2 client does not have to be installed in the same location on all servers, as long as all locations are included in the $PATH and in the $LIBPATH, $LD_LIBRARY_PATH, or $SHLIB_PATH environment variable settings.
The connectivity scenario for a DataStage EE DB2 Enterprise stage is:
1) The DS/EE conductor node uses the environment variable APT_DB2INSTANCE_HOME as the location on the ETL server where the remote DB2 server's db2nodes.cfg has been copied.

2) DS/EE reads the file db2nodes.cfg from a sqllib subdirectory identified for the specified DB2 instance. This file allows DS/EE to determine the individual network node names of each DB2 node.

3) DS/EE scans the current DS/EE configuration file specified by the environment variable $APT_CONFIG_FILE (APT_CONFIG_FILE) for node names whose fastname properties match the node names provided in db2nodes.cfg. DS/EE must find each DB2 node name in the APT_CONFIG_FILE or the job will fail.

4) The DS/EE conductor node queries the local DB2 instance via the DB2 client to determine table partitioning information. The results of this query are then used to route data directly to or from the appropriate DB2 nodes.

5) DS/EE starts up processes across all ETL and DB2 nodes in the cluster. This can be easily verified by setting the environment variable $APT_DUMP_SCORE to TRUE, and examining the corresponding score entry placed in the job log within DataStage Director.
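The matching rule in step 3 can be checked before a job is run. The following is a minimal sketch, not part of the product: the function name missing_fastnames and the throwaway sample files are hypothetical, and the one-line node entry is a simplified stand-in for a real configuration file. It assumes that the second column of db2nodes.cfg holds the host name of each DB2 node.

```shell
#!/bin/sh
# Sketch only: list every host in db2nodes.cfg that has no matching
# fastname entry in the DS/EE configuration file.
# missing_fastnames is a hypothetical helper, not a DataStage tool.
missing_fastnames() {
    nodes_cfg=$1      # copy of the remote db2nodes.cfg
    apt_config=$2     # file named by $APT_CONFIG_FILE
    # Column 2 of db2nodes.cfg is assumed to be the DB2 node host name.
    awk '{ print $2 }' "$nodes_cfg" | sort -u | while read -r host; do
        if ! grep -q "fastname[[:space:]]*\"$host\"" "$apt_config"; then
            echo "$host"
        fi
    done
}

# Demonstration on throwaway files with hypothetical contents.
demo_dir=$(mktemp -d)
printf '0 db2_server 0\n1 db2_server 1\n' > "$demo_dir/db2nodes.cfg"
printf 'node "n1" { fastname "etl_server" pools "" }\n' > "$demo_dir/apt.cfg"

# The sample configuration has no entry for db2_server, so it is reported.
missing_fastnames "$demo_dir/db2nodes.cfg" "$demo_dir/apt.cfg"
```

An empty result means every DB2 host has a fastname entry. Any host that is printed would, per step 3, cause the job to fail.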
3 Prerequisites

- The DB2 database schema to be accessed must NOT have any columns with User Defined Types (UDTs). Use the "db2 describe table [table-name]" command on the DB2 client for each table to be accessed to determine if UDTs are in use. Alternatively, examine the DDL for each schema to be accessed.

- DS/EE must be installed on all ETL server(s) as well as each DB2 node in the DB2 cluster. The DS/EE server version demonstrated in this document is 7.5.1a.

- The hardware and operating system of the ETL server and DB2 nodes must be the same. The systems demonstrated in this document were running AIX v5.3.

- A DB2 32-bit client must be installed on the primary (conductor) ETL server. The DB2 client demonstrated in this document is v8.1 FixPack10 aka v8.2. Use the "db2level" command on the ETL server to identify the version of the DB2 client.

- The database must be DB2 Enterprise Server Edition with the Data Partitioning Facility (DPF) option installed. The DB2 UDB server demonstrated in this document is v8.1 FixPack9 aka v8.2. Use the "db2level" command on the DB2 server to identify the version of the database.
4 How-To Set Up DB2 Connectivity for Remote Servers

Our example systems are two AIX systems, one with 4 CPUs used as the DB2 UDB server, and one with 2 CPUs used as the DS/EE server. In this How-To, we will demonstrate using the DS/EE super-user, by default dsadm.

Note that dsadm does NOT have to be the local database instance owner.

Figure 4: DS/EE DB2 Example System
1) Perform the following on ALL members of the cluster BEFORE installing DS/EE on the ETL server:

    a. Create the primary group to which the DS/EE users will belong (in this document, this group is the recommended default dstage) and ensure that this group has the same UNIX group id (like 127) on all the systems.

    b. Create DS/EE users on all members of the cluster. Make sure that each user has the same user id (like 204) on all the systems, and that every user has the correct group memberships, minimally with dstage as the primary group, and the DB2 group in the list of secondary groups.

    c. Add these users to the DB2 database and ensure they can log in to DB2 on db2_server. At this step, we are on the DB2 server, and NOT the ETL server. If you fail here, contact your DB2 DBA for support; this is NOT a DS/EE issue.

    /db2home/db2inst1@db2_server> . /db2home/db2inst1/sqllib/db2profile
    /db2home/db2inst1@db2_server> db2 connect to db2_dpf1_db user dsadm using db2_psword

    Database Connection Information

    Database server        = DB2/6000 8.2.2
    SQL authorization ID   = DSADM
    Local database alias   = db2dev1
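The requirement in steps 1a and 1b, that user and group ids agree on every system, can be checked mechanically once the ids have been collected. The following is a minimal sketch under stated assumptions: ids_consistent is a hypothetical helper, and its input is one line per host in the form "host uid gid", gathered by whatever means is available (for example, by running id -u dsadm and id -g dsadm on each system). The ids 204 and 127 are the illustrative values used in the steps above.

```shell
#!/bin/sh
# Sketch only: read lines of "host uid gid" on standard input and
# succeed only if every host reports the same uid and gid as the first.
# ids_consistent is a hypothetical helper, not a DataStage or DB2 tool.
ids_consistent() {
    awk '
        NR == 1 { uid = $2; gid = $3 }
        $2 != uid || $3 != gid {
            printf "mismatch on %s: uid=%s gid=%s\n", $1, $2, $3
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Demonstration with the illustrative ids from the steps above.
printf '%s\n' "etl_server 204 127" "db2_server 204 127" \
    | ids_consistent && echo "ids match"
```

A mismatch is reported with the name of the host that differs from the first line, and the function returns a non-zero status so it can be used in a larger check script.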
2) Enable the rsh command on all servers in the cluster. The simplest way to do this is to create a .rhosts file in the home directory of each DS/EE user that has the host name or IP address of all members of the cluster, and then set the permissions on this file to 600. This must be done for each user on all members of the cluster. Note that modern security systems may prohibit this method, but it will serve as an adequate example of the requirement. Contact the System Administrators for the cluster for assistance. Here are the commands to be performed on each node of our example system to implement the rhosts method:

    echo "etl_server dsadm" > ~/.rhosts
    echo "db2_server dsadm" >> ~/.rhosts
    chmod 600 ~/.rhosts

And an example of the validation of the etl_server:

    /home/dsadm@etl_server> rsh db2_server date
    Wed Jan 18 15:40:51 CST 2006
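For clusters with more than two members, the same file can be generated from a host list. The following is a minimal sketch: write_rhosts is a hypothetical helper, and the demonstration writes to a temporary file rather than to ~/.rhosts so that it can be tried without touching a real account.

```shell
#!/bin/sh
# Sketch only: write a .rhosts-style file with one "host user" line
# per cluster member, then restrict it to mode 600 as required above.
# write_rhosts is a hypothetical helper.
write_rhosts() {
    target=$1
    user=$2
    shift 2
    : > "$target"                  # start from an empty file
    for host in "$@"; do
        echo "$host $user" >> "$target"
    done
    chmod 600 "$target"
}

# Demonstration against a temporary file instead of ~/.rhosts.
demo_file=$(mktemp)
write_rhosts "$demo_file" dsadm etl_server db2_server
cat "$demo_file"
```

Pointing the first argument at ~/.rhosts on each node would reproduce the three commands shown in step 2. The validation with rsh still has to be run from every member of the cluster to every other member.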
3) Install a 32-bit DB2 client if one is not installed on the primary ETL server (the server on which DS/EE is installed and on which the DS repository resides, also known as the "conductor node").

    a. Make dsadm the owner of the client. While the software will be installed in /usr, management directories and components appear under the home directory of this owner, the top of which is ~/sqllib. For dsadm on our sample AIX system, this is /home/dsadm/sqllib.

    b. Comment out the call to ~/sqllib/db2profile that the client install puts into the .profile of dsadm. If you don't, DS/EE will not operate, because it will find DB2 libraries before it finds DS/EE libraries.

    c. Edit ~/sqllib/db2profile to export INSTHOME, DB2DIR and DB2INSTANCE.
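Step 3b is easy to overlook, so it can help to confirm that no active line in .profile still sources db2profile. The following is a minimal sketch: sources_db2profile is a hypothetical helper, and the demonstration uses a temporary file that holds a commented-out call in place of the real ~/.profile.

```shell
#!/bin/sh
# Sketch only: succeed if the given profile still references
# sqllib/db2profile on a line that is not commented out.
# sources_db2profile is a hypothetical helper.
sources_db2profile() {
    grep -v '^[[:space:]]*#' "$1" | grep -q 'sqllib/db2profile'
}

# Demonstration on a temporary file containing a commented-out call.
demo_profile=$(mktemp)
cat > "$demo_profile" <<'EOF'
# if [ -f /home/dsadm/sqllib/db2profile ]; then
#     . /home/dsadm/sqllib/db2profile
# fi
EOF

if sources_db2profile "$demo_profile"; then
    echo "db2profile is still sourced"
else
    echo "db2profile call is commented out"
fi
```

Running the same check against the real .profile of dsadm should take the second branch once step 3b has been completed.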
4) The DB2 DBA must now catalog all the databases you wish to access on the DB2 server into this instance of the DB2 client.

    a. Ensure that dsadm can log in to DB2 on the db2_server. At this step, we are on the ETL server, and NOT the DB2 server. If you fail here, contact your DB2 DBA for support; this is NOT a DS/EE issue.

    /home/dsadm@etl_server> . /home/dsadm/sqllib/db2profile
    /home/dsadm@etl_server> db2 connect to db2dev1 user dsadm using db2_psword

    Database Connection Information

    Database server        = DB2/6000 8.2.2
    SQL authorization ID   = DSADM
    Local database alias   = db2dev1

    Database alias                       = db2dev1
    Database name                        = db2_dpf1_db
    Node name                            = db2_server
    Database release level               = a.00
    Comment                              =
    Directory entry type                 = Remote
    Authentication                       = SERVER
    Catalog database partition number    = -1
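The cataloging in step 4 is normally done by the DBA with the DB2 command line processor. As a hedged illustration, the sketch below only prints the commands for review instead of running them. The helper print_catalog_commands is hypothetical, the port 50000 is an assumed value that must be replaced with the port or service name of the remote instance, and the node, database and alias names are the placeholders from the directory listing above.

```shell
#!/bin/sh
# Sketch only: print (do not run) the DB2 CLP commands that catalog one
# remote node and one remote database on the client instance.
# print_catalog_commands is a hypothetical helper; the port number
# passed below is an assumption, not a value from this document.
print_catalog_commands() {
    node=$1
    host=$2
    port=$3
    dbname=$4
    db_alias=$5
    echo "db2 catalog tcpip node $node remote $host server $port"
    echo "db2 catalog database $dbname as $db_alias at node $node authentication server"
    echo "db2 terminate"
    echo "db2 list database directory"
}

print_catalog_commands db2_server db2_server 50000 db2_dpf1_db db2dev1
```

Printing the commands first lets the DBA confirm the names and the authentication setting before anything is written to the client's directories.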
6) Log out of the ETL server and log back in to reset all the environment variables to their original state. Edit $DSHOME/dsenv to include the following information (note that items underlined in blue should be substituted with appropriate values for your configuration). We are assuming that the $DB2DIR directory is the same on all nodes in our cluster. This ensures that $PATH and $LIBPATH are correctly set for the remote sessions as well as the local session without resorting to individual files on each member of the cluster.

Note that on operating systems other than AIX (our example system), $LIBPATH may be $SHLIB_PATH or $LD_LIBRARY_PATH.
    ################################################
    # DB2 Setup section of dsenv
    ################################################
    #DB2DIR is where the DB2 home is located
    DB2DIR=/usr/opt/db2_08_01; export DB2DIR

    #DB2INSTANCE is the name of the DB2 client where the databases are cataloged
    DB2INSTANCE=dsadm; export DB2INSTANCE

    #INSTHOME is the PATH where the client instance is located, usually the home directory of the instance owner.
    INSTHOME=/home/dsadm; export INSTHOME

    #Append the DB2 directories to the PATH
    PATH=$PATH:$DB2DIR/bin; export PATH
    THREADS_FLAG=native; export THREADS_FLAG

    #Add the DB2 libraries to END of the LIBPATH on AIX or LD_LIBRARY_PATH on SUN and Linux
    LIBPATH=$LIBPATH:$DB2DIR/lib; export LIBPATH
IMPORTANT: the DataStage libraries MUST be placed BEFORE the DB2 entries in $LIBPATH ($SHLIB_PATH or $LD_LIBRARY_PATH). DataStage and DB2 use the same library name, "librwtool".
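The ordering rule above can be verified after sourcing dsenv. The following is a minimal sketch: comes_before is a hypothetical helper, and the DataStage library directory used in the demonstration (/etl/Ascential/DataStage/PXEngine/lib) is an assumed path that must be replaced with the actual location on your system. The DB2 directory follows the DB2DIR value shown in the dsenv section.

```shell
#!/bin/sh
# Sketch only: succeed if directory $2 appears before directory $3 in
# the colon-separated search path $1. Fails if either one is absent.
# comes_before is a hypothetical helper.
comes_before() {
    printf '%s\n' "$1" | tr ':' '\n' | awk -v first="$2" -v second="$3" '
        $0 == first  && !f { f = NR }
        $0 == second && !s { s = NR }
        END { exit !(f && s && f < s) }
    '
}

# Demonstration with an assumed DataStage library directory.
sample_path="/etl/Ascential/DataStage/PXEngine/lib:/usr/opt/db2_08_01/lib"
if comes_before "$sample_path" /etl/Ascential/DataStage/PXEngine/lib /usr/opt/db2_08_01/lib; then
    echo "DataStage libraries precede DB2 libraries"
else
    echo "reorder the library path"
fi
```

On a live system the first argument would be "$LIBPATH" (or "$SHLIB_PATH" or "$LD_LIBRARY_PATH", depending on the platform) as seen by the DS/EE processes.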
7) Copy the db2nodes.cfg file from the remote instance to the DataStage server. If you create a user on the DataStage server with the same name as the DB2 remote instance owner (for example, db2inst1), then the db2nodes.cfg can be placed in that user's "home directory/sqllib" on the DataStage server. Otherwise, create a user-defined environment variable APT_DB2INSTANCE_HOME in the DS Administrator, add it to a test job and have it point to the location of the sqllib subdirectory where the db2nodes.cfg has been placed. Avoid setting this at the Project level so that other DB2 jobs which are connecting locally do not pick up this value.

In our example, the DB2 server has four processing nodes (logical nodes), the instance owner is db2inst1, the db2nodes.cfg file on the DB2 server is /home/db2inst1/sqllib/db2nodes.cfg, and this file has these contents:

In our example, the ETL server client is owned by dsadm, the APT_DB2INSTANCE_HOME environment variable has been set to "/home/dsadm/remote_db2config", and this file was copied to /home/dsadm/remote_db2config/sqllib/db2nodes.cfg on the ETL server.
8) Ensure that dsadm can connect to the instance using the values in $DSHOME/dsenv instead of ~/sqllib/db2profile. Log out of the ETL server and log back in to reset all the environment variables to their original state.

    /home/dsadm@etl_server> cd `cat /.dshome`
    /home/dsadm@etl_server> . ./dsenv
    /home/dsadm@etl_server> db2 connect to db2dev1 user dsadm using db2_psword

    Database Connection Information

    Database server        = DB2/6000 8.2.2
    SQL authorization ID   = DSADM
    Local database alias   = db2dev1
9) Implement a DS/EE cluster (please refer to the Install and Upgrade guide for more details). In this example, /etl/Ascential is the file system that contains the DS/EE software system, and it is NFS-exported from the ETL server to the DB2 server, and NFS-mounted exactly on /etl/Ascential, a file system owned by dsadm on the DB2 server.

10) Verify that the DB2 operator library has been properly configured by making sure the link "orchdb2op" exists in the $PXEngine/lib directory. Normally this link is configured on install, but if it does not exist, you must run the script $PXEngine/install/install.liborchdb2op. You will be prompted to specify DB2 version 7 or 8; in our case, version 8.
11) The db2setup.sh script located in the $PXHOME/bin/ directory can run without reporting errors even if they occur, and if there are errors, DS/EE will not be able to connect to the database(s). Run the following commands and ensure that no errors occur.

    /home/dsadm@etl_server> db2 connect reset
    /home/dsadm@etl_server> db2 connect terminate
    /home/dsadm@etl_server> db2 connect to db2dev1 user dsadm using db2_psword
    /home/dsadm@etl_server> db2 bind ${APT_ORCHHOME}/bin/db2esql.bnd datetime ISO blocking all grant public
    /home/dsadm@etl_server> cd ${INSTHOME}/sqllib/bnd
    /home/dsadm@etl_server> db2 bind @db2bind.lst datetime ISO blocking all grant public
    /home/dsadm@etl_server> db2 bind @db2cli.lst datetime ISO blocking all grant public
    /home/dsadm@etl_server> db2 connect reset
    /home/dsadm@etl_server> db2 connect terminate

    /home/dsadm@etl_server> db2 connect to db2dev1 user dsadm using db2_psword
    /home/dsadm@etl_server> db2 grant bind, execute on package dsadm.db2.esql to group dstage
    /home/dsadm@etl_server> db2 connect reset
    /home/dsadm@etl_server> db2 connect terminate

Note: datetime ISO currently prevents the db2bind.lst and db2cli.lst binds from succeeding. Omit this option when issuing these two binds until this issue has been resolved by development.

12) The db2grant.sh script located in the $PXHOME/bin/ directory can run without reporting errors even if they occur, and if there are errors, DS/EE will not operate correctly. Run the following commands and ensure that no errors occur. Grant bind and execute privileges to every member of the primary DS/EE group, in our case dstage.

    /home/dsadm@etl_server> db2 connect to db2dev1 user dsadm using dsadm_db2_psword
    /home/dsadm@etl_server> db2 grant bind, execute on package dsadm.db2.esql to group dstage
    /home/dsadm@etl_server> db2 connect reset
    /home/dsadm@etl_server> db2 connect terminate
13) Create a DS/EE configuration file that includes nodes to be used for ETL processing and a node entry for each physical server in the remote DB2 instance.

Unless ETL processing is to be performed on the remote DB2 instance nodes, these entries should be removed from the default node pool (pools ""). Each node in the DB2 instance should be part of the same node pool (e.g. pools "db2"). An example configuration file is shown below:
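As a hedged illustration of the layout described in step 13, and not the configuration used on the example system, the sketch below writes a two-node configuration file to a temporary location. The node names etl1 and db2a and the resource paths /etl/data and /etl/scratch are assumptions made for this sketch; the fastname values follow the host names used throughout this document. The DB2 node is kept out of the default pool and placed in the "db2" pool only.

```shell
#!/bin/sh
# Sketch only: write a sample DS/EE configuration file with one ETL
# node in the default pool and one DB2 node in the "db2" pool.
# Node names and resource paths are assumptions for illustration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
    node "etl1"
    {
        fastname "etl_server"
        pools ""
        resource disk "/etl/data" {pools ""}
        resource scratchdisk "/etl/scratch" {pools ""}
    }
    node "db2a"
    {
        fastname "db2_server"
        pools "db2"
        resource disk "/etl/data" {pools ""}
        resource scratchdisk "/etl/scratch" {pools ""}
    }
}
EOF

# Show the fastname and pool assignment of each node.
grep -E 'fastname|^ *pools' "$cfg"
```

A real configuration would contain one such DB2 entry per physical server in the remote instance, each with a fastname that matches the host name recorded in db2nodes.cfg.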
15) Test server connectivity by trying to import a table definition within DataStage Designer (or DataStage Manager) using the DB2 API plug-in (Server plug-in). If this fails, you do not have connectivity to the DB2 server and need to revisit all the previous steps until this succeeds.

If this succeeds, check the imported TableDefs to be sure the data types are legitimate.

16) Create a user-defined variable APT_DB2INSTANCE_HOME in the DS/EE project using the DataStage Administrator client for use in jobs that access DB2. Avoid setting this at the Project level so that other DB2 jobs which are connecting locally do not pick up this value. Set this variable in each job to the location of the sqllib/db2nodes.cfg file, in our case /home/dsadm/remote_db2config.

5 Using the DB2 Enterprise Stage

Create a Parallel job and add a DB2 Enterprise stage and sequential file stage. Set the file path in the sequential file stage to /dev/null. Set or add the following properties to the DB2 Enterprise stage (see image below).
Figure 5: DS/EE DB2 Enterprise Stage Properties

For connection to a remote DB2 instance, you need to set the following properties on the DB2 Enterprise stage in your parallel job:

- Client Instance Name. Set this to the DB2 client instance name. If you set this property, DataStage assumes you require remote connection.
- Server. Set this to the name of the DB2 server OR use the DB2 environment variable DB2INSTANCE to identify the name of the DB2 server.
- Client Alias DB Name. Set this to the DB2 client's alias database name for the remote DB2 server database. [This is required only if the client's alias is different from the actual name of the remote server database.]
- Database. Set this to the remote server database name OR use the environment variables APT_DBNAME or APT_DB2DBDFT to identify the database.
- User. Enter the user name for connecting to DB2. This is required for a remote connection in order to retrieve the catalog information from the local instance of DB2, and thus the user must have privileges for that local instance.
- Password. Enter the password for connecting to DB2. This is required for a remote connection in order to retrieve the catalog information from the local instance of DB2, and thus the user must have privileges for that local instance.
This stage has been parameterized in the following example:
Figure 6: DS/EE Parallel Job Properties Tab

Figure 7: DS/EE DB2 Enterprise Stage Properties Using Job Parameters

Set the APT_DB2INSTANCE_HOME variable in the Parameters panel to /home/dsadm/remote_db2config.
Figure 8: Sample Job Properties Panel

Test the connection using View Data on the Output / Properties panel:

Figure 9: Sample View Data Output

6 Configuring Multiple Instances in One Job

Although it is not officially supported, it is possible to connect to more than one DB2 instance within a single job. Your job must meet one of the following configurations (note: the use of the word "stream" refers to a contiguous flow of one stage to another within a single job):
1. Single Stream - Two Instances Only: reading from one instance and writing to another instance, with no other DB2 instances (it is not known how many stages of these two instances can be added to the canvas for lookups in this configuration).

2. Two Streams - One Instance per Stream: reading from instance A and writing to instance A, and reading from instance B and writing to instance B (it is not known how many stages of these two instances can be added to the canvas for lookups in this configuration).

3. Multiple Streams with N DB2 Sources and No DB2 Targets: reading from 1 to n DB2 instances in separate source stages, with no other DB2 stages downstream.
In order to get this configuration to work correctly, you must adhere to all of the directions specified for connecting to a remote instance AND the following:

You must not set the APT_DB2INSTANCE_HOME environment variable. Once this variable is set, DS will try to use it for each of the connections in the job. Since a db2nodes.cfg file can only contain information for one instance, this will create problems.

In order for DS to locate the db2nodes.cfg, you must create a user on the DS server with the same name as the instance you are trying to connect to (the default logic for the DB2 Enterprise stage is to use the instance's home directory as defined for the UNIX user with the same name as the DB2 instance). In the user's UNIX home directory, create a sqllib subdirectory and place the remote instance's db2nodes.cfg there. Since APT_DB2INSTANCE_HOME is not set, DS will default to this directory to find the configuration file for the remote instance.
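The directory layout described above can be verified for each instance before the job is run. The following is a minimal sketch: check_instance_dirs is a hypothetical helper, it assumes that all instance users have their home directories under one base directory, and db2inst2 is a hypothetical second instance name used only to show the failure case. The demonstration runs against a temporary directory rather than /home.

```shell
#!/bin/sh
# Sketch only: for each instance name, report whether
# <home_base>/<instance>/sqllib/db2nodes.cfg exists.
# check_instance_dirs is a hypothetical helper; a single shared
# home base directory is an assumption of this sketch.
check_instance_dirs() {
    home_base=$1
    shift
    if [ -n "${APT_DB2INSTANCE_HOME:-}" ]; then
        echo "warning: APT_DB2INSTANCE_HOME is set and must be unset for multiple instances" >&2
    fi
    for inst in "$@"; do
        if [ -f "$home_base/$inst/sqllib/db2nodes.cfg" ]; then
            echo "ok $inst"
        else
            echo "missing $inst"
        fi
    done
}

# Demonstration: only db2inst1 has a staged db2nodes.cfg.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/db2inst1/sqllib"
printf '0 db2_server 0\n1 db2_server 1\n' > "$demo_home/db2inst1/sqllib/db2nodes.cfg"
check_instance_dirs "$demo_home" db2inst1 db2inst2
```

On a real DS server the first argument would be the directory that holds the instance users' home directories, and the remaining arguments would be the names of the remote DB2 instances used in the job.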
7 Troubleshooting

1) If you get an error while performing the binds and grants, make sure dsadm has privileges to create schema, can select on the sysibm.dummy1 table, and can bind packages (see installation documentation for the DB2 grants necessary to run the scripts).

2) Several errors that occur while trying to view data from the DB2 Enterprise stage do not represent the actual issue:
- If you log into DS with a username (e.g. dsadm) and try to view data with a different user in the plug-in (username and password inside of the plug-in), you could get a failed connection. This is because the username and password inside of the stage are only used to create a connection to DB2 via the client, and then the job actually runs using the DS user (the username used to log into DS from either the Designer or the Director).
- The user doesn't have permission to read the catalog tables.

3) The userid used to access the DB2 remote servers has to be set in each of the servers. For example, the dsadm user has to exist as a UNIX user on the ETL server and all of the DB2 nodes. Also make sure the groups are set correctly, since the db2grant.sh script only grants permission to the group (in our example, dstage or something like db2group).

4) The DB2 client instance is a service that needs to be running before you can connect to any of the cataloged databases.
5) The permissions on the resource disk or scratch disk are not set correctly (mainly for performing a load). When performing a load, make sure the resource disk and scratch disk are read/write for the dstage group as well as for the DB2 instance owner where the data is going to be loaded. Usually the groups are different, so the permissions need to be set to 777.

8 Performance Notes

In some cases, when using user-defined SQL without partitioning against large volumes of DB2 data, the overhead of routing information through a remote DB2 coordinator may be significant. In these instances, it may be beneficial to have the DB2 DBA configure separate DB2 coordinator nodes (no local data) on each ETL server (in clustered ETL configurations). In this configuration, the DB2 Enterprise stage should not include the Client Instance Name property, forcing the DB2 Enterprise stages on each ETL server to communicate directly with their local DB2 coordinator.

9 Summary of Settings

The DB2 libraries must come after the DataStage libraries because both products have libraries with identical names. The DB2 client alters the .profile of the DB2 owner, and this must be removed or DataStage will not function. Here is the .profile for user dsadm on the ETL server:

    /home/dsadm @ etl_server >> tail -4 .profile
    # The following three lines were added by UDB and removed by IBM IIS.
    # if [ -f /home/dsadm/sqllib/db2profile ]; then
    #     . /home/dsadm/sqllib/db2profile
    # fi

Environment variables set by /home/dsadm/sqllib/db2profile must be supplied after the native DataStage environment variables. This is done with the dsenv file for the DataStage server. Here are the last lines of the dsenv file with DB2 setup information added:

    /etl/Ascential/DataStage/DSEngine @ etl_server >> tail -8 dsenv
    # DB2 setup section

Here are the contents of the db2nodes.cfg file located in /home/dsadm/remote_db2config/sqllib:

    /home/dsadm/remote_db2config/sqllib @ etl_server >> cat db2nodes.cfg
    0 db2_server 0
    1 db2_server 1