
Yahoo!

White Paper

Upgrading Oracle Real Application Clusters from 10.2.0.4 to 11.2.0.1 on NFS

By Ritesh Rajkaran Chhajer, Arpan Kumar Shrivastava, and Mukesh Burgupalli, SE&O DBA Team



Oracle released 11gR2 in September 2009, and it is a good time to migrate Yahoo databases to 11gR2. This paper briefly describes the new features of 11gR2 from a database administration perspective and then walks through the steps for upgrading Yahoo 10.2.0.4 RAC databases to 11.2.0.1 RAC. These steps were tested on RHEL 5.3 x86-64.

Oracle 11gR2 new features

Oracle has made installations more reliable from 11gR2 onwards. In 11gR2, the installer integrates with the cluster and checks all the prerequisites; if any are not met, it prompts you to run the runfixup.sh script, which fixes most of the prerequisites for the upgrade. Oracle 11gR2 Clusterware is known as 11g Grid Infrastructure: the CRS and ASM binaries reside under a single Oracle Home known as the Grid Home. Defining an Oracle Base is mandatory, since it stores all the diagnostic files. Oracle Grid Infrastructure (Clusterware) cannot be placed under the Oracle Base, because its permissions are changed to root. Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access Oracle databases running in a cluster. The benefit is that the client connect information does not need to change if you add or remove nodes in the cluster. The SCAN name must be 15 characters or less in length, not including the domain, and must be resolvable without the domain suffix.
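For example, a quick sanity check is to confirm that the SCAN resolves without the domain suffix, typically to multiple addresses served round-robin (the addresses below are illustrative only):

[oracle@gq1-stgemsdb-001]~% nslookup gq1-stgems-clu
Name:    gq1-stgems-clu.data.gq1.yahoo.com
Address: 10.10.10.11
Name:    gq1-stgems-clu.data.gq1.yahoo.com
Address: 10.10.10.12
Name:    gq1-stgems-clu.data.gq1.yahoo.com
Address: 10.10.10.13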

Upgrading to 11g Release 2

This paper lists the steps for upgrading the Oracle CRS home from 10.2.0.4 to 11.2.0.1 in a rolling-bounce fashion. Once the CRS upgrade is done, proceed with the steps for installing the 11.2.0.1 RDBMS binaries. Both the CRS upgrade and the RDBMS binaries installation can be done while the database is up and running. Once all the binaries are ready, upgrade the database manually.



Upgrading the Oracle CRS home from 10.2.0.4 to 11.2.0.1 in a rolling-bounce fashion

First, download the Oracle 11gR2 grid software from
http://www.oracle.com/technetwork/database/enterprise-edition/downloads/112010linx8664soft-100572.html
or copy it from sp1-portaldb-001.ysm.corp.sp1.yahoo.com, location /u08/software/11.2.0.1.

Prechecks

Create a directory to keep all the logs:

[oracle@gq1-stgemsdb-001]~% mkdir -p /oracle/11gupgrade_log

While the database is up and running, take a snapshot backup of both CRS and the database on both nodes, and take a host binary backup too. It is also safe to take a backup of the service configuration, for example:

[oracle@gq1-stgemsdb-001]~% srvctl config service -d stgmetat > /oracle/11gupgrade_log/service_config.txt

Keep the SCAN names, the 11g init file, the profile files, and the software ready.

Note: The Grid home and Oracle home need to be new ones; the existing homes cannot be reused for upgrades.

Symlinks and directories

Create two mount points, one for the Grid home and the other for the Oracle home. As the root user, create the mount points as below on both nodes:

[root@gq1-stgemsdb-001 ~]# mkdir -p /home/oragrid
[root@gq1-stgemsdb-001 ~]# mkdir -p /home/oracle
[root@gq1-stgemsdb-001 ~]# cd /
[root@gq1-stgemsdb-001 ~]# ln -s /home/oracle /oracle
[root@gq1-stgemsdb-001 ~]# ln -s /home/oragrid /oragrid
[root@gq1-stgemsdb-001 ~]# chown -R oracle:dba /home/oracle /home/oragrid

PORTMAP and NFS lock

Check the status of portmap and nfs; if they are not running, start them as below and make sure they are configured to start as part of the host boot process. As the root user on both nodes:

[root@gq1-stgemsdb-001 ~]# service portmap start
[root@gq1-stgemsdb-001 ~]# chkconfig portmap on
[root@gq1-stgemsdb-001 ~]# chkconfig --list | grep portmap
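Because the database files in this setup reside on NFS, it is also worth double-checking the mount options before the upgrade. The filer, mount point, and options below are illustrative only; compare the actual output against the values already validated for your filers:

[root@gq1-stgemsdb-001 ~]# mount -t nfs
filer01:/vol/oradata on /data type nfs (rw,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600)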



[root@gq1-stgemsdb-001 ~]# /etc/init.d/nfslock start
[root@gq1-stgemsdb-001 ~]# chkconfig nfslock on
[root@gq1-stgemsdb-001 ~]# chkconfig --list | grep nfslock

Huge pages

There is a bug with hugepages not being used on 11g. If it is a new install and hugepages are not being used, check the following on both nodes:

[root@gq1-stgemsdb-001 ~]# cat /etc/security/limits.conf
[root@gq1-stgemsdb-001 ~]# ulimit -l

If the memlock value is too low, hugepages won't be used. Set it to unlimited by adding these lines to /etc/security/limits.conf on both nodes:

[root@gq1-stgemsdb-001 ~]# vi /etc/security/limits.conf
*   soft   memlock   unlimited
*   hard   memlock   unlimited
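As a rough sizing sketch (assuming a 16 GB SGA and the default 2 MB hugepage size; adjust to your actual SGA target), the number of hugepages to reserve can be computed and set as follows:

[root@gq1-stgemsdb-001 ~]# grep Hugepagesize /proc/meminfo
Hugepagesize:     2048 kB
[root@gq1-stgemsdb-001 ~]# echo $(( 16 * 1024 / 2 ))    # 16 GB SGA divided by 2 MB pages
8192
[root@gq1-stgemsdb-001 ~]# sysctl -w vm.nr_hugepages=8192
[root@gq1-stgemsdb-001 ~]# echo "vm.nr_hugepages=8192" >> /etc/sysctl.conf    # persist across reboots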

Now if the DB is started using sqlplus, hugepages are indeed used. However, there is another bug with 11g where, if the DB is started using srvctl, hugepages may not be used. This bug is apparently fixed in 11.2.0.2. The temporary workaround is to add the line "ulimit -l unlimited" to $GRID_HOME/bin/ohasd or /etc/init.d/ohasd:

[root@gq1-stgemsdb-001 ~]# vi /etc/init.d/ohasd
ulimit -l unlimited

Restart CRS:

[root@gq1-stgemsdb-001 bin]# ./crsctl stop crs
[root@gq1-stgemsdb-001 bin]# ./crsctl start crs

User equivalence

Test for user equivalence between the two nodes; if it is not in place, set it up. If you install using the GUI, Oracle does this by itself based on /etc/hosts.
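To confirm the workaround took effect after the restart, check the hugepage counters once the instance is up; HugePages_Free should drop well below HugePages_Total when the SGA is actually backed by hugepages (the counts below are illustrative). Passwordless ssh between the nodes can be verified in the same pass:

[root@gq1-stgemsdb-001 ~]# grep -i hugepages /proc/meminfo
HugePages_Total:    8192
HugePages_Free:      620
HugePages_Rsvd:       14
[oracle@gq1-stgemsdb-001]~% ssh gq1-stgemsdb-002 date    # must return without a password prompt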



NTP

Since we are using Network Time Protocol (NTP) to synchronize time across all the servers in the cluster, a mandatory requirement with 11gR2 is to enable the slewing option by adding the -x argument in the NTP configuration file, then restarting ntpd, as seen below:

[root@gq1-stgemsdb-001 ~]# vi /etc/sysconfig/ntpd    (add -x to OPTIONS)
[root@gq1-stgemsdb-001 ~]# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
[root@gq1-stgemsdb-001 ~]# service ntpd start
ntpd: Synchronizing with time server:                      [  OK  ]
Starting ntpd:                                             [  OK  ]
[root@gq1-stgemsdb-001 ~]# service ntpd status
ntpd (pid 32062) is running...
[root@gq1-stgemsdb-001 ~]# cat /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
# Set to 'yes' to sync hw clock after successful ntpdate
SYNC_HWCLOCK=no
# Additional options for ntpdate
NTPDATE_OPTIONS=""
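After restarting ntpd with slewing enabled, you can confirm that it is actually synchronizing against its peers; the asterisk marks the selected time source (the server name and figures below are illustrative):

[root@gq1-stgemsdb-001 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.corp.yahoo 10.0.0.5         2 u   45   64  377    0.412   -0.031   0.009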



Oracle's Database Pre-Upgrade Utility

The Oracle Database Pre-Upgrade utility is executed on your existing database while the database is running (no shutdown required) and provides a list of items which should be reviewed prior to the actual upgrade. Reviewing and making adjustments before the actual database upgrade usually reduces downtime and can limit problems during the upgrade. Download utlu112i.sql from ML Note 884522.1 and execute it:

SQL> spool /oracle/11gupgrade_log/upgrade_info.log
SQL> @utlu112i.sql

Script to collect DB upgrade/migrate diagnostic information (dbupgdiag.sql)

This script is intended to provide user-friendly output to diagnose the status of the database either before or after the upgrade. The script creates a file called db_upg_diag_<sid>_<timestamp>.log. It needs to be run in SQL*Plus as the SYS user, both before the upgrade on the source database and after the upgrade on the upgraded database; this helps determine the status of the database before and after the upgrade. Download the script from Metalink (Note 556610.1) and save it as dbupgdiag.sql. The script needs no additional configuration. Connect as sysdba and execute the script, giving a directory path as the argument; the log file is generated there, for example /oracle/11gupgrade_log/db_upg_diag_stgmetat_13-Oct-2010_0419.log:

[oracle@gq1-stgemsdb-001]~% sqlplus / as sysdba
SQL> alter session set nls_language='American';
SQL> @dbupgdiag.sql /oracle/11gupgrade_log

The following duplicate objects found can be ignored:

OBJECT_NAME                              OBJECT_TYPE
---------------------------------------- ----------------------------------------
AQ$_SCHEDULES                            TABLE
AQ$_SCHEDULES_PRIMARY                    INDEX
DBMS_REPCAT_AUTH                         PACKAGE
DBMS_REPCAT_AUTH                         PACKAGE BODY

Recompile invalids:

SQL> spool /oracle/11gupgrade_log/utlrp.log
SQL> @?/rdbms/admin/utlrp.sql

Verify there are no invalids:

SQL> @dbupgdiag.sql /oracle/11gupgrade_log/



The following duplicate objects are ignorable:

OBJECT_NAME                              OBJECT_TYPE
---------------------------------------- ----------------------------------------
AQ$_SCHEDULES                            TABLE
AQ$_SCHEDULES_PRIMARY                    INDEX
DBMS_REPCAT_AUTH                         PACKAGE
DBMS_REPCAT_AUTH                         PACKAGE BODY

Cluster Verification Utility (cluvfy)

From the 11g software home, run cluvfy:

[oracle@gq1-stgemsdb-001]~/software/grid% ./runcluvfy.sh stage -pre crsinst -n gq1-stgemsdb-001,gq1-stgemsdb-002 -verbose > /oracle/11gupgrade_log/cluvfy.log

Check: Hard limits for "maximum open file descriptors"
Node Name        Type  Available  Required  Comment
---------------- ----- ---------- --------- -------
gq1-stgemsdb-002 hard  16384      65536     failed
gq1-stgemsdb-001 hard  16384      65536     failed
Result: Hard limits check failed for "maximum open file descriptors"

[oracle@gq1-stgemsdb-001]~% cat /etc/security/limits.conf
*   soft   nofile   16384
*   hard   nofile   16384

This should get fixed when running runfixup.sh during installation.
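If you prefer to fix this check by hand rather than wait for runfixup.sh, the entries would look like the following sketch, using the required value reported by cluvfy above:

[root@gq1-stgemsdb-001 ~]# vi /etc/security/limits.conf
*   soft   nofile   65536
*   hard   nofile   65536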



CRS binaries installation

Upgrade the CRS home using a silent installation: copy the default response file, edit the parameters below, and leave the other parameters at their default values.

GRID.rsp:

ORACLE_HOSTNAME=gq1-stgemsdb-001.data.gq1.yahoo.com
INVENTORY_LOCATION=/oracle/oraInventory
SELECTED_LANGUAGES=en
oracle.install.option=UPGRADE
ORACLE_BASE=/oracle
ORACLE_HOME=/oragrid/product/11.2
oracle.install.asm.OSDBA=dba
oracle.install.asm.OSOPER=dba
oracle.install.asm.OSASM=dba
oracle.install.crs.config.gpnp.scanName=gq1-stgems-clu.data.gq1.yahoo.com
oracle.install.crs.config.gpnp.scanPort=1521
oracle.install.crs.config.clusterName=crs
oracle.install.crs.config.autoConfigureClusterNodeVIP=false
oracle.install.crs.upgrade.clusterNodes=gq1-stgemsdb-001,gq1-stgemsdb-002

Once the response file is modified with the above parameters, start the upgrade:

[oracle@gq1-stgemsdb-001]~/software/grid% time ./runInstaller -silent -waitforcompletion -responseFile /home/oracle/software/grid/grid.rsp -force -ignoreSysPrereqs

Once the binaries are installed on both hosts, the installer prompts us to run rootupgrade.sh as the root user, which actually performs the upgrade. rootupgrade.sh stops the 10g CRS and starts the 11g CRS, and this can be done in a rolling fashion. Downtime starts only at the rootupgrade.sh step; until then it is just the 11g software installation. The CRS active version gets updated only after all nodes have been upgraded; until then, new options that come with the 11g crsctl won't work, and only 10g-compliant commands will work.

Before proceeding with rootupgrade.sh, run runfixup.sh as the root user on both nodes.

Node 1:
[root@gq1-stgemsdb-001 tmp]# cd /tmp/CVU_11.2.0.1.0_oracle
[root@gq1-stgemsdb-001 tmp]# cp fixup/gq1-stgemsdb-001/fixup.* .
[root@gq1-stgemsdb-001 tmp]# ./runfixup.sh

Copy the fixup response files to the other node:

[oracle@gq1-stgemsdb-001]/tmp% scp fixup.* gq1-stgemsdb-002:/tmp/CVU_11.2.0.1.0_oracle/

Node 2:
[root@gq1-stgemsdb-002 tmp]# cd /tmp/CVU_11.2.0.1.0_oracle
[root@gq1-stgemsdb-002 tmp]# ./runfixup.sh

Verify /etc/security/limits.conf and /etc/sysctl.conf.
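A quick sketch for verifying that the fixups actually landed on both nodes; the kernel parameters shown are just common examples, so compare against the full list cluvfy checks:

[root@gq1-stgemsdb-001 ~]# su - oracle -c 'ulimit -n'
65536
[root@gq1-stgemsdb-001 ~]# sysctl fs.aio-max-nr kernel.sem
fs.aio-max-nr = 1048576
kernel.sem = 250 32000 100 128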



As the root user, execute rootupgrade.sh, capturing the session to a log file.

Node 1:
[root@gq1-stgemsdb-001 ~]# script /oracle/11gupgrade_log/rootupgrade.log
[root@gq1-stgemsdb-001 ~]# time /oragrid/product/11.2/rootupgrade.sh

Node 2:
[root@gq1-stgemsdb-002 ~]# script /oracle/11gupgrade_log/rootupgrade.log
[root@gq1-stgemsdb-002 ~]# time /oragrid/product/11.2/rootupgrade.sh

Once rootupgrade.sh has been executed on both nodes, you can query the CRS version:

[oracle@gq1-stgemsdb-001]~% crsctl query crs softwareversion
Oracle Clusterware version on node [gq1-stgemsdb-001] is [11.2.0.1.0]
[oracle@gq1-stgemsdb-001]~% crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.1.0]

Verify the CRS is up:

[oracle@gq1-stgemsdb-001]~% crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[oracle@gq1-stgemsdb-002]~% crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Oracle binaries installation

Once everything looks fine from the CRS side, start the installation of the 11.2.0.1 Oracle binaries. Install the binaries using a silent installation: copy the default response file from the Oracle 11.2.0.1 software, edit the following parameters as below, and leave the other parameters at their default values.

RDBMS.rsp:

oracle.install.option=INSTALL_DB_SWONLY
ORACLE_HOSTNAME=gq1-stgemsdb-001.data.gq1.yahoo.com



UNIX_GROUP_NAME=dba
INVENTORY_LOCATION=/oracle/oraInventory
SELECTED_LANGUAGES=en
ORACLE_HOME=/oracle/product/11.2
ORACLE_BASE=/oracle
oracle.install.db.InstallEdition=EE
oracle.install.db.DBA_GROUP=dba
oracle.install.db.OPER_GROUP=dba
oracle.install.db.CLUSTER_NODES=gq1-stgemsdb-001,gq1-stgemsdb-002
oracle.install.db.config.starterdb.type=GENERAL_PURPOSE
oracle.install.db.config.starterdb.characterSet=AL32UTF8
oracle.install.db.config.starterdb.memoryOption=false
SECURITY_UPDATES_VIA_MYORACLESUPPORT=false
DECLINE_SECURITY_UPDATES=true

Now start the installation of the binaries:

[oracle@gq1-stgemsdb-001]~/software/database% ./runInstaller -silent -waitforcompletion -responseFile /home/oracle/software/database/db.rsp -force

At the end of the installation, the installer prompts us to run root.sh on both nodes. Before proceeding with root.sh, run runfixup.sh as the root user on both nodes.

Node 1:
[root@gq1-stgemsdb-001 ~]# cd /tmp/CVU_11.2.0.1.0_oracle
[root@gq1-stgemsdb-001 tmp]# cp fixup/gq1-stgemsdb-001/fixup.* .
[root@gq1-stgemsdb-001 tmp]# ./runfixup.sh

Copy the fixup response files to the other node:

[oracle@gq1-stgemsdb-001]/tmp% scp fixup.* gq1-stgemsdb-002:/tmp/CVU_11.2.0.1.0_oracle/

Node 2:
[root@gq1-stgemsdb-002 ~]# cd /tmp/CVU_11.2.0.1.0_oracle
[root@gq1-stgemsdb-002 tmp]# ./runfixup.sh

Then, as the root user, execute root.sh on both nodes:

Node 1:
[root@gq1-stgemsdb-001 ~]# /oracle/product/11.2/root.sh
Node 2:
[root@gq1-stgemsdb-002 ~]# /oracle/product/11.2/root.sh
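Before taking any downtime, a quick smoke test confirms that the new home on each node really is 11.2.0.1:

[oracle@gq1-stgemsdb-001]~% /oracle/product/11.2/bin/sqlplus -v

SQL*Plus: Release 11.2.0.1.0 Production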



Manual database upgrade

We now have all the binaries ready for the upgrade. Until now the database has been up and running; the actual downtime starts here. To upgrade the database manually, source the 10g-specific profile file and shut down the database, and, as part of the rollback plan, take a cold backup of the database. Keep the new init files ready with the following changes:

*.cluster_database=false
*.compatible='11.2.0'
*.sec_case_sensitive_logon=FALSE
*.recyclebin=OFF
*.diagnostic_dest=/oracle

Comment out the following:

#*.audit_file_dest
#*.background_dump_dest
#*.core_dump_dest
#*.user_dump_dest

Note: if you set the compatible parameter to 11.2.0.1 and start the upgrade, you cannot roll back to 10.2.0.4 if any issues arise during the upgrade.

Now source the 11gR2-specific profile file and start the upgrade process:

[oracle@gq1-stgemsdb-001]~% echo $ORACLE_HOME
/oracle/product/11.2
[oracle@gq1-stgemsdb-001]~% echo $ORACLE_SID
gqems01s1
[oracle@gq1-stgemsdb-001]~% sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.1.0 Production on Thu Oct 21 10:07:20 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.

SQL> startup upgrade
SQL> spool upgrade.log
SQL> set time on timing on echo on
SQL> @?/rdbms/admin/catupgrd.sql
SQL> spool off
SQL> exit
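catupgrd.sql runs for a long time, so it helps to watch its progress from a second terminal. A minimal sketch, assuming the spool file lands in the directory sqlplus was started from:

[oracle@gq1-stgemsdb-001]~% tail -f upgrade.log
[oracle@gq1-stgemsdb-001]~% grep -c 'ORA-' upgrade.log    # rough count of errors worth reviewing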



Reconnect to the upgraded database and run the post-upgrade steps:

[oracle@gq1-stgemsdb-001]~% sqlplus "/as sysdba"
SQL> @?/rdbms/admin/utlu112s.sql   -- Report updated registry version and upgrade time
SQL> @?/rdbms/admin/catuppst.sql   -- Migrate AWR/ADDM data from the 10g to the 11g dictionary
SQL> @?/rdbms/admin/utlrp.sql      -- Recompile invalids
SQL> select * from registry$history;

Remove the database from the cluster and add it back. Source the 10g home profile and remove the DB from srvctl:

[oracle@gq1-stgemsdb-001]~% /oracle/product/10.2/bin/srvctl remove database -d gqems01s

Source the 11g home profile and add the DB to srvctl:

[oracle@gq1-stgemsdb-001]~% srvctl add database -d gqems01s -o /oracle/product/11.2
[oracle@gq1-stgemsdb-001]~% srvctl add instance -d gqems01s -i gqems01s1 -n gq1-stgemsdb-001
[oracle@gq1-stgemsdb-001]~% srvctl add instance -d gqems01s -i gqems01s2 -n gq1-stgemsdb-002

Shut down the database, change the cluster_database parameter to TRUE on both nodes, and start the database using srvctl:

[oracle@gq1-stgemsdb-001]~% srvctl start database -d gqems01s
[oracle@gq1-stgemsdb-001]~% srvctl status database -d gqems01s
Instance gqems01s1 is running on node gq1-stgemsdb-001
Instance gqems01s2 is running on node gq1-stgemsdb-002

Post checks:

[oracle@gq1-stgemsdb-001]~% sqlplus / as sysdba
SQL> col comp_name format A40
SQL> col status format A15
SQL> col version format A15
SQL> SELECT comp_name, status, version FROM dba_registry;
SQL> col owner format A20
SQL> col object_name format A40
SQL> select owner, object_type, object_name, status from dba_objects where status != 'VALID' order by owner, object_type;
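It is also worth confirming that the cluster registration now points at the 11.2 home (output abbreviated and illustrative):

[oracle@gq1-stgemsdb-001]~% srvctl config database -d gqems01s
Database unique name: gqems01s
Oracle home: /oracle/product/11.2
...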



Sample output of crsctl status res -t:

[oracle@gq1-stgemsdb-001]~% crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER            STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       gq1-stgemsdb-001
               ONLINE  ONLINE       gq1-stgemsdb-002
ora.asm
               OFFLINE OFFLINE      gq1-stgemsdb-001
               OFFLINE OFFLINE      gq1-stgemsdb-002
ora.eons
               ONLINE  ONLINE       gq1-stgemsdb-001
               ONLINE  ONLINE       gq1-stgemsdb-002
ora.gsd
               OFFLINE OFFLINE      gq1-stgemsdb-001
               OFFLINE OFFLINE      gq1-stgemsdb-002
ora.net1.network
               ONLINE  ONLINE       gq1-stgemsdb-001
               ONLINE  ONLINE       gq1-stgemsdb-002
ora.ons
               ONLINE  ONLINE       gq1-stgemsdb-001
               ONLINE  ONLINE       gq1-stgemsdb-002
ora.registry.acfs
               OFFLINE OFFLINE      gq1-stgemsdb-001
               OFFLINE OFFLINE      gq1-stgemsdb-002
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       gq1-stgemsdb-001
ora.gq1-stgemsdb-001.vip
      1        ONLINE  ONLINE       gq1-stgemsdb-001
ora.gq1-stgemsdb-002.vip
      1        ONLINE  ONLINE       gq1-stgemsdb-002
ora.gqems01s.db
      1        ONLINE  ONLINE       gq1-stgemsdb-001  Open
      2        ONLINE  ONLINE       gq1-stgemsdb-002  Open
ora.oc4j
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        ONLINE  ONLINE       gq1-stgemsdb-001

GSD in the OFFLINE state is expected, as it is not required and no longer used.



rootupgrade.sh failure

If rootupgrade.sh fails during installation, follow this Metalink note for further troubleshooting: How to Proceed from Failed Upgrade to 11gR2 Grid Infrastructure (CRS) [ID 969254.1].

XDB and ACLs

Starting with Oracle 11gR1 (11.1.0.6), so-called "fine-grained access" was implemented to limit the use of packages like UTL_SMTP and UTL_HTTP that connect over the network to other services, such as mail servers. In 11g, several UTL_* packages require additional permissions to be granted for network access. In summary, the steps are:

1. Create an ACL, setting the privilege required for the user.
2. Assign the ACL to a network.
3. Test the UTL_ package.

The ACL is an XML file which lists the permissions given to the user(s). This XML is stored in Oracle XML DB, so ensure XML DB is installed.

XML DB creation (Ref ML Note 742014.1):

[oracle@gq1-stgemsdb-001]~% sqlplus / as sysdba
SQL> spool catqm.log
SQL> @?/rdbms/admin/catqm.sql xdb sysaux temp
SQL> spool off

Where:
xdb    - schema name
sysaux - tablespace (you can have a separate tablespace created if needed)
temp   - temporary tablespace

When prompted for arg 4, press Enter, or answer "YES" or "NO". By default it chooses "Secure Files", which is "YES".

Verification SQL: xdbusagecheck.sql (Ref ML Note 733667.1), to verify that the XDB install is valid.
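In addition to xdbusagecheck.sql, a one-line registry check confirms the component is valid (expected output shown):

SQL> select comp_id, status, version from dba_registry where comp_id = 'XDB';

COMP_ID   STATUS   VERSION
--------- -------- ----------
XDB       VALID    11.2.0.1.0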



ACL creation

Sample ACL statements needed for UTL_SMTP to work for a DEV user.

Create the ACL:

[oracle@gq1-stgemsdb-001]~% sqlplus / as sysdba
SQL> BEGIN
  DBMS_NETWORK_ACL_ADMIN.create_acl (
    acl         => 'DNADEV_ACL_file.xml',
    description => 'DNA development ACL file',
    principal   => 'METACONF',
    is_grant    => TRUE,
    privilege   => 'connect',
    start_date  => SYSTIMESTAMP,
    end_date    => NULL);
  COMMIT;
END;
/

Add a privilege:

BEGIN
  DBMS_NETWORK_ACL_ADMIN.add_privilege (
    acl        => 'DNADEV_ACL_file.xml',
    principal  => 'METACONF',
    is_grant   => TRUE,
    privilege  => 'connect',
    position   => NULL,
    start_date => NULL,
    end_date   => NULL);
  COMMIT;
END;
/

Assign the ACL to a network:

BEGIN
  DBMS_NETWORK_ACL_ADMIN.assign_acl (
    acl        => 'DNADEV_ACL_file.xml',
    host       => '*',
    lower_port => 80,
    upper_port => NULL);
  COMMIT;
END;
/

Assign the ACL to the mail relay host and port:

BEGIN
  DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL('DNADEV_ACL_file.xml', 'mtarelay.ops.yahoo.net', 25);
  COMMIT;
END;
/
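To review what was created, the 11g ACL dictionary views can be queried; the ACL name here matches the examples above:

SQL> select host, lower_port, upper_port, acl from dba_network_acls;
SQL> select acl, principal, privilege, is_grant from dba_network_acl_privileges;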



Conclusion

This is how 11gR2 upgrades are done using silent installation and a manual database upgrade. There are major differences in the Clusterware stack in 11gR2 as compared to 10gR2, so it is imperative that every database administrator goes through the 11gR2 technical stack comprehensively and understands the key changes Oracle has made before doing the upgrades.
