http://uxsl.europe.hp.com/doc/tech/ha/HAtrain/ Prepared by Anand
Other platforms have other HA software
HA means the following :
- No SPOC
- N+1 redundancy
- Ideal : dual power sources/vendors; hubs and switches connected to dual power sources
- Not load balancing (Foundry / Cisco LocalDirector, software load balancers)

HA Terminology :
- Cluster (1)
- Node (1 to many)
- Package (1 to many)
- Floating IPs (single/multiple, e.g. BAMM); hostnames can be specified in DNS for each floating IP
Question : Can we have a node in 2 clusters?
Answer : Not advisable, due to dependencies.

Availability :
- 99% - standard server
- 99.5% - MC/ServiceGuard (application, not node)
- 99.99% - ??
Criteria for HA : Ensure that all nodes in the cluster are of the same build, hardware- and software-wise (patch level, kernel changes, user accounts).

Types of disks applicable for use with MC/ServiceGuard (in general, disks with 2 SPUs/controllers) :
- VA
- FC10, SC10
- XP
- DS
- AutoRAID 12H
- Nike disks

Not recommended :
- Jamaica disks
- Desktop disks
Note : Disks should have HA (RAID1, RAID5) as well.
Question : Can MC/ServiceGuard work across DCs or countries, i.e. one node in Singapore and the other node in Japan?
Answer : Yes, provided the heartbeat cable is long enough or, more importantly, the subnet is the same and the shared disk system is accessible by both servers.

Software Licenses
Part#          Description                                             Qty  Unit Price
B3935DA        MC/SG software system license for HP-UX 11.x             2   USD 0.00
B3935DA-AE5    MC/SG software license for K/N class                     2   USD 5117.00
B3935DA-ABA    MC/SG software English localization                      2   USD 0.00
B3935DA-0S6    MC/SG 24x7 support (first year)                          2   USD 496.80
B5140BA        MC/SG NFS toolkit license                                2   USD 322.50
B5140BA-0S6    MC/SG NFS toolkit 24x7 support (first year)              2   USD 64.80
B5139DA        Enterprise Cluster Extension                             2   USD 427.85
B5139DA-0S6    Enterprise Cluster Extension 24x7 support (first year)   2   USD 86.40
H6194AA        MC/SG Implementation                                     1   USD 15000.00
               (include only if you want to buy consulting and implementation service from HPC)
B7885BA        MC/SG LTU Extension for SAP (per SAP instance)           1   USD 12900.00

Note : Please verify with the SAP team if any other SAP-related license is needed.

If you would like to buy service from HPC, what our team usually does is approach Vincent, the account manager for HPO, and he will arrange for someone from HPC to work with us. (Remember to include the USD 15k.)
Software Installation
Note : MC/ServiceGuard can be installed from the ctss144 depots (/var/depot/applications/11.00/hp-ux, /var/depot/applications/11.11/hp-ux).

We have in our depots :
- Version 11.09
- Version 11.13 (recommended)
MC/ServiceGuard software to install (basic setup, install on both machines):
- B3935DA A.11.13 MC/ServiceGuard
- B5140BA A.11.00.04 MC/ServiceGuard NFS Toolkit (install only if NFS is required to work within the cluster)
- B5139DA B.01.06 Enterprise Cluster Master Toolkit (optional)
- B8324BA A.01.03 HP Cluster Object Manager (optional)
Note : Only install the above software from the same DART/CD version, do not try to mix and match from different releases.
Note : If the OS is version 11.11 (11i) Mission Critical Operating Environment, it should come with MC/ServiceGuard installed.
Note : Do check the /etc/services and /etc/inetd.conf files for the MC/ServiceGuard-related services, especially for the 11i mission critical OS.

/etc/services :
hacl-hb     5300/tcp   # High Availability (HA) Cluster heartbeat
hacl-gs     5301/tcp   # HA Cluster General Services
hacl-cfg    5302/tcp   # HA Cluster TCP configuration
hacl-cfg    5302/udp   # HA Cluster UDP configuration
hacl-probe  5303/tcp   # HA Cluster TCP probe
hacl-probe  5303/udp   # HA Cluster UDP probe
hacl-local  5304/tcp   # HA Cluster Commands
hacl-test   5305/tcp   # HA Cluster Test
hacl-dlm    5408/tcp   # HA Cluster distributed lock manager
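The presence of these entries can be checked with a small shell helper. This is a sketch: the function name and the services-file argument are illustrative, not part of MC/ServiceGuard.

```shell
# check_hacl_services: report any ServiceGuard service names missing from a
# services file (pass /etc/services on a real node). Illustrative helper only.
check_hacl_services() {
  svcfile="$1"
  missing=0
  for svc in hacl-hb hacl-gs hacl-cfg hacl-probe hacl-local hacl-test hacl-dlm
  do
    # Match the service name at the start of a line, followed by whitespace.
    if ! grep -q "^${svc}[[:space:]]" "$svcfile"; then
      echo "MISSING: $svc"
      missing=1
    fi
  done
  return $missing
}
```

Run it against /etc/services on each node before starting the cluster; any MISSING line means the entry has to be added by hand.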
Question : Can we install one node with MC/ServiceGuard version 11.09 and the other with version 11.13 or something else, i.e. different versions?
Answer : Not advisable, due to compatibility issues - unless you're doing rolling upgrades.
MC/ServiceGuard Network Design
Note : Usually the heartbeat LAN uses the internal LAN card; the primary and secondary LANs use 2 separate LAN cards.

Question : What will a 3-node or 4-node cluster look like? How can we configure the packages to fail over?
Answer : Many possibilities.

Heartbeat network :
- Cross UTP
- Serial cable
- Dedicated heartbeat subnet
- Primary LAN usually set as secondary heartbeat

Cluster/Package Node Configurations :
- ACTIVE ; ACTIVE
- ACTIVE ; PASSIVE

Cluster Lock Disk :
- Tie breaker - whoever gets the lock disk will reform the cluster; the other node will usually panic and reboot
- What if the cluster lock disk is dead? - UNPLANNED OUTAGE
[Diagram: MC/ServiceGuard network design - nodes sgpue036 and sgpue037 connected to switch 1 and switch 2 on the user LAN (Securenet). Primary LANs: 15.209.0.25 (cable name sgpue036) and 15.209.0.26 (cable name sgpue037) on lan1. Heartbeat LAN: 192.0.0.1 / 192.0.0.2 on lan0 over a cross UTP cable. Failover LANs (lan2, no physical IP) cabled to the opposite switch: sgpue036s to switch 2, sgpue037s to switch 1. Shared FC2 Keychain Database disk.]

MC/ServiceGuard Monitoring :
- Hardware
- Application
- ITO
- ClusterViewPlus
- NNM
MC/ServiceGuard Commands
Configuration commands :
- cmquerycl
- cmcheckconf
- cmapplyconf (distributes the binary configuration file to all nodes in the cluster)
- cmgetconf

Cluster-specific commands :
- cmruncl
- cmviewcl
- cmhaltcl

Node-specific commands :
- cmrunnode
- cmhaltnode

Package-specific commands :
- cmrunpkg
- cmhaltpkg
- cmmodpkg
MC/ServiceGuard with SAM
MC/ServiceGuard backups :
- Database vendors' online backup tools
- Split mirror / Business Copy (VA, XP)
- KNET
- JFS snapshots
- Practice of backup for HPMS, if no special requests :
  o For filesystem backup : whatever filesystem is mounted on whichever system, even if failed over.
  o For database : SAP/DBA will consult the tools team on backup strategy; usually configure OmniBack to detect and back up by floating IP. Issues with BAMM ??

Project Timeline (TAT) :
- Gathering information - 2 days
- Hardware setup (LAN) - 2 days
- Configuration - 3 days (varies); dependencies : application/DB scripts
- Testing - 1 day (requires CE presence)
MC/SERVICEGUARD IMPLEMENTATION / CONFIGURATION
Configure /etc/rc.config.d/netconf on each of the nodes in the cluster with the heartbeat LAN (if using LAN and not serial interface)
# PRIMARY LAN
INTERFACE_NAME[0]="lan1"
IP_ADDRESS[0]="15.209.0.25"
SUBNET_MASK[0]="255.255.255.192"
BROADCAST_ADDRESS[0]=""
INTERFACE_STATE[0]=""
DHCP_ENABLE[0]=0

# HEARTBEAT LAN
INTERFACE_NAME[1]="lan0"
IP_ADDRESS[1]="192.0.0.1"
SUBNET_MASK[1]="255.255.255.0"
BROADCAST_ADDRESS[1]=""
INTERFACE_STATE[1]=""
DHCP_ENABLE[1]=0

Note : The secondary LAN (if any) does not need to be configured.
Configure the root .rhosts file on each of the nodes in the cluster to include itself and the other nodes in the cluster (node 1 root, node 2 root). E.g. :

sgpue036.sgp.hp.com root
sgpue037.sgp.hp.com root

Note : Alternatively, we can create the /etc/cmcluster/cmclnodelist file on all the nodes. This is necessary for the cluster to identify all its nodes.
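As a sketch, the /etc/cmcluster/cmclnodelist file can be generated from a node list. The helper name and argument order below are made up for illustration; only the file format ("<node> root", one line per node) comes from the example above.

```shell
# write_cmclnodelist: write one "<node> root" line per cluster node to the
# target file (use /etc/cmcluster/cmclnodelist on the real nodes).
# Helper name and interface are illustrative.
write_cmclnodelist() {
  target="$1"
  shift
  : > "$target"                      # truncate/create the file
  for node in "$@"; do
    echo "$node root" >> "$target"   # one entry per node
  done
}

# Example (run on every node in the cluster):
# write_cmclnodelist /etc/cmcluster/cmclnodelist sgpue036.sgp.hp.com sgpue037.sgp.hp.com
```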
Unmount Logical Volumes and deactivate the Volume Groups that will be controlled/run by the cluster. (These do not need to be entered in /etc/fstab)
E.g.
1. vgchange -a n vg02
2. vgchange -a n vg03

Note : It is possible that a cluster does not have any cluster lock disk, or even a VG at all. The same applies to packages. Also, each VG must be unique to a package; the same VG cannot be used by other packages.
Export and distribute the Volume Groups to the secondary (failover) node.
E.g.
1. vgexport -p -v -s -m /tmp/vg02.map /dev/vg02
2. vgexport -p -v -s -m /tmp/vg03.map /dev/vg03

-p : preview mode, so that the volume group will not actually be exported off the original node.
-s : sharable option, Series 800 only. When -s is specified, the -p, -v, and -m options must also be specified. A map file is created that can be used to create volume group entries on other systems in the high availability cluster (with the vgimport command).
-m : generates the map file.
-v : verbose output.
FTP the .map files to secondary (failover) node.
On Secondary (failover) node, create the volume group directories:
E.g.
3. mkdir /dev/vg02
4. mkdir /dev/vg03
5. ls -l /dev/*/group
6. mknod /dev/vg02/group c 64 0x020000
7. mknod /dev/vg03/group c 64 0x030000

Import the volume groups onto the secondary (failover) node. E.g.
8. vgimport -v -s -m /tmp/vg02.map /dev/vg02
9. vgimport -v -s -m /tmp/vg03.map /dev/vg03

Note : Leave the cluster volume groups deactivated.
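The export/import sequence above can be scripted per volume group. The sketch below only prints the commands (a dry run), since each must be run on the correct node; the helper name is illustrative.

```shell
# vg_move_cmds: print the vgexport (primary node) and vgimport (failover node)
# commands for one cluster volume group. Dry run only - review the output,
# copy the map file between nodes, then run each command on the right node.
vg_move_cmds() {
  vg="$1"
  echo "# on the primary node:"
  echo "vgexport -p -v -s -m /tmp/${vg}.map /dev/${vg}"
  echo "# on the failover node, after copying the map file:"
  echo "vgimport -v -s -m /tmp/${vg}.map /dev/${vg}"
}

# Example:
# vg_move_cmds vg02
# vg_move_cmds vg03
```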
Configure the Cluster (do this on one node)
1. cmquerycl [-w full] -v -C /etc/cmcluster/cluster.conf -n <primary node> -n <secondary node> [-n <other nodes in the cluster>]
(Note : This will generate the cluster config file.)
2. Edit the /etc/cmcluster/cluster.conf file
# ********************************************************************** # ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE *************** # ***** For complete details about cluster parameters and how to **** # ***** set them, consult the ServiceGuard manual. **** # **********************************************************************
# Enter a name for this cluster. This name will be used to identify the # cluster when viewing or manipulating it.
CLUSTER_NAME Kcdatabases
# Cluster Lock Parameters # # The cluster lock is used as a tie-breaker for situations # in which a running cluster fails, and then two equal-sized # sub-clusters are both trying to form a new cluster. The # cluster lock may be configured using either a lock disk # or a quorum server. # # You can use either the quorum server or the lock disk as # a cluster lock but not both in the same cluster. # # Consider the following when configuring a cluster. # For a two-node cluster, you must use a cluster lock. For # a cluster of three or four nodes, a cluster lock is strongly # recommended. For a cluster of more than four nodes, a # cluster lock is recommended. If you decide to configure # a lock for a cluster of more than four nodes, it must be # a quorum server.
# Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and # FIRST_CLUSTER_LOCK_PV parameters to define a lock disk. # The FIRST_CLUSTER_LOCK_VG is the LVM volume group that # holds the cluster lock. This volume group should not be # used by any other cluster as a cluster lock device.
# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL, # and QS_TIMEOUT_EXTENSION parameters to define a quorum server. # The QS_HOST is the host name or IP address of the system # that is running the quorum server process. The # QS_POLLING_INTERVAL (microseconds) is the interval at which # ServiceGuard checks to make sure the quorum server is running. # The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase # the time interval after which the quorum server is marked DOWN. # # The default quorum server timeout is calculated from the # ServiceGuard cluster parameters, including NODE_TIMEOUT and # HEARTBEAT_INTERVAL. If you are experiencing quorum server # timeouts, you can adjust these parameters, or you can include # the QS_TIMEOUT_EXTENSION parameter. # # For example, to configure a quorum server running on node # "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to # add 2 seconds to the system assigned value for the quorum server # timeout, enter: # # QS_HOST qshost # QS_POLLING_INTERVAL 120000000 # QS_TIMEOUT_EXTENSION 2000000
FIRST_CLUSTER_LOCK_VG /dev/vg02 <-- This is automatically searched for.
# Definition of nodes in the cluster. # Repeat node definitions as necessary for additional nodes.
NODE_NAME sgpue036
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.0.0.1 <<- need to change manually from stationary IP to heartbeat IP
NETWORK_INTERFACE lan1
HEARTBEAT_IP 15.209.0.25
NETWORK_INTERFACE lan5
FIRST_CLUSTER_LOCK_PV /dev/dsk/c3t0d0
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
# Primary Network Interfaces on Bridged Net 1: lan0. # Warning: There are no standby network interfaces on bridged net 1. <<- because using cross UTP # Primary Network Interfaces on Bridged Net 2: lan1. # Possible standby Network Interfaces on Bridged Net 2: lan5.
NODE_NAME sgpue037
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.0.0.2
NETWORK_INTERFACE lan1
HEARTBEAT_IP 15.209.0.26
NETWORK_INTERFACE lan5
FIRST_CLUSTER_LOCK_PV /dev/dsk/c3t0d0
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
# Primary Network Interfaces on Bridged Net 1: lan0. # Warning: There are no standby network interfaces on bridged net 1. # Primary Network Interfaces on Bridged Net 2: lan1. # Possible standby Network Interfaces on Bridged Net 2: lan5.
# Cluster Timing Parameters (microseconds).
# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds). # This default setting yields the fastest cluster reformations. # However, the use of the default value increases the potential # for spurious reformations due to momentary system hangs or # network load spikes. # For a significant portion of installations, a setting of # 5000000 to 8000000 (5 to 8 seconds) is more appropriate. # The maximum value recommended for NODE_TIMEOUT is 30000000 # (30 seconds).
# Package Configuration Parameters. # Enter the maximum number of packages which will be configured in the cluster. # You can not add packages beyond this limit. # This parameter is required. MAX_CONFIGURED_PACKAGES 8
# List of cluster aware LVM Volume Groups. These volume groups will # be used by package applications via the vgchange -a e command. # Neither CVM or VxVM Disk Groups should be used here. # For example: # VOLUME_GROUP /dev/vgdatabase # VOLUME_GROUP /dev/vg02
VOLUME_GROUP /dev/vg02 VOLUME_GROUP /dev/vg03
Verify the Cluster Configuration (do this on one node)
1. cmcheckconf [-k] -v -C /etc/cmcluster/cluster.conf

Note : If there are no errors, the cluster configuration is ready to be applied.
Distributing the Binary Configuration File (do this on one node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-k] -v -C /etc/cmcluster/cluster.conf
3. vgchange -a n /dev/vg02

Note : The cluster lock volume group needs to be activated in order for the configuration to be applied for first-time clusters. Subsequent changes to the cluster may not need the cluster lock activated, or may not even need the cluster to be halted, i.e. they can be done online, but this is not recommended.
Note : Need to deactivate cluster lock disk right after cluster changes are applied.
Backing up Volume Group and Cluster Lock Configuration Data (optional)
1. vgcfgbackup -u /dev/vg02
2. vgcfgbackup -u /dev/vg03

Note : This does not require the volume groups to be activated.
Checking Cluster Operation (do on either node)
1. cmruncl -v
2. cmhaltnode -v <primary node>
3. cmrunnode -v <primary node>
4. cmhaltcl -v
5. cmruncl -v
6. cmhaltcl -v
Note : Try this on all other nodes in the cluster as well.
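The per-node halt/run check can be generated for every node as a dry run. The helper name is illustrative; it just prints the cmhaltnode/cmrunnode pairs to execute from a cluster member.

```shell
# node_check_cmds: print the halt/restart check sequence for each node name
# given. Dry run only - run the printed commands from a cluster member and
# verify with cmviewcl between steps.
node_check_cmds() {
  for node in "$@"; do
    echo "cmhaltnode -v $node"
    echo "cmrunnode -v $node"
  done
}

# Example:
# node_check_cmds sgpue036 sgpue037
```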
Disable Automount of Volume Groups (on both nodes)
1. Edit the /etc/lvmrc file and set AUTO_VG_ACTIVATE=0

Note : This is necessary as we do not want the cluster volume groups to be activated when a system reboots; they are now under the control of the cluster.
Disable Autostart Features (on both nodes)
1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=0
Note : This is to prevent the cluster node from automatically joining the cluster after a reboot. Usually done when doing maintenance.
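Both edits above (AUTO_VG_ACTIVATE in /etc/lvmrc and AUTOSTART_CMCLD in /etc/rc.config.d/cmcluster) follow the same pattern, so they can be sketched as one helper. The function name is illustrative; back up the files first on a real system.

```shell
# set_rc_var: set VAR=VALUE in an rc config file, replacing any existing
# assignment of VAR on a line of its own. Illustrative helper.
set_rc_var() {
  file="$1"; var="$2"; val="$3"
  sed "s/^${var}=.*/${var}=${val}/" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example (on both nodes):
# set_rc_var /etc/lvmrc AUTO_VG_ACTIVATE 0
# set_rc_var /etc/rc.config.d/cmcluster AUTOSTART_CMCLD 0
```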
Create Packages
E.g.
1. mkdir /etc/cmcluster/kci2prd <-- can be any name
2. cmmakepkg -p /etc/cmcluster/kci2prd.conf <-- can be any name
3. Edit the configuration file.

Note : If the package and control file is special (e.g. NFS required), do not run the cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS extension toolkit (similar for the SAP extension). You still need to adjust the files to suit your needs.
# ********************************************************************** # ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) ******* # ********************************************************************** # ******* Note: This file MUST be edited before it can be used. ******** # * For complete details about package parameters and how to set them, * # * consult the MC/ServiceGuard ServiceGuard OPS Edition manuals ******* # **********************************************************************
# Enter a name for this package. This name will be used to identify the # package when viewing or manipulating it. It must be different from # the other configured package names.
PACKAGE_NAME kci2prd
# Enter the package type for this package. PACKAGE_TYPE indicates # whether this package is to run as a FAILOVER or SYSTEM_MULTI_NODE # package. # # FAILOVER package runs on one node at a time and if a failure # occurs it can switch to an alternate node. # # SYSTEM_MULTI_NODE # package runs on multiple nodes at the same time. # It can not be started and halted on individual nodes. # Both NODE_FAIL_FAST_ENABLED and AUTO_RUN must be set # to YES for this type of package. All SERVICES must # have SERVICE_FAIL_FAST_ENABLED set to YES. # # NOTE: Packages which have a PACKAGE_TYPE of SYSTEM_MULTI_NODE are # not failover packages and should only be used for applications # provided by Hewlett-Packard. # # Since SYSTEM_MULTI_NODE packages run on multiple nodes at # one time, following parameters are ignored: # # FAILOVER_POLICY # FAILBACK_POLICY # # Since an IP address can not be assigned to more than node at a # time, relocatable IP addresses can not be assigned in the # package control script for multiple node packages. If # volume groups are assigned to multiple node packages they must # activated in a shared mode and data integrity is left to the # application. Shared access requires a shared volume manager. # # # Examples : PACKAGE_TYPE FAILOVER (default) # PACKAGE_TYPE SYSTEM_MULTI_NODE #
PACKAGE_TYPE FAILOVER
# Enter the failover policy for this package. This policy will be used # to select an adoptive node whenever the package needs to be started. # The default policy unless otherwise specified is CONFIGURED_NODE. # This policy will select nodes in priority order from the list of # NODE_NAME entries specified below. # # The alternative policy is MIN_PACKAGE_NODE. This policy will select # the node, from the list of NODE_NAME entries below, which is # running the least number of packages at the time this package needs # to start.
FAILOVER_POLICY CONFIGURED_NODE
# Enter the failback policy for this package. This policy will be used # to determine what action to take when a package is not running on # its primary node and its primary node is capable of running the # package. The default policy unless otherwise specified is MANUAL. # The MANUAL policy means no attempt will be made to move the package # back to its primary node when it is running on an adoptive node. # # The alternative policy is AUTOMATIC. This policy will attempt to # move the package back to its primary node whenever the primary node # is capable of running the package.
FAILBACK_POLICY MANUAL
# Enter the names of the nodes configured for this package. Repeat # this line as necessary for additional adoptive nodes. # # NOTE: The order is relevant. # Put the second Adoptive Node after the first one. # # Example : NODE_NAME original_node # NODE_NAME adoptive_node # # If all nodes in cluster is to be specified and order is not # important, "NODE_NAME *" may be specified. # # Example : NODE_NAME *
NODE_NAME sgpue036 NODE_NAME sgpue037
# Enter the value for AUTO_RUN. Possible values are YES and NO. # The default for AUTO_RUN is YES. When the cluster is started the # package will be automatically started. In the event of a failure the # package will be started on an adoptive node. Adjust as necessary. # # AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.
AUTO_RUN YES
# Enter the value for LOCAL_LAN_FAILOVER_ALLOWED. # Possible values are YES and NO. # The default for LOCAL_LAN_FAILOVER_ALLOWED is YES. In the event of a # failure, this permits the cluster software to switch LANs locally # (transfer to a standby LAN card). Adjust as necessary. # # LOCAL_LAN_FAILOVER_ALLOWED replaces obsolete NET_SWITCHING_ENABLED.
LOCAL_LAN_FAILOVER_ALLOWED YES
# Enter the value for NODE_FAIL_FAST_ENABLED. # Possible values are YES and NO. # The default for NODE_FAIL_FAST_ENABLED is NO. If set to YES, # in the event of a failure, the cluster software will halt the node # on which the package is running. All SYSTEM_MULTI_NODE packages must have # NODE_FAIL_FAST_ENABLED set to YES. Adjust as necessary. NODE_FAIL_FAST_ENABLED NO
# Enter the complete path for the run and halt scripts. In most cases # the run script and halt script specified here will be the same script, # the package control script generated by the cmmakepkg command. This # control script handles the run(ning) and halt(ing) of the package. # Enter the timeout, specified in seconds, for the run and halt scripts. # If the script has not completed by the specified timeout value, # it will be terminated. The default for each script timeout is # NO_TIMEOUT. Adjust the timeouts as necessary to permit full # execution of each script. # Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of # all SERVICE_HALT_TIMEOUT specified for all services.
# Enter the names of the storage groups configured for this package. # Repeat this line as necessary for additional storage groups. # # Storage groups are only used with CVM disk groups. Neither # VxVM disk groups or LVM volume groups should be listed here. # By specifying a CVM disk group with the STORAGE_GROUP keyword # this package will not run until the VxVM-CVM-pkg package is # running and thus the CVM shared disk groups are ready for # activation. # # NOTE: Should only be used by applications provided by # Hewlett-Packard. # # Example : STORAGE_GROUP dg01 # STORAGE_GROUP dg02 # STORAGE_GROUP dg03 # STORAGE_GROUP dg04 #
# Enter the SERVICE_NAME, the SERVICE_FAIL_FAST_ENABLED and the # SERVICE_HALT_TIMEOUT values for this package. Repeat these # three lines as necessary for additional service names. All # service names MUST correspond to the service names used by # cmrunserv and cmhaltserv commands in the run and halt scripts. # # The value for SERVICE_FAIL_FAST_ENABLED can be either YES or # NO. If set to YES, in the event of a service failure, the # cluster software will halt the node on which the service is # running. If SERVICE_FAIL_FAST_ENABLED is not specified, the # default will be NO. # # SERVICE_HALT_TIMEOUT is represented in the number of seconds. # This timeout is used to determine the length of time (in # seconds) the cluster software will wait for the service to # halt before a SIGKILL signal is sent to force the termination # of the service. In the event of a service halt, the cluster # software will first send a SIGTERM signal to terminate the # service. If the service does not halt, after waiting for the # specified SERVICE_HALT_TIMEOUT, the cluster software will send # out the SIGKILL signal to the service to force its termination. # This timeout value should be large enough to allow all cleanup # processes associated with the service to complete. If the # SERVICE_HALT_TIMEOUT is not specified, a zero timeout will be # assumed, meaning the cluster software will not wait at all # before sending the SIGKILL signal to halt the service. # # Example: SERVICE_NAME DB_SERVICE # SERVICE_FAIL_FAST_ENABLED NO # SERVICE_HALT_TIMEOUT 300 # # To configure a service, uncomment the following lines and # fill in the values for all of the keywords. # SERVICE_NAME kci2prd SERVICE_FAIL_FAST_ENABLED NO SERVICE_HALT_TIMEOUT 300
# Enter the network subnet name that is to be monitored for this package. # Repeat this line as necessary for additional subnet names. If any of # the subnets defined goes down, the package will be switched to another # node that is configured for this package and has all the defined subnets # available.
SUBNET 15.209.0.0
# The keywords RESOURCE_NAME, RESOURCE_POLLING_INTERVAL, # RESOURCE_START, and RESOURCE_UP_VALUE are used to specify Package # Resource Dependencies. To define a package Resource Dependency, a # RESOURCE_NAME line with a fully qualified resource path name, and # one or more RESOURCE_UP_VALUE lines are required. The # RESOURCE_POLLING_INTERVAL and the RESOURCE_START are optional. # # The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the # resource is to be monitored. It will be defaulted to 60 seconds if # RESOURCE_POLLING_INTERVAL is not specified. # # The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED. # The default setting for RESOURCE_START is AUTOMATIC. If AUTOMATIC # is specified, ServiceGuard will start up resource monitoring for # these AUTOMATIC resources automatically when the node starts up. # If DEFERRED is selected, ServiceGuard will not attempt to start # resource monitoring for these resources during node start up. User # should specify all the DEFERRED resources in the package run script # so that these DEFERRED resources will be started up from the package # run script during package run time. # # RESOURCE_UP_VALUE requires an operator and a value. This defines # the resource 'UP' condition. The operators are =, !=, >, <, >=, # and <=, depending on the type of value. Values can be string or # numeric. If the type is string, then only = and != are valid # operators. If the string contains whitespace, it must be enclosed # in quotes. String values are case sensitive. For example, # # Resource is up when its value is # -------------------------------- # RESOURCE_UP_VALUE = UP "UP" # RESOURCE_UP_VALUE != DOWN Any value except "DOWN" # RESOURCE_UP_VALUE = "On Course" "On Course" # # If the type is numeric, then it can specify a threshold, or a range to # define a resource up condition. If it is a threshold, then any operator # may be used. 
If a range is to be specified, then only > or >= may be used # for the first operator, and only < or <= may be used for the second operator. # For example, # Resource is up when its value is # -------------------------------- # RESOURCE_UP_VALUE = 5 5 (threshold) # RESOURCE_UP_VALUE > 5.1 greater than 5.1 (threshold) # RESOURCE_UP_VALUE > -5 and < 10 between -5 and 10 (range) # # Note that "and" is required between the lower limit and upper limit # when specifying a range. The upper limit must be greater than the lower # limit. If RESOURCE_UP_VALUE is repeated within a RESOURCE_NAME block, then # they are inclusively OR'd together. Package Resource Dependencies may be # defined by repeating the entire RESOURCE_NAME block. # # Example : RESOURCE_NAME /net/interfaces/lan/status/lan0 # RESOURCE_POLLING_INTERVAL 120 # RESOURCE_START AUTOMATIC # RESOURCE_UP_VALUE = RUNNING # RESOURCE_UP_VALUE = ONLINE # # Means that the value of resource /net/interfaces/lan/status/lan0 # will be checked every 120 seconds, and is considered to # be 'up' when its value is "RUNNING" or "ONLINE". # # Uncomment the following lines to specify Package Resource Dependencies. # #RESOURCE_NAME <Full_path_name> #RESOURCE_POLLING_INTERVAL <numeric_seconds> #RESOURCE_START <AUTOMATIC/DEFERRED> #RESOURCE_UP_VALUE <op> <string_or_numeric> [and <op> <numeric>]
Create Package Control Scripts
1. cmmakepkg -s /etc/cmcluster/kci2prd/kci2prd.cntl
2. Edit the control script.

Note : If the package and control file is special (e.g. NFS required), do not run the cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS extension toolkit (similar for the SAP extension). You still need to adjust the files to suit your needs.
Note : It is possible that packages do not use any volume groups.
# ********************************************************************** # * * # * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) * # * * # * Note: This file MUST be edited before it can be used. * # * * # **********************************************************************
# The PACKAGE and NODE environment variables are set by # ServiceGuard at the time the control script is executed. # Do not set these environment variables yourself! # The package may fail to start or halt if the values for # these environment variables are altered.
# UNCOMMENT the variables as you set them.
# Set PATH to reference the appropriate directories. PATH=/usr/bin:/usr/sbin:/etc:/bin
# VOLUME GROUP ACTIVATION: # Specify the method of activation for volume groups. # Leave the default ("VGCHANGE="vgchange -a e") if you want volume # groups activated in exclusive mode. This assumes the volume groups have # been initialized with 'vgchange -c y' at the time of creation. # # Uncomment the first line (VGCHANGE="vgchange -a e -q n"), and comment # out the default, if your disks are mirrored on separate physical paths, # # Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment # out the default, if your disks are mirrored on separate physical paths, # and you want the mirror resynchronization to ocurr in parallel with # the package startup. # # Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to # use non-exclusive activation mode. Single node cluster configurations # must use non-exclusive activation. # # VGCHANGE="vgchange -a e -q n" # VGCHANGE="vgchange -a e -q n -s" # VGCHANGE="vgchange -a y" VGCHANGE="vgchange -a e" # Default
# CVM DISK GROUP ACTIVATION: # Specify the method of activation for CVM disk groups. # Leave the default # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite") # if you want disk groups activated in the exclusive write mode. # # Uncomment the first line # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"), # and comment out the default, if you want disk groups activated in # the readonly mode. # # Uncomment the second line # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"), # and comment out the default, if you want disk groups activated in the # shared read mode. # # Uncomment the third line # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"), # and comment out the default, if you want disk groups activated in the # shared write mode. # # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly" # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread" # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite" CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite"
# VOLUME GROUPS # Specify which volume groups are used by this package. Uncomment VG[0]="" # and fill in the name of your first volume group. You must begin with # VG[0], and increment the list in sequence. # # For example, if this package uses your volume groups vg01 and vg02, enter: # VG[0]=vg01 # VG[1]=vg02 # # The volume group activation method is defined above. The filesystems # associated with these volume groups are specified below. # VG[0]=vg02 VG[1]=vg03
# CVM DISK GROUPS
# Specify which cvm disk groups are used by this package. Uncomment
# CVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with CVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# CVM_DG[0]=dg01
# CVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above. The filesystems
# associated with these volume groups are specified below in the CVM_*
# variables.
#
#CVM_DG[0]=""
# VxVM DISK GROUPS
# Specify which VxVM disk groups are used by this package. Uncomment
# VXVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with VXVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# VXVM_DG[0]=dg01
# VXVM_DG[1]=dg02
#
# The VxVM disk group activation method is defined above.
#
#VXVM_DG[0]=""
#
# NOTE: A package could have LVM volume groups, CVM disk groups and VxVM
# disk groups.
#
# FILESYSTEMS
# Specify the filesystems which are used by this package. Uncomment
# LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]="" and fill in the name of your first
# logical volume, filesystem and mount option for the file system. You must
# begin with LV[0], FS[0] and FS_MOUNT_OPT[0] and increment the list in
# sequence.
#
# For the LVM example, if this package uses the file systems pkg1a and
# pkg1b, which are mounted on the logical volumes lvol1 and lvol2 with
# read and write options enter:
# LV[0]=/dev/vg01/lvol1; FS[0]=/pkg1a; FS_MOUNT_OPT[0]="-o rw"
# LV[1]=/dev/vg01/lvol2; FS[1]=/pkg1b; FS_MOUNT_OPT[1]="-o rw"
#
# For the CVM or VxVM example, if this package uses the file systems
# pkg1a and pkg1b, which are mounted on the volumes lvol1 and lvol2
# with read and write options enter:
# LV[0]="/dev/vx/dsk/dg01/vol01"; FS[0]="/pkg1a"; FS_MOUNT_OPT[0]="-o rw"
# LV[1]="/dev/vx/dsk/dg01/vol02"; FS[1]="/pkg1b"; FS_MOUNT_OPT[1]="-o rw"
#
# The filesystems are defined as triplets of entries specifying the logical
# volume, the mount point and the mount options for the file system. Each
# filesystem will be fsck'd prior to being mounted. The filesystems will be
# mounted in the order specified during package startup and will be unmounted
# in reverse order during package shutdown. Ensure that volume groups
# referenced by the logical volume definitions below are included in
# volume group definitions above.
#
#LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""
#
# VOLUME RECOVERY
#
# When mirrored VxVM volumes are started during the package control
# bring up, if recovery is required the default behavior is for
# the package control script to wait until recovery has been
# completed.
#
# To allow mirror resynchronization to occur in parallel with
# the package startup, uncomment the line
# VXVOL="vxvol -g \$DiskGroup -o bg startall" and comment out the default.
#
# VXVOL="vxvol -g \$DiskGroup -o bg startall"
VXVOL="vxvol -g \$DiskGroup startall"   # Default
# FILESYSTEM UNMOUNT COUNT
# Specify the number of unmount attempts for each filesystem during package
# shutdown. The default is set to 1.
FS_UMOUNT_COUNT=1
# FILESYSTEM MOUNT RETRY COUNT.
# Specify the number of mount retries for each filesystem.
# The default is 0. During startup, if a mount point is busy
# and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and
# the script will exit with 1. If a mount point is busy and
# FS_MOUNT_RETRY_COUNT is greater than 0, the script will attempt
# to kill the user responsible for the busy mount point
# and then mount the file system. It will attempt to kill the user and
# retry the mount for the number of times specified in FS_MOUNT_RETRY_COUNT.
# If the mount still fails after this number of attempts, the script
# will exit with 1.
# NOTE: If FS_MOUNT_RETRY_COUNT > 0, the script will execute
# "fuser -ku" to free up the busy mount point.
FS_MOUNT_RETRY_COUNT=0
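The retry behaviour described in the comments above can be sketched as a small loop. This is a hedged illustration only, not the actual template code: try_mount is a stand-in for the real mount of $LV on $FS, and here it is simulated to succeed on the third attempt.

```shell
# Sketch of the FS_MOUNT_RETRY_COUNT logic (illustrative, not the
# shipped control script).
FS_MOUNT_RETRY_COUNT=3
retries=0
try_mount() {
  [ "$retries" -ge 2 ]   # simulate: mount point busy twice, then mountable
}
until try_mount; do
  if [ "$retries" -ge "$FS_MOUNT_RETRY_COUNT" ]; then
    echo "mount failed after $retries retries" >&2
    exit 1
  fi
  # the real script would run here: fuser -ku $FS  (kill users of the busy FS)
  retries=$((retries + 1))
done
echo "mounted after $retries retries"
```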
# CONCURRENT VGCHANGE OPERATIONS
# Specify the number of concurrent volume group activations or
# deactivations to allow during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while activating or deactivating a large number of volume groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_VGCHANGE_OPERATIONS=1
# CONCURRENT DISK GROUP OPERATIONS
# Specify the number of concurrent VxVM DG imports or deports to allow
# during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while importing or deporting a large number of disk groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_DISKGROUP_OPERATIONS=1
# CONCURRENT FSCK OPERATIONS
# Specify the number of concurrent fsck to allow during package startup.
# Setting this value to an appropriate number may improve the performance
# while checking a large number of file systems in the package. If the
# specified value is less than 1, the script defaults it to 1 and proceeds
# with a warning message in the package control script logfile.
CONCURRENT_FSCK_OPERATIONS=1
# CONCURRENT MOUNT AND UMOUNT OPERATIONS
# Specify the number of concurrent mounts and umounts to allow during
# package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while mounting or un-mounting a large number of file systems in the package.
# If the specified value is less than 1, the script defaults it to 1 and
# proceeds with a warning message in the package control script logfile.
CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1
# IP ADDRESSES
# Specify the IP and Subnet address pairs which are used by this package.
# Uncomment IP[0]="" and SUBNET[0]="" and fill in the name of your first
# IP and subnet address. You must begin with IP[0] and SUBNET[0] and
# increment the list in sequence.
#
# For example, if this package uses an IP of 192.10.25.12 and a subnet of
# 192.10.25.0 enter:
# IP[0]=192.10.25.12
# SUBNET[0]=192.10.25.0
# (netmask=255.255.255.0)
#
# Hint: Run "netstat -i" to see the available subnets in the Network field.
#
# IP/Subnet address pairs for each IP address you want to add to a subnet
# interface card. Must be set in pairs, even for IP addresses on the same
# subnet.
#
#IP[0]=""
#SUBNET[0]=""
# SERVICE NAMES AND COMMANDS.
# Specify the service name, command, and restart parameters which are
# used by this package. Uncomment SERVICE_NAME[0]="", SERVICE_CMD[0]="",
# SERVICE_RESTART[0]="" and fill in the name of the first service, command,
# and restart parameters. You must begin with SERVICE_NAME[0], SERVICE_CMD[0],
# and SERVICE_RESTART[0] and increment the list in sequence.
#
# For example:
# SERVICE_NAME[0]=pkg1a
# SERVICE_CMD[0]="/usr/bin/X11/xclock -display 192.10.25.54:0"
# SERVICE_RESTART[0]=""      # Will not restart the service.
#
# SERVICE_NAME[1]=pkg1b
# SERVICE_CMD[1]="/usr/bin/X11/xload -display 192.10.25.54:0"
# SERVICE_RESTART[1]="-r 2"  # Will restart the service twice.
#
# SERVICE_NAME[2]=pkg1c
# SERVICE_CMD[2]="/usr/sbin/ping"
# SERVICE_RESTART[2]="-R"    # Will restart the service an infinite
#                            # number of times.
#
# Note: No environmental variables will be passed to the command, this
# includes the PATH variable. Absolute path names are required for the
# service command definition. Default shell is /usr/bin/sh.
#
#SERVICE_NAME[0]=""
#SERVICE_CMD[0]=""
#SERVICE_RESTART[0]=""
# DEFERRED_RESOURCE NAME
# Specify the full path name of the 'DEFERRED' resources configured for
# this package. Uncomment DEFERRED_RESOURCE_NAME[0]="" and fill in the
# full path name of the resource.
#
#DEFERRED_RESOURCE_NAME[0]=""
# DTC manager information for each DTC.
# Example: DTC[0]=dtc_20
#DTC_NAME[0]=
# START OF CUSTOMER DEFINED FUNCTIONS
# This function is a place holder for customer defined functions.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.

/etc/cmcluster/kci2prd/kci2prd.sh start

test_return 51
}
# This function is a place holder for customer defined functions.
# You should define all actions you want to happen here, before the service is
# halted.
function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.

/etc/cmcluster/kci2prd/kci2prd.sh shutdown

test_return 52
}
# END OF CUSTOMER DEFINED FUNCTIONS
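The test_return calls above check the exit status of the preceding customer command. The helper itself is supplied in the generated control script; the mock below only illustrates the pattern (its body and the stand-in command are assumptions, not the shipped implementation) to show why each customer command is followed by test_return with an error marker.

```shell
# Mock of the control script's test_return pattern (illustrative only).
# $1 is an error marker that would be logged on failure.
test_return() {
  code=$?
  if [ "$code" -ne 0 ]; then
    echo "ERROR: customer command failed (marker $1, exit $code)" >&2
    exit 1
  fi
}

true            # stand-in for /etc/cmcluster/kci2prd/kci2prd.sh start
test_return 51
echo "customer defined run commands completed"
```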
FTP all ASCII scripts to the secondary (failover) node(s).
Verify the Cluster Configuration (do this on the package's primary node) 1. cmcheckconf [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf
Note : If there are no errors, the package is ready to be applied.
Distribute the Cluster Configuration File (do this on the package's primary node) 1. vgchange -a y /dev/vg02 (cluster lock volume group) 2. cmapplyconf [-v] [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf 3. vgchange -a n /dev/vg02
Note : You should not need to activate and later deactivate the cluster lock volume group when applying packages only.
Note : Repeat the steps from Create Packages to here if there are more packages required in the cluster.
Configure Automounter (do this only if your system is using automounter)
1. Check that in /etc/rc.config.d/nfsconf, the automounter section reads:
   AUTOMOUNT=1
   AUTO_MASTER="/etc/auto_master"
   AUTOMOUNT_OPTIONS="-f $AUTO_MASTER"
   AUTOMOUNTD_OPTIONS=
2. Check in /etc/rc.config.d/nfsconf that one NFS client and one NFS server daemon are configured to run:
   NFS_CLIENT=1
   NFS_SERVER=1
   NUM_NFSD=4
   NUM_NFSIOD=4
3. Add this line to /etc/auto_master:
   /- /etc/auto.direct
4. Create an /etc/auto.direct file:
   /oracle <relocdbci_s>:/export/
5. Restart the automounter with:
   /sbin/init.d/nfs.client stop
   /sbin/init.d/nfs.client start

Disable Automount of Volume Groups (on both nodes)
1. Edit the /etc/lvmrc file and set AUTO_VG_ACTIVATE=0
Enable Autostart Features (On both nodes) 1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=1
Checking Package Operation (do on either node) 7. cmruncl -v 8. cmhaltnode -v <primary node> (node will be halted and package failed over to secondary (adoptive) node) 9. cmrunnode -v <primary node> (node will rejoin cluster) 10. cmhaltpkg <package name> (halt package on adoptive node) 11. cmrunpkg <package name> (run package on original node) 12. cmmodpkg -e <package name> (enable package switching) 13. cmhaltcl -v
Note : Use cmviewcl or cmviewcl -v to view the results of each command.
MC/ServiceGuard Template
System Configuration
Hardware Information
Hostname Model Operating System version Physical Memory Swap Space
Non-Shared HDs Shared HDs Tapes LAN Cards Primary and Standby Network Type
Heartbeat Network Type MC ServiceGuard Version MirrorDisk/UX Version Online JFS Version Application name / Application version
Database name / Database version OS/Appls Patch Level
System Information
Server Hostname Server IP Address Server IP Netmask Server Default Router Primary Network on separate Switch
Standby Network on separate Switch
Operating System File System Layout
Volume Group Logical FS Type Size (mb) Mount point
MC/ServiceGuard Configuration
Cluster Information
Cluster Name
Cluster Members
Cluster Lock Disk
Heartbeat Interval (default value is 1s)
Node Timeout (default value is 2s; recommended 8s)
Network Polling Interval (default value is 2s)
Autostart Delay (default value is 10 mins)
Maximum Configured Packages (to allow online package reconfiguration)
Packages Overview
The cluster consists of ________ packages:
1. 2. 3.
Detailed Package Information:
Package Name Re-locatable Hostname Re-locatable IP Address Monitor Subnet Primary Node Adoptive Node Run/Halt Script Run/Halt Script Timeout Package Switch Enabled Network Switch Enabled Node Failfast Enabled Service Name Volume Groups Logical Volume and File System Details Device file Size/ Type Mount Point Owner Group Perm.
Cluster Configuration File
Parameter
Value CLUSTER_NAME
FIRST_CLUSTER_LOCK_VG
NODE_NAME
NETWORK_INTERFACE
HEARTBEAT_IP
NETWORK_INTERFACE
HEARTBEAT_IP
FIRST_CLUSTER_LOCK_PV
NODE_NAME
NETWORK_INTERFACE
HEARTBEAT_IP
NETWORK_INTERFACE
HEARTBEAT_IP
FIRST_CLUSTER_LOCK_PV
HEARTBEAT_INTERVAL
(Default value is 1s)
NODE_TIMEOUT (Default value is 2s)
AUTO_START_TIMEOUT
(Default value is 10 mins)
NETWORK_POLLING_INTERVAL
(Default value is 2s)
MAX_CONFIGURED_PACKAGES
(Set high enough to allow for online package reconfiguration)
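Put together, a filled-in fragment of the cluster ascii file might look like the following. This is illustrative only: the node name, interface, devices and IP are hypothetical, and note that the timing parameters are expressed in microseconds in the actual file.

```
CLUSTER_NAME                knet
FIRST_CLUSTER_LOCK_VG       /dev/vg02
NODE_NAME                   node1
  NETWORK_INTERFACE         lan0
    HEARTBEAT_IP            192.10.25.18
  FIRST_CLUSTER_LOCK_PV     /dev/dsk/c1t2d0
HEARTBEAT_INTERVAL          1000000      # 1s (default)
NODE_TIMEOUT                8000000      # 8s (recommended; default is 2s)
AUTO_START_TIMEOUT          600000000    # 10 mins (default)
NETWORK_POLLING_INTERVAL    2000000      # 2s (default)
MAX_CONFIGURED_PACKAGES     4
```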
(Mount point and session name) MONITOR_INTERVAL (Time between checks)
MONITOR_PROCESSES (Processes like dataserver etc)
PACKAGE_NAME
TIME_OUT (Waiting time in seconds for Informix/Oracle abort to complete before killing Informix/Oracle processes)
Note : If it is Oracle, SAP or NFS, there are pre-defined scripts for these, provided you install the Enterprise Master Toolkit and NFS Toolkit - /opt/cmcluster/
TESTING MC/SERVICEGUARD
1.1 Test Overview
This section contains the test requirements and test plan for MC/ServiceGuard.
1.2 Test Requirement
The MC/ServiceGuard product is a High Availability solution that performs system failure detection and transfers the application from the primary node to the adoptive node when a system failure occurs.
Note : We assume that there is only 1 package in the cluster. If there are more packages, change/add steps accordingly.
The faults to be tested and the appropriate methods are listed below:
Type of Failure                                  Method of Simulation
CPU, Memory, Power Supply and Operating System   Reset of server
Active LAN                                       Removal of LAN cable from active LAN card
Total Data LAN                                   Removal of all Data LAN cables from server
1.3 Verification method
Upon startup of the package, the verification checkpoints are
a. Log onto the surviving server and run cmviewcl to check that the package application is RUNNING
b. Ping the relocatable IP from another station in the same network
c. Check that all shared file systems are mounted.
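Checkpoints a-c can be strung together into one script. The sketch below is self-contained, with stub functions standing in for cmviewcl, ping and bdf so it can run anywhere; the package name, IP and mount point are illustrative. On a real cluster you would drop the stubs and call the actual commands.

```shell
# Post-failover verification sketch (stubs are assumptions for illustration).
PKG=kci2prd
RELOC_IP=192.10.25.12
cmviewcl() { echo "$PKG up running enabled node2"; }    # stub
ping_ok()  { true; }       # stand-in for: ping "$1" -n 3
bdf()      { echo "/dev/vg02/lvol1 ... /pkg1a"; }       # stub

status=ok
cmviewcl | grep -q "$PKG  *up  *running" || status=pkg_down        # checkpoint a
ping_ok "$RELOC_IP"                      || status=ip_unreachable  # checkpoint b
bdf | grep -q "/pkg1a"                   || status=fs_missing      # checkpoint c
echo "verification result: $status"
```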
1.4 Test Checklist
The five categories of tests to be performed are as follows:
a. Normal Bootup
b. Manual Package Switching Functionality
c. LAN Failure Tests
   - Heartbeat Failure
   - Data LAN Failure
d. System Failure Tests
e. Failures not affecting package
   These are sanity checks to ensure that failure of the adoptive node in the cluster has no side effect on the primary node.
No. Test Method of Simulation
Expected Result Check Remarks NORMAL BOOTUP SEQUENCE
1 Normal boot up Power on or reboot both servers Cluster is up with node1 and node2 running and package is running on node1
MANUAL PACKAGE SWITCHING FUNCTIONALITY
1 Package halts successfully on node1 Run cmhaltpkg -v <package> command Application shuts down successfully and package is halted properly
2 Package starts successfully on node2
Run cmrunpkg -v -n node2 <package> command Package starts up successfully on node2
3 Package halts successfully on node2 Run cmhaltpkg -v <package> command Application shuts down successfully and package is halted properly
4 Package starts successfully on node1
Run cmrunpkg -v -n node1 <package> command Package starts up successfully on node1
No. Test Method of Simulation
Expected Result Check Remarks LAN FAILURE TESTS
1 Heartbeat LAN failure on node1
package is running on node1
Pull out lan0 cable on node1
lan1 takes over as Heartbeat LAN and package remains running on node1
2 Pri Data LAN failure on node1 package is running on node1
Pull out lan1 cable on node1
Sec LAN, lan5 takes over as active LAN and package remains running on node1
3 Sec Data LAN failure on node1 package is running on node1
Pull out lan5 cable on node1 Pri LAN, lan1 takes over as active LAN and package remains running on node1
4 Total Data LAN Failure on node1
package is running on node1
Pull out lan1 and lan5 from node1
Package fails over to node2 if it is running as a node in the cluster; 50% chance the failover fails, as the adoptive node is unable to get the cluster lock and panic reboots
5 Heartbeat LAN failure on node2
package is running on node2
Pull out lan0 cable on node2
lan1 takes over as Heartbeat LAN and package remains running on node2
6 Pri Data LAN failure on node2 package is running on node2
Pull out lan1 cable on node2 Secondary LAN, lan5 takes over as active LAN and package remains running on node2
7 Sec Data LAN failure on node2 package is running on node2
Pull out lan5 cable on node2
Pri LAN, lan1 takes over as active LAN and package remains running on node2
8 Total Data LAN Failure on node2
package is running on node2
Pull out lan1 and lan5 from node2
Package fails over to node1 if it is running as a node in the cluster; 50% chance the failover fails, as the adoptive node is unable to get the cluster lock and panic reboots
You may wish to extend the tests to cover the functionality of MC/ServiceGuard with regard to application monitoring scripts and application failover.
No. Test Method of Simulation
Expected Result Check Remarks SYSTEM FAILURE TESTS
1 node1 failure
package is running on node1
Reset node1 (try both shutdown -ry / reboot, and rs from the console)
Package fails over to node2 if it is running as a node in the cluster Yes

2 node2 failure
package is running on node2
Reset node2 (try both shutdown -ry / reboot, and rs from the console)
Package fails over to node1 if it is running as a node in the cluster Yes

FAILURES NOT AFFECTING PACKAGE
1 node2 failure package is running on node1 Cluster reforms to a single node cluster and package continues to run on node1
Yes
MC/ServiceGuard Troubleshooting
Troubleshooting using log files
For troubleshooting, a few files help to log problems experienced by MC/ServiceGuard:
a. /var/adm/syslog/syslog.log b. /etc/cmcluster/packagedir/packagename.cntl.log
These files need to be maintained, as they will grow and can ultimately fill up the / file system if left unattended.
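A simple way to keep such a log from filling / is to trim it periodically, e.g. from cron. The sketch below demonstrates the trim on a temporary file standing in for a real control log such as /etc/cmcluster/kci2prd/kci2prd.cntl.log (the retained line count is an arbitrary example).

```shell
# Trim a package control log to its most recent 1000 lines (hedged
# housekeeping sketch; a temp file stands in for the real log path).
LOG=$(mktemp)
seq 1 2000 > "$LOG"                                # simulate a grown log
tail -n 1000 "$LOG" > "$LOG.tmp" && mv "$LOG.tmp" "$LOG"
lines=$(wc -l < "$LOG")
echo "log trimmed to $lines lines"
rm -f "$LOG"
```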
The package control log file will contain information regarding package start/stop. Each package will have its own package control log file.
Note : Always use cmviewcl or cmviewcl -v to help see the status of your cluster.
Common Problems :
1. Problems of configuration
- missing entries in /etc/services, /etc/inetd.conf
- .rhosts or cmclnodelist not configured
- syntax errors in config and control files
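The first check can be scripted: MC/ServiceGuard registers hacl-* entries (hacl-hb, hacl-cfg, ...) in /etc/services. The sketch below uses a temporary file standing in for /etc/services so it is self-contained; on a real node you would point check_hacl at /etc/services itself.

```shell
# Sanity check for the MC/ServiceGuard hacl-* service entries
# (hedged sketch; the sample file contents are illustrative).
check_hacl() {   # $1 = path to a services file
  grep -q '^hacl-' "$1" && echo OK || echo MISSING
}
svc=$(mktemp)
printf 'hacl-hb 5300/udp\nhacl-cfg 5302/tcp\n' > "$svc"
result=$(check_hacl "$svc")
rm -f "$svc"
echo "hacl entries: $result"
```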
2. Warning : Missing cluster lock disk - Repeated every hour by the cmcld daemon in syslog.log - This problem occurs after something has changed affecting the cluster lock disk, eg. the SCSI ID of the disk changed - No immediate issue, but when a tie breaker period occurs, nodes will not be able to detect the disk and all nodes may panic reboot.
Solution :
1. Schedule downtime to halt the cluster (cmhaltcl) 2. Run vgchange -c n vgsh to remove the cluster lock volume group from the cluster. 3. Activate vgsh on the node where the cluster configuration ascii file exists by running vgchange -a y vgsh and do a cmapplyconf -v -C /etc/cmcluster/cluster.ascii. Answer yes to the change and then run vgchange -a n vgsh to deactivate the cluster lock volume group. 4. Start the cluster (cmruncl)
3. Warning : I/O error on cluster lock disk - Repeated every hour by the cmcld daemon in syslog.log - This problem usually occurs if something is wrong with one of the SPUs or controllers of the disk array connected to one of the nodes. - If it happens on the primary node, it is possible that the application has already hung. - No immediate issue if it occurs on the adoptive node, but when a tie breaker period occurs, nodes will not be able to detect the disk and all nodes may panic reboot. - In other cases, the cluster lock disk itself could be faulty, and a hung situation with respect to the application and bdf will occur.
Solution :
- Schedule downtime and ask CE to check the SPU or controller
4. Cluster failures - Cluster cannot start - missing entries in /etc/services, /etc/inetd.conf - .rhosts or cmclnodelist not configured - syntax errors in config and control files - could be hardware, package-induced or an application problem. Again, check the log files.
5. Package failures
- Package unable to start at all on any node
  Check syslog and the package log file. Possible config problem, control script problem, or application script name changed.
- Package cannot fail over to the adoptive node but can start on the primary node
  Check syslog and the package log file. Possibly package switching or the node is disabled:
  cmmodpkg -e <package name> to enable package switching
  cmmodpkg -e -n <node name> <package name> to enable the node (allow the package to run on this node)
- Package cannot mount/umount filesystems (from the package log)
  Package failed to start because of mount problems. Possibly the shared VG is not marked as part of the cluster, or not activated; or a manually mounted filesystem, or someone accessing an unmounted directory. Unmount all filesystems, check who is accessing the directory and get that person to exit, run vgchange -c y vgsh to mark it for the cluster, deactivate, and try starting again. Could also be a hard disk problem.
- Package failed to halt
  Application process hung and could not be killed. Could also be a hard disk problem.
6. Service Failure - cmviewcl -v to see the status of all packages and their services. - Trace through the package control log and syslog to see why it failed. - Possible config problem, control script problem, or application script name changed.
7. Node timeout - The recommended node timeout value in the cluster config file is 5-8 seconds - With the default of 2 seconds, the system may panic reboot due to a tie breaker scenario triggered by poor network performance.
8. GSP problems - Known problem for L class servers (certain generation) - Causes the system to panic reboot and fail the package over to the adoptive node - Recommended patch / GSP firmware upgrade needs to be done
9. LAN problems - NMID problems
10. Disk problems
- SCSI ID changed/conflict, perhaps due to a controller card factory default setting
  Cannot bring up the cluster; need the CE to change it accordingly.
- Cluster lock disk failed
  If the lock disk is RAID1 or RAID5: no problem.
  If the lock disk is LVM-mirrored: need to do vgcfgrestore and vgsync to recover the lock info, which is stored in the BBR table part of the disk.
  If there is no mirror: need to re-apply the cluster.
On-Going Upgrades/Changes to systems/cluster/packages
- Pro-active Patch installation (node by node) - Data Centre outages (shutdown entire cluster) - Rolling upgrades (node by node)
Keychain Cluster - Shutdown and Startup Procedure ------------------------------------------------- Last update: 19 June 2002, SGP
******************************************************************* Please follow these steps whenever you need to arrange a shutdown for sgpue036.sgp.hp.com & sgpue037.sgp.hp.com. Special handling is required because of their MC/Serviceguard HA environment. *******************************************************************
Before you shutdown a node -------------------------- 1. Get agreement with application support on schedule, scope and duration of shutdown.
2. Ensure both nodes in the cluster are up and running. If any node is down or appears to be having problems, DO NOT proceed with shutdown.
3. If shutting down a primary node, go to the section titled "Shutting down and restarting the primary node".

If shutting down a secondary node, go to the section titled "Shutting down and restarting the secondary node".

If shutting down the entire cluster, go to the section titled "Shutting down and restarting the MC/SG cluster".

If doing a rolling upgrade, go to the section titled "Doing a rolling upgrade".
Shutting down and restarting the primary node ------------------------------------------------ We assume primary node = sgpue036 and secondary node = sgpue037 in the following examples.
1. Before shutdown, make a note of all packages currently running on each node.
sgpue036# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue036 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running enabled sgpue036 > > NODE STATUS STATE > sgpue037 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kcdbstg up running enabled sgpue037 > kcnfs up running enabled sgpue037
2. Halt primary node sgpue036
sgpue036# cmhaltnode -f -v sgpue036
Production packages will fail over from sgpue036 to sgpue037. sgpue036 will cease to be a member of the active cluster.
3. Check package status on cluster
sgpue036# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue037 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running disabled sgpue037 > kcdbstg up running enabled sgpue037 > kcnfs up running enabled sgpue037 > > NODE STATUS STATE > sgpue036 down halted
4. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the following line:
AUTOSTART_CMCLD = 0
5. Now we can proceed to shutdown (for PM, repair) or reboot (for patching, kernel regen) sgpue036, eg. shutdown -hy 0 (halt) or shutdown -ry 0 (reboot)
6. When repair or reboot is over, sgpue036 should be booted up to run level 3
sgpue036# who -r . run-level 3 Jan 17 08:01 3 0 S
7. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the following line:
AUTOSTART_CMCLD = 1
8. Make sgpue036 join the cluster
sgpue036# cmrunnode -v sgpue036
9. Halt production packages on sgpue037
sgpue037# cmhaltpkg kci2stg
10. Restart production packages on sgpue036
sgpue036# cmrunpkg kci2stg
11. Re-enable package switching on production packages
sgpue036# cmmodpkg -e kci2stg
12. Check package status on cluster. You should see the same listing as shown in Step 1 ie.
sgpue036# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue036 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running enabled sgpue036 > > NODE STATUS STATE > sgpue037 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kcdbstg up running enabled sgpue037 > kcnfs up running enabled sgpue037
13. Release sgpue036 to customers (notify by phone, email etc)
Shutting down and restarting the secondary node ---------------------------------------------
1. Before shutdown, make a note of all packages currently running on each node
sgpue037# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue036 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running enabled sgpue036 > > NODE STATUS STATE > sgpue037 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kcdbstg up running enabled sgpue037 > kcnfs up running enabled sgpue037
2. Halt secondary node sgpue037
sgpue037# cmhaltnode -f -v sgpue037
Production packages will fail over from sgpue037 to sgpue036. sgpue037 will cease to be a member of the active cluster.
3. Check package status on cluster
sgpue037# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue036 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running enabled sgpue036 > kcdbstg up running disabled sgpue036 > kcnfs up running disabled sgpue036 > > NODE STATUS STATE > sgpue037 down halted
4. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the following line:
AUTOSTART_CMCLD = 0
5. Now we can proceed to shutdown (for PM, repair) or reboot (for patching, kernel regen) sgpue037, eg. shutdown -hy 0 (halt) or shutdown -ry 0 (reboot)
6.-11. Repeat Steps 6 to 11 of "Shutting down and restarting the primary node" above, swapping sgpue036 and sgpue037 and using the kcdbstg and kcnfs packages in place of kci2stg.

12. Check package status on cluster. You should see the same listing as shown in Step 1 ie.
sgpue037# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue036 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running enabled sgpue036 > > NODE STATUS STATE > sgpue037 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kcdbstg up running enabled sgpue037 > kcnfs up running enabled sgpue037
13. Release sgpue037 to customers (notify by phone, email etc)

Shutting down and restarting the MC/SG cluster ---------------------------------------------- We assume primary node = sgpue036 and secondary node = sgpue037 in the following examples.
1. Log in to sgpue036 or sgpue037 as superuser and issue command to halt cluster daemon
sgpue036# cmhaltcl -f -v
2. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include the following line:

AUTOSTART_CMCLD = 0

3. Shutdown or reboot the nodes as planned.
4. After the planned activity is over, boot up each node to run level 3
sgpue036# who -r sgpue037# who -r . run-level 3 Jan 17 08:01 3 0 S
5. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include the following line:
AUTOSTART_CMCLD = 1
6. Startup the cluster daemon from any node
sgpue036# cmruncl -v
7. Check package status on cluster. It should look exactly like the following
sgpue036# cmviewcl
> CLUSTER STATUS > knet up > > NODE STATUS STATE > sgpue036 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kci2stg up running enabled sgpue036 > > NODE STATUS STATE > sgpue037 up running > > PACKAGE STATUS STATE AUTO_RUN NODE > kcdbstg up running enabled sgpue037 > kcnfs up running enabled sgpue037
8. Release machines to customers (notify by phone, email etc)
Doing a rolling upgrade ----------------------- This is the most common scenario where we work on 1 node at a time without bringing down the entire cluster. This ensures there is at least 1 node available to run the application packages. The steps are already detailed above. Either:
1. Shutting down and restarting the primary node 2. Shutting down and restarting the secondary node
or
1. Shutting down and restarting the secondary node 2. Shutting down and restarting the primary node
Note : This may apply to OS upgrades, eg. 10.20 to 11.00, whereby MC/SG goes from ver 10.10 to 11.X. Another method you may deploy is building a separate cluster on a separate machine with the latest OS, copying all config files over, and just swapping package IPs.
- Modifying the cluster
  o Anything to do with the cluster will need the cluster to be re-applied (go through the cluster.conf file to see what the parameters are), so downtime is needed to halt the cluster - except for adding/removing nodes and packages, which can be done while the cluster is still up and running.
    Eg. Node timeout, heartbeat interval
    Eg. Cluster name
    Eg. Heartbeat IPs
    Eg. No. of packages
    Eg. Change of node names
    Eg. Manual change/add of volume group
  o Steps
    Schedule downtime to halt the entire cluster
    cmhaltcl -f to halt the cluster
    After the cluster is halted, run cmgetconf -v -c <cluster name> <outputfilename> (cluster ascii file - name it something different) to get the latest copy of the cluster config file.
    Modify the <outputfilename> to make the intended changes to the cluster.
    cmcheckconf -v -C <outputfilename> (cluster ascii file) - check for any errors
    cmapplyconf -v -C <outputfilename> (cluster ascii file) - if no errors
    Start the cluster - cmruncl
- Adding/removing nodes to the cluster
  o Adding
    o Online method
      Heartbeat must be configured and network ready
      Can be done on any node (preferably the node where the original cluster config file was placed)
      cmquerycl [-w full] -v -C /etc/cmcluster/<outputfilename> -n <primary node> -n <secondary node> -n <new node>
      (Note : This will query the system configuration and generate the new cluster config file, according to whatever name you specified as the <outputfilename>.)
      cmgetconf -v -c <cluster name> <outputfilename> (cluster ascii file - name it something different) to get the latest copy of the cluster config file.
      Check and combine the 2 configurations into one final config file.
      cmcheckconf -v -C <finalconfigfile> (cluster ascii file) - check for any errors
      cmapplyconf -v -C <finalconfigfile> (cluster ascii file) - if no errors
      cmrunnode <node name> to join the cluster
      Modify all package config files to include the new node if desired. (Remember modifying the package config file will need a downtime to apply the package config file)
    o Offline method
      Same, except performed with the cluster halted; when all the changes are made, start the cluster
o Removing
  o Online method
    Modify all package config files to exclude the node if it is configured in the package. (Remember modifying the package config file will need a downtime to apply the package config file)
    Halt all ACTIVE packages on the node - cmhaltpkg <package names>
    Halt the node - cmhaltnode -v <node name>
    cmgetconf -v -c <cluster name> <outputfilename> (cluster ascii file - name it something different) to get the latest copy of the cluster config file.
    Edit this cluster ascii file to remove the node details
    cmcheckconf -v -C <outputfilename> (cluster ascii file) - check for any errors
    cmapplyconf -v -C <outputfilename> (cluster ascii file) - if no errors
    Do whatever with the node - power down, redeploy
    vgexport vgsh (off the removed node)
  o Offline method
    Same, except performed with the cluster halted; when all the changes are made, start the cluster. Skip the halt package and halt node steps.
Note : While the cluster is running, it is recommended to remove a node only while it is reachable, ie connected to the LAN. If the node is unreachable, it can still be removed from the cluster, but only if no packages specify the unreachable node. If packages do depend on the unreachable node, it is best to halt the cluster and make the changes to the package and cluster config files to remove the node.
- Adding/removing packages in the cluster
  o Adding
  o Online method
  o Create packages on the primary node
    mkdir /etc/cmcluster/<packagedir>
    cmmakepkg -p /etc/cmcluster/<packagedir>/<packagename>.conf
    Edit the configuration file
    cmmakepkg -s /etc/cmcluster/<packagedir>/<packagename>.cntl
    Edit the control script
Note : If the package and control file are special (e.g. NFS required), do not run the cmmakepkg command; instead get the pre-defined scripts from the MC/SG NFS extension toolkit. You may still need to make some adjustments. (Similar for the SAP extension.)
Note : It is possible that packages do not use any volume groups.
    ftp the control script file to the adoptive nodes
    On the primary node:
    cmcheckconf -v -P <packagename>.conf (package config file) to check for any errors
    cmapplyconf -v -P <packagename>.conf (package config file) if no errors
    Start the package - cmrunpkg <package name>
    cmmodpkg -e <package name> to re-enable package switching
    Test the package on all adoptive nodes if possible
Note : Repeat the steps from "Create packages" to here if more packages are required in the cluster.
  o Offline method
    Same, except perform with the cluster halted; once all the changes are made, start the cluster
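The package-creation steps above can be sketched as follows. Assumptions: the package name "pkg1" and its directory are placeholders, and run() only echoes each command (a dry run).

```shell
# Dry-run sketch of creating and applying a new package "pkg1";
# all names are placeholders, run() only prints each command.
run() { echo "+ $*"; }

create_package() {
  run mkdir /etc/cmcluster/pkg1
  run cmmakepkg -p /etc/cmcluster/pkg1/pkg1.conf   # generate config template
  # ... edit pkg1.conf ...
  run cmmakepkg -s /etc/cmcluster/pkg1/pkg1.cntl   # generate control script template
  # ... edit pkg1.cntl, then ftp it to the adoptive nodes ...
  run cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.conf
  run cmapplyconf -v -P /etc/cmcluster/pkg1/pkg1.conf
  run cmrunpkg pkg1
  run cmmodpkg -e pkg1                             # re-enable package switching
}
create_package
```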
  o Removing
  o Online method
    cmhaltpkg -v <package name>
    cmdeleteconf -f -v -p <package name>
    cmviewcl (to verify that it is no longer part of the cluster)
Note : The package config and control files are not deleted from the system, just removed from the cluster.
  o Offline method
    Same, except perform with the cluster halted; once all the changes are made, start the cluster
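The online package-removal sequence can be sketched as a short dry run; "pkg1" is a placeholder, and run() only echoes each command.

```shell
# Dry-run sketch of removing package "pkg1" from a running cluster;
# run() only prints each command.
run() { echo "+ $*"; }

remove_package_online() {
  run cmhaltpkg -v pkg1
  run cmdeleteconf -f -v -p pkg1   # removes pkg1 from the cluster configuration
  run cmviewcl                     # pkg1 should no longer appear in the output
}
remove_package_online
```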
- Modifying packages
  o 2 parts: the package config file and the package control file
  o Any change to the package config file requires re-applying the package (go through the package.conf file to see the parameters)
Parameters that can be changed without stopping the package, ie cluster and package up and running:
    Eg. failover policy, failback policy
    Eg. add/remove/modify node names
    Eg. switching parameters
  Steps
    cmgetconf -v -p <package name> <outputfilename> (package config file - name it something different) to get the latest copy of the package config file
    Modify the outputfilename to make the intended changes to the package config
    cmcheckconf -v -P <outputfilename> (package config file) to check for any errors
    cmapplyconf -v -P <outputfilename> (package config file) if no errors
Parameters that must be changed with the package stopped, ie package down but cluster up and running:
    Eg. package name (if possible, change the hosting directory name as well)
    Eg. change run/halt scripts
    Eg. add/remove service names
    Eg. add/remove subnet
  Steps
    Schedule downtime to halt the affected package
    cmhaltpkg <package name> to halt the package
    After the package is halted, run cmgetconf -v -p <package name> <outputfilename> (package config file - name it something different) to get the latest copy of the package config file
    Modify the outputfilename to make the intended changes to the package config
    cmcheckconf -v -P <outputfilename> (package config file) to check for any errors
    cmapplyconf -v -P <outputfilename> (package config file) if no errors
    Start the package
    o cmrunpkg <package name>
    o cmmodpkg -e <package name> to re-enable package switching
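The config-change-with-downtime steps above can be sketched as a dry run; "pkg1" and the file name are placeholders, and run() only echoes each command.

```shell
# Dry-run sketch of modifying a package config parameter that requires
# halting the package; names are placeholders, run() only prints commands.
run() { echo "+ $*"; }

modify_pkg_config() {
  run cmhaltpkg pkg1                                          # scheduled downtime
  run cmgetconf -v -p pkg1 /etc/cmcluster/pkg1/pkg1.conf.new  # latest config
  # ... edit pkg1.conf.new (run/halt scripts, service names, subnets) ...
  run cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.conf.new
  run cmapplyconf -v -P /etc/cmcluster/pkg1/pkg1.conf.new
  run cmrunpkg pkg1
  run cmmodpkg -e pkg1                                        # re-enable switching
}
modify_pkg_config
```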
  o Any change to the package control file (script) does NOT require re-applying the package (go through the package.cntl file to see the parameters), but downtime is needed to halt the package; the cluster and other packages in the cluster can keep running.
    Eg. VG names and number of VGs
    Eg. LVs, mount point names and numbers
    Eg. NFS mounts
    Eg. package IPs and subnet
    Eg. service names
    Eg. subnet
    Eg. application start/stop scripts
  o Steps
    Schedule downtime to halt the affected package
    cmhaltpkg <package name> to halt the package
    After the package is halted, modify the package control file to make the intended changes
    Start the package - cmrunpkg <package name>
    cmmodpkg -e <package name> to re-enable package switching
- Adding/modifying LAN cards in the cluster
  o If there is a need to add or upgrade/replace LAN cards in a clustered environment, take note of the LAN ID (NMID).
  o Adding usually does not cause an issue, unless the card will be part of the cluster and is already connected to the network - then the cluster config file needs to be reconfigured and re-applied.
  o For upgrading/replacing LAN cards, the NMID may change, eg. upgrading from a 10BT to a 100BT card, or replacing a 1-port LAN card with a 4-port LAN card. In such a case the cluster cannot start up, because the cluster setting no longer matches (the cluster is trying to find lan1 as configured in the cluster config file, but the NMID has already changed to lan2). We will need to re-form and re-apply the cluster before running it.
  o Steps
  o Method 1
    Schedule downtime to halt the entire cluster
    cmhaltcl -f to halt the cluster
    After the cluster is halted, run:
    cmquerycl [-w full] -v -C /etc/cmcluster/<outputfilename> -n <primary node> -n <secondary node> [-n <other nodes in the cluster>]
    (Note: This queries the system configuration and generates a new cluster config file under whatever name you specified as the outputfilename. It should automatically pick up the new LAN card NMID.)
    Run cmgetconf -v -c <cluster name> <outputfilename> (cluster ascii file - name it something different) to get the latest copy of the cluster config file
    Check and combine the 2 configurations into one final config file
    cmcheckconf -v -C <finalconfigfile> (cluster ascii file) to check for any errors
    cmapplyconf -v -C <finalconfigfile> (cluster ascii file) if no errors
    Start the cluster - cmruncl
  o Method 2 - not recommended
    Schedule downtime to halt the entire cluster
    cmhaltcl -f to halt the cluster
    Run cmgetconf -v -c <cluster name> <outputfilename> (cluster ascii file - name it something different) to get the latest copy of the cluster config file
    Modify the outputfilename to make the intended changes to the cluster (only if you are aware of the change in NMID of the LAN card)
    cmcheckconf -v -C <outputfilename> (cluster ascii file) to check for any errors
    cmapplyconf -v -C <outputfilename> (cluster ascii file) if no errors
    Start the cluster - cmruncl
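Method 1 above (re-query so the new NMIDs are picked up automatically) can be sketched as a dry run; cluster and node names are placeholders, and run() only echoes each command.

```shell
# Dry-run sketch of re-forming the cluster after a LAN card change;
# names and paths are placeholders, run() only prints each command.
run() { echo "+ $*"; }

requery_after_lan_change() {
  run cmhaltcl -f
  run cmquerycl -w full -v -C /etc/cmcluster/query.ascii -n node1 -n node2  # picks up new NMIDs
  run cmgetconf -v -c prodcl /etc/cmcluster/current.ascii
  # ... merge the two ascii files by hand into final.ascii ...
  run cmcheckconf -v -C /etc/cmcluster/final.ascii
  run cmapplyconf -v -C /etc/cmcluster/final.ascii
  run cmruncl
}
requery_after_lan_change
```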
- Extending/reducing logical volumes in the cluster packages
  o (ONLINE) No downtime required, provided OnlineJFS is installed
  o Make the changes on the node where the logical volumes are mounted
  o No action required on the adoptive nodes
  o Extending:
    lvextend -L <new size in MB> /dev/vgsh/shlvol
    fsadm -F vxfs -b <new size in KB> /shname
  o Reducing:
    fsadm -F vxfs -b <new size in KB> /shname
    lvreduce -L <new size in MB> /dev/vgsh/shlvol
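The extend case above can be sketched with the unit conversion made explicit (lvextend -L takes MB, fsadm -b takes KB). Assumptions: the mount point, LV path and target size are examples, and run() only echoes each command.

```shell
# Dry-run sketch of an online extension to 2048 MB; the LV path and mount
# point are examples, and run() only prints each command.
run() { echo "+ $*"; }

extend_fs() {
  new_mb=$1; mnt=$2; lv=$3
  new_kb=$((new_mb * 1024))              # fsadm -b expects the new size in KB
  run lvextend -L "$new_mb" "$lv"        # lvextend -L expects the new size in MB
  run fsadm -F vxfs -b "$new_kb" "$mnt"  # grow the vxfs filesystem online
}
extend_fs 2048 /share /dev/vgsh/shlvol
```

For reducing, the same two commands run in the opposite order: shrink the filesystem with fsadm first, then lvreduce the logical volume.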
- LVMTAB needs to be updated when adding/removing:
  o Disks
  o Logical volumes
  o Volume groups
- Adding/removing physical volumes/disks in the volume group owned by a package
  o Adding
  o On the primary node (node where the shared VG is activated, where the package is running)
    pvcreate the new disk
    vgextend the new disk into the identified shared volume group
    vgexport the shared VG with the preview option to generate a mapfile:
    vgexport -p -v -s -m vgsh.map vgsh
    ftp the mapfile to the adoptive nodes
  o On the adoptive nodes
    vgexport the identified shared volume group off the system - vgexport vgsh
    mkdir /dev/vgsh
    mknod /dev/vgsh/group c 64 0x... (same VG id)
    vgimport the shared volume group with the mapfile:
    vgimport -v -s -m vgsh.map vgsh
  o Removing
    Same steps, except use vgreduce (no pvcreate required)
  o (Online) No downtime required, but it is good to schedule one if you want to test the failover.
  o Do I need to re-apply the cluster and package? No.
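The primary/adoptive split above can be sketched as a dry run. Assumptions: the disk device file, VG name and group-file minor number are examples, and run() only echoes each command.

```shell
# Dry-run sketch of adding a disk to shared VG "vgsh" and refreshing the
# adoptive nodes; device names and the minor number are examples only.
run() { echo "+ $*"; }

# On the primary node (where the package is running):
add_disk_primary() {
  run pvcreate /dev/rdsk/c2t3d0
  run vgextend vgsh /dev/dsk/c2t3d0
  run vgexport -p -v -s -m /tmp/vgsh.map vgsh   # preview: writes mapfile, VG stays
  # ... ftp /tmp/vgsh.map to the adoptive nodes ...
}

# On each adoptive node:
refresh_vg_adoptive() {
  run vgexport vgsh                             # drop the stale VG definition
  run mkdir /dev/vgsh
  run mknod /dev/vgsh/group c 64 0x040000       # minor number is an example
  run vgimport -v -s -m /tmp/vgsh.map vgsh
}

add_disk_primary
refresh_vg_adoptive
```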
- Adding/removing logical volumes in the volume group owned by a package
  o Adding
  o On the primary node (node where the shared VG is activated, where the package is running)
    lvcreate -L ...
    newfs ...
    mkdir /filesystem
    Mount the filesystem manually and assign the correct ownership and permissions
    umount the filesystem
    vgexport the shared VG with the preview option to generate a mapfile:
    vgexport -p -v -s -m vgsh.map vgsh
    ftp the mapfile to the adoptive nodes
  o On the adoptive nodes
    vgexport the identified shared volume group off the system - vgexport vgsh
    mkdir /dev/vgsh
    mknod /dev/vgsh/group c 64 0x... (same VG id)
    mkdir /filesystem
    vgimport the shared volume group with the mapfile:
    vgimport -v -s -m vgsh.map vgsh
  o Schedule time to halt the package (only the package affected) - cmhaltpkg <package name>
  o After the package is halted, modify the package control script (.cntl) on all nodes to include the new filesystem
  o Start the package
    cmrunpkg <package name>
    cmmodpkg -e <package name> to re-enable package switching
  o Verify that the filesystem is mounted and accessible
  o Test on all adoptive nodes
o Removing Schedule downtime to halt the package Cmhaltpkg package name - on primary node Vgchange c n vgsh - to unmark the VG that belongs to the package from cluster Vgchange a y vgsh to activate vg Lvremove the logical volume Vgchange a n vgsh to deactivate the vg Vgchange c y vgsh to mark the vg as part of the cluster Modify package control files on all nodes to exclude this LV and filesystem Cmrunpkg package name - to restart package Cmmodpkg e . to re-enable package switching Vgexport mapfile on primary and ftp to all adoptive nodes Vgexport., vgimport . Mapfile on adoptive nodes Test on all adoptive nodes
  o Offline for the affected package, but the cluster and other packages can stay up and running.
  o Do I need to re-apply the cluster/package? No - changing the package control file does not need a re-application.
  o Can I create an LV/filesystem that is not mounted by my package but belongs to the same volume group, ie mounted via /etc/fstab? No - this will cause a problem, since the VG needs to be activated/deactivated by the package, and the package may fail.
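The LV-removal sequence above, with its unmark/activate/deactivate/re-mark dance around lvremove, can be sketched as a dry run; "pkg1", "vgsh" and the LV name are placeholders, and run() only echoes each command.

```shell
# Dry-run sketch of removing an LV from a cluster-marked shared VG;
# names are placeholders, run() only prints each command.
run() { echo "+ $*"; }

remove_lv() {
  run cmhaltpkg pkg1
  run vgchange -c n vgsh           # drop the cluster mark so the VG can be activated
  run vgchange -a y vgsh           # activate the VG locally
  run lvremove /dev/vgsh/oldlvol
  run vgchange -a n vgsh           # deactivate again
  run vgchange -c y vgsh           # restore the cluster mark
  # ... edit the .cntl file on all nodes, redistribute the mapfile ...
  run cmrunpkg pkg1
  run cmmodpkg -e pkg1
}
remove_lv
```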
- Adding new volume groups to the cluster packages
  o Adding
  o On the primary node (node where the shared VG is activated, where the package is running)
    pvcreate the new disk
    mkdir /dev/vgsh (the new shared VG)
    mknod /dev/vgsh/group c 64 0x0...
    vgcreate the new shared volume group
    Create the necessary lvols and filesystems or raw devices for the VG
    Mount the filesystems and change permissions and ownerships accordingly
    vgexport the shared VG with the preview option to generate a mapfile:
    vgexport -p -v -s -m vgsh.map vgsh
    ftp the mapfile to the adoptive nodes
  o On the adoptive nodes
    vgexport the identified shared volume group off the system - vgexport vgsh
    mkdir /dev/vgsh
    mknod /dev/vgsh/group c 64 0x... (same VG id)
    vgimport the shared volume group with the mapfile:
    vgimport -v -s -m vgsh.map vgsh
    mkdir the /filesystems for the logical volumes
  o On the primary node
    vgchange -c y /dev/vgsh to mark the VG as part of the cluster
    Umount all filesystems in this new shared VG and deactivate it - vgchange -a n vgsh
    Check /var/adm/syslog/syslog.log to see if the VG has been successfully marked in the cluster
    cmgetconf -v -c <cluster name> <outputfilename> (name it something different) to see that it has been entered into the cluster config file. If not, we will need to bring down the entire cluster, check, and re-apply the cluster.
  o Method 1 (do this if marking succeeded)
  o Schedule time to halt the package (only the package affected) - cmhaltpkg <package name>
  o After the package is halted, modify the package control script (.cntl) on all nodes to include the new filesystems and volume group
  o Start the package
    cmrunpkg <package name>
    cmmodpkg -e <package name> to re-enable package switching
  o Verify that the VG is activated and the filesystems are mounted and accessible
  o Test on all adoptive nodes
  o Method 2 (do this if marking did not succeed)
  o Schedule time to halt the entire cluster - cmhaltcl
  o After the cluster is halted, run cmgetconf -v -c <cluster name> <outputfilename> (cluster ascii file - name it something different) to see whether the VG has been entered into the cluster config file
  o If not entered, manually type the new shared VG into the new cluster outputfilename
  o cmcheckconf -v -C <outputfilename> (cluster ascii file) to check for any errors
  o cmapplyconf -v -C <outputfilename> (cluster ascii file) if no errors
  o Modify the package control script (.cntl) on all nodes to include the new filesystems and volume group
  o Start the cluster - cmruncl
  o Verify that the VG is activated and the filesystems are mounted and accessible
  o Test that the VG can be mounted on all adoptive nodes
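The marking check that decides between Method 1 and Method 2 can be sketched as a dry run; "vgnew", "prodcl" and the file paths are placeholders, and run() only echoes each command.

```shell
# Dry-run sketch of marking a new shared VG into the cluster and checking
# the result; names are placeholders, run() only prints each command.
run() { echo "+ $*"; }

mark_new_vg() {
  run vgchange -c y /dev/vgnew                   # mark the VG as part of the cluster
  run vgchange -a n vgnew                        # deactivate after unmounting
  # then check /var/adm/syslog/syslog.log, and confirm vgnew appears in:
  run cmgetconf -v -c prodcl /tmp/prodcl.check
}
mark_new_vg
```

If vgnew shows up in the fetched cluster ascii file, proceed with Method 1; otherwise fall back to Method 2.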
  o Removing
    Schedule downtime to halt the package
    cmhaltpkg <package name> - on the primary node
    vgchange -c n vgsh - to unmark the VG (that belongs to the package) from the cluster
    Modify the package control files on all nodes to exclude the LVs, filesystems and VG name
    cmrunpkg <package name> - to restart the package
    cmmodpkg -e <package name> to re-enable package switching
    (Stop here if you want to keep the volume group on the systems but remove it from the cluster)
    vgexport vgsh on all nodes to remove it from the systems
    Test on all adoptive nodes
  o Offline for the affected package, but the cluster and other packages can stay up and running.
  o Do I need to re-apply the cluster/package? No - changing the package control file does not need a re-application - unless the cluster marking does not work, as described above.
- Removing the cluster
  o cmhaltcl -f
  o cmdeleteconf -f -v -c <cluster name>
  o cmviewcl (will display an error message, since the cluster no longer exists)