
HA with MC/ServiceGuard (Concepts)

http://uxsl.europe.hp.com/doc/tech/ha/HAtrain/
Prepared by Anand



Other platforms have other HA software

HA means the following :
- No SPOF (single point of failure)
- N+1 redundancy - ideally dual power sources/vendors; hubs and switches connected to dual power sources
- Not load balancing (Foundry / Cisco LocalDirector, software load balancers)

HA Terminology :
- Cluster (1)
- Node (1 to many)
- Package (1 to many) - Floating IPs (single/multiple, e.g. BAMM) ; a hostname can be specified in DNS for each floating IP

Question : Can we have a node in 2 clusters ? Not advisable, due to dependencies.

Availability :
99% - standard server
99.5% - MC/ServiceGuard application, not node
99.99% - ??
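
For reference, these figures translate into allowed downtime per year (365 days = 8760 hours):
99%    -> 0.01   x 8760 h = about 87.6 hours/year
99.5%  -> 0.005  x 8760 h = about 43.8 hours/year
99.99% -> 0.0001 x 8760 h = about 53 minutes/year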

Criteria for HA :
Ensure that both (all) nodes in the cluster
- Are of the same build, hardware- and software-wise (patch level, kernel changes, user accounts)
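
One way to compare the two builds (a sketch; sgpue037 stands in for the other node, and remsh access between the nodes must already be set up):

swlist -l product > /tmp/sw_local.txt                   # software and patch levels on this node
remsh sgpue037 "swlist -l product" > /tmp/sw_peer.txt   # same list from the other node
diff /tmp/sw_local.txt /tmp/sw_peer.txt                 # any output is a mismatch to investigate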

Type of disks applicable for use with HA MC/ServiceGuard :
- In general, disks with 2 SPUs/controllers :
- VA
- FC10, SC10
- XP
- DS
- AutoRAID 12H
- Nike disks

Not recommended
- Jamaica disks
- Desktop

Note : Disks should have HA (RAID1, RAID5) as well.
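
For LVM-based storage, a minimal MirrorDisk/UX sketch for mirroring a logical volume onto a second disk (the volume group and device names are examples only):

vgextend /dev/vg02 /dev/dsk/c4t0d0              # add the second disk to the volume group
lvextend -m 1 /dev/vg02/lvol1 /dev/dsk/c4t0d0   # create one mirror copy of the logical volume on it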

Question : Can MC/ServiceGuard work across DCs or countries, i.e. one node in Singapore and the other node in Japan ?

Answer : Yes, provided the heartbeat cable is long enough; more importantly, the subnet must be the same and the shared disk system must be accessible by both servers.
Software Licenses

Part#          Description                                        Qty  Unit Price
B3935DA        MC/SG software system license for HP-UX 11.x        2   USD 0.00
B3935DA-AE5    MC/SG software license for K/N class                2   USD 5117.00
B3935DA-ABA    MC/SG software English localization                 2   USD 0.00
B3935DA-0S6    MC/SG 24x7 support (first year)                     2   USD 496.80
B5140BA        MC/SG NFS toolkit license                           2   USD 322.50
B5140BA-0S6    MC/SG NFS toolkit 24x7 support (first year)         2   USD 64.80
B5139DA        Enterprise Cluster Extension                        2   USD 427.85
B5139DA-0S6    Enterprise Cluster Extension 24x7 support           2   USD 86.40
               (first year)
H6194AA        MC/SG Implementation                                1   USD 15000.00
               (to be included only if you want to buy consulting and implementation service from HPC)
B7885BA        MC/SG LTU Extension for SAP (per SAP instance)      1   USD 12900.00


** Please verify with the SAP team whether any other SAP-related license is needed.

If you would like to buy the service from HPC, what our team usually does is approach Vincent, the account manager for HPO, and he will arrange for someone from HPC to work with us. (Do remember to include the USD 15k.)



Software Installation

Note : MC/ServiceGuard can be installed from the ctss144 depots
(/var/depot/applications/11.00/hp-ux, /var/depot/applications/11.11/hp-ux).

We have in our depots :
Version 11.09
Version 11.13 - recommended


MC/ServiceGuard software to install (basic setup, install on both machines):

B3935DA   A.11.13      MC/ServiceGuard
B5140BA   A.11.00.04   MC/ServiceGuard NFS Toolkit - install only if NFS is required within the cluster
B5139DA   B.01.06      Enterprise Cluster Master Toolkit - optional
B8324BA   A.01.03      HP Cluster Object Manager - optional

Note : Only install the above software from the same DART/CD release; do not mix and match from different releases.


Note : If the OS is version 11.11 (11i) Mission Critical Operating Environment, it should come with MC/ServiceGuard preinstalled.
Note : Do check the /etc/services and /etc/inetd.conf files for the MC/ServiceGuard related services, especially on the 11i mission critical OS.


/etc/services
hacl-hb 5300/tcp # High Availability (HA) Cluster heartbeat
hacl-gs 5301/tcp # HA Cluster General Services
hacl-cfg 5302/tcp # HA Cluster TCP configuration
hacl-cfg 5302/udp # HA Cluster UDP configuration
hacl-probe 5303/tcp # HA Cluster TCP probe
hacl-probe 5303/udp # HA Cluster UDP probe
hacl-local 5304/tcp # HA Cluster Commands
hacl-test 5305/tcp # HA Cluster Test
hacl-dlm 5408/tcp # HA Cluster distributed lock manager

/etc/inetd.conf
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -f /var/opt/cmom/cmomd.log
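
A quick way to confirm these entries are in place on each node:

grep hacl /etc/services
grep hacl /etc/inetd.conf
inetd -c     # make inetd re-read its configuration after any edit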


Depending on what version of MC/ServiceGuard is installed, MC/ServiceGuard patches must be installed:

http://haweb.cup.hp.com/Support/Patches/SG11.00.html


Question : Can we install one node with MC/ServiceGuard version 11.09 and the other with version 11.13 or something else, i.e. different versions ?

Answer : Not advisable because of compatibility issues, unless you are doing a rolling upgrade.

MC/ServiceGuard Network Design



Note : For the heartbeat LAN, the internal (built-in) LAN card is usually used.
The primary and secondary LANs use 2 separate LAN cards.


Question : What would a 3-node or 4-node cluster look like ?
How can we configure the packages to fail over ? Many possibilities.


Heartbeat network
- crossover UTP cable
- serial cable
- dedicated heartbeat subnet
- the primary LAN is usually set as the secondary heartbeat


Cluster/Package Node Configurations
- ACTIVE ; ACTIVE
- ACTIVE ; PASSIVE


Cluster Lock Disk
- Tie breaker
- Whichever node gets the lock disk will reform the cluster; the other will usually panic and reboot
- What if the cluster lock disk is dead ? - UNPLANNED OUTAGE


[Diagram : Keychain Database MC/ServiceGuard network design]
- Nodes sgpue036 and sgpue037, with user LAN (Securenet) access via switch 1 and switch 2.
- Heartbeat LAN : lan0 on each node (192.0.0.1 on sgpue036, 192.0.0.2 on sgpue037), joined by a crossover UTP cable.
- Primary LAN : lan1 on each node; 15.209.0.25 (cable name sgpue036) and 15.209.0.26 (cable name sgpue037).
- Failover LAN : lan2 on each node, carrying no physical IP; sgpue036's failover cable (sgpue036s) must be connected to switch 2, and sgpue037's (sgpue037s) to switch 1.
- The nodes share a disk connection labelled FC2 in the diagram.
MC/ServiceGuard Monitoring
- Hardware
- Application
- ITO
- ClusterviewPlus
- NNM


MC/ServiceGuard Commands

cmquerycl
cmcheckconf
cmapplyconf - distributes the binary configuration details to all nodes in the cluster
cmgetconf

Cluster specific commands
cmruncl
cmviewcl
cmhaltcl

Node specific commands
cmrunnode
cmhaltnode

Package specific commands
cmrunpkg
cmhaltpkg
cmmodpkg
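
A typical end-to-end sequence looks like this (a sketch; the file name and node names follow the examples used later in this document):

cmquerycl -v -C /etc/cmcluster/cluster.conf -n sgpue036 -n sgpue037   # generate the cluster config template
cmcheckconf -v -C /etc/cmcluster/cluster.conf                         # verify the edited configuration
cmapplyconf -v -C /etc/cmcluster/cluster.conf                         # distribute the binary configuration
cmruncl -v                                                            # start the cluster
cmviewcl -v                                                           # check cluster, node and package status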


MC/ServiceGuard with SAM


MC/ServiceGuard backups
- Database vendors' online backup tools
- Split mirror
- Business Copy (VA, XP) - KNET
- JFS snapshots
- Practice of backup for HPMS, if there are no special requests :
  o for filesystem backup : back up whatever filesystem is mounted on whichever system it currently sits, i.e. follow the package if it has failed over.
  o for database backup : SAP/DBA will consult the tools team on the backup strategy; usually OmniBack is configured to detect and back up by floating IP.
- Issues with BAMM ??




Project Timeline (TAT)
- Gathering information - 2 days
- Hardware setup (LAN) - 2 days
- Configuration - 3 days (varies; dependencies : application/DB scripts)
- Testing - 1 day (requires CE presence)




MC/SERVICEGUARD IMPLEMENTATION /
CONFIGURATION


Configure /etc/rc.config.d/netconf on each of the nodes in the cluster with the heartbeat LAN (if using a LAN
and not a serial interface)

# PRIMARY LAN
INTERFACE_NAME[0]="lan1"
IP_ADDRESS[0]="15.209.0.25"
SUBNET_MASK[0]="255.255.255.192"
BROADCAST_ADDRESS[0]=""
INTERFACE_STATE[0]=""
DHCP_ENABLE[0]=0

# HEARTBEAT LAN
INTERFACE_NAME[1]="lan0"
IP_ADDRESS[1]="192.0.0.1"
SUBNET_MASK[1]="255.255.255.0"
BROADCAST_ADDRESS[1]=""
INTERFACE_STATE[1]=""
DHCP_ENABLE[1]=0

Note : The secondary LAN (if any) does not need to be configured.

Configure the /.rhosts file of root on each of the nodes in the cluster to include itself and the
other nodes in the cluster.

E.g.
sgpue036.sgp.hp.com root
sgpue037.sgp.hp.com root

Note : Alternatively we can create the /etc/cmcluster/cmclnodelist file on all the
nodes. This is necessary for the cluster to identify all its nodes.
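
A minimal /etc/cmcluster/cmclnodelist for this setup would contain one 'node user' pair per line, mirroring the .rhosts entries above:

sgpue036.sgp.hp.com root
sgpue037.sgp.hp.com root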

Unmount Logical Volumes and deactivate the Volume Groups that will be controlled/run by the cluster.
(These do not need to be entered in /etc/fstab)

E.g.
1. vgchange -a n vg02
2. vgchange -a n vg03

Note : It is possible that a cluster does not have any cluster lock disk or even a VG at all.
The same applies to packages. Also, each VG must be unique to a package; the same VG cannot
be used by other packages.

Export and distribute the Volume Groups to the secondary (failover) node.

E.g.
1. vgexport -p -v -s -m /tmp/vg02.map /dev/vg02
2. vgexport -p -v -s -m /tmp/vg03.map /dev/vg03


-p option : preview mode, so that the volume group will not be exported
            off the original node.
-s option : sharable option, Series 800 only. When -s is specified,
            the -p, -v, and -m options must also be specified. A map
            file is created that can be used to create volume group
            entries on other systems in the high availability cluster
            (with the vgimport command).
-m option : generates the map file
-v option : print verbose messages

FTP the .map files to the secondary (failover) node.

On Secondary (failover) node, create the volume group directories:

E.g.
3. mkdir /dev/vg02
4. mkdir /dev/vg03
5. ls -l /dev/*/group     (to check which minor numbers are already in use)
6. mknod /dev/vg02/group c 64 0x020000
7. mknod /dev/vg03/group c 64 0x030000

Import the volume groups onto the secondary (failover) node
E.g.
8. vgimport -s -m /tmp/vg02.map /dev/vg02
9. vgimport -s -m /tmp/vg03.map /dev/vg03

Note : Leave the cluster volume groups deactivated.

Configure the Cluster (do this on one node)
1. cmquerycl [-w full] -v -C /etc/cmcluster/cluster.conf -n <primary node> -n <secondary node>
[-n <other nodes in the cluster>]

(Note : This will generate the cluster config file.)

2. Edit the /etc/cmcluster/cluster.conf file

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to ****
# ***** set them, consult the ServiceGuard manual. ****
# **********************************************************************

# Enter a name for this cluster. This name will be used to identify the
# cluster when viewing or manipulating it.

CLUSTER_NAME Kcdatabases


# Cluster Lock Parameters
#
# The cluster lock is used as a tie-breaker for situations
# in which a running cluster fails, and then two equal-sized
# sub-clusters are both trying to form a new cluster. The
# cluster lock may be configured using either a lock disk
# or a quorum server.
#
# You can use either the quorum server or the lock disk as
# a cluster lock but not both in the same cluster.
#
# Consider the following when configuring a cluster.
# For a two-node cluster, you must use a cluster lock. For
# a cluster of three or four nodes, a cluster lock is strongly
# recommended. For a cluster of more than four nodes, a
# cluster lock is recommended. If you decide to configure
# a lock for a cluster of more than four nodes, it must be
# a quorum server.

# Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and
# FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.
# The FIRST_CLUSTER_LOCK_VG is the LVM volume group that
# holds the cluster lock. This volume group should not be
# used by any other cluster as a cluster lock device.

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# ServiceGuard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server timeout is calculated from the
# ServiceGuard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000

FIRST_CLUSTER_LOCK_VG /dev/vg02 <<- this is automatically searched for


# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.

NODE_NAME sgpue036
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.0.0.1 <<- need to change manually from stationary IP to heartbeat IP
NETWORK_INTERFACE lan1
HEARTBEAT_IP 15.209.0.25
NETWORK_INTERFACE lan5
FIRST_CLUSTER_LOCK_PV /dev/dsk/c3t0d0
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Primary Network Interfaces on Bridged Net 1: lan0.
# Warning: There are no standby network interfaces on bridged net 1. <<- because using cross UTP
# Primary Network Interfaces on Bridged Net 2: lan1.
# Possible standby Network Interfaces on Bridged Net 2: lan5.

NODE_NAME sgpue037
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.0.0.2
NETWORK_INTERFACE lan1
HEARTBEAT_IP 15.209.0.26
NETWORK_INTERFACE lan5
FIRST_CLUSTER_LOCK_PV /dev/dsk/c3t0d0
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Primary Network Interfaces on Bridged Net 1: lan0.
# Warning: There are no standby network interfaces on bridged net 1.
# Primary Network Interfaces on Bridged Net 2: lan1.
# Possible standby Network Interfaces on Bridged Net 2: lan5.


# Cluster Timing Parameters (microseconds).

# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
# This default setting yields the fastest cluster reformations.
# However, the use of the default value increases the potential
# for spurious reformations due to momentary system hangs or
# network load spikes.
# For a significant portion of installations, a setting of
# 5000000 to 8000000 (5 to 8 seconds) is more appropriate.
# The maximum value recommended for NODE_TIMEOUT is 30000000
# (30 seconds).

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 8000000


# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000

# Package Configuration Parameters.
# Enter the maximum number of packages which will be configured in the cluster.
# You can not add packages beyond this limit.
# This parameter is required.
MAX_CONFIGURED_PACKAGES 8



# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.
# Neither CVM or VxVM Disk Groups should be used here.
# For example:
# VOLUME_GROUP /dev/vgdatabase
# VOLUME_GROUP /dev/vg02

VOLUME_GROUP /dev/vg02
VOLUME_GROUP /dev/vg03





Verify the Cluster Configuration (do this on one node)
1. cmcheckconf [-k] -v -C /etc/cmcluster/cluster.conf

Note : If there are no errors, it means that the cluster is ready to be applied.

Distributing the Binary Configuration File (do this on one node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-k] -v -C /etc/cmcluster/cluster.conf
3. vgchange -a n /dev/vg02

Note : The cluster lock volume group needs to be activated in order for the configuration to be
applied for first-time clusters. Subsequent changes to the cluster may not need the cluster lock
activated, and may not even need the cluster to be taken down, i.e. they can be done online, but
this is not recommended.

Note : Need to deactivate cluster lock disk right after cluster changes are applied.

Backing up Volume Group and Cluster Lock Configuration Data (optional)
1. vgcfgbackup -u /dev/vg02
2. vgcfgbackup -u /dev/vg03

Note : This does not require the volume groups to be activated.

Checking Cluster Operation (do on either node)
1. cmruncl -v
2. cmhaltnode -v <primary node>
3. cmrunnode -v <primary node>
4. cmhaltcl -v
5. cmruncl -v
6. cmhaltcl -v

Note : Try this on all other nodes in the cluster as well.

Disable Automount of Volume Groups (On both nodes)
1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0

Note : This is necessary as we do not want the cluster volume groups to be activated when a
system reboots. They are now under the control of the cluster.

Disable Autostart Features (On both nodes)
1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=0

Note : This prevents the cluster node from automatically joining the cluster after a
reboot. Usually done when doing maintenance.
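
The relevant line in /etc/rc.config.d/cmcluster then reads:

AUTOSTART_CMCLD=0    # set back to 1 to let the node rejoin the cluster at boot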

Create Packages
E.g.
1. mkdir /etc/cmcluster/kci2prd <- can be any name
2. cmmakepkg -p /etc/cmcluster/kci2prd/kci2prd.conf <- can be any name
3. Edit the configuration file

Note : If the package and control file are special (e.g. NFS is required) then do not run the
cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS
toolkit (similarly for the SAP extension). You still need to adjust the files to suit your needs.




# **********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******
# **********************************************************************
# ******* Note: This file MUST be edited before it can be used. ********
# * For complete details about package parameters and how to set them, *
# * consult the MC/ServiceGuard ServiceGuard OPS Edition manuals *******
# **********************************************************************

# Enter a name for this package. This name will be used to identify the
# package when viewing or manipulating it. It must be different from
# the other configured package names.

PACKAGE_NAME kci2prd

# Enter the package type for this package. PACKAGE_TYPE indicates
# whether this package is to run as a FAILOVER or SYSTEM_MULTI_NODE
# package.
#
# FAILOVER package runs on one node at a time and if a failure
# occurs it can switch to an alternate node.
#
# SYSTEM_MULTI_NODE
# package runs on multiple nodes at the same time.
# It can not be started and halted on individual nodes.
# Both NODE_FAIL_FAST_ENABLED and AUTO_RUN must be set
# to YES for this type of package. All SERVICES must
# have SERVICE_FAIL_FAST_ENABLED set to YES.
#
# NOTE: Packages which have a PACKAGE_TYPE of SYSTEM_MULTI_NODE are
# not failover packages and should only be used for applications
# provided by Hewlett-Packard.
#
# Since SYSTEM_MULTI_NODE packages run on multiple nodes at
# one time, following parameters are ignored:
#
# FAILOVER_POLICY
# FAILBACK_POLICY
#
# Since an IP address can not be assigned to more than one node at a
# time, relocatable IP addresses can not be assigned in the
# package control script for multiple node packages. If
# volume groups are assigned to multiple node packages they must be
# activated in a shared mode and data integrity is left to the
# application. Shared access requires a shared volume manager.
#
#
# Examples : PACKAGE_TYPE FAILOVER (default)
# PACKAGE_TYPE SYSTEM_MULTI_NODE
#

PACKAGE_TYPE FAILOVER


# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.
#
# The alternative policy is MIN_PACKAGE_NODE. This policy will select
# the node, from the list of NODE_NAME entries below, which is
# running the least number of packages at the time this package needs
# to start.

FAILOVER_POLICY CONFIGURED_NODE


# Enter the failback policy for this package. This policy will be used
# to determine what action to take when a package is not running on
# its primary node and its primary node is capable of running the
# package. The default policy unless otherwise specified is MANUAL.
# The MANUAL policy means no attempt will be made to move the package
# back to its primary node when it is running on an adoptive node.
#
# The alternative policy is AUTOMATIC. This policy will attempt to
# move the package back to its primary node whenever the primary node
# is capable of running the package.

FAILBACK_POLICY MANUAL


# Enter the names of the nodes configured for this package. Repeat
# this line as necessary for additional adoptive nodes.
#
# NOTE: The order is relevant.
# Put the second Adoptive Node after the first one.
#
# Example : NODE_NAME original_node
# NODE_NAME adoptive_node
#
# If all nodes in the cluster are to be specified and order is not
# important, "NODE_NAME *" may be specified.
#
# Example : NODE_NAME *

NODE_NAME sgpue036
NODE_NAME sgpue037


# Enter the value for AUTO_RUN. Possible values are YES and NO.
# The default for AUTO_RUN is YES. When the cluster is started the
# package will be automatically started. In the event of a failure the
# package will be started on an adoptive node. Adjust as necessary.
#
# AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.

AUTO_RUN YES


# Enter the value for LOCAL_LAN_FAILOVER_ALLOWED.
# Possible values are YES and NO.
# The default for LOCAL_LAN_FAILOVER_ALLOWED is YES. In the event of a
# failure, this permits the cluster software to switch LANs locally
# (transfer to a standby LAN card). Adjust as necessary.
#
# LOCAL_LAN_FAILOVER_ALLOWED replaces obsolete NET_SWITCHING_ENABLED.

LOCAL_LAN_FAILOVER_ALLOWED YES


# Enter the value for NODE_FAIL_FAST_ENABLED.
# Possible values are YES and NO.
# The default for NODE_FAIL_FAST_ENABLED is NO. If set to YES,
# in the event of a failure, the cluster software will halt the node
# on which the package is running. All SYSTEM_MULTI_NODE packages must have
# NODE_FAIL_FAST_ENABLED set to YES. Adjust as necessary.
NODE_FAIL_FAST_ENABLED NO


# Enter the complete path for the run and halt scripts. In most cases
# the run script and halt script specified here will be the same script,
# the package control script generated by the cmmakepkg command. This
# control script handles the run(ning) and halt(ing) of the package.
# Enter the timeout, specified in seconds, for the run and halt scripts.
# If the script has not completed by the specified timeout value,
# it will be terminated. The default for each script timeout is
# NO_TIMEOUT. Adjust the timeouts as necessary to permit full
# execution of each script.
# Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of
# all SERVICE_HALT_TIMEOUT specified for all services.

RUN_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl
HALT_SCRIPT_TIMEOUT NO_TIMEOUT


# Enter the names of the storage groups configured for this package.
# Repeat this line as necessary for additional storage groups.
#
# Storage groups are only used with CVM disk groups. Neither
# VxVM disk groups or LVM volume groups should be listed here.
# By specifying a CVM disk group with the STORAGE_GROUP keyword
# this package will not run until the VxVM-CVM-pkg package is
# running and thus the CVM shared disk groups are ready for
# activation.
#
# NOTE: Should only be used by applications provided by
# Hewlett-Packard.
#
# Example : STORAGE_GROUP dg01
# STORAGE_GROUP dg02
# STORAGE_GROUP dg03
# STORAGE_GROUP dg04
#


# Enter the SERVICE_NAME, the SERVICE_FAIL_FAST_ENABLED and the
# SERVICE_HALT_TIMEOUT values for this package. Repeat these
# three lines as necessary for additional service names. All
# service names MUST correspond to the service names used by
# cmrunserv and cmhaltserv commands in the run and halt scripts.
#
# The value for SERVICE_FAIL_FAST_ENABLED can be either YES or
# NO. If set to YES, in the event of a service failure, the
# cluster software will halt the node on which the service is
# running. If SERVICE_FAIL_FAST_ENABLED is not specified, the
# default will be NO.
#
# SERVICE_HALT_TIMEOUT is represented in the number of seconds.
# This timeout is used to determine the length of time (in
# seconds) the cluster software will wait for the service to
# halt before a SIGKILL signal is sent to force the termination
# of the service. In the event of a service halt, the cluster
# software will first send a SIGTERM signal to terminate the
# service. If the service does not halt, after waiting for the
# specified SERVICE_HALT_TIMEOUT, the cluster software will send
# out the SIGKILL signal to the service to force its termination.
# This timeout value should be large enough to allow all cleanup
# processes associated with the service to complete. If the
# SERVICE_HALT_TIMEOUT is not specified, a zero timeout will be
# assumed, meaning the cluster software will not wait at all
# before sending the SIGKILL signal to halt the service.
#
# Example: SERVICE_NAME DB_SERVICE
# SERVICE_FAIL_FAST_ENABLED NO
# SERVICE_HALT_TIMEOUT 300
#
# To configure a service, uncomment the following lines and
# fill in the values for all of the keywords.
#
SERVICE_NAME kci2prd
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 300


# Enter the network subnet name that is to be monitored for this package.
# Repeat this line as necessary for additional subnet names. If any of
# the subnets defined goes down, the package will be switched to another
# node that is configured for this package and has all the defined subnets
# available.

SUBNET 15.209.0.0


# The keywords RESOURCE_NAME, RESOURCE_POLLING_INTERVAL,
# RESOURCE_START, and RESOURCE_UP_VALUE are used to specify Package
# Resource Dependencies. To define a package Resource Dependency, a
# RESOURCE_NAME line with a fully qualified resource path name, and
# one or more RESOURCE_UP_VALUE lines are required. The
# RESOURCE_POLLING_INTERVAL and the RESOURCE_START are optional.
#
# The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the
# resource is to be monitored. It will be defaulted to 60 seconds if
# RESOURCE_POLLING_INTERVAL is not specified.
#
# The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED.
# The default setting for RESOURCE_START is AUTOMATIC. If AUTOMATIC
# is specified, ServiceGuard will start up resource monitoring for
# these AUTOMATIC resources automatically when the node starts up.
# If DEFERRED is selected, ServiceGuard will not attempt to start
# resource monitoring for these resources during node start up. User
# should specify all the DEFERRED resources in the package run script
# so that these DEFERRED resources will be started up from the package
# run script during package run time.
#
# RESOURCE_UP_VALUE requires an operator and a value. This defines
# the resource 'UP' condition. The operators are =, !=, >, <, >=,
# and <=, depending on the type of value. Values can be string or
# numeric. If the type is string, then only = and != are valid
# operators. If the string contains whitespace, it must be enclosed
# in quotes. String values are case sensitive. For example,
#
# Resource is up when its value is
# --------------------------------
# RESOURCE_UP_VALUE = UP "UP"
# RESOURCE_UP_VALUE != DOWN Any value except "DOWN"
# RESOURCE_UP_VALUE = "On Course" "On Course"
#
# If the type is numeric, then it can specify a threshold, or a range to
# define a resource up condition. If it is a threshold, then any operator
# may be used. If a range is to be specified, then only > or >= may be used
# for the first operator, and only < or <= may be used for the second operator.
# For example,
# Resource is up when its value is
# --------------------------------
# RESOURCE_UP_VALUE = 5 5 (threshold)
# RESOURCE_UP_VALUE > 5.1 greater than 5.1 (threshold)
# RESOURCE_UP_VALUE > -5 and < 10 between -5 and 10 (range)
#
# Note that "and" is required between the lower limit and upper limit
# when specifying a range. The upper limit must be greater than the lower
# limit. If RESOURCE_UP_VALUE is repeated within a RESOURCE_NAME block, then
# they are inclusively OR'd together. Package Resource Dependencies may be
# defined by repeating the entire RESOURCE_NAME block.
#
# Example : RESOURCE_NAME /net/interfaces/lan/status/lan0
# RESOURCE_POLLING_INTERVAL 120
# RESOURCE_START AUTOMATIC
# RESOURCE_UP_VALUE = RUNNING
# RESOURCE_UP_VALUE = ONLINE
#
# Means that the value of resource /net/interfaces/lan/status/lan0
# will be checked every 120 seconds, and is considered to
# be 'up' when its value is "RUNNING" or "ONLINE".
#
# Uncomment the following lines to specify Package Resource Dependencies.
#
#RESOURCE_NAME <Full_path_name>
#RESOURCE_POLLING_INTERVAL <numeric_seconds>
#RESOURCE_START <AUTOMATIC/DEFERRED>
#RESOURCE_UP_VALUE <op> <string_or_numeric> [and <op> <numeric>]




Create Package Control Scripts
1. cmmakepkg -s /etc/cmcluster/kci2prd/kci2prd.cntl
2. Edit the control script.

Note : If the package and control file are special (e.g. NFS is required) then do not run the
cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS
toolkit (similarly for the SAP extension). You still need to adjust the files to suit your needs.

Note : It is possible that packages do not use any volume groups.

# **********************************************************************
# * *
# * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) *
# * *
# * Note: This file MUST be edited before it can be used. *
# * *
# **********************************************************************

# The PACKAGE and NODE environment variables are set by
# ServiceGuard at the time the control script is executed.
# Do not set these environment variables yourself!
# The package may fail to start or halt if the values for
# these environment variables are altered.


# UNCOMMENT the variables as you set them.

# Set PATH to reference the appropriate directories.
PATH=/usr/bin:/usr/sbin:/etc:/bin

# VOLUME GROUP ACTIVATION:
# Specify the method of activation for volume groups.
# Leave the default ("VGCHANGE="vgchange -a e") if you want volume
# groups activated in exclusive mode. This assumes the volume groups have
# been initialized with 'vgchange -c y' at the time of creation.
#
# Uncomment the first line (VGCHANGE="vgchange -a e -q n"), and comment
# out the default, if your disks are mirrored on separate physical paths,
#
# Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment
# out the default, if your disks are mirrored on separate physical paths,
# and you want the mirror resynchronization to occur in parallel with
# the package startup.
#
# Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to
# use non-exclusive activation mode. Single node cluster configurations
# must use non-exclusive activation.
#
# VGCHANGE="vgchange -a e -q n"
# VGCHANGE="vgchange -a e -q n -s"
# VGCHANGE="vgchange -a y"
VGCHANGE="vgchange -a e" # Default

# CVM DISK GROUP ACTIVATION:
# Specify the method of activation for CVM disk groups.
# Leave the default
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite")
# if you want disk groups activated in the exclusive write mode.
#
# Uncomment the first line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"),
# and comment out the default, if you want disk groups activated in
# the readonly mode.
#
# Uncomment the second line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"),
# and comment out the default, if you want disk groups activated in the
# shared read mode.
#
# Uncomment the third line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"),
# and comment out the default, if you want disk groups activated in the
# shared write mode.
#
# CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"
# CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"
# CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"
CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite"

# VOLUME GROUPS
# Specify which volume groups are used by this package. Uncomment VG[0]=""
# and fill in the name of your first volume group. You must begin with
# VG[0], and increment the list in sequence.
#
# For example, if this package uses your volume groups vg01 and vg02, enter:
# VG[0]=vg01
# VG[1]=vg02
#
# The volume group activation method is defined above. The filesystems
# associated with these volume groups are specified below.
#
VG[0]=vg02
VG[1]=vg03

# CVM DISK GROUPS
# Specify which cvm disk groups are used by this package. Uncomment
# CVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with CVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# CVM_DG[0]=dg01
# CVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above. The filesystems
# associated with these volume groups are specified below in the CVM_*
# variables.
#
#CVM_DG[0]=""

# VxVM DISK GROUPS
# Specify which VxVM disk groups are used by this package. Uncomment
# VXVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with VXVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# VXVM_DG[0]=dg01
# VXVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above.
#
#VXVM_DG[0]=""

#
# NOTE: A package could have LVM volume groups, CVM disk groups and VxVM
# disk groups.
#
# FILESYSTEMS
# Specify the filesystems which are used by this package. Uncomment
# LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]="" and fill in the name of your first
# logical volume, filesystem and mount option for the file system. You must
# begin with LV[0], FS[0] and FS_MOUNT_OPT[0] and increment the list in
# sequence.
#
# For the LVM example, if this package uses the file systems pkg1a and
# pkg1b, which are mounted on the logical volumes lvol1 and lvol2 with
# read and write options enter:
# LV[0]=/dev/vg01/lvol1; FS[0]=/pkg1a; FS_MOUNT_OPT[0]="-o rw"
# LV[1]=/dev/vg01/lvol2; FS[1]=/pkg1b; FS_MOUNT_OPT[1]="-o rw"
#
# For the CVM or VxVM example, if this package uses the file systems
# pkg1a and pkg1b, which are mounted on the volumes lvol1 and lvol2
# with read and write options enter:
# LV[0]="/dev/vx/dsk/dg01/vol01"; FS[0]="/pkg1a"; FS_MOUNT_OPT[0]="-o rw"
# LV[1]="/dev/vx/dsk/dg01/vol02"; FS[1]="/pkg1b"; FS_MOUNT_OPT[1]="-o rw"
#
# The filesystems are defined as triplets of entries specifying the logical
# volume, the mount point and the mount options for the file system. Each
# filesystem will be fsck'd prior to being mounted. The filesystems will be
# mounted in the order specified during package startup and will be unmounted
# in reverse order during package shutdown. Ensure that volume groups
# referenced by the logical volume definitions below are included in
# volume group definitions above.
#
#LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""

LV[0]=/dev/vg02/lvol1; FS[0]=/oracle/KCI2PRD/data01; FS_MOUNT_OPT[0]="-o rw,suid,largefiles"
LV[1]=/dev/vg02/lvol2; FS[1]=/oracle/KCI2PRD/data02; FS_MOUNT_OPT[1]="-o rw,suid,largefiles"
LV[2]=/dev/vg02/lvol3; FS[2]=/oracle/KCI2PRD/data03; FS_MOUNT_OPT[2]="-o rw,suid,largefiles"
LV[3]=/dev/vg02/lvol4; FS[3]=/oracle/KCI2PRD/data04; FS_MOUNT_OPT[3]="-o rw,suid,largefiles"
LV[4]=/dev/vg02/lvol5; FS[4]=/oracle/KCI2PRD/data05; FS_MOUNT_OPT[4]="-o rw,suid,largefiles"
LV[5]=/dev/vg02/lvol6; FS[5]=/oracle/KCI2PRD/data06; FS_MOUNT_OPT[5]="-o rw,suid,largefiles"
LV[6]=/dev/vg02/lvol7; FS[6]=/oracle/KCI2PRD/data07; FS_MOUNT_OPT[6]="-o rw,suid,largefiles"
LV[7]=/dev/vg02/lvol8; FS[7]=/oracle/KCI2PRD/data08; FS_MOUNT_OPT[7]="-o rw,suid,largefiles"
LV[8]=/dev/vg02/lvol9; FS[8]=/oracle/KCI2PRD/data09; FS_MOUNT_OPT[8]="-o rw,suid,largefiles"
LV[9]=/dev/vg02/lvol10; FS[9]=/oracle/KCI2PRD/data10; FS_MOUNT_OPT[9]="-o rw,suid,largefiles"
LV[10]=/dev/vg02/lvol11; FS[10]=/oracle/KCI2PRD/mirrlogA; FS_MOUNT_OPT[10]="-o rw,suid,largefiles"
LV[11]=/dev/vg02/lvol12; FS[11]=/oracle/KCI2PRD/mirrlogB; FS_MOUNT_OPT[11]="-o rw,suid,largefiles"
LV[12]=/dev/vg02/lvol13; FS[12]=/oracle/KCI2PRD/origlogA; FS_MOUNT_OPT[12]="-o rw,suid,largefiles"
LV[13]=/dev/vg02/lvol14; FS[13]=/oracle/KCI2PRD/origlogB; FS_MOUNT_OPT[13]="-o rw,suid,largefiles"
LV[14]=/dev/vg03/lvol1; FS[14]=/oracle/KCI2PRD/arch; FS_MOUNT_OPT[14]="-o rw,suid,largefiles"
LV[15]=/dev/vg03/lvol2; FS[15]=/oracle/KCI2PRD/bkup01; FS_MOUNT_OPT[15]="-o rw,suid,largefiles"

#
# VOLUME RECOVERY
#
# When mirrored VxVM volumes are started during the package control
# bring up, if recovery is required the default behavior is for
# the package control script to wait until recovery has been
# completed.
#
# To allow mirror resynchronization to occur in parallel with
# the package startup, uncomment the line
# VXVOL="vxvol -g \$DiskGroup -o bg startall" and comment out the default.
#
# VXVOL="vxvol -g \$DiskGroup -o bg startall"
VXVOL="vxvol -g \$DiskGroup startall" # Default

# FILESYSTEM UNMOUNT COUNT
# Specify the number of unmount attempts for each filesystem during package
# shutdown. The default is set to 1.
FS_UMOUNT_COUNT=1


# FILESYSTEM MOUNT RETRY COUNT.
# Specify the number of mount retries for each filesystem.
# The default is 0. During startup, if a mount point is busy
# and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and
# the script will exit with 1. If a mount point is busy and
# FS_MOUNT_RETRY_COUNT is greater than 0, the script will attempt
# to kill the user responsible for the busy mount point
# and then mount the file system. It will attempt to kill user and
# retry mount, for the number of times specified in FS_MOUNT_RETRY_COUNT.
# If the mount still fails after this number of attempts, the script
# will exit with 1.
# NOTE: If the FS_MOUNT_RETRY_COUNT > 0, the script will execute
# "fuser -ku" to freeup busy mount point.
FS_MOUNT_RETRY_COUNT=0

# CONCURRENT VGCHANGE OPERATIONS
# Specify the number of concurrent volume group activations or
# deactivations to allow during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while activating or deactivating a large number of volume groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_VGCHANGE_OPERATIONS=1

# CONCURRENT DISK GROUP OPERATIONS
# Specify the number of concurrent VxVM DG imports or deports to allow
# during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while importing or deporting a large number of disk groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_DISKGROUP_OPERATIONS=1

# CONCURRENT FSCK OPERATIONS
# Specify the number of concurrent fsck to allow during package startup.
# Setting this value to an appropriate number may improve the performance
# while checking a large number of file systems in the package. If the
# specified value is less than 1, the script defaults it to 1 and proceeds
# with a warning message in the package control script logfile.
CONCURRENT_FSCK_OPERATIONS=1

# CONCURRENT MOUNT AND UMOUNT OPERATIONS
# Specify the number of concurrent mounts and umounts to allow during
# package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while mounting or un-mounting a large number of file systems in the package.
# If the specified value is less than 1, the script defaults it to 1 and
# proceeds with a warning message in the package control script logfile.
CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1

# IP ADDRESSES
# Specify the IP and Subnet address pairs which are used by this package.
# Uncomment IP[0]="" and SUBNET[0]="" and fill in the name of your first
# IP and subnet address. You must begin with IP[0] and SUBNET[0] and
# increment the list in sequence.
#
# For example, if this package uses an IP of 192.10.25.12 and a subnet of
# 192.10.25.0 enter:
# IP[0]=192.10.25.12
# SUBNET[0]=192.10.25.0 # (netmask=255.255.255.0)
#
# Hint: Run "netstat -i" to see the available subnets in the Network field.
#
# IP/Subnet address pairs for each IP address you want to add to a subnet
# interface card. Must be set in pairs, even for IP addresses on the same
# subnet.
#
#IP[0]=""
#SUBNET[0]=""

IP[0]="15.209.0.33"
SUBNET[0]="15.209.0.0" # netmask 255.255.255.192

# SERVICE NAMES AND COMMANDS.
# Specify the service name, command, and restart parameters which are
# used by this package. Uncomment SERVICE_NAME[0]="", SERVICE_CMD[0]="",
# SERVICE_RESTART[0]="" and fill in the name of the first service, command,
# and restart parameters. You must begin with SERVICE_NAME[0], SERVICE_CMD[0],
# and SERVICE_RESTART[0] and increment the list in sequence.
#
# For example:
# SERVICE_NAME[0]=pkg1a
# SERVICE_CMD[0]="/usr/bin/X11/xclock -display 192.10.25.54:0"
# SERVICE_RESTART[0]="" # Will not restart the service.
#
# SERVICE_NAME[1]=pkg1b
# SERVICE_CMD[1]="/usr/bin/X11/xload -display 192.10.25.54:0"
# SERVICE_RESTART[1]="-r 2" # Will restart the service twice.
#
# SERVICE_NAME[2]=pkg1c
# SERVICE_CMD[2]="/usr/sbin/ping"
# SERVICE_RESTART[2]="-R" # Will restart the service an infinite
# number of times.
#
# Note: No environmental variables will be passed to the command, this
# includes the PATH variable. Absolute path names are required for the
# service command definition. Default shell is /usr/bin/sh.
#
#SERVICE_NAME[0]=""
#SERVICE_CMD[0]=""
#SERVICE_RESTART[0]=""


SERVICE_NAME[0]=kci2prd
SERVICE_CMD[0]="/etc/cmcluster/kci2prd/kci2prd.sh monitor"
SERVICE_RESTART[0]=""

# DEFERRED_RESOURCE NAME
# Specify the full path name of the 'DEFERRED' resources configured for
# this package. Uncomment DEFERRED_RESOURCE_NAME[0]="" and fill in the
# full path name of the resource.
#
#DEFERRED_RESOURCE_NAME[0]=""

# DTC manager information for each DTC.
# Example: DTC[0]=dtc_20
#DTC_NAME[0]=


# START OF CUSTOMER DEFINED FUNCTIONS

# This function is a place holder for customer define functions.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.

function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.

/etc/cmcluster/kci2prd/kci2prd.sh start

test_return 51
}

# This function is a place holder for customer define functions.
# You should define all actions you want to happen here, before the service is
# halted.

function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.

/etc/cmcluster/kci2prd/kci2prd.sh shutdown

test_return 52
}

# END OF CUSTOMER DEFINED FUNCTIONS

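The control script above assumes an application wrapper script /etc/cmcluster/kci2prd/kci2prd.sh that
implements start, shutdown and monitor actions. A minimal sketch follows; the Oracle user name, the
startup/shutdown scripts and the monitored process name are assumptions to be replaced with the real
application's:

#!/usr/bin/sh
# Hypothetical application wrapper called by the package control script.
case "$1" in
start)
        # Start the application (example only: Oracle via its own scripts)
        su - orakci2 -c "/oracle/KCI2PRD/startdb.sh"
        ;;
shutdown)
        # Stop the application
        su - orakci2 -c "/oracle/KCI2PRD/stopdb.sh"
        ;;
monitor)
        # Stay alive while the monitored process exists; exiting makes
        # ServiceGuard treat the service as failed and triggers failover.
        while ps -ef | grep "[o]ra_pmon_KCI2PRD" > /dev/null
        do
                sleep 30
        done
        ;;
*)
        echo "usage: $0 {start|shutdown|monitor}"
        exit 1
        ;;
esac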

FTP all ASCII scripts to the secondary (failover) node(s).

Verify the Package Configuration (do this on the package's primary node)
1. cmcheckconf [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf

Note : If there are no errors, it means that the package is ready to be applied.

Distribute the Cluster Configuration File (do this on the package's primary node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-v] [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf
3. vgchange -a n /dev/vg02

Note : You should not need to activate and later deactivate the cluster lock volume group while
applying packages.

Note : Repeat the steps from Create Packages to here if more packages are required in the
cluster.

Configure Automounter (Do this only if your system is using automounter)
1. Check that in /etc/rc.config.d/nfsconf, the automounter section should be:
AUTOMOUNT=1
AUTOMASTER="/etc/auto_master"
AUTOMOUNT_OPTIONS="-f $AUTO_MASTER"
AUTOMOUNTD_OPTIONS=
2. Check in /etc/rc.config.d/nfsconf, one nfs client and one nfs server daemon is configured to
run:
NFS_CLIENT=1
NFS_SERVER=1
NUM_NFSD=4
NUM_NFSIOD=4
3. Add this line to /etc/auto_master
/- /etc/auto.direct
4. Create an /etc/auto.direct file
/oracle <relocdbci_s>:/export/
5. Restart the automounter with
/sbin/init.d/nfs.client stop
/sbin/init.d/nfs.client start
Disable Automount of Volume Groups (On both nodes)
1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0

Enable Autostart Features (On both nodes)
1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=1

Checking Package Operation (do on either node)
7. cmruncl -v
8. cmhaltnode -v <primary node> (node will be halted and the package failed over to the secondary (adoptive) node)
9. cmrunnode -v <primary node> (node will rejoin the cluster)
10. cmhaltpkg <package name> (halt the package on the adoptive node)
11. cmrunpkg <package name> (run the package on the original node)
12. cmmodpkg -e <package name> (enable package switching)
13. cmhaltcl -v

Note : Use cmviewcl or cmviewcl -v to view the results of each command.

MC/ServiceGuard Template


System Configuration

Hardware Information

Hostname
Model
Operating System version
Physical Memory
Swap Space


Non-Shared HDs
Shared HDs
Tapes
LAN Cards
Primary and Standby Network
Type

Heartbeat Network Type
MC ServiceGuard Version
MirrorDisk/UX Version
Online JFS Version
Application name / Application
version

Database name / Database version
OS/Appls Patch Level


System Information


Server Hostname
Server IP Address
Server IP Netmask
Server Default Router
Primary Network on separate
Switch

Standby Network on separate
Switch



Operation System File System Layout

Volume Group Logical FS Type Size (mb) Mount point









MC/ServiceGuard Configuration

Cluster Information


Cluster Name
Cluster Members
Cluster Lock Disk
Heartbeat Interval Default value is 1s
Node Timeout Default value is 2s ; recommended 8s
Network Polling Interval Default value is 2s
Autostart Delay Default value is 10 mins
Maximum Configured Packages To allow online package reconfiguration


Packages Overview

The cluster consists of ________ packages:

1.
2.
3.

Detailed Package Information:


Package Name
Re-locatable Hostname
Re-locatable IP Address
Monitor Subnet
Primary Node
Adoptive Node
Run/Halt Script
Run/Halt Script Timeout
Package Switch Enabled
Network Switch Enabled
Node Failfast Enabled
Service Name
Volume Groups
Logical Volume and File System Details
Device file Size/ Type Mount Point Owner Group Perm.
















Cluster Configuration File

Parameter

Value
CLUSTER_NAME


FIRST_CLUSTER_LOCK_VG



NODE_NAME


NETWORK_INTERFACE


HEARTBEAT_IP


NETWORK_INTERFACE


HEARTBEAT_IP


FIRST_CLUSTER_LOCK_PV



NODE_NAME


NETWORK_INTERFACE


HEARTBEAT_IP


NETWORK_INTERFACE


HEARTBEAT_IP

FIRST_CLUSTER_LOCK_PV



HEARTBEAT_INTERVAL

(Default value is 1s)

NODE_TIMEOUT (Default value is 2s)

AUTO_START_TIMEOUT

(Default value is 10 mins)
NETWORK_POLLING_INTERVAL

(Default value is 2s)
MAX_CONFIGURED_PACKAGES

(Set higher than the current number of packages
to allow online package reconfiguration)

VOLUME_GROUP










Package Configuration File:

Parameter

Value
PACKAGE_NAME


NODE_NAME


NODE_NAME



RUN_SCRIPT


RUN_SCRIPT_TIMEOUT


HALT_SCRIPT


HALT_SCRIPT_TIMEOUT



SERVICE_NAME



SUBNET



AUTO_RUN
(PKG_SWITCHING_ENABLED)

YES
LOCAL_LAN_FAILOVER_ALLOWED
(NET_SWITCHING_ENABLED)
YES

NODE_FAIL_FAST_ENABLED

NO





Package Control Script :

Parameter

Value
PATH


VGCHANGE

"vgchange a e"
VG[0]
VG[1]


LV[0]
LV[1]
LV[2]




FS[0]
FS[1]
FS[2]



IP[0]
SUBNET[0]


SERVICE_NAME[0]
SERVICE_CMD[0]
SERVICE_RESTART[0]



function
customer_defined_run_cmds


function
customer_defined_halt_cmds




Customer Shell Script (optional) :

Parameter

Value
INFORMIX_HOME or
ORACLE_HOME


INFORMIX_SESSION_NAME or
ORACLE_SESSION_NAME

(Mount point and session name)
MONITOR_INTERVAL (Time between checks)

MONITOR_PROCESSES (Processes like dataserver etc)

PACKAGE_NAME



TIME_OUT (Waiting time in seconds for the Informix/Oracle
abort to complete before killing the
Informix/Oracle processes)


Note : If it is Oracle, SAP or NFS, there are pre-defined scripts for these, provided you
install the Enterprise Cluster Master Toolkit and NFS toolkit - see /opt/cmcluster/




TESTING MC/SERVICEGUARD

1.1 Test Overview

This section contains the test requirements and test plan for MC/ServiceGuard.

1.2 Test Requirement

The MC/ServiceGuard product is a High Availability solution that performs system failure detection and transfers the application
from the primary node to the adoptive node when a system failure occurs.

Note : We assume that there is only 1 package in the cluster. If there are more packages, please change/add steps
accordingly.

The faults to be tested and the appropriate methods are listed below:

Type of Failure                                    Method of Simulation

CPU, Memory, Power Supply and Operating System     Reset of server
Active LAN                                         Removal of LAN cable from active LAN card
Total Data LAN                                     Removal of all Data LAN cables from server

1.3 Verification method

Upon startup of the package, the verification checkpoints are

a. Log onto the surviving server and run the command cmviewcl to check that the package application is RUNNING

b. Ping the relocatable IP from another station in the same network

c. Check that all shared file systems are mounted.
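
In command form, a quick check sequence (the relocatable IP 15.209.0.33 is the example used earlier in this document):

cmviewcl -v         # package should show STATUS up and STATE running
ping 15.209.0.33    # from another station in the same network
bdf                 # confirm all shared file systems are mounted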

1.4 Test Checklist

Five categories of test that will be performed are as follows:

a. Normal Bootup
b. Manual Package Switching Functionality
c. LAN Failure Tests
Heartbeat Failure
Data LAN Failure
d. System Failure Tests
e. Failures not affecting package
These are sanity checks to ensure that failure of the adoptive node in the cluster has no side effect on the primary node.


NORMAL BOOTUP SEQUENCE

1. Normal boot up
   Method : Power on or reboot both servers
   Expected : Cluster is up with node1 and node2 running, and the package is running on node1

MANUAL PACKAGE SWITCHING FUNCTIONALITY

1. Package halts successfully on node1
   Method : Run the 'cmhaltpkg -v <package>' command
   Expected : The application shuts down successfully and the package is halted properly

2. Package starts successfully on node2
   Method : Run the 'cmrunpkg -v -n node2 <package>' command
   Expected : The package starts up successfully on node2

3. Package halts successfully on node2
   Method : Run the 'cmhaltpkg -v <package>' command
   Expected : The application shuts down successfully and the package is halted properly

4. Package starts successfully on node1
   Method : Run the 'cmrunpkg -v -n node1 <package>' command
   Expected : The package starts up successfully on node1


LAN FAILURE TESTS

1. Heartbeat LAN failure on node1 (package is running on node1)
   Method : Pull out the lan0 cable on node1
   Expected : lan1 takes over as heartbeat LAN and the package remains running on node1

2. Primary data LAN failure on node1 (package is running on node1)
   Method : Pull out the lan1 cable on node1
   Expected : The secondary LAN, lan5, takes over as the active LAN and the package remains running on node1

3. Secondary data LAN failure on node1 (package is running on node1)
   Method : Pull out the lan5 cable on node1
   Expected : The primary LAN, lan1, takes over as the active LAN and the package remains running on node1

4. Total data LAN failure on node1 (package is running on node1)
   Method : Pull out the lan1 and lan5 cables from node1
   Expected : The package fails over to node2 if it is running as a node in the cluster; 50% chance of
   failing on the adoptive node, as the node unable to get the cluster lock panics and reboots

5. Heartbeat LAN failure on node2 (package is running on node2)
   Method : Pull out the lan0 cable on node2
   Expected : lan1 takes over as heartbeat LAN and the package remains running on node2

6. Primary data LAN failure on node2 (package is running on node2)
   Method : Pull out the lan1 cable on node2
   Expected : The secondary LAN, lan5, takes over as the active LAN and the package remains running on node2

7. Secondary data LAN failure on node2 (package is running on node2)
   Method : Pull out the lan5 cable on node2
   Expected : The primary LAN, lan1, takes over as the active LAN and the package remains running on node2

8. Total data LAN failure on node2 (package is running on node2)
   Method : Pull out the lan1 and lan5 cables from node2
   Expected : The package fails over to node1 if it is running as a node in the cluster; 50% chance of
   failing on the adoptive node, as the node unable to get the cluster lock panics and reboots



You may wish to extend the tests to cover the functionality of MC/ServiceGuard with regard to application monitoring
scripts and application failover.



SYSTEM FAILURE TESTS

1. node1 failure (package is running on node1)
   Method : Reset node1 (try both 'shutdown -ry' and 'reboot', or 'rs' from the console)
   Expected : The package fails over to node2 if it is running as a node in the cluster
   Check : Yes

2. node2 failure (package is running on node2)
   Method : Reset node2 (try both 'shutdown -ry' and 'reboot', or 'rs' from the console)
   Expected : The package fails over to node1 if it is running as a node in the cluster
   Check : Yes

FAILURES NOT AFFECTING PACKAGE

1. node2 failure (package is running on node1)
   Expected : The cluster reforms to a single-node cluster and the package continues to run on node1
   Check : Yes


MC/ServiceGuard Troubleshooting

Troubleshooting using log files

For troubleshooting, there are a few files that log problems experienced by MC/ServiceGuard:

a. /var/adm/syslog/syslog.log
b. /etc/cmcluster/<package dir>/<package name>.cntl.log

These files need to be maintained, as they will keep growing; left unchecked, this can ultimately fill up the / file system.

The package control log file will contain information regarding package
start/stop. Each package will have its own package control log file.

Note : Always use cmviewcl or cmviewcl -v to help see the status of your cluster.



Common Problems :



1. Configuration problems
- missing entries in /etc/services, /etc/inetd.conf
- .rhosts or cmclnodelist not configured
- syntax errors in config and control files



2. Warning : Missing cluster lock disk
- Repeated every hour by the cmcld daemon in syslog.log
- This problem occurs after something has changed affecting the cluster lock disk,
  e.g. the SCSI ID of the disk changed
- No issue at the moment, but when a tie-breaker situation occurs, nodes will not
  be able to detect the disk and all nodes may panic and reboot.

Solution :

1. Schedule downtime to halt the cluster (cmhaltcl)
2. Run vgchange -c n vgsh to remove the cluster lock volume group from the cluster.
3. Activate vgsh on the node where the cluster configuration ASCII file exists by
   running vgchange -a y vgsh, and do a cmapplyconf -v -C /etc/cmcluster/cluster.ascii.
   Answer yes to the change, then run vgchange -a n vgsh to deactivate the cluster
   lock volume group.
4. Start the cluster (cmruncl)



3. Warning : I/O error on cluster lock disk
- Repeated every hour by the cmcld daemon in syslog.log
- This problem usually occurs if something is wrong with one of the SPUs or
  controllers of the disk array connected to one of the nodes.
- If it happens on the primary node, it is possible that the application has
  already hung.
- No issue if it occurs on the adoptive node at the moment, but when a tie-breaker
  situation occurs, nodes will not be able to detect the disk and all nodes may
  panic and reboot.
- In other cases, the cluster lock disk itself could be faulty, and a hung
  situation with respect to the application and bdf will occur.

Solution :

- Schedule downtime and ask the CE to check the SPU or controller


4. Cluster failures
- Cluster cannot start :
  - missing entries in /etc/services, /etc/inetd.conf
  - .rhosts or cmclnodelist not configured
  - syntax errors in config and control files
- Could also be hardware, package induced, or an application problem. Again, check
  the log files.


5. Package failures
- Package unable to start at all, on any node
  - Check syslog and the package log file.
    Possibly a config problem, a control script problem, or an application script
    name change.
- Package cannot fail over to the adoptive node but can start on the primary node
  - Check syslog and the package log file.
    Possibly package switching or the node is disabled :
    cmmodpkg -e <package name> to enable package switching
    cmmodpkg -e -n <node name> <package name> to enable the node (allow the package
    to run on this node)
- Package cannot mount/umount filesystems (seen in the package log)
  - Package failed to start because of mount problems.
    Possibly the shared VG is not marked as cluster-aware or is still activated, a
    filesystem was manually mounted, or someone is accessing an unmounted directory.
    Unmount all filesystems, check who is accessing the directory and get that
    person to exit, run vgchange -c y vgsh to mark the VG as a cluster VG,
    deactivate it, and try starting again.
    Could also be a hard disk problem.
- Package failed to halt
  - An application process hung and could not be killed.
    Could also be a hard disk problem.



6. Service failures
- cmviewcl -v to see the status of all packages and their services.
- Trace from the package control file and syslog to see why it failed.
- Possibly a config problem, a control script problem, or an application script
  name change.



7. Node timeout
- The recommended node timeout value in the cluster config file is 5-8 seconds.
- If the default of 2 seconds is used, a system may panic and reboot due to a
  tie-breaker scenario caused by poor network performance.



8. GSP problems
- A known problem for L class servers (certain generations)
- Causes the system to panic, reboot, and fail the package over to the adoptive node
- A recommended patch / GSP firmware upgrade needs to be applied



9. LAN problems
- NMID problems (see "Adding/modifying LAN cards in the cluster" below)



10. Disk problems
- SCSI ID changed / conflict
perhaps due to a controller card factory default setting
Cannot bring up the cluster
Need a CE to change it accordingly.
- Cluster lock disk failed
If the lock disk is on RAID1 or RAID5, no problem
If the lock disk is an LVM mirror, do vgcfgrestore and vgsync to
recover the lock info, which is stored in the BBR table part of the disk
If there is no mirror, the cluster configuration needs to be re-applied

On-Going Upgrades/Changes to systems/cluster/package

- Pro-active Patch installation (node by node)
- Data Centre outages (shutdown entire cluster)
- Rolling upgrades (node by node)


Keychain Cluster - Shutdown and Startup Procedure
-------------------------------------------------
Last update: 19 June 2002 (SGP)

*******************************************************************
Please follow these steps whenever you need to arrange a shutdown
for sgpue036.sgp.hp.com & sgpue037.sgp.hp.com.
Special handling is required because of their MC/Serviceguard HA
environment.
*******************************************************************

Before you shutdown a node
--------------------------
1. Get agreement with application support on schedule, scope and
duration of shutdown.

2. Ensure both nodes in the cluster are up and running. If any node
is down or appears to be having problems, DO NOT proceed with
shutdown.

3. If shutting down a primary node, goto section titled "Shutting down
and restarting the primary node".

If shutting down a secondary node, goto section titled "Shutting down
and restarting the secondary node".

If shutting down the entire cluster, goto section titled "Shutting down
and restarting the MC/SG cluster".

If doing rolling upgrade, goto section titled "Doing a rolling upgrade".



Shutting down and restarting the primary node
------------------------------------------------
We assume primary node = sgpue036 and secondary node = sgpue037
in the following examples.

1. Before shutdown, make a note of all packages currently running
on each node.

sgpue036# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037

2. Halt primary node sgpue036

sgpue036# cmhaltnode -f -v sgpue036

Production packages will failover from sgpue036 to sgpue037. sgpue036
will cease to be a member of the active cluster.

3. Check package status on cluster

sgpue036# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running disabled sgpue037
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
>
> NODE STATUS STATE
> sgpue036 down halted

4. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the
following line:

AUTOSTART_CMCLD=0

5. Now we can proceed to shutdown (for PM, repair) or reboot
(for patching, kernel regen) sgpue036, eg:

sgpue036# /etc/shutdown -h 0
sgpue036# /etc/shutdown -r 0

6. When repair or reboot is over, sgpue036 should be booted up to
run level 3

sgpue036# who -r
. run-level 3 Jan 17 08:01 3 0 S

7. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the
following line:

AUTOSTART_CMCLD=1

8. Make sgpue036 join the cluster

sgpue036# cmrunnode -v sgpue036

9. Halt production packages on sgpue037

sgpue037# cmhaltpkg kci2stg

10. Restart production packages on sgpue036

sgpue036# cmrunpkg kci2stg

11. Re-enable package switching on production packages

sgpue036# cmmodpkg -e kci2stg

12. Check package status on cluster.
You should see the same listing as shown in Step 1 ie.

sgpue036# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037

13. Release sgpue036 to customers (notify by phone, email etc)


Shutting down and restarting the secondary node
---------------------------------------------

1. Before shutdown, make a note of all packages currently running
on each node

sgpue037# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037

2. Halt secondary node sgpue037

sgpue037# cmhaltnode -f -v sgpue037

Production packages will failover from sgpue037 to sgpue036. sgpue037
will cease to be a member of the active cluster.

3. Check package status on cluster

sgpue037# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
> kcdbstg up running disabled sgpue036
> kcnfs up running disabled sgpue036
>
> NODE STATUS STATE
> sgpue037 down halted

4. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the
following line:

AUTOSTART_CMCLD=0

5. Now we can proceed to shutdown (for PM, repair) or reboot
(for patching, kernel regen) sgpue037, eg:

sgpue037# /etc/shutdown -h 0
sgpue037# /etc/shutdown -r 0

6. When repair or reboot is over, sgpue037 should be booted up to
run level 3

sgpue037# who -r
. run-level 3 Jan 17 08:01 3 0 S

7. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the
following line:

AUTOSTART_CMCLD=1

8. Make sgpue037 join the cluster

sgpue037# cmrunnode -v sgpue037

9. Halt production packages on sgpue036

sgpue036# cmhaltpkg kcdbstg
sgpue036# cmhaltpkg kcnfs

10. Restart production packages on sgpue037

sgpue037# cmrunpkg kcdbstg
sgpue037# cmrunpkg kcnfs

11. Re-enable package switching on production packages

sgpue037# cmmodpkg -e kcdbstg
sgpue037# cmmodpkg -e kcnfs

12. Check package status on cluster.
You should see the same listing as shown in Step 1 ie.

sgpue037# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037

13. Release sgpue037 to customers (notify by phone, email etc)


Shutting down and restarting the MC/SG cluster
----------------------------------------------
We assume primary node = sgpue036 and secondary node = sgpue037 in
the following examples.

1. Log in to sgpue036 or sgpue037 as superuser and issue command to
halt cluster daemon

sgpue036# cmhaltcl -f -v

2. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include
the following line:

AUTOSTART_CMCLD=0

3. Proceed to shutdown each node

sgpue036# /etc/shutdown -h 0
sgpue037# /etc/shutdown -h 0

4. After planned activity is over, bootup each node to run level 3

sgpue036# who -r
sgpue037# who -r
. run-level 3 Jan 17 08:01 3 0 S

5. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include the
following line:

AUTOSTART_CMCLD=1

6. Startup the cluster daemon from any node

sgpue036# cmruncl -v

7. Check package status on cluster.
It should look exactly like the following

sgpue036# cmviewcl

> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037

8. Release machines to customers (notify by phone, email etc)


Doing a rolling upgrade
-----------------------
This is the most common scenario where we work on 1 node at a time
without bringing down the entire cluster. This ensures there is at
least 1 node available to run the application packages. The steps
are already detailed above. Either:

1. Shutting down and restarting the primary node
2. Shutting down and restarting the secondary node

or

1. Shutting down and restarting the secondary node
2. Shutting down and restarting the primary node


Note : This also applies to OS upgrades, eg. 10.20 to 11.00, where MC/SG
moves from ver 10.10 to 11.X.
Another method you may deploy is to build a separate cluster on a
separate machine with the latest OS, copy all the config files over, and
just swap the package IPs.





- Modifying the cluster
o Anything to do with the cluster config will require the cluster to be
re-applied (go through the cluster.conf file to see what the parameters
are), so downtime is needed to halt the cluster - except for
adding/removing nodes and packages, which can be done while the cluster
is still up and running.
Eg. Node timeout, heartbeat interval
Eg. Cluster name
Eg. Heartbeat IPs
Eg. No. of packages
Eg. Change of node names
Eg. Manual change / add of volume group
o Steps (a worked example follows this list)
Schedule downtime to halt the entire cluster
cmhaltcl -f to halt the cluster
After the cluster is halted, run cmgetconf -v -c <cluster name>
<outputfilename> (the cluster ascii file - name it something different)
to get the latest copy of the cluster config file.
Modify the outputfilename to make the intended changes to the
cluster.
cmcheckconf -v -C <outputfilename> (the cluster ascii file) to check
for any errors
cmapplyconf -v -C <outputfilename> (the cluster ascii file) if no
errors
Start the cluster
cmruncl
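
A minimal worked example of the above, assuming the knet cluster from the
earlier procedures and a hypothetical new file name:

# cmhaltcl -f -v
# cmgetconf -v -c knet /etc/cmcluster/knet.new.ascii
# vi /etc/cmcluster/knet.new.ascii        (eg. change NODE_TIMEOUT)
# cmcheckconf -v -C /etc/cmcluster/knet.new.ascii
# cmapplyconf -v -C /etc/cmcluster/knet.new.ascii
# cmruncl -v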




- Adding/removing nodes to the cluster
o Adding
o Online method (see the sketch after this section)
The heartbeat must be configured and the network ready
Can be done on any node (preferably the node where the original cluster
config file was placed)
cmquerycl [-w full] -v -C /etc/cmcluster/<outputfilename> -n
<primary node> -n <secondary node> -n <new node>
(Note : This will query the system configuration and generate
the new cluster config file, according to whatever name you specified as
the outputfilename.)
cmgetconf -v -c <cluster name> <outputfilename> (the cluster ascii file
- name it something different) to get the latest copy of the cluster
config file.
Check and combine the 2 configurations into one final config
file.
cmcheckconf -v -C <finalconfigfile> (the cluster ascii file) to check
for any errors
cmapplyconf -v -C <finalconfigfile> (the cluster ascii file) if no
errors
cmrunnode <node name> to join the cluster
Modify all package config files to include the new node if
desired. (Remember modifying the package config file will need a downtime
to apply the package config file)
o Offline method
Same, except performed with the cluster halted; when all the changes
are made, start the cluster

o Removing
o Online method
Modify all package config files to exclude the node if it is
configured in the package. (Remember modifying the package config file
will need a downtime to apply the package config file)
Halt all ACTIVE packages on the node: cmhaltpkg <package names>
Halt the node: cmhaltnode -v <node name>
cmgetconf -v -c <cluster name> <outputfilename> (the cluster ascii file
- name it something different) to get the latest copy of the cluster
config file.
Edit this cluster ascii file to remove the node details
cmcheckconf -v -C <outputfilename> (the cluster ascii file) to check
for any errors
cmapplyconf -v -C <outputfilename> (the cluster ascii file) if no
errors
Do whatever with the node: power down, redeploy
vgexport vgsh (off the removed node)
o Offline method
Same, except performed with the cluster halted; when all the changes
are made, start the cluster. Skip the halt package and halt node steps

Note : While the cluster is running, you can remove a node from the
cluster while the node is reachable ie connected to the LAN
(recommended). If the node is unreachable, it can still be removed from
the cluster, but only if there are no packages which specify the
unreachable node. If there are packages that depend on the unreachable
node, then it is best to halt the cluster and make the changes to the
package and cluster config files to remove the node from the cluster.
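
A sketch of the online add, assuming a hypothetical new node sgpue038
joining the knet cluster:

sgpue036# cmquerycl -w full -v -C /etc/cmcluster/knet.query.ascii \
            -n sgpue036 -n sgpue037 -n sgpue038
sgpue036# cmgetconf -v -c knet /etc/cmcluster/knet.cur.ascii
            (merge the two files into /etc/cmcluster/knet.new.ascii)
sgpue036# cmcheckconf -v -C /etc/cmcluster/knet.new.ascii
sgpue036# cmapplyconf -v -C /etc/cmcluster/knet.new.ascii
sgpue038# cmrunnode -v sgpue038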



- Adding/removing packages to the cluster
o Adding
o Online method (see the sketch after this section)
o Create packages on the primary node
mkdir /etc/cmcluster/<packagedir>
cmmakepkg -p /etc/cmcluster/<packagedir>/<packagename>.conf
Edit the configuration file
cmmakepkg -s /etc/cmcluster/<packagedir>/<packagename>.cntl
Edit the control script.

Note : If the package config and control files are special (e.g. NFS is
required), then do not run the cmmakepkg command; just use the
pre-defined scripts from the MC/SG NFS extension toolkit.
You may still need to make some adjustments. (Similar for the SAP
extension.)

Note : It is possible that packages do not use any volume groups.

ftp the control script file to the adoptive nodes
On the primary node
cmcheckconf -v -P <packagename>.conf (the package config file) to
check for any errors
cmapplyconf -v -P <packagename>.conf (the package config file)
if no errors
Start the package
cmrunpkg <package name>
cmmodpkg -e <package name> to re-enable package switching
Test the package on all adoptive nodes if possible
Note : Repeat the steps from "Create packages" to here if more packages
are required in the cluster.
o Offline method
Same, except performed with the cluster halted; when all the changes
are made, start the cluster

o Removing
o Online method
cmhaltpkg -v <package name>
cmdeleteconf -f -v -p <package name>
cmviewcl (to verify that it is no longer part of the cluster)

Note : The package config and control files are not removed ie deleted
from the system, just removed from the cluster.

o Offline method
Same, except performed with the cluster halted; when all the changes
are made, start the cluster
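
A sketch of the online package creation steps above, assuming a
hypothetical new package kcapp (directory and file names illustrative):

sgpue036# mkdir /etc/cmcluster/kcapp
sgpue036# cmmakepkg -p /etc/cmcluster/kcapp/kcapp.conf
sgpue036# cmmakepkg -s /etc/cmcluster/kcapp/kcapp.cntl
            (edit both files, then copy the control script over)
sgpue036# rcp -p /etc/cmcluster/kcapp/kcapp.cntl sgpue037:/etc/cmcluster/kcapp/
sgpue036# cmcheckconf -v -P /etc/cmcluster/kcapp/kcapp.conf
sgpue036# cmapplyconf -v -P /etc/cmcluster/kcapp/kcapp.conf
sgpue036# cmrunpkg kcapp
sgpue036# cmmodpkg -e kcapp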


- Modifying packages
o 2 parts: the package config file and the package control file
o Anything to do with modifying the package config file will require the
package to be re-applied (go through the package.conf file to see what
the parameters are)

Parameters that can be changed without stopping the package ie
while the cluster and package are up and running:
Eg. Failover policy, Failback policy
Eg. Add/Remove/modify Node names
Eg. Switching parameters

Steps (see the worked example at the end of this section)
cmgetconf -v -p <package name> <outputfilename> (the package config
file - name it something different) to get the latest copy of the
package config file.
Modify the outputfilename to make the intended changes to
the package config.
cmcheckconf -v -P <outputfilename> (the package config file) to
check for any errors
cmapplyconf -v -P <outputfilename> (the package config file)
if no errors


Parameters that must be changed by stopping the package ie the package
is down but the cluster is up and running:
Eg. Package name (if possible, change the hosting directory name as
well)
Eg. Change Run/Halt Scripts
Eg. Add/remove Service names
Eg. Add/remove Subnet


Steps
Schedule downtime to halt the package affected
cmhaltpkg <package name> to halt the package
After the package is halted, run cmgetconf -v -p <package name>
<outputfilename> (the package config file - name it something different)
to get the latest copy of the package config file.
Modify the outputfilename to make the intended changes to
the package config.
cmcheckconf -v -P <outputfilename> (the package config file) to
check for any errors
cmapplyconf -v -P <outputfilename> (the package config file)
if no errors
Start the package
o cmrunpkg <package name>
o cmmodpkg -e <package name> to re-enable package
switching

o Anything to do with modifying the package control file will NOT require
the package to be re-applied (go through the package.cntl script to see
what the parameters are), but downtime is needed to halt the package;
the cluster and the other packages in the cluster can still be running.
Eg. VG name and no. of VGs
Eg. LVs, names of mount points and their number
Eg. NFS mounts
Eg. Package IPs and subnet
Eg. Service names
Eg. Subnet
Eg. Application start/stop scripts
o Steps
Schedule downtime to halt the package affected
cmhaltpkg <package name> to halt the package
After the package is halted, modify the package control file to make
the intended changes.
Start the package
cmrunpkg <package name>
cmmodpkg -e <package name> to re-enable package switching
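
As a worked example of the online (no package downtime) config change
described above, assuming we switch the failback policy of the kci2stg
package (file name illustrative):

sgpue036# cmgetconf -v -p kci2stg /etc/cmcluster/kci2stg/kci2stg.new.conf
sgpue036# vi /etc/cmcluster/kci2stg/kci2stg.new.conf
            (eg. FAILBACK_POLICY MANUAL -> AUTOMATIC)
sgpue036# cmcheckconf -v -P /etc/cmcluster/kci2stg/kci2stg.new.conf
sgpue036# cmapplyconf -v -P /etc/cmcluster/kci2stg/kci2stg.new.conf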




- Adding/modifying LAN cards in the cluster
o If there is a need to add or upgrade/replace LAN cards in a clustered
environment, take note of the LAN ID (NMID)
o Usually adding will not cause an issue, unless the card will be part of
the cluster and it is already connected to the network; then you need to
reconfigure and re-apply the cluster config file.
o When upgrading/replacing LAN cards, the NMID may change, eg. upgrading
from a 10BT to a 100BT card, or replacing a 1-port LAN card with a 4-port
LAN card. In such a case, the cluster cannot start up, because the
cluster setting is different (the cluster is trying to find LAN1 as
configured in the cluster config file, but the NMID has already changed
to LAN2). We will need to re-form and re-apply the cluster before
running it.

o Steps
Method 1
Schedule downtime to halt the entire cluster
cmhaltcl -f to halt the cluster
After the cluster is halted, run
o cmquerycl [-w full] -v -C
/etc/cmcluster/<outputfilename> -n <primary node> -n <secondary node>
[-n other nodes in the cluster]

(Note : This will query the system configuration and generate
the new cluster config file, according to whatever name you specified as
the outputfilename. This should have automatically generated the cluster
config file with the new LAN card NMID.)
Run cmgetconf -v -c <cluster name> <outputfilename> (the cluster ascii
file - name it something different) to get the latest copy of the
cluster config file.
Check and combine the 2 configurations into one final config
file.
cmcheckconf -v -C <finalconfigfile> (the cluster ascii file) to check
for any errors
cmapplyconf -v -C <finalconfigfile> (the cluster ascii file) if no
errors
Start the cluster
cmruncl

Method 2 (not recommended)
Schedule downtime to halt the entire cluster
cmhaltcl -f to halt the cluster
Run cmgetconf -v -c <cluster name> <outputfilename> (the cluster ascii
file - name it something different) to get the latest copy of the
cluster config file.
Modify the outputfilename to make the intended changes to the
cluster (if you are aware of the change in the NMID of the LAN card).
cmcheckconf -v -C <outputfilename> (the cluster ascii file) to check
for any errors
cmapplyconf -v -C <outputfilename> (the cluster ascii file) if no
errors
Start the cluster
cmruncl




- Extending/Reducing logical volumes in the cluster
packages
o (ONLINE) No downtime required provided OnlineJFS is installed (a worked
example follows this list)
o Make the changes on the node where the logical volumes are mounted
o No action required on the adoptive nodes
o Extending :
lvextend -L <new size in MB> /dev/vgsh/shlvol
fsadm -F vxfs -b <new size in KB> /shname
o Reducing :
fsadm -F vxfs -b <new size in KB> /shname
lvreduce -L <new size in MB> /dev/vgsh/shlvol
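
For instance, to grow shlvol from 1 GB to 2 GB online (sizes illustrative;
lvextend takes the new size in MB, fsadm in KB):

# lvextend -L 2048 /dev/vgsh/shlvol       (new size 2048 MB)
# fsadm -F vxfs -b 2097152 /shname        (2048 MB = 2097152 KB)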





- LVMTAB needs to be updated when :
o Adding/removing
Disks
Logical volumes
Volume groups



- Adding/Removing new Physical volumes/Disks to the
volume group owned by a package (see the sketch after this section)
o Adding
o On the primary node (the node where the shared VG is activated, where
the package is running)
pvcreate the new disk
vgextend the new disk into the identified shared volume group
vgexport with the preview option to get the shared VG map file
vgexport -m vgsh.map -p -s -v vgsh
ftp the map file to the adoptive nodes
o On the adoptive nodes
vgexport the identified shared volume group off the system
vgexport vgsh
mkdir /dev/vgsh
mknod /dev/vgsh/group c 64 0x<nn>0000 (same minor number/VG ID)
vgimport the shared volume group into the system with the map file
vgimport -m vgsh.map -s -v vgsh
o Removing
Same steps, except use vgreduce (no pvcreate required)

o (Online) No downtime required, but it will be good to schedule one if
you want to test the failover.
o Do I need to re-apply the cluster and package? No.
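
A sketch of the add, with hypothetical device files and minor number:

On the primary node:
sgpue036# pvcreate /dev/rdsk/c5t0d0
sgpue036# vgextend vgsh /dev/dsk/c5t0d0
sgpue036# vgexport -m /tmp/vgsh.map -p -s -v vgsh
sgpue036# rcp /tmp/vgsh.map sgpue037:/tmp/vgsh.map

On the adoptive node:
sgpue037# vgexport vgsh
sgpue037# mkdir /dev/vgsh
sgpue037# mknod /dev/vgsh/group c 64 0x010000
sgpue037# vgimport -m /tmp/vgsh.map -s -v vgsh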




- Adding/Removing logical volumes in the volume
group owned by the package (see the sketch at the end of this section)
o Adding
o On the primary node (the node where the shared VG is activated, where
the package is running)
lvcreate -L <size> the new logical volume
newfs the new logical volume
mkdir /filesystem
Mount the filesystem manually and assign the correct ownerships and
permissions
umount the filesystem
vgexport with the preview option to get the shared VG map file
vgexport -m vgsh.map -p -s -v vgsh
ftp the map file to the adoptive nodes
o On the adoptive nodes
vgexport the identified shared volume group off the system
vgexport vgsh
mkdir /dev/vgsh
mknod /dev/vgsh/group c 64 0x<nn>0000 (same minor number/VG ID)
mkdir /filesystem
vgimport the shared volume group into the system with the map file
vgimport -m vgsh.map -s -v vgsh
o Schedule time to halt the package (only the package affected).
cmhaltpkg <package name>
o After the package is halted, modify the package control script (.cntl)
to include the new filesystem on all nodes.
o Start the package
cmrunpkg <package name>
cmmodpkg -e <package name> to re-enable package switching
o Verify that the filesystem is mounted and accessible.
o Test on all adoptive nodes.

o Removing
Schedule downtime to halt the package
cmhaltpkg <package name> - on the primary node
vgchange -c n vgsh - to unmark the VG that belongs to the
package from the cluster
vgchange -a y vgsh to activate the vg
lvremove the logical volume
vgchange -a n vgsh to deactivate the vg
vgchange -c y vgsh to mark the vg as part of the cluster
Modify the package control files on all nodes to exclude this
LV and filesystem
cmrunpkg <package name> - to restart the package
cmmodpkg -e <package name> to re-enable package switching
vgexport the map file on the primary and ftp it to all adoptive nodes
vgexport, then vgimport with the map file on the adoptive nodes
Test on all adoptive nodes

o Offline for the package affected, but the cluster can be up and
running, and other packages can be up and running.
o Do I need to re-apply the cluster/package (changing the package control
file does not need a re-application)? No.
o Can I create an LV/filesystem that is not mounted by my package but
belongs to the same volume group ie mount it via /etc/fstab ? No, this
will cause a problem: since the VG needs to be activated/deactivated,
the package may fail.
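
Referring back to the Adding steps above, a sketch of the primary-node
commands with hypothetical LV, size and mount point names:

sgpue036# lvcreate -L 512 -n shlvol2 vgsh
sgpue036# newfs -F vxfs /dev/vgsh/rshlvol2
sgpue036# mkdir /sharedfs2
sgpue036# mount /dev/vgsh/shlvol2 /sharedfs2
sgpue036# chown bin:bin /sharedfs2        (set the correct ownerships)
sgpue036# umount /sharedfs2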



- Adding new Volume groups to the cluster packages (a sketch follows at
the end of this section)
o Adding
o On the primary node (the node where the shared VG is activated, where
the package is running)
pvcreate the new disk
mkdir /dev/vgsh (the new shared vg)
mknod /dev/vgsh/group c 64 0x<nn>0000
vgcreate the new shared volume group
Create the necessary lvols and filesystems or raw devices for the VG
Mount the filesystems and change permissions and ownerships
accordingly
vgexport with the preview option to get the shared VG map file
vgexport -m vgsh.map -p -s -v vgsh
ftp the map file to the adoptive nodes
o On the adoptive nodes
vgexport the identified shared volume group off the system
vgexport vgsh
mkdir /dev/vgsh
mknod /dev/vgsh/group c 64 0x<nn>0000 (same minor number/VG ID)
vgimport the shared volume group into the system with the map file
vgimport -m vgsh.map -s -v vgsh
mkdir the /filesystems for the logical volumes
o On the primary node,
vgchange -c y vgsh to mark the VG as part of the cluster
Umount all filesystems in this new shared VG and deactivate it:
vgchange -a n vgsh
Check /var/adm/syslog/syslog.log to see if this vg has been
successfully marked in the cluster
cmgetconf -v -c <cluster name> <outputfilename> (name it something
different) to see that it has been entered into the cluster config file.
If not, we will need to bring down the entire cluster, check and
re-apply the cluster.
o Method 1 (do this if marked successfully)
o Schedule time to halt the package (only the package affected).
cmhaltpkg <package name>
o After the package is halted, modify the package control script (.cntl)
to include the new filesystem and Volume Group on all nodes.
o Start the package
cmrunpkg <package name>
cmmodpkg -e <package name> to re-enable package switching
o Verify that the VG is activated and the filesystems are mounted and
accessible.
o Test on all adoptive nodes.

o Method 2 (do this if not marked successfully)
o Schedule time to halt the entire cluster.
cmhaltcl
o After the cluster is halted, run cmgetconf -v -c <cluster name>
<outputfilename> (the cluster ascii file - name it something different)
to see whether it has been entered into the cluster config file.
o If not entered, manually type the new shared VG into the
new cluster outputfilename.
o cmcheckconf -v -C <outputfilename> (the cluster ascii file) to check
for any errors
o cmapplyconf -v -C <outputfilename> (the cluster ascii file) if no
errors
o Modify the package control script (.cntl) to include the new
filesystem and Volume Group on all nodes.
o Start the cluster
cmruncl
o Verify that the VG is activated and the filesystems are mounted and
accessible.
o Test that the VG can be mounted on all adoptive nodes.

o Removing
Schedule downtime to halt the package
cmhaltpkg <package name> - on the primary node
vgchange -c n vgsh - to unmark the VG that belongs to the
package from the cluster
Modify the package control files on all nodes to exclude the LV,
filesystem and VG name
cmrunpkg <package name> - to restart the package
cmmodpkg -e <package name> to re-enable package switching
(stop here if you want to keep the volume group on the systems, but
remove it from the cluster)
vgexport vgsh on all nodes to remove it from the systems
Test on all adoptive nodes

o Offline for the package affected, but the cluster can be up and
running, and other packages can be up and running.
o Do I need to re-apply the cluster/package (changing the package control
file does not need a re-application)? No, unless the cluster marking
does not work as described above.
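
A sketch of the cluster-marking check on the primary node, assuming a
hypothetical new shared VG vgsh2 in the knet cluster:

sgpue036# vgchange -c y vgsh2
sgpue036# umount /sharedfs2
sgpue036# vgchange -a n vgsh2
sgpue036# grep -i vgsh2 /var/adm/syslog/syslog.log
sgpue036# cmgetconf -v -c knet /tmp/knet.chk.ascii
sgpue036# grep -i vgsh2 /tmp/knet.chk.ascii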



- Removing the cluster
o cmhaltcl -f
o cmdeleteconf -f -v -c <cluster name>
o cmviewcl (will display an error message, since the cluster no longer
exists)




MC/SG End of support dates ..
