
WebSphere Application Server V6

High Availability Manager Overview

David Currie, IT Specialist


EMEA Software Lab Services

© 2006 IBM Corporation



Agenda

 Introduction
 Core groups
 Core group bridge service
 High availability groups and policies
 Transaction service
 WebSphere XD partitioning facility
 Service Integration Bus
 Questions

2 High Availability Manager Overview © 2003 IBM Corporation



What is the high availability manager?

 Component of WebSphere Application Server V6 providing
high availability services to other WAS components
 Runs in every WAS process
 Provides three key capabilities:
– High availability of singleton services
– Bulletin board for exchanging state data between processes
– Data Replication Service (DRS) memory-to-memory replication


Core groups

 Set of processes representing a high availability domain in
which failover and replication take place
 Each process must be a member of exactly one core group
 All cluster members must belong to the same group
 Each group must contain at least one node agent or
deployment manager
 Processes are added to DefaultCoreGroup when created
 Members of a group must have full IP visibility and
bidirectional communication to all other members


Core group coordinator

 Every core group has a coordinator that manages the
failover of highly available singleton services and distributes
state data to interested members
 Coordinator uses CPU and heap to perform these tasks
 Coordinator election occurs whenever the view changes (a
member stops or starts) and this consumes resources
 Specify stable servers with spare capacity as preferred
coordinator servers
 Multiple coordinators may be necessary if resource usage is
excessive


Core group transport options

 Channel framework transport
– Default is to use the distribution and consistency services (DCS)
transport chain
– DCS_SECURE transport chain uses SSL for encryption
– LTPA token can be used to authenticate incoming requests
 Unicast transport
– Standard network connection avoiding channel framework
overhead
– LTPA token for authentication but no SSL
 Multicast transport
– Uses UDP but still requires TCP/IP for failure detection
protocol


Core group discovery protocol

 Establishes network connectivity with other
members of the core group on startup
 Retries unavailable connections periodically
– Default is every 30s (WAS 6.0.2) or 15s (WAS 6.0/6.0.1)
– Configurable via core group custom property
• IBM_CS_UNICAST_DISCOVERY_INTERVAL_SECS
 On successful connection DCSV1032I event logged
and view synchrony protocol starts


Core group failure detection protocol

 Monitors the connections that the discovery protocol
establishes
 Failure of core group member detected by:
– Inbound or outbound sockets closing
– Active heartbeating, configured via core group custom
properties
• IBM_CS_FD_PERIOD_SECS (default 30s)
• IBM_CS_FD_CONSECUTIVE_MISSED (default 6)
 Failed members are reported to the discovery protocol and
view synchrony protocol
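The two heartbeating properties above combine into a worst-case detection time for a member that hangs without closing its sockets. A minimal arithmetic sketch, assuming the defaults shown on this slide:

```python
# Worst-case detection time for a hung member via active heartbeating.
# Property names and defaults are taken from the slide; the calculation
# itself is plain arithmetic, not a WebSphere API.

def detection_time(period_secs: int = 30, consecutive_missed: int = 6) -> int:
    """Seconds before a member that stops responding is declared failed."""
    return period_secs * consecutive_missed

print(detection_time())        # defaults: 30s period * 6 missed beats = 180
print(detection_time(10, 3))   # a hypothetical, more aggressive tuning = 30
```

Lowering either value detects failures faster at the cost of more heartbeat traffic and a higher risk of false positives on a loaded network.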


Core group view synchrony protocol

 A view consists of a set of connected core group members
communicating over this protocol
 View change occurs when a member is discovered or fails
– Activities relating to the old view must complete and as a result
temporary spikes in CPU and network usage may occur
– One of the core group members is elected to send its current
configuration to all the other members
– Inconsistencies in HA policy or coordinator configuration are
tolerated but those in core group membership are not


Core group scaling


 System resource usage does not scale linearly with core group size
 Resource requirements of view synchrony protocol dependent on:
– Number of applications running
– Type of applications running
– High availability manager services that are used
 View changes use a lot of system resource:
– Each member communicates its state to other members
– All messages sent or received must be acknowledged
– May cause changes in routing table or singleton services
 In core groups that are too large, degenerate network timing conditions can cause installation of
the new view to fail; recovery is CPU intensive and may lead to paging, causing further failures
 M x (N - M) discovery messages each period (M = running members, N = group size)
 M x (M - 1) heartbeating messages
 Amount of CPU utilization by other components e.g. WLM and ODR is also linked to core group size
 Recommendation is for maximum 50 members per core group
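The two message-count formulas above can be sketched directly; the numbers below simply plug illustrative values into the slide's formulas:

```python
# Per-period protocol message counts from the formulas on this slide:
#   discovery:  M * (N - M)  (running members probe stopped ones)
#   heartbeat:  M * (M - 1)  (each running member beats every other)

def discovery_messages(running: int, size: int) -> int:
    return running * (size - running)

def heartbeat_messages(running: int) -> int:
    return running * (running - 1)

# At the recommended ceiling of 50 members, all running:
print(heartbeat_messages(50))        # 2450 heartbeats per period
# With 40 of 50 members running, the rest being probed:
print(discovery_messages(40, 50))    # 400 discovery probes per period
```

The quadratic growth of the heartbeat term is one reason resource usage climbs so steeply with core group size.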


Core group configuration

 Core group configuration is stored at cell scope
– List of core group members
– HA policies for the core group
– Core group coordinator configuration
– Core group transport configuration
 Each process also has its own configuration
– Whether HA manager is enabled
– Transport buffer size
– Name of the core group to which the server belongs
– How frequently the HA manager checks the health of singletons


Why have multiple core groups in a cell?

 One or more firewalls within a cell – a core group
cannot contain members from multiple firewall
protection domains
 Resource usage increases exponentially with core
group size


Core group bridge service

 Core groups may need to communicate to share availability
information
 Access point is a collection of core groups that
communicate
 Each access point has one or more bridge interfaces defined
as a node, server and transport chain
 A server hosting a bridge interface is a core group bridge
server
 If communicating with a core group in another cell a peer
access point is defined which may specify one or more peer
ports or, if not directly accessible, a peer proxy port

Communication between core groups in a cell

[Diagram only]

Communication between core groups across cells

[Diagram only]

Communication between core groups across networks

[Diagram only]

Core group bridge service custom properties

 Custom properties
– CGB_ENABLE_602_FEATURES enables core group
bridge servers to be added without restarting other
servers and allows discovery of peer ports
– FW_PASSIVE_MEMBER can be used if a firewall is
configured to listen only
– IBM_CS_LS_DATASTACK_MEG can be used to
increase the data stack size if you are seeing warning
DCSV2005W


High availability groups

 Created dynamically when an application server component
requests to join a group
 Other instances of the component across multiple
processes may join the group
 HA group has a name made up of name-value pairs
– Company=IBM,ComponentName=TM,policy=DefaultNoQuorumOneOfNPolicy
 Each group member is either idle, active, or disabled
 Scoped by a core group


High availability group policies

 Statically defined HA policies govern which members of an HA
group are active
 Policy selected for HA group based on having the most
name-value pairs matching the group name
 At least one policy must match and only one policy may have
the highest number of matches
 Policy rules applied when
– Member joins or leaves HA group
– State of a member changes e.g. from idle to disabled
 Policy changes are picked up dynamically
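The matching rule above — every criterion of a policy must appear in the group name, the policy with the most matching pairs wins, and a tie is a configuration error — can be sketched as follows. The group and policy names here are illustrative, not taken from a real configuration:

```python
# Sketch of HA policy selection as described on this slide.
# Group names are comma-separated name=value pairs.

def parse_group_name(name: str) -> dict:
    return dict(pair.split("=", 1) for pair in name.split(","))

def select_policy(group_name: str, policies: dict) -> str:
    group = parse_group_name(group_name)
    # A policy matches when all of its criteria appear in the group name.
    scores = {
        policy: len(criteria)
        for policy, criteria in policies.items()
        if all(group.get(k) == v for k, v in criteria.items())
    }
    if not scores:
        raise ValueError("no policy matches the group")
    best = max(scores.values())
    winners = [p for p, s in scores.items() if s == best]
    if len(winners) > 1:
        raise ValueError("ambiguous match: " + ", ".join(winners))
    return winners[0]

group = "IBM_hc=MyCluster,type=WSAF_SIB,WSAF_SIB_MESSAGING_ENGINE=ME0"
policies = {
    "Default SIB Policy": {"type": "WSAF_SIB"},
    "ME0 Policy": {"type": "WSAF_SIB", "WSAF_SIB_MESSAGING_ENGINE": "ME0"},
}
print(select_policy(group, policies))  # → ME0 Policy (two matches beat one)
```

This is why the later SIB slides insist that per-engine policies match on both type and engine name: the extra criterion is what breaks the tie with the default policy.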


High availability policy settings

 Policy type
– All active
– M of N active
– No operation
– One of N
– Static
 Preferred servers (ordered list)
 Fail back – always active on the most preferred server available
 Preferred servers only
 Is alive timer used to determine failure of process or component
 Quorum setting only activates members once a majority are available
 Dynamic policy information may also be contained in the group name
e.g. GN_PS contains the names of the preferred servers

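The interaction of the ordered preferred-server list, "Fail back", and "Preferred servers only" amounts to a simple election, sketched below with made-up server names:

```python
# Sketch of One of N activation with an ordered preferred-server list.
# With "Fail back" enabled, this election simply reruns whenever a more
# preferred server comes online, moving the active member back to it.

def choose_active(online: list, preferred: list, preferred_only: bool = False):
    """Return the member to activate, or None if nobody qualifies."""
    for server in preferred:            # ordered: most preferred first
        if server in online:
            return server
    if preferred_only:
        return None                     # "Preferred servers only" is set
    return online[0] if online else None

preferred = ["server1", "server2"]
print(choose_active(["server2", "server3"], preferred))   # → server2
print(choose_active(["server3"], preferred))              # → server3
print(choose_active(["server3"], preferred, True))        # → None
```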

Policy modification

 Do not delete the IBM-provided policies
 Do not try to change the type of an existing policy
 Ensure that the components that a policy will
match support that policy type
 Ensure that match criteria are not ambiguous by
adding criteria or deleting unwanted policies


Viewing HA group information

 The current HA groups can be viewed on the Runtime tab of a core
group
 Show servers then displays the HA groups for each server
 Show groups displays the HA groups for the entire core group
 Viewing group shows members and their current status


Disabling HA manager

 CPU, heap, and socket usage increases
exponentially with core group size
 HA manager can be disabled on a per-process
basis if its functionality is not required
 Do not disable the HA manager on the deployment
manager or node agent unless it is disabled on all
servers in that core group


Transaction service high availability


[Diagram: a cluster of server1, server2, and server3, each with its own transaction log]

Transaction service default policy and HA groups

 Default Clustered TM Policy is a One of N policy with match
criteria of type=WAS_TRANSACTIONS and failback enabled
 By default each server joins just one group containing its
name as a dynamic preferred server


Enabling transaction service high availability

 Select the “High availability for
persistent services” option on
the cluster definition
 Configure recovery log
location for transaction service
on each server with path
accessible from every server
 Each server will then also join
the group for every other
server in the cluster


File locking

 By default file locking is used
to prevent recovery in the event of
system overload or network
partitioning
 NFSv3 does not release the file
locks held by a failed host
 If these conditions can be
avoided then disable file
locking


Manual peer recovery

 If using NFSv3 and system
overload or network
partitioning may occur,
consider manual peer recovery
 Define a static HA policy for
each server in the cluster with
match criteria of
type=WAS_TRANSACTIONS
and
GN_PS=cellname\nodename\servername
 Add the corresponding server
to the static group
 On failure, add the peer server
to the static group
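The match criteria above can be sketched as a small helper for composing the GN_PS value; the cell, node, and server names below are placeholders, and applying the criteria is still done through the admin console or wsadmin, not shown here:

```python
# Sketch of composing the match criteria for the static per-server
# transaction policy described on this slide. The GN_PS value joins
# cell, node, and server names with backslashes.

def static_tm_criteria(cell: str, node: str, server: str) -> dict:
    return {
        "type": "WAS_TRANSACTIONS",
        "GN_PS": "\\".join([cell, node, server]),
    }

criteria = static_tm_criteria("myCell", "myNode", "server1")
print(criteria["GN_PS"])   # → myCell\myNode\server1
```

On failure of server1, adding a peer server to this static group is what triggers recovery of server1's logs on that peer.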


WebSphere XD partitioning facility

 Enables incoming requests to be distributed across
application servers dependent on the content of the request
 Data accessed by requests can then be distributed across
those servers and held in memory

[Diagram: an ODR or EJB stub routing requests into a cell, with partition1 and partition2 hosted on server1 and partition3 and partition4 on server2]

WPF HA groups and policies

[Diagram only]

Service Integration Bus high availability

 Adding a cluster as a member of a service integration bus
results in the creation of a single messaging engine
 The messaging engine will start in the first available server
and, if that becomes unavailable, fail over to another

[Diagrams: messaging engine ME0 with queue Q1 running on server1; after server1 fails, ME0 and Q1 restart on server2]

SIB default policy and HA groups

 Default SIB Policy is a One of N policy with match criteria of
type=WSAF_SIB
 Each server joins a group containing the name of the messaging
engine
 HA manager activates the messaging engine on the first available
server


Service Integration Bus workload management

 Additional messaging engines can be added to provide
workload management capabilities
 Destinations associated with the cluster bus member are
partitioned across the messaging engines and messages
distributed across those partitions
 HA policies must be configured, or else all the messaging
engines will start on the first available server!
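The partitioning described above can be sketched as a simple round-robin assignment of messages to engines. The real distribution algorithm is not specified on this slide, so round-robin is an assumption made purely for illustration:

```python
# Sketch (assumed round-robin) of spreading messages for a partitioned
# destination across the messaging engines of a cluster bus member.
from itertools import cycle

def assign(messages, engines):
    rr = cycle(engines)
    return [(m, next(rr)) for m in messages]

print(assign(["m1", "m2", "m3", "m4"], ["ME0", "ME1"]))
# → [('m1', 'ME0'), ('m2', 'ME1'), ('m3', 'ME0'), ('m4', 'ME1')]
```

Whatever the actual algorithm, the key consequence is the same: a consumer attached to one partition sees only the messages routed to that partition, so message ordering across the destination as a whole is no longer guaranteed.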

[Diagram: a cluster with ME0 on server1 and ME1 on server2, destination Q1 partitioned across both]

SIB HA policies for workload management

 If failover isn’t required, create a static policy for each messaging
engine tying it to a particular server
 If failover is required, create a One of N policy for each messaging
engine specifying a preferred server and failback
 Policy must match messaging engine name and type to avoid
ambiguity with default policy


SIB HA policies for large clusters

 Specify an ordered list of
preferred servers for each
messaging engine, rotating
across a subset of the
cluster members
 Select Preferred servers
only
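The rotation suggested above can be sketched as follows, with made-up server and engine names: each engine gets an ordered preferred list drawn from a small window of cluster members, shifted one position per engine.

```python
# Sketch of rotating preferred-server lists across a subset of a large
# cluster, as suggested on this slide. Names are illustrative.

def rotated_preferences(servers, engines, window=3):
    prefs = {}
    for i, engine in enumerate(engines):
        # Each engine prefers a window of servers starting at offset i.
        prefs[engine] = [servers[(i + j) % len(servers)] for j in range(window)]
    return prefs

servers = ["s1", "s2", "s3", "s4"]
print(rotated_preferences(servers, ["ME0", "ME1"], window=2))
# → {'ME0': ['s1', 's2'], 'ME1': ['s2', 's3']}
```

Combined with "Preferred servers only", this keeps each engine's failover candidates to a handful of servers rather than the whole cluster, limiting view-change churn.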


Summary

 Core groups are static collections of communicating servers
 Bridge servers can be used to share information between core
groups
 HA groups are created dynamically by WebSphere components
 HA policies define where those components should be active
 Examples of where you may come into contact with the HA manager
include:
– Transaction Service
– WebSphere XD partitioning facility
– Service Integration Bus


References
 Transactional High Availability and Deployment Considerations in WAS V6
– http://www.ibm.com/developerworks/websphere/techjournal/0504_beaven/0504_beaven.html
 WebSphere Application Server V6 System Management and Configuration Handbook
– http://www.redbooks.ibm.com/abstracts/sg246451.html
 WebSphere Application Server Network Deployment V6: High availability solutions
– http://www.redbooks.ibm.com/abstracts/sg246688.html
 WebSphere Application Server V6 Scalability and Performance Handbook
– http://www.redbooks.ibm.com/abstracts/sg246392.html
 WAS V6 InfoCenter: High availability and workload sharing
– http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/topic/com.ibm.websphere.pmc.nd.doc/tasks/tjt9999_.html
 WAS V6 InfoCenter: Setting up a high availability environment
– http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/topic/com.ibm.websphere.nd.doc/info/ae/ae/trun_ha_environment.html
 WebSphere XD InfoCenter: HA manager and the partitioning facility
– http://publib.boulder.ibm.com/infocenter/wxdinfo/v6r0/topic/com.ibm.websphere.xd.doc/info/WPF51/cwpfha_pdf.html
