How a Server Cluster Works
Updated: March 28, 2003
In this section
• Server Cluster Architecture
• Server Cluster API
• Server Cluster Processes
• Related Information
A server cluster is a collection of servers, called nodes, that communicate with each other to make
a set of services highly available to clients. Server clusters are based on one of the two clustering
technologies in the Microsoft Windows Server 2003 operating systems. The other clustering
technology is Network Load Balancing. Server clusters are designed for applications that have
long-running in-memory state or frequently updated data. Typical uses for server clusters include
file servers, print servers, database servers, and messaging servers.
This section provides technical background information about how the components within a
server cluster work.
Server Cluster Architecture
The most basic type of cluster is a two-node cluster with a single quorum device. For a definition
of a single quorum device, see “What Is a Server Cluster? [ http://technet.microsoft.com/en-
us/library/cc785197(WS.10).aspx ] .” The following figure illustrates the basic elements of a
server cluster, including nodes, resource groups, and the single quorum device, that is, the cluster
storage.
Basic Elements of a Two-Node Cluster with Single Quorum Device
Applications and services are configured as resources on the cluster and are grouped into
resource groups. Resources in a resource group work together and fail over together when
failover is necessary. When you configure each resource group to include not only the elements
needed for the application or service but also the associated network name and IP address, then
that collection of resources runs as if it were a separate server on the network. When a resource
group is configured this way, clients can consistently get access to the application using the same
network name, regardless of which node the application is running on.
The preceding figure shows one resource group per node. However, each node can have
multiple resource groups. Within each resource group, resources can have specific dependencies.
Dependencies are relationships between resources that indicate which resources need to come
online before another resource can come online. When dependencies are configured, the Cluster
service can bring resources online or take them offline in the correct order during failover.
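Dependencies can also be configured programmatically through the Cluster API. The following minimal sketch (the resource names "SQL Network Name" and "SQL IP Address" are hypothetical, and error handling is abbreviated) makes a Network Name resource depend on an IP Address resource so that the Cluster service brings them online in that order:

// Sketch: add a dependency between two existing resources using the
// Cluster API (clusapi.h, link with clusapi.lib).
#include <windows.h>
#include <clusapi.h>
#include <stdio.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);              // NULL = the local cluster
    if (hCluster == NULL)
        return 1;

    HRESOURCE hNetName = OpenClusterResource(hCluster, L"SQL Network Name");
    HRESOURCE hIpAddr  = OpenClusterResource(hCluster, L"SQL IP Address");
    if (hNetName != NULL && hIpAddr != NULL)
    {
        // The network name cannot come online until the IP address is online.
        DWORD status = AddClusterResourceDependency(hNetName, hIpAddr);
        wprintf(L"AddClusterResourceDependency returned %lu\n", status);
    }

    if (hIpAddr  != NULL) CloseClusterResource(hIpAddr);
    if (hNetName != NULL) CloseClusterResource(hNetName);
    CloseCluster(hCluster);
    return 0;
}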
The following figure shows two nodes with several resource groups in which some typical
dependencies have been configured between resources. The figure shows that resource groups
(not resources) are the unit of failover.
Resource Dependencies Configured Within Resource Groups
Cluster Service Component Diagrams and Descriptions
The Cluster service runs on each node of a server cluster and controls all aspects of server cluster
operation. The Cluster service includes multiple software components that work together. These
components perform monitoring, maintain consistency, and smoothly transfer resources from one
node to another.
Diagrams and descriptions of the following components are grouped together because the
components work so closely together:
• Database Manager (for the cluster configuration database)
• Node Manager (working with Membership Manager)
• Failover Manager
• Global Update Manager
Separate diagrams and descriptions are provided of the following components, which are used in
specific situations or for specific types of applications:
• Checkpoint Manager
• Log Manager (quorum logging)
• Event Log Replication Manager
• Backup and Restore capabilities in Failover Manager
Diagrams of Database Manager, Node Manager, Failover Manager, Global Update
Manager, and Resource Monitors
The following figure focuses on the information that is communicated between Database
Manager, Node Manager, and Failover Manager. The figure also shows Global Update Manager,
which supports the other three managers by coordinating updates on other nodes in the cluster.
These four components work together to make sure that all nodes maintain a consistent view of
the cluster (with each node of the cluster maintaining the same view of the state of the member
nodes as the others) and that resource groups can be failed over smoothly when needed.
Basic Cluster Components: Database Manager, Node Manager, and Failover Manager

The following figure shows a Resource Monitor and resource dynamic-link library (DLL)
working with Database Manager, Node Manager, and Failover Manager. Resource Monitors and
resource DLLs support applications that are cluster-aware, that is, applications designed to work
in a coordinated way with cluster components. The resource DLL for each such application is
responsible for monitoring and controlling that application. For example, the resource DLL saves
and retrieves application properties in the cluster database, brings the resource online and takes it
offline, and checks the health of the resource. When failover is necessary, the resource DLL
works with a Resource Monitor and Failover Manager to ensure that the failover happens
smoothly.
Resource Monitor and Resource DLL with a Cluster-Aware Application

Descriptions of Database Manager, Node Manager, Failover Manager, Global Update Manager, and Resource Monitors
The following descriptions provide details about the components shown in the preceding
diagrams.
Database Manager
Database Manager runs on each node and maintains a local copy of the cluster configuration
database, which contains information about all of the physical and logical items in a cluster.
These items include the cluster itself, cluster node membership, resource groups, resource types,
and descriptions of specific resources, such as disks and IP addresses. Database Manager uses
the Global Update Manager to replicate all changes to the other nodes in the cluster. In this way,
consistent configuration information is maintained across the cluster even when conditions change, for example, when a node fails and the administrator changes the cluster configuration before that node returns to service.
Database Manager also provides an interface through which other Cluster service components,
such as Failover Manager and Node Manager, can store changes in the cluster configuration
database. The interface for making such changes is similar to the interface for making changes to
the registry through the Windows application programming interface (API). The key difference is
that changes received by Database Manager are replicated through Global Update Manager to all
nodes in the cluster.
Database Manager functions used by other components
Some Database Manager functions are exposed through the cluster API. The primary purpose for
exposing Database Manager functions is to allow custom resource DLLs to save private
properties to the cluster database when this is useful for a particular clustered application. (A
private property for a resource is a property that applies to that resource type but not other
resource types; for example, the SubnetMask property applies for an IP Address resource but not
for other resource types.) Database Manager functions are also used to query the cluster
database.
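For illustration only, the sketch below uses the cluster registry functions exposed through the Cluster API to write a private value under a resource's Parameters key in the cluster database. The resource name "MyApp" and the value "DataPath" are hypothetical assumptions, and a real application would more typically set private properties from its resource DLL or through the CLUSCTL_RESOURCE_SET_PRIVATE_PROPERTIES control code:

// Sketch: store a private value for a resource in the cluster database,
// where Database Manager and Global Update Manager replicate it to all nodes.
#include <windows.h>
#include <clusapi.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);                   // local cluster
    if (hCluster == NULL)
        return 1;

    HRESOURCE hResource = OpenClusterResource(hCluster, L"MyApp");
    if (hResource != NULL)
    {
        // GetClusterResourceKey opens the resource's key in the cluster
        // database; private properties conventionally live under "Parameters".
        HKEY hResKey = GetClusterResourceKey(hResource, KEY_READ | KEY_WRITE);
        HKEY hParams = NULL;
        if (hResKey != NULL &&
            ClusterRegOpenKey(hResKey, L"Parameters", KEY_READ | KEY_WRITE,
                              &hParams) == ERROR_SUCCESS)
        {
            const WCHAR value[] = L"D:\\AppData";             // example data only
            ClusterRegSetValue(hParams, L"DataPath", REG_SZ,
                               (const BYTE *)value, sizeof(value));
            ClusterRegCloseKey(hParams);
        }
        if (hResKey != NULL) ClusterRegCloseKey(hResKey);
        CloseClusterResource(hResource);
    }
    CloseCluster(hCluster);
    return 0;
}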
Node Manager
Node Manager runs on each node and maintains a local list of nodes, networks, and network
interfaces in the cluster. Through regular communication between nodes, Node Manager ensures
that all nodes in the cluster have the same list of functional nodes.
Node Manager uses the information in the cluster configuration database to determine which
nodes have been added to the cluster or evicted from the cluster. Each instance of Node Manager
also monitors the other nodes to detect node failure. It does this by exchanging messages, called heartbeats, with each node on every available network. If one node detects a
communication failure with another node, it broadcasts a message to the entire cluster, causing
all nodes that receive the message to verify their list of functional nodes in the cluster. This is
called a regroup event.
Node Manager also contributes to the process of a node joining a cluster. At that time, on the
node that is joining, Node Manager establishes authenticated communication (authenticated RPC
bindings) between itself and the Node Manager component on each of the currently active nodes.
Note
• A down node is different from a node that has been evicted from the cluster. When you
evict a node from the cluster, it is removed from Node Manager’s list of potential cluster
nodes. A down node remains on the list of potential cluster nodes even while it is down;
when the node and the network it requires are functioning again, the node joins the
cluster. An evicted node, however, can become part of the cluster only after you use
Cluster Administrator or Cluster.exe to add the node back to the cluster.
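For illustration, the sketch below evicts a node through the Cluster API; Cluster Administrator and Cluster.exe drive the same operation through this API. The node name "NODE2" is hypothetical and error handling is abbreviated:

// Sketch: check a node's state and evict it from the cluster.
#include <windows.h>
#include <clusapi.h>
#include <stdio.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);
    if (hCluster == NULL)
        return 1;

    HNODE hNode = OpenClusterNode(hCluster, L"NODE2");
    if (hNode != NULL)
    {
        CLUSTER_NODE_STATE state = GetClusterNodeState(hNode);
        wprintf(L"Node state: %d\n", (int)state);    // e.g. ClusterNodeDown

        // Eviction removes the node from Node Manager's list of potential
        // cluster nodes; the node must later be added back explicitly.
        DWORD status = EvictClusterNode(hNode);
        wprintf(L"EvictClusterNode returned %lu\n", status);
        CloseClusterNode(hNode);
    }
    CloseCluster(hCluster);
    return 0;
}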
Membership Manager
Membership Manager (also called the Regroup Engine) causes a regroup event whenever another
node’s heartbeat is interrupted (indicating a possible node failure). During a node failure and
regroup event, Membership Manager and Node Manager work together to ensure that all
functioning nodes agree on which nodes are functioning and which are not.
Cluster Network Driver
Node Manager and other components make use of the Cluster Network Driver, which supports
specific types of network communication needed in a cluster. The Cluster Network Driver runs in
kernel mode and provides support for a variety of functions, especially heartbeats and fault-
tolerant communication between nodes.
Failover Manager and Resource Monitors
Failover Manager manages resources and resource groups. For example, Failover Manager stops
and starts resources, manages resource dependencies, and initiates failover of resource groups.
To perform these actions, it receives resource and system state information from cluster
components on the node and from Resource Monitors. Resource Monitors provide the execution
environment for resource DLLs and support communication between resource DLLs and
Failover Manager.
Failover Manager determines which node in the cluster should own each resource group. If it is
necessary to fail over a resource group, the instances of Failover Manager on each node in the
cluster work together to reassign ownership of the resource group.
Depending on how the resource group is configured, Failover Manager can restart a failing
resource locally or can take the failing resource offline along with its dependent resources, and
then initiate failover.
Global Update Manager
Global Update Manager makes sure that when changes are copied to each of the nodes, the
following takes place:
• Changes are made atomically, that is, either all healthy nodes are updated, or none are
updated.
• Changes are made in the order they occurred, regardless of the origin of the change. The
process of making changes is coordinated between nodes so that even if two different
changes are made at the same time on different nodes, when the changes are replicated
they are put in a particular order and made in that order on all nodes.
Global Update Manager is used by internal cluster components, such as Failover Manager, Node
Manager, or Database Manager, to carry out the replication of changes to each node. Global
updates are typically initiated as a result of a Cluster API call. When an update is initiated by a
node, another node is designated to monitor the update and make sure that it happens on all
nodes. If that node cannot make the update locally, it notifies the node that tried to initiate the
update, and changes are not made anywhere (unless the operation is attempted again). If the node
that is designated to monitor the update can make the update locally, but then another node
cannot be updated, the node that cannot be updated is removed from the list of functional nodes,
and the change is made on available nodes. If this happens, quorum logging is enabled at the
same time, which ensures that the failed node receives all necessary configuration information
when it is functioning again, even if the original set of nodes is down at that time.
Diagram and Description of Checkpoint Manager
Some applications store configuration information locally instead of or in addition to storing
information in the cluster configuration database. Applications might store information locally in
two ways. One way is to store configuration information in the registry on the local server;
another way is to use cryptographic keys on the local server. If an application requires that
locally-stored information be available on failover, Checkpoint Manager provides support by
maintaining a current copy of the local information on the quorum resource.
The following figure shows the Checkpoint Manager process.
Checkpoint Manager

Checkpoint Manager handles application-specific configuration data that is stored in the registry
on the local server somewhat differently from configuration data stored using cryptographic keys
on the local server. The difference is as follows:
• For applications that store configuration data in the registry on the local server,
Checkpoint Manager monitors the data while the application is online. When changes
occur, Checkpoint Manager updates the quorum resource with the current configuration
data.
• For applications that use cryptographic keys on the local server, Checkpoint Manager
copies the cryptographic container to the quorum resource only once, when you configure
the checkpoint. If changes are made to the cryptographic container, the checkpoint must
be removed and re-associated with the resource.
Before a resource configured to use checkpointing is brought online (for example, for failover),
Checkpoint Manager brings the locally-stored application data up-to-date from the quorum
resource. This helps make sure that the Cluster service can recreate the appropriate application
environment before bringing the application online on any node.
Note
• When configuring a Generic Application resource or Generic Service resource, you
specify the application-specific configuration data that Checkpoint Manager monitors and
copies. When determining which configuration information must be marked for
checkpointing, focus on the information that must be available when the application
starts.
Checkpoint Manager also supports resources that have application-specific registry trees (not just
individual keys) that exist on the cluster node where the resource comes online. Checkpoint
Manager watches for changes made to these registry trees when the resource is online (not when
it is offline). When the resource is online and Checkpoint Manager detects that changes have
been made, it creates a copy of the registry tree on the owner node of the resource and then sends
a message to the owner node of the quorum resource, telling it to copy the file to the quorum
resource. Checkpoint Manager performs this function in batches so that frequent changes to
registry trees do not place too heavy a load on the Cluster service.
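As an illustration of how application-specific registry data can be marked for checkpointing, the following sketch adds a registry checkpoint to a resource through the Cluster API. The resource name "MyApp" and the registry path are hypothetical, and error handling is abbreviated:

// Sketch: register a registry checkpoint so that Checkpoint Manager keeps
// HKEY_LOCAL_MACHINE\SOFTWARE\MyApp in sync through the quorum resource.
#include <windows.h>
#include <clusapi.h>
#include <stdio.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);
    if (hCluster == NULL)
        return 1;

    HRESOURCE hResource = OpenClusterResource(hCluster, L"MyApp");
    if (hResource != NULL)
    {
        // The checkpoint path is relative to HKEY_LOCAL_MACHINE on the node
        // that owns the resource.
        const WCHAR subkey[] = L"SOFTWARE\\MyApp";
        DWORD status = ClusterResourceControl(
            hResource,
            NULL,                                    // let the owner node process it
            CLUSCTL_RESOURCE_ADD_REGISTRY_CHECKPOINT,
            (LPVOID)subkey,
            sizeof(subkey),
            NULL, 0, NULL);
        wprintf(L"Add registry checkpoint returned %lu\n", status);
        CloseClusterResource(hResource);
    }
    CloseCluster(hCluster);
    return 0;
}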
Diagram and Description of Log Manager (for Quorum Logging)
The following figure shows how Log Manager works with other components when quorum
logging is enabled (when a node is down).
Log Manager and Other Components Supporting Quorum Logging
When a node is down, quorum logging is enabled, which means Log Manager receives
configuration changes collected by other components (such as Database Manager) and logs the
changes to the quorum resource. The configuration changes logged on the quorum resource are
then available if the entire cluster goes down and must be formed again. On the first node
coming online after the entire cluster goes down, Log Manager works with Database Manager to
make sure that the local copy of the configuration database is updated with information from the
quorum resource. This is also true in a cluster forming for the first time — on the first node, Log
Manager works with Database Manager to make sure that the local copy of the configuration
database is the same as the information from the quorum resource.
Diagram and Description of Event Log Replication Manager
Event Log Replication Manager, part of the Cluster service, works with the operating system’s
Event Log service to copy event log entries to all cluster nodes. These events are marked to show
which node the event occurred on.
The following figure shows how Event Log Replication Manager copies event log entries to
other cluster nodes.
How Event Log Entries Are Copied from One Node to Another

The following interfaces and protocols are used together to queue, send, and receive events at the
nodes:
• The Cluster API
• Local remote procedure calls (LRPC)
• Remote procedure calls (RPC)
• A private API in the Event Log service
Events that are logged on one node are queued, consolidated, and sent through Event Log
Replication Manager, which broadcasts them to the other active nodes. If few events are logged
over a period of time, each event might be broadcast individually, but if many are logged in a
short period of time, they are batched together before broadcast. Events are labeled to show
which node they occurred on. Each of the other nodes receives the events and records them in the
local log. Replication of events is not guaranteed by Event Log Replication Manager — if a
problem prevents an event from being copied, Event Log Replication Manager does not obtain
notification of the problem and does not copy the event again.
Diagram and Description of Backup and Restore Capabilities in Failover Manager
The Backup and Restore capabilities in Failover Manager coordinate with other Cluster service
components when a cluster node is backed up or restored, so that cluster configuration
information from the quorum resource, and not just information from the local node, is included
in the backup. The following figure shows how the Backup and Restore capabilities in Failover
Manager work to ensure that important cluster configuration information is captured during a
backup.
Backup Request on a Node That Does Not Own the Quorum Resource
DLLs Used by Core Resource Types
A number of DLLs that are used by core resource types are included with server clusters in
Windows Server 2003. The resource DLL defines and manages the resource. The extension DLL
(where applicable) defines the resource’s interaction with Cluster Administrator.
Core Resource Types and Their Associated DLLs

Resource Types: Physical Disk, Internet Protocol (IP) Address, Network Name, Print Spooler, File Share, Generic Application, Generic Script, Generic Service, Local Quorum, Majority Node Set
Resource DLL: Clusres.dll
Extension DLL: Cluadmex.dll

Resource Type: Volume Shadow Copy Service Task
Resource DLL: VSSTask.dll
Extension DLL: VSSTskEx.dll

Resource Types: DHCP Service, WINS Service
Resource DLL: Clnetres.dll
Extension DLL: Clnetrex.dll

Resource Type: Distributed Transaction Coordinator (DTC)
Resource DLL: Mtxclu.dll
Extension DLL: not applicable

Resource Type: Message Queuing
Resource DLL: Mqclus.dll
Extension DLL: not applicable
Cluster Service Files
The following table lists files that are in the cluster directory (systemroot\cluster, where
systemroot is the root directory of the server’s operating system).
Cluster Service Files in Systemroot\Cluster

File Description
Cladmwiz.dll Cluster Administrator Wizard
Clcfgsrv.dll DLL file for Add Nodes Wizard and New Server Cluster Wizard
Clcfgsrv.inf Setup information file for Add Nodes Wizard and New Server Cluster Wizard
Clnetres.dll Resource DLL for the DHCP and WINS services
Clnetrex.dll Extension DLL for the DHCP and WINS services
Cluadmex.dll Extension DLL for core resource types
Cluadmin.exe Cluster Administrator
Cluadmmc.dll Cluster Administrator MMC extension
Clusres.dll Cluster resource DLL for core resource types
Clussvc.exe Cluster service
Debugex.dll Cluster Administrator debug extension
Mqclus.dll Resource DLL for Message Queuing
Resrcmon.exe Cluster Resource Monitor
Vsstask.dll Resource DLL for Volume Shadow Copy Service Task
Vsstskex.dll Extension DLL for Volume Shadow Copy Service Task
Wshclus.dll Winsock helper for the Cluster Network Driver
The following table lists log files for server clusters.
Log Files for Server Clusters

Log File: cluster.log (default name)
Location: systemroot\Cluster
Description: Records the activity of the Cluster service, Resource Monitor, and resource DLLs on that node. The default name of this log can be changed by changing the System environment variable called ClusterLog.

Log File: cluster.oml
Location: systemroot\Cluster
Description: Records the creation and deletion of cluster objects and other activities of the Object Manager of the cluster; useful for a developer writing a tool for analyzing the translation of GUIDs to friendly names in the cluster.

Log File: clcfgsrv.log
Location: systemroot\system32\LogFiles\Cluster
Description: Records activity of Cluster configuration wizards; useful for troubleshooting problems during cluster setup.

Log File: clusocm.log
Location: systemroot\system32\LogFiles\Cluster
Description: Records cluster-related activity that occurs during an operating system upgrade.

Log File: cluscomp.log
Location: systemroot\system32\LogFiles\Cluster
Description: Records the activity that occurs during the compatibility check at the start of an operating system upgrade on a cluster node.
The following table lists files that are in systemroot\system32, systemroot\inf, or subfolders in
systemroot\system32.
Additional Cluster Service Files

File Folder Description


clusapi.dll systemroot\system32 Server Cluster API
clusocm.dll systemroot\system32\Setup Cluster extension for the Optional Component Manager
clusocm.inf systemroot\inf Cluster INF file for the Optional Component Manager
clussprt.dll systemroot\system32 A DLL that enables the Cluster service on one node to send notice of local cluster events to the Event Log service on other nodes
cluster.exe systemroot\system32 Cluster command-line interface
msclus.dll systemroot\system32 Cluster Automation Server
Resutils.dll systemroot\system32 Utility routines used by resource DLLs
Clusnet.sys systemroot\system32\drivers Cluster Network Driver
Clusdisk.sys systemroot\system32\drivers Cluster Disk Driver
The following table lists files that have to do with the quorum resource and (for a single quorum
device cluster, the most common type of cluster) are usually in the directory q:\mscs, where q is
the quorum disk drive letter and mscs is the name of the directory.
Files Related to the Quorum Resource
File Description
Quolog.log The quorum log, which contains records of cluster actions that involve changes to the cluster configuration database.
Chk*.tmp Copies of the cluster configuration database (also known as checkpoints). Only the latest one is needed.
{GUID} Directory for each resource that requires checkpointing; the resource GUID is the name of the directory.
{GUID}\*.cpt Resource registry subkey checkpoint files.
{GUID}\*.cpr Resource cryptographic key checkpoint files.

Server Cluster API


With the Server Cluster application programming interface (API), developers can write
applications and resource DLLs for server clusters. The following table lists Server Cluster API
subsets.
Subsets in the Server Cluster API

API Subset Description
Cluster API Works directly with cluster objects and interacts with the Cluster service.
Resource API Manages resources through a Resource Monitor and a resource DLL.
Cluster Administrator Extension API Enables a custom resource type to be administered by Cluster Administrator.
For more information, see Server Cluster APIs on MSDN.
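As a brief illustration of the Cluster API, the following sketch opens the local cluster and lists its nodes (error handling is abbreviated):

// Sketch: a minimal Cluster API client that enumerates the nodes of the
// local cluster (clusapi.h, link with clusapi.lib).
#include <windows.h>
#include <clusapi.h>
#include <stdio.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);           // NULL = the local cluster
    if (hCluster == NULL)
        return 1;

    HCLUSENUM hEnum = ClusterOpenEnum(hCluster, CLUSTER_ENUM_NODE);
    if (hEnum != NULL)
    {
        DWORD index = 0;
        for (;;)
        {
            WCHAR name[MAX_PATH];
            DWORD cchName = MAX_PATH;
            DWORD type;
            DWORD status = ClusterEnum(hEnum, index, &type, name, &cchName);
            if (status == ERROR_NO_MORE_ITEMS)
                break;
            if (status == ERROR_SUCCESS)
                wprintf(L"Node: %ls\n", name);
            index++;
        }
        ClusterCloseEnum(hEnum);
    }
    CloseCluster(hCluster);
    return 0;
}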
Server Cluster Processes
The following sections provide information about the following processes:
• How nodes form, join, and leave a cluster.
• How heartbeats, regroup events, and quorum arbitration work in a cluster. These
processes help the cluster to keep a consistent internal state and maintain availability of
resources even when failures occur.
• How resource groups are brought online, taken offline, failed over, and failed back.
How Nodes Form, Join, and Leave a Cluster
Nodes must form, join, and leave a cluster in a coordinated way so that the following are always
true:
• Only one node owns the quorum resource at any given time.
• All nodes maintain the same list of functioning nodes in the cluster.
• All nodes can maintain consistent copies of the cluster configuration database.
Forming a Cluster
The first server that comes online in a cluster, either after installation or after the entire cluster
has been shut down for some reason, forms the cluster. To succeed at forming a cluster, a server
must:
• Be running the Cluster service.
• Be unable to locate any other nodes in the cluster (in other words, no other nodes can be
running).
• Acquire exclusive ownership of the quorum resource.
If a node attempts to form a cluster and is unable to read the quorum log, the Cluster service will
not start, because it cannot guarantee that it has the latest copy of the cluster configuration. In
other words, the quorum log ensures that when a cluster forms, it uses the same configuration it
was using when it last stopped.
The sequence in which a node forms a cluster is as follows:
1. The node confirms that it can start the Cluster service.
2. The node reviews the information stored in the local copy of the cluster configuration
database.
3. Using information from the local copy of the cluster configuration database, the node
confirms that no other nodes are running.
If another node is running, then the node that started most recently joins the cluster
instead of forming it.
4. Using information from the local copy of the cluster configuration database, the node
locates the quorum resource.
5. The node confirms that it can acquire exclusive ownership of the quorum resource and
that it can read from the quorum resource. If it can, the node takes ownership.
6. The node compares the sequence numbers on the copy of the cluster configuration
database on the quorum resource and the sequence numbers on the quorum log against
the sequence numbers on the node’s local copy of the cluster configuration database.
7. The node updates its local copy of the cluster configuration database with any newer
information that might be stored on the quorum resource.
8. The node begins to bring resources and resource groups online.
Joining a Cluster
The sequence in which a server joins an existing cluster is as follows:
1. The node confirms that it can start the Cluster service.
2. The node reviews the information stored in the local copy of the cluster configuration
database.
3. The node that is joining the cluster tries to locate another node (sometimes called a
sponsor node) running in the cluster. The node goes through the list of other nodes in its
local configuration database, trying one or more until one responds.
If no other nodes respond, the node tries to form the cluster, starting by locating the
quorum resource.
4. Node Manager on the sponsor node authenticates the new node. If the joining node is not
recognized as a cluster member, the request to join the cluster is refused.
5. Node Manager on the joining node establishes authenticated communication
(authenticated RPC bindings) between itself and the Node Manager component on each
of the currently active nodes.
6. Database Manager on the joining node checks the local copy of the configuration
database. If it is outdated, Database Manager obtains an updated copy from the sponsor
node.
Leaving a Cluster
A node can leave a cluster when the node shuts down or when the Cluster service is stopped.
When a node leaves a cluster during a planned shutdown, it attempts to perform a smooth
transfer of resource groups to other nodes. The node leaving the cluster then initiates a regroup
event.
Functioning nodes in a cluster can also force another node to leave the cluster if the node cannot
perform cluster operations, for example, if it fails to commit an update to the cluster
configuration database.
Heartbeats, Regroup Events, and Quorum Arbitration
When server clusters encounter changing circumstances and possible failures, the following
processes help the cluster to keep a consistent internal state and maintain availability of
resources:
• Heartbeats
• Regroup events
• Quorum arbitration
Heartbeats
Heartbeats are single User Datagram Protocol (UDP) packets exchanged between nodes once
every 1.2 seconds to confirm that each node is still available. If a node is absent for five
consecutive heartbeats, the node that detected the absence initiates a regroup event to make sure
that all nodes reach agreement on the list of nodes that remain available.
Server cluster networks can be private (node-to-node communication only), public (client-to-
node communication), or mixed (both node-to-node and client-to-node communication).
Heartbeats are communicated across all networks; however, the monitoring of heartbeats and the way the cluster interprets missed heartbeats depend on the type of network:
• On private or mixed networks, which both carry node-to-node communication, heartbeats
are monitored to determine whether the node is functioning in the cluster.
A series of missed heartbeats can either mean that the node is offline or that all private
and mixed networks are offline; in either case, a node has lost its ability to function in the
cluster.
• On public networks, which carry only client-to-node communication, heartbeats are
monitored only to determine whether a node’s network adapter is functioning.
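The following sketch is purely illustrative and is not how the Cluster Network Driver is implemented; it only shows the timing rule described above, in which a heartbeat is expected every 1.2 seconds and five consecutive misses trigger a regroup event:

// Illustrative sketch of the heartbeat timing rule (not actual cluster code).
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define HEARTBEAT_INTERVAL_MS 1200   /* one heartbeat every 1.2 seconds */
#define MAX_MISSED_HEARTBEATS 5      /* misses before a regroup event   */

/* Stand-in for receiving a UDP heartbeat from a peer node; a real
   implementation would wait on a datagram socket instead. */
static BOOL ReceiveHeartbeat(DWORD timeoutMs)
{
    Sleep(timeoutMs);                 /* simulate waiting one interval */
    return (rand() % 10) != 0;        /* pretend some packets are lost */
}

int main(void)
{
    int missed = 0;
    for (;;)
    {
        if (ReceiveHeartbeat(HEARTBEAT_INTERVAL_MS))
        {
            missed = 0;               /* the peer node is still responding */
        }
        else if (++missed >= MAX_MISSED_HEARTBEATS)
        {
            printf("Peer missed %d heartbeats: initiate a regroup event\n",
                   missed);
            break;                    /* Membership Manager would regroup here */
        }
    }
    return 0;
}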
Regroup Events
If a node is absent for five consecutive heartbeats, a regroup event occurs. (Membership
Manager, described earlier, starts the regroup event.)
If an individual node remains unresponsive, the node is removed from the list of functional
nodes. If the unresponsive node was the owner of the quorum resource, the remaining nodes also
begin the quorum arbitration process. After this, failover begins.
Quorum Arbitration
Quorum arbitration is the process that occurs when the node that owned the quorum resource
fails or is unavailable, and the remaining nodes determine which node will take ownership.
When a regroup event occurs and the unresponsive node owned the quorum resource, another
node is designated to initiate quorum arbitration. A basic goal for quorum arbitration is to make
sure that only one node owns the quorum resource at any given time.
It is important that only one node owns the quorum resource because if all network
communication between two or more cluster nodes fails, it is possible for the cluster to split into
two or more partial clusters that will try to keep functioning (sometimes called a “split brain”
scenario). Server clusters prevent this by allowing only the partial cluster with a node that owns
the quorum resource to continue as the cluster. Any nodes that cannot communicate with the
node that owns the quorum resource stop working as cluster nodes.
How Clusters Keep Resource Groups Available
This section describes how clusters keep resource groups available by monitoring the health of
resources (polling), bringing resource groups online, and carrying out failover. Failover means
transferring ownership of the resources within a group from one node to another. This section
also describes how resource groups are taken offline as well as how they are failed back, that is,
how resource groups are transferred back to a preferred node after that node has come back
online.
Transferring ownership can mean somewhat different things depending on which of the group’s
resources is being transferred. For an application or service, the application or service is stopped
on one node and started on another. For an external device, such as a Physical Disk resource, the
right to access the device is transferred. Similarly, the right to use an IP address or a network
name can be transferred from one node to another.
Resource-related activities in server clusters include:
• Monitoring the health of resources (polling).
• Bringing a resource group online.
• Taking a resource group offline.
• Failing a resource group over.
• Failing a resource group back.
The administrator of the cluster initiates resource group moves, usually for maintenance or other
administrative tasks. Group moves initiated by an administrator are similar to failovers in that the
Cluster service initiates resource transitions by issuing commands to Resource Monitor through
Failover Manager.
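For example, an administrator or a management tool can initiate such a move through the Cluster API. In the sketch below, the group name "SQL Group" and node name "NODE2" are hypothetical and error handling is abbreviated; the call asks Failover Manager to move the group to the specified node:

// Sketch: an administrator-initiated group move using the Cluster API.
#include <windows.h>
#include <clusapi.h>
#include <stdio.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);
    if (hCluster == NULL)
        return 1;

    HGROUP hGroup  = OpenClusterGroup(hCluster, L"SQL Group");
    HNODE  hTarget = OpenClusterNode(hCluster, L"NODE2");
    if (hGroup != NULL && hTarget != NULL)
    {
        // Failover Manager takes the group offline on the current owner and
        // brings it online on the destination node in dependency order.
        DWORD status = MoveClusterGroup(hGroup, hTarget);
        wprintf(L"MoveClusterGroup returned %lu\n", status);
    }

    if (hTarget != NULL) CloseClusterNode(hTarget);
    if (hGroup  != NULL) CloseClusterGroup(hGroup);
    CloseCluster(hCluster);
    return 0;
}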
Resource Health Monitoring (Polling)
Resource Monitor conducts two kinds of polling on each resource that it monitors: Looks Alive (a quick check that the resource appears to be online) and Is Alive (a more thorough check that the resource is online and functioning properly). A sketch of the corresponding resource DLL entry points follows the list below.
When setting polling intervals, it can be useful to understand the following:
• If a Generic Application resource has a long startup time, you can lengthen the polling
interval to allow the resource to finish starting up. In other words, you might not require a
custom resource DLL to ensure that the resource is given the necessary startup time.
• If you lengthen the polling intervals, you reduce the chance that polls will interfere with
each other (the chance for lock contention).
• You can bypass Looks Alive polling by setting the interval to 0.
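The sketch below shows what the two polling entry points of a resource DLL might look like. The structure and the checks (a process handle test standing in for a deeper application health check) are illustrative assumptions, not the implementation of any shipped resource DLL:

// Sketch of the Looks Alive and Is Alive entry points that a resource DLL
// implements for Resource Monitor (resapi.h, link with resutils.lib).
#include <windows.h>
#include <clusapi.h>
#include <resapi.h>

typedef struct _MY_RESOURCE {
    HANDLE hProcess;   /* the application process this resource manages */
} MY_RESOURCE;

/* Quick, inexpensive check: does the resource still appear to be online? */
BOOL WINAPI MyLooksAlive(RESID ResourceId)
{
    MY_RESOURCE *res = (MY_RESOURCE *)ResourceId;
    return WaitForSingleObject(res->hProcess, 0) == WAIT_TIMEOUT;
}

/* Thorough check: is the resource really functioning? A real resource DLL
   might open a connection or issue an application-level request here. */
BOOL WINAPI MyIsAlive(RESID ResourceId)
{
    MY_RESOURCE *res = (MY_RESOURCE *)ResourceId;
    if (WaitForSingleObject(res->hProcess, 0) != WAIT_TIMEOUT)
        return FALSE;                 /* the process has exited */
    /* ...deeper application-specific health checks would go here... */
    return TRUE;
}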
How a Resource Group Comes Online
The following sequence is used when Failover Manager and Resource Monitor bring a resource
group online.
1. Failover Manager uses the dependency list (in the cluster configuration) to determine the
appropriate order in which to bring resources online.
2. Failover Manager works with Resource Monitor to begin bringing resources online. The
first resource or resources started are ones that do not depend on other resources.
3. Resource Monitor calls the Online entry point of the first resource DLL and returns the
result to Failover Manager.
○ If the entry point returns ERROR_IO_PENDING, the resource state changes to
OnlinePending. Resource Monitor starts a timer that waits for the resource either
to go online or to fail. If the amount of time specified for the pending timeout
passes and the resource is still pending (has not entered either the Online or Failed
state), the resource is treated as a failed resource and Failover Manager is notified.
○ If the Online call fails or the Online entry point does not move the resource into
the Online state within the time specified in the resource DLL, the resource enters
the Failed state, and Failover Manager uses Resource Monitor to try to restart the
resource, according to the policies defined for the resource in its DLL.
○ When the resource enters the Online state, Resource Monitor adds the resource to
its list of resources and starts monitoring the state of the resource.
4. The sequence is repeated as Failover Manager brings the next resource online. Failover
Manager uses the dependency list to determine the correct order for bringing resources
online.
After resources have been brought online, Failover Manager works with Resource Monitor to
determine if and when failover is necessary and to coordinate failover.
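The following sketch illustrates how a resource DLL's Online entry point can report pending status, as described in step 3 of the preceding sequence. The worker thread and the stand-in start-up delay are illustrative assumptions:

// Sketch of an Online entry point that starts a slow resource on a worker
// thread and returns ERROR_IO_PENDING (resapi.h, link with resutils.lib).
#include <windows.h>
#include <clusapi.h>
#include <resapi.h>

typedef struct _MY_RESOURCE {
    RESOURCE_HANDLE              hResource;         /* from the Open entry point */
    PSET_RESOURCE_STATUS_ROUTINE SetResourceStatus; /* callback given in Startup */
} MY_RESOURCE;

static DWORD WINAPI OnlineWorker(LPVOID param)
{
    MY_RESOURCE *res = (MY_RESOURCE *)param;
    RESOURCE_STATUS status;

    Sleep(5000);                      /* stand-in for a long application start */

    ResUtilInitializeResourceStatus(&status);
    status.ResourceState = ClusterResourceOnline;   /* report the final state */
    res->SetResourceStatus(res->hResource, &status);
    return ERROR_SUCCESS;
}

DWORD WINAPI MyOnline(RESID ResourceId, PHANDLE EventHandle)
{
    MY_RESOURCE *res = (MY_RESOURCE *)ResourceId;
    HANDLE hThread;
    UNREFERENCED_PARAMETER(EventHandle);

    /* Start the real work on a worker thread and tell Resource Monitor that
       the operation is pending; the resource stays in OnlinePending until
       the worker reports ClusterResourceOnline or the pending timeout expires. */
    hThread = CreateThread(NULL, 0, OnlineWorker, res, 0, NULL);
    if (hThread == NULL)
        return GetLastError();
    CloseHandle(hThread);
    return ERROR_IO_PENDING;
}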
How a Resource Group Goes Offline
Failover Manager takes a resource group offline as part of the failover process or when an
administrator moves the group for maintenance purposes. The following sequence is used when
Failover Manager takes a resource group offline:
1. Failover Manager uses the dependency list (in the cluster configuration) to determine the appropriate order in which to take resources offline.
2. Failover Manager works with Resource Monitor to begin taking resources offline. The
first resource or resources stopped are ones on which other resources do not depend.
3. Resource Monitor calls the Offline entry point of the resource DLL and returns the result
to Failover Manager.
○ If the entry point returns ERROR_IO_PENDING, the resource state changes to
OfflinePending. Resource Monitor starts a timer that waits for the resource either
to go offline or to fail. If the amount of time specified for the pending timeout
passes and the resource is still pending (has not entered either the Offline or
Failed state), the resource is treated as a failed resource and Failover Manager is
notified.
○ If the Offline call fails or the Offline entry point does not move the resource into
the Offline state within the time specified in the resource DLL, Failover Manager
uses Resource Monitor to terminate the resource and the resource enters the
Failed state.
○ If the Offline call succeeds, Resource Monitor takes the resource off its list and
stops monitoring the resource.
4. The sequence is repeated as Failover Manager takes the next resource offline. Failover Manager uses the dependency list to determine the correct order for taking resources offline.
How Failover Occurs
Group failover happens when the group fails or when the node that owns the group fails. An individual resource failure causes the group to fail over if you configure the Affect the group property for the resource.
Failover takes two forms, as described in the sections that follow:
• Resource failure and group failure (without node failure)
• Node failure or loss of communication between nodes
Resource Failure and Group Failure (Without Node Failure)
When a resource fails, the following process occurs:
1. Resource Monitor detects the failure, either through Looks Alive or Is Alive polling or
through an event signaled by the resource. Resource Monitor calls the IsAlive entry point
of the resource DLL to confirm that the resource has failed.
2. If IsAlive fails, the state of the resource changes to Failed.
If you configured the resource to be restarted on failure, Failover Manager attempts to
restart the resource by trying to bring it online. If the attempts to bring the resource online
fail more than is allowed by the restart Threshold and Period properties, Resource
Monitor stops polling the resource.
3. Through Resource Monitor, Failover Manager calls the Terminate entry point of the
resource DLL.
The rest of this process concerns how the group fails over.
4. If the resource is set to Affect the group, the sequence continues, and the instances of Failover Manager on the cluster nodes work together to reassign ownership of the group. Otherwise, the sequence ends without further action.
5. On the node on which the resource failed, Failover Manager terminates the resource that
failed and the resources that depend on it, and then Failover Manager takes the remaining
resources in the dependency tree offline in order of dependencies.
6. Failover Manager on the node on which the resource failed notifies Failover Manager on
the node that will take ownership of the resource (and also notifies Failover Manager on
other nodes about the changes that are happening).
7. If any of the resources have been configured so that application-specific configuration
information (registry subkeys) for that resource is checkpointed, Checkpoint Manager
restores the saved registry subkeys for those resources from the quorum resource.
8. Failover Manager on the destination node brings the resources online one by one, using the dependency list to determine the correct order.
9. The node that now owns the group turns control of the group’s resources over to their
respective Resource Monitors.
Node Failure or Loss of Communication Between Nodes
Failover that occurs when a node fails is different from failover that occurs when a resource fails.
For the purposes of clustering, a node is considered to have failed if it loses communication with
other nodes.
As described in previous sections, if a node misses five heartbeats, this indicates that it has
failed, and a regroup event (and possibly quorum arbitration) occurs. After node failure,
surviving nodes negotiate for ownership of the various resource groups. On two-node clusters the
result is obvious, but on clusters with more than two nodes, Failover Manager on the surviving
nodes determines group ownership based on the following:
• The nodes you have specified as possible owners of the affected resource groups.
• The order in which you specified the nodes in the group’s Preferred Owners list.
Note
• When setting up a preferred owners list for a resource group, we recommend that you list
all nodes in your server cluster and put them in priority order.
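A group's preferred owners list can also be set in priority order through the Cluster API. In the sketch below, the group name "SQL Group" and the node names are hypothetical and error handling is abbreviated:

// Sketch: set a group's preferred owners list in priority order.
#include <windows.h>
#include <clusapi.h>
#include <stdio.h>

int wmain(void)
{
    HCLUSTER hCluster = OpenCluster(NULL);
    if (hCluster == NULL)
        return 1;

    HGROUP hGroup = OpenClusterGroup(hCluster, L"SQL Group");
    HNODE  nodes[2];
    nodes[0] = OpenClusterNode(hCluster, L"NODE1");   /* first preference  */
    nodes[1] = OpenClusterNode(hCluster, L"NODE2");   /* second preference */

    if (hGroup != NULL && nodes[0] != NULL && nodes[1] != NULL)
    {
        DWORD status = SetClusterGroupNodeList(hGroup, 2, nodes);
        wprintf(L"SetClusterGroupNodeList returned %lu\n", status);
    }

    if (nodes[1] != NULL) CloseClusterNode(nodes[1]);
    if (nodes[0] != NULL) CloseClusterNode(nodes[0]);
    if (hGroup   != NULL) CloseClusterGroup(hGroup);
    CloseCluster(hCluster);
    return 0;
}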
How Failback Occurs
Failback is the process by which the Cluster service moves resource groups back to their
preferred node after the node has failed and come back online. You can configure both whether
and when failback occurs. By default, groups are not set to fail back.
The node to which the group will fail back initiates the failback. Failover Manager on that node
contacts Failover Manager on the node where the group is currently online and negotiates for
ownership. The instances of Failover Manager on the two nodes work together to smoothly
transfer ownership of the resource group back to the preferred node.
You can test failback configuration settings by following procedures in Help and Support Center.
Related Information
The following sources provide information that is relevant to this section.
• “Planning Server Deployments” in the Windows Server 2003 Deployment Kit on the
Microsoft Web site for more information about failover policies, choices for cluster
storage, and ways that applications can operate within a server cluster.
• What Is a Server Cluster? [ http://technet.microsoft.com/en-
us/library/cc785197(WS.10).aspx ]