
Introduction to Clustering

Prerequisites
Before starting this session, you should
understand what fault tolerance and
load balancing mean.

Industry Definition of Cluster
Cluster Definition:
A group of computers and storage
devices that work together yet can
be accessed as a single system.

A Cluster provides:
Distribution of processing load
Automatic recovery from failure of one
or more components in the cluster

Availability, Scalability and Manageability
Availability:
Measure of the amount of time a system or
component performs its specified function.
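
As a worked illustration of this definition, availability can be expressed as the fraction of observed time during which the system performed its function. The sketch below is a minimal example under that assumption; the function names and the 8,760-hour year are illustrative choices, not part of the original material.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return the fraction of total time the system performed its function."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def yearly_downtime_hours(avail: float) -> float:
    """Return the hours of downtime per year implied by an availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    a = availability(uptime_hours=99, downtime_hours=1)
    print(f"availability: {a:.2%}")
    print(f"implied downtime per year: {yearly_downtime_hours(a):.1f} hours")
```

Expressed this way, each additional "nine" of availability cuts the permitted yearly downtime by a factor of ten.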

Scalability:
The ability to incrementally add smaller,
standard systems as needed to meet overall
processing power requirements.

Manageability:
The ease of administering a cluster solution,
including configuration, updates and/or patches,
and new additions.

Availability Overview
[Figure: a four-node cluster solution, shown before and after Node1 fails.]
Before Node1 failure: Node1 and Node2 service Web clients, Node3 is part of the cluster, and Node4 provides access to the SQL database.
After Node1 failure: Node1 fails; Node2 and Node4 remain, and Node4 still provides access to the SQL database.

Scalability Overview
Scaling up:
Scaling up is achieved by adding more
resources, such as memory, processors,
and disk drives to a system.

Scaling Out:
Scaling out delivers high performance
when the throughput requirements of an
application exceed the capabilities of an
individual system.
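
A rough sizing sketch for scaling out: given a required throughput and the throughput one standard system can deliver, estimate how many systems are needed. The function name, the throughput units, and the optional spare count are assumptions made for illustration only.

```python
import math


def nodes_needed(required_throughput: float,
                 per_node_throughput: float,
                 spares: int = 0) -> int:
    """Estimate how many identical nodes cover the required throughput.

    `spares` adds extra nodes held in reserve (hypothetical parameter).
    """
    if per_node_throughput <= 0:
        raise ValueError("per-node throughput must be positive")
    if required_throughput < 0 or spares < 0:
        raise ValueError("throughput and spares must not be negative")
    return math.ceil(required_throughput / per_node_throughput) + spares


if __name__ == "__main__":
    # 2500 requests/s needed, each node handles 1000 requests/s
    print(nodes_needed(2500, 1000))
    print(nodes_needed(2500, 1000, spares=1))
```

This assumes throughput scales linearly with node count, which real applications only approximate.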

Manageability Overview
The following questions must be answered:

Setup

How easy is it to install the cluster solution?

Configuration

How easy is it to install applications into the cluster and administer the
different aspects of the clustering software?
How easy is it to dynamically increase or scale up the cluster solution when
your business requirements exceed the current capacity?

Disaster Recovery

How quickly and easily in the event of a complete and total disaster can
you bring the cluster solution back into production?

Application

For the applications that you install into the cluster, what additional
maintenance and administration is required beyond a stand-alone version of
the application?

Application updates

How easy is it to update the applications when the time comes for new
features or security updates?

Operating System patch management

How easy is it to update the core OS on which the cluster server runs, or
to update the cluster service, when security patches or bug-fix patches
are released?

Cluster Solution Benefits


Factors to be considered while planning a Cluster Deployment:

Cost of hardware
Cost of the Computers or Nodes
Networking devices such as Switches or routers.
Shared or External Storage (SAN)

Cost of the Cluster Software Product or Suite
This would be the OS, clustering software and applications
that will run on the cluster.

Cost of ownership
The hardware might be cheaper, but it may take more man-hours
from your administrative and developer staff to design,
implement and maintain the cluster solution.

Cluster Models and Their Configurations
Active/Active
Active/Passive

Active/Active
[Figure: a two-server cluster. One server hosts File/Print Group 1 and has capacity to fail over Group 2; the other server hosts File/Print Group 2 and has capacity to fail over Group 1.]

Active/Passive
[Figure: a two-server cluster. One server hosts File/Print Group 1; the other server has capacity to fail over Group 1.]

Active/Active Configuration
[Figure: Node A and Node B both run the Cluster service. The cluster hosts Group 1 and Group 2, with the virtual servers \\Engineering and \\Accounting, Disk 1 and Disk 2, and a quorum disk. Each node has capacity to fail over the other node's group.]
Active/Passive Configuration
[Figure: Node A and Node B run the Cluster service. Group 1 contains Disk 1 and the virtual server \\Accounting; the cluster also has a quorum disk.]

Node A manages virtual server \\Accounting. Node B is configured as a hot spare and will take ownership of \\Accounting if Node A goes offline.
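
The hot-spare behaviour described for \\Accounting can be sketched as a small ownership model. This is an illustration only, not how the Cluster service is implemented; the class and method names are hypothetical.

```python
class ActivePassivePair:
    """Tracks which node owns a virtual server in an Active/Passive pair."""

    def __init__(self, virtual_server: str, active: str, spare: str):
        self.virtual_server = virtual_server
        self.owner = active   # node currently managing the virtual server
        self.spare = spare    # hot spare waiting to take ownership

    def node_offline(self, node: str) -> None:
        """React to a node going offline."""
        if node == self.owner and self.spare is not None:
            # Failover: the hot spare takes ownership; no spare remains.
            self.owner, self.spare = self.spare, None
        elif node == self.owner:
            # No spare left: the virtual server has no owner.
            self.owner = None
        elif node == self.spare:
            self.spare = None


if __name__ == "__main__":
    pair = ActivePassivePair(r"\\Accounting", active="Node A", spare="Node B")
    pair.node_offline("Node A")
    print(pair.virtual_server, "is now owned by", pair.owner)
```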

Microsoft Technologies for Clustering

Two Microsoft technologies for clustering:

Network Load Balancing (NLB)
Server Cluster (MSCS)

NLB and MSCS must be installed on separate machines.

Example

Front-end NLB servers hosting IIS and communicating with a back-end MSCS
cluster for database information.

[Figure: a client, NLB servers hosting IIS, and an MSCS cluster hosting the database.]

Microsoft Windows 2003 Server Cluster (MSCS)
Additional Capabilities provided by MSCS

Every node has full connectivity and communication with the other nodes in the cluster through the following:
One or more shared SCSI, iSCSI or Fibre Channel buses for block-level storage.
A private network, or interconnect, that carries only internal cluster communication.
One or more public networks.

Every node in the cluster is aware when another system joins or leaves the cluster.
Every node in the cluster is aware of the resources that are running locally as well as the resources that are running on all other cluster nodes.

Server Cluster and NLB Comparison

Server Cluster:
Used for databases, e-mail services, line of business (LOB) applications, and custom applications
Included with Windows Server 2003, Enterprise Edition, and Windows Server 2003, Datacenter Edition
Provides high availability, scalability and server consolidation
Can be deployed on a single network or geographically distributed
Supports clusters up to eight nodes
Requires the use of shared or replicated storage

NLB:
Used for network services such as Web servers, FTP servers, firewalls, and other networking services
Included with all four versions of Windows Server 2003
Provides high availability and scalability
Generally deployed on a single network but can span multiple networks if properly configured
Supports clusters up to 32 nodes
Does not require any special hardware or software and works out of the box

Microsoft Server Cluster Terminology and Definitions (1)

CLUSTER
A group of independent network servers that present themselves to a network as a single system.

NODE
A cluster node is a Microsoft Windows 2003 Server system that has a working installation of the Cluster service.

RESOURCES
Resources are physical or logical entities, such as a file share, that are managed by the Cluster service.

RESOURCE STATES
All resources can have the following states: Online, Offline, Online pending, Offline pending and Failed.

RESOURCE DEPENDENCIES
A dependency is a two-way association between resources.

GROUPS
Groups are a collection of resources that need to be managed as a single unit for configuration and recovery purposes.

FAILOVER
Failover is the process of moving a group of resources from one node to another in the case of a failure or for administrative tasks.
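
The terms above fit together as follows: a group is the unit of failover, and each resource in it passes through the listed states as the group moves. The sketch below is a simplified model under that reading; the class names and the exact order of state transitions are assumptions for illustration, not the Cluster service's actual logic.

```python
from enum import Enum


class State(Enum):
    """The five resource states named in the definitions above."""
    ONLINE = "Online"
    OFFLINE = "Offline"
    ONLINE_PENDING = "Online pending"
    OFFLINE_PENDING = "Offline pending"
    FAILED = "Failed"


class Group:
    """A collection of resources managed and moved as a single unit."""

    def __init__(self, name, owner, resources):
        self.name = name
        self.owner = owner
        self.states = {r: State.ONLINE for r in resources}
        self.history = []  # states the whole group passed through

    def _set_all(self, state):
        for resource in self.states:
            self.states[resource] = state
        self.history.append(state)

    def failover(self, target_node):
        """Take every resource offline, change owner, bring everything online."""
        self._set_all(State.OFFLINE_PENDING)
        self._set_all(State.OFFLINE)
        self.owner = target_node
        self._set_all(State.ONLINE_PENDING)
        self._set_all(State.ONLINE)


if __name__ == "__main__":
    group = Group("Group 1", "Node A", ["Disk 1", "File share"])
    group.failover("Node B")
    print(group.owner, [s.value for s in group.history])
```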

Microsoft Server Cluster Terminology and Definitions (2)

FAILBACK
Failback is the process of returning a group of resources to the node on which it was running before a failover occurred.

VIRTUAL SERVER
Groups that contain an IP Address resource and a network name resource and appear as individual servers to clients.

CLUSTER NETWORK
All nodes must have a network link between them that they can use to communicate with each other.

SHARED DISKS
The shared disks are logical devices that all the cluster nodes are attached to via the shared bus.

QUORUM RESOURCE
The quorum resource maintains the configuration data necessary for recovery of the cluster.

CLUSTER SERVICE
Cluster service is the collection of software on each node that manages all cluster-specific activity.

CLUSTER-AWARE AND CLUSTER-ENABLED APPLICATIONS
A cluster-aware application is any application that has been designed to function on a cluster and ships with a resource DLL.

New Cluster Setup Features (1)

Installed by Default
The default installation of Clustering reduces the administrative overhead and also does not require a reboot.

Node Eviction
Node eviction does not require a reboot. This results in increased availability and easier disaster recovery when there is a node failure.

Rolling Upgrades
Allows other nodes in the cluster to function while a node's OS is upgraded to a newer version.

Queued Changes
The cluster service can queue up changes that need to be completed if a node is offline.

Simpler Uninstallation
Uninstalling Cluster Service from a node is now a one-step process of evicting the node.

Remote Administration
Remote Administration allows full remote creation and configuration of the server cluster.

New Cluster Setup Features (2)

Pre-configuration Analysis
A pre-configuration analysis ensures that any known incompatibilities are detected prior to configuration.

Multi-Node Addition
Installation of cluster service now allows multiple nodes to be added to a server cluster in a single operation.

Quorum Selection
The disk that needs to be used as the Quorum Resource is automatically configured on the smallest disk that is larger than 50 MB and formatted with NTFS.

Local Quorum
If a node is not attached to a shared disk, it will automatically configure as a "Local Quorum" resource.
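
The Quorum Selection rule (smallest disk larger than 50 MB and formatted with NTFS) can be written as a short selection function. This is a sketch of the stated rule only; the `Disk` type, its field names, and the sample disks are hypothetical and do not come from any Microsoft API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MIN_QUORUM_MB = 50  # threshold stated in the Quorum Selection rule


@dataclass(frozen=True)
class Disk:
    name: str
    size_mb: int
    filesystem: str


def pick_quorum_disk(disks: Iterable[Disk]) -> Optional[Disk]:
    """Return the smallest NTFS disk strictly larger than 50 MB, or None."""
    candidates = [
        d for d in disks
        if d.size_mb > MIN_QUORUM_MB and d.filesystem.upper() == "NTFS"
    ]
    return min(candidates, key=lambda d: d.size_mb, default=None)


if __name__ == "__main__":
    disks = [
        Disk("Disk 1", 500, "NTFS"),
        Disk("Disk 2", 100, "NTFS"),
        Disk("Disk 3", 60, "FAT32"),   # wrong file system
        Disk("Disk 4", 50, "NTFS"),    # not larger than 50 MB
    ]
    print(pick_quorum_disk(disks))
```

When no disk qualifies the function returns `None`, which loosely corresponds to the case where a shared disk is unavailable and a Local Quorum is configured instead.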

Administrative Enhancements

Password Change
In Windows Server 2003, you can change the Cluster Service account password without having to take the cluster offline.

Enhanced Node Failover
Cluster Service now includes enhanced logic for Group Failover when you have a cluster with three or more nodes.

Group Affinity Support
Group Affinity Support allows an application to describe itself as an N+I configuration (N active nodes and I spare nodes).

WMI Support
WMI allows server clusters to be managed as part of an overall WMI environment.

Resource Deletion
Resources can be deleted in Cluster Administrator or with Cluster.exe without taking the resources offline first.

Supporting and Troubleshooting Enhancements (1)

Software Tracing
Software Tracing is a new method for debugging that allows debugging the Cluster Service without loading checked build versions of the DLLs.

Event Log
The use of Event Log allows event log parsing and management tools to be used to track successful failovers rather than just catastrophic failures.

Clcfgsrv.log
During configuration of Cluster Service, a separate setup log (%SystemRoot%\system32\Logfiles\Cluster\ClCfgSrv.log) is created to assist in troubleshooting.

Chkdsk logging
The use of the Chkdsk utility enables easier monitoring and troubleshooting.

Supporting and Troubleshooting Enhancements (2)

Cluster.log new info
The cluster.log file has been changed to add logging levels (ERR, INFO, WARN) to entries in the log, thereby making it easier to locate problem sections in the log.

Cluster.obj
The cluster.obj file eliminates the need to open the registry to figure out the friendly name of the resource.

Offline/Failure Reason Codes
The Offline/Failure Reason Codes allow the application to have different semantics depending on whether the application itself has failed or some dependency of the application has failed.

Clusdiag
The Cluster diagnostic tool greatly assists in the analysis of cluster logs by capturing the Cluster.log file from each node.

Disaster Recovery Enhancements
NT-Backup / ASR
Confdisk and Clusterrecovery

Confdisk

Confdisk.exe is a tool that can be used to recover failed disks in a cluster. Confdisk.exe needs to be used in conjunction with the Cluster Recovery and Cluster Administrator tools due to the nature of cluster troubleshooting.

Clusterrecovery

Microsoft Windows Server Cluster
Benefits of Microsoft Clusters:
Support for automatic recovery of services in the event of failure of one or
more computers within the cluster.
Provision of data consistency across all nodes in the cluster.
Standard, cross-platform application programming interface (API) for
developing and supporting cluster-aware and cluster-enabled
applications.
Standard set of clustering services for clusters from many different hardware
vendors.
Increased scalability by allowing new components to be added as system
load increases without taking existing cluster services offline.
Ability to allow administrators to manage a cluster as a single system and to
manage applications as if they were running on a single server.
Improves the availability of client/server applications by increasing the
availability of server resources.
By clustering existing hardware with new computers, you protect your
investment in both hardware and software: Instead of replacing an existing
computer with a new one of twice the capacity, you can simply add another
computer of equal capacity.

Additional References
The following Microsoft articles provide information on
Cluster, SAN and Disk Management.
http://technet.microsoft.com/enus/library/aa996161%28v=exchg.65%29.aspx
http://blogs.technet.com/b/askcore/archive/2007/11/12/sowhat-does-cluster-recovery-actually-recover-anyway.aspx
http://support.microsoft.com/kb/323437
280297: How to Configure Volume Mount Points on a
Clustered Server
304736: How to Extend the Partition of a Cluster Shared
Disk
301647: Cluster Service improvements for Storage Area
Networks (SANs)

Q&A

Thank you
