Advanced Technical Skills (ATS) North America

HADR (High Availability Disaster Recovery) with TSM

IBM Advanced Technical Skills


Richard Crespo
racrespo@us.ibm.com

2013 IBM Corporation

TSM Advanced Technical Skills Team


Dave Canan
ddcanan@us.ibm.com

Richard Crespo
racrespo@us.ibm.com

Dave Daun
djdaun@us.ibm.com

Tom Hepner
hep@us.ibm.com

Topics
What is HADR
Why use HADR
Types of HADR Synchronization
Configuring HADR
Validating HADR configuration
TSM recovery on HADR copy
HADR Failover

What is HADR
High availability disaster recovery (HADR) is a data replication feature that
provides a high availability solution for both partial and complete site
failures. HADR protects against data loss by replicating changes from a
source database (primary) to a target database (standby).

[Diagram: Client, Primary, and Standby, with flows labeled "Backup data", "DB transactions", and "Heartbeat"]

Why use HADR


You most likely already have it.
  HADR is a standard feature of DB2 and is included with TSM 6.x, so you
  already have the ability to use it.
It is hardware agnostic.
  Because HADR communication is handled by the database itself, the type
  of disk subsystem your DB2 instance runs on is irrelevant.
  HADR runs over a TCP/IP network, so there is no need for any special
  hardware or software.
It is easy to set up and manage.
  Setup only requires a few configuration parameters to be modified.
  Only 3 commands are needed to manage HADR once it is configured.
You have a choice of synchronization levels.
  Choose the best sync mode for your business need: SYNC, NEARSYNC,
  ASYNC, or SUPERASYNC.

HADR Synchronization levels


The HADR SYNC level determines the degree of protection your
DB2 High Availability Disaster Recovery (HADR) database
solution has against transaction loss. The synchronization mode
determines when the primary database server considers a
transaction complete, based on the state of the logging on the
standby database.

Sync
NearSync
Async
SuperAsync

Synchronization levels cont.


SYNC
1. Transactions are sent to the primary and received into memory.
2. Transactions are written to the logs on the primary.
3. Transactions are sent to the standby.
4. Transactions are received into memory and then written to the
logs on the standby.
5. The standby sends an acknowledgement to the primary that it has
received the transactions and written them to its log.
6. The transaction is considered complete and the next transaction
may take place.

Synchronization levels cont.


NearSYNC
1. Transactions are sent to the primary.
2. Once the primary receives a transaction into memory, it
immediately sends the transaction to the standby
without waiting until the transaction is written to the logs.
3. Once the standby receives the transaction into
memory, it sends an acknowledgement to the primary
that it received the transaction.
4. Once the primary has the acknowledgement and has written the
transaction to its own log, the transaction is considered
complete and the next transaction may take place.

Synchronization levels cont.


ASYNC
1. Transactions are sent to the primary.
2. Once the primary receives a transaction into memory, it
immediately sends the transaction to the standby and
writes the transaction to its own log.
3. The transaction is considered complete and the next
transaction may take place; the primary does not wait for an
acknowledgement from the standby.

Synchronization levels cont.


SuperASYNC
1. Transactions are sent to the primary.
2. Once the primary receives a transaction into memory, it is
immediately written to the log.
3. The transaction is considered complete and the next
transaction may take place, regardless of the state of the standby.
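
The four step lists differ mainly in what the primary waits for from the standby before it treats a transaction as complete. The sketch below condenses that into a lookup. The function name commit_wait_point is hypothetical (it is not a DB2 command), and the wording of each result is a paraphrase of the slides rather than DB2 output.

```shell
#!/usr/bin/env bash
# Illustrative only: for each sync mode, print what the primary waits for
# from the standby before a transaction counts as complete.
# commit_wait_point is a hypothetical helper name, not a DB2 command.
commit_wait_point() {
  # Upper-case the argument so "nearSync" and "NEARSYNC" both match.
  local mode
  mode=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$mode" in
    SYNC)       echo "standby acknowledges it has written the log" ;;
    NEARSYNC)   echo "standby acknowledges receipt into memory" ;;
    ASYNC)      echo "nothing (transaction is sent, no acknowledgement awaited)" ;;
    SUPERASYNC) echo "nothing (only the primary's own log write matters)" ;;
    *)          echo "unknown sync mode: $1" >&2; return 1 ;;
  esac
}

# Print a one-line summary per mode.
for m in SYNC NEARSYNC ASYNC SUPERASYNC; do
  printf '%-10s -> %s\n' "$m" "$(commit_wait_point "$m")"
done
```

Reading the list top to bottom, protection against transaction loss decreases while the time the primary spends waiting on the standby also decreases.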

Configuring HADR

Synchronizing time
It is very important that the clocks on the primary and
standby servers are in sync. Clocks that are out of sync
will create problems for the logs being sent to the
standby server.

Use NTP (Network Time Protocol) to keep the time in sync.
Make sure both the primary and standby servers are
configured in the same time zone.
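
A quick way to sanity-check the clocks is to compare epoch timestamps taken on both hosts. The sketch below is an assumption-laden illustration: clock_skew_ok is a hypothetical helper, and the 5-second default tolerance is chosen for the example, not taken from DB2 or TSM documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two epoch timestamps (in seconds)
# differ by no more than max_skew seconds. The default of 5 seconds is an
# assumption for illustration, not a DB2 or TSM requirement.
clock_skew_ok() {
  local primary_epoch=$1 standby_epoch=$2 max_skew=${3:-5}
  local skew=$(( primary_epoch - standby_epoch ))
  # Take the absolute value of the difference.
  if [ "$skew" -lt 0 ]; then
    skew=$(( -skew ))
  fi
  [ "$skew" -le "$max_skew" ]
}

# Example with literal timestamps. In practice each value would come from
# running `date +%s` on the respective host (for example over ssh).
if clock_skew_ok 1307565490 1307565492; then
  echo "clocks within tolerance"
else
  echo "clock skew too large"
fi
```

Comparing epoch seconds sidesteps time zone differences, so the time zone setting on each host still needs to be checked separately.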

Priming the Database


You must initialize (prime) the TSMDB1 database on the standby system
so that subsequent log updates that occur on the primary TSMDB1
database can be applied to the standby TSMDB1 database. You do the
priming using the DB2 backup db utility.
1. Back up the database on the primary TSM host.
   tsm:server1> halt
   su - tsminst1
   db2 backup db tsmdb1 to /space/mx/hadrtest
   Do not start the server (do not issue the dsmserv command).
2. Restore the primary TSM host database to the server on the standby
   host. If /space/mx/hadrtest is not on shared storage, copy the backup
   image to the standby host first.
   Stop the Tivoli Storage Manager server if it is running.
   su - tsminst1
   db2 drop db tsmdb1
   db2 restore db tsmdb1 from /space/mx/hadrtest
3. Do not start the server (do not issue the dsmserv command).

Finding available ports


You will need to identify an available TCP port for
HADR_LOCAL_SVC and HADR_REMOTE_SVC that can be used
by both the primary and standby servers.

Work with your system administrator to identify the port.

Try to keep the port number consistent with other port numbers
used by DB2.

Make sure the same port number is available on both the primary
and standby servers.
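
One way to check a candidate port is to scan the list of listening sockets on each host. The sketch below assumes a Linux host where `ss -ltn` prints the local address in the fourth column; port_in_use is a hypothetical helper, and the sample listing is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -ltn` style output on stdin and succeeds
# if some local address in column 4 ends in ":<port>".
port_in_use() {
  awk -v port="$1" '
    { n = split($4, parts, ":") }            # the port follows the last colon
    n > 1 && parts[n] == port { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Made-up listing standing in for real `ss -ltn` output.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*
LISTEN 0      128    [::]:51500         [::]:*'

for p in 51500 60010; do
  if printf '%s\n' "$sample" | port_in_use "$p"; then
    echo "port $p is already in use"
  else
    echo "port $p looks free"
  fi
done
```

On a live system the same check would be `ss -ltn | port_in_use 60010`, run on both the primary and the standby. A free port at the moment of the check can still be reserved for another service, so confirm the choice with the system administrator.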

DB2 Parameters to configure

Parameter          Description
hadr_local_host    Local host name
hadr_local_svc     Local TCP/IP port to be assigned to the HADR process
hadr_remote_host   Remote host name that the peer HADR resides on
hadr_remote_inst   Remote database instance that the peer TSMDB1
                   database resides in
hadr_remote_svc    Remote port of the peer HADR process
hadr_syncmode      How primary log writes are synchronized with the
                   standby
hadr_timeout       Time (in seconds) the HADR process waits before a
                   communication attempt with the peer is considered
                   as failed

Configuring primary host in HADR pair


Issue the following sequence of commands to configure HADR on the primary
node. Ensure that you are running under the DB2 instance that TSMDB1 is
contained in.
su - tsminst1
db2 update db cfg for tsmdb1 using hadr_local_host host.primaryname.com
db2 update db cfg for tsmdb1 using hadr_local_svc 60010
db2 update db cfg for tsmdb1 using hadr_remote_host host.standbyname.com
db2 update db cfg for tsmdb1 using hadr_remote_inst tsminst1
db2 update db cfg for tsmdb1 using hadr_remote_svc 60010
db2 update db cfg for tsmdb1 using hadr_syncmode SYNC
db2 update db cfg for tsmdb1 using hadr_timeout 120

Configuring standby host in HADR pair


Issue the following sequence of commands to configure HADR on the standby
node. Ensure that you are running under the DB2 instance that TSMDB1 is
contained in.
su - tsminst1
db2 update db cfg for tsmdb1 using hadr_local_host host.standbyname.com
db2 update db cfg for tsmdb1 using hadr_local_svc 60010
db2 update db cfg for tsmdb1 using hadr_remote_host host.primaryname.com
db2 update db cfg for tsmdb1 using hadr_remote_inst tsminst1
db2 update db cfg for tsmdb1 using hadr_remote_svc 60010
db2 update db cfg for tsmdb1 using hadr_syncmode SYNC
db2 update db cfg for tsmdb1 using hadr_timeout 120
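
The two command sequences are mirror images: only the local and remote host names swap. A small generator can make that symmetry explicit and reduce copy-and-paste typos. This is a sketch only; hadr_cfg_commands is a hypothetical helper that prints the commands without running them, and the database, instance, port, sync mode, and timeout values are the ones used on the slides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the db2 commands for one side
# of the HADR pair. Arguments: local host name, remote host name.
hadr_cfg_commands() {
  local local_host=$1 remote_host=$2
  # Values taken from the examples above; adjust for your environment.
  local db=tsmdb1 inst=tsminst1 port=60010 syncmode=SYNC timeout=120
  local setting
  for setting in \
    "hadr_local_host $local_host" \
    "hadr_local_svc $port" \
    "hadr_remote_host $remote_host" \
    "hadr_remote_inst $inst" \
    "hadr_remote_svc $port" \
    "hadr_syncmode $syncmode" \
    "hadr_timeout $timeout"
  do
    echo "db2 update db cfg for $db using $setting"
  done
}

echo "# run on the primary as tsminst1"
hadr_cfg_commands host.primaryname.com host.standbyname.com
echo "# run on the standby as tsminst1"
hadr_cfg_commands host.standbyname.com host.primaryname.com
```

Printing the commands first gives you a chance to review them before pasting them into a session on each host.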

Starting HADR
Start HADR on the standby server first. Issue the following commands:
su - tsminst1
db2 start hadr on db tsmdb1 as standby

Then start HADR on the primary server. Issue the following commands:
su - tsminst1
db2 start hadr on db tsmdb1 as primary

At this point you can bring up your TSM server on the primary using the
dsmserv command.

Monitoring HADR
To check the status of HADR use the db2pd command:
db2pd -db tsmdb1 -hadr
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SYNC

STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
PRIMARY_MEMBER_HOST = host.primaryname.com
PRIMARY_INSTANCE = tsminst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = host.standbyname.com
STANDBY_INSTANCE = tsminst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 06/08/2011 13:38:10.199479 (1307565490)

HEARTBEAT_INTERVAL(seconds) = 25
HADR_TIMEOUT(seconds) = 100
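
For routine checks, the fields of most interest are HADR_STATE and HADR_CONNECT_STATUS. The sketch below pulls single fields out of output shaped like the listing above. hadr_field is a hypothetical helper, and the sample text is an abbreviated copy of the listing, not fresh db2pd output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the value of one "NAME = value" field from
# db2pd -hadr style output supplied on stdin. Matching is on the exact
# field name, so HADR_CONNECT_STATUS does not match HADR_CONNECT_STATUS_TIME.
hadr_field() {
  awk -v name="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # tolerate indented output
      i = index(line, " = ")
      if (i > 0 && substr(line, 1, i - 1) == name) {
        print substr(line, i + 3)
        exit
      }
    }
  '
}

# Abbreviated sample taken from the listing above.
sample='HADR_ROLE = PRIMARY
HADR_SYNCMODE = SYNC
HADR_STATE = PEER
HADR_CONNECT_STATUS = CONNECTED'

state=$(printf '%s\n' "$sample" | hadr_field HADR_STATE)
conn=$(printf '%s\n' "$sample" | hadr_field HADR_CONNECT_STATUS)
if [ "$state" = "PEER" ] && [ "$conn" = "CONNECTED" ]; then
  echo "HADR pair is connected and in peer state"
else
  echo "HADR needs attention: state=$state connection=$conn"
fi
```

Against a live system the input would be piped from `db2pd -db tsmdb1 -hadr` instead of the sample variable.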

TSM Configuration

Recoverability of data on secondary server


If your primary TSM server fails, you need to ensure you can
recover data from the standby server.

Ensure data is backed up to a copy storage pool that resides on a
NAS file system accessible by both the primary and standby servers.
Ensure the paths to the volumes are identical on both servers.

You should be using deduplication for copy storage pools.

Clients will need to have their option files reconfigured to point to
the standby server, or a DNS change will need to be put in place.

This solution only gives you the ability to recover data from the
standby server in case of a failure of the primary.

Backing up data to standby server

If your primary server is going to be down for an extended period of
time and you need to back up data to the standby server, there are
some considerations to keep in mind.
Ensure you have storage available for primary disk pools.
Make sure DNS changes have been made to route client data to
the standby server.
Keep in mind that the network from the primary site to the standby
site may not have the bandwidth to accommodate the full backup load.

Failover Types
Failovers are done when you need your standby database to
take over the role of the primary database. There are 2 types of
failover that can be performed with HADR.
Graceful: a planned failover, generally done when both the primary
and standby are available and running fine.
Forced: usually performed when the primary is unavailable and
you want to start up TSM on the standby.

Performing a Failover
If the need arises to have the standby server take over for the primary, you will need
to perform a failover. Whenever possible perform a graceful failover, but if you must
you may do a forced takeover. For a graceful failover complete sections A and B
below; for a forced failover complete section B only.

A. On the primary server
   1. Halt TSM.
   2. Update the role on the primary to change it to a standby.
      su - tsminst1
      db2 start hadr on db tsmdb1 as standby

B. On the standby server
   1. Issue the takeover command.
      su - tsminst1
      db2 takeover hadr on db tsmdb1 by force
   2. Verify each server has the correct role with the db2pd command.
   3. Start TSM with the dsmserv command.

To switch roles back to the original configuration, perform the same steps above but
substitute primary for standby and standby for primary.
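
The relationship between the two failover types (graceful runs sections A and B, forced runs section B only) can be written down as a dry-run planner. This is a sketch under stated assumptions: failover_plan is a hypothetical helper that only prints the steps from this slide with a host label in front, and it executes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the steps for a graceful or forced failover
# as laid out above. Nothing is executed.
failover_plan() {
  case "$1" in
    graceful)
      # Section A: only possible while the primary is still reachable.
      echo "primary: halt TSM"
      echo "primary: db2 start hadr on db tsmdb1 as standby"
      ;;
    forced)
      ;;  # section A is skipped when the primary is unavailable
    *)
      echo "usage: failover_plan graceful|forced" >&2
      return 1
      ;;
  esac
  # Section B: always performed on the standby.
  echo "standby: db2 takeover hadr on db tsmdb1 by force"
  echo "standby: verify roles with db2pd"
  echo "standby: start TSM with dsmserv"
}

failover_plan graceful
```

Keeping the plan as printed text lets an operator review the sequence before typing the commands on each host as tsminst1.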

Considerations before failing back


Before failing back, make sure the DB2 databases on
both the primary and standby are in sync. This is best done using
the db2pd monitoring command:

db2pd -db tsmdb1 -hadr


STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315


STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
STANDBY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
STANDBY_RECV_BUF_SIZE(pages) = 16
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 0
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = Y
STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N
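
In the listing above, the two gap counters (HADR_LOG_GAP and STANDBY_RECV_REPLAY_GAP) are both 0, which is what "in sync" looks like. A minimal sketch of an automated check follows. hadr_in_sync is a hypothetical helper, and treating "both gaps equal 0" as the sync criterion is an assumption drawn from this slide rather than an official rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when both gap counters in db2pd -hadr style
# output (read from stdin) are present and equal to 0.
hadr_in_sync() {
  awk '
    { sub(/^[ \t]+/, "") }                                  # tolerate indentation
    index($0, "HADR_LOG_GAP(bytes) = ") == 1            { log_gap = $NF; seen++ }
    index($0, "STANDBY_RECV_REPLAY_GAP(bytes) = ") == 1 { replay_gap = $NF; seen++ }
    END {
      ok = (seen == 2 && log_gap == 0 && replay_gap == 0)
      exit !ok
    }
  '
}

# Two lines copied from the listing above.
sample='HADR_LOG_GAP(bytes) = 0
STANDBY_RECV_REPLAY_GAP(bytes) = 0'

if printf '%s\n' "$sample" | hadr_in_sync; then
  echo "databases are in sync"
else
  echo "standby is still catching up"
fi
```

On a live system the input would come from `db2pd -db tsmdb1 -hadr`. A missing counter is treated as "not in sync" so that an unexpected output format fails safe.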

Final Thoughts
HADR is easy to set up.
You already have it with TSM 6.x.
A lot of thought should go into which sync level you use.
Plan ahead for running on the HADR standby for an extended
period of time.

Questions?
