
High Availability Architecture Design with RAC

Ruiping Sun

Abstract
HA infrastructure design is the integration of database, clustering, network, and storage technologies, in which Oracle Real Application Clusters (RAC) plays the key role. This presentation addresses how to integrate RAC with Linux servers, networking, clustering, and HA storage. Because developing HA SAN and NAS is a critical and challenging task, there will be special emphasis on the storage technologies involved in RAC.

Section One: HA Requirements and Infrastructure


Application requirements
Explain RAC design strategy and configurations
Describe Oracle CRS components and functionalities
Identify storage design and HA integration issues
HA design and fail over scenarios

Application Requirements analysis


Multiple instances (> 10)
Total data size > 100 TB
OLTP DB, e.g. POD
Report DB, e.g. POW
Data warehouse, e.g. EDW

Availability requirements Analysis

True 24x365 availability: five nines to six nines

99.9999%: about 32 s/year unscheduled down time
99.999%: about 315 s/year unscheduled down time
99.99%: about 3,154 s/year unscheduled down time

Handles scheduled and unscheduled down time
Scheduled: APP/OS/Oracle patches, configuration changes
Unscheduled: server/network/storage failures
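The down time budgets quoted for each availability level are rounded values of a simple calculation: the unavailable fraction multiplied by the number of seconds in a year. A minimal sketch of that arithmetic follows; the function name is illustrative and a 365-day year is assumed.

```python
# Seconds in a 365-day year (assumption: leap years are ignored).
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_budget_seconds(availability_percent):
    """Return the allowed unscheduled down time per year, in seconds."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for level in (99.99, 99.999, 99.9999):
        print(f"{level}% -> {downtime_budget_seconds(level):.1f} s/year")
```

Each additional nine divides the budget by ten, which is why six nines leaves only about half a minute per year.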


Disaster recovery (proof):
Onsite
Offsite (BCP)



RAC Architecture
[Diagram: Application 1, Application 2, ... Application x connect to a 10-node cluster (Node1 ... Node10) with one instance per node (instance 1 ... instance 10), on shared storage (NAS or SAN: switches + storage).]

RAC Components

CRS
Run on each node as a daemon
Manage cluster membership and node failure
Manage VIP
Manage configuration and services

RAC DB
Run on each node, 1 instance/node
Request services from CRS
Utilize the inter-connection to communicate with other instances

Storage
NAS:
SAN: requires Oracle ASM (per instance)

CRS: Clustering Components


Application servers: communicate with RAC DB through VIP

[Diagram: application servers on the LAN reach a 10-node cluster (Node1 ... Node10) through Network Switch1 and Network Switch2; the cluster sits on shared storage (NAS or SAN: switches + storage).]

CRS on each node (Node1 ... Node10):
IP resource management (VIP)
Voting disk: tiebreaker
OCR: configuration repository
Provide services to the RAC instance
Do not process ANY application data

Bonding is the key

Inter connection
2 x 1G Ethernet
RAC inter-communication
Cluster management

Public connection
2 x 1G Ethernet
All application traffic

Linux bonding: public and inter connection
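With Linux bonding, the kernel bonding driver reports the state of each bonded link in a status file such as /proc/net/bonding/bond0. The sketch below shows one way to spot a degraded bond (one of the two links down) from that text. The sample text is hypothetical and only modeled on the driver's usual layout; interface names, the bonding mode, and the exact fields vary by system and kernel version.

```python
def parse_bond_status(text):
    """Return {slave_interface: mii_status} from bonding driver status text."""
    slaves = {}
    current = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The first "MII Status" line describes the bond itself and is
            # skipped because no slave section has started yet.
            slaves[current] = value
    return slaves


def bond_is_degraded(slaves):
    """True if any bonded link is not reported as up."""
    return any(status != "up" for status in slaves.values())


# Hypothetical sample, for illustration only.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

if __name__ == "__main__":
    status = parse_bond_status(SAMPLE)
    print(status, "degraded:", bond_is_degraded(status))
```

A degraded bond still carries traffic, so a check of this kind is useful for alerting before the second link fails.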



Fail over scenarios


DB server MTBF: 2-20 years
Network/SAN switch MTBF: ?
NIC MTBF: 20-40 years
HBA MTBF: 20-40 years
Disk MTBF: 100 years
Storage subsystem MTBF: ?
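These MTBF figures can be turned into rough planning numbers. The sketch below assumes a constant failure rate and independent failures, and it needs a repair time (MTTR) that the slides do not give, so the 24-hour value used in the example is purely hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def expected_failures_per_year(unit_count, mtbf_years):
    """Expected failures per year across a fleet (constant failure rate)."""
    return unit_count / mtbf_years


def availability(mtbf_years, mttr_hours):
    """Steady-state availability of one unit: MTBF / (MTBF + MTTR)."""
    mtbf_hours = mtbf_years * HOURS_PER_YEAR
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_pair_availability(single_availability):
    """Availability of two independent units when either one is enough."""
    return 1.0 - (1.0 - single_availability) ** 2


if __name__ == "__main__":
    # 10 DB servers at the pessimistic 2-year MTBF.
    print(expected_failures_per_year(10, 2))
    # One NIC at 20-year MTBF with a hypothetical 24-hour repair time.
    nic = availability(20, 24)
    print(nic, redundant_pair_availability(nic))
```

At the low end of the server range, a 10-node cluster should expect several node failures a year, which is why fail over has to be routine rather than exceptional.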


Fail over scenarios


Public Network failures

1 of 2 network interfaces or 1 of 2 switch ports
Failure detection: < 20 seconds
Total fail over time: < 60 seconds
During this time, no error is returned to the application
Fail over is totally transparent to applications; from the application's point of view, it appears as a slow network response
Application timeout must be set > 60 s

2 of 2 network interfaces or 2 of 2 switch ports
All current sessions on that node: lost connection
All new connection requests: redirected to a different node


Fail over scenarios


Inter connection failure

1 of 2:
Detection takes < 5 s
Total fail over time < 10 s
Transparent to application and RAC

2 of 2:
Causes the RAC instance to fail


Fail over scenarios


1 cluster node failure
CRS detects the failure
An Oracle instance on a surviving node recovers the failed instance; this takes < 75 s (so application timeout must be > 75 s)
75 s?: depends on how much recovery has to be done

All current sessions on the failed node: lost
All current sessions on surviving nodes: hang (< 75 s)
All new connection requests: hang
All hung sessions and connection requests continue after recovery is done
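The fail over windows quoted in these scenarios imply a lower bound on the application timeout: it has to exceed the longest window. A minimal sketch of that check follows; the scenario labels and function names are illustrative, and the numbers are the upper bounds from the slides.

```python
# Worst-case fail over windows in seconds, taken from the slides.
FAILOVER_WINDOWS_S = {
    "public network, 1 of 2 links": 60,
    "inter connection, 1 of 2 links": 10,
    "cluster node failure": 75,
}


def min_application_timeout(windows=FAILOVER_WINDOWS_S):
    """Return the longest fail over window; the timeout must exceed it."""
    return max(windows.values())


def uncovered_scenarios(app_timeout_s, windows=FAILOVER_WINDOWS_S):
    """List scenarios whose fail over window is not shorter than the timeout."""
    return sorted(
        name for name, window in windows.items() if window >= app_timeout_s
    )


if __name__ == "__main__":
    print("timeout must exceed", min_application_timeout(), "s")
    print("not covered by a 60 s timeout:", uncovered_scenarios(60))
```

Node failure recovery dominates, so a timeout sized only for network fail over would still surface errors to the application while instance recovery is running.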


Section Two: Real World Examples

RM Data warehouse
SAN example
Bonding example

NAS example
NAS example
4-way active-active bonding example


SAN Storage Architecture


[Diagram: 10-node cluster; each node (Node1 ... Node10) runs RAC, ASM, and the PowerPath driver, and connects through Fibre switch1 and Fibre switch2 to the CX4 (LUNs, SNAP).]


RM Example <1> Overview


RM Example <2> Bonding


RM Example <3> SAN


NAS Example <1>


4 Way Active/Active Bonding


Section Three: Conclusion

RAC is the key technology to achieve higher availability

Because of the RAC architecture, no hardware failure should cause an outage
Application malfunctions and human errors are the only contributors to outages

Future Infrastructure Enhancements


Section Three: Conclusion
Integrated HA Infrastructure
[Diagram: Application, RAC, Cluster (CRS), HA storage]

