RAC Overview

High Availability Architecture Design with RAC
Ruiping Sun
Abstract
HA infrastructure design is the integration of database, clustering, network, and storage technologies, in which Oracle Real Application Clusters (RAC) plays the key role. This presentation addresses how to integrate RAC with Linux servers, networking, clustering, and HA storage. Because developing HA SAN and NAS storage is a critical and challenging task, there is special emphasis on the technologies involved in RAC.
Section One: HA Requirements and Infrastructure

Application requirements
Explain RAC design strategy and configurations
Describe Oracle CRS components and functionality
Identify storage design and HA integration issues
HA design and failover scenarios
Application Requirements analysis

Multiple instances (> 10)
Total data size > 100 TB
OLTP DB, e.g. POD
Report DB, e.g. POW
Data warehouse, e.g. EDW
Availability requirements Analysis
True 24x365 availability: five nines to six nines
  99.9999%: about 31.5 s/year of unscheduled downtime
  99.999%: about 315 s/year of unscheduled downtime
  99.99%: about 3,154 s/year of unscheduled downtime
Handles both scheduled and unscheduled downtime
  Scheduled: application, OS, and Oracle patches; configuration changes
  Unscheduled: server, network, and storage failures
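The downtime budgets listed above follow directly from the availability percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the function name is illustrative and not part of the presentation:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the downtime budget (seconds per year) for an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for target in (99.9999, 99.999, 99.99):
        budget = downtime_seconds_per_year(target)
        print(f"{target}% -> {budget:.1f} s/year")
```

Each additional nine cuts the budget by a factor of ten, which is why the failover times discussed later (tens of seconds per event) matter so much at the five-nines level.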

Disaster recovery (proof):
  Onsite
  Offsite (BCP)

RAC Architecture
[Diagram] Application 1, Application 2, ... Application x
10-node cluster: Node1 (instance 1) ... Node10 (instance 10)
Storage (NAS or SAN): switches + storage
RAC Components
CRS
  Runs on each node as a daemon
  Manages cluster membership and node failure
  Manages the VIP
  Manages configuration and services
RAC DB
  Runs on each node, one instance per node
  Requests services from CRS
  Uses the interconnect to communicate with other instances
Storage
  NAS
  SAN: requires Oracle ASM (per instance)
CRS: Clustering Components

Application servers: communicate with the RAC DB through the VIP over the LAN
Network Switch1, Network Switch2
10-node cluster, Node1 ... Node10; CRS on each node:
  IP resource management (VIP)
  Voting disk: tiebreaker
  OCR: configuration repository
  Provides services to the RAC instance
  Does not process ANY application data
Storage (NAS or SAN): switches + storage
Bonding is the key
Interconnect
  2 x 1G Ethernet
  RAC inter-instance communication
  Cluster management
Public connection
  2 x 1G Ethernet
  All application traffic
Linux bonding: public and interconnect
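On Linux, the bonding driver reports the state of each bond in a text file under /proc/net/bonding/. A minimal sketch of checking a bonded pair for a failed member, assuming the usual "Key: value" layout of that file; the sample text, the interface names eth0 and eth1, and the function names are illustrative assumptions, not taken from the presentation:

```python
# Hand-written sample in the style of /proc/net/bonding/bond0 (illustrative only).
SAMPLE_STATUS = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bonding_status(text: str) -> dict:
    """Extract the bonding mode, active interface, and per-interface link state."""
    status = {"mode": None, "active_slave": None, "slaves": {}}
    current_slave = None
    for raw_line in text.splitlines():
        key, sep, value = raw_line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value" structure
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active_slave"] = value
        elif key == "Slave Interface":
            current_slave = value
            status["slaves"][current_slave] = None
        elif key == "MII Status" and current_slave is not None:
            # Only record link state once we are inside a per-interface block.
            status["slaves"][current_slave] = value
    return status


def degraded_slaves(status: dict) -> list:
    """Return the member interfaces whose link state is not 'up'."""
    return [name for name, state in status["slaves"].items() if state != "up"]


if __name__ == "__main__":
    parsed = parse_bonding_status(SAMPLE_STATUS)
    print(parsed["mode"], parsed["active_slave"], degraded_slaves(parsed))
```

A check like this only reports the "1 of 2" condition; the bond itself keeps carrying traffic on the surviving interface.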

Fail over scenarios

DB server MTBF: 2-20 years
Network/SAN switch MTBF: ?
NIC MTBF: 20-40 years
HBA MTBF: 20-40 years
Disk MTBF: 100 years
Storage subsystem MTBF: ?
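The MTBF figures above can be turned into rough failure probabilities to show why every component is paired. A minimal sketch, assuming an exponential failure model, independent failures, and a hypothetical 4-hour repair time (MTTR); none of these assumptions come from the presentation:

```python
import math

HOURS_PER_YEAR = 365 * 24


def annual_failure_probability(mtbf_years: float) -> float:
    """Probability of at least one failure in a year (exponential model, assumed)."""
    return 1.0 - math.exp(-1.0 / mtbf_years)


def unavailability(mtbf_years: float, mttr_hours: float) -> float:
    """Long-run fraction of time a single component is down."""
    mtbf_hours = mtbf_years * HOURS_PER_YEAR
    return mttr_hours / (mtbf_hours + mttr_hours)


def redundant_pair_unavailability(mtbf_years: float, mttr_hours: float) -> float:
    """Fraction of time both members of a pair are down (independence assumed)."""
    return unavailability(mtbf_years, mttr_hours) ** 2


if __name__ == "__main__":
    mttr_hours = 4.0  # hypothetical repair time, not from the slides
    for name, mtbf_years in (("DB server (worst case)", 2), ("NIC or HBA", 20)):
        single = unavailability(mtbf_years, mttr_hours)
        pair = redundant_pair_unavailability(mtbf_years, mttr_hours)
        print(f"{name}: P(fail in a year)={annual_failure_probability(mtbf_years):.3f}, "
              f"single={single:.2e}, pair={pair:.2e}")
```

Under these assumptions a lone component with a 2-year MTBF is expected to fail often enough to threaten a five-nines target by itself, while a redundant pair is orders of magnitude less likely to be down entirely.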
Fail over scenarios

Public network failures
1 of 2 network interfaces or 1 of 2 switch ports:
  Failure detection: < 20 seconds
  Total failover time: < 60 seconds
  During this time, no error is returned to the application. Failover is fully transparent to applications; from the application's point of view, it looks like slow network response.
  Application timeout must be set > 60 s
2 of 2 network interfaces or 2 of 2 switch ports:
  All current sessions on that node: connection lost
  All new connection requests: redirected to a different node

Fail over scenarios

Interconnect failure
1 of 2:
  Detection takes < 5 s
  Total failover time < 10 s
  Transparent to the application and to RAC
2 of 2:
  Causes the RAC instance to fail
Fail over scenarios

Failure of 1 cluster node
  CRS detects the failure
  An Oracle instance on a surviving node recovers the failed instance; this takes < 75 s (so the application timeout must be > 75 s)
  Why 75 s? It depends on how much recovery has to be done

All current sessions on the failed node: lost
All current sessions on surviving nodes: hang (< 75 s)
All new connection requests: hang
All hung sessions and connection requests continue after recovery is done
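The failover bounds quoted in these scenarios (60 s for a public network path, 10 s for an interconnect path, 75 s for node recovery) imply a lower limit on the application timeout. A minimal sketch of that check; the safety margin and the function names are hypothetical, not part of the presentation:

```python
# Worst-case failover times in seconds, as quoted in the failover scenarios.
FAILOVER_BOUNDS_S = {
    "public network, 1 of 2 paths lost": 60,
    "interconnect, 1 of 2 paths lost": 10,
    "cluster node failure (instance recovery)": 75,
}


def minimum_application_timeout(bounds_s: dict, margin_s: int = 15) -> int:
    """Return a timeout that exceeds every failover bound by margin_s (assumed margin)."""
    return max(bounds_s.values()) + margin_s


def covers_all_scenarios(timeout_s: int, bounds_s: dict) -> bool:
    """True if the timeout is strictly greater than every failover bound."""
    return all(timeout_s > bound for bound in bounds_s.values())


if __name__ == "__main__":
    timeout = minimum_application_timeout(FAILOVER_BOUNDS_S)
    print(f"suggested timeout: {timeout} s")
    print("75 s is enough:", covers_all_scenarios(75, FAILOVER_BOUNDS_S))
```

The comparison is strict because a timeout equal to the recovery bound can still expire just before hung sessions resume.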
Section Two: Real-World Examples
RM data warehouse
  SAN example
  Bonding example
NAS example
  NAS example
  4-way active-active bonding example

SAN Storage Architecture

10 nodes Cluster
Node1 ... Node10, each running: RAC, ASM, PowerPath driver
Fibre switch1, Fibre switch2
CX4: LUNs, SNAP
RM Example <1> Overview
RM Example <2> Bonding
RM Example <3> SAN
NAS Example <1>
4 Way Active/Active Bonding
Section Three: Conclusion
RAC is the key technology for achieving higher availability
Because of the RAC architecture, hardware failures should not cause an outage
Application malfunctions and human errors are then the only contributors to outages
Future Infrastructure Enhancements
Section Three: Conclusion
Integrated HA infrastructure
  RAC
  HA storage
  Cluster (CRS)
  Application