Ruiping Sun
Abstract
HA infrastructure design integrates database, clustering, network, and storage technologies, with Oracle Real Application Clusters (RAC) playing the key role. This presentation addresses how to integrate RAC with Linux servers, networking, clustering, and HA storage. Because developing HA SAN and NAS storage is a critical and challenging task, special emphasis is placed on the storage technologies involved in RAC.
Application requirements
Explain RAC design strategy and configurations
Describe Oracle CRS components and functionalities
Identify storage design and HA integration issues
HA design and failover scenarios
Multiple instances (> 10)
Total data size > 100 TB
OLTP DB, e.g. POD
Report DB, e.g. POW
Data warehouse, e.g. EDW
Availability targets (unscheduled downtime per year, rounded):
  99.9999%: about 30 sec/year
  99.999%: about 300 sec/year
  99.99%: about 3000 sec/year
Downtime categories:
  Scheduled: application/OS/Oracle patches, configuration changes
  Unscheduled: server/network/storage failures
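The downtime figures for each availability target follow from simple arithmetic on the number of seconds in a year. A minimal sketch, assuming a 365-day year (the function name is illustrative, not from the presentation):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a non-leap year


def downtime_budget_seconds(availability_percent: float) -> float:
    """Return the maximum unscheduled downtime per year, in seconds,
    that still meets the given availability target."""
    return SECONDS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.9999, 99.999, 99.99):
        budget = downtime_budget_seconds(target)
        print(f"{target}% availability -> {budget:.1f} s/year")
```

The exact budgets come out slightly above the rounded slide figures (roughly 31.5, 315, and 3154 seconds), so the 30/300/3000 numbers are conservative rules of thumb.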
Disaster recovery (proof):
RAC Architecture
[Diagram: Application 1, Application 2, ... Application x connect to a 10-node cluster; Node1 through Node10 each run one instance (instance 1 through instance 10).]
RAC Components
CRS
  Runs on each node as a daemon
  Manages cluster membership and node failure
  Manages VIPs
  Manages configuration and services
RAC DB
  Runs on each node, 1 instance per node
  Requests services from CRS
  Uses the interconnect to communicate with other instances
Storage
  NAS:
  SAN: requires Oracle ASM (per instance)
[Diagram: a 10-node cluster (Node1 through Node10) connected through Network Switch1 and Network Switch2.]
CRS on each node (Node1 through Node10)
  IP resource management (VIP)
  Voting disk: tiebreaker
  OCR: configuration repository
  Provides services to the RAC instance
  Does not process ANY application data
Interconnect: 2 x 1G Ethernet
  RAC inter-communication
  Cluster management
Public connection: 2 x 1G Ethernet
  All application traffic
DB server MTBF: 2-20 years
Network/SAN switch MTBF: ?
NIC MTBF: 20-40 years
HBA MTBF: 20-40 years
Disk MTBF: 100 years
Storage subsystem MTBF: ?
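MTBF figures only translate into availability once a repair time is assumed. The sketch below uses the standard steady-state formula A = MTBF / (MTBF + MTTR) and the usual expression for two independent redundant components. The 4-hour MTTR is a hypothetical value chosen for illustration; the presentation gives no repair times.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_years: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    mtbf_hours = mtbf_years * HOURS_PER_YEAR
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_pair(single: float) -> float:
    """Availability of two independent components when either one suffices."""
    return 1.0 - (1.0 - single) ** 2


if __name__ == "__main__":
    ASSUMED_MTTR_HOURS = 4.0  # hypothetical repair time, not from the slides

    # MTBF values taken from the pessimistic end of the slide's ranges.
    for name, mtbf_years in (("DB server", 2), ("NIC", 20), ("HBA", 20)):
        single = availability(mtbf_years, ASSUMED_MTTR_HOURS)
        paired = redundant_pair(single)
        print(f"{name}: single={single:.6f}, redundant pair={paired:.9f}")
```

Under these assumptions, duplicating a component squares its unavailability, which is the reasoning behind pairing switches, NICs, and HBAs. The independence assumption is optimistic: failures with a common cause (shared power, firmware, human error) are not covered by it.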
Failure detection: < 20 seconds
Total failover time: < 60 seconds
During this time, no error is returned to the application. Failover is transparent to applications: from the application's point of view, it looks like slow network response.
The application timeout must be set to > 60 seconds.
All current sessions on the failed node lose their connection.
All new connection requests are redirected to a different node.
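On the application side, the timing rules above amount to waiting longer than the failover window before giving up. The following is a minimal sketch of such a retry loop, not an Oracle API: `operation` is a hypothetical callable that opens a connection or runs a request and raises `ConnectionError` on failure, and the 15-second margin and 5-second retry interval are assumed values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

FAILOVER_BUDGET_S = 60.0  # from the slide: total failover time < 60 seconds


def call_with_failover_retry(
    operation: Callable[[], T],
    deadline_s: float = FAILOVER_BUDGET_S + 15.0,  # assumed safety margin
    retry_interval_s: float = 5.0,                 # assumed pause between tries
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` until it succeeds or the deadline would be exceeded.

    The deadline is deliberately longer than the failover window, so a node
    failure looks like a slow response rather than an application error.
    """
    start = clock()
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up only if another retry would push past the deadline.
            if clock() - start + retry_interval_s > deadline_s:
                raise
            sleep(retry_interval_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting. In practice, each retry would open a new connection, since sessions on the failed node are lost and new requests are redirected to a surviving node.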
All current sessions on the failed node: lost
All current sessions on surviving nodes: hang (< 75 seconds)
All new connection requests: hang
All hung sessions and connection requests continue once recovery is done
RM Data warehouse
NAS example
[Diagram: Fibre switch1 and Fibre switch2]
Section 3 Conclusion
Thanks to the RAC architecture, no hardware failure should cause an outage.
Application malfunctions and human errors are the only remaining contributors to outages.
Integrated HA Infrastructure
RAC
HA storage
Cluster (CRS)
Application