
Oracle RAC

The Effectiveness
of Scaling Out

Erik Peterson
RAC Development
Server Technologies
Oracle

Agenda

 What is RAC?
 Why does it Scale?
 Why Scale Out?
 Scale Out Examples
 Scale Out or Scale Up?
 Improving Scalability

Oracle RAC
Architecture

[Architecture diagram] Application servers and users connect over the network; a centralized management console oversees the cluster. Clustered database instances share a cache across a low-latency, high-speed interconnect (VIA or proprietary, through a high-speed switch) and attach via a storage area network fabric (hub or switch) to a mirrored disk subsystem. Callouts: "No Single Point Of Failure" and "Drive and Exploit Industry Advances in Clustering".
Why Does RAC Scale to Many Nodes?
Messaging cost independent of cluster size
[Diagram] Updating block 10: the requester (Instance A) sends a message (1) to the block's GCS master (Instance B), which forwards the request (2) to the current holder (Instance C); the holder then ships the current block directly to the requester (3). At most three instances participate, no matter how many are in the cluster.
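A minimal sketch of this constant-cost lookup, in Python (illustrative only, not Oracle's code; the instance names, hash-based master assignment, and block values are all invented):

# Illustrative sketch of the 3-hop Global Cache Service request path.
class Cluster:
    def __init__(self, names):
        self.names = sorted(names)
        self.cache = {n: {} for n in names}   # per-instance buffer caches
        self.holder = {}                      # GCS directory: block -> holder

    def master_of(self, block):
        # Every block hashes to exactly one master, however large the cluster.
        return self.names[block % len(self.names)]

    def request(self, requester, block):
        master = self.master_of(block)        # hop 1: requester -> master
        hops = 1
        holder = self.holder.get(block)       # master consults its directory
        if holder and holder != requester:
            hops += 1                         # hop 2: master -> current holder
            self.cache[requester][block] = self.cache[holder][block]
            hops += 1                         # hop 3: holder ships the block
        else:
            self.cache[requester][block] = "read-from-disk"
        self.holder[block] = requester
        return hops                           # never more than 3

c = Cluster(["A", "B", "C"])
c.cache["C"][10] = "v200"                     # Instance C holds block 10
c.holder[10] = "C"
print(c.request("A", 10))                     # -> 3, even with 100 instances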
Why Scale Out?

Economies of Scale
[Chart] System cost and performance vs. number of CPUs in a single node, with a marked "sweet spot"; beyond it, cost grows faster than performance.

Higher Availability

[Diagram] A four-node cluster runs CRM, OE, Payroll, and Email; when a node fails, its workload shifts to the surviving nodes.

Node failure has less impact


Scale Out
or
Scale Up

Major Bank testing Siebel

[Chart] Response time vs. user load for three configurations: 10 CPU, 20 CPU, and 2 x 10 CPU, with the 2 x 10 CPU curve labeled "Response time with RAC".

Scale Out at a Fraction of the Cost
[Chart] Paired SMP and RAC results (scale 0 to 120,000) at 16, 48, 64, and 72 CPUs; each pairing is an audited or customer benchmark, and the largest RAC configurations consist of ten smaller nodes (x10).
HP RAC vs. SMP TPC-C
[Chart] RAC delivers 118% of the big SMP result on the same 1.5 GHz Itanium2 CPUs:
– SMP, 1 node x 64 CPUs: 1,008,144 tpmC at $8.33/tpmC
– RAC, 16 nodes x 4 CPUs: 1,184,893 tpmC at $5.52/tpmC

As of September 13, 2006: HP Integrity Superdome, 1,008,144.49 tpmC, $8.33/tpmC, available 4/14/04. HP Integrity rx5670, 1,184,893 tpmC, $5.52/tpmC, available 4/30/06. Source: Transaction Processing Performance Council (TPC), www.tpc.org
HP RAC vs. SMP TPC-C

 Details
– Same CPUs (Intel Itanium2 1.5 GHz)
– RAC had less total memory (768 GB vs. 1,024 GB)
– RAC / Linux vs. SMP / HP-UX
 List prices for processor hardware*
– SMP: $7,921,505
– Cluster: $2,620,866

* Includes processors, OS, memory, cluster interconnects, and support
IBM Oracle Applications Benchmark
Oracle Applications Standard Benchmark (OASB)

[Chart] RAC delivers 96% of the big SMP result on the same 1.7 GHz Power4+ CPUs:
– SMP, 1 node x 16 CPUs: 22,008 users*
– RAC, 4 nodes x 4 CPUs: 21,168 users*

*Audited
Source: http://www.oracle.com/apps_benchmark
IBM Oracle Applications Benchmark

 Details
– Same IBM CPUs (1.7 GHz)
– Same total memory (256 GB)
– Same operating system (AIX 5L)
 List prices for processor hardware*
– SMP $1,405,750 list
– Cluster $788,000 list **

* Includes processors, OS, memory, and cluster interconnects


** RAC software adds $320,000 list

Customer Loan Processing Benchmark
[Chart] RAC delivers 115% of the big SMP result on the same Sun 900 MHz CPUs:
– SMP, 1 node (1 x 48 CPUs): 27,500 loan applications processed per minute
– RAC, 2 nodes (2 x 24 CPUs): 31,578 loan applications processed per minute
Customer Loan Processing Benchmark

 Details
– Same CPUs (Sun 900 MHz)
– Same total memory (288 GB)
– Same Sun Solaris operating system
 Price comparison (N/A)
– Cluster was constructed by partitioning a 48-CPU Sun Fire 15K server into two 24-CPU domains

Customer Loan Processing Benchmark

 Customer is one of the world's largest financial services companies
 Goal: Determine which platforms can meet
peak processing load requirements
– Process 30 million loan applications in 15 hours
 2 million/hour or 33,333/minute sustained
– Mix of transactions, e.g.,
 T1 = 13 reads, 4 inserts, 2 updates
 T2 = 5 reads, 2 updates
 T3 = 1 read, 6 updates

Customer Telecom Benchmark
[Chart] RAC delivers 103% of the big SMP result on the same Sun 1.2 GHz CPUs:
– SMP, 1 node (1 x 72 CPUs): 423,420 transactions/hour
– RAC, 2 nodes (2 x 36 CPUs): 437,070 transactions/hour
Customer Telecom Benchmark

 Customer is one of the world's largest telecom companies
 Goal: Determine which platforms can meet peak processing load requirements
– 400,000 transactions per hour
– < 0.8 second response time
– Complex mix of transactions, e.g.,
 Complex Routing 1%
 Route Report 46%
 Status Change 16%
 Simple Enquiry 2%
 Node Query 12%
 Location Query 11%
 Termination Query 12%
Customer Telecom Benchmark

 Details
– Same CPUs (Sun 1.2 GHz)
– Same total memory (96 GB)
– Same Sun Solaris operating system
 Price comparison (N/A)
– Cluster was constructed by partitioning a Sun SMP server into two 36-CPU domains

Cost Savings
[Chart] CPU costs (list prices) to handle 550,000 transactions/hour:
– (1) 72-CPU UNIX SMP: $2,700,000
– (4) 4-CPU Dell Itanium 2-based 7250s: $160,000
– (10) 2-CPU Dell Xeon-based 1750s: $60,000

Joint tests done by Dell, EMC & Oracle, based on a telecom application workload.
Hitachi BladeSymphony Test Using
a Real Stock Exchange Workload
[Chart] Throughput (tps) by number of nodes (1, 2, 4, 6, 8) and CPUs per node, with per-step scalability between 64% and 88%:
– 2 CPUs: 640 → 1,200 → 2,208 tps
– 4 CPUs: 1,200 → 2,200 → 3,888 → 6,624 tps
– 8 CPUs: 2,000 → 3,600 → 6,480 tps
Workloads of 10 and 24 business units were used.
Why Does RAC Scale Well?

 Scalable interconnect vs. complex system bus
– Max 3-way protocol, regardless of cluster size
– 99% of customers stay within the capacity of a single GigE interconnect, and another can easily be added
– SMP requires synchronization for every load and store operation (millions+/sec); RAC generates 3 to 4 orders of magnitude fewer messages (see the back-of-envelope sketch below)
 Extends the limits of a single machine architecture
– No need for a complex bus
– Virtually unlimited HBAs, memory, and CPUs
– Avoids in-memory contention
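A back-of-envelope sketch of the bandwidth claim, with purely hypothetical rates (neither traffic figure below is a measurement):

# Back-of-envelope only: both rates are assumed, not measured.
BLOCK = 8 * 1024                  # bytes; a typical Oracle block size
SHIPPED_PER_SEC = 5_000           # hypothetical inter-instance block transfers
GIGE = 125_000_000                # 1 Gb/s in bytes/sec

print(f"interconnect use: {SHIPPED_PER_SEC * BLOCK / GIGE:.0%}")   # ~33% of one GigE

BUS_SYNC_PER_SEC = 100_000_000    # hypothetical SMP load/store coordination rate
print(f"ratio: ~{BUS_SYNC_PER_SEC // SHIPPED_PER_SEC:,}x")         # ~20,000x, i.e. 4 orders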

Scale Out
Examples

Increasing # of Wide Clusters

[Chart] Source: New Linux environments in the RAC Customer Tracking System
7 of the 8 Biggest Linux DWs Run on RAC

[Chart] Cluster sizes: 16, 4, 8, 8, 3, 2, and 8 nodes.

Source: Winter Corp 2005 Survey
Amazon RAC/Linux DW – Oracle10g

                 Query DW   ETL
#Nodes x #CPUs   16 x 4     8 x 4
Total DB         61 TB      10 TB
Data             51 TB      7 TB
Index            2 TB       1-2 TB
Disk             71 TB      36 TB

In Oct 2005 Amazon merged its 35 TB Clickstream DW's data into its 25 TB Query DW. It ran rock-solid through its end-of-year peak season. At the time, Clickstream and Query were already 2 of the Top 10 DWs in the world: www.wintercorp.com. The Query DW is now 61 TB+.
Amazon DW Modular Architecture
Oracle10g RAC
Amazon's RAC is so cost-effective they run 2 concurrently and still save money!

[Diagram] Pipeline: 1. Extract from source systems (Extract Servers) → 2. Integrate, transform, and denormalize (STG1, ETL/Staging, 8 nodes x 4 CPUs, driven by an ETL Manager) → 3. Query and analyze (ADS1, Atomic Data Store, feeding the Query DW, 16 nodes x 4 CPUs) → 4. Data access and publishing (users via a DSS UI client). A second, identical pair of RAC clusters (STG2/ADS2 and a second Query DW) means "no need for backup" for active online data.
Amazon.com – Oracle10g
 61 TB production Query DWs
– 100,000 queries/week (mostly complex ad hoc)
– Amazon runs 2 identical 61 TB+ query DWs, loaded concurrently. Config for each is:
 16-node RAC/Linux cluster
 Oracle10g R1 RAC using ASM on Red Hat Enterprise Linux 3
 16 HP DL580s, each with four 3-GHz CPUs
 71 HP StorageWorks MSA1000s
 8 32-port Brocade switches
 1 Gigabit interconnect
– DW metrics
 Each holds 51 TB of raw data (growing at 2x per year)
 Each is 61 TB total database size with only 2 TB of indexes
 71 TB total disk for each

Amazon.com DW Statistics
                   2000    2001    2002    2003    2004    2005    2006
DW Size            ~1 TB   3.5 TB  10 TB   15 TB   20 TB   25 TB   61 TB
DW Data            ~1 TB   2.3 TB  9 TB    13 TB   18 TB   23 TB   51 TB
Users              330     512     800     830     830     830     830
Queries/Day        630     1000    4200+   6,000   7,000   8,000   14,000
% < SLA            63%     77%     80%+    80%+    80%+    80%+    80%+
Direct SQL Access  No      Yes     Yes     Yes     Yes     Yes     Yes
User-Pub'd Repts   No      Yes     Yes     Yes     Yes     Yes     Yes
In just 6 years:
- 50x growth in data volume
- 16x growth in query volume
- ~3x growth in number of users
- additional lines of business / product lines supported
- huge standard reporting growth -> many more partners supported
…and still meeting SLAs – with ever-improving price/performance!
Mercado Libre
 eBay in Latin America
 Runs marketplace on RAC
 Scaled incrementally as marketplace grew
[Chart] Business volume (scale 0 to 1,600,000) and number of nodes, 2004 to 2006; nodes were added incrementally as business volume grew.
Mercado Libre
Performance Characteristics

 MercadoLibre's 13-node Linux Itanium cluster:
• 460 GB RAM clusterwide
• 286 GB SGA
• 14,500 URLs/second
• 47 GB of redo per day
 Uses at most 40% of the capacity of a single Gigabit Ethernet interconnect

J2 Global

[Diagram] A single 16-node Oracle 10g Sun Solaris cluster, split between Reporting and OLTP:
 12 nodes run a Data Guard copy of production for reporting & DR
 4 nodes run several OLTP databases

Dell IT – Tests of Oracle EBS
User Count Scalability

[Chart] Users (0 to 4,000) vs. nodes (2 to 8), plotting Actual against Ideal scalability.
128 Node Scalability Proof of Concept in Japan

System Configuration Overview

 128 “blade servers” for the RAC instances
 Two NFS servers for storage
 Two workload generator servers
 Two network segments
• #1 for CSS / RAC traffic
• #2 for NFS / Application traffic

How far can RAC scale with a
single interconnect?
[Chart] Internode parallel query test results. Axes: Scalability (Elapsed time / Elapsed Time @ 1 instance), 0 to 128, vs. Degree of Parallelism (#instances), 0 to 128 in steps of 8.
AC3 - Australia

 World's Largest RAC/Linux Cluster
• High Performance Technical Computing configuration
• 155-node cluster of 2-way IA32 Dell servers
• Total purchase price < $1M AUD
 Oracle10g RAC Proof of Concept
• Red Hat Linux 3.0
• Network Appliance storage
• 63-node database cluster built
• Linear scalability demonstrated

Gas Natural Grid Environment
Clusters in Production and in Process

• Corporate DW
• SAP BI
• Electricity Dispatching
• Siebel – Europe
• SAP ERP
• Siebel – Brazil

Callouts: wide Linux RAC environments are now standard deployment; order-of-magnitude cost savings; showing scalability of OLTP & DW environments.

When does RAC Scale?

If your application scales transparently on SMP, it is realistic to expect it to scale well on RAC without any changes to the application code.

Network Resources and
Scalability
 Verify interconnect resources
– private network
– ports set to maximum bit rate (e.g., 1 Gb/sec)
– full duplex
– network buffers (e.g., socket receive buffers, RX/TX descriptors)
 Monitor the interconnect network and IPC (see the sketch below)
– bandwidth used
– discarded and dropped packets
– buffer overflows
– reassembly failures
– "lost blocks": gc blocks lost
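A minimal monitoring sketch under stated assumptions: it uses the python-oracledb driver with placeholder credentials, and reads the standard global cache statistics from v$sysstat:

# Sketch: spot-check "lost blocks" from v$sysstat with python-oracledb.
import oracledb

NAMES = ("gc cr blocks received", "gc current blocks received", "gc blocks lost")

with oracledb.connect(user="system", password="...", dsn="dbhost/orcl") as conn:
    cur = conn.cursor()
    cur.execute(
        "SELECT name, value FROM v$sysstat WHERE name IN (:1, :2, :3)", NAMES
    )
    stats = dict(cur.fetchall())

received = sum(stats.get(n, 0) for n in NAMES[:2])
lost = stats.get("gc blocks lost", 0)
# A sustained non-zero loss ratio points at the network layer, not Oracle.
print(f"received={received} lost={lost} ({100 * lost / max(received, 1):.4f}%)")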

OS and Disk Resources and
Scalability
 Higher and fixed priority for block server processes
– scheduling/starvation affects message latency
 Fewer block server processes (LMS) are usually more efficient
 Determine the maximum possible read/write IO throughput
– important for loading and querying in parallel
 Establish a baseline for write IO latency (a rough probe is sketched below)
– "slow" log writes may affect block access time
 Many long-term and transient performance and scalability problems are caused by OS and disk resource/capacity problems
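A rough, POSIX-only probe for that write-latency baseline (the file path, block size, and iteration count are arbitrary assumptions; use a real IO benchmark against the actual log device for production numbers):

# Baseline synchronous write latency, a stand-in for redo-log write behavior.
import os, time, statistics

PATH, BLOCK, N = "/tmp/io_probe.bin", 4096, 200    # hypothetical test file
buf = os.urandom(BLOCK)
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
lat = []
try:
    for _ in range(N):
        t0 = time.perf_counter()
        os.write(fd, buf)                  # O_SYNC: returns after the write is stable
        lat.append(time.perf_counter() - t0)
finally:
    os.close(fd)
    os.unlink(PATH)

print(f"median {statistics.median(lat) * 1e3:.2f} ms, max {max(lat) * 1e3:.2f} ms")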

Improving Scalability
 Tune serializing contention
– concurrent access to the same block does not scale anywhere
 Tune SQL execution
– the most efficient plan in a single instance is also the most efficient one in RAC
 Large cache for Oracle sequence numbers (see the DDL sketch below)
 "Sparse" (high PCTFREE) or small block sizes
– for small, in-memory tables with frequent concurrent access
 Parallel execution (inter- or intra-node)
– for large data scans, loads, and index rebuilds
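The sequence-cache and sparse-block recommendations, sketched as DDL issued through python-oracledb (all object names, cache sizes, and credentials are hypothetical):

# Sketch only; adapt names and sizes to your schema.
import oracledb

DDL = (
    # Big cache + NOORDER lets each instance take ranges of numbers locally
    # instead of coordinating every NEXTVAL across the cluster.
    "CREATE SEQUENCE order_id_seq CACHE 1000 NOORDER",
    # High PCTFREE spreads hot rows over more blocks, so fewer sessions
    # collide on the same block.
    "CREATE TABLE hot_lookup (id NUMBER PRIMARY KEY, val NUMBER) PCTFREE 90",
)

with oracledb.connect(user="scott", password="...", dsn="dbhost/orcl") as conn:
    cur = conn.cursor()
    for stmt in DDL:
        cur.execute(stmt)    # DDL auto-commits in Oracle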

Performance Diagnostics

 Use advisories provided by EM or ADDM
– interpreted findings and recommendations
– thresholds and alerts in infrastructure
 Use the same rationales as with one instance
– the interconnect network is an "IO" resource
 Save the AWR repository (a snapshot-listing sketch follows below)
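One possible first step, sketched with python-oracledb and placeholder credentials: list the snapshots the repository currently holds before exporting them or extending retention. DBA_HIST_SNAPSHOT is the standard AWR snapshot view:

# Sketch: enumerate AWR snapshots prior to preserving the repository.
import oracledb

with oracledb.connect(user="system", password="...", dsn="dbhost/orcl") as conn:
    cur = conn.cursor()
    cur.execute(
        "SELECT snap_id, begin_interval_time, end_interval_time "
        "FROM dba_hist_snapshot ORDER BY snap_id"
    )
    for snap_id, begin_t, end_t in cur:
        print(snap_id, begin_t, "->", end_t)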

RAC Scalability Best Practices

 Better HA with 4+ nodes
 2+ CPU nodes for OLTP, 4+ CPU nodes for DW
 Use the same scalability tuning mechanisms for RAC as you would for SMP

Q U E S T I O N S
A N S W E R S
