
6/1/2018 Document 461662.


Copyright (c) 2018, Oracle. All rights reserved.

Frequent Instance Eviction in 9i or Node Eviction in 10g/11g (Doc ID 461662.1)

In this Document

Symptoms
Changes
Cause
Solution
Configure hugepages when using large SGA size:
References

Document Details

Type: PROBLEM
Status: PUBLISHED
Last Major Update: 13-May-2014
Last Update: 22-Mar-2018

APPLIES TO:

Oracle Database - Enterprise Edition - Version 9.2.0.8 to 11.2.0.4 [Release 9.2 to 11.2]
Linux x86-64

SYMPTOMS

For 9.2.0.8, on a 4-node RAC cluster on Linux x86-64 RHEL 4.0, a node frequently becomes unresponsive and an instance is evicted
with ORA-29740.

For 10g/11g, the symptom is frequent node eviction (node reboot) with instance termination and error ORA-29702.

CHANGES

Either a new installation or a recently increased SGA size.

In this sample, there is 32GB of physical memory on the server and the SGA is set to 12GB.

CAUSE

Frequent CPU spikes (0% idle) cause a node to become unresponsive, which in turn causes an instance eviction (9i) or a node
eviction (10g/11g).

For 9i, if there is no CPU for lmon to respond to the heartbeat ping from the other node and the CPU spike lasts longer than
300 seconds, this instance will be evicted by the other instance after the timeout.

For 10g/11g, if there is no CPU for one node to respond to the other node's heartbeat ping and it does not respond within the
misscount setting (60 seconds or 30 seconds depending on platform), the node will be evicted at the CRS level and the instance
will be terminated with ORA-29702.

The CPU spike is caused by the kswapd0 process. With OSWatcher turned on (per Note 301137.1), we can see gaps in the OSWatcher
data collection, which means there was no CPU for OSWatcher to run.

Sometimes vmstat shows 0% idle CPU and high SYS CPU usage in the last output before the gap or the node reboot, and top
shows kswapd0 in the top process list. Sometimes the result is not this obvious.

For example, from the ps output on node 1, we can see that during the first 43 minutes of the 02:00 hour, kswapd0 used only 21 seconds
of CPU. But during the 4-minute gap from 02:43 to 02:47, kswapd0 used 40 seconds of CPU. During the next gap, from 02:52 to 02:59,
kswapd0 used another 33 seconds of CPU.

Similarly on node 4, during the first 47 minutes of the 02:00 hour, kswapd0 used only 16 seconds of CPU, but during the 8-minute
gap from 02:47 to 02:55, it used 1 minute 21 seconds of CPU.

This indicates that kswapd0 is working hard during the CPU spike. This can happen when a large number of memory pages need to
be maintained and hugepages is not configured.
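
The comparison above comes down to converting the cumulative TIME column that ps reports for kswapd0 into seconds and looking at how much it grows between two samples. The following is a minimal sketch of that arithmetic, assuming TIME values in [HH:]MM:SS form (the DD-HH:MM:SS form is not handled). The function names cpu_time_to_seconds and cpu_seconds_between are hypothetical helpers for illustration and are not part of OSWatcher.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of OSWatcher.

# Convert a ps TIME value ([HH:]MM:SS) into a number of seconds.
# awk is used so that fields with leading zeros (e.g. "08") are
# treated as decimal rather than octal.
cpu_time_to_seconds() {
    echo "$1" | awk -F: '{
        if (NF == 3) print $1 * 3600 + $2 * 60 + $3
        else         print $1 * 60 + $2
    }'
}

# CPU seconds consumed between two samples of the same process.
cpu_seconds_between() {
    start_s=$(cpu_time_to_seconds "$1")
    end_s=$(cpu_time_to_seconds "$2")
    echo $(( $end_s - $start_s ))
}

# Usage (sample TIME values are illustrative, not taken from the
# original ps output):
#   cpu_seconds_between 00:00:21 00:01:01    # prints 40
```

A large increase over a short wall-clock interval, such as 40 seconds of CPU within 4 minutes, is the pattern described above.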

cat /proc/meminfo shows:


HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB

There is 32GB of physical memory on the server and the maximum SGA is 12GB, but hugepages is not used, so all memory is managed in 4k
pages. The spinning in kernel mode caused by managing those memory pages can cause a CPU spike, which in turn causes an
instance eviction (or node eviction) when the heartbeat ping goes unanswered due to lack of CPU.

SOLUTION

Configure hugepages when using large SGA size:

1. Make changes to the /etc/sysctl.conf file and execute the sysctl command.


Work out vm.nr_hugepages = sga_max_size / Hugepagesize = 12GB / 2048KB = 6144 (it can be set slightly higher than this figure).

# echo "vm.nr_hugepages=6146" >> /etc/sysctl.conf


# sysctl -p
# grep HugePages_Total /proc/meminfo

If the expected value of 6146 does not appear, the system will have to be rebooted because
there is either not enough free memory or not enough physically contiguous free pages for allocation. If 6146 appears, then we are
done from an O/S standpoint.
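
The calculation in step 1 can be sketched as a small shell function. This is an illustration under stated assumptions: the function name calc_nr_hugepages is hypothetical, the SGA size is given in whole GB, and the huge page size is given in kB as shown on the Hugepagesize line of /proc/meminfo. It returns the minimum page count; the small margin used above (6146 instead of 6144) is left to the administrator.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Usage: calc_nr_hugepages <sga_max_size in GB> <Hugepagesize in kB>
calc_nr_hugepages() {
    sga_gb=$1
    hugepage_kb=$2
    # Express the SGA size in kB so both values use the same unit.
    sga_kb=$(( $sga_gb * 1024 * 1024 ))
    # Round up so that a partially filled page still gets a whole huge page.
    echo $(( ($sga_kb + $hugepage_kb - 1) / $hugepage_kb ))
}

# Example with the values from this note:
#   calc_nr_hugepages 12 2048    # prints 6144
#
# The huge page size of the running system can be read with:
#   awk '/^Hugepagesize:/ {print $2}' /proc/meminfo
```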

2. Make changes to the /etc/security/limits.conf file.


For Oracle to be able to use hugepages in RHEL 4, the following settings need to be added to
the /etc/security/limits.conf file:

oracle soft memlock 12582950


oracle hard memlock 12582950

This value is derived from:


sga_max_size = 12GB
12GB in bytes / 1024 = 12582912 KB (memlock is specified in KB)
The memlock value is then set slightly higher than the calculated value.
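
The memlock derivation in step 2 can be sketched the same way. This is a minimal illustration, not an Oracle-supplied script: the function name print_memlock_limits and its margin parameter are hypothetical, and the SGA size is assumed to be given in whole GB. The function only prints the two lines; it does not modify /etc/security/limits.conf.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Usage: print_memlock_limits <os user> <sga_max_size in GB> <margin in KB>
print_memlock_limits() {
    user=$1
    sga_gb=$2
    margin_kb=$3
    # memlock is expressed in KB: GB * 1024 * 1024, plus a small margin.
    memlock_kb=$(( $sga_gb * 1024 * 1024 + $margin_kb ))
    echo "$user soft memlock $memlock_kb"
    echo "$user hard memlock $memlock_kb"
}

# Example reproducing the values from this note (12582912 + 38):
#   print_memlock_limits oracle 12 38
#   oracle soft memlock 12582950
#   oracle hard memlock 12582950
```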

3. Reboot all RAC nodes within the cluster if needed.

4. After the hugepages setup and instance restart, check the output of "cat /proc/meminfo". It should show a large number of
hugepages being consumed, for example:

HugePages_Total: 6146
HugePages_Free: 3120 << this number decreases as more memory is consumed
Hugepagesize: 2048 kB

If HugePages_Free is close to HugePages_Total, for example:

HugePages_Total: 6146
HugePages_Free: 6140 << this number does not decrease
Hugepagesize: 2048 kB

then it is likely that the hugepages setting is insufficient. Increase vm.nr_hugepages until the hugepages are consumed
after the instance restart. Refer to the following note to obtain the recommended value:
NOTE:401749.1 Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
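
The check in step 4 can be sketched as a function that reports how many huge pages are in use (HugePages_Total minus HugePages_Free). This is an illustration only: the function name hugepages_used is hypothetical, and the optional file argument exists so the function can be tried against saved meminfo output. What counts as "close to HugePages_Total" is left to the reader, since this note does not define a threshold.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Usage: hugepages_used [meminfo file]   (defaults to /proc/meminfo)
hugepages_used() {
    meminfo=${1:-/proc/meminfo}
    # Pick the two counters and print their difference.
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free  = $2 }
         END                 { print total - free }' "$meminfo"
}

# Example with the first sample output from step 4
# (HugePages_Total: 6146, HugePages_Free: 3120):
#   hugepages_used saved_meminfo.txt    # prints 3026
```

A result near zero after the instance restart corresponds to the second sample output above, where the hugepages are not being consumed.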

Note: In a RHEL 3.0 environment, if a similar issue is experienced and the process consuming the most CPU is kscand0, consider
setting kscand_work_percent=10 (default 100) in addition to setting up hugepages.


REFERENCES

NOTE:361468.1 - HugePages on Oracle Linux 64-bit


NOTE:361670.1 - Slow Performance with High CPU Usage on 64-bit Linux with Large SGA
NOTE:401749.1 - Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
BUG:6367514 - INSTANCE EVICTIONS (SOMETIMES NESTED) IN RAC ENVIRONMENT

Related Products
Oracle Database Products > Oracle Database Suite > Oracle Database > Oracle Database - Enterprise Edition > Clusterware > Cluster Node Reboot/Eviction

Keywords
EVICTION; HEARTBEAT; HIGH CPU USAGE; HUGEPAGES; HUGETLB; INSTANCE EVICTION; NODE EVICTION

Errors
ORA-29702; ORA-29740
