
Common Shared Pool Issues

Why Bother?
The shared pool is the core component that keeps Oracle running smoothly and efficiently.

ORA-04031 Errors
When free shared pool memory is low
When free memory is fragmented

Database Hangs
When the data dictionary cache (row cache) runs into problems, for example, a large number of purge activities

How are the issues dealt with?


Flush the shared pool
Bounce the instance
Repeat the above later

Shared Pool Top Consumers


First things first: find the top consumers (a query sketch follows the list below).

Data Dictionary Cache (Row Cache)

Partition maintenance related

RAC related

SQL Cursors

Row Cache KQR L PO
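
A quick way to list the top consumers is v$sgastat; a minimal sketch (names such as "KQR L PO", "SQLA" and "free memory" will show up here):

-- Top 20 shared pool components by size
SELECT *
  FROM (SELECT name, ROUND(bytes / 1024 / 1024, 1) AS mb
          FROM v$sgastat
         WHERE pool = 'shared pool'
         ORDER BY bytes DESC)
 WHERE ROWNUM <= 20;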


Next step: the devil is in the details.

1. KQR L PO: KQR - kernel query layer row cache; L - large (S - small, M - medium, X - extra large); PO - parent object (SO - subordinate object).
2. Size per entry: S - 256 bytes, M - 512 bytes, L - 1024 bytes, X - not sure.
3. Typical KQR L PO caches: dc_objects (dc_object_ids), dc_segments, dc_histogram_defs.
4. KQR L PO too big?
dc_objects: too many table/index partitions and subpartitions
dc_segments: too many table/index partitions and subpartitions
dc_histogram_defs: looks like too many histograms. Further research points to column statistics at the global, partition and subpartition level.
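
To see which individual row caches are big and where the DLM (purge) activity is, v$rowcache can be queried directly; a minimal sketch (use gv$rowcache with inst_id in RAC):

-- Row caches ordered by activity, with their DLM request/release counts
SELECT parameter, gets, getmisses, modifications,
       dlm_requests, dlm_releases
  FROM v$rowcache
 ORDER BY gets DESC;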

The Impact of Very Large Row Cache


When free memory is low or fragmented: instance slowdown or hang, and GV$ views hang. This happens when one or more operations or queries need a large amount of shared pool memory, or want to bring a large number of data dictionary entries into the row cache. As Oracle tries to purge row cache entries to free memory, the LCK0 process becomes busy. A combination of the following waits shows up:
latch: row cache objects
latch: shared pool
row cache lock
library cache lock
DFS lock handle
reliable message
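
While the instance is still responsive, a rough count of sessions stuck on these waits helps confirm the picture; a sketch (as noted above, gv$ queries may themselves hang in a severe case):

-- Sessions currently waiting on the row cache / shared pool related events
SELECT inst_id, event, COUNT(*) AS sessions
  FROM gv$session
 WHERE event IN ('latch: row cache objects', 'latch: shared pool',
                 'row cache lock', 'library cache lock',
                 'DFS lock handle', 'reliable message')
 GROUP BY inst_id, event
 ORDER BY sessions DESC;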

Find the Source of ROW CACHE


dc_objects, dc_object_ids and dc_segments
Using dba_objects, count and group by owner and object_name.
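
A sketch of that count; objects with huge numbers of partitions/subpartitions float to the top:

-- Objects contributing the most entries to dc_objects / dc_segments
SELECT *
  FROM (SELECT owner, object_name, COUNT(*) AS cnt
          FROM dba_objects
         GROUP BY owner, object_name
         ORDER BY cnt DESC)
 WHERE ROWNUM <= 30;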

dc_histogram_defs
Using a count on hist_head$ gives a rough idea. Using the aggregated views dba_part_col_statistics or dba_subpart_col_statistics, count and group by owner and table_name. Check whether global stats are missing for the tables concerned.
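
A sketch of those counts (hist_head$ is a SYS-owned dictionary table, so run the first query as SYS or with suitable privileges):

-- Rough upper bound on dc_histogram_defs entries
SELECT COUNT(*) FROM sys.hist_head$;

-- Tables whose partition/subpartition column statistics dominate
SELECT owner, table_name, COUNT(*) AS col_stats_cnt
  FROM dba_part_col_statistics
 GROUP BY owner, table_name
 ORDER BY col_stats_cnt DESC;

SELECT owner, table_name, COUNT(*) AS col_stats_cnt
  FROM dba_subpart_col_statistics
 GROUP BY owner, table_name
 ORDER BY col_stats_cnt DESC;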

dc_histogram_data
Using dba_part_histograms and dba_subpart_histograms, count and group by owner and table_name. Check the stats gathering procedures, especially for partition and subpartition level stats.
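
A sketch of the histogram bucket counts at partition and subpartition level:

SELECT owner, table_name, COUNT(*) AS bucket_cnt
  FROM dba_part_histograms
 GROUP BY owner, table_name
 ORDER BY bucket_cnt DESC;

SELECT owner, table_name, COUNT(*) AS bucket_cnt
  FROM dba_subpart_histograms
 GROUP BY owner, table_name
 ORDER BY bucket_cnt DESC;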

Who Triggered Large Row Cache Activities?


Using fixed views
Look for SQLs with high execution counts related to obj$, seg$ and hist_head$, usually with SYS as the parsing schema. Track the change of KQR L PO together with the execution counts of those recursive queries. Because these are recursive calls, we still need to find the source user queries. If we suspect certain queries, we should try to trace them when no other types of queries are running at the same time.
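
A sketch for spotting those recursive dictionary queries; the LIKE patterns are my assumption about how the recursive SQL text looks:

-- Recursive SYS queries against obj$/seg$/hist_head$ with high execution counts
SELECT sql_id, executions, parse_calls, SUBSTR(sql_text, 1, 80) AS sql_text
  FROM v$sqlarea
 WHERE parsing_schema_name = 'SYS'
   AND (LOWER(sql_text) LIKE '%obj$%'
        OR LOWER(sql_text) LIKE '%seg$%'
        OR LOWER(sql_text) LIKE '%hist_head$%')
 ORDER BY executions DESC;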

Using AWR
dba_hist_sgastat: look for large changes between snaps for individual entries. dba_hist_rowcache_summary: look for large changes of total_usage between snaps for individual row caches; also pay attention to dlm_releases (purge) and dlm_requests. For the snaps with large dlm_releases, use dba_hist_sqlstat to check whether any new queries use a large SQL area, or whether any new queries access the tables with a large number of partitions/subpartitions.
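
A sketch of the AWR drill-down, using LAG to compute the per-snap deltas for a few caches (verify the column names of dba_hist_rowcache_summary on your version):

-- Per-snapshot deltas of row cache usage and DLM activity
SELECT snap_id, instance_number, parameter,
       total_usage - LAG(total_usage)
         OVER (PARTITION BY instance_number, parameter ORDER BY snap_id) AS usage_diff,
       dlm_requests - LAG(dlm_requests)
         OVER (PARTITION BY instance_number, parameter ORDER BY snap_id) AS dlm_requests_diff,
       dlm_releases - LAG(dlm_releases)
         OVER (PARTITION BY instance_number, parameter ORDER BY snap_id) AS dlm_releases_diff
  FROM dba_hist_rowcache_summary
 WHERE parameter IN ('dc_histogram_defs', 'dc_segments', 'dc_objects')
 ORDER BY parameter, instance_number, snap_id;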

1. Here the dc_histogram_defs DLM_REQUESTS count is very close to the execution count of SQL 96g93hntrzjtr. Its usage count is smaller because we also have a DLM_RELEASES count, which usually means row cache purge. If the SQL execution count and those row cache counts increase/decrease very quickly within a short period, we should check what types of queries are the likely troublemakers.
2. The data from dc_segments is linked to another query, 9tgj4g8y4wy8.

1. Here the dc_segments DLM_REQUESTS count is very close to the execution count of SQL 2ym6hhar30r73. Its usage count is smaller because we also have a DLM_RELEASES count, which usually means row cache purge. If the SQL execution count and those row cache counts increase/decrease very quickly within a short period, we should check what types of queries are the likely troublemakers.
2. The data from dc_object_ids is linked to another query, 5ngzsfstg8tmy.
3. I am not sure why I got service SYS$BACKGROUND for SQL 6cr55dpp3n44a. It looks like it links to dc_objects.

The stats job started during this snap.

New entries were added. Pay attention to the times with large DLM_REQUESTS_DIFF and/or DLM_RELEASE_DIFF.

Short of free memory, Oracle started to purge older entries. When the purge count is large, the LCK0 process will be very busy, hence the DB slowdown or hang.

Possible Workarounds


Reduce sudden, large shared pool memory requests. For example, in PIEDB the issue is large parallel query cursors crossing instances; reducing the DOP or forcing the query not to cross instances reduces the sharable memory usage.
Use node affinity to split the tasks across nodes, so that one task always runs on one node except on failover. This way a single node will not take the requests for all the (unnecessary) row cache entries, and the chance of one task squeezing the row cache entries of other tasks out of the shared pool is much smaller.
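
A minimal sketch of the DOP-related knobs mentioned above, assuming 11.2+ for parallel_force_local and using a hypothetical table name:

-- Keep parallel slaves on the local instance so PX cursors do not cross instances
ALTER SESSION SET parallel_force_local = TRUE;

-- Cap the degree of parallelism on a (hypothetical) heavily partitioned table
ALTER TABLE app_owner.sales_fact PARALLEL 4;

-- Or lower the DOP per statement with a hint
SELECT /*+ PARALLEL(t, 4) */ COUNT(*)
  FROM app_owner.sales_fact t;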

Large PRTMV

1. Bug 9578670 - ORA-4031 from frequent Partition Maintenance Operation [ID 9578670.8]
2. When Oracle adds/drops a partition, the library cache object related to the table becomes invalid.
3. Inside v$db_object_cache, we will see multiple entries for the table name with type MULTI-VERSIONED OBJECT. (In 10g, they will have the type shown as invalid.)
4. Unfortunately, memory leaks cause these invalid objects to stay inside the shared pool forever.
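
A sketch to see how much memory those leftover objects hold; the type string is the one observed above (on 10g filter on the invalid type instead):

-- Invalidated library cache objects left behind by partition maintenance
SELECT owner, name, type, COUNT(*) AS copies,
       ROUND(SUM(sharable_mem) / 1024 / 1024, 1) AS mb
  FROM v$db_object_cache
 WHERE type = 'MULTI-VERSIONED OBJECT'
 GROUP BY owner, name, type
 ORDER BY mb DESC;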

SQLA - High Version Counts

1. SQLA of the instance (acpm51p1) is at 1,717 MB. SQL 193t90vhkylj4 uses 1,534 MB, with a version count of 18,191.
2. SQL with high shared memory usage can be found using v$sqlarea.
3. The reason for a high version count can be found using v$sql_shared_cursor. The query can take a pretty long time if a SQL has a very high version count, so it is better to check an individual child cursor by specifying child_number.
4. In this case, the high version count is caused by PX_MISMATCH, an issue when using gv$ views. Other important reasons include PQ_SLAVE_MISMATCH, ROLL_INVALID_MISMATCH and BIND_MISMATCH. 11g also introduced another annoying mismatch: USE_FEEDBACK_STATS, from cardinality feedback.
5. One type of PX_MISMATCH and PQ_SLAVE_MISMATCH is caused by default parallel degrees on a small table or its indexes when the query itself is also very small.
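
A sketch of the two lookups described above; the SQL_ID is the one from this example, child_number 1 is just a placeholder, and use_feedback_stats exists only from 11.2 on:

-- Step 1: cursors with the largest shared memory and version counts
SELECT *
  FROM (SELECT sql_id, version_count,
               ROUND(sharable_mem / 1024 / 1024, 1) AS mb, executions
          FROM v$sqlarea
         ORDER BY sharable_mem DESC)
 WHERE ROWNUM <= 20;

-- Step 2: why one child cursor of a suspect SQL was not shared
SELECT sql_id, child_number, px_mismatch, pq_slave_mismatch,
       roll_invalid_mismatch, bind_mismatch, use_feedback_stats
  FROM v$sql_shared_cursor
 WHERE sql_id = '193t90vhkylj4'
   AND child_number = 1;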

1. No dominant SQL and no high version counts behind the high shared memory usage.
2. The major consumers of the SQL area are CTAS DDL queries. Not sure why Oracle wants to cache so many DDL queries.
3. The impact: normal user queries are aged out very quickly.

1. A lot of queries look similar, with the same plan hash value but different SQL_IDs, a typical case of not using bind variables.
2. The query patterns can be identified using v$db_object_cache (a sketch follows).
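
A sketch of finding those patterns; grouping on a text prefix in v$db_object_cache follows the approach above, and force_matching_signature in v$sqlarea is an alternative:

-- Cursors whose first 60 characters repeat many times (literal SQL)
SELECT *
  FROM (SELECT SUBSTR(name, 1, 60) AS sql_prefix, COUNT(*) AS cnt,
               ROUND(SUM(sharable_mem) / 1024 / 1024, 1) AS mb
          FROM v$db_object_cache
         WHERE type = 'CURSOR'
         GROUP BY SUBSTR(name, 1, 60)
         ORDER BY cnt DESC)
 WHERE ROWNUM <= 20;

-- Alternative: statements that would collapse into one under CURSOR_SHARING=FORCE
SELECT force_matching_signature, COUNT(*) AS cnt
  FROM v$sqlarea
 WHERE force_matching_signature <> 0
 GROUP BY force_matching_signature
HAVING COUNT(*) > 50
 ORDER BY cnt DESC;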

ORA-04031 When Shared Pool Has Large Free Memory

Oracle divides the shared pool into subpools, depending on the CPU count (one subpool for every 4 CPUs, up to 7) and the available memory. The purpose is to have more shared pool latches and so reduce latch contention. The related hidden parameter is _kghdsidx_count. Memory usage might not be balanced among the subpools. The example here (PIEDB) has 4 subpools; index 0 is the common, unallocated memory. Once its free memory reaches zero, an ORA-04031 error can happen when one subpool (for example, 2) has very low free memory, even though another subpool (4 here) has plenty of free memory. The workaround is to reduce the subpool count, even down to a single one.
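
A sketch for checking and reducing the subpool count; X$ views require SYS, and changing a hidden parameter should only be done with Oracle Support's agreement:

-- Current shared pool subpool count (hidden parameter)
SELECT a.ksppinm AS param_name, b.ksppstvl AS param_value
  FROM x$ksppi a, x$ksppcv b
 WHERE a.indx = b.indx
   AND a.ksppinm = '_kghdsidx_count';

-- Workaround from the text: force a single subpool, then restart the instance
ALTER SYSTEM SET "_kghdsidx_count" = 1 SCOPE = SPFILE;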
