
Trapped Inside Nested Table

I was looking for some simple long-running queries to prepare some slides about the Oracle performance analyzer. One query on the RMX NE1 ADX database caught my attention during the long-running query scan.

The real-time session monitor shows that one query has been running for more than 3 hours, and more queries are waiting on it (enq: TM contention). The query is one of the RMX ADX database's hourly jobs for analytic data; it usually takes 30 - 40 minutes.
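The waiters can be listed with a quick look at V$SESSION. This is only a sketch, filtering on the TM enqueue wait event:

    -- Sessions currently stuck on the TM enqueue, and who is blocking them.
    SELECT sid, serial#, blocking_session, event, sql_id, seconds_in_wait
    FROM   v$session
    WHERE  event LIKE 'enq: TM%';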

Execution-specific active session history is very useful for identifying bottlenecks inside an execution plan. For most query performance analysis, execution-specific waits and the SQL plan monitor are sufficient. The surprise here is that the majority of the time is CPU usage with no execution plan line id.
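As a minimal sketch, the execution-specific ASH breakdown can be pulled with a query along these lines; the &sql_id and &sql_exec_id substitution variables stand in for the actual execution being investigated:

    -- Where does one execution spend its time, by plan line and wait event?
    -- Each ASH sample represents roughly one second of DB time.
    SELECT sql_plan_line_id,
           session_state,
           event,
           COUNT(*) AS ash_samples
    FROM   v$active_session_history
    WHERE  sql_id      = '&sql_id'
    AND    sql_exec_id = &sql_exec_id
    GROUP  BY sql_plan_line_id, session_state, event
    ORDER  BY ash_samples DESC;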

The real-time SQL plan monitor is very useful for checking query resource usage (for example, physical IO and memory) and progress. Physical IO is not high here. On the other hand, Oracle does not measure external table IO usage. Each hourly job usually inserts around 5M rows; so far, only half (2.5M) have been inserted. Now we need to identify the source of the high CPU usage.
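For reference, the same real-time SQL monitor data can be pulled as a text report through DBMS_SQLTUNE; a sketch, with &sql_id standing in for the statement's actual SQL_ID:

    -- Text version of the real-time SQL monitor report for one statement.
    SELECT DBMS_SQLTUNE.report_sql_monitor(
             sql_id       => '&sql_id',
             type         => 'TEXT',
             report_level => 'ALL') AS report
    FROM   dual;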

Even though the query uses 8 parallel processes, the SQL monitor shows that the majority of the work is done by the query coordinator session. The query coordinator session has high CPU usage, possibly from high logical IO (BUFFER_GETS).

Session stats are another useful set of metrics for figuring out what the query is actually doing. Here are the session stat snapshots, their differences, and the per-second averages over 279 seconds for the main session. Logical IO is very high; for example, consistent gets run at 400,699/sec. Other interesting statistics shown here are active txn count during cleanout and cleanout - number of ktugct calls. Other important statistics not shown in the limited screen space are db block gets, recursive calls, and session logical reads. Since the main query is a very simple INSERT ... SELECT, the high recursive call rate of more than 200/sec is very interesting.
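The snapshots can be taken with something along the lines of the query below, run twice and diffed over the interval; &sid stands in for the coordinator's session id:

    -- Snapshot of selected session statistics for one session.
    SELECT sn.name, st.value
    FROM   v$sesstat  st
    JOIN   v$statname sn ON sn.statistic# = st.statistic#
    WHERE  st.sid = &sid
    AND    sn.name IN ('consistent gets',
                       'db block gets',
                       'session logical reads',
                       'recursive calls',
                       'active txn count during cleanout',
                       'cleanout - number of ktugct calls')
    ORDER  BY sn.name;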

The open cursors of a specific session are very useful for identifying behind-the-scenes work and Oracle internal queries. The query with SQL_ID fm3bwqvkd3msr looks interesting.
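The open cursors can be listed from V$OPEN_CURSOR; a minimal sketch, with &sid again standing in for the coordinator's session id:

    -- Cursors currently open in one session; recursive SQL shows up here.
    SELECT oc.sql_id, oc.sql_text
    FROM   v$open_cursor oc
    WHERE  oc.sid = &sid
    ORDER  BY oc.sql_id;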

The query fm3bwqvkd3msr is very active: it has a very high execution count (2,980,856,436), and the logical IO counter (BUFFER_GETS) has most likely wrapped around many times.
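The cumulative counters for the recursive statement can be confirmed in V$SQL:

    -- Cumulative execution statistics for the recursive insert.
    SELECT sql_id, executions, buffer_gets, cpu_time, elapsed_time
    FROM   v$sql
    WHERE  sql_id = 'fm3bwqvkd3msr';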

The recursive query is an INSERT into a nested table, ANL_NESTED_RESERVE_PRICE, with one row per call. The destination table ANL_SERVE_FACT_RESERVE_PRICE has a column that uses a nested table. The table itself uses composite partitioning: range partitioned by YMDH (DATE) and hash subpartitioned by ENTITY_ID.

One of the blocked sessions (639) is the partition retention job on the table in question. Its delete has run 1,180 times, with 10K rows deleted per run.
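What session 639 is currently executing, and how many times it has run, can be checked by joining V$SESSION to V$SQL; a sketch:

    -- Current statement of the blocked retention session and its run counts.
    SELECT s.sid, s.status, s.event, q.sql_id, q.executions, q.rows_processed
    FROM   v$session s
    JOIN   v$sql q ON  q.sql_id       = s.sql_id
                   AND q.child_number = s.sql_child_number
    WHERE  s.sid = 639;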

Here is part of the retention procedure. It first batch-deletes rows from a partition of the fact table, then drops the partition once it is empty. The retention job was enabled in the morning; since then, the ETL query has been suffering. The question is: why can't the partitions be dropped directly?
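A minimal sketch of that batch-delete-then-drop pattern follows; the partition name and the loop are illustrative, not the actual retention code:

    DECLARE
      -- Hypothetical partition name; the real job derives it from the retention window.
      l_part    VARCHAR2(30) := 'P_2013051900';
      l_deleted PLS_INTEGER;
    BEGIN
      LOOP
        -- Delete in batches of 10K rows, as observed in the blocked session.
        EXECUTE IMMEDIATE
          'DELETE FROM anl_serve_fact_reserve_price PARTITION (' || l_part || ')'
          || ' WHERE ROWNUM <= 10000';
        l_deleted := SQL%ROWCOUNT;
        COMMIT;
        EXIT WHEN l_deleted = 0;
      END LOOP;
      -- Only an empty partition can be dropped without hitting the nested
      -- table's foreign key constraint error.
      EXECUTE IMMEDIATE
        'ALTER TABLE anl_serve_fact_reserve_price DROP PARTITION ' || l_part;
    END;
    /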

Nested Table

Nested tables (and other user-defined types) are Oracle's answer to object-oriented technology. They can be used directly as table columns, just like NUMBER, VARCHAR2, etc. The data of a nested table is stored in a separate table whose name is given in the main table's DDL. For each row of the main table, a hidden 16-byte column is generated as a unique key; inside the nested table, a column named NESTED_TABLE_ID is generated with the same value. The NESTED_TABLE_ID and the hidden column in the main table are linked by a foreign key. If the main table is partitioned, the nested table can be stored locally along with the same partitions. But this does not work if the main table uses composite partitioning such as (range, hash); in that case the nested table will not be partitioned, and partition maintenance such as DROP PARTITION on a non-empty partition will hit a foreign key constraint error. That is why the retention script batch-deletes rows before dropping the partition.
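The mapping between the parent table and its storage table can be checked in DBA_NESTED_TABLES; a quick sketch, with the owner filter omitted:

    -- Parent table, nested table column, and the storage table holding its rows.
    SELECT parent_table_name, parent_table_column, table_name
    FROM   dba_nested_tables
    WHERE  parent_table_name = 'ANL_SERVE_FACT_RESERVE_PRICE';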

A user-defined type reserve_price_object_type is defined, which contains several NUMBER columns. A nested table type reserve_price_table_type is defined on it. The fact table ANL_SERVE_FACT_RESERVE_PRICE DDL uses NESTED TABLE reserve_price STORE AS anl_nested_reserve_price to name the nested table ANL_NESTED_RESERVE_PRICE. An index is created on column NESTED_TABLE_ID (the foreign key column). The table size is around 1TB and the index size is around 800GB, with no partitions (which can be verified from DBA_SEGMENTS).
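A minimal DDL sketch matching that description is shown below; the object type's attribute names, the subpartition count, the partition bound, and the index name are assumptions, not the actual DDL:

    CREATE TYPE reserve_price_object_type AS OBJECT (
      reserve_price  NUMBER,   -- attribute names are illustrative
      floor_price    NUMBER,
      currency_id    NUMBER
    );
    /
    CREATE TYPE reserve_price_table_type AS TABLE OF reserve_price_object_type;
    /
    CREATE TABLE anl_serve_fact_reserve_price (
      ymdh           DATE,
      entity_id      NUMBER,
      reserve_price  reserve_price_table_type
    )
    NESTED TABLE reserve_price STORE AS anl_nested_reserve_price
    PARTITION BY RANGE (ymdh)
    SUBPARTITION BY HASH (entity_id) SUBPARTITIONS 16
    (
      PARTITION p_2013052600
        VALUES LESS THAN (TO_DATE('2013-05-26 17', 'YYYY-MM-DD HH24'))
    );

    -- Index on the nested table's foreign key column (index name assumed).
    CREATE INDEX anl_nested_reserve_price_ix
      ON anl_nested_reserve_price (nested_table_id);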

Why Use Nested Table?


When used in stored procedures, especially to pass data to or retrieve data from Oracle from a client application, user-defined types like nested tables can be very powerful. For example, in online shopping, all the information from one order, such as user information, billing information, shipping information, all items, etc., can be passed to the backend procedure in a single call. When used as storage, they can remove redundancy and save space. For example, if each RMX event has multiple reserve prices (I am wondering if that is actually the case), we only need to store one row per event. Without a nested table, we would have to store one row for each reserve price, which is how the RMX ADX database dealt with segment_ids in opportunity data.
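As an illustration of the single-call pattern, a procedure can take the whole collection as one parameter. This is just a sketch: the procedure name, the staging table, and the attribute names (reused from the earlier DDL sketch) are made up:

    CREATE OR REPLACE PROCEDURE load_reserve_prices (
      p_entity_id IN NUMBER,
      p_prices    IN reserve_price_table_type   -- entire set arrives in one call
    ) AS
    BEGIN
      -- Unnest the collection and insert it in a single SQL statement.
      INSERT INTO reserve_price_staging (entity_id, reserve_price)
      SELECT p_entity_id, t.reserve_price
      FROM   TABLE(p_prices) t;
    END;
    /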

Why Not Use Nested Table?


For large tables, nested tables have some serious performance drawbacks, based on what was learned from analyzing this case. Partition maintenance issue: we have seen the problem when the main table uses composite partitioning. The nested table cannot be partitioned, which causes dropping a non-empty partition to fail. Dropping partitions by first batch-deleting records increases the logical IO on the nested table during ETL by 50-100 times or more. ETL, even direct load, does not work in parallel: although the ETL job can read the source data in parallel, all the actual insertions happen on the query coordinator session, so this is not efficient if the source data is very large. The load into the nested table cannot even be batched: for each record to be inserted, Oracle starts a SQL statement, parses it, and executes it, and there is also one index to maintain and a foreign key to check. Even without this logical IO explosion, taking 30-40 minutes to direct-insert 5M rows with a DOP of 8 for an hourly job is not efficient.

The following shows the impact of the retention job with batch deletion. This is the result from one hourly job: 201305261600, UTC. SQL_ID: bkbw6m0cxg9un. Unfortunately, we don't have any idea how to resolve it. The RMX data team has to start a new table for ETL and use a UNION in the reporting queries.

                                         Elapsed Time (Sec)   CPU Time (Sec)   LIO (buffer_gets)
NE1 (With Retention Job Deployed)                    74,381           74,234      52,304,137,655
AC4 (Without Retention Job Deployed)                  1,582            1,523          84,998,139

The following is the 5-minute session stats for the query mentioned on the previous slide, for the query coordinator session.
