
Quickly improve SQL performance with dbms_stats

Oracle Tips by Burleson Consulting


May 8, 2003 - Revised April 27, 2005

The old-fashioned “analyze table” and dbms_utility methods for generating CBO
statistics are obsolete and somewhat dangerous to SQL performance. This is because the
cost-based SQL optimizer (CBO) relies on the quality of the statistics to choose the best
execution plan for all SQL statements. The dbms_stats utility does a far better job of
estimating statistics, especially for large partitioned tables, and better statistics result in
faster SQL execution plans.

Let’s see how dbms_stats works. It’s easy! Here is a sample execution of dbms_stats
for an entire schema:

exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size repeat', -
degree => 34 -
)

The options clause controls which objects get analyzed. When
GATHER AUTO is specified, the only additional valid parameters are ownname, stattab,
statid, objlist and statown; all other parameter settings are ignored.

exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO' -
)

There are several values for the options parameter that we need to know about:

• gather – re-analyzes the whole schema.

• gather empty – only analyzes tables that have no existing statistics.

• gather stale – only re-analyzes tables with more than 10% modifications
(inserts, updates, deletes). A stale-only run is sketched after this list.

• gather auto – re-analyzes objects that currently have no statistics and
objects with stale statistics. Using gather auto is like combining gather
stale and gather empty.
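
For example, a stale-only pass over the SCOTT schema can be run as shown below. This is
just a sketch in the same SQL*Plus style as the examples above; the schema name is
illustrative:

exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER STALE', -
estimate_percent => dbms_stats.auto_sample_size -
)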

Note that both gather stale and gather auto require monitoring. If you issue the “alter
table xxx monitoring” command, Oracle tracks changed tables in the
dba_tab_modifications view. Below we see that the exact numbers of inserts, updates and
deletes are tracked since the last analysis of statistics.

SQL> desc dba_tab_modifications;

 Name                            Type
 ------------------------------- -------------
 TABLE_OWNER                     VARCHAR2(30)
 TABLE_NAME                      VARCHAR2(30)
 PARTITION_NAME                  VARCHAR2(30)
 SUBPARTITION_NAME               VARCHAR2(30)
 INSERTS                         NUMBER
 UPDATES                         NUMBER
 DELETES                         NUMBER
 TIMESTAMP                       DATE
 TRUNCATED                       VARCHAR2(3)
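
For example, after enabling monitoring you can query the view to see how much DML has
accumulated since statistics were last gathered. This is a sketch; the EMP table and
SCOTT schema are illustrative, and note that the counters are flushed to the view only
periodically, so very recent changes may not appear immediately:

alter table emp monitoring;

select
   table_name,
   inserts,
   updates,
   deletes,
   timestamp
from
   dba_tab_modifications
where
   table_owner = 'SCOTT';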

The most interesting of these options is the gather stale option. Because all statistics will
become stale quickly in a robust OLTP database, we must remember the rule for gather
stale is > 10% row change (based on num_rows at statistics collection time).

In a busy OLTP system, almost every table except read-only tables will therefore be
re-analyzed, so the gather stale option offers the most benefit for systems that are
largely read-only. For example, if only 5% of the database tables get significant
updates, then only 5% of the tables will be re-analyzed with the “gather stale” option.
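
If you only want to see which objects dbms_stats currently considers stale, without
re-analyzing anything, the options parameter also accepts LIST STALE, and the stale
objects are returned through the objlist collection. The sketch below assumes the
SCOTT schema and uses the documented dbms_stats.objecttab type:

set serveroutput on

declare
   l_stale dbms_stats.objecttab;
begin
   -- LIST STALE only reports stale objects; no statistics are gathered
   dbms_stats.gather_schema_stats(
      ownname => 'SCOTT',
      options => 'LIST STALE',
      objlist => l_stale);

   for i in 1 .. l_stale.count loop
      dbms_output.put_line(l_stale(i).objtype || ' ' || l_stale(i).objname);
   end loop;
end;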

The CASCADE option


When analyzing specific tables, the cascade option tells dbms_stats to also gather
statistics on all of the indexes that belong to the table being analyzed. For example, a
single call against stats$snapshot gathers statistics for the table and for every index
defined on it:

exec dbms_stats.gather_table_stats( -
ownname => 'PERFSTAT', -
tabname => 'STATS$SNAPSHOT', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size skewonly', -
cascade => true, -
degree => 7 -
)

The DEGREE Option


Note that you can also parallelize the collection of statistics because dbms_stats
performs full-table and full-index scans. When you set degree=x, Oracle will invoke
parallel query slave processes to speed up table access. Degree is usually set to about
the number of CPUs minus 1, leaving one CPU for the OPQ query coordinator.
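
A quick way to pick the value is to check the cpu_count initialization parameter and
subtract one for the query coordinator. The query below is a simple sketch against the
standard v$parameter view:

select
   to_number(value) - 1 as suggested_degree
from
   v$parameter
where
   name = 'cpu_count';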

Automating sample size with dbms_stats


Now that we have seen how the dbms_stats options work, let’s see how to specify the sample
size for dbms_stats. The following estimate_percent argument is a newer way to allow
Oracle’s dbms_stats to automatically estimate the “best” percentage of a segment to
sample when gathering statistics:

estimate_percent => dbms_stats.auto_sample_size

You can verify the accuracy of the automatic statistics sampling by looking at the
dba_tables sample_size column. It is interesting to note that Oracle chooses a
sample_size between 5% and 20% when using automatic sampling.
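
For example, this query against the standard dba_tables columns shows the sample size
and the resulting sampling percentage for each analyzed table in the SCOTT schema (the
schema name is illustrative):

select
   table_name,
   num_rows,
   sample_size,
   round(100 * sample_size / num_rows, 1) as pct_sampled
from
   dba_tables
where
   owner = 'SCOTT'
and
   num_rows > 0;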

In our next installment we will look at automating the collection of histogram data
with dbms_stats.

Arup Nanda has a great article on extended statistics with dbms_stats, including
specialty histogram analysis on function-based column data:

Next, re-gather statistics on the table and collect the extended statistics on the expression
upper(cust_name).
begin
dbms_stats.gather_table_stats (
ownname => 'ARUP',
tabname => 'CUSTOMERS',
method_opt => 'for all columns size skewonly for columns (upper(cust_name))'
);
end;

Alternatively you can define the column group as part of the gather statistics command.
You do that by placing these columns in the method_opt parameter of the
gather_table_stats procedure in dbms_stats as shown below:
begin
dbms_stats.gather_table_stats (
ownname => 'ARUP',
tabname => 'BOOKINGS',
estimate_percent => 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS(HOTEL_ID,RATE_CATEGORY)',
cascade => true
);
end;
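
In 11g and later you can confirm that the extension (the column group above, or the
upper(cust_name) expression from the earlier example) was created by querying the
dba_stat_extensions dictionary view, as sketched below:

select
   table_name,
   extension_name,
   extension
from
   dba_stat_extensions
where
   owner = 'ARUP';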

See my related dbms_stats notes


The method_opt option

The method_opt parameter for dbms_stats is very useful for refreshing statistics when the
table and index data change. The method_opt parameter is also very useful for
determining which columns require histograms.

In some cases, the distribution of values within an index will affect the CBO's decision to
use an index versus perform a full-table scan. This happens when a where clause predicate
matches a disproportionate share of the rows, making a full-table scan cheaper than index access.

Oracle histogram statistics can be created when you have a highly skewed index, where
some values have a disproportionate number of rows. In the real world, this is quite rare,
and one of the most common mistakes with the CBO is the unnecessary introduction of
histograms into the CBO statistics. As a general rule, histograms should be used only when a
column's skew warrants a change to the execution plan.

To aid in intelligent histogram generation, Oracle uses the method_opt parameter of
dbms_stats. There are also important new options within the method_opt clause, namely
skewonly, repeat and auto:
method_opt=>'for all indexed columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for columns size auto'

The skewonly option is very time-intensive because it examines the distribution of values
for every column within every index.

If dbms_stats discovers an index whose columns are unevenly distributed, it will create
histograms for that index to aid the cost-based SQL optimizer in deciding between index
access and a full-table scan. For example, if an indexed column value appears in 50
percent of the rows, a full-table scan is faster than an index scan for retrieving those rows.
--*************************************************************
-- SKEWONLY option—Detailed analysis
--
-- Use this method for a first-time analysis for skewed indexes
-- This runs a long time because all indexes are examined
--*************************************************************

begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
cascade => true,
degree => 7
);
end;
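
After the skewonly run completes, you can check which columns actually received
histograms by looking for more than one bucket in dba_tab_col_statistics (10g and later
releases also expose a HISTOGRAM column in this view). The SCOTT schema below is
illustrative:

select
   table_name,
   column_name,
   num_distinct,
   num_buckets
from
   dba_tab_col_statistics
where
   owner = 'SCOTT'
and
   num_buckets > 1;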

If you need to reanalyze your statistics, the reanalysis will be less resource-intensive
with the repeat option. Using the repeat option will only reanalyze indexes with existing
histograms, and will not search for other histogram opportunities. This is the way that
you will reanalyze your statistics on a regular basis.
--**************************************************************
-- REPEAT OPTION - Only reanalyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the “repeat” option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only reanalyze histograms for
-- indexes that have histograms.
--**************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
cascade => true,
degree => 7
);
end;

The auto option within dbms_stats is used when Oracle table monitoring is implemented
using the alter table xxx monitoring; command. The auto option, shown below,
creates histograms based upon data distribution and the manner in which the column is
accessed by the application (e.g., the workload on the column as determined by
monitoring). Using method_opt=>'auto' is similar to using gather auto in the options
parameter of dbms_stats.
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
cascade => true,
degree => 7
);
end;

PART 2 - CBO Statistics

The most important key to success with the CBO is to carefully define and manage your
statistics. In order for the CBO to make an intelligent decision about the best execution
plan for your SQL, it must have information about the table and indexes that participate
in the query. When the CBO knows the size of the tables and the distribution, cardinality,
and selectivity of column values, the CBO can make an informed decision and almost
always generates the best execution plan.

As a review, the CBO gathers information from many sources, and it has the lofty goal
of using DBA-provided metadata to always make the "best" execution plan decision:

Oracle uses data from many sources to make an execution plan

Let's examine the following areas of CBO statistics and see how to gather top-quality
statistics for the CBO and how to create an appropriate CBO environment for your
database.

Getting top-quality statistics for the CBO

The choices of execution plans made by the CBO are only as good as the statistics
available to it. The old-fashioned analyze table and dbms_utility methods for generating
CBO statistics are obsolete and somewhat dangerous to SQL performance. As we may know,
the CBO uses object statistics to choose the best execution plan for all SQL statements.
