
Oracle Performance Analysis Using Wait Events

Pat Crain
Craig Hanson
CableData, Inc.

The Oracle RDBMS offers a rich set of well-known usage statistics that provide visibility into database
performance. Lesser-known is the fact that Oracle provides direct visibility of wait events encountered by the
RDBMS kernel. Unfortunately, wait events are not as well understood or documented as other statistics. This
paper presents a method to diagnose performance problems based on wait event observation and
interpretation.

AUDIENCE

This paper is intended for database administrators, application developers, and performance analysts
tasked with ensuring the performance of Oracle-based applications and systems. Familiarity with
RDBMS concepts, performance analysis, and Oracle terminology is assumed.

OBJECTIVES

This paper provides:

• An overview of Oracle wait event management;
• A description of database objects related to the wait event interface;
• Practical methods to observe wait events;
• Interpretations of common wait events; and
• High-level tuning considerations to mitigate excessive waiting.

BACKGROUND

The Oracle database server records the occurrence of blocking procedures and associated wait times.
Blocking procedures are synchronous operations that create a wait in user SQL processes or instance
processes. The execution of a blocking procedure generates a “wait event”. Wait events are recorded
in database objects known as dynamic performance views, or V$ (pronounced “vee dollar”) tables.

In addition to V$ tables, applications can record wait event records in a trace file. This is
sometimes called “10046 trace” after the statement used to enable it:

    exec sql ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

Like any form of tracing, 10046 trace is resource intensive; however, it provides a detailed
session-level view of resource contention which is invaluable in many situations.

A complete list of the wait events for the version of Oracle being used can be viewed with the
following query [1]:

    SELECT *
    FROM v$event_name;

The query returns the names and associated parameters of all wait events. Version 7.3 includes 105
wait events. It is important to note the ephemeral use of specific wait events across Oracle releases
and versions [2]. The caveat has been, and continues to be: "wait events are tied to the internal
implementation of Oracle and therefore subject to change or deletion without notice." [3] This speaks
to the interface's roots as an internal diagnostic tool. This also suggests that its reliability and
accuracy need careful scrutiny.

[1] This and all subsequent SQL queries and scripts are intended for illustrative purposes only. Each
    query and script has been tested and appears to work as intended.
[2] This paper is based on work performed with Oracle Release 7. The concepts and conclusions
    presented are applicable to the latest Oracle version, Release 8.
[3] [LANE97] Paul Lane, “Oracle8 Server Reference”, Oracle Corporation, 1997

PERFORMANCE VIEWS

If you have ever tuned an Oracle database the V$ performance views are immediately familiar. The
performance views associated with wait event management include:

• V$EVENT_NAME (Oracle7.3+)
• V$SESSION_EVENT
• V$SESSION_WAIT
• V$SYSTEM_EVENT

V$SYSTEM_EVENT provides a high-level system view and is used by the performance scripts utlbstat.sql
and utlestat.sql (collectively referred to as bestat). The view contains grouped statistics about
wait events recorded since instance startup. Use it to acquire a system-level viewpoint.
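Because V$SYSTEM_EVENT accumulates from instance startup, measuring a specific interval means subtracting an earlier snapshot of the view from a later one, which is the calculation bestat automates. The following Python sketch shows only that subtraction. It assumes each snapshot has already been fetched into a dictionary keyed by event name; the function name, the tuple layout, and the sample numbers are illustrative assumptions, not part of any Oracle interface.

```python
def interval_waits(begin, end):
    """Return per-event deltas between two V$SYSTEM_EVENT snapshots.

    Each snapshot maps event name -> (total_waits, time_waited), with
    time_waited in centiseconds. This layout is an assumption made for
    the example.
    """
    deltas = {}
    for event, (end_waits, end_time) in end.items():
        # An event missing from the begin snapshot had not occurred yet.
        begin_waits, begin_time = begin.get(event, (0, 0))
        waits = end_waits - begin_waits
        time_cs = end_time - begin_time
        if waits > 0:
            # Store count, total time, and mean time per wait.
            deltas[event] = (waits, time_cs, time_cs / waits)
    # Largest total wait time first.
    return sorted(deltas.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical snapshot values, for illustration only.
begin = {"latch free": (1000, 400), "enqueue": (5, 500)}
end = {"latch free": (1500, 650), "enqueue": (5, 500), "log file sync": (40, 60)}

report = interval_waits(begin, end)
for event, (waits, time_cs, avg_cs) in report:
    print(f"{event:<20} {waits:>8} {time_cs:>10} {avg_cs:>8.3f}")
```

Events with no new waits during the interval are dropped, so the result lists only activity that occurred between the two snapshots.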
Table 1. V$SYSTEM_EVENT

Column            Description
Event             Name of the wait event.
Total_waits       Total number of times this specific event was logged since instance startup.
Total_timeouts    Total number of times the wait resulted in a timeout. Not all wait events use a
                  timeout.
Time_waited       Total time in centiseconds (.01s) all sessions have waited for Event since
                  instance startup.
Average_wait      Mean time in centiseconds (.01s) all sessions have waited for Event since
                  instance startup.

The V$SESSION_EVENT view is similar to V$SYSTEM_EVENT with the addition of the session identifier
(SID) as the high order key.

V$SESSION_WAIT provides a real-time view of session-level waits. Oracle updates the view frequently,
which makes observation difficult. Each query of the view returns a snapshot that is immediately
stale because system state has changed. This view is very useful but keep its transitory nature in
mind.

Table 2. V$SESSION_WAIT

Column            Description
SID               Session identifier.
Seq#              A sequence number that functions as a counter. It increments each time the
                  session identified by SID logs an occurrence of the wait event identified by
                  Event.
Event             The event name. There are 105 events as of Oracle7.3.
P[1-3]            Parameters describing Event or pointing to details about Event.
P[1-3]Raw         The raw value of Pn.
P[1-3]Text        The name of Pn. A value is not always present.
State             Describes the event's state as one of the following:
                  “Waiting”: currently waiting for this event;
                  “Waited unknown time”: the initialization parameter TIMED_STATISTICS is set to
                  false;
                  “Waited short time”: session woke up in the same clock tick it went to sleep
                  (<.01 seconds);
                  “Waited known time”: session waited the value of column Wait_time.
Wait_time         If the value of the column State is “waited known time” this value represents
                  actual wait time in .01s. If State is anything other than “waited known time”
                  the value of this column is unreliable.
Seconds_in_wait   If the value of the column State is “waiting” this value represents wall-clock
                  time recorded when the wait began. If State is anything other than “waiting”
                  the value of this column is unreliable.

OBSERVING WAIT EVENTS

Utlbstat and Utlestat

The simplest way to observe wait events is with “canned” performance measurement scripts called
utlbstat.sql and utlestat.sql, collectively referred to as bestat. The advantage of bestat is the
ability to measure intervals or “rate of change” without having to create a custom SQL script. The
disadvantage is bestat's system orientation, which prohibits session-level analysis.

On Unix systems the scripts reside in $ORACLE_HOME/rdbms/admin. Utlbstat begins a measurement by
creating temporary tables and populating them with the begin values of metrics from several V$
tables. Utlestat ends a measurement by populating the temporary tables with end values, calculating
the delta, and writing results to a file called report.txt located in $ORACLE_HOME/rdbms/admin/.

Prior to running bestat, ensure the initialization parameter TIMED_STATISTICS is set to TRUE. This
enables time-based data collection. When and how long to measure depends on the workload type and
volume. Avoid spanning unlike workloads with a single measurement. Both scripts can be run as the
SYS user from SQL*Plus [4] or Server Manager. You can also run the scripts from Server Manager
(svrmgr) after connecting as “internal”.

Among other information, bestat reports utilization statistics about:

• Library cache
• System wide usage
• System wide wait events
• File I/O
• Tablespace I/O
• Latches
• Rollback segments

The wait event section reports the difference between begin and end values of V$SYSTEM_EVENT.
Following is an example of bestat's wait event section [5].

    > Rem System wide wait events.
    > select n1.event "Event Name",
    >        n1.event_count "Count",
    >        n1.time_waited "Total Time",
    >        (n1.time_waited/n1.event_count) "Average Time"
    >   from stats$event n1
    >  where n1.event_count > 0
    >  order by n1.time_waited desc;

    Event Name       Count    Total Time  Average
    --------------   ------   ----------  -------
    rdbms ipc m        4316       886325  205.357
    Null event         1785       537302  301.009
    smon timer            6       180006    30001
    pmon timer          692       178991  258.657
    pipe get            372       178056  478.645
    db file seq      130240       113824     .873
    latch free        63632        29334     .460
    db file sca       32865        14795     .450
    db file par         572         5006    8.751
    log file syn       2593         4450    1.716
    log file par       2769         4352    1.571
    enqueue              10          997     99.7
    write comple         61          443    7.262
    control file         66           55     .833
    buffer busy          26           49    1.884
    db file sin          50           16      .32
    row cache lock        1            1        1
    library cach          5            1       .2
    control file         18            0        0
    client mess      910634            0        0

Bestat execution steps are outlined below:

1. Set the parameter TIMED_STATISTICS to TRUE. The database instance must be stopped and started
   for any parameter change to take effect.

2. Run utlbstat.sql from svrmgrl:

    > svrmgrl
    SVRMGR> connect internal;
    Connected
    SVRMGR> @$ORACLE_HOME/rdbms/admin/utlbstat

3. Wait the desired measurement interval.

4. Run utlestat.sql from svrmgrl:

    > svrmgrl
    SVRMGR> connect internal;
    Connected
    SVRMGR> @$ORACLE_HOME/rdbms/admin/utlestat

Automating bestat execution is often desirable. In Unix, use the cron utility to run utlbstat and
utlestat at specified times.

Using V$ Queries to Locate Wait Events

You can directly query V$ tables with SQL statements from SQL*Plus or svrmgr. This is particularly
handy to take snapshots of V$SESSION_EVENT using a known SID.

Remember that V$SESSION_EVENT is updated by Oracle in real-time. You must take several samples to
develop a representative picture. Given a client or server process id, use the following query to
obtain SID.

    COLUMN sid      FORMAT 9999999 HEADING 'SID'
    COLUMN spid     FORMAT 9999999 HEADING 'SvrPID'
    COLUMN process  FORMAT 9999999 HEADING 'CltPID'
    COLUMN program  FORMAT A20     HEADING 'Prog'
    COLUMN username FORMAT A12     HEADING 'DBUsr'
    COLUMN osuser   FORMAT A10     HEADING 'OSUsr'
    SELECT s.sid,
           p.spid,
           s.process,
           p.program,
           p.username,
           s.osuser
    FROM   v$process p, v$session s
    WHERE  p.addr = s.paddr;

Using the SID value you have obtained, execute the following to view wait events.

    COLUMN sid      FORMAT 999     HEADING 'Sid'
    COLUMN event    FORMAT a25     HEADING 'Event'
    COLUMN totwaits FORMAT 9999999 HEADING 'Count'
    COLUMN timouts  FORMAT 9999999 HEADING 'TimOuts'
    COLUMN timwait  FORMAT 9999999 HEADING 'TotTime'
    COLUMN avgwait  FORMAT 9999999 HEADING 'AvgTime'
    SELECT sid,
           event,
           total_waits    totwaits,
           total_timeouts timouts,
           time_waited    timwait,
           average_wait   avgwait
    FROM   v$session_event
    WHERE  sid = &sid;

Using Trace to Locate Wait Events

The Oracle trace facility profiles SQL statements associated with a specific user SQL session
(process). Trace records contain execution statistics by default and may include wait event records
if explicitly enabled.

The advantage of trace over other tools such as bestat or V$ queries is two-fold. First,
applications can be instrumented to enable and disable trace on the fly. This allows analysts to
focus on specific sessions and users. Second, trace combines wait event statistics with execution
statistics. This facilitates quick isolation of problematic SQL statements and contention for shared
resources.

Three global Oracle initialization parameters affect trace:

• TIMED_STATISTICS must be set to TRUE; otherwise, time-based statistics are not recorded.
  Gathering timed statistics introduces overhead into SQL statement processing. However, unless a
  system is CPU bound, timed statistics should have a negligible impact on performance.
• USER_DUMP_DEST specifies the location of trace files. The default Unix path is
  /home/oracle/admin/SID/udump. The name of the trace file is platform specific. On Unix, the naming
  convention is ora_PID.trc. PID is a numeric identifier used by the operating system to manage
  processes.
• MAX_DUMP_FILE_SIZE specifies the maximum trace file size in operating system blocks. Traces that
  include wait events capture a tremendous amount of data. Set the file size to at least 100
  megabytes.

One way to leverage trace is simply to include the ability to enable trace in your application.
Ideally, you want the ability to enable trace in a running program from a command line. For example,
a Unix application can include code to toggle trace on or off upon receiving the signal SIGUSR1. In
this way, a DBA can easily control trace with the command:

    kill -USR1 PID

Given a trace-enabled application a DBA could research poor response time with the following steps:

1. Identify the application's PID.
2. Signal the process to enable trace.
3. Have the user execute the problem transaction.
4. Signal the application to disable trace.
5. Analyze the trace output.

Applications written using Oracle pre-compilers such as Pro*C [6] use embedded SQL (ESQL) statements
to access the database engine. The following ESQL statements enable and disable trace:

    exec sql ALTER SESSION SET SQL_TRACE = TRUE;
    exec sql ALTER SESSION SET SQL_TRACE = FALSE;

Wait events are not written to trace files by default. To enable trace and include wait events, use
the following ESQL statement:

    exec sql ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

This is referred to as “10046” trace and is not documented by Oracle. Harrison [7] and Velpuri [8]
address event 10046 and other aspects of the trace facility. Oracle support personnel routinely use
event trace capability and consider it safe.

Oracle Version 7.3 allows trace to be enabled in a session from svrmgr. This is useful but it does
not enable the session to record wait events. To enable trace for a session use this procedure:

    DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION

The procedure requires that you obtain the values of columns SID and SERIAL# from V$SESSION for the
PID or user in question. See Version 7.3 documentation for details on this procedure.

Oracle provides a utility called tkprof to format raw trace files. Unfortunately the utility ignores
wait event records; therefore, you must extract and format wait event records from trace files
yourself. Wait records are recorded in the format:

    WAIT #CURSOR: nam="EVENT" ela=n p1=n p2=n p3=n

    CURSOR = Cursor number
    nam    = Wait event name
    ela    = Elapsed wait time in centiseconds
    p[1-3] = Event-specific parameters

For example, the following record indicates the session waited less than .01 seconds for the event
“SQL*Net message from client” during the execution of cursor number one. Parameter P1 is the “driver
ID”, which is not too useful. Parameter P2 is the number of bytes received by the server from the
client.

    WAIT #1: nam='SQL*Net message from client' ela= 0 p1=537277696 p2=1 p3=0

It is important to understand why elapsed wait time is zero. There are two reasons for a zero value.
TIMED_STATISTICS may not be set to TRUE. Alternatively, timed statistics may be enabled but the
actual wait time was less than .01 second. Time granularity is limited to .01 second, commonly known
as a “tick”. This second case occurs quite frequently, especially with “SQL*Net” wait events. Be
careful not to ignore zero value waits, as their cumulative value can be significant. Ten thousand
zero value events, each lasting just under one tick, can represent close to one hundred elapsed
seconds.

[4] SQL*Plus is a registered trademark of Oracle Corporation.
[5] Formatting was changed to fit the page.
[6] Pro*C is a registered trademark of Oracle Corporation.
[7] [HARR96] Guy Harrison, “Getting the most from the SQL_TRACE Facility”, OTJ, Winter 1996
[8] [RAMA95] Rama Velpuri, “Oracle Backup & Recovery Handbook”, 1995

INTERPRETING WAIT EVENTS

The following query executed against a Version 7.3 database shows 105 unique wait events.

    SELECT COUNT(*)
    FROM v$event_name;

      COUNT(*)
    ----------
           105

Samples taken from production systems, however, reveal an average of twenty-five common events. Of
these twenty-five, we are concerned with only those that represent significant session wait time.

Client message | SQL*Net [9] message from client

“Client message” is observed prior to Version 7.3. It is superseded by “SQL*Net message” in Versions
7.3 and above. Both events occur when an Oracle server (shadow) process awaits a message from its
client (foreground) process.

[9] SQL*Net is a registered trademark of Oracle Corporation.

As of Version 7.3, interprocess communication (IPC) between client and server processes occurs via
SQL*Net. SQL*Net is an application and session level communication interface which uses native
operating system IPC mechanisms and network communication protocol stacks such as TCP/IP. The use
of SQL*Net does not necessarily imply the use of networked communication.

This event is a product of a client process' synchronous relationship with the server process. The
wait can be viewed as server idle time, implying client busy time, user think time or, to a far
lesser extent, IPC latency. It is a common, unavoidable event consistently observed as one of the
top three wait events during extensive sampling of Version 7.1.6 production databases and Version
7.3 development databases.

Null event

Presence of the Null event indicates that a user session performed an unspecified event resulting in
a wait. Oracle indicates this event should not be used and a bug reported if observed. Several
observations of a large 7.1.6 database revealed multiple occurrences of this event with an average
wait time of three seconds. This event has not been observed in Version 7.3. The number of null
event occurrences was always very close to the usage statistic “background timeout” count,
suggesting that it is not a user session wait. Report occurrence of this event to Oracle.

DB file [sequential | scattered] read

These wait events occur every time a user session waits while reading database blocks. Sequential
reads are associated with indexed access and scattered reads with scan (multi-block read)
operations. These waits are unavoidable, but do not ignore them. Excessive wait time suggests I/O
contention.

Drill-down analysis involves two tasks. The first step is to identify wait parameters P1 (file#) and
P2 (block#) and resolve them to a specific database table. The next step is to map the table to SQL
sessions accessing it.

If the wait is observed using trace, file# and block# are available. Bestat, however, is driven by
V$SYSTEM_EVENT which does not include P1 or P2. Look for wait parameters by sampling V$SESSION_WAIT
with the following query:

    COLUMN sid     FORMAT 9999     HEADING 'SID'
    COLUMN event   FORMAT a25      HEADING 'Event'
    COLUMN filenum FORMAT 99999999 HEADING 'File#'
    COLUMN block   FORMAT 99999999 HEADING 'Block#'
    SELECT sid,
           event,
           p1 filenum,
           p2 block
    FROM   v$session_wait
    WHERE  event like 'db file%';

The following query resolves file# and block# to a database object name which is usually a table.

    COLUMN owner FORMAT a15 HEADING 'Owner'
    COLUMN sname FORMAT a20 HEADING 'TableName'
    COLUMN stype FORMAT a10 HEADING 'SegType'
    COLUMN tbsn  FORMAT a10 HEADING 'TblSpcName'
    COLUMN fname FORMAT a40 HEADING 'FileName'
    SELECT owner,
           segment_name      sname,
           segment_type      stype,
           e.tablespace_name tbsn,
           file_name         fname
    FROM   dba_extents e,
           dba_data_files f
    WHERE  e.file_id = f.file_id
    AND    e.file_id = &file_id
    AND    e.block_id <= &block_id
    AND    e.block_id + e.blocks > &block_id;

Use the following query to identify SQL statements accessing the table identified by “tablename”.

    SELECT buffer_gets,
           sql_text
    FROM   v$sqlarea
    WHERE  sql_text like '%&tablename%'
    ORDER BY buffer_gets desc;

The following query identifies SQL statements resulting in the largest number of physical reads:

    SELECT disk_reads,
           sql_text
    FROM   v$sqlarea
    WHERE  disk_reads > &threshold
    ORDER BY disk_reads desc;

Several techniques may be used to reduce “db file” waits. First, identify and, if possible, optimize
offending SQL statements. Use the output of tkprof to identify resource intensive statements. The
two v$sqlarea queries above identify SQL statements resulting in high I/O. Next, physically separate
files. Aronoff et al. [10] suggest separating the following objects: tables from indexes; rollback
segments from tables; rollback segments from online redo logs; online redo logs from archived redo
log files; temporary tablespaces from data tablespaces; and SYSTEM tablespace from the rest of the
database. Other techniques to explore include the use of operating system file striping (if
available) and table partitioning (an option with Oracle8 [11]).

[10] [ARNO97] Aronoff, Loney, and Sonawalla, “Advanced Oracle Tuning and Administration”, 1997
[11] Oracle8 is a registered trademark of Oracle Corporation.

Enqueue

An enqueue is a locking mechanism that permits many processes to concurrently share a resource.
Enqueue waits occur when a session waits to obtain a lock. This usually occurs when a session
attempts to modify data that is locked.

One common enqueue is the data manipulation language (DML) enqueue. The DML enqueue is represented
in the enqueue wait parameter P1 as “TM”. A TM enqueue is acquired during the execution of a
transaction that references a table with a DML statement. The enqueue protects the object from being
altered by another session. There need to be enough enqueues to support multiple DML statements
executed against multiple objects by multiple sessions. Oracle suggests enqueue_resources should
equal the product of database objects and the number of concurrent DML operations executed against
the objects plus about twenty for kernel overhead.

Memory consumed by enqueue resources is negligible relative to the database instance as a whole. If
enqueue waits are observed, significantly increase the enqueue_resources parameter and measure
again.

If excessive enqueue waits persist, drill-down analysis is required. Identify specific enqueue names
and locking modes and contact Oracle for interpretation.

To determine enqueue name and mode, query V$SESSION_WAIT. P1 contains name|mode. Enqueue names are
always two characters. The following SQL statement returns names:

    SELECT chr(bitand(p1, -16777216)/16777215)||
           chr(bitand(p1, 16711680)/65535) Enqueue
    FROM   v$session_wait
    WHERE  event = 'enqueue';

This statement returns mode:

    SELECT bitand(p1, 65535)
    FROM   v$session_wait
    WHERE  event = 'enqueue';

Latch Free

Latches are analogous to semaphores. Latches ensure certain operations are executed by a single
session at a time. Requestors waiting for a latch retry concurrently, so any of them may get the
latch; the net effect is random assignment. It is possible for the first process attempting to
acquire a latch to be the last to acquire it.

Total latch free waits equals the total of all sleeps as reported in the latch section of bestat.
Waits definitely indicate latch contention. Latches held too long suggest the resources they protect
are being held too long, implying higher level contention. Oracle suggests that latch contention may
affect performance if the ratio of MISSES to GETS exceeds 1%.

The “redo allocation” latch controls the allocation of space for entries in the redo log buffer. To
allocate space, a user process must obtain the latch. After allocating space for a redo entry, the
entry must be copied into the redo log buffer. This is called “redo copy”. The copy can be made
under the redo allocation latch only if the redo entry is smaller than the size defined by
LOG_SMALL_ENTRY_MAX_SIZE. If the entry is too large, the process must acquire another latch called
“redo copy latch”. Heavy redo log buffer access can result in redo buffer latch contention, possibly
impacting overall performance.

“Library cache”, “library cache pin”, and “shared pool” latches protect access to shared pool
objects. Shared pool latches allocate and release space in the shared pool. Library cache and
library cache pin latches are used during SQL statement parsing and execution. Oracle cites the
practice of not using bind variables as the primary reason for library cache contention. Bind
variables allow values to be bound during statement execution rather than during parsing. Statements
can then be reused without the need to re-parse.

Access to the database block buffer cache is protected by “cache buffer chain” and “cache buffer
lru” latches. A cache buffer chain latch is acquired when sessions access a block in the buffer.
Access requires that the session compute a hash value of the block address to locate it. If the
buffer is not found or has been modified, the session must read the buffer from disk or create a
read consistent clone. Doing so requires that it locate a free buffer using the LRU mechanism. The
cache buffer lru latch controls access to the LRU buffer.

Buffer busy

This event occurs when a session cannot access a block in buffer cache. A buffer is considered busy
when another session is reading it or the block is being converted into a “cleaned up” state. The
most common causes of this wait are insufficient free lists for a table or too few rollback
segments.

This wait is commonly seen when jobs are migrated from a single execution thread to multiple
execution threads. The following illustrates the increase in buffer busy waits as more execution
threads are made available to an application architected for fan-out parallelism:

    Threads    Waits     Wait Seconds
    1               1         .01
    2          15,060       80.09
    3          22,634      154.93
    4          31,013      123.96

Drill down by querying V$WAITSTAT.

    SELECT *
    FROM V$WAITSTAT;

    CLASS                   COUNT       TIME
    ------------------ ---------- ----------
    data block                 46        173
    sort block                  0          0
    save undo block             0          0
    segment header              0          0
    save undo header            0          0
    free list                   0          0
    system undo header          0          0
    system undo block           0          0
    undo header                37        468
    undo block                  5          8

If the count for “data block” or “free list” is high, consider adding “free lists” to tables subject
to concurrent insert operations. If “undo header” or “undo block” counts are high, you may consider
adding rollback segments.

Log file sync

This is the unavoidable wait a session encounters at transaction commit when redo log entries are
flushed from the redo log buffer in memory to redo log files on disk.

The wait is the time it takes the LGWR process to complete the I/O. LGWR writes to disk at user
commit and at a timed interval occurring about every three seconds. The count for this event should
equal the value of the “user commits” statistic plus (measurement interval in seconds / 3). Average
wait time generally should not exceed .050 seconds. If higher waits impact performance, investigate
several possible problem areas and consider employing an appropriate tuning strategy.

Redo log files may exist on a slow device or a device connected to a slow I/O subsystem. Move the
redo log files to a faster device or I/O subsystem. Similarly, the device where redo log files
reside may be too busy. Reduce I/O activity by physically separating frequently accessed files from
the redo log files. Ideally, redo log files should be placed on a dedicated disk and I/O channel.
Finally, refer to Oracle's tuning guidelines for optimum redo log configuration.

Log file space/switch

This wait encountered by LGWR occurs when an online redo log file is full but a log file switch
cannot be performed. The log switch is probably waiting for archiving to complete. Another reason
for the wait could be that LGWR couldn't wrap into the next log file because the checkpoint for that
log is incomplete. Oracle documentation indicates a separate wait event for the latter, called “log
file switch (checkpoint incomplete)”. This event was not observed during research for this paper.

Several sample observations revealed average wait time of log file space/switch ranging from .007
seconds to .050 seconds with a peak frequency of one wait every 12 seconds. If this wait is frequent
and long, it may indicate online redo logs are too small or too few. Oracle documentation suggests
increasing the value of LOG_BUFFER. This is the number of bytes allocated to the redo log buffer in
the SGA. In general, larger values reduce I/O frequency to online redo log files. I/O is reduced
because LGWR writes when the buffer is 1/3 full, so a bigger buffer means less frequent writes.

Log file parallel write

This event is analogous to “log file sync” but instead of a session wait, it is encountered by LGWR.
The number of waits should equal the value of the statistic “redo writes” as reported by bestat.
Generally the wait count and total wait time will be very close to “log file sync”. See section “Log
file sync” for drill-down considerations.

Free buffer waits

This event appears to be rare. It represents “inward” pressure on the database buffer cache. Waits
occur when a session attempts to read a block from disk into cache. If free (clean) buffers are not
available, DBWR must write dirty buffers to disk. Waits simply suggest that DBWR is not writing
enough buffers to disk.

Quantify physical disk I/O activity to ensure I/O is distributed. You will often find DBWR is slow
simply because your I/O system is slow or not balanced. Use the following query to view I/O
distribution across data files:

    COLUMN name    FORMAT a40    HEADING 'File'
    COLUMN phyrds  FORMAT 999999 HEADING 'PhysReads'
    COLUMN phywrts FORMAT 999999 HEADING 'PhyWrites'
    SELECT name,
           phyrds,
           phywrts
    FROM   v$filestat f,
           v$datafile d
    WHERE  f.file# = d.file#;

If this wait becomes problematic, consider employing one or more of the following tuning strategies:

• Use asynchronous I/O for DBWR.
• Configure multiple DBWR processes.
• Enlarge the data block buffer cache.
• Stripe data files across multiple physical disks.
• Use “raw” disk devices. Raw disk devices are a configuration option offered by some Unix vendors.
  Raw devices allow applications to bypass the operating system managed disk cache and may result in
  performance improvement.

Also examine the wait event “db file parallel write”. This event indicates the time DBWR takes to
write buffers to disk. DBWR writes dirty buffers to disk in several scenarios.

The first scenario is a simple timeout event directing DBWR to write if it has not been active for
three seconds. In busy systems DBWR rarely stands idle. A more common scenario occurs when a server
puts a block on the dirty list and finds the list has reached a threshold equal to half the value of
the parameter DB_BLOCK_WRITE_BATCH. DBWR is also directed to write when a server searches the LRU
list looking for free buffers and fails to find one within a threshold defined by the value of the
configuration parameter DB_BLOCK_MAX_SCAN_CNT. Finally, DBWR writes upon receipt of a checkpoint
from LGWR.
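
The four DBWR write triggers described above (three-second timeout, dirty list reaching half of DB_BLOCK_WRITE_BATCH, an LRU scan that reaches DB_BLOCK_MAX_SCAN_CNT without finding a free buffer, and a checkpoint) can be restated as a small decision function. The Python sketch below is only a toy model of that description, not Oracle's implementation: the function name, argument names, and the order in which conditions are tested are assumptions made for illustration.

```python
def dbwr_write_reason(idle_seconds, dirty_list_length, buffers_scanned,
                      checkpoint_requested, db_block_write_batch,
                      db_block_max_scan_cnt):
    """Return the first matching reason for DBWR to write, or None.

    Toy model of the triggers described in the text. The evaluation
    order is an assumption; the text does not specify a priority.
    """
    if checkpoint_requested:
        return "checkpoint"
    # Dirty list has reached half of DB_BLOCK_WRITE_BATCH.
    if dirty_list_length >= db_block_write_batch / 2:
        return "dirty list threshold"
    # A server scanned the LRU list up to the limit without finding
    # a free buffer.
    if buffers_scanned >= db_block_max_scan_cnt:
        return "free buffer scan limit"
    # DBWR has not been active for three seconds.
    if idle_seconds >= 3:
        return "timeout"
    return None


# Hypothetical parameter values: batch of 20 buffers, scan limit of 30.
print(dbwr_write_reason(0.5, 10, 0, False, 20, 30))
print(dbwr_write_reason(3.0, 3, 0, False, 20, 30))
```

Laying the conditions out this way makes it easier to see which statistic each trigger feeds when reading a bestat report.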

The count for “db file parallel write” should be close to the sum of statistics “DBWR checkpoints”,
“DBWR make free requests”, and “DBWR timeouts”. High average wait time indicates an I/O bottleneck
or that DB_BLOCK_WRITE_BATCH is too large.
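
The consistency check between the “db file parallel write” count and the three DBWR statistics is simple arithmetic, sketched below in Python. The function name and the 10% tolerance are assumptions chosen for the example. The observed count of 572 is the “db file par” count from the sample bestat report earlier in this paper, while the three statistic values are hypothetical.

```python
def check_parallel_write_count(observed_waits, dbwr_checkpoints,
                               dbwr_make_free_requests, dbwr_timeouts,
                               tolerance=0.10):
    """Compare the observed wait count with the sum of DBWR statistics.

    Returns (expected_count, is_consistent). The tolerance is an
    arbitrary threshold for "close to", not an Oracle guideline.
    """
    expected = dbwr_checkpoints + dbwr_make_free_requests + dbwr_timeouts
    if expected == 0:
        return expected, observed_waits == 0
    gap = abs(observed_waits - expected) / expected
    return expected, gap <= tolerance


# 572 waits observed; the statistic values below are made up.
expected, consistent = check_parallel_write_count(572, 12, 310, 240)
print(expected, consistent)
```

A large gap between the two numbers would suggest that the measurement interval for the wait events and the statistics did not match, or that the relationship does not hold for the release being examined.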

CONCLUSION

Diagnosing performance problems in an Oracle database is a challenge. The RDBMS is a complex system.
However, visibility into the complexity is possible using Oracle's robust tracing capabilities,
including the wait event interface. By enabling this functionality and learning to leverage its
valuable output, analysts have the ability to determine exactly why a user SQL session or database
processing waits.
