Chapter 02 · Core Diagnostic Framework

Wait Events & the Oracle Performance Model

This is the chapter that separates average DBAs from great performance tuners. Everything in Oracle performance ultimately reduces to one question: what is the session waiting for? Master wait events and you can diagnose almost any problem in under 10 minutes.

Time-based tuning Wait classes ASH queries V$SESSION_WAIT Blocking sessions 3AM toolkit

The Time-Based Tuning Model

Before wait events existed, DBAs tuned by ratios — buffer cache hit ratio, parse ratio, latch hit ratio. The problem: ratios lie. A 99% buffer cache hit ratio sounds great, but if you're doing a billion logical reads, that 1% miss is still 10 million physical reads.

Oracle's performance team introduced a better model: account for every second of elapsed time. A session is either on CPU doing useful work, or waiting for something. That's it. Nothing else.

Total DB Time (Elapsed Time) CPU Time Doing real work — parsing, sorting, hashing Wait Time Blocked — disk, lock, network, latch User I/O Concurrency + more
💡 The tuner's job
Find the biggest wait category. Fix it. Re-measure. Repeat. Every session is either on CPU or waiting — there is nothing else. The whole art of Oracle tuning is identifying and eliminating the dominant wait.

Wait Classes — The 12 Categories

Oracle groups every wait event into a wait class. Knowing the class instantly tells you what category of problem you're dealing with — before you even look at the individual event name.

User I/O
db file sequential read
db file scattered read
direct path read
→ Index/FTS I/O issues
Concurrency
enq: TX - row lock
library cache lock
buffer busy waits
→ Blocking & contention
System I/O
db file parallel write
log file parallel write
control file I/O
→ Background proc I/O
Commit
log file sync
→ LGWR / redo disk speed
Network
SQL*Net message from client
SQL*Net more data
→ Client round-trips
Configuration
log file switch
control file sequential read
→ Sizing/placement issues
Administrative
wait for table stats gather
SQL*Loader events
→ DBA activity blocking
Scheduler
resmgr: cpu quantum
resmgr: become active
→ Resource Manager
Cluster
gc buffer busy acquire
gc cr request
→ RAC interconnect only
Other
latch: cache buffers chains
latch: shared pool
→ In-memory contention
Idle ⚠
SQL*Net message from client
jobq slave wait
wakeup time manager
→ ALWAYS FILTER OUT
Application
enq: TX - row lock contention
enq: TM - contention
→ Fix the app, not Oracle

The Idle Filter Rule

The single biggest mistake beginners make is looking at total wait time and panicking about SQL*Net message from client. That event just means "the session is sitting idle, waiting for the user to send the next SQL." It tells you nothing about Oracle's performance.

🚨 Rule: Always exclude Idle wait class
Every ASH query you write must include AND wait_class != 'Idle'. Without this filter, idle waits inflate the totals and hide your real problems. This one filter is the difference between seeing the problem and being confused by noise.
SQL — Top wait events with mandatory Idle filter
-- Top wait events RIGHT NOW — last 10 minutes, Idle excluded
SELECT event,
       wait_class,
       COUNT(*)                                          AS active_sessions,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  sample_time  > SYSDATE - 10/1440
AND    session_type = 'FOREGROUND'
AND    wait_class  != 'Idle'           -- THE KEY FILTER
GROUP BY event, wait_class
ORDER BY active_sessions DESC
FETCH FIRST 15 ROWS ONLY;

Top Wait Events You Will See Every Day

Click each tab to explore the event, its root causes, diagnostic queries, and real production scenarios.

db file sequential read
db file scattered read
log file sync
TX row lock
library cache
temp spill
db file sequential read — single-block I/O
The #1 most common wait event in OLTP. Oracle read exactly one 8KB block from disk. Happens during index range scans, index unique scans, and rowid lookups. Not always bad — the question is whether those reads are necessary.
When it's acceptable
OLTP lookups by primary key or unique index. Each lookup is fast and selective. Expected in healthy systems.
When it's a problem
Millions of single-block reads per execution due to poor clustering factor or wrong plan. Index being used that shouldn't be.
SQL — Find SQL causing the most sequential reads
-- Find top SQL IDs causing db file sequential read (last 1 hour)
SELECT sql_id,
       COUNT(*)                                          AS ash_samples,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  event      = 'db file sequential read'
AND    sample_time > SYSDATE - 1/24
GROUP BY sql_id
ORDER BY ash_samples DESC
FETCH FIRST 10 ROWS ONLY;

-- Then inspect the execution plan for the top sql_id
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST'));
🔎 Real scenario
Production is slow, top wait is db file sequential read. You find one SQL doing 2 million single-block reads per execution fetching 800,000 rows through an index. The clustering factor of the index is terrible — rows are physically scattered across the table. The index is being used but each ROWID lookup lands on a different block. Fix: rewrite to use a direct full table scan, or rebuild the table in sorted order.
db file scattered read — multi-block I/O (full scans)
Oracle is reading 8–128 contiguous blocks at once. "Scattered" refers to where the blocks land in the buffer cache (not sequentially), not the disk read pattern. Caused by full table scans and fast full index scans.
When it's acceptable
Large analytics/reporting queries that genuinely need most of the table. Deliberate full scans on small lookup tables.
When it's a problem
Small selective query missing an index. A large table full-scanned repeatedly, evicting OLTP blocks from buffer cache.
SQL — Find objects being full-scanned the most
-- Find which objects are being full-scanned most (last 1 hour)
SELECT ash.current_obj#,
       obj.object_name,
       obj.object_type,
       COUNT(*) AS scan_samples
FROM   v$active_session_history ash
JOIN   dba_objects obj ON obj.object_id = ash.current_obj#
WHERE  ash.event       = 'db file scattered read'
AND    ash.sample_time > SYSDATE - 1/24
GROUP BY ash.current_obj#, obj.object_name, obj.object_type
ORDER BY scan_samples DESC
FETCH FIRST 10 ROWS ONLY;
🔎 Real scenario
db file scattered read accounts for 40% of all wait time. The query above shows a 2GB AUDIT_LOG table being fully scanned 500 times per hour. The query is WHERE user_id = 101 AND action_date > SYSDATE - 7. There's no composite index on (user_id, action_date). Adding one drops this wait event off the top-10 list entirely.
log file sync — COMMIT bottleneck
Every COMMIT posts to LGWR: "write my redo to disk." The session waits on log file sync until LGWR confirms. Wait time = how long LGWR takes to flush. This event directly blocks every committing user.
Slow disk for redo logs
High average_wait (>10ms). Move redo logs to faster storage (NVMe/SSD).
Too many small commits
High total_waits, low avg_wait. Batch commits — every 1000 rows, not every row.
Redo logs too small
Frequent log switches cause ARCH pressure. Increase log file size to 500MB+.
Synchronous I/O
Consistent ~5ms delay. Enable async I/O: FILESYSTEMIO_OPTIONS=SETALL.
SQL — Diagnose log file sync
-- Check log file sync severity
SELECT event, total_waits, total_timeouts,
       time_waited, average_wait, max_wait
FROM   v$system_event
WHERE  event = 'log file sync';

-- What is LGWR itself waiting for?
SELECT event, total_waits, average_wait
FROM   v$system_event
WHERE  event LIKE 'log file%'
ORDER BY time_waited DESC;
🔎 Real scenario
A batch job inserts 1 million rows and takes 4 hours. You find it's committing after every single row. log file sync average_wait = 4ms. 1,000,000 commits × 4ms = 4,000 seconds = 66 minutes wasted just on commits. Change to commit every 5,000 rows — job time drops to 35 minutes.
enq: TX - row lock contention
Session A updated a row but hasn't committed. Session B tries to update the same row. Session B waits indefinitely. The blocker is often an inactive session — a user who started a transaction and walked away.
SQL — Find all blocking sessions
-- Find all blocked sessions and their blockers RIGHT NOW
SELECT
  w.sid                               AS waiting_sid,
  w.username                          AS waiting_user,
  w.event                             AS wait_event,
  w.seconds_in_wait,
  b.sid                               AS blocking_sid,
  b.username                          AS blocking_user,
  b.status                            AS blocker_status,
  b.sql_id                            AS blocker_sql,
  b.machine                           AS blocker_machine
FROM   v$session w
JOIN   v$session b ON b.sid = w.blocking_session
WHERE  w.blocking_session IS NOT NULL
ORDER BY w.seconds_in_wait DESC;

-- Find cascaded blocking chain (A blocks B which blocks C)
SELECT  LPAD(' ', 2*(LEVEL-1)) || sid   AS sid_tree,
        username, event, seconds_in_wait, blocking_session
FROM    v$session
START WITH blocking_session IS NULL
       AND sid IN (
           SELECT DISTINCT blocking_session
           FROM   v$session WHERE blocking_session IS NOT NULL)
CONNECT BY PRIOR sid = blocking_session
ORDER SIBLINGS BY seconds_in_wait DESC;
🔎 Real scenario
50 users are stuck. Session 142 (a background job) has been running 22 minutes without committing. It updated ORDERS.STATUS on 10,000 rows and is still processing. All 50 users are waiting to update their own order rows. Fix: break the batch job into smaller transactions with frequent commits.
library cache lock / library cache pin
Concurrency events inside the shared pool. Happen when DDL is being done on an object while sessions are using it, or when multiple sessions try to hard-parse the same SQL simultaneously. One of the most disruptive events during deployments.
SQL — Find library cache contention
-- Sessions waiting on library cache events
SELECT s.sid, s.username, s.event,
       s.seconds_in_wait, s.sql_id,
       s.p1raw, s.p2raw
FROM   v$session s
WHERE  s.event LIKE 'library cache%'
ORDER BY s.seconds_in_wait DESC;

-- Identify WHAT object is being held (put p1raw value in filter)
SELECT kglnaown AS owner,
       kglnaobj AS object_name,
       kglobtyp AS object_type
FROM   x$kglob
WHERE  kglhdadr = '&p1raw_value';
🔎 Real scenario
Deployment team runs ALTER PACKAGE compile_all at 2 AM. The ETL starts at 2:05 AM. Sessions pile up waiting on library cache pin. Fix: coordinate deployments with ETL windows, or use DBMS_UTILITY.COMPILE_SCHEMA with minimal locking.
direct path read temp / direct path write temp
When a sort or hash join doesn't fit in PGA, Oracle writes to the temp tablespace. Direct path means it bypasses the buffer cache. This is almost always a PGA sizing problem. Even a small percentage of multipass work area executions causes severe slowdowns.
SQL — Find sessions spilling to temp
-- Sessions currently spilling to temp
SELECT s.sid, s.username, s.sql_id,
       t.tablespace,
       t.blocks * 8 / 1024  AS temp_mb_used
FROM   v$session     s
JOIN   v$tempseg_usage t ON t.session_addr = s.saddr
ORDER BY temp_mb_used DESC;

-- SQL historically causing the most disk sorts
SELECT sql_id, sorts, disk_sorts,
       ROUND(disk_sorts * 100 / NULLIF(sorts, 0), 1) AS pct_disk_sorts
FROM   v$sqlarea
WHERE  sorts > 0 AND disk_sorts > 0
ORDER BY disk_sorts DESC
FETCH FIRST 15 ROWS ONLY;
🔎 Real scenario
A reporting query sorts 50 million rows. PGA_AGGREGATE_TARGET = 500MB. With 20 concurrent sessions, each gets ~25MB sort space. The query needs 800MB. Fix: increase pga_aggregate_target to 4GB, or rewrite to use an index delivering rows in sorted order — eliminating the sort entirely.

The Diagnostic Workflow

This is the exact process to follow when you get paged saying "the database is slow." Five steps. Every time.

1
Find top wait events in ASH (last 30 min)
Run the ASH query with wait_class != 'Idle'. This immediately tells you the category of the problem — I/O, locking, memory, or CPU. Don't skip this step. You need the category before you can investigate further.
2
Find which SQL IDs are causing the #1 wait
Filter ASH by the dominant event, group by sql_id. One or two SQL IDs usually account for 70–80% of the problem. Now you have a specific target.
3
Pull the execution plan with ALLSTATS LAST
Use DBMS_XPLAN.DISPLAY_CURSOR with the sql_id. Look at E-Rows vs A-Rows. A large gap (estimated 1,000, actual 1,000,000) means the optimizer is making wrong decisions — the root cause is usually stale statistics or missing histograms.
4
Route to SQL path or Resource path
Plan is wrong (bad rows, wrong operation) → SQL tuning path (indexes, stats, hints, Chapter 3/4). Plan looks correct but waits are still high → Resource path (memory, I/O, contention, Chapters 5/6/7).
5
Verify the fix actually worked
Re-run the original ASH query. Did the dominant wait event shrink or disappear? Did a new one surface? Re-measure elapsed time of the problem SQL. Numbers don't lie — if the wait didn't move, the fix didn't work.

The 3 AM Toolkit

Copy these five queries. Run them in order on any production system showing performance issues. They answer every top-level question in under 5 minutes.

SQL — Query 1: Top waits RIGHT NOW (last 10 min)
-- QUERY 1: What are the top waits RIGHT NOW? (last 10 minutes)
SELECT event, wait_class,
       COUNT(*)                                          AS sessions,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  sample_time  > SYSDATE - 10/1440
AND    session_type = 'FOREGROUND'
AND    wait_class  != 'Idle'
GROUP BY event, wait_class
ORDER BY sessions DESC
FETCH FIRST 10 ROWS ONLY;
SQL — Query 2: Which SQL causes the #1 wait?
-- QUERY 2: Which SQL is causing the #1 wait event?
-- Replace 'db file sequential read' with your top event from Query 1
SELECT sql_id,
       COUNT(*)                                          AS samples,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  event       = 'db file sequential read'
AND    sample_time > SYSDATE - 10/1440
GROUP BY sql_id
ORDER BY samples DESC
FETCH FIRST 5 ROWS ONLY;
SQL — Query 3: Current active sessions + wait details
-- QUERY 3: Every active session and what it's waiting for
SELECT s.sid, s.username, s.status, s.event,
       s.wait_class, s.seconds_in_wait,
       s.sql_id, s.blocking_session
FROM   v$session s
WHERE  s.status     = 'ACTIVE'
AND    s.type       = 'USER'
AND    s.wait_class != 'Idle'
ORDER BY s.seconds_in_wait DESC;
SQL — Query 4: Is there a blocking chain right now?
-- QUERY 4: Blocking chain — find all blockers and their victims
SELECT w.sid, w.username, w.event, w.seconds_in_wait,
       b.sid AS blocker, b.username AS blocker_user,
       b.sql_id AS blocker_sql
FROM   v$session w
JOIN   v$session b ON b.sid = w.blocking_session
WHERE  w.blocking_session IS NOT NULL;
SQL — Query 5: Historical problem window analysis
-- QUERY 5: Investigate a past problem window
-- Change the timestamps to your incident window
SELECT TO_CHAR(sample_time, 'HH24:MI') AS time_slot,
       event,
       COUNT(*) AS sessions
FROM   v$active_session_history
WHERE  sample_time BETWEEN TO_DATE('2026-03-21 14:00','YYYY-MM-DD HH24:MI')
                        AND TO_DATE('2026-03-21 14:30','YYYY-MM-DD HH24:MI')
AND    wait_class != 'Idle'
GROUP BY TO_CHAR(sample_time, 'HH24:MI'), event
ORDER BY time_slot, sessions DESC;

Wait Event Combination Patterns

Real incidents rarely show a single wait event. Knowing these combinations lets you diagnose the root cause from the wait list alone — before even pulling a plan.

What you see in ASHWhat it means
db file sequential read + high CPU Lots of index lookups — possibly missing index or stale statistics causing wrong plan
log file sync + log file parallel write LGWR is genuinely slow — disk I/O on redo log files is the root problem
enq: TX row lock + many sessions One session holding a lock too long — find the blocker, assess whether to kill
library cache lock + hard parses spiking DDL on a hot object, or application not using bind variables at all
direct path read temp + write temp together PGA too small — sort and hash join spilling to disk simultaneously
latch: cache buffers chains + one sql_id Hot block — one SQL hitting the same tiny set of blocks millions of times (sequence, hot index block)
CPU only, no waits visible in ASH CPU-bound work — the SQL itself is inefficient (bad loops, large in-memory sorts, function calls per row)
db file scattered read + db file sequential read together Mixed workload: OLTP index lookups plus a large full scan competing for buffer cache

Self-Check Questions

Before moving to Chapter 3, make sure you can answer these from memory.

  • Q1 A session has been in SQL*Net message from client for 5 minutes. Should you be concerned? Explain precisely why or why not, and name the wait class it belongs to.
  • Q2 You check top waits and see log file sync at 35% of all wait time. What are the first two things you check, and how do you distinguish between "slow disk" and "too many commits" as the root cause?
  • Q3 Twenty sessions are all waiting on enq: TX - row lock contention. Your blocking query shows Session 77 as the blocker. Session 77 is INACTIVE. What does that tell you, and what do you do next before killing it?
  • Q4 What is the precise difference between db file sequential read and db file scattered read in terms of the Oracle operation that caused each one? Can you see both on the same query?
  • Q5 You see latch: cache buffers chains in the top-3 waits and one sql_id accounts for 90% of it. What kind of object or access pattern usually causes this, and what are three design-level fixes?