The Time-Based Tuning Model
Before wait events existed, DBAs tuned by ratios — buffer cache hit ratio, parse ratio, latch hit ratio. The problem: ratios lie. A 99% buffer cache hit ratio sounds great, but if you're doing a billion logical reads, that 1% miss is still 10 million physical reads.
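The arithmetic is worth making concrete. A minimal sketch with hypothetical workload numbers (not from any real system), showing how the same "great" hit ratio hides wildly different absolute miss counts:

```python
def physical_reads(logical_reads: int, hit_ratio: float) -> int:
    """Cache misses that become disk reads: logical reads not satisfied from the buffer cache."""
    return round(logical_reads * (1 - hit_ratio))

# The identical 99% hit ratio at two very different workload sizes:
heavy = physical_reads(1_000_000_000, 0.99)  # a billion logical reads
light = physical_reads(1_000_000, 0.99)      # a million logical reads

print(f"heavy workload: {heavy:,} physical reads")  # 10,000,000
print(f"light workload: {light:,} physical reads")  # 10,000
```

The ratio is the same in both cases; only the absolute miss count, which is what the disks actually feel, differs by three orders of magnitude. That is why time-based accounting replaced ratio-based tuning.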
Oracle's performance team introduced a better model: account for every second of elapsed time. A session is either on CPU doing useful work, or waiting for something. That's it. Nothing else.
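This is exactly the accounting that ASH-style sampling approximates: snapshot every active session once per second, and each sample stands in for roughly one second of DB time, spent either on CPU or in a wait. A minimal sketch (the event names are real Oracle events; the sample counts are invented for illustration):

```python
from collections import Counter

# One entry per (active session, second): either on CPU or in a non-idle wait.
# Idle sessions produce no samples, so every sample is accounted-for DB time.
samples = (
    ["ON CPU"] * 40
    + ["db file sequential read"] * 35
    + ["log file sync"] * 15
    + ["enq: TX - row lock contention"] * 10
)

total = len(samples)  # 100 samples ~= 100 seconds of DB time
for event, n in Counter(samples).most_common():
    print(f"{event:32} {n:3d} samples  {100 * n / total:5.1f}%")
```

The percentages sum to 100: every second is either useful CPU work or a named wait, which is the whole model. The ASH queries later in this chapter compute the same percentage with `COUNT(*) * 100 / SUM(COUNT(*)) OVER ()`.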
Wait Classes — The 12 Categories
Oracle groups every wait event into a wait class. Knowing the class instantly tells you what category of problem you're dealing with — before you even look at the individual event name.
The twelve classes are: Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O (the authoritative mapping is the WAIT_CLASS column of V$EVENT_NAME). Some events you will see every day, and where they fall:

| Example event | Wait class |
|---|---|
| db file scattered read | User I/O |
| direct path read | User I/O |
| library cache lock | Concurrency |
| buffer busy waits | Concurrency |
| latch: shared pool | Concurrency |
| log file parallel write | System I/O |
| control file I/O | System I/O |
| control file sequential read | System I/O |
| SQL*Net more data | Network |
| SQL*Loader events | Other |
| resmgr: become active | Scheduler |
| gc cr request | Cluster |
| jobq slave wait | Idle |
| wakeup time manager | Idle |
| enq: TM - contention | Application |
The Idle Filter Rule
The single biggest mistake beginners make is looking at total wait time and panicking about SQL*Net message from client. That event just means "the session is sitting idle, waiting for the user to send the next SQL." It tells you nothing about Oracle's performance.
Every diagnostic query in this chapter carries the same filter: AND wait_class != 'Idle'. Without it, idle waits inflate the totals and hide your real problems. This one filter is the difference between seeing the problem and being confused by noise.
```sql
-- Top wait events RIGHT NOW — last 10 minutes, Idle excluded
SELECT event,
       wait_class,
       COUNT(*) AS active_sessions,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 10/1440
AND    session_type = 'FOREGROUND'
AND    wait_class != 'Idle'          -- THE KEY FILTER
GROUP BY event, wait_class
ORDER BY active_sessions DESC
FETCH FIRST 15 ROWS ONLY;
```
Top Wait Events You Will See Every Day
Each event below comes with its root causes, diagnostic queries, and a real production scenario.
```sql
-- Find top SQL IDs causing db file sequential read (last 1 hour)
SELECT sql_id,
       COUNT(*) AS ash_samples,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  event = 'db file sequential read'
AND    sample_time > SYSDATE - 1/24
GROUP BY sql_id
ORDER BY ash_samples DESC
FETCH FIRST 10 ROWS ONLY;

-- Then inspect the execution plan for the top sql_id
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST'));
```
Production scenario (db file sequential read): you find one SQL doing 2 million single-block reads per execution while fetching 800,000 rows through an index. The clustering factor of the index is terrible: rows are physically scattered across the table, so the index is used but each ROWID lookup lands on a different block. Fix: rewrite the query to use a full table scan, or rebuild the table in sorted order.

```sql
-- Find which objects are being full-scanned most (last 1 hour)
SELECT ash.current_obj#,
       obj.object_name,
       obj.object_type,
       COUNT(*) AS scan_samples
FROM   v$active_session_history ash
JOIN   dba_objects obj ON obj.object_id = ash.current_obj#
WHERE  ash.event = 'db file scattered read'
AND    ash.sample_time > SYSDATE - 1/24
GROUP BY ash.current_obj#, obj.object_name, obj.object_type
ORDER BY scan_samples DESC
FETCH FIRST 10 ROWS ONLY;
```
Production scenario (db file scattered read): the event accounts for 40% of all wait time. The query above shows a 2GB AUDIT_LOG table being fully scanned 500 times per hour. The query is WHERE user_id = 101 AND action_date > SYSDATE - 7, and there is no composite index on (user_id, action_date). Adding one drops this wait event off the top-10 list entirely.

Every COMMIT makes the session wait on log file sync until LGWR confirms the redo is on disk. Wait time = how long LGWR takes to flush. This event directly blocks every committing user.

```sql
-- Check log file sync severity
SELECT event, total_waits, total_timeouts, time_waited, average_wait
FROM   v$system_event
WHERE  event = 'log file sync';

-- What is LGWR itself waiting for?
SELECT event, total_waits, time_waited, average_wait
FROM   v$system_event
WHERE  event LIKE 'log file%'
ORDER BY time_waited DESC;
```
Production scenario (log file sync): average_wait = 4ms. 1,000,000 commits × 4ms = 4,000 seconds = 66 minutes wasted just on commits. Change the job to commit every 5,000 rows and its run time drops to 35 minutes.

```sql
-- Find all blocked sessions and their blockers RIGHT NOW
SELECT w.sid      AS waiting_sid,
       w.username AS waiting_user,
       w.event    AS wait_event,
       w.seconds_in_wait,
       b.sid      AS blocking_sid,
       b.username AS blocking_user,
       b.status   AS blocker_status,
       b.sql_id   AS blocker_sql,
       b.machine  AS blocker_machine
FROM   v$session w
JOIN   v$session b ON b.sid = w.blocking_session
WHERE  w.blocking_session IS NOT NULL
ORDER BY w.seconds_in_wait DESC;

-- Find cascaded blocking chains (A blocks B, which blocks C)
SELECT LPAD(' ', 2 * (LEVEL - 1)) || sid AS sid_tree,
       username,
       event,
       seconds_in_wait,
       blocking_session
FROM   v$session
START WITH blocking_session IS NULL
       AND sid IN (SELECT DISTINCT blocking_session
                   FROM   v$session
                   WHERE  blocking_session IS NOT NULL)
CONNECT BY PRIOR sid = blocking_session
ORDER SIBLINGS BY seconds_in_wait DESC;
```
Production scenario (enq: TX - row lock contention): a batch job has updated ORDERS.STATUS on 10,000 rows and is still processing, so it holds every row lock until it commits. All 50 users are waiting to update their own order rows. Fix: break the batch job into smaller transactions with frequent commits.

```sql
-- Sessions waiting on library cache events
SELECT s.sid, s.username, s.event, s.seconds_in_wait,
       s.sql_id, s.p1raw, s.p2raw
FROM   v$session s
WHERE  s.event LIKE 'library cache%'
ORDER BY s.seconds_in_wait DESC;

-- Identify WHAT object is being held (put the p1raw value in the filter)
SELECT kglnaown AS owner,
       kglnaobj AS object_name,
       kglobtyp AS object_type
FROM   x$kglob
WHERE  kglhdadr = '&p1raw_value';
```
Production scenario (library cache pin): a compile_all deployment job runs ALTER PACKAGE statements at 2 AM. The ETL starts at 2:05 AM. Sessions pile up waiting on library cache pin. Fix: coordinate deployments with ETL windows, or use DBMS_UTILITY.COMPILE_SCHEMA with minimal locking.

```sql
-- Sessions currently spilling to temp (assumes an 8K block size)
SELECT s.sid, s.username, s.sql_id,
       t.tablespace,
       t.blocks * 8 / 1024 AS temp_mb_used
FROM   v$session s
JOIN   v$tempseg_usage t ON t.session_addr = s.saddr
ORDER BY temp_mb_used DESC;

-- SQL whose work areas spilled to disk: one-pass and multi-pass
-- executions in v$sql_workarea are the per-SQL spill counters
SELECT sql_id,
       operation_type,
       optimal_executions,
       onepass_executions,
       multipasses_executions
FROM   v$sql_workarea
WHERE  onepass_executions > 0
   OR  multipasses_executions > 0
ORDER BY multipasses_executions DESC, onepass_executions DESC
FETCH FIRST 15 ROWS ONLY;
```
Production scenario (temp spills): PGA_AGGREGATE_TARGET = 500MB. With 20 concurrent sessions, each gets roughly 25MB of sort space, but the query needs 800MB. Fix: increase pga_aggregate_target to 4GB, or rewrite the query to use an index that delivers rows in sorted order, eliminating the sort entirely.

The Diagnostic Workflow
This is the exact process to follow when you get paged saying "the database is slow." Five steps. Every time.
1. Pull the top wait events from ASH for the incident window, always filtering wait_class != 'Idle'. This immediately tells you the category of the problem — I/O, locking, memory, or CPU. Don't skip this step. You need the category before you can investigate further.
2. Drill into the top event and group the ASH samples by sql_id. One or two SQL IDs usually account for 70–80% of the problem. Now you have a specific target.
3. Pull the plan with DBMS_XPLAN.DISPLAY_CURSOR and the sql_id. Look at E-Rows vs A-Rows. A large gap (estimated 1,000, actual 1,000,000) means the optimizer is making wrong decisions — the root cause is usually stale statistics or missing histograms.
4. If the top events are lock or latch waits, walk the blocking_session chain in v$session and find the session at the head of it before touching anything else.
5. Apply the fix, then re-run the top-waits query and confirm the event has dropped off the list rather than moved somewhere else.

The 3 AM Toolkit
Copy these five queries. Run them in order on any production system showing performance issues. They answer every top-level question in under 5 minutes.
```sql
-- QUERY 1: What are the top waits RIGHT NOW? (last 10 minutes)
SELECT event,
       wait_class,
       COUNT(*) AS sessions,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 10/1440
AND    session_type = 'FOREGROUND'
AND    wait_class != 'Idle'
GROUP BY event, wait_class
ORDER BY sessions DESC
FETCH FIRST 10 ROWS ONLY;

-- QUERY 2: Which SQL is causing the #1 wait event?
-- Replace 'db file sequential read' with your top event from Query 1
SELECT sql_id,
       COUNT(*) AS samples,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM   v$active_session_history
WHERE  event = 'db file sequential read'
AND    sample_time > SYSDATE - 10/1440
GROUP BY sql_id
ORDER BY samples DESC
FETCH FIRST 5 ROWS ONLY;

-- QUERY 3: Every active session and what it's waiting for
SELECT s.sid, s.username, s.status, s.event, s.wait_class,
       s.seconds_in_wait, s.sql_id, s.blocking_session
FROM   v$session s
WHERE  s.status = 'ACTIVE'
AND    s.type = 'USER'
AND    s.wait_class != 'Idle'
ORDER BY s.seconds_in_wait DESC;

-- QUERY 4: Blocking chain — find all blockers and their victims
SELECT w.sid, w.username, w.event, w.seconds_in_wait,
       b.sid      AS blocker,
       b.username AS blocker_user,
       b.sql_id   AS blocker_sql
FROM   v$session w
JOIN   v$session b ON b.sid = w.blocking_session
WHERE  w.blocking_session IS NOT NULL;

-- QUERY 5: Investigate a past problem window
-- Change the timestamps to your incident window
SELECT TO_CHAR(sample_time, 'HH24:MI') AS time_slot,
       event,
       COUNT(*) AS sessions
FROM   v$active_session_history
WHERE  sample_time BETWEEN TO_DATE('2026-03-21 14:00', 'YYYY-MM-DD HH24:MI')
                       AND TO_DATE('2026-03-21 14:30', 'YYYY-MM-DD HH24:MI')
AND    wait_class != 'Idle'
GROUP BY TO_CHAR(sample_time, 'HH24:MI'), event
ORDER BY time_slot, sessions DESC;
```
Wait Event Combination Patterns
Real incidents rarely show a single wait event. Knowing these combinations lets you diagnose the root cause from the wait list alone — before even pulling a plan.
| What you see in ASH | What it means |
|---|---|
| db file sequential read + high CPU | Lots of index lookups — possibly missing index or stale statistics causing wrong plan |
| log file sync + log file parallel write | LGWR is genuinely slow — disk I/O on redo log files is the root problem |
| enq: TX row lock + many sessions | One session holding a lock too long — find the blocker, assess whether to kill |
| library cache lock + hard parses spiking | DDL on a hot object, or application not using bind variables at all |
| direct path read temp + write temp together | PGA too small — sort and hash join spilling to disk simultaneously |
| latch: cache buffers chains + one sql_id | Hot block — one SQL hitting the same tiny set of blocks millions of times (sequence, hot index block) |
| CPU only, no waits visible in ASH | CPU-bound work — the SQL itself is inefficient (bad loops, large in-memory sorts, function calls per row) |
| db file scattered read + db file sequential read together | Mixed workload: OLTP index lookups plus a large full scan competing for buffer cache |
Self-Check Questions
Before moving to Chapter 3, make sure you can answer these from memory.
- Q1: A session has been in SQL*Net message from client for 5 minutes. Should you be concerned? Explain precisely why or why not, and name the wait class it belongs to.
- Q2: You check top waits and see log file sync at 35% of all wait time. What are the first two things you check, and how do you distinguish between "slow disk" and "too many commits" as the root cause?
- Q3: Twenty sessions are all waiting on enq: TX - row lock contention. Your blocking query shows Session 77 as the blocker. Session 77 is INACTIVE. What does that tell you, and what do you do next before killing it?
- Q4: What is the precise difference between db file sequential read and db file scattered read in terms of the Oracle operation that caused each one? Can you see both on the same query?
- Q5: You see latch: cache buffers chains in the top-3 waits and one sql_id accounts for 90% of it. What kind of object or access pattern usually causes this, and what are three design-level fixes?