Chapter 1 — Oracle Architecture for Performance Tuners

1.1 · Core Concept

Instance vs Database

The single most important mental model shift for a tuner: Oracle is not just a database — it's a distributed system running entirely within a single server. It has its own memory manager, its own I/O scheduler, its own process pool, and its own caching layer.

⚠ Most tuners confuse these two

These are completely different things and the distinction governs where 90% of all performance problems live.

The Database

Files on disk — no intelligence

Data files, control files, redo log files. It just sits on disk. It has no CPU, no memory, no processes. When Oracle "shuts down", the files remain untouched.

The Instance

Memory + processes — the engine

SGA (shared memory) and background processes. Created in RAM at startup, destroyed at shutdown. Almost every performance problem lives here, not in the files.

When Oracle "starts up", it creates the instance in memory and then mounts and opens the database (the files). When Oracle "shuts down", the instance is destroyed but the files remain.

1.2 · Shared Memory

The SGA — System Global Area

The SGA is the shared memory region. Every server process and background process reads from and writes to the SGA. It is the most important structure to understand for tuning.

💡 The critical relationship

SGA and PGA compete for the same physical RAM. Too much SGA means tiny PGA work areas — sorts spill to disk. Too little SGA means buffer cache misses — physical I/O storms. Getting this balance right is the core of memory tuning (Chapter 5).

1.3 · Tuning Target #1

The Buffer Cache

When Oracle needs a data block (say, a row from the ORDERS table), it first checks the buffer cache. If the block is there — that's a logical read (buffer get). Fast. Cheap. If not — Oracle must go to disk. That's a physical read. It's 1000× slower on spinning disk, 50–100× slower even on SSD.

Buffer Cache Hit Ratio

A healthy OLTP system should show 95%+. Below 90% means your cache is too small or you have a full table scan problem.

SQL — Buffer Cache Hit Ratio

-- Buffer cache hit ratio — should be > 95% on OLTP
SELECT ROUND(
  (1 - (phyrds.value / (dbgets.value + congets.value))) * 100, 2
) AS buffer_cache_hit_ratio
FROM   v$sysstat phyrds,
       v$sysstat dbgets,
       v$sysstat congets
WHERE  phyrds.name  = 'physical reads'
AND    dbgets.name  = 'db block gets'
AND    congets.name = 'consistent gets';

🔎 Real-world scenario

A DBA calls you — "the app was fast yesterday, slow today, nothing changed." You query buffer cache hit ratio and see it dropped from 98% to 74%. You look at AWR and find a new nightly job was added that does a full table scan on a 50GB table — it's thrashing the buffer cache by loading 50GB of blocks, evicting the hot OLTP blocks that the application needs. Fix: schedule the job during off-peak hours, or use direct-path reads so it bypasses the buffer cache entirely.

1.4 · Parse Cost Prevention

The Shared Pool

Every SQL statement that Oracle executes must first be parsed. Parsing is expensive — Oracle checks syntax, resolves object names, checks permissions, and generates the execution plan. After a hard parse, Oracle stores the parsed form in the library cache inside the shared pool. The next time the same SQL comes in, Oracle skips the hard parse — that's a soft parse.

The Bind Variable Anti-Pattern

SQL — Bind Variable Anti-Pattern

-- BAD: every call is a hard parse (unique SQL text = unique entry in library cache)
SELECT * FROM orders WHERE order_id = 1001;
SELECT * FROM orders WHERE order_id = 1002;
SELECT * FROM orders WHERE order_id = 1003;
-- Oracle treats these as 3 completely different SQL statements

-- GOOD: bind variables — one parse, plan reused for all values of :b1
SELECT * FROM orders WHERE order_id = :b1;

SQL — Diagnose Missing Bind Variables

-- Find SQL with high parse counts and no bind variables
SELECT executions, parse_calls, sql_text
FROM   v$sqlarea
WHERE  parse_calls > 100
AND    executions  < parse_calls * 1.1   -- parsing nearly every execution
ORDER BY parse_calls DESC
FETCH FIRST 20 ROWS ONLY;

🔎 Real-world scenario

A new Java application is deployed. CPU on the DB server spikes to 95%. You look at v$sqlarea and find 50,000 distinct SQL statements — all identical except for the WHERE id = <literal>. The dev team is using string concatenation instead of bind variables. Fixing this drops CPU from 95% to 15% without changing a single index.

1.5 · Commit Path

The Redo Log Buffer

Every change to the database generates redo — the change record used for crash recovery. Redo lands first in the redo log buffer (in SGA). The background process LGWR flushes it to redo log files on disk on every COMMIT.

⚡ Critical path

A COMMIT does not return to the application until LGWR confirms the redo is safely on disk. Slow disk on redo logs = slow COMMIT for every user. You'll see this as log file sync wait events in ASH.

SQL — Check log file sync waits

-- High average_wait here = LGWR bottleneck = slow commits
SELECT event, total_waits, time_waited,
       average_wait, max_wait
FROM   v$system_event
WHERE  event = 'log file sync';

1.6 · Private Memory

The PGA — Program Global Area

Unlike the SGA which is shared, the PGA is private to each server process. It holds the sort work area, hash join area, bind variable values, cursor state, and PL/SQL variables. If a sort operation doesn't fit in the PGA sort area, Oracle spills it to disk — creating temp tablespace I/O.

SQL — Find sessions spilling sorts to disk

-- Sessions currently spilling sorts to disk (temp tablespace)
SELECT s.sid, s.username, s.sql_id,
       w.event, w.seconds_in_wait
FROM   v$session       s
JOIN   v$session_wait  w ON w.sid = s.sid
WHERE  w.event LIKE '%temp%'
ORDER BY w.seconds_in_wait DESC;

1.7 · The Engine Room

Background Processes

These run silently behind every Oracle instance. As a tuner, you need to know what each one does and what its failure or slowness means in terms of wait events.

Process	What it does	Tuning concern
DBWR	Writes dirty buffers from buffer cache to data files	Slow = waits on `db file parallel write`, I/O bottleneck
LGWR	Writes redo log buffer to redo log files on every COMMIT	Slow = `log file sync` on every COMMIT
CKPT	Updates control file and data file headers at checkpoints	Frequent checkpoints = high I/O spike every N seconds
SMON	System monitor — recovers transactions, cleans temp segments	If stuck, temp space won't be freed after sessions end
PMON	Process monitor — cleans up after failed sessions	If stuck, dead sessions linger holding locks and resources
ARCH	Archives filled redo logs to archive log destination	If slow, `log file switch (archiving needed)` stalls sessions

SQL — Check which background processes are running

-- Active background processes (paddr != '00' means running)
SELECT name, description
FROM   v$bgprocess
WHERE  paddr != '00'
ORDER BY name;

1.8 · The Full Picture

How a Query Actually Executes

When your application fires a SELECT, here is exactly what happens — and which wait events map to each step. This is the most important thing to internalize for tuning.

1

Application sends SQL to Oracle

SQL text is transmitted over SQL*Net. Oracle receives it and begins the parse phase. The exact text (including whitespace and case) determines if it matches an existing cursor in the shared pool.

2

Parse — check shared pool (library cache)

Soft parse: SQL found in library cache → skip to execute. Hard parse: not found → syntax check, semantic check, privilege check, optimizer generates execution plan (expensive!). Plan stored in library cache for next time.

library cache lock · library cache pin

3

Execute — follow the execution plan

Oracle executes the row source tree — accessing rows according to the plan. For each block needed, checks buffer cache first. For DML, this phase writes undo and redo records. For SELECT, this phase drives the row production engine.

4

Buffer cache lookup — memory or disk?

For each block access: is it in the buffer cache? HIT → logical read (microseconds). MISS → physical read from disk (milliseconds). The ratio of hits vs misses determines how I/O-bound your query is.

db file sequential read db file scattered read

5

Fetch rows to client

Rows are returned in batches. Default array fetch size in JDBC is often 10–15 rows — optimal is 100–500. Fetching 1 row at a time means 1 network round-trip per row. On a result set of 100,000 rows, that's the difference between 10ms and 100 seconds.

SQL*Net message to client

💡 The mapping that matters

Every wait event you'll ever diagnose maps to one of these five steps. library cache lock → step 2 (parse). db file sequential read → step 4 (buffer miss). log file sync → appears in step 3 for DML on COMMIT. Know the step, you know the category of the problem.

1.9 · Daily Toolkit

Key V$ Views — Day 1 Queries

These are the live windows into Oracle's memory and processes. You'll use these every single day.

SQL — What is every session doing right now?

-- What is every session doing RIGHT NOW?
SELECT sid, username, status, event,
       seconds_in_wait, sql_id
FROM   v$session
WHERE  type = 'USER'
ORDER BY seconds_in_wait DESC;

SQL — Current SGA memory allocation

-- Current SGA memory allocation breakdown
SELECT name, bytes/1024/1024 AS mb
FROM   v$sgainfo
WHERE  name IN (
  'Buffer Cache Size',
  'Shared Pool Size',
  'Large Pool Size',
  'Redo Buffers'
);

SQL — Hard parses vs soft parses + key I/O stats

-- Hard parses vs soft parses + key I/O stats since instance startup
SELECT name, value
FROM   v$sysstat
WHERE  name IN (
  'parse count (hard)',
  'parse count (total)',
  'execute count',
  'db block gets',
  'consistent gets',
  'physical reads'
);

1.10 · Validate Your Understanding

Self-Check Questions

Before moving to Chapter 2, make sure you can answer these from memory. These are real interview questions and real production scenarios.

Q1 What is the difference between the Oracle instance and the Oracle database? If Oracle crashes and the DBA runs STARTUP, what is created first — the instance or the database opens?
Q2 A developer says "my query was fast last month, now it's slow." You check the buffer cache hit ratio and it's 70%. What does this tell you and what two things would you investigate next?
Q3 Why does Oracle store parsed SQL in the shared pool? What happens if the application doesn't use bind variables — and what is the performance impact at 10,000 executions per second?
Q4 A session is doing ORDER BY on 10 million rows. Where does the sort happen first? What happens if the sort doesn't fit, and which wait event appears in that case?
Q5 You see log file sync as the #1 wait event in ASH. Which background process is the bottleneck? Name three possible root causes and one fix for each.