Chapter 4 — Indexing Deep Dive

4.1 · How It Actually Works

B-tree Index Internals

Most tuners treat indexes as a black box. Don't. You need to understand the physical structure — it explains why indexes fail, when they're ignored, and what clustering factor actually measures.

Root block

Branch blocks

Leaf blocks (data lives here)

Table heap blocks

💡 Key insight: two I/Os per row

An index does NOT store actual row data — only the key value and a ROWID pointer. Getting the full row requires a second lookup into the table heap. This is why index + table fetch = 2 I/Os minimum per row. If 40% of a large table matches, those 2-I/O hops across millions of rows become more expensive than a single sequential full table scan.

4.2 · The Most Ignored Metric

Clustering Factor

Clustering factor measures how well the physical row order in the table matches the sorted order of the index. It is the single most important number for predicting whether an index range scan will be fast or slow — and most DBAs never check it.

Good clustering factor

1

2

3

4

5

Block 1

6

7

8

9

10

Block 2

CF ≈ num_blocks
Index-ordered rows cluster together. Range scan reads few blocks. Fast.

Bad clustering factor

1

9

3

7

2

Block 1

8

4

6

10

5

Block 2

CF ≈ num_rows
Each row in a different block. Range scan reads 1 block per row. CBO picks full scan.

SQL — Clustering factor analysis query

-- Full clustering factor health check for all indexes on a table
SELECT i.index_name,
       i.num_rows,
       i.clustering_factor,
       t.blocks                                                   AS table_blocks,
       ROUND(i.clustering_factor / NULLIF(i.num_rows,  0), 2)     AS cf_vs_rows,
       ROUND(i.clustering_factor / NULLIF(t.blocks, 0), 2)      AS cf_vs_blocks,
       CASE
         WHEN i.clustering_factor <= t.blocks * 2      THEN 'EXCELLENT'
         WHEN i.clustering_factor <= t.blocks * 10     THEN 'GOOD'
         WHEN i.clustering_factor <= i.num_rows * 0.5  THEN 'POOR'
         ELSE 'TERRIBLE — index likely ignored for range scans'
       END AS verdict
FROM   dba_indexes i
JOIN   dba_tables  t ON t.table_name = i.table_name
                    AND t.owner      = i.owner
WHERE  i.table_name = 'ORDERS'
AND    i.owner      = 'APPS'
ORDER BY i.clustering_factor;

🔎 Real scenario — the date index trap

You add an index on ORDER_DATE. Queries by date range still do full table scans. You check clustering factor: it's 9,800,000 vs num_rows of 10,000,000 — a ratio of 0.98. Almost every row is in a different block from its index-order neighbours. This is because rows were inserted in ORDER_ID sequence (autoincrement), not ORDER_DATE sequence. The index exists but is physically useless for range scans. Fix options: (1) Rebuild the table with ORDER_DATE as clustering key, (2) Accept full scans and use partitioning to limit scope.

4.3 · Choosing the Right Tool

Five Index Types

Oracle supports multiple index structures. Choosing the right type for the use case is as important as choosing which columns to index.

B-tree (default)

The workhorse. Works for equality, range, ORDER BY, and LIKE 'prefix%'. Sorted key values + ROWIDs.

High cardinality Range queries Low cardinality cols

DDL

CREATE INDEX idx_ord_date
  ON orders(order_date);

Bitmap index

Stores a bit-vector per distinct value. Excellent for low-cardinality columns in data warehouses. Catastrophic in OLTP — one DML locks the entire bitmap segment.

Low cardinality Read-only / DW OLTP — NEVER

DDL

CREATE BITMAP INDEX idx_status
  ON orders(status);
-- OLAP / DW only!

Function-based (FBI)

Indexes the result of a function on a column. Without this, any function on an indexed column breaks index access entirely.

UPPER/LOWER lookups Computed predicates Needs stats on expression

DDL

CREATE INDEX idx_email_upper
  ON customers(UPPER(email));
-- WHERE UPPER(email)=:b1 now uses idx

Reverse key index

Reverses the bytes of each key. Spreads inserts across all leaf blocks instead of concentrating on the rightmost block. Fixes right-side contention on sequences. Cannot be used for range scans.

Sequence / monotonic keys High-concurrency inserts Range scans broken

DDL

CREATE INDEX idx_oid_rev
  ON orders(order_id) REVERSE;
-- Equality only after reversal

Composite (multi-column) index — the most powerful and most misused

Column order matters enormously. The leading column must be present in the WHERE clause for efficient access. Rule: most selective equality column first, range column last.

Multi-column WHERE Covering indexes Wrong column order = skip scan or ignored

DDL — Correct composite ordering

-- Query: WHERE status = 'OPEN' AND order_date > SYSDATE-7
-- WRONG: (order_date, status) — range on date first → many blocks read
-- RIGHT: (status, order_date) — equality on status narrows to 2%, then range
CREATE INDEX idx_ord_status_date
  ON orders(status, order_date);

-- This index also serves: WHERE status = 'OPEN'  (leading col only — works)
-- But NOT:                WHERE order_date > ...  (skips leading col — full scan)

4.4 · The Golden Rule

Composite Index Column Ordering

This is where most 1-year tuners make costly mistakes. The order of columns in a composite index determines which queries can use it efficiently.

Column ordering decision guide — ORDERS table (10M rows)

✓ GOOD (customer_id, status) customer_id has 50K distinct values → 200 rows average. Most selective equality col leads.

✗ BAD (status, customer_id) status has 3 values → 3.3M rows. CBO must filter 3.3M rows to find 200. Wasteful.

✓ GOOD (status, order_date) Equality predicate (status) before range predicate (order_date). Always put range last.

✗ BAD (order_date, status) Range on date first → cannot use the equality filter efficiently in the index.

✓ GOOD (customer_id, status, order_date) Both equalities first (selective one leading), range last. Covers Q1, Q2, Q4 from below.

The three-part golden rule

1. Equality predicates first — most selective equality column leading.
2. Range predicates last — ORDER BY / GROUP BY columns can trail at the end to avoid a sort operation.
3. Include frequently-needed SELECT columns to enable index-only access (covering index).

4.5 · Eliminate Table Lookups

Covering Indexes

A covering index contains all columns the query needs — SELECT list, WHERE clause, ORDER BY — so Oracle never has to visit the table heap at all. This eliminates the second I/O hop entirely.

SQL — Create and verify a covering index

-- Query running 50,000 times per hour:
-- SELECT order_date, total_amount FROM orders WHERE customer_id=:b1 AND status='OPEN'

-- Without covering: index → ROWID → table (2 I/Os per row)
-- With covering:    index only, no table visit (1 I/O per row)

CREATE INDEX idx_ord_covering
  ON orders(customer_id, status, order_date, total_amount);
--           ^^equality   ^^equality ^needed        ^needed
--           these two drive         these two satisfy the SELECT list
--           the scan                without visiting the table

-- Verify Oracle uses index-only access — look for this in the plan:
-- INDEX RANGE SCAN with NO subsequent TABLE ACCESS BY ROWID

-- Check which columns are in the index
SELECT column_name, column_position
FROM   dba_ind_columns
WHERE  index_name  = 'IDX_ORD_COVERING'
AND    table_owner = 'APPS'
ORDER BY column_position;

4.6 · The Index Graveyard

8 Conditions That Kill Your Index

You created the index. The query still does a full scan. Here is exactly why — with a fix for each. Click each condition to expand.

4.7 · Dead Weight

Finding Unused Indexes

Every index you maintain has a cost — every INSERT, UPDATE, DELETE on indexed columns must also update the index. Unused indexes slow down DML for zero benefit.

SQL — Monitor and identify unused indexes

-- Enable monitoring on suspect indexes
ALTER INDEX idx_ord_status MONITORING USAGE;

-- After a week, check if it was used
SELECT index_name, table_name, monitoring, used, start_monitoring
FROM   v$object_usage
WHERE  table_name = 'ORDERS';

-- 12c+: Automatic tracking via V$INDEX_USAGE_INFO
SELECT name, owner, total_access_count,
       total_exec_count, last_used
FROM   v$index_usage_info
WHERE  owner = 'APPS'
ORDER BY total_access_count ASC;  -- 0 = strong drop candidate

-- Find indexes whose write cost exceeds read benefit
SELECT i.index_name, i.num_rows,
       s_r.value  AS index_reads,
       s_w.value  AS index_writes,
       ROUND(s_r.value * 100 /
         NULLIF(s_w.value + s_r.value, 0), 1) AS read_pct
FROM   dba_indexes          i,
       v$segment_statistics  s_r,
       v$segment_statistics  s_w
WHERE  s_r.object_name    = i.index_name
AND    s_w.object_name    = i.index_name
AND    s_r.statistic_name = 'logical reads'
AND    s_w.statistic_name = 'db block changes'
AND    i.owner = 'APPS'
ORDER BY read_pct ASC; -- near 0 = almost pure write overhead

4.8 · Index Health

Index Fragmentation

Indexes get fragmented over time from deletes and updates. Deleted entries leave dead space in leaf blocks. This increases the number of blocks Oracle must read and in some cases the tree height.

SQL — Analyze index structure and decide on rebuild

-- Analyze index structure (12c+: use ONLINE to avoid locking)
ANALYZE INDEX idx_ord_status VALIDATE STRUCTURE ONLINE;

-- Check fragmentation stats
SELECT name, height, blocks, lf_rows,
       del_lf_rows,
       ROUND(del_lf_rows * 100 /
         NULLIF(lf_rows + del_lf_rows, 0), 1) AS pct_deleted,
       rows_per_key
FROM   index_stats;
-- Rebuild if: pct_deleted > 20% OR height >= 4

-- Rebuild options (choose based on availability window)
ALTER INDEX idx_ord_status REBUILD;                   -- offline, fastest
ALTER INDEX idx_ord_status REBUILD ONLINE;            -- no DML lock (11g+)
ALTER INDEX idx_ord_status REBUILD ONLINE PARALLEL 4; -- parallel rebuild
ALTER INDEX idx_ord_status COALESCE;                  -- compact only, faster

⚠ Don't rebuild on a schedule

Many DBAs rebuild indexes weekly "just in case." If pct_deleted is low and height is 3, a rebuild buys nothing but costs I/O and maintenance window time. Only rebuild when the metrics justify it.

4.9 · Safe Production Testing

Invisible Index — Your Safest Testing Weapon

Oracle 11g+ allows you to create an index invisible to the optimizer by default. Test it for a single session without risking anything on the rest of the system. This is the correct way to test a new index on production.

SQL — Full invisible index test workflow

-- Step 1: Create invisible — all other sessions unaffected
CREATE INDEX idx_ord_test
  ON orders(customer_id, status, order_date)
  INVISIBLE;

-- Step 2: Enable ONLY for your test session
ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;

-- Step 3: Run the problem query — it will consider the invisible index
-- Pull the plan — is it better? Compare A-Rows, elapsed time
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(...));

-- Step 4a: Satisfied? Make visible to ALL sessions
ALTER INDEX idx_ord_test VISIBLE;

-- Step 4b: Not happy? Drop with zero production impact
DROP INDEX idx_ord_test;

4.10 · The Complete Process

Full Index Tuning Workflow

01

Pull execution plan with +PREDICATE

Look at "Predicate Information" section. Note every column appearing in access and filter predicates — these are your index candidates.

02

Check existing indexes on those columns

Query dba_ind_columns. Maybe an index already exists but the column order is wrong, or a function wraps the predicate breaking it.

03

Check clustering factor of candidate indexes

If CF ≈ num_rows, even a perfect index won't be used for range scans. Diagnose and fix the physical ordering issue first.

04

Check column stats — is selectivity accurate?

num_distinct, density, histogram type. Missing histogram on skewed column means CBO doesn't know how selective the index will be.

05

Create the index INVISIBLE and test in your session only

Set optimizer_use_invisible_indexes=TRUE for your session. Re-run the query. Compare E-Rows vs A-Rows and elapsed time before/after.

06

Make VISIBLE if good, DROP if not

ALTER INDEX ... VISIBLE to deploy. DROP INDEX to clean up. No rollback steps, no downtime either way.

SQL — Complete workflow queries

-- STEP 1: Pull plan with predicates
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  '&sql_id', NULL, 'ALLSTATS LAST +PREDICATE +COST'
));

-- STEP 2: Check existing indexes on predicate columns
SELECT index_name, column_name, column_position, descend
FROM   dba_ind_columns
WHERE  table_name  = 'ORDERS'
AND    table_owner = 'APPS'
ORDER BY index_name, column_position;

-- STEP 3: Check clustering factor
SELECT index_name, num_rows, clustering_factor,
       ROUND(clustering_factor/NULLIF(num_rows,0),2) AS cf_ratio
FROM   dba_indexes
WHERE  table_name = 'ORDERS' AND owner = 'APPS';

-- STEP 4: Check column stats on predicate columns
SELECT column_name, num_distinct, density, histogram, num_buckets
FROM   dba_tab_col_statistics
WHERE  table_name = 'ORDERS' AND owner = 'APPS'
AND    column_name IN ('STATUS','CUSTOMER_ID','ORDER_DATE');

-- STEP 5: Create invisible and test
CREATE INDEX idx_ord_cust_status_date
  ON orders(customer_id, status, order_date) INVISIBLE;

ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;

-- Re-run problem query, check plan + elapsed time

-- STEP 6: Deploy or clean up
ALTER INDEX idx_ord_cust_status_date VISIBLE;   -- deploy
-- OR
DROP INDEX idx_ord_cust_status_date;              -- clean up

4.11 · Validate Your Understanding

Self-Check Questions

Q1 You create an index on orders(order_date) but queries like WHERE TRUNC(order_date) = TRUNC(SYSDATE) still do full table scans. Explain exactly what is happening at the B-tree level, and give two ways to fix it.
Q2 An index has clustering_factor = 9,400,000 and num_rows = 9,500,000. The table has 120,000 blocks. A range-scan query using this index is slower than a full table scan. Explain precisely why, and what the long-term architectural fix is.
Q3 You have a composite index on (status, customer_id). A query with WHERE customer_id = 101 does a full table scan. A query with WHERE status = 'OPEN' AND customer_id = 101 uses the index. Explain the exact rule governing this behavior.
Q4 You need to test a new index on a 500GB production table without affecting any running queries or DML performance. What are the exact steps — include DDL commands?
Q5 A developer says "we have an index on the email column but WHERE UPPER(email) = :b1 always does a full scan." What type of index solves this? Write the exact DDL and explain why a regular B-tree index on email cannot help here.