Monday, March 6, 2023

Column Order in Composite Index

Column Order in Oracle Composite Indexes - Practical Deep Dive
Oracle Performance Series

Column Order in Composite IndexesHow leading portions, skip scans, ordering, and workload shape the right key sequence

Composite indexes are not just several columns bundled into one object. Their order determines which predicates form a usable key prefix, which statements can benefit from ordered access, and whether one index can support several important query patterns or only one narrow path. In Oracle, that makes column order a design choice with optimizer consequences.

Leading portionsThe central rule
Skip scanThe important exception
Workload fitDesign by query pattern
Plans decideValidate, then trust

Start with the right mental model: a composite index is one ordered key

Oracle documentation uses both composite and concatenated index. Both terms describe the same concept: Oracle stores one ordered key, not several independent mini-indexes. An index on (customer_id, order_date, status) is ordered first by customer_id, then by order_date within each customer, and then by status.

Leading portion controls navigability Column order changes usable SQL shapes One index can serve several prefixes
Key idea

If an index is defined as (A, B, C), the documented leading portions are (A), (A, B), and (A, B, C). That is why a predicate on B alone is fundamentally different from a predicate on A alone.

Index definition: (customer_id, order_date, status) Leading portion 1 customer_id Leading portion 2 customer_id order_date Leading portion 3 customer_id order_date status Columns farther to the right depend more heavily on earlier columns being useful to the statement.

Good instinct

Ask which predicate patterns must work when only part of the key is supplied.

Weak instinct

Treat the index as if Oracle can jump to any column position for free.
Bottom line: Column order matters because Oracle navigates the key from the left.

How order changes which query patterns the index can serve

Suppose two possible indexes exist on the same table: (customer_id, order_date) and (order_date, customer_id). They contain the same columns, but they do not support the same workload. The difference is concrete: each order exposes a different leading prefix.

Typical predicate Index (customer_id, order_date) Index (order_date, customer_id) Reason
WHERE customer_id = :b1 Strong fit No leading prefix The first index exposes customer_id as a documented leading portion.
WHERE customer_id = :b1 AND order_date >= :b2 Strong fit Possible for date-led SQL, but not aligned to the same prefix The first order matches the common “customer first, then date range” pattern.
WHERE order_date BETWEEN :d1 AND :d2 No leading prefix Strong fit Only the second index leads with order_date.
WHERE customer_id = :b1 ORDER BY order_date Often attractive Poor alignment The first order can both filter by customer and preserve date order within that customer slice.
What this means

A single composite index can satisfy several important statements only when the workload uses compatible leading prefixes. Same columns does not mean same usefulness.

SQL - The same columns in different orders
CREATE INDEX idx_orders_cust_date
  ON orders(customer_id, order_date);

CREATE INDEX idx_orders_date_cust
  ON orders(order_date, customer_id);

-- Query pattern A: customer lookup with optional date range
SELECT *
FROM   orders
WHERE  customer_id = :cust_id
AND    order_date >= :start_date;

-- Query pattern B: reporting by date range
SELECT *
FROM   orders
WHERE  order_date BETWEEN :d1 AND :d2;

How to choose the order without falling for bad rules of thumb

“Put the most selective column first” is too blunt to be a complete design rule. Selectivity matters, but Oracle’s own guidance starts with workload shape: which keys appear together, which keys appear alone, and which leading prefixes let multiple important statements use the same index.

1. Lead with reused prefixes

If the workload often filters by customer_id alone and also by customer_id + order_date, leading with customer_id gives both patterns a useful prefix.

2. Keep equality predicates early

Earlier equality predicates usually preserve a longer navigable prefix than earlier broad range predicates.

3. Include ordering needs

If important statements filter on one key and return rows ordered by the next key, the index order can help twice.

4. Validate with plans

When two orders are plausible, compare real plan behavior instead of relying on rules of thumb.
When (A, B) is usually better

A appears by itself often, and the common two-column statements are A = ... AND B ....

When (B, A) may be better

The real workload mostly starts with B, or reporting statements are driven primarily by B and only sometimes add A.

Observed workload patternUsually stronger orderReasoning
WHERE account_id = :x and WHERE account_id = :x AND tran_date >= :d(account_id, tran_date)The shorter and longer patterns share the same leading prefix.
WHERE status = :s is rare, but WHERE status = :s AND created_on BETWEEN ... is commonDepends on whether status must ever drive access aloneIf not, another order may better reflect the real access path.
WHERE region = :r ORDER BY sale_date(region, sale_date)That order can align both filtering and sorting needs.
Queries frequently omit column A but use B heavilyDo not assume (A, B) is enoughThe missing leading column may require a different index or may rely only on skip scan in narrow cases.
Design caution

A selective column placed first is not automatically the right answer if that order makes the workload’s common partial predicates unusable. Column order is about useful prefixes, not just raw distinct-count trivia.

Bottom line: Choose the order that matches how SQL reaches the table in production.

Equality first, broader ranges later, and never ignore ORDER BY

Once you understand leading portions, a practical pattern becomes clear: earlier equality predicates usually let Oracle navigate deeper into the key before the scan opens into a range. That is why an index such as (customer_id, order_date) is often a stronger fit for statements like customer_id = :x AND order_date >= :d than the reverse order.

Index order (customer_id, order_date) Equality customer_id Range order_date Oracle can navigate to one customer slice and then scan the relevant date range inside it. Index order (order_date, customer_id) Range first order_date Later key customer_id Useful for date-driven SQL, but less aligned to the “one customer, then date range” pattern.
Ordering benefit

Oracle documentation also notes that an index can eliminate a sort when the ORDER BY columns match the order of the leading index columns. Column order therefore has to be evaluated against sorting needs as well as filtering predicates.

SQL - One index helping filter and order
CREATE INDEX idx_sales_region_date
  ON sales(region, sale_date);

SELECT sale_id, sale_date, amount
FROM   sales
WHERE  region = :region
ORDER BY sale_date;

-- This order is attractive because the region slice is narrow first,
-- and rows inside that slice are already arranged by sale_date.
Design questionWhy it matters
Does the workload frequently do equality on one column and range on the next?That pattern often benefits from putting the equality-driving column first.
Does the query need rows back in key order?The right composite order can reduce or remove an explicit sort.
Is the query broad enough that Oracle may prefer a full scan anyway?Even a well-ordered index remains optional to the optimizer, not mandatory.

Index skip scan is the exception that softens the old oversimplification

A common simplification is that Oracle can use a composite index only when the leading column is present. Oracle documentation qualifies that statement: index skip scan is the main exception. Skip scan lets Oracle consider a composite index even when the leading column is missing by conceptually probing logical subindexes based on the leading key’s distinct values.

Conceptual skip scan on index (gender, email) Logical slice 1 gender = 'F' probe email inside the F slice Logical slice 2 gender = 'M' probe email inside the M slice Predicate email = :target leading key not supplied Skip scan is most plausible when the leading column has few distinct values and the later column is selective enough to justify those probes.
Skip-scan limit

Skip scan does not make column order irrelevant. It adds another optimizer option in some cases. It is not a blanket substitute for a properly ordered index.

Good skip-scan candidate

Leading key has very few values and the later key sharply narrows the search.

Weak skip-scan candidate

Leading key has many distinct values, so the conceptual slice probes become expensive.

Still workload-driven

If the nonleading-column query is frequent and important, a dedicated index may still be the better design.

Be cautious

Skip scan is an optimizer choice, not a guarantee that every missing-leading-column predicate will perform well.
SQL - A skip-scan style setup
CREATE INDEX idx_customer_gender_email
  ON customers(gender, email);

-- Leading column GENDER is absent here.
-- Oracle may consider skip scan if gender has few distinct values
-- and email is selective enough.
SELECT customer_id
FROM   customers
WHERE  email = :email_value;

How to validate column order responsibly

Index design needs plan evidence. Once two plausible orders exist, validate them with current statistics, representative SQL, and runtime plan inspection. Oracle also gives you operational tools to observe whether an index is used and to test an alternative order without exposing it broadly right away.

1. Gather current stats

Make sure the optimizer is reasoning with current object metadata.

2. Run representative SQL

Use realistic predicates, not synthetic statements unrelated to production patterns.

3. Inspect runtime plans

Use DBMS_XPLAN.DISPLAY_CURSOR with actual row-source stats where possible.

4. Compare alternatives safely

Use usage monitoring or invisible indexes before dropping or promoting a design.
SQL - Runtime plan verification
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'ORDERS',
    cascade => TRUE);
END;
/

ALTER SESSION SET statistics_level = ALL;

SELECT /*+ gather_plan_statistics */ order_id, order_date, amount
FROM   orders
WHERE  customer_id = :cust_id
AND    order_date >= :start_date
ORDER BY order_date;

SELECT *
FROM   TABLE(
         DBMS_XPLAN.DISPLAY_CURSOR(
           NULL,
           NULL,
           'ALLSTATS LAST +PREDICATE +ALIAS +BYTES +COST'
         )
       );
SQL - Monitor usage and test safely
-- Observe whether an existing index is actually used.
ALTER INDEX idx_orders_cust_date MONITORING USAGE;

SELECT index_name, used, monitoring, start_monitoring, end_monitoring
FROM   user_object_usage
WHERE  index_name = 'IDX_ORDERS_CUST_DATE';

-- Build an alternative order invisibly.
CREATE INDEX idx_orders_date_cust_inv
  ON orders(order_date, customer_id)
  INVISIBLE;

-- In a controlled session, allow the optimizer to consider it.
ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;
Why invisible indexes help

An invisible index is maintained by DML but ignored by the optimizer unless OPTIMIZER_USE_INVISIBLE_INDEXES is enabled. That makes it useful for comparing an alternative column order without changing general optimizer choices immediately.

Edge cases and misconceptions that trip people up

MisconceptionBetter interpretation
“A query on the second column can never use the composite index.”Not as a normal leading-prefix range lookup, but Oracle may still choose skip scan or a broader index scan in specific circumstances.
“Most selective first” is always right.Not reliably. The more durable question is which leading prefix the workload needs most often.
“If all needed columns are in the index, leading-column rules stop mattering.”Covering can reduce table visits, but it does not create a missing leading prefix for navigational access.
“Rows with null key values are always in the B-tree index.”Oracle does not index rows where all key columns are null in a normal B-tree index. That matters for some query designs.
“One big composite index replaces every smaller index.”Sometimes, but not automatically. Overlapping indexes can still differ in size, maintenance cost, and suitability for important SQL.

Null nuance

All-null key rows are absent from normal B-tree entries, so be careful when reasoning about scans that would need those rows represented in the index.

Covering nuance

A composite index can still be a smaller, index-only source for some statements even when a nonleading predicate weakens navigational use.

Maintenance nuance

Each extra overlapping index adds DML work, storage, and design complexity. “Just add another index” should not be the default answer.
Operational warning

Do not drop a shorter overlapping index only because a longer composite index starts with the same columns. Oracle may still prefer the shorter one for some statements because it is smaller or because it satisfies the statement more cheaply.

Bottom line: Learn the documented rules, then verify the exact workload.

Hands-on lab: compare two column orders in a controlled test

This lab works well in a sandbox. It gives you a repeatable setup and highlights the plan shapes worth inspecting when you compare two plausible index orders.

Step 1: build a table with mixed access patterns

SQL - Create demo data
DROP TABLE order_demo PURGE;

CREATE TABLE order_demo AS
SELECT level AS order_id,
       MOD(level, 5000) + 1 AS customer_id,
       DATE '2024-01-01' + MOD(level, 365) AS order_date,
       CASE
         WHEN MOD(level, 20) = 0 THEN 'CLOSED'
         WHEN MOD(level, 5) = 0 THEN 'PENDING'
         ELSE 'OPEN'
       END AS status,
       MOD(level, 2000) AS amount
FROM   dual
CONNECT BY level <= 200000;

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'ORDER_DEMO',
    cascade => FALSE);
END;
/

Step 2: create two competing orders

SQL - Competing composite indexes
CREATE INDEX idx_od_cust_date_status
  ON order_demo(customer_id, order_date, status);

CREATE INDEX idx_od_date_cust_status
  ON order_demo(order_date, customer_id, status)
  INVISIBLE;

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'ORDER_DEMO',
    cascade => TRUE);
END;
/

Step 3: run three representative statements

SQL - Queries to compare
ALTER SESSION SET statistics_level = ALL;

-- Query A: pure leading-column lookup
SELECT /*+ gather_plan_statistics */ order_id, order_date, status
FROM   order_demo
WHERE  customer_id = 1001;

-- Query B: customer plus date range and ordered output
SELECT /*+ gather_plan_statistics */ order_id, order_date, amount
FROM   order_demo
WHERE  customer_id = 1001
AND    order_date BETWEEN DATE '2024-03-01' AND DATE '2024-03-31'
ORDER BY order_date;

-- Query C: broad date-driven reporting
SELECT /*+ gather_plan_statistics */ order_id, customer_id, status
FROM   order_demo
WHERE  order_date BETWEEN DATE '2024-03-01' AND DATE '2024-03-31';

Step 4: what to inspect in the plan output

Expect for Query A

The (customer_id, order_date, status) order is naturally aligned because customer_id is the leading portion.

Expect for Query B

That same order remains attractive because it aligns the customer filter, the date range, and potentially the returned order.

Expect for Query C

The date-led alternative may become more interesting because the predicate begins with order_date.

Keep in mind

Fixed costs or universal winners. Your environment, stats, and row counts decide the exact plan.
SQL - Show the last runtime plan
SELECT *
FROM   TABLE(
         DBMS_XPLAN.DISPLAY_CURSOR(
           NULL,
           NULL,
           'ALLSTATS LAST +PREDICATE +BYTES +COST'
         )
       );

Quiz: do you have the right instincts about column order?

Answer from Oracle behavior, not rules of thumb

Q1. Why is (customer_id, order_date) usually better than (order_date, customer_id) for queries that often filter by customer_id alone?
Because the second column is ignored completely.
Because the first order exposes customer_id as a leading portion that single-column predicates can use.
Because Oracle always prefers customer IDs alphabetically.
Because date columns cannot appear first in a composite index.
Correct answer: leading portions decide partial usability. If the workload often starts with customer_id, placing it first lets that shorter predicate use the index.
Q2. What does skip scan change?
It makes column order irrelevant.
It guarantees fast access whenever the leading column is missing.
It gives Oracle an additional way to consider a composite index even without the leading column, in suitable cases.
It converts a composite index into several permanent separate indexes.
Correct answer: it adds an exception, not a free pass. Skip scan is real, but it does not erase the importance of the leading key.
Q3. Which statement is healthiest when deciding the order of columns?
Start from the workload’s common predicate prefixes and validate with plans.
Always put the most selective column first and stop thinking.
Always place date columns last, no matter what the SQL does.
Always combine every frequently filtered column into one large index.
Correct answer: start with the workload. The best order is the one that gives the real workload the most useful leading prefixes.
Q4. Why can (region, sale_date) be appealing for WHERE region = :r ORDER BY sale_date?
Because every index forces the same global sort order.
Because ORDER BY stops mattering after an index is created.
Because range predicates are forbidden on date columns.
Because the same order may align the filter and help Oracle avoid or reduce an extra sort.
Correct answer: index order can help twice. The right sequence can support row location and useful ordering at the same time.
Q5. What is true about B-tree composite indexes and nulls?
Every row is always stored, even if all indexed columns are null.
Rows in which all key columns are null are not indexed in a normal B-tree index.
Nulls are forbidden in every composite index.
Any row with one null component is removed from the index automatically.
Correct answer: all-null B-tree keys are absent. That detail affects how some queries can or cannot rely on the index alone.
Q6. What is the safest way to compare two plausible column orders in production-like testing?
Drop the current index immediately and hope the new one is better.
Rely only on a verbal rule such as “more selective first.”
Use current stats, compare runtime plans, and consider monitoring usage or invisible indexes for controlled testing.
Create both indexes permanently without reviewing DML cost.
Correct answer: measure before you commit. Oracle gives you monitoring and invisible-index tools so you can test more safely.
Takeaways

Think in prefixes, not column lists. Put columns first because the workload needs them first. Treat skip scan as a documented exception, not as a design strategy. And do not finalize an order until the plans, row counts, and operational trade-offs agree with the theory.

No comments:

Post a Comment