Tales From A Lazy Fat DBA

It's all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Why Oracle's optimizer has been getting smarter for 15 years, and what the 26ai version actually adds

Posted by FatDBA on March 31, 2026

Every bad execution plan you've ever debugged traces back to the same root cause: the optimizer made a wrong guess about how many rows an operation would return .. and built an entire plan on top of that wrong number.

That number is called cardinality. It’s the estimated row count for each operation in your plan. Get it right and the optimizer picks the right join order, the right join method, the right access path. Get it wrong and you get a nested loops join against a table that returns 500,000 rows when the optimizer thought it was 12. You’ve seen this plan. It hurt.

Oracle has been progressively solving this problem for 15+ years. The story isn't a single breakthrough .. it's a series of increasingly smarter mechanisms, each one handling a class of estimation problem that the previous one couldn't.

Here’s the full honest picture, ending with what 26ai actually adds.

Oracle 10g .. Dynamic Sampling. The optimizer noticed when stats were missing or insufficient and sampled the data at parse time to get a rough estimate. Controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter. A blunt instrument, but better than pure guesswork.
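
As a quick sketch of how that control works (levels 0 to 11; level 2 is the default and only samples objects with missing statistics, higher levels sample more blocks in more situations):

```sql
-- Raise the dynamic sampling level for the whole session
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Or per statement, scoped to one table, via the DYNAMIC_SAMPLING hint
SELECT /*+ DYNAMIC_SAMPLING(c 4) */ COUNT(*)
FROM   sh.customers c
WHERE  c.cust_state_province = 'CA';
```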

Oracle 11g .. Cardinality Feedback. The optimizer started comparing its estimates to reality after execution. If it estimated 50 rows and got 50,000, it stored the real number in the SGA and flagged the statement for re-optimization on the next execution. The estimate corrected itself over time. The problem: the corrections lived in the SGA only — lost on restart, lost when the cursor aged out.
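
You could spot cursors flagged by feedback in V$SQL_SHARED_CURSOR. A sketch:

```sql
-- Cursors re-parsed because of cardinality feedback are marked
-- with USE_FEEDBACK_STATS = 'Y'
SELECT s.sql_id, s.child_number, c.use_feedback_stats
FROM   v$sql s
JOIN   v$sql_shared_cursor c
       ON  c.sql_id       = s.sql_id
       AND c.child_number = s.child_number
WHERE  c.use_feedback_stats = 'Y';
```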

Oracle 12c .. The Big Jump. Three things landed together:

  • Statistics Feedback (renamed from Cardinality Feedback): same learning mechanism, better persistence
  • Adaptive Plans: the optimizer could now switch join methods mid-execution .. starting with Nested Loops based on its estimate, and switching to Hash Join live if actual rows exceeded the threshold. The final plan was then fixed for subsequent executions
  • SQL Plan Directives: when a misestimate was detected, the optimizer created a persistent directive (stored in SYSAUX, survives restarts) that told future parses: “when you see this predicate pattern, gather dynamic statistics first”. Directives are cross-statement … one query’s lesson protects another with the same predicate pattern
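
The adaptive-plan behaviour above is easy to observe. A sketch of the two standard checks:

```sql
-- Show both subplans of an adaptive plan, with inactive rows marked
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => '+ADAPTIVE'));

-- Find cursors whose adaptive plan has been resolved to a final plan
SELECT sql_id, child_number, is_resolved_adaptive_plan
FROM   v$sql
WHERE  is_resolved_adaptive_plan = 'Y';
```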

Oracle 19c/23ai .. Automation at Scale. Automatic SQL Tuning Sets (ASTS), Automatic SPM, and Real-Time SPM turned the individual learning mechanisms into a system-level feedback loop. The database wasn't just learning from single statements .. it was maintaining plan stability across the entire workload automatically.
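
If you want to see how this automation is configured on a given database (19c onwards), a sketch:

```sql
-- Current SQL Plan Management configuration
SELECT parameter_name, parameter_value
FROM   dba_sql_management_config;

-- Enable the automatic SPM evolve task (run as a privileged user)
EXEC DBMS_SPM.CONFIGURE('AUTO_SPM_EVOLVE_TASK', 'ON');
```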

Oracle 26ai (23.8 RU) .. The Specific New Additions. Two documented, named improvements to cardinality estimation:

  1. Dynamic Statistics for PL/SQL Functions : a new parameter plsql_function_dynamic_stats giving fine-grained control over whether PL/SQL functions called inside SQL can participate in dynamic statistics sampling at parse time. Previously the optimizer treated PL/SQL functions as black boxes with unknowable return cardinality. Now it can sample them.
  2. PL/SQL to SQL Transpiler : when enabled, the optimizer inlines eligible PL/SQL functions directly into SQL at parse time, eliminating the black box entirely. The optimizer can now see and estimate the underlying SQL expression rather than guessing at what a function returns.
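
The transpiler is switchable. A sketch, using the SQL_TRANSPILER parameter (available since 23ai, OFF by default):

```sql
-- Enable PL/SQL-to-SQL transpilation for the session
ALTER SESSION SET sql_transpiler = 'ON';

-- Eligible simple PL/SQL functions called from SQL are now inlined
-- as SQL expressions at parse time, so the optimizer can cost them
-- like any other predicate instead of treating them as black boxes.
```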

Plus the general continued improvement of ML-informed cost models inside the optimizer engine .. real, but not a named switchable feature.

Now let's see all of this in the plan output, where it actually matters, with a quick demo. This is the single most important diagnostic habit in Oracle performance work: the GATHER_PLAN_STATISTICS hint tells Oracle to collect actual row counts during execution, and the ALLSTATS LAST format in DBMS_XPLAN then surfaces them alongside the estimates.

-- Prereqs (run as SYS)
GRANT ADVISOR TO sh;
GRANT ADMINISTER SQL MANAGEMENT OBJECT TO sh;

CONN sh/sh

-- Set output format for readable plans
SET LINESIZE 200
SET PAGESIZE 10000
SET LONG 100000

-- Run the query with stats collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;



SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));

Output before extended statistics exist:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |     17 |    127 |  1890 |    1533 |  <- E:17, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |     17 |    127 |   406 |    1213 |  <- E:17, A:127
|  4 |    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

Predicate Information:
   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - filter("C"."CUST_STATE_PROVINCE"='CA'
          AND "C"."CUST_INCOME_LEVEL"='G: 130,000 - 149,999')

E-Rows: 17. A-Rows: 127. That’s a 7.5x underestimate.

The optimizer assumed cust_state_province = 'CA' and cust_income_level = 'G: 130,000 - 149,999' were independent. They’re not — they’re correlated. California has a disproportionate number of high-income customers in this dataset. The optimizer applied the selectivity of each predicate independently, multiplied them, and got the wrong answer.

This is the classic multi-column predicate correlation problem. The fix is extended statistics.
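
You can reproduce the optimizer's independence math straight from the dictionary. A sketch, assuming no histograms on either column (each equality predicate then contributes a selectivity of 1/NUM_DISTINCT; with histograms the per-column numbers differ, but the multiplication is the same):

```sql
-- E-Rows under the independence assumption:
--   num_rows * (1 / NDV of state) * (1 / NDV of income level)
SELECT t.num_rows,
       ROUND(t.num_rows / (s.num_distinct * i.num_distinct))
         AS independent_estimate
FROM   user_tables             t,
       user_tab_col_statistics s,
       user_tab_col_statistics i
WHERE  t.table_name = 'CUSTOMERS'
AND    s.table_name = 'CUSTOMERS' AND s.column_name = 'CUST_STATE_PROVINCE'
AND    i.table_name = 'CUSTOMERS' AND i.column_name = 'CUST_INCOME_LEVEL';
```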

Let's fix it with extended statistics. Extended statistics (column groups) teach the optimizer about correlated columns. One DBMS_STATS call, no schema changes.

-- Create a column group for the two correlated columns
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
  ownname  => 'SH',
  tabname  => 'CUSTOMERS',
  extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
) AS col_group_name
FROM DUAL;

-- COL_GROUP_NAME
-- SYS_STUFBF#JKQM8F3GTPA7XDE9  (system-generated name)

-- Now gather stats to populate the column group
-- Now gather stats to populate the column group
-- (a multi-line call needs a PL/SQL block; EXEC is single-line only)
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO'
  );
END;
/
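
Worth verifying the column group actually landed before re-running. A quick sketch:

```sql
-- The extension should now exist...
SELECT extension_name, extension
FROM   user_stat_extensions
WHERE  table_name = 'CUSTOMERS';

-- ...and the hidden virtual column backing it should have its own NDV
SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'CUSTOMERS'
AND    column_name LIKE 'SYS\_%' ESCAPE '\';
```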


-- Let's re-run the same SQL.
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


After extended statistics:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |    124 |    127 |  1890 |    1533 |  <- E:124, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |    124 |    127 |   406 |    1213 |  <- Near perfect
|  4 |    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

E-Rows went from 17 to 124. Actual is 127. That’s less than 3% off.

Same plan hash .. same shape. But now the cost model is working from accurate numbers. In a more complex query, this difference in estimated rows would change join order, join method, and index decisions.
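
You don't have to spot correlated column pairs by hand, either. DBMS_STATS can watch a workload and recommend column groups. A sketch:

```sql
-- Monitor column usage for 300 seconds, then run the workload
EXEC DBMS_STATS.SEED_COL_USAGE(NULL, NULL, 300);

-- Report what was observed, including column-group recommendations
SET LONG 100000
SELECT DBMS_STATS.REPORT_COL_USAGE('SH', 'CUSTOMERS') FROM dual;

-- Create every recommended extension in one call (extension => NULL)
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('SH', 'CUSTOMERS') FROM dual;
```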

Next, let's look at SQL Plan Directives and watch the optimizer learn.

When the optimizer detects a cardinality misestimate during execution, it creates a SQL Plan Directive — a persistent instruction stored in SYSAUX telling future parses to gather dynamic statistics for this predicate pattern. You can watch this happen.

First, drop the extended stats so the misestimate recurs:

-- Reset: drop the column group
BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS(
    ownname   => 'SH',
    tabname   => 'CUSTOMERS',
    extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
  );
END;
/

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH', 'CUSTOMERS');

-- Let's flush the shared pool and re-run.

-- As SYS (flush shared pool in test environment only)
ALTER SYSTEM FLUSH SHARED_POOL;

CONN sh/sh

-- Run with stats collection
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;
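
One demo gotcha: directives are buffered in the SGA and persisted to SYSAUX only periodically (roughly every 15 minutes), so the dictionary views can look empty right after the run. Flush them manually. A sketch:

```sql
-- Persist buffered SQL Plan Directives to SYSAUX immediately (as SYS)
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
```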


-- Let's see if the directive was created.

-- Check for new SQL Plan Directives on CUSTOMERS
SELECT d.directive_id,
       d.type,
       d.state,
       d.auto_drop,
       d.created,
       o.object_name,
       o.subobject_name  AS column_name
FROM   dba_sql_plan_directives     d
JOIN   dba_sql_plan_dir_objects    o
       ON d.directive_id = o.directive_id
WHERE  o.object_name = 'CUSTOMERS'
ORDER  BY d.created DESC;


Output — directive created after the misestimate:

DIRECTIVE_ID  TYPE             STATE   AUTO_DROP CREATED              OBJECT  COLUMN_NAME
------------  ---------------  ------  --------- -------------------  ------  -------------------
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_STATE_PROVINCE
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_INCOME_LEVEL

The optimizer created a directive covering both columns … it noticed the multi-column predicate correlation caused a misestimate and now knows to sample dynamically next time it sees this pattern. Run the query a second time:

SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


At the bottom of the plan output:
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

The optimizer is now dynamically sampling at parse time because the directive told it to. The cardinality estimate will be much closer to reality on this execution.
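
Directives also manage their own lifecycle: AUTO_DROP = YES means an unused directive is purged after the retention window (53 weeks by default). You can adjust that, or drop one yourself, via DBMS_SPD. A sketch, using the directive_id from the output above:

```sql
-- Adjust the retention window for unused directives
EXEC DBMS_SPD.SET_PREFS('SPD_RETENTION_WEEKS', '53');

-- Drop a specific directive manually
EXEC DBMS_SPD.DROP_SQL_PLAN_DIRECTIVE(8273641920);
```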

Now, new in 26ai .. the SQL Analysis Report. This is the part that's genuinely new in 26ai. Previously you had to know to look at E-Rows vs A-Rows yourself. The SQL Analysis Report .. surfaced directly in DBMS_XPLAN.DISPLAY_CURSOR output .. flags these problems inline, without you having to hunt for them.

-- The standard DISPLAY_CURSOR call — no extra parameters needed
-- SQL Analysis Report appears automatically in 26ai when issues exist

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


In Oracle 26ai, after the standard execution plan output, you now see:

SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
--------------------------------------------------------------------------------
3 - SEL$1 / "C"@"SEL$1"
  - The following columns have predicates which prevent their use as keys
    in an index range scan. Consider rewriting the predicates or creating
    column group statistics.
    "CUST_STATE_PROVINCE", "CUST_INCOME_LEVEL"

The optimizer is telling you directly: these two columns in combination are causing an estimation problem, and column group statistics would fix it. You no longer have to derive this by comparing E-Rows and A-Rows yourself. It’s surfaced automatically in the plan output.

That’s a real DBA quality-of-life improvement. The diagnosis that used to take 10 minutes of plan reading is now one line in your standard plan output.

Next up: Dynamic Statistics for PL/SQL Functions .. the specific new documented feature in 26ai (23.8 RU). Consider a query that filters through a PL/SQL function:

-- A function the optimizer previously couldn't estimate
CREATE OR REPLACE FUNCTION sh.get_high_value_threshold
RETURN NUMBER DETERMINISTIC IS
BEGIN
  RETURN 1000;
END;
/

-- Query using the function in a predicate
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();


Before 26ai (or with plsql_function_dynamic_stats = 'OFF'):

The optimizer treats get_high_value_threshold() as a black box. It has no idea what value the function returns, so it can't estimate selectivity. It either guesses based on defaults or uses a very conservative estimate.

| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |   9188 |  12116 |  <- Rough guess
|* 3 |    TABLE ACCESS FULL | SALES |   9188 |  12116 |

In 26ai with plsql_function_dynamic_stats = 'ON':

-- Enable dynamic stats for PL/SQL functions (session level)
ALTER SESSION SET plsql_function_dynamic_stats = 'ON';

-- Rerun
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |  12203 |  12116 |  <- Near accurate
|* 3 |    TABLE ACCESS FULL | SALES |  12203 |  12116 |

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - PL/SQL function sampled for dynamic statistics

The optimizer called the function during dynamic statistics gathering at parse time, got the actual return value (1000), and used it to estimate selectivity accurately. E-Rows 12,203 vs A-Rows 12,116 — less than 1% off.

You can also control this at the object level, which is the right approach in production .. turn it on only for specific functions you trust:

-- Prefer object-level control in production
-- Allow dynamic stats for a specific function
BEGIN
  DBMS_STATS.SET_FUNCTION_PREFS(
    ownname    => 'SH',
    funcname   => 'GET_HIGH_VALUE_THRESHOLD',
    pref_name  => 'PLSQL_FUNCTION_DYNAMIC_STATS',
    pref_value => 'ON'
  );
END;
/

-- Check current settings
SELECT function_name, preference_name, preference_value
FROM   all_stat_extensions
WHERE  object_type = 'FUNCTION'
AND    owner = 'SH';

So, in short, Oracle’s optimizer hasn’t made one big leap … it’s made fifteen years of deliberate, incremental improvements, each one handling a class of cardinality problem the previous release couldn’t.

What 26ai specifically adds isn’t magic. It’s two concrete, named, documented improvements … dynamic statistics for PL/SQL functions, and the SQL Analysis Report surfacing optimizer advice inline .. plus the PL/SQL transpiler removing the problem class entirely for eligible functions. These are real. They’re testable. They’re in the docs.

The underlying ML-enhanced cost modelling is also real, but it's an evolutionary improvement without a named switch … Oracle's engineering continues to get better at estimating costs, particularly for complex workloads, vector queries, and correlated predicates. That's not hype. It's just not a single feature you can point to in the docs either.

Know your E-Rows vs A-Rows. Know your SQL Plan Directives. Know your extended statistics. And in 26ai, let the SQL Analysis Report do the first pass for you.

Hope It Helped!
Prashant Dixit
