Tales From A Lazy Fat DBA

It's all about databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Posts Tagged ‘plsql’

Why Oracle’s optimizer has been getting smarter for 15 years and what the 26ai version actually adds

Posted by FatDBA on March 31, 2026

Every bad execution plan you’ve ever debugged traces back to the same root cause. The optimizer made a wrong guess about how many rows an operation would return ..and built an entire plan on top of that wrong number.

That number is called cardinality. It’s the estimated row count for each operation in your plan. Get it right and the optimizer picks the right join order, the right join method, the right access path. Get it wrong and you get a nested loops join against a table that returns 500,000 rows when the optimizer thought it was 12. You’ve seen this plan. It hurt.

Oracle has been progressively solving this problem for 15+ years. The story isn’t a single breakthrough ..it’s a series of increasingly smarter mechanisms, each one handling a class of estimation problem that the previous one couldn’t.

Here’s the full honest picture, ending with what 26ai actually adds.

Oracle 10g .. Dynamic Sampling
The optimizer noticed when stats were missing or insufficient and sampled the data at parse time to get a rough estimate. Controlled by OPTIMIZER_DYNAMIC_SAMPLING. Blunt instrument, but better than pure guesswork.
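A quick sketch of both controls (the table name t1 and the predicates here are placeholders, not from any demo schema):

```sql
-- Session level: raise the dynamic sampling level (0-11, default 2)
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Statement level: hint a sampling level for one table
SELECT /*+ DYNAMIC_SAMPLING(t1 4) */ COUNT(*)
FROM   t1
WHERE  col1 = 'X'
AND    col2 = 'Y';
```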

Oracle 11g .. Cardinality Feedback
The optimizer started comparing its estimates to reality after execution. If it estimated 50 rows and got 50,000, it stored the real number in the SGA and flagged the statement for re-optimization on the next execution. The estimate corrected itself over time. The problem: the correction lived in the SGA only, so it was lost on restart and lost when the cursor aged out.
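From 12c onward (after the rename) you can see which cursors got flagged for this; a quick check, assuming a standard dictionary:

```sql
-- Cursors flagged for re-optimization by feedback (12c+ column name)
SELECT sql_id, child_number, is_reoptimizable
FROM   v$sql
WHERE  is_reoptimizable = 'Y';
```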

Oracle 12c .. The Big Jump
Three things landed together:

  • Statistics Feedback (renamed from Cardinality Feedback): same learning mechanism, better persistence
  • Adaptive Plans: the optimizer could now switch join methods mid-execution .. starting with Nested Loops based on its estimate, and switching to Hash Join live if actual rows exceeded the threshold. The final plan was then fixed for subsequent executions
  • SQL Plan Directives: when a misestimate was detected, the optimizer created a persistent directive (stored in SYSAUX, survives restarts) that told future parses: “when you see this predicate pattern, gather dynamic statistics first”. Directives are cross-statement … one query’s lesson protects another with the same predicate pattern
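For the Adaptive Plans bullet, both the V$SQL column and the +ADAPTIVE plan format are documented 12c features; substitute your own sql_id:

```sql
-- Which cursors resolved an adaptive plan?
SELECT sql_id, child_number, is_resolved_adaptive_plan
FROM   v$sql
WHERE  is_resolved_adaptive_plan = 'Y';

-- Show the adaptive plan with inactive steps marked '-'
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id => '&your_sql_id',
  format => 'ALLSTATS LAST +ADAPTIVE'));
```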

Oracle 19c/23ai .. Automation at Scale
Automatic SQL Tuning Sets (ASTS), Automatic SPM, and Real-Time SPM turned the individual learning mechanisms into a system-level feedback loop. The database wasn’t just learning from single statements .. it was maintaining plan stability across the entire workload automatically.
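To see or change the automatic SPM side of this, DBMS_SPM.CONFIGURE is the documented knob (exact defaults vary by release):

```sql
-- Enable the automatic SPM evolve task (19c+)
EXEC DBMS_SPM.CONFIGURE('AUTO_SPM_EVOLVE_TASK', 'AUTO');

-- Review the current SPM configuration
SELECT parameter_name, parameter_value
FROM   dba_sql_management_config;
```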

Oracle 26ai (23.8 RU) .. The Specific New Additions
Two documented, named improvements to cardinality estimation:

  1. Dynamic Statistics for PL/SQL Functions: a new parameter, plsql_function_dynamic_stats, gives fine-grained control over whether PL/SQL functions called inside SQL can participate in dynamic statistics sampling at parse time. Previously the optimizer treated PL/SQL functions as black boxes with unknowable return cardinality. Now it can sample them.
  2. PL/SQL to SQL Transpiler: when enabled, the optimizer inlines eligible PL/SQL functions directly into SQL at parse time, eliminating the black box entirely. The optimizer can now see and estimate the underlying SQL expression rather than guessing at what a function returns.
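The transpiler in item 2 is switched on via the sql_transpiler parameter (introduced with 23ai, default OFF). A minimal sketch, assuming a simple deterministic function like the one in the demo further down is eligible:

```sql
-- Enable PL/SQL to SQL transpilation for the session
ALTER SESSION SET sql_transpiler = 'ON';

-- An eligible single-expression function gets inlined at parse time,
-- so the optimizer costs the expression instead of a black box
SELECT COUNT(*)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();
```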

Plus the general continued improvement of ML-informed cost models inside the optimizer engine .. real, but not a named switchable feature.

Now let’s see all of this in the plan output, where it actually matters, with a quick demo. Comparing estimates against actuals is the single most important diagnostic habit in Oracle performance work. The GATHER_PLAN_STATISTICS hint tells the runtime to track actual row counts during execution, and the ALLSTATS LAST format in DBMS_XPLAN then surfaces them alongside the estimates.

-- Prereqs (run as SYS)
GRANT ADVISOR TO sh;
GRANT ADMINISTER SQL MANAGEMENT OBJECT TO sh;

CONN sh/sh

-- Set output format for readable plans
SET LINESIZE 200
SET PAGESIZE 10000
SET LONG 100000

-- Run the query with stats collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;



SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));

Output — before extended statistics exist:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |     17 |    127 |  1890 |    1533 |  <- E:17, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |     17 |    127 |   406 |    1213 |  <- E:17, A:127
|  4 |    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

Predicate Information:
   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - filter("C"."CUST_STATE_PROVINCE"='CA'
          AND "C"."CUST_INCOME_LEVEL"='G: 130,000 - 149,999')

E-Rows: 17. A-Rows: 127. That’s a 7.5x underestimate.

The optimizer assumed cust_state_province = 'CA' and cust_income_level = 'G: 130,000 - 149,999' were independent. They’re not — they’re correlated. California has a disproportionate number of high-income customers in this dataset. The optimizer applied the selectivity of each predicate independently, multiplied them, and got the wrong answer.

This is the classic multi-column predicate correlation problem. The fix is extended statistics.

Let’s try to fix it via extended statistics. Extended statistics (column groups) teach the optimizer about correlated columns. One DBMS_STATS call, no schema changes.

-- Create a column group for the two correlated columns
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
  ownname  => 'SH',
  tabname  => 'CUSTOMERS',
  extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
) AS col_group_name
FROM DUAL;

-- COL_GROUP_NAME
-- SYS_STUFBF#JKQM8F3GTPA7XDE9  (system-generated name)

-- Now gather stats to populate the column group
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/


-- Let's re-run the same SQL.
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


After extended statistics:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |    124 |    127 |  1890 |    1533 |  <- E:124, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |    124 |    127 |   406 |    1213 |  <- Near perfect
|  4 |    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

E-Rows went from 17 to 124. Actual is 127. That’s less than 3% off.

Same plan hash .. same shape. But now the cost model is working from accurate numbers. In a more complex query, this difference in estimated rows would change join order, join method, and index decisions.

Next, let's look at SQL Plan Directives and watch the optimizer learn.

When the optimizer detects a cardinality misestimate during execution, it creates a SQL Plan Directive — a persistent instruction stored in SYSAUX telling future parses to gather dynamic statistics for this predicate pattern. You can watch this happen.

First, drop the extended stats so the misestimate recurs:

-- Reset: drop the column group
BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS(
    ownname   => 'SH',
    tabname   => 'CUSTOMERS',
    extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)');
END;
/

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH', 'CUSTOMERS');

-- Let's flush the shared pool and re-run.

-- As SYS (flush shared pool in test environment only)
ALTER SYSTEM FLUSH SHARED_POOL;

CONN sh/sh

-- Run with stats collection
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;


-- Let's see if the directive was created.

-- Check for new SQL Plan Directives on CUSTOMERS
SELECT d.directive_id,
       d.type,
       d.state,
       d.auto_drop,
       d.created,
       o.object_name,
       o.subobject_name  AS column_name
FROM   dba_sql_plan_directives     d
JOIN   dba_sql_plan_dir_objects    o
       ON d.directive_id = o.directive_id
WHERE  o.object_name = 'CUSTOMERS'
ORDER  BY d.created DESC;


Output — directive created after the misestimate:

DIRECTIVE_ID  TYPE             STATE   AUTO_DROP CREATED              OBJECT  COLUMN_NAME
------------  ---------------  ------  --------- -------------------  ------  -------------------
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_STATE_PROVINCE
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_INCOME_LEVEL

The optimizer created a directive covering both columns … it noticed the multi-column predicate correlation caused a misestimate and now knows to sample dynamically next time it sees this pattern. Run the query a second time:

SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


At the bottom of the plan output:
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

The optimizer is now dynamically sampling at parse time because the directive told it to. The cardinality estimate will be much closer to reality on this execution.
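One practical wrinkle with this demo: directives are created in the SGA and flushed to SYSAUX only periodically (roughly every 15 minutes), so the dictionary query above can come back empty right after the run. DBMS_SPD lets you force the flush:

```sql
-- Force pending SQL Plan Directives from the SGA down to SYSAUX
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
```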

Now, new in 26ai .. the SQL Analysis Report. This is the part that's genuinely new. Previously you had to know to look at E-Rows vs A-Rows yourself. The SQL Analysis Report, surfaced directly in DBMS_XPLAN.DISPLAY_CURSOR output, flags these problems inline without you having to hunt for them.

-- The standard DISPLAY_CURSOR call — no extra parameters needed
-- SQL Analysis Report appears automatically in 26ai when issues exist

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


In Oracle 26ai, after the standard execution plan output, you now see:

SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
--------------------------------------------------------------------------------
3 - SEL$1 / "C"@"SEL$1"
  - The following columns have predicates which prevent their use as keys
    in an index range scan. Consider rewriting the predicates or creating
    column group statistics.
    "CUST_STATE_PROVINCE", "CUST_INCOME_LEVEL"

The optimizer is telling you directly: these two columns in combination are causing an estimation problem, and column group statistics would fix it. You no longer have to derive this by comparing E-Rows and A-Rows yourself. It’s surfaced automatically in the plan output.

That’s a real DBA quality-of-life improvement. The diagnosis that used to take 10 minutes of plan reading is now one line in your standard plan output.

Okay, next is Dynamic Stats for PL/SQL Functions …. This is the specific new documented feature in 26ai (23.8 RU). Consider a query that filters through a PL/SQL function:

-- A function the optimizer previously couldn't estimate
CREATE OR REPLACE FUNCTION sh.get_high_value_threshold
RETURN NUMBER DETERMINISTIC IS
BEGIN
  RETURN 1000;
END;
/

-- Query using the function in a predicate
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();


Before 26ai (or with plsql_function_dynamic_stats = 'OFF'), the optimizer treats get_high_value_threshold() as a black box. It has no idea what value the function returns, so it can't estimate selectivity. It either guesses based on defaults or uses a very conservative estimate.

-------------------------------------------------------
| Id | Operation            | Name  | E-Rows | A-Rows |
-------------------------------------------------------
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |   9188 |  12116 |  <- Rough guess
|* 3 |    TABLE ACCESS FULL | SALES |   9188 |  12116 |
-------------------------------------------------------

In 26ai with plsql_function_dynamic_stats = 'ON':

-- Enable dynamic stats for PL/SQL functions (session level)
ALTER SESSION SET plsql_function_dynamic_stats = 'ON';

-- Rerun
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


-------------------------------------------------------
| Id | Operation            | Name  | E-Rows | A-Rows |
-------------------------------------------------------
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |  12203 |  12116 |  <- Near accurate
|* 3 |    TABLE ACCESS FULL | SALES |  12203 |  12116 |
-------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - PL/SQL function sampled for dynamic statistics

The optimizer called the function during dynamic statistics gathering at parse time, got the actual return value (1000), and used it to estimate selectivity accurately. E-Rows 12,203 vs A-Rows 12,116 — less than 1% off.

You can also control this at the object level, which is the right approach in production .. turn it on only for specific functions you trust:

-- Prefer object-level control in production
-- Allow dynamic stats for a specific function
BEGIN
  DBMS_STATS.SET_FUNCTION_PREFS(
    ownname    => 'SH',
    funcname   => 'GET_HIGH_VALUE_THRESHOLD',
    pref_name  => 'PLSQL_FUNCTION_DYNAMIC_STATS',
    pref_value => 'ON');
END;
/

-- Check current settings
SELECT function_name, preference_name, preference_value
FROM   all_stat_extensions
WHERE  object_type = 'FUNCTION'
AND    owner = 'SH';

So, in short, Oracle’s optimizer hasn’t made one big leap … it’s made fifteen years of deliberate, incremental improvements, each one handling a class of cardinality problem the previous release couldn’t.

What 26ai specifically adds isn’t magic. It’s two concrete, named, documented improvements … dynamic statistics for PL/SQL functions, and the SQL Analysis Report surfacing optimizer advice inline .. plus the PL/SQL transpiler removing the problem class entirely for eligible functions. These are real. They’re testable. They’re in the docs.

The underlying ML enhanced cost modelling is also real, but it’s an evolutionary improvement without a named switch … Oracle’s engineering continues to get better at estimating costs, particularly for complex workloads, vector queries, and correlated predicates. That’s not hype. It’s just not a single feature you can point to in the docs either.

Know your E-Rows vs A-Rows. Know your SQL Plan Directives. Know your extended statistics. And in 26ai, let the SQL Analysis Report do the first pass for you.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , , , , | Leave a Comment »

Oracle’s Evolution: Sweet Little Dual Tables Exiled in 23c

Posted by FatDBA on December 28, 2023

Hi All,

Here comes what is likely the final post of the year 2023!

While there have been previous discussions on this topic, I’d like to delve deeper into the specific enhancements and features related to the use of the DUAL table in Oracle 23c, and a little more about this sweet little object in Oracle databases …

The DUAL table has been a fundamental part of Oracle’s SQL language for decades. It serves a specific purpose: it provides a one-row, one-column table that can be used for evaluating expressions or performing certain types of queries. DUAL is created automatically by Oracle Database along with the data dictionary. It lives in the schema of the user SYS but is accessible by the name DUAL to all users. It has one column, DUMMY, defined as VARCHAR2(1), and contains one row with the value 'X'.

Some of its most popular use cases: evaluating expressions (SELECT SYSDATE FROM DUAL), providing constants (SELECT 10 * 5 FROM DUAL), and retrieving scalar values inside PL/SQL blocks or scripts. We often use the DUAL table during testing or debugging to check the result of an expression or function without the need for an actual table. I always remember using it as a quick way to check database connectivity. While seemingly mundane, the DUAL table is a small but significant component of the Oracle database ecosystem, serving a variety of purposes across different contexts.
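A tiny sketch of the PL/SQL use case mentioned above, in the classic pre-23c style where SELECT INTO still needs a table:

```sql
SET SERVEROUTPUT ON

DECLARE
  v_today DATE;
BEGIN
  -- DUAL as the vehicle for evaluating an expression into a variable
  SELECT SYSDATE INTO v_today FROM dual;
  DBMS_OUTPUT.PUT_LINE('Today is ' || TO_CHAR(v_today, 'DD-MON-YYYY'));
END;
/
```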

One interesting performance tip related to the DUAL table that I remember: starting from Oracle Database 10g Release 1, when evaluating an expression that does not reference the DUMMY column of DUAL, no logical I/O is performed. This optimization is identified as FAST DUAL in the execution plan. However, if you explicitly SELECT the DUMMY column from DUAL, the optimization is bypassed and logical I/O is performed.

A quick example :

SQL> explain plan for select * from dual;        ---> Logical IO will be performed.

select * from table(dbms_xplan.display(format=>'basic'));

------------------------------------
| Id  | Operation         | Name |
------------------------------------
|   0 | SELECT STATEMENT  |      |
|   1 | TABLE ACCESS FULL | DUAL |
------------------------------------

Yet, when choosing a standard expression from DUAL, the FAST DUAL row source is employed:

SQL> explain plan for select sysdate from dual;    ---> No Logical IO will be performed.

select * from table(dbms_xplan.display(format=>'basic'));

---------------------------------
| Id  | Operation        | Name |
---------------------------------
|   0 | SELECT STATEMENT |      |
|   1 | FAST DUAL        |      |
---------------------------------

Starting with Oracle 23c, there is no need to explicitly select expressions from the DUAL table; you can simply omit the FROM clause altogether. Let's do a quick demo.

---------------------------------
-- Prior 23c release
---------------------------------
SQL> select sysdate from dual;

SYSDATE
---------
28-DEC-23

SQL> select sysdate;
ORA-00923: FROM keyword not found where expected


-- One more test
SQL> create sequence emp2;
Sequence Created

SQL> select emp2.nextval from dual;

NEXTVAL
---------
1

SQL> select emp2.nextval;
ORA-00923: FROM keyword not found where expected




---------------------------------
-- In Oracle 23c 
---------------------------------
SQL> select sysdate;
SYSDATE
---------
28-DEC-23

SQL> create sequence emp2;
Sequence Created

SQL> select emp2.nextval from dual;

NEXTVAL
---------
1

SQL> select emp2.nextval;

NEXTVAL
---------
2

Hope It Helped
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | 2 Comments »