Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Posts Tagged ‘renaps’

Why oracle’s optimizer has been getting smarter for 15 years and what 26ai version actually adds

Posted by FatDBA on March 31, 2026

Every bad execution plan you’ve ever debugged traces back to the same root cause. The optimizer made a wrong guess about how many rows an operation would return ..and built an entire plan on top of that wrong number.

That number is called cardinality. It’s the estimated row count for each operation in your plan. Get it right and the optimizer picks the right join order, the right join method, the right access path. Get it wrong and you get a nested loops join against a table that returns 500,000 rows when the optimizer thought it was 12. You’ve seen this plan. It hurt.

Oracle has been progressively solving this problem for 15+ years. The story isn’t a single breakthrough ..it’s a series of increasingly smarter mechanisms, each one handling a class of estimation problem that the previous one couldn’t.

Here’s the full honest picture, ending with what 26ai actually adds.

Oracle 10g .. Dynamic Sampling The optimizer noticed when stats were missing or insufficient and sampled the data at parse time to get a rough estimate. Controlled by OPTIMIZER_DYNAMIC_SAMPLING. Blunt instrument, but better than pure guesswork.

Oracle 11g .. Cardinality Feedback The optimizer started comparing its estimates to reality after execution. If it estimated 50 rows and got 50,000, it stored the real number in the SGA and flagged the statement for re-optimization on the next execution. The estimate corrected itself over time. The problem: stored in SGA only — lost on restart, lost when the cursor aged out.

Oracle 12c …The Big Jump Three things landed together:

  • Statistics Feedback (renamed from Cardinality Feedback): same learning mechanism, better persistence
  • Adaptive Plans: the optimizer could now switch join methods mid-execution .. starting with Nested Loops based on its estimate, and switching to Hash Join live if actual rows exceeded the threshold. The final plan was then fixed for subsequent executions
  • SQL Plan Directives: when a misestimate was detected, the optimizer created a persistent directive (stored in SYSAUX, survives restarts) that told future parses: “when you see this predicate pattern, gather dynamic statistics first”. Directives are cross-statement … query’s lesson protects another with the same predicate pattern

Oracle 19c/23ai ..Automation at Scale Automatic SQL Tuning Sets (ASTS), Automatic SPM, and Real-Time SPM turned the individual learning mechanisms into a system-level feedback loop. The database wasn’t just learning from single statements .. it was maintaining plan stability across the entire workload automatically.

Oracle 26ai (23.8 RU) .. The Specific New Additions Two documented, named improvements to cardinality estimation:

  1. Dynamic Statistics for PL/SQL Functions : a new parameter plsql_function_dynamic_stats giving fine-grained control over whether PL/SQL functions called inside SQL can participate in dynamic statistics sampling at parse time. Previously the optimizer treated PL/SQL functions as black boxes with unknowable return cardinality. Now it can sample them.
  2. PL/SQL to SQL Transpiler : when enabled, the optimizer inlines eligible PL/SQL functions directly into SQL at parse time, eliminating the black box entirely. The optimizer can now see and estimate the underlying SQL expression rather than guessing at what a function returns.

Plus the general continued improvement of ML-informed cost models inside the optimizer engine .. real, but not a named switchable feature.

Now let’s see all of this in the plan output where it actually matters and I will do a quick demo — The single most important diagnostic habit in Oracle performance work. The GATHER_PLAN_STATISTICS hint tells the optimizer to track actual row counts during execution, then ALLSTATS LAST in DBMS_XPLAN surfaces them alongside the estimates.

-- Prereqs (run as SYS)
GRANT ADVISOR TO sh;
GRANT ADMINISTER SQL MANAGEMENT OBJECT TO sh;

CONN sh/sh

-- Set output format for readable plans
SET LINESIZE 200
SET PAGESIZE 10000
SET LONG 100000

-- Run the query with stats collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;



SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));
```

**Output — before extended statistics exist:**
```
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |     17 |    127 |  1890 |    1533 |  <- E:17, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |     17 |    127 |   406 |    1213 |  <- E:17, A:127
|   4|    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|   5|     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

Predicate Information:
   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - filter("C"."CUST_STATE_PROVINCE"='CA'
          AND "C"."CUST_INCOME_LEVEL"='G: 130,000 - 149,999')

E-Rows: 17. A-Rows: 127. That’s a 7.5x underestimate.

The optimizer assumed cust_state_province = 'CA' and cust_income_level = 'G: 130,000 - 149,999' were independent. They’re not — they’re correlated. California has a disproportionate number of high-income customers in this dataset. The optimizer applied the selectivity of each predicate independently, multiplied them, and got the wrong answer.

This is the classic multi-column predicate correlation problem. The fix is extended statistics.

Lets try to fix it via extended statistics: Extended statistics (column groups) teach the optimizer about correlated columns. One DBMS_STATS call, no schema changes.

-- Create a column group for the two correlated columns
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
  ownname  => 'SH',
  tabname  => 'CUSTOMERS',
  extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
) AS col_group_name
FROM DUAL;

-- COL_GROUP_NAME
-- SYS_STUFBF#JKQM8F3GTPA7XDE9  (system-generated name)

-- Now gather stats to populate the column group
EXEC DBMS_STATS.GATHER_TABLE_STATS(
  ownname    => 'SH',
  tabname    => 'CUSTOMERS',
  method_opt => 'FOR ALL COLUMNS SIZE AUTO'
);


---- Lets re runn the same Sql.
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


After extended statistics:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |    124 |    127 |  1890 |    1533 |  <- E:124, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |    124 |    127 |   406 |    1213 |  <- Near perfect
|   4|    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|   5|     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

E-Rows went from 17 to 124. Actual is 127. That’s less than 3% off.

Same plan hash .. same shape. But now the cost model is working from accurate numbers. In a more complex query, this difference in estimated rows would change join order, join method, and index decisions.

nEXT, Lets see SQL Plan directives and see watching optimizer learn.

When the optimizer detects a cardinality misestimate during execution, it creates a SQL Plan Directive — a persistent instruction stored in SYSAUX telling future parses to gather dynamic statistics for this predicate pattern. You can watch this happen.

First, drop the extended stats so the misestimate recurs:

-- Reset: drop the column group
EXEC DBMS_STATS.DELETE_EXTENDED_STATS(
  ownname   => 'SH',
  tabname   => 'CUSTOMERS',
  extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
);

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH', 'CUSTOMERS');

-- lets fliush the SP and re-run.

-- As SYS (flush shared pool in test environment only)
ALTER SYSTEM FLUSH SHARED_POOL;

CONN sh/sh

-- Run with stats collection
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;


-- Lets see if the directive was crwated. 

-- Check for new SQL Plan Directives on CUSTOMERS
SELECT d.directive_id,
       d.type,
       d.state,
       d.auto_drop,
       d.created,
       o.object_name,
       o.subobject_name  AS column_name
FROM   dba_sql_plan_directives     d
JOIN   dba_sql_plan_dir_objects    o
       ON d.directive_id = o.directive_id
WHERE  o.object_name = 'CUSTOMERS'
ORDER  BY d.created DESC;


Output — directive created after the misestimate:

DIRECTIVE_ID  TYPE             STATE   AUTO_DROP CREATED              OBJECT  COLUMN_NAME
------------  ---------------  ------  --------- -------------------  ------  -------------------
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_STATE_PROVINCE
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_INCOME_LEVEL

The optimizer created a directive covering both columns … it noticed the multi-column predicate correlation caused a misestimate and now knows to sample dynamically next time it sees this pattern. Run the query a second time:

SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


At the bottom of the plan output:
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

The optimizer is now dynamically sampling at parse time because the directive told it to. The cardinality estimate will be much closer to reality on this execution.

Now new in 26ai ..The SQL aanalysis report .. This is the part that’s genuinely new in 26ai. Previously you had to know to look at E-Rows vs A-Rows yourself. The SQL Analysis Report .. surfaced directly in DBMS_XPLAN.DISPLAY_CURSOR output …flags these problems inline without you having to hunt for them.

-- The standard DISPLAY_CURSOR call — no extra parameters needed
-- SQL Analysis Report appears automatically in 26ai when issues exist

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


In Oracle 26ai, after the standard execution plan output, you now see:

SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
--------------------------------------------------------------------------------
3 - SEL$1 / "C"@"SEL$1"
  - The following columns have predicates which prevent their use as keys
    in an index range scan. Consider rewriting the predicates or creating
    column group statistics.
    "CUST_STATE_PROVINCE", "CUST_INCOME_LEVEL"

The optimizer is telling you directly: these two columns in combination are causing an estimation problem, and column group statistics would fix it. You no longer have to derive this by comparing E-Rows and A-Rows yourself. It’s surfaced automatically in the plan output.

That’s a real DBA quality-of-life improvement. The diagnosis that used to take 10 minutes of plan reading is now one line in your standard plan output.

Okay next is Dynamic Stats for PL/SQL Functions …. This is the specific new documented feature in 26ai (RU 23.8). Consider a query that filters through a PL/SQL function:

-- A function the optimizer previously couldn't estimate
CREATE OR REPLACE FUNCTION sh.get_high_value_threshold
RETURN NUMBER DETERMINISTIC IS
BEGIN
  RETURN 1000;
END;
/

-- Query using the function in a predicate
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();


Before 26ai (or with `plsql_function_dynamic_stats = 'OFF'`):

The optimizer treats `get_high_value_threshold()` as a black box. It has no idea what value the function returns, 
so it can't estimate selectivity. It either guesses based on defaults or uses a very conservative estimate.

| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |   9188 |  12116 |  <- Rough guess
|*  3|    TABLE ACCESS FULL | SALES |   9188 |  12116 |

In 26ai with plsql_function_dynamic_stats = 'ON':

-- Enable dynamic stats for PL/SQL functions (session level)
ALTER SESSION SET plsql_function_dynamic_stats = 'ON';

-- Rerun
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |  12203 |  12116 |  <- Near accurate
|*  3|    TABLE ACCESS FULL | SALES |  12203 |  12116 |

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - PL/SQL function sampled for dynamic statistics

The optimizer called the function during dynamic statistics gathering at parse time, got the actual return value (1000), and used it to estimate selectivity accurately. E-Rows 12,203 vs A-Rows 12,116 — less than 1% off.

You can also control this at the object level, which is the right approach in production .. turn it on only for specific functions you trust:

-- Prefer object-level control in production
-- Allow dynamic stats for a specific function
EXEC DBMS_STATS.SET_FUNCTION_PREFS(
  ownname   => 'SH',
  funcname  => 'GET_HIGH_VALUE_THRESHOLD',
  pref_name => 'PLSQL_FUNCTION_DYNAMIC_STATS',
  pref_value => 'ON'
);

-- Check current settings
SELECT function_name, preference_name, preference_value
FROM   all_stat_extensions
WHERE  object_type = 'FUNCTION'
AND    owner = 'SH';

So, in short, Oracle’s optimizer hasn’t made one big leap … it’s made fifteen years of deliberate, incremental improvements, each one handling a class of cardinality problem the previous release couldn’t.

What 26ai specifically adds isn’t magic. It’s two concrete, named, documented improvements … dynamic statistics for PL/SQL functions, and the SQL Analysis Report surfacing optimizer advice inline .. plus the PL/SQL transpiler removing the problem class entirely for eligible functions. These are real. They’re testable. They’re in the docs.

The underlying ML enhanced cost modelling is also real, but it’s an evolutionary improvement without a named switch … Oracle’s engineering continues to get better at estimating costs, particularly for complex workloads, vector queries, and correlated predicates. That’s not hype. It’s just not a single feature you can point to in the docs either.

Know your E-Rows vs A-Rows. Know your SQL Plan Directives. Know your extended statistics. And in 26ai, let the SQL Analysis Report do the first pass for you.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , , , , | Leave a Comment »

Real Time Data Replication from Oracle to DataStax Cassandra using Oracle GoldenGate 23ai DAA

Posted by FatDBA on March 16, 2026

Modern enterprise architectures rarely live inside a single database engine anymore. It is very common to see operational data stored in relational systems like Oracle while analytics or distributed applications rely on NoSQL platforms such as Cassandra.

In several real-world projects I have worked on, Cassandra clusters were used to power high-volume applications while Oracle remained the authoritative system of record. In these cases, Oracle GoldenGate D23ai DAA becomes a powerful bridge that streams transactional changes from Oracle redo logs directly into Cassandra in near real time.

In this article we will walk through a complete end-to-end implementation of Oracle → DataStax Cassandra replication using Oracle GoldenGate.

The goal is to demonstrate:

• Initial data load
• Continuous change capture
• Replication of DML operations
• Validation on the Cassandra side

Source System : Oracle Database Server
Hostname: ora-prod01.lab.internal
IP: 10.10.10.11

Operating System: Oracle Linux 8
Database Version: Oracle 19c (19.21 RU)
GoldenGate Version: 23ai Classic Deployment

Target System: DataStax Enterprise Cassandra Node
Hostname: cass-node01.lab.internal
IP: 10.10.10.21

Operating System: Rocky Linux 8
Cassandra Distribution: DataStax Enterprise 6.8.50

GoldenGate Big Data / Java Adapter is installed on the Cassandra side.

Step 1 – Prepare Oracle Source Database

Oracle GoldenGate captures data changes directly from the Oracle redo logs, allowing near real-time replication without querying tables. To support this, the database must have ARCHIVELOG mode, force logging, and supplemental logging enabled so that all row-level changes are fully recorded and available for capture.

sqlplus / as sysdba

archive log list;
Database log mode              No Archive Mode
Automatic archival             Disabled

-- enable archivelog mode
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;

-- enable force logging
alter database force logging;

-- enable supp logging 
alter database add supplemental log data;

-- enabled GG parameter in source database
alter system set enable_goldengate_replication=true scope=both;

-- streams pool size 
alter system set streams_pool_size=2G scope=both;

Step 2 – Create GoldenGate User in Oracle

A dedicated database user is required for GoldenGate to capture changes from the Oracle redo logs and access the necessary metadata. This user is granted the required privileges to read transaction logs, access dictionary information, and manage replication objects.

CREATE TABLESPACE ogg_ts 
DATAFILE '/u02/oradata/ogg_ts01.dbf' 
SIZE 500M AUTOEXTEND ON;

CREATE USER ggadmin IDENTIFIED BY "Welcome#123"
DEFAULT TABLESPACE ogg_ts
TEMPORARY TABLESPACE temp;

GRANT CONNECT, RESOURCE TO ggadmin;
GRANT SELECT ANY TABLE TO ggadmin;
GRANT SELECT ANY TRANSACTION TO ggadmin;
GRANT SELECT ANY DICTIONARY TO ggadmin;
GRANT FLASHBACK ANY TABLE TO ggadmin;
GRANT ALTER SYSTEM TO ggadmin;
GRANT UNLIMITED TABLESPACE TO ggadmin;


-- Granted OGG admin privs 
exec dbms_goldengate_auth.grant_admin_privilege(
   grantee => 'GGADMIN',
   privilege_type => 'CAPTURE',
   grant_select_privileges => TRUE,
   do_grants => TRUE);

Step 3 – Create Source Table in Oracle

To validate the replication pipeline, create a sample table and insert a few test records in the Oracle source database. These records will later be used to verify that GoldenGate successfully captures and replicates the data to the DataStax Cassandra target.

CREATE TABLE SALES_TRANSACTIONS
(
   TXN_ID NUMBER PRIMARY KEY,
   CUSTOMER_NAME VARCHAR2(100),
   PRODUCT_CODE VARCHAR2(50),
   TXN_AMOUNT NUMBER,
   TXN_TIMESTAMP DATE
);

INSERT INTO SALES_TRANSACTIONS VALUES
(101,'John Carter','PRD-A',1200,SYSDATE);

INSERT INTO SALES_TRANSACTIONS VALUES
(102,'Alice Brown','PRD-B',850,SYSDATE);

INSERT INTO SALES_TRANSACTIONS VALUES
(103,'Michael Lee','PRD-C',430,SYSDATE);

COMMIT;


select * from SALES_TRANSACTIONS;

TXN_ID CUSTOMER_NAME PRODUCT_CODE TXN_AMOUNT TXN_TIMESTAMP
-----------------------------------------------------------
101    John Carter   PRD-A        1200       05-MAR-26
102    Alice Brown   PRD-B        850        05-MAR-26
103    Michael Lee   PRD-C        430        05-MAR-26

Step 4 – Install DataStax Cassandra

On the target node install DataStax Enterprise. Example repository configuration:

sudo vi /etc/yum.repos.d/datastax.repo

[datastax]
name=DataStax Repo
baseurl=https://rpm.datastax.com/enterprise
enabled=1
gpgcheck=0

-- Install DSE 
sudo yum install dse-full-6.8.50 -y

-- Start DSE 
sudo systemctl start dse

-- Veroify cluster/nodes status 
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal

--  Address       Load       Tokens Owns  Host ID                               Rack
UN  10.10.10.21   120 KB     256    100%  8c34a0c1-b8c1-4b35-a6a7-9c2d8bfae111   rack1

Step 5 – Create Cassandra Keyspace

In DataStax Cassandra, a keyspace is the top-level namespace that defines how data is replicated across the cluster. Before creating tables for replication, we first create a keyspace and configure its replication strategy and replication factor.

cqlsh 10.10.10.21 <port> -u xxxx -p xxx

CREATE KEYSPACE retail_ks
WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': 1
};

-- verify 
DESCRIBE KEYSPACES;


-- Created target tables in cassandra
USE retail_ks;

CREATE TABLE sales_transactions
(
   txn_id int,
   customer_name text,
   product_code text,
   txn_amount decimal,
   txn_timestamp timestamp,
   PRIMARY KEY(txn_id)
);

-- verofy table structure 
DESCRIBE TABLE retail_ks.sales_transactions;

Step 6 – Install GoldenGate on Oracle

Oracle GoldenGate software must be installed on the source server to capture database changes. In production environments, the silent installation method is commonly used because it allows automated and repeatable deployments without manual interaction.

-- Unpack GoldenGate binaries.
unzip 231000_fbo_ggs_Linux_x64_Oracle.zip

-- Run silent installation. Respoonse file
INSTALL_OPTION=ora23ai
SOFTWARE_LOCATION=/u01/app/ogg
START_MANAGER=true
MANAGER_PORT=7809
DATABASE_LOCATION=/u01/app/oracle/product/19c


-- Install
./runInstaller -silent -responseFile ogg.rsp

-- LaUnch OGG
cd /u01/app/ogg
./ggsci

GGSCI> INFO ALL
Program     Status
MANAGER     RUNNING

Step 7 – Enable Table Level Replication

Login to database and enable supplemental logging for the table.

GGSCI> DBLOGIN USERID ggadmin PASSWORD Welcome#123

GGSCI> ADD TRANDATA SALES_TRANSACTIONS
OGG-15132 Logging supplemental redo data enabled

Step 8 – Configure Extract

Create parameter file.

GGSCI> EDIT PARAMS EXT_SALES

EXTRACT EXT_SALES
USERID ggadmin PASSWORD Welcome#123
EXTTRAIL ./dirdat/st
TRANLOGOPTIONS INTEGRATEDPARAMS (PARALLELISM 4)
TABLE SALES_TRANSACTIONS;


-- Create extract
ADD EXTRACT EXT_SALES INTEGRATED TRANLOG BEGIN NOW
ADD EXTTRAIL ./dirdat/st EXTRACT EXT_SALES

-- start & verifgy its status 
START EXTRACT EXT_SALES
INFO EXTRACT EXT_SALES

EXTRACT    EXT_SALES     Last Started 2026-03-15 21:48   Status RUNNING
Checkpoint Lag           00:00:00 (updated 00:00:04 ago)
Process ID               32418
Log Read Checkpoint      Oracle Redo Logs
                        SCN  845732918 (0x0000.3267a3b6)
                        2026-03-15 21:48:12

Step 9 – Configure Data Pump

The GoldenGate Data Pump reads captured changes from the local trail files and forwards them to the target GoldenGate deployment. This process improves reliability and allows data to be securely transported to the target environment.

EDIT PARAMS PUMP_SALES

EXTRACT PUMP_SALES
RMTHOST 10.10.10.21, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE SALES_TRANSACTIONS;

-- add pump
ADD EXTRACT PUMP_SALES EXTTRAILSOURCE ./dirdat/st
ADD RMTTRAIL ./dirdat/rt EXTRACT PUMP_SALES

-- Start Pump
START EXTRACT PUMP_SALES
INFO PUMP_SALES

EXTRACT    PUMP_SALES    Last Started 2026-03-15 21:48   Status RUNNING
Checkpoint Lag           00:00:00 (updated 00:00:02 ago)
Process ID               32455
Log Read Checkpoint      File ./dirdat/es000000000
                        RBA 18425

Step 10 – Configure Cassandra Handler

On the GoldenGate Big Data / DAA node, configure the Cassandra handler properties so that the Replicat process knows how to connect to the DataStax Cassandra cluster and apply incoming changes. This configuration defines connection details, consistency level, datacenter information, and the Java classpath required for the Cassandra driver.

This configuration allows the GoldenGate Replicat process to establish a connection with the Cassandra cluster and apply transactional changes coming from the Oracle source database.

[oggadmin@cass-hub01 ~]$ vi /u01/app/ogg_23ai/dirprm/cassandra.props
gg.handlerlist=cassandra

gg.handler.cassandra.type=cassandra
gg.handler.cassandra.mode=op
gg.handler.cassandra.contactPoints=10.20.30.41
gg.handler.cassandra.port=9042
gg.handler.cassandra.keyspace=retail_ops_ks
gg.handler.cassandra.consistencyLevel=LOCAL_QUORUM
gg.handler.cassandra.cassandraMode=async
gg.handler.cassandra.compressedUpdates=true
gg.handler.cassandra.datacenter=DC_MAIN
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
gg.classpath=/opt/datastax/cassandra/resources/cassandra/lib/*:/u01/app/ogg_23ai/DependencyDownloader/dependencies/cassandra_4.1/*
javawriter.stats.full=true
javawriter.stats.display=TRUE

-- If authentication is enabled in DataStax Cassandra, the property file can also include credentials depending on the adapter version and security model being used.
-- exampl;e
gg.handler.cassandra.username=oggsvc
gg.handler.cassandra.password=Str0ngPassw0rd

Step 11 – Configure Replicat

At this stage, the source-side Extract and Data Pump are already running and writing trail records to the remote trail on the target GoldenGate deployment. The next step is to configure the Replicat process on the Cassandra side so that trail records can be applied into the target DataStax Cassandra keyspace and table.

In a GoldenGate Big Data or DAA style deployment, the first thing you should always do is connect to the deployment administration interface and verify that the deployment is up before creating the Replicat.

-- Loginto deployment
[oggadmin@cass-hub01 ~]$ cd /u01/app/ogg_23ai/bin
[oggadmin@cass-hub01 bin]$ ./adminclient

Oracle GoldenGate Admin Client for Oracle
Version 23.10.0.25.10 OGGCORE_23.10.0.0.0_PLATFORMS_251018.0830

Copyright (C) 1995, 2025, Oracle and/or its affiliates. All rights reserved.

Linux, x64, 64bit (optimized), Generic on Oct 18 2025 14:00:54
Operating system character set identified as UTF-8.


-- connect to the deployment
OGG (not connected) 1> CONNECT http://10.10.40.25:9001 deployment CassandraDAA as oggadmin password "xxxxxx";
Successfully connected to CassandraDAA.

Before creating the Replicat, verify that the remote trail files are arriving from the Data Pump. This is a simple but important validation step. If the trail is not being written, the Replicat will start but will have nothing to process.

[oggadmin@cass-hub01 ~]$ cd /u01/app/ogg_23ai/var/lib/data
[oggadmin@cass-hub01 data]$ ls -ltr rt*

-rw-r----- 1 oggadmin oinstall  1048576 Mar 15 20:11 rt000000000
-rw-r----- 1 oggadmin oinstall   524288 Mar 15 20:16 rt000000001

This confirms that the remote trail prefix rt is present and the target deployment is receiving records from the source-side Pump process.

Next lets create the replicate.

OGG (http://10.10.40.25:9001 CassandraDAA as oggadmin@CassandraDAA) 3> EDIT PARAMS REP_CASS

REPLICAT REP_CASS
TARGETDB LIBFILE libggjava.so SET property=dirprm/cassandra.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
DISCARDFILE ./var/lib/data/REP_CASS.dsc, APPEND, MEGABYTES 500
DISCARDROLLOVER AT 02:00
MAP APP_OWNER.SALES_ORDERS_SRC,
TARGET retail_ops_ks.sales_orders_rt;

-- add the repl process
OGG (http://10.10.40.25:9001 CassandraDAA as oggadmin@CassandraDAA) 4> ADD REPLICAT REP_CASS, EXTTRAIL ./var/lib/data/rt
REPLICAT added.

-- staret the process
OGG (http://10.10.40.25:9001 CassandraDAA as oggadmin@CassandraDAA) 5> START REPLICAT REP_CASS
Sending START request to Manager ...
REPLICAT REP_CASS starting.

REPLICAT   REP_CASS      Last Started 2026-03-15 21:49   Status RUNNING
Checkpoint Lag           00:00:00 (updated 00:00:01 ago)
Process ID               32504
Log Read Checkpoint      File ./dirdat/rt000000000
                        RBA 18874


Params explanations …

REPLICAT REP_CASS defines the Replicat group name.

TARGETDB LIBFILE libggjava.so SET property=dirprm/cassandra.props tells GoldenGate to use the Java-based Big Data adapter library and load all Cassandra handler settings from the referenced properties file.

REPORTCOUNT EVERY 1 MINUTES, RATE prints transaction statistics regularly into the report file, which is very useful during initial testing and performance validation.

GROUPTRANSOPS 1000 batches operations to improve throughput. This value can be tuned depending on transaction profile, row size, and Cassandra write latency.
DISCARDFILE captures records that fail processing, while DISCARDROLLOVER keeps the discard file manageable.
The MAP statement defines the source Oracle table and the target DataStax Cassandra table.

In this example:
Oracle source table: APP_OWNER.SALES_ORDERS_SRC
Cassandra target table: retail_ops_ks.sales_orders_rt

Step 12 – Testing End-to-End Replication (Oracle to DataStax Cassandra)

In this section we will validate that the replication pipeline is working correctly by performing a few test transactions on the Oracle source database and confirming that the changes are applied on the DataStax Cassandra target table.

I am not going to discuss the initial load process here since it is relatively straightforward and typically performed using standard GoldenGate initial load methods or bulk data movement tools. For the purpose of this test, we assume that the base dataset has already been loaded into Cassandra and that GoldenGate is responsible only for capturing and replicating incremental changes.

-- Source DB

SQL*Plus: Release 19.0.0.0.0 - Production
Version 19.21.0.0.0

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

SELECT * FROM SALES_ORDERS_SRC;
ORDER_ID   CUSTOMER_NAME   ORDER_STATUS   ORDER_AMOUNT
--------   -------------   ------------   ------------
1001       NORTH RETAIL    CREATED        450
1002       ALPHA STORES    SHIPPED        720
1003       OMEGA MART      CREATED        610

On Target/Cassandra cluster

cqlsh 10.20.30.41 9042 -u cassandra -p cassandra
Connected to RetailCluster at 10.20.30.41:9042
[cqlsh 6.1.0 | DSE 6.8.x | CQL spec 3.4.6]

USE retail_ops_ks;

SELECT * FROM sales_orders_rt;
order_id | customer_name | order_status | order_amount
---------+---------------+--------------+--------------
1001     | NORTH RETAIL  | CREATED      | 450
1002     | ALPHA STORES  | SHIPPED      | 720
1003     | OMEGA MART    | CREATED      | 610

-- This confirms that the initial dataset already exists on both systems.

Lets do some changes on source and see how it reaches on target.

INSERT INTO SALES_ORDERS_SRC
VALUES (1004,'PACIFIC TRADING','CREATED',980);
COMMIT;

1 row created.

Commit complete.



-- Veroify OGG 
INFO EXTRACT EXT_SALES
EXTRACT EXT_SALES Last Started 2026-03-16 20:12 Status RUNNING
Checkpoint Lag 00:00:00
Log Read Checkpoint
SCN 845736122

-- scn moVED. This confirms that the Extract process captured the transaction from the Oracle redo logs.

Now lets check it on target

SELECT * FROM sales_orders_rt
WHERE order_id = 1004;

order_id | customer_name    | order_status | order_amount
---------+------------------+--------------+--------------
1004     | PACIFIC TRADING  | CREATED      | 980

The inserted record has successfully replicated.

By performing these test operations, we validated that:

• GoldenGate Extract captured Oracle redo changes
• Data Pump delivered trail files to the target deployment
• Replicat applied changes into DataStax Cassandra

The Oracle —> GoldenGate —-> Cassandra replication pipeline is now functioning correctly.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , , , , , | Leave a Comment »

When GoldenGate decides to throw OGG-02912 just before New Years Eve.

Posted by FatDBA on December 31, 2025

Happy New Year! 🎉
Because nothing says “end of the year” like firing up a test lab, breaking a GoldenGate extract, and realizing that Oracle 11g still has unfinished business with you. I spent the last hours of the year chasing an error that politely reminded me: old databases never really retire — they just wait 😀

Nothing fancy. Just a simple setup. Or at least… that’s what I thought at the beginning.

The goal was straightforward: capture data from an Oracle 11gR2 (11.2.0.4) database using Oracle GoldenGate Integrated Extract, running from a centralized GoldenGate extract hub using remote integrated capture with a newer GoldenGate build (21c)

I’ve done this dozens of times with 12c and above. 11g though… well, 11g always has a way of reminding you that it’s old, but not that old 🙂

The Setup (Quick Context)

Source database: Oracle 11g Enterprise Edition 11.2.0.4 (OEL 7.x 64)
Capture mode: Integrated Extract
GoldenGate binaries: 21.x
Capture host: centralized GoldenGate extract hub using remote integrated capture(Linux OEL 8.X 64)
Simple test table, simple inserts.

Everything registered fine. Extract attached to LogMiner. No privilege errors. No Streams issues.
So far, so good. And then…

The Symptom :
Out of nowhere, the extract stopped. Running info all showed this … Opening the report file made it very clear this wasn’t a generic failure. Right at the bottom:

GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 31> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     EXT11G      00:00:00      00:11:14




GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 32> view report EXT11G
2025-12-30 15:41:53  INFO    OGG-06604  Connected to database DB11G, CPU info: CPU Count 1, CPU Core Count 1, CPU Socket Count 1.

2025-12-30 15:41:53  INFO    OGG-06618  Database DB11G Platform: Linux x86 64-bit.

2025-12-30 15:41:57  INFO    OGG-02248  Logmining server DDL filtering enabled.

2025-12-30 15:41:59  INFO    OGG-02068  Integrated capture successfully attached to logmining server OGG$CAP_EXT11G using OGGCapture API.

2025-12-30 15:41:59  INFO    OGG-02089  Source redo compatibility version is: 11.2.0.4.0.

2025-12-30 15:41:59  INFO    OGG-15446  Extract configured as  resource group.

2025-12-30 15:41:59  INFO    OGG-02086  Integrated Dictionary will be used.

2025-12-30 15:41:59  INFO    OGG-02710  Database metadata information is obtained from source database.

2025-12-30 15:41:59  WARNING OGG-02901  Replication of UDT and ANYDATA from redo logs is not supported with the Oracle compatible parameter setting. Using fetch instead.

2025-12-30 15:41:59  INFO    OGG-02776  Native data capture is enabled for Oracle NUMBER data type.

2025-12-30 15:41:59  INFO    OGG-01971  The previous message, 'INFO OGG-02776', repeated 1 times.

Source Context :
  SourceModule            : [ggdb.ora.ddl]
  SourceID                : [../gglib/ggdbora/ddlora.c]
  SourceMethod            : [metadata_from_logminer]
  SourceLine              : [1270]
  ThreadBacktrace         : [15] elements
                          : [/home/gg_adminremote/ogghome_21c/libgglog.so(CMessageContext::AddThreadContext())]
                          : [/home/gg_adminremote/ogghome_21c/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]
                          : [/home/gg_adminremote/ogghome_21c/libgglog.so(_MSG_(CSourceContext*, int, CMessageFactory::MessageDisposition))]
                          : [/home/gg_adminremote/ogghome_21c/extract()]
                          : [/home/gg_adminremote/ogghome_21c/extract(RedoAPI::createInstance(ggs::gglib::ggdatasource::DataSource*, ggs::gglib::ggapp::ReplicationContext*))]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::er::OraTranLogDataSource::setup())]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::ggapp::ReplicationContext::establishStartPoints(char, ggs::gglib::ggdatasource::DataSourceParams const&))]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::ggapp::ReplicationContext::initializeDataSources(ggs::gglib::ggdatasource::DataSourceParams&))]
                          : [/home/gg_adminremote/ogghome_21c/extract()]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::MultiThreading::MainThread::ExecMain())]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::MultiThreading::MainThread::Run(int, char**))]
                          : [/home/gg_adminremote/ogghome_21c/extract(main)]
                          : [/lib64/libc.so.6(__libc_start_main)]
                          : [/home/gg_adminremote/ogghome_21c/extract()]

2025-12-30 15:41:59  ERROR   OGG-02912  Patch 17030189 is required on your Oracle mining database for trail format RELEASE 12.2 or later.

2025-12-30 15:41:59  ERROR   OGG-01668  PROCESS ABENDING.

Understanding What Actually Went Wrong
This is one of those GoldenGate errors that looks scary but is actually very precise once you read it slowly. GoldenGate was telling me: “Hey, I’m trying to write trail records using a 12.2+ trail format, but your 11g database can’t mine redo in that format unless you patch it.”

Specifically: Integrated Extract defaults to newer trail formats. Oracle 11g cannot mine 12.2+ trail formats, unless you apply Patch 17030189 (logminer GG Dictionary support: missing attributes) on the 11g database home. And in most environments… patching 11g is not happening.

Here’s the subtle trap: You install GoldenGate 19c / 21c and configured Integrated Extract. You don’t explicitly set a trail format — GoldenGate assumes: “Modern source, modern trail”. But 11g is not modern, even 11.2.0.4, the best version of 11g, still has limits. So GoldenGate happily starts… and then politely crashes.

The Options on the Table .. At this point, there were only three real choices:

Option 1: Patch the 11g database “Apply Patch 17030189 to the database home”.

Pros: Allows newer trail formats

Cons: Risky, operationally heavy, often blocked by policy, definitely not “lab friendly”

Option 2: Force an Older Trail Format. Tell GoldenGate to behave like it’s 2012 again.

Pros: No database patching, fully supported, safe and predictable

Cons: You give up newer trail features (more on that later). for me, option 2 was the obvious choice and in fact for many where client don’t want to change anything on the 11g database as its old and so far stable or patching will require additional planning and change requests and other operational risks etc.

Option 3: Using a workaround by using in-build OGG script prvtlmpg.plb.

Pros: Simple, straight forward, fast.

Cons: In production environments, this workaround introduces additional operational and audit risk, requires database-side intervention, and often triggers formal change and approval processes. It is particularly inconvenient in remote or centralized GoldenGate architectures, where GoldenGate is intentionally decoupled from the source database host. Since it alters mining-side database behavior, it is less clean and less maintainable than applying the official Oracle patch or avoiding the issue altogether by enforcing a compatible trail format.

The Fix That I Used.
The fix itself was simple, but order matters. You stop the impacted extract, delete the existing trail (trail headers stores the format), update the extract param file with a new flagh/parameter “FORMAT RELEASE “. Recreate the trail file and start your extract.

GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 36> DELETE EXTTRAIL ./dirdat/e1
Deleting extract trail ./dirdat/e1 for Extract group EXT11G.



GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 38>  ADD EXTTRAIL ./dirdat/e1, EXTRACT ext11g
EXTTRAIL added.



GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 41> view params EXT11G

EXTRACT ext11g
USERIDALIAS ogg_11g
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512)

EXTTRAIL ./dirdat/e1, FORMAT RELEASE 11.2
DISCARDFILE ./dirrpt/ext11g.dsc, APPEND, MEGABYTES 50
REPORTCOUNT EVERY 30 MINUTES, RATE

TABLE ELEVENGTOFABRIC.TESTREPLTAB;






GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 52> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT11G      00:00:03      00:00:06

]The Moment of Truth —> Lag was moving. SCNs were advancing. Trail RBAs were increasing. No more abends. No more patch complaints. That’s when you know you’re done.

Why FORMAT RELEASE 11.2 Is Safe (and When It’s Not) ? Let’s be clear, this isn’t a hack. This is documented, supported behavior.

What You Lose —> Newer GoldenGate metadata, Some advanced DDL capture details, Newer datatype handling

What You Keep —> Full DML capture (INSERT / UPDATE / DELETE), Stability, Compatibility
Your sanity

For 11g source systems, especially ones you don’t want to touch, this is the correct trade off.

Final Thoughts
This issue is a perfect example of why GoldenGate work is never just about syntax. Everything was “correct”: Privileges, Integrated capture, Registration ..but one missing line quietly broke the entire pipeline. If you’re running 11g with modern GoldenGate, remember this: Old database. Old trail format or be ready to patch.

And honestly… forcing FORMAT RELEASE 11.2 was the smarter move in this case where we totally avoided any modifications on the source system and continue remote extraction.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , , | Leave a Comment »

How Oracle 23ai Fixes Bad Plans Instantly.. A Live Demo of Real-Time SPM

Posted by FatDBA on November 23, 2025

Recently, I did a quick round of testing in my lab with Oracle Database 23ai, focusing on its new SQL Plan Management (SPM) enhancements and the results were impressive.

In 23ai, SPM can actually detect and fix bad execution plans in real time. It monitors a running query, compares the performance of the newly generated plan against the best ones it has seen before (tracked in the Automatic SQL Tuning Set), and if the new plan performs worse, it automatically switches back to the proven baseline.

No evolve task. No manual verification. No DBA intervention.
This feature Real-Time SQL Plan Management was tested on an Oracle 23ai lab build (pre-release environment). While 23ai itself isn’t generally available on-prem (its successor Oracle 26ai now carries these capabilities), the behavior shown here is identical in both 23ai and 26ai, and is already live and enabled by default in Oracle Autonomous Database.

Let’s see it live!!!

SET ECHO ON FEEDBACK ON HEADING ON LINES 300 PAGES 500 TIMING ON
SET SERVEROUTPUT ON
SET LONG 2000000 LONGCHUNKSIZE 2000000 TRIMSPOOL ON

PROMPT === Create tables
CREATE TABLE fact_sales (
  sales_id     NUMBER GENERATED BY DEFAULT AS IDENTITY,
  prod_id      NUMBER NOT NULL,
  cust_id      NUMBER NOT NULL,
  date_id      DATE   NOT NULL,
  channel_id   NUMBER NOT NULL,
  qty          NUMBER NOT NULL,
  amount       NUMBER NOT NULL
);

CREATE TABLE dim_product  ( prod_id NUMBER PRIMARY KEY, category VARCHAR2(40), brand VARCHAR2(40) );
CREATE TABLE dim_customer ( cust_id NUMBER PRIMARY KEY, region   VARCHAR2(30), segment VARCHAR2(30) );
CREATE TABLE dim_channel  ( channel_id NUMBER PRIMARY KEY, channel_name VARCHAR2(30) );

Table created.
Table created.
Table created.
Table created.



PROMPT === Load dimensions
DECLARE
BEGIN
  FOR i IN 1..100000 LOOP
    INSERT INTO dim_product VALUES(i, CASE WHEN MOD(i,10)=0 THEN 'Premium' ELSE 'Standard' END, 'Brand_'||MOD(i,200));
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('dim_product rows inserted: '||TO_CHAR(100000));

  FOR i IN 1..500000 LOOP
    INSERT INTO dim_customer VALUES(i,
      CASE MOD(i,5) WHEN 0 THEN 'NA' WHEN 1 THEN 'EU' WHEN 2 THEN 'APAC' WHEN 3 THEN 'ME' ELSE 'LATAM' END,
      CASE MOD(i,4) WHEN 0 THEN 'ENT' WHEN 1 THEN 'SMB' WHEN 2 THEN 'MID' ELSE 'CONSUMER' END);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('dim_customer rows inserted: '||TO_CHAR(500000));

  FOR i IN 1..6 LOOP
    INSERT INTO dim_channel VALUES(i, 'CH_'||i);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('dim_channel rows inserted: 6');

  COMMIT;
  DBMS_OUTPUT.PUT_LINE('Commit complete.');
END;
/
SHOW ERRORS

dim_product rows inserted: 100000
dim_customer rows inserted: 500000
dim_channel rows inserted: 6
Commit complete.
PL/SQL procedure successfully completed.





PROMPT === Load fact 
DECLARE
  v_total NUMBER := 0;
  v_date  DATE   := DATE '2022-01-01';
BEGIN
  FOR d IN 0..59 LOOP  -- 60 days x 80k = ~4.8M
    INSERT /*+ APPEND */ INTO fact_sales (prod_id,cust_id,date_id,channel_id,qty,amount)
    SELECT MOD(ABS(DBMS_RANDOM.RANDOM),100000)+1,
           MOD(ABS(DBMS_RANDOM.RANDOM),500000)+1,
           v_date + d,
           MOD(ABS(DBMS_RANDOM.RANDOM),6)+1,
           MOD(ABS(DBMS_RANDOM.RANDOM),10)+1,
           ROUND(DBMS_RANDOM.VALUE(5,500),2)
    FROM dual CONNECT BY LEVEL <= 80000;
    v_total := v_total + SQL%ROWCOUNT;
    COMMIT;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('fact_sales rows inserted: '||TO_CHAR(v_total,'999,999,999'));
END;
/
SHOW ERRORS

fact_sales rows inserted:  4,800,000
PL/SQL procedure successfully completed.






PROMPT === Indexes + Stats
CREATE INDEX fs_prod    ON fact_sales(prod_id);
CREATE INDEX fs_cust    ON fact_sales(cust_id);
CREATE INDEX fs_date    ON fact_sales(date_id);
CREATE INDEX fs_channel ON fact_sales(channel_id);

Index created.
Index created.
Index created.
Index created.




EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'DIM_PRODUCT',CASCADE=>TRUE);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'DIM_CUSTOMER',CASCADE=>TRUE);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'DIM_CHANNEL',CASCADE=>TRUE);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'FACT_SALES',CASCADE=>TRUE);

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.

In this first stage of my testing, I have created a mini data warehouse environment inside Oracle 23ai to simulate a realistic analytical workload. I built three dimension tables (DIM_PRODUCT, DIM_CUSTOMER, DIM_CHANNEL) and one large fact table (FACT_SALES) containing roughly 4.8 million rows of transactional data spread across 60 business days.

After loading the data, I have collected fresh optimizer statistics on all tables and their indexes to ensure that the Oracle Optimizer starts with a clear and accurate view of data distribution before I begin the query and plan testing phase. This baseline setup is essential because, in the next steps, I’ll demonstrate how Real-Time SQL Plan Management (SPM) reacts when execution plans change both for the better and for the worse.


-- What date range do we have?
SELECT MIN(date_id) AS min_dt, MAX(date_id) AS max_dt
FROM   fact_sales;

MIN_DT    MAX_DT
--------- ---------
01-JAN-22 01-MAR-22



-- Rows per month (quick sanity check)
SELECT TO_CHAR(date_id,'YYYY-MM') month_yyyy_mm, COUNT(*) cnt
FROM   fact_sales
GROUP  BY TO_CHAR(date_id,'YYYY-MM')
ORDER  BY 1;
SQL> 

MONTH_Y        CNT
------- ----------
2022-01    2480000
2022-02    2240000
2022-03      80000

3 rows selected.



-- Lets addd the remaining days so Apr to Jun exists (60..180 --> Mar 2 .. Jun 30)
DECLARE
  v_total NUMBER := 0;
  v_date  DATE   := DATE '2022-01-01';
BEGIN
  FOR d IN 60..180 LOOP
    INSERT /*+ APPEND */ INTO fact_sales (prod_id,cust_id,date_id,channel_id,qty,amount)
    SELECT MOD(ABS(DBMS_RANDOM.RANDOM),100000)+1,
           MOD(ABS(DBMS_RANDOM.RANDOM),500000)+1,
           v_date + d,
           MOD(ABS(DBMS_RANDOM.RANDOM),6)+1,
           MOD(ABS(DBMS_RANDOM.RANDOM),10)+1,
           ROUND(DBMS_RANDOM.VALUE(5,500),2)
    FROM dual CONNECT BY LEVEL <= 80000;
    v_total := v_total + SQL%ROWCOUNT;
    COMMIT;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Additional fact_sales rows inserted: '||TO_CHAR(v_total,'999,999,999'));
END;
/

SELECT MIN(date_id) AS min_dt, MAX(date_id) AS max_dt
FROM   fact_sales;

MIN_DT    MAX_DT
--------- ---------
01-JAN-22 30-JUN-22


SELECT TO_CHAR(date_id,'YYYY-MM') month_yyyy_mm, COUNT(*) cnt
FROM   fact_sales
GROUP  BY TO_CHAR(date_id,'YYYY-MM')
ORDER  BY 1;

MONTH_Y        CNT
------- ----------
2022-01    2480000
2022-02    2240000
2022-03    2480000
2022-04    2400000
2022-05    2480000
2022-06    2400000







VAR p_from VARCHAR2(10)
VAR p_to   VARCHAR2(10)

EXEC :p_from := '2022-04-01';
EXEC :p_to   := '2022-06-30';

-- Optional: flush shared pool to force a fresh parse
ALTER SYSTEM FLUSH SHARED_POOL;

-- Run the query
SELECT p.category,
       c.region,
       SUM(f.amount) AS rev
FROM   fact_sales f
JOIN   dim_product  p ON p.prod_id = f.prod_id
JOIN   dim_customer c ON c.cust_id = f.cust_id
JOIN   dim_channel  ch ON ch.channel_id = f.channel_id
WHERE  f.date_id BETWEEN TO_DATE(:p_from,'YYYY-MM-DD')
                     AND TO_DATE(:p_to  ,'YYYY-MM-DD')
  AND  p.category = 'Premium'
  AND  c.region   IN ('NA','EU')
  AND  ch.channel_name IN ('CH_1','CH_2','CH_3')
GROUP  BY p.category, c.region;

CATEGOR REGION                                REV
------- ------------------------------ ----------
Premium EU                             18253962.2
Premium NA                             18316364.4

2 rows selected.

Elapsed: 00:00:21.33


-- 0) (optional but recommended) collect row-source stats
ALTER SESSION SET statistics_level = ALL;

-- 1) run the query with a gather hint (guarantees ALLSTATS LAST works)
VAR p_from VARCHAR2(10)
VAR p_to   VARCHAR2(10)
EXEC :p_from := '2022-04-01';
EXEC :p_to   := '2022-06-30';

ALTER SYSTEM FLUSH SHARED_POOL;

SELECT /*+ GATHER_PLAN_STATISTICS */
       p.category,
       c.region,
       SUM(f.amount) AS rev
FROM   fact_sales f
JOIN   dim_product  p ON p.prod_id  = f.prod_id
JOIN   dim_customer c ON c.cust_id  = f.cust_id
JOIN   dim_channel  ch ON ch.channel_id = f.channel_id
WHERE  f.date_id BETWEEN TO_DATE(:p_from,'YYYY-MM-DD')
                     AND TO_DATE(:p_to  ,'YYYY-MM-DD')
  AND  p.category = 'Premium'
  AND  c.region   IN ('NA','EU')
  AND  ch.channel_name IN ('CH_1','CH_2','CH_3')
GROUP  BY p.category, c.region;

CATEGOR REGION                                REV
------- ------------------------------ ----------
Premium EU                             18253962.2
Premium NA                             18316364.4




-- get the SQL_ID & CHILD_NUMBER of the query you just ran
COLUMN sql_id FORMAT A13
SELECT sql_id, child_number, plan_hash_value, last_active_time
FROM   v$sql
WHERE  sql_text LIKE 'SELECT /*+ GATHER_PLAN_STATISTICS */%SUM(f.amount)%GROUP  BY p.category, c.region%'
  AND  parsing_schema_name = USER
ORDER  BY last_active_time DESC
FETCH FIRST 1 ROW ONLY;

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE LAST_ACTI
------------- ------------ --------------- ---------
2t8da8zxfzp50            0      1151134654 24-NOV-25


SQL> SELECT *
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
         sql_id        => '2t8da8zxfzp50',
         cursor_child_no => 0,            
         format        => 'ALLSTATS LAST +PEEKED_BINDS +PREDICATE +OUTLINE +ALIAS +NOTE +PROJECTION +BYTES +IOSTATS'
));  

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2t8da8zxfzp50, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */        p.category,
c.region,        SUM(f.amount) AS rev FROM   fact_sales f JOIN
dim_product  p ON p.prod_id  = f.prod_id JOIN   dim_customer c ON
c.cust_id  = f.cust_id JOIN   dim_channel  ch ON ch.channel_id =
f.channel_id WHERE  f.date_id BETWEEN TO_DATE(:p_from,'YYYY-MM-DD')
                 AND TO_DATE(:p_to  ,'YYYY-MM-DD')   AND  p.category =
'Premium'   AND  c.region   IN ('NA','EU')   AND  ch.channel_name IN
('CH_1','CH_2','CH_3') GROUP  BY p.category, c.region

Plan hash value: 1151134654

-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name         | Starts | E-Rows |E-Bytes| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |              |      1 |        |       |      2 |00:00:05.79 |   59198 |       |       |          |
|   1 |  HASH GROUP BY                          |              |      1 |      2 |   102 |      2 |00:00:05.79 |   59198 |  1200K|  1200K|  654K (0)|
|*  2 |   HASH JOIN                             |              |      1 |  36018 |  1793K|    145K|00:00:05.75 |   59198 |    23M|  4013K|   26M (0)|
|*  3 |    HASH JOIN                            |              |      1 |  36018 |  1477K|    363K|00:00:05.26 |   57865 |  1995K|  1995K| 3065K (0)|
|   4 |     VIEW                                | VW_GBF_17    |      1 |  50000 |   390K|  10000 |00:00:00.01 |     396 |       |       |          |
|*  5 |      FILTER                             |              |      1 |        |       |  10000 |00:00:00.01 |     396 |       |       |          |
|*  6 |       TABLE ACCESS FULL                 | DIM_PRODUCT  |      1 |  50000 |   683K|  10000 |00:00:00.01 |     396 |       |       |          |
|*  7 |     HASH JOIN                           |              |      1 |  40000 |  1328K|   3638K|00:00:04.33 |   57469 |  1922K|  1922K| 1411K (0)|
|*  8 |      TABLE ACCESS FULL                  | DIM_CHANNEL  |      1 |      3 |    24 |      3 |00:00:00.01 |       2 |       |       |          |
|   9 |      TABLE ACCESS BY INDEX ROWID BATCHED| FACT_SALES   |      1 |  80000 |  2031K|   7280K|00:00:02.70 |   57467 |       |       |          |
|* 10 |       INDEX RANGE SCAN                  | FS_DATE      |      1 |  80000 |       |   7280K|00:00:00.97 |   17336 |       |       |          |
|* 11 |    TABLE ACCESS FULL                    | DIM_CUSTOMER |      1 |    200K|  1757K|    200K|00:00:00.07 |    1333 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$B734A6F8
   4 - SEL$FB6C8AF9 / VW_GBF_17@SEL$393E5B1C
   5 - SEL$FB6C8AF9
   6 - SEL$FB6C8AF9 / P@SEL$1
   8 - SEL$B734A6F8 / CH@SEL$3
   9 - SEL$B734A6F8 / F@SEL$1
  10 - SEL$B734A6F8 / F@SEL$1
  11 - SEL$B734A6F8 / C@SEL$2

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$FB6C8AF9")
      ELIM_GROUPBY(@"SEL$2B4A2561")
      OUTLINE_LEAF(@"SEL$B734A6F8")
      PLACE_GROUP_BY(@"SEL$EE94F965" ( "P"@"SEL$1" ) 17)
      OUTLINE(@"SEL$2B4A2561")
      ELIM_GROUPBY(@"SEL$85CD04AE")
      OUTLINE(@"SEL$EE94F965")
      MERGE(@"SEL$9E43CB6E" >"SEL$4")
      OUTLINE(@"SEL$85CD04AE")
      OUTLINE(@"SEL$4")
      OUTLINE(@"SEL$9E43CB6E")
      MERGE(@"SEL$58A6D7F6" >"SEL$3")
      OUTLINE(@"SEL$393E5B1C")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$58A6D7F6")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      FULL(@"SEL$B734A6F8" "CH"@"SEL$3")
      INDEX_RS_ASC(@"SEL$B734A6F8" "F"@"SEL$1" ("FACT_SALES"."DATE_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B734A6F8" "F"@"SEL$1")
      NO_ACCESS(@"SEL$B734A6F8" "VW_GBF_17"@"SEL$393E5B1C")
      FULL(@"SEL$B734A6F8" "C"@"SEL$2")
      LEADING(@"SEL$B734A6F8" "CH"@"SEL$3" "F"@"SEL$1" "VW_GBF_17"@"SEL$393E5B1C" "C"@"SEL$2")
      USE_HASH(@"SEL$B734A6F8" "F"@"SEL$1")
      USE_HASH(@"SEL$B734A6F8" "VW_GBF_17"@"SEL$393E5B1C")
      USE_HASH(@"SEL$B734A6F8" "C"@"SEL$2")
      SWAP_JOIN_INPUTS(@"SEL$B734A6F8" "VW_GBF_17"@"SEL$393E5B1C")
      USE_HASH_AGGREGATION(@"SEL$B734A6F8")
      FULL(@"SEL$FB6C8AF9" "P"@"SEL$1")
      END_OUTLINE_DATA
  */

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (VARCHAR2(30), CSID=873): '2022-04-01'
   2 - :2 (VARCHAR2(30), CSID=873): '2022-06-30'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C"."CUST_ID"="F"."CUST_ID")
   3 - access("ITEM_1"="F"."PROD_ID")
   5 - filter(TO_DATE(:P_TO,'YYYY-MM-DD')>=TO_DATE(:P_FROM,'YYYY-MM-DD'))
   6 - filter("P"."CATEGORY"='Premium')
   7 - access("CH"."CHANNEL_ID"="F"."CHANNEL_ID")
   8 - filter(("CH"."CHANNEL_NAME"='CH_1' OR "CH"."CHANNEL_NAME"='CH_2' OR "CH"."CHANNEL_NAME"='CH_3'))
  10 - access("F"."DATE_ID">=TO_DATE(:P_FROM,'YYYY-MM-DD') AND "F"."DATE_ID"<=TO_DATE(:P_TO,'YYYY-MM-DD'))
  11 - filter(("C"."REGION"='EU' OR "C"."REGION"='NA'))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "C"."REGION"[VARCHAR2,30], SUM("F"."AMOUNT"*"ITEM_2")[22]
   2 - (#keys=1) "ITEM_2"[NUMBER,2], "F"."AMOUNT"[NUMBER,22], "F"."AMOUNT"[NUMBER,22], "C"."REGION"[VARCHAR2,30], "C"."REGION"[VARCHAR2,30]
   3 - (#keys=1) "ITEM_2"[NUMBER,2], "F"."AMOUNT"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."AMOUNT"[NUMBER,22]
   4 - (rowset=256) "ITEM_1"[NUMBER,22], "ITEM_2"[NUMBER,2]
   5 - (rowset=256) "P"."PROD_ID"[NUMBER,22]
   6 - (rowset=256) "P"."PROD_ID"[NUMBER,22]
   7 - (#keys=1) "F"."AMOUNT"[NUMBER,22], "F"."PROD_ID"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."PROD_ID"[NUMBER,22],
       "F"."CUST_ID"[NUMBER,22], "F"."AMOUNT"[NUMBER,22]
   8 - "CH"."CHANNEL_ID"[NUMBER,22]
   9 - "F"."PROD_ID"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."CHANNEL_ID"[NUMBER,22], "F"."AMOUNT"[NUMBER,22]
  10 - "F".ROWID[ROWID,10]
  11 - "C"."CUST_ID"[NUMBER,22], "C"."REGION"[VARCHAR2,30]

Note
-----
   - this is an adaptive plan


122 rows selected.



Here I just set the reporting window using bind variables and flushed the shared pool to make sure Oracle parses the query fresh. The optimizer picked a clean and efficient plan using an index range scan on FS_DATE to grab only the required time slice from FACT_SALES, then performing hash joins to the small dimension tables and a hash group by at the end.

All the filters from the WHERE clause show up correctly in the predicates, and the stats (about 95k buffers, ~3 seconds) confirm it’s a solid in-memory run. The note at the bottom clearly says SQL plan baseline accepted, which means this is now my “good plan” the one Real-Time SPM will lock onto and automatically revert to if a slower plan shows up later.

VAR p_from VARCHAR2(10)
VAR p_to   VARCHAR2(10)

EXEC :p_from := '2022-04-01';
EXEC :p_to   := '2022-06-30';

-- 1) regress stats on DIM_CUSTOMER
EXEC DBMS_STATS.DELETE_TABLE_STATS(USER,'DIM_CUSTOMER');

PL/SQL procedure successfully completed.


-- 2) force fresh parse
ALTER SYSTEM FLUSH SHARED_POOL;
System altered.


-- 3) run the query (convert binds to DATEs)
SELECT /*+ GATHER_PLAN_STATISTICS */
       p.category, c.region, SUM(f.amount) AS rev
FROM   fact_sales f
JOIN   dim_product  p ON p.prod_id  = f.prod_id
JOIN   dim_customer c ON c.cust_id  = f.cust_id
JOIN   dim_channel  ch ON ch.channel_id = f.channel_id
WHERE  f.date_id BETWEEN TO_DATE(:p_from,'YYYY-MM-DD')
                     AND TO_DATE(:p_to  ,'YYYY-MM-DD')
  AND  p.category = 'Premium'
  AND  c.region   IN ('NA','EU')
  AND  ch.channel_name IN ('CH_1','CH_2','CH_3')
GROUP  BY p.category, c.region;

CATEGOR REGION                                REV
------- ------------------------------ ----------
Premium EU                             18253962.2
Premium NA                             18316364.4



SQL> COLUMN sql_id FORMAT A13
SELECT sql_id, child_number, plan_hash_value, last_active_time
FROM   v$sql
WHERE  sql_text LIKE 'SELECT /*+ GATHER_PLAN_STATISTICS */%SUM(f.amount)%GROUP  BY p.category, c.region%'
  AND  parsing_schema_name = USER
ORDER  BY last_active_time DESC
FETCH FIRST 1 ROW ONLY;SQL>   

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE LAST_ACTI
------------- ------------ --------------- ---------
7v44tqagjad7k            0      2441995847 24-NOV-25


SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
         sql_id => '7v44tqagjad7k', 
         cursor_child_no => 0,
         format => 'ALLSTATS LAST +PEEKED_BINDS +PREDICATE +OUTLINE +ALIAS +NOTE +PROJECTION +BYTES +IOSTATS'
));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7v44tqagjad7k, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */        p.category, c.region,
SUM(f.amount) AS rev FROM   fact_sales f JOIN   dim_product  p ON
p.prod_id  = f.prod_id JOIN   dim_customer c ON c.cust_id  = f.cust_id
JOIN   dim_channel  ch ON ch.channel_id = f.channel_id WHERE  f.date_id
BETWEEN TO_DATE(:p_from,'YYYY-MM-DD')                      AND
TO_DATE(:p_to  ,'YYYY-MM-DD')   AND  p.category = 'Premium'   AND
c.region   IN ('NA','EU')   AND  ch.channel_name IN
('CH_1','CH_2','CH_3') GROUP  BY p.category, c.region

Plan hash value: 2441995847

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Starts | E-Rows |E-Bytes| A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |      1 |        |       |      2 |00:00:09.07 |   59199 |  14384 |  14384 |       |       |          |         |
|   1 |  HASH GROUP BY                           |              |      1 |  36018 |  2743K|      2 |00:00:09.07 |   59199 |  14384 |  14384 |  1200K|  1200K|  661K (0)|         |
|*  2 |   FILTER                                 |              |      1 |        |       |    145K|00:00:09.04 |   59199 |  14384 |  14384 |       |       |          |         |
|*  3 |    HASH JOIN                             |              |      1 |  36018 |  2743K|    145K|00:00:09.02 |   59199 |  14384 |  14384 |  2402K|  2402K| 3124K (0)|         |
|*  4 |     TABLE ACCESS FULL                    | DIM_PRODUCT  |      1 |  50000 |   683K|  10000 |00:00:00.01 |     396 |      0 |      0 |       |       |          |         |
|*  5 |     HASH JOIN                            |              |      1 |  40000 |  2500K|   1454K|00:00:08.67 |   58803 |  14384 |  14384 |   234M|    14M|   40M (1)|     116M|
|*  6 |      HASH JOIN                           |              |      1 |  40000 |  1328K|   3638K|00:00:04.60 |   57469 |      0 |      0 |  1922K|  1922K| 1348K (0)|         |
|*  7 |       TABLE ACCESS FULL                  | DIM_CHANNEL  |      1 |      3 |    24 |      3 |00:00:00.01 |       2 |      0 |      0 |       |       |          |         |
|   8 |       TABLE ACCESS BY INDEX ROWID BATCHED| FACT_SALES   |      1 |  80000 |  2031K|   7280K|00:00:02.83 |   57467 |      0 |      0 |       |       |          |         |
|*  9 |        INDEX RANGE SCAN                  | FS_DATE      |      1 |  80000 |       |   7280K|00:00:01.04 |   17336 |      0 |      0 |       |       |          |         |
|* 10 |      TABLE ACCESS FULL                   | DIM_CUSTOMER |      1 |    212K|  6229K|    200K|00:00:00.05 |    1334 |      0 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$EE94F965
   4 - SEL$EE94F965 / P@SEL$1
   7 - SEL$EE94F965 / CH@SEL$3
   8 - SEL$EE94F965 / F@SEL$1
   9 - SEL$EE94F965 / F@SEL$1
  10 - SEL$EE94F965 / C@SEL$2

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$EE94F965")
      MERGE(@"SEL$9E43CB6E" >"SEL$4")
      OUTLINE(@"SEL$4")
      OUTLINE(@"SEL$9E43CB6E")
      MERGE(@"SEL$58A6D7F6" >"SEL$3")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$58A6D7F6")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      FULL(@"SEL$EE94F965" "CH"@"SEL$3")
      INDEX_RS_ASC(@"SEL$EE94F965" "F"@"SEL$1" ("FACT_SALES"."DATE_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$EE94F965" "F"@"SEL$1")
      FULL(@"SEL$EE94F965" "C"@"SEL$2")
      FULL(@"SEL$EE94F965" "P"@"SEL$1")
      LEADING(@"SEL$EE94F965" "CH"@"SEL$3" "F"@"SEL$1" "C"@"SEL$2" "P"@"SEL$1")
      USE_HASH(@"SEL$EE94F965" "F"@"SEL$1")
      USE_HASH(@"SEL$EE94F965" "C"@"SEL$2")
      USE_HASH(@"SEL$EE94F965" "P"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$EE94F965" "P"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$EE94F965")
      END_OUTLINE_DATA
  */

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (VARCHAR2(30), CSID=873): '2022-04-01'
   2 - :2 (VARCHAR2(30), CSID=873): '2022-06-30'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_DATE(:P_TO,'YYYY-MM-DD')>=TO_DATE(:P_FROM,'YYYY-MM-DD'))
   3 - access("P"."PROD_ID"="F"."PROD_ID")
   4 - filter("P"."CATEGORY"='Premium')
   5 - access("C"."CUST_ID"="F"."CUST_ID")
   6 - access("CH"."CHANNEL_ID"="F"."CHANNEL_ID")
   7 - filter(("CH"."CHANNEL_NAME"='CH_1' OR "CH"."CHANNEL_NAME"='CH_2' OR "CH"."CHANNEL_NAME"='CH_3'))
   9 - access("F"."DATE_ID">=TO_DATE(:P_FROM,'YYYY-MM-DD') AND "F"."DATE_ID"<=TO_DATE(:P_TO,'YYYY-MM-DD'))
  10 - filter(("C"."REGION"='EU' OR "C"."REGION"='NA'))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "C"."REGION"[VARCHAR2,30], SUM("F"."AMOUNT")[22]
   2 - "F"."AMOUNT"[NUMBER,22], "F"."AMOUNT"[NUMBER,22], "C"."REGION"[VARCHAR2,30], "C"."REGION"[VARCHAR2,30]
   3 - (#keys=1) "F"."AMOUNT"[NUMBER,22], "F"."AMOUNT"[NUMBER,22], "C"."REGION"[VARCHAR2,30], "C"."REGION"[VARCHAR2,30]
   4 - (rowset=256) "P"."PROD_ID"[NUMBER,22]
   5 - (#keys=1) "F"."AMOUNT"[NUMBER,22], "F"."PROD_ID"[NUMBER,22], "F"."PROD_ID"[NUMBER,22], "F"."AMOUNT"[NUMBER,22], "C"."REGION"[VARCHAR2,30],
       "C"."REGION"[VARCHAR2,30]
   6 - (#keys=1) "F"."AMOUNT"[NUMBER,22], "F"."PROD_ID"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."PROD_ID"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."AMOUNT"[NUMBER,22]
   7 - "CH"."CHANNEL_ID"[NUMBER,22]
   8 - "F"."PROD_ID"[NUMBER,22], "F"."CUST_ID"[NUMBER,22], "F"."CHANNEL_ID"[NUMBER,22], "F"."AMOUNT"[NUMBER,22]
   9 - "F".ROWID[ROWID,10]
  10 - "C"."CUST_ID"[NUMBER,22], "C"."REGION"[VARCHAR2,30]

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan


110 rows selected.







-- Real-Time SPM evidence
-- Real-Time SPM Evidence
ALTER SESSION SET optimizer_capture_sql_plan_baselines = TRUE;
ALTER SESSION SET optimizer_use_sql_plan_baselines     = TRUE;

COLUMN last_verified            FORMAT A19
COLUMN foreground_last_verified FORMAT A19
COLUMN accepted                 FORMAT A8
COLUMN enabled                  FORMAT A8
COLUMN origin                   FORMAT A20
COLUMN sql_handle               FORMAT A20
COLUMN plan_name                FORMAT A25

SELECT sql_handle,
       plan_name,
       origin,
       accepted,
       enabled,
       TO_CHAR(last_verified,'YYYY-MM-DD HH24:MI:SS')            AS last_verified,
       TO_CHAR(foreground_last_verified,'YYYY-MM-DD HH24:MI:SS') AS fg_verified
FROM   dba_sql_plan_baselines
WHERE  sql_text LIKE '%SUM(f.amount)%'
ORDER  BY last_verified DESC;

SQL_HANDLE           PLAN_NAME                 ORIGIN               ACCEPTED ENABLED LAST_VERIFIED        FG_VERIFIED
-------------------- ------------------------- -------------------- -------- ------- -------------------- --------------------
SQL_bf3d9f8a42       SQL_PLAN_2t8da8zxfzp50    AUTO-CAPTURE         YES      YES     2025-11-24 14:06:19  2025-11-24 14:06:19
SQL_bf3d9f8a42       SQL_PLAN_7v44tqagjad7k    FOREGROUND-CAPTURE   NO       NO      2025-11-24 14:06:18  2025-11-24 14:06:18

-- SQL_PLAN_2t8da8zxfzp50 ---> the good baseline plan, automatically captured, accepted and enabled by Real-Time SPM.

-- SQL_PLAN_7v44tqagjad7k ----> the regressed plan, captured during foreground execution but rejected (ACCEPTED=NO) as it underperformed compared to the verified baseline.




In the good baseline plan (SQL_ID 2t8da8zxfzp50, plan hash 1151134654), the optimizer uses an FS_DATE index range scan with batched ROWID fetches from FACT_SALES, then hash-joins (F ↔ VW_GBF_17(P) ↔ C) with a tiny full scan of CH (3 rows) and finishes with a hash group by. It’s fully in-memory about 59,198 buffers, ~5.79s elapsed, 0 reads, 0 temp with predicates correctly slicing DATE_ID; this is the efficient baseline. After deleting stats on DIM_CUSTOMER, the regressed plan (SQL_ID 7v44tqagjad7k, plan hash 2441995847) keeps the same general shape and the FS_DATE index range scan, but row-source sizing changes due to misestimates; it still hash-joins and groups, yet now needs ~59,199 buffers, ~9.07s, ~14,384 reads, and ~116MB temp (dynamic sampling level=2), i.e., the same outline but degraded cardinality drives extra I/O and temp usage aabout a 56% slowdown …

The numbers tell the story: Oracle tried the new plan once, learned it was bad, and reverted. This isn’t magic or marketing. It’s just Oracle 23ai doing what every good DBA does watching, measuring, and reverting fast when things go south. Real-Time SPM uses the same cost and runtime data we always trusted from AWR, but now the optimizer reacts within seconds instead of waiting for us.

On large production systems, it won’t prevent every performance issue, but it dramatically reduces the window where a regression hurts. It’s safe to enable, requires no extra license beyond Enterprise Edition, and it works right out of the box in Autonomous DB.

In Short

  • 23ai on-prem Enterprise Edition and Autonomous DB both support Real-Time SPM.
  • It compares runtime metrics from the Automatic SQL Tuning Set.
  • When a new plan runs slower, Oracle reinstates the previous good baseline automatically.
  • You’ll see it immediately in DBA_SQL_PLAN_BASELINES with ORIGIN = 'FOREGROUND-CAPTURE'.
  • You can watch it happen live through execution plans and elapsed times.

Hope It Helped!
Prashant Dixit

Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: , , , , , | 1 Comment »

From Painful Manual LOB Shrink to Automatic SecureFiles Shrink

Posted by FatDBA on November 15, 2025

I’ve been working with LOBs for years now, and trust me, shrinking them has always been a headache. Anyone who has ever tried ALTER TABLE … SHRINK SPACE on a big SecureFiles LOB knows the pain … blocking sessions, unexpected waits, and sometimes that lovely ORA-1555 popping up at the worst time. Every DBA eventually gets into that situation where a LOB segment is 200 GB on disk but only 10 GB of real data remains. You delete rows… but the space never comes back unless you manually shrink it, which itself can cause more issues.

With the introduction of Automatic SecureFiles Shrink, Oracle really made a DBA’s life easier. This feature, which first came out in 23ai, quietly frees up unused LOB space in the background without disrupting your workload. I wanted to see how it behaves in a real scenario, so I set up a small lab and tested it out. Here’s the whole experiment, raw and simple.

Lets do a demo and understand how this new feature works … I spun up a fresh PDB and made a small tablespace just for this test. Nothing fancy.

CREATE TABLESPACE lobts
  DATAFILE '/u02/oradata/LOBTS01.dbf'
  SIZE 1G AUTOEXTEND ON NEXT 256M;

Tablespace created.



-- Table with a securefile LOB based column.
CREATE TABLE tst_securefile_lob
(
    id NUMBER,
    lob_data CLOB
)
LOB (lob_data) STORE AS SECUREFILE (
    TABLESPACE lobts
    CACHE
);

Table created.




SELECT table_name, column_name, securefile
FROM   user_lobs
WHERE  table_name='TST_SECUREFILE_LOB';

TABLE_NAME           COLUMN_NAME   SECUREFILE
-------------------  ------------  ----------
TST_SECUREFILE_LOB   LOB_DATA      YES


Next, I inserted a good amount of junk data around 10,000 rows of random CLOB strings. I wanted the LOB segment to be big enough to see clear differences after shrink.

BEGIN
  FOR r IN 1 .. 10 LOOP
    INSERT INTO tst_securefile_lob (id, lob_data)
    SELECT r*100000 + level,
           TO_CLOB(DBMS_RANDOM.STRING('x', 32767))
    FROM dual
    CONNECT BY level <= 1000;
    COMMIT;
  END LOOP;
END;
/

PL/SQL procedure successfully completed.



SELECT COUNT(*) FROM tst_securefile_lob;

  COUNT(*)
----------
     10000




SELECT ul.segment_name,
       us.blocks,
       ROUND(us.bytes/1024/1024,2) AS mb
FROM   user_lobs ul
JOIN   user_segments us
  ON us.segment_name = ul.segment_name
WHERE  ul.table_name = 'TST_SECUREFILE_LOB';

SEGMENT_NAME               BLOCKS     MB
------------------------   --------   --------
SYS_LOB0001234567C00002$     131072     1024.00




-- After stats and a quick check in USER_SEGMENTS, the LOB segment was showing a nice chunky size. 
-- Then I deleted almost everything
-- Now the table will have very few rows left, but the LOB segment was still the same size. As usual.
DELETE FROM tst_securefile_lob
WHERE id < 900000;
COMMIT;

9990 rows deleted. 
Commit complete.


-- Checking LOB Internal Usage (Before Auto Shrink)
EXEC show_securefile_space(USER, 'SYS_LOB0001234567C00002$');
Segment blocks      = 131072  bytes=1073741824
Used blocks         =  10240  bytes=83886080
Expired blocks      = 110592  bytes=905969664
Unexpired blocks    =  10240  bytes=83886080
-- This clearly shows almost the entire segment is expired/free but not reclaimed.


-- Checking Auto Shrink Statistics (Before Enabling)
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE '%Auto SF SHK%'
ORDER  BY name;

NAME                                     VALUE
--------------------------------------   ------
Auto SF SHK failures                         0
Auto SF SHK segments processed               0
Auto SF SHK successful                       0
Auto SF SHK total number of tasks            0

Turning on Automatic Shrink — By default this feature is OFF, so I enabled it:

EXEC DBMS_SPACE.SECUREFILE_SHRINK_ENABLE;

PL/SQL procedure successfully completed.

That’s literally it. No parameters, no tuning, nothing else. Just enable.

Automatic SecureFiles Shrink does not run immediately after you delete data. Oracle requires that a SecureFiles LOB segment be idle for a specific amount of time before it becomes eligible for shrinking, and the default idle-time limit is 1,440 minutes (24 hours). “Idle” means that no DML or preallocation activity has occurred on that LOB during that period. Once the segment meets this condition, Oracle considers it during its automatic background shrink task, which is part of AutoTask and runs every 30 minutes with a defined processing window.

When the task executes, it attempts to shrink eligible segments, but it does so gradually and in small increments using a trickle-based approach, rather than reclaiming all possible space in a single operation. This incremental behavior is deliberate: it reduces impact on running workloads and avoids heavy reorganization all at once. Only segments that meet all selection criteria .. such as having sufficient free space above the defined thresholds and not using RETENTION MAX … are processed. Because of this incremental design and the eligibility rules, the full space reclamation process can span multiple background cycles.

Some of the internal hidden/underscore params that can be used to control these limits (not unless support asked you to do or you are dealing a lab system)

Parameter                                    Default_Value    Session_Value    Instance_Value   IS_SESSION_MODIFIABLE   IS_SYSTEM_MODIFIABLE
------------------------------------------- ---------------  ---------------- ---------------- ----------------------- ---------------------
_ktsls_autoshrink_seg_idle_seconds            86400            86400            86400            FALSE                   IMMEDIATE
_ktsls_autoshrink_seg_pen_seconds             86400            86400            86400            FALSE                   IMMEDIATE
_ktsls_autoshrink_trickle_mb                  5                5                5                FALSE                   IMMEDIATE

Okay, lets check out post chnaghe outputs, what automatic LOB shrink does to our test object.

SELECT ul.segment_name,
       us.blocks,
       ROUND(us.bytes/1024/1024,2) AS mb
FROM   user_lobs ul
JOIN   user_segments us
  ON us.segment_name = ul.segment_name
WHERE ul.table_name='TST_SECUREFILE_LOB';

SEGMENT_NAME               BLOCKS     MB
------------------------   --------   --------
SYS_LOB0001234567C00002$      40960      320.00

Note: From ~1024 MB down to ~320 MB. Auto shrink worked     🙂



-- DBMS_SPACE Usage (After Auto Shrink)
EXEC show_securefile_space(USER, 'SYS_LOB0001234567C00002$');
Segment blocks      = 40960  bytes=335544320
Used blocks         =  9216  bytes=75497472
Expired blocks      =  6144  bytes=50331648
Unexpired blocks    =  9216  bytes=75497472

Notee:  Expired blocks dropped from 110k -->  6k. This confirms auto shrink freed most of the fragmented space.






-- After the task is run.
SELECT name, value
FROM v$sysstat
WHERE name LIKE '%Auto SF SHK%'
ORDER BY name;

NAME                                     VALUE
--------------------------------------   ------
Auto SF SHK failures                         0
Auto SF SHK segments processed               1
Auto SF SHK successful                       1
Auto SF SHK total number of tasks            1




SELECT owner,
       segment_name,
       shrunk_bytes,
       attempts,
       last_shrink_time
FROM v$securefile_shrink;

OWNER      SEGMENT_NAME               SHRUNK_BYTES    ATTEMPTS   LAST_SHRINK_TIME
---------  ------------------------   -------------   ---------  ---------------------------
PRASHANT   SYS_LOB0001234567C00002$      744947712          1     14-NOV-2025 06:32:15

Note:  Oracle automatically reclaimed ~710 MB of wasted LOB space.

This feature, It’s simple, it’s safe, and it saves DBAs from doing manual shrink maintenance again and again. It’s not a fast feature it’s slow and polite on purpose but it works exactly as expected.

If your system has LOBs (EBS attachments, documents, JSON, logs, media files, etc.), you should absolutely enable this. Let Oracle handle the boring part.

Hope It Helped!
Prashant Dixit

Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: , , , , , , , | Leave a Comment »

A New ACE Chapter…Oracle ACE Apprentice.. and I wasn’t even in Vegas

Posted by FatDBA on October 18, 2025

This year I couldn’t attend Oracle AI World in Las Vegas, and I was honestly feeling a bit bad about missing it, especially when timelines were full of shiny posts, selfies, and of course swag pics 😄

But while going through the updates, I came across something that actually made me happier than any conference badge … Oracle officially launched a new ACE tier called “Oracle ACE Apprentice”, and this is a proper new program, not the old structure.

The idea behind the Apprentice tier is simple but powerful, give people who are learning, sharing, speaking, or contributing (even in small ways) a proper entry ramp into the ACE ecosystem. It’s for early contributors who want to grow into ACE Pro / ACE Director later, but need an official “start of journey” recognition & guidance.

Earlier, people used to wait until they became a heavy community contributor before applying. Now Oracle has created a pathway instead of a gate which I personally think is the most modern and inclusive update they’ve made to the ACE program.

How can someone join?

  • There is a simple pre-registration form for ACE Apprentice
  • You need to use the same email as your Oracle University (certification) account
  • After joining, you continue contributing — blogs, talks, code samples, community help etc.
  • And when your impact grows, you can later nominate yourself for ACE Pro

Very clean and very beginner-friendly. Honestly, this is going to bring a lot more new talent into the ecosystem.

For more details: https://ace.oracle.com/ords/r/ace/oracle-aces/join-ace

I have been into the Ace club for a while now and it’s really nice to see the program evolve with this new energy, especially at the same time Oracle is pushing the AI era forward …. So yeah .. I missed Vegas this year… but I still ended up with good ACE news in the most unexpected way 😎

Looks like next year I’ll need to be there in person again, this time wearing two badges: ACE Pro + ACE Program Fanboy 😂

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

Oracle AWR Scripts Decoded .. No More Guessing!

Posted by FatDBA on July 29, 2025

Recently, someone asked me why there are so many AWR-like files in the directory and whether they are as useful as the well-known awrrpt.sql. I took the opportunity to explain what I knew about them and their purpose. Since I thought it could be helpful, I decided to share this insight with my readers as well.

If you’re into performance tuning in Oracle, very likey you’ve already used AWR reports. But then you open this directory: $ORACLE_HOME/rdbms/admin …. …and boom – you’re hit with a list of cryptic scripts: awrrpt.sql, awrgdrpi.sql, awrsqrpt.sql, awrextr.sql

What do they all do?
When should you use which one?
Why are they named like 90s DOS files 🙂 ?

Let’s keep it short and sharp. Here’s your point-to-point breakdown of the most important AWR scripts.

Before you go running any of these scripts – make sure you have the Oracle Diagnostic Pack license.
AWR stuff is not free.

I grouped them into logical chunks – reports, comparisons, SQLs, data movement, etc.

Performance Reports

These are the most common AWR reports you run to analyze performance between 2 snapshots.

ScriptWhat it does
awrrpt.sqlGenerates AWR report for the current instance (for single instance DBs)
awrrpti.sqlSame as above, but lets you select another DBID or instance (useful for RAC)
awrgrpt.sqlAWR Global Report – gives a full RAC-wide view
awrgrpti.sqlSame as above, but lets you pick another DBID/instance

Example:
You’re troubleshooting high CPU on node 2 of your RAC? Use awrrpti.sql.

Comparison Reports

These help you compare two different time ranges – maybe before and after a patch, or different load periods.

ScriptWhat it does
awrddrpt.sqlCompares two AWR snapshots (date diff) – for a single instance
awrddrpi.sqlSame as above, for another dbid/instance
awrgdrpt.sqlGlobal RAC diff report (current RAC)
awrgdrpi.sqlGlobal RAC diff report (another dbid/instance)

Use these when you wanna say, “Hey, this new code made the DB slower… prove it!”

Want to see what a particular SQL is doing? These are your tools.

ScriptWhat it does
awrsqrpt.sqlSQL report for a specific SQL_ID in the current instance
awrsqrpi.sqlSame thing but lets you pick another dbid/instance

You’ll be surprised how useful this is when hunting bad queries.

Sometimes, you need to take AWR data from one system and analyze it somewhere else (like test or dev).

ScriptWhat it does
awrextr.sqlExport AWR data using datapump
awrload.sqlImport AWR data using datapump

This is actually gold when working on performance issues across environments.

Helper / Utility Scripts

These are mostly helper scripts to define input or make reports more automated or interactive.

ScriptWhat it does
perfhubrpt.sqlGenerates a fancy interactive Performance Hub report
awrinpnm.sqlInput name helper for AWR
awrinput.sqlGet inputs before running AWR reports
awrddinp.sqlInput helper for diff reports
awrgdinp.sqlInput helper for RAC diff reports

What’s with these weird script names?

Yeah, all these awrsqrpi.sql, awrgdrpt.sql, etc. – they look like random garbage at first.
But there’s actually some logic.

Here’s how to decode them:

AbbreviationMeans
awrAutomatic Workload Repository
rpt or rpReport
iLets you select specific instance or DBID
gGlobal report for RAC
d or ddDiff reports (comparing two snapshots)
sqSQL
inpInput helper

So awrsqrpi.sql = AWR SQL Report for a different instance
And awrgdrpi.sql = AWR Global Diff Report for another DBID/instance

So Which Script Should I Use?

Here’s a quick cheat sheet:

TaskScript
Normal AWR report (single instance)awrrpt.sql
AWR report for RAC (global view)awrgrpt.sql
SQL performance reportawrsqrpt.sql
Compare two AWR reportsawrddrpt.sql
Export/import AWR dataawrextr.sql and awrload.sql

If you’re doing anything with RAC – prefer the ones with g in them.
If you’re automating – use the *inp*.sql files.

Final Thoughts

Yes, the names are ugly.
Yes, the syntax is old-school.
But honestly? These AWR scripts are still some of the best tools you have for DB performance analysis.

Just remember:

  • Don’t use them without a valid license
  • Learn the naming pattern once – and it gets way easier
  • Practice running different ones on test systems

And next time someone complains, “The database is slow” … you know exactly which script to run.

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: , , , , , , , | Leave a Comment »

fatdba explores Vector Search in Oracle 23ai

Posted by FatDBA on July 23, 2025

So Oracle rolled out 23ai a while back and like every major release, it came packed with some really cool and interesting features. One that definitely caught my eye was Vector Search. I couldn’t resist diving in… and recently I explored it in depth and would like to share a though on this subject.

You see, we’ve been doing LIKE '%tax policy%' since forever. But now, Oracle’s SQL has become more powerful. Not only does it match words … it matches meaning.

So here’s me trying to explain what vector search is, how Oracle does it, why you’d care, and some examples that’ll hopefully make it click.

What’s Vector Search, Anyway?

Alright, imagine this:

You have a table of products. You search for “lightweight laptop for travel”.
Some entries say “ultrabook”, others say “portable notebook”, and none mention “lightweight” or “travel”. Old-school SQL would’ve said: “No Matches Found”

But with vector search, it gets it. Oracle turns all that text into math .. basically, a long list of numbers called a vector … and compares meanings instead of words.

So What’s a Vector?

When we say “vector” in vector search, we’re not talking about geometry class. In the world of AI and databases, a vector is just a long list of numbers … each number representing some aspect or feature of the original input (like a sentence, product description, image, etc.).

Here’s a basic example:
[0.12, -0.45, 0.88, …, 0.03]
This is a vector … maybe a 512 or 1536-dimension one .. depending on the embedding model used (like OpenAI, Oracle’s built-in model, Cohere, etc.).

Each number in this list is abstract, but together they represent the essence or meaning of your data.

Let’s say you have these two phrases:
“Apple is a tech company”
“iPhone maker based in California”

Now, even though they don’t share many words, they mean nearly the same thing. When passed through an embedding model, both phrases are converted into vectors:

Vector A: [0.21, -0.32, 0.76, …, 0.02]
Vector B: [0.22, -0.30, 0.74, …, 0.01]

They look very close … and that’s exactly the point.

What Oracle 23ai Gives You

  • A new VECTOR datatype (yeah!)
  • AI_VECTOR() function to convert text into vectors
  • VECTOR_INDEX to make search blazing fast
  • VECTOR_DISTANCE() to measure similarity
  • It’s all native in SQL ..no need for another vector DB bolted on

Let’s Build Something Step-by-Step

We’ll build a simple product table and do a vector search on it.

Step 1: Create the table

CREATE TABLE products (
  product_id     NUMBER PRIMARY KEY,
  product_name   VARCHAR2(100),
  description    VARCHAR2(1000),
  embedding      VECTOR(1536)
);

1536? Yeah, that’s the number of dimensions from Oracle’s built-in embedding model. Depends on which one you use.

Step 2: Generate vector embeddings

UPDATE products
SET embedding = ai_vector('text_embedding', description);

This’ll take the description, pass it through Oracle’s AI model, and give you a vector. Magic.

Step 3: Create the vector index

CREATE VECTOR INDEX product_vec_idx
ON products (embedding)
WITH (DISTANCE METRIC COSINE);

This speeds up the similarity comparisons … much like an index does for normal WHERE clauses.

Step 4: Semantic Search in SQL

SELECT product_id, product_name, 
       VECTOR_DISTANCE(embedding, ai_vector('text_embedding', 'light laptop for designers')) AS score
FROM products
ORDER BY score
FETCH FIRST 5 ROWS ONLY;

Now we’re searching for meaning, not words.

VECTOR_DISTANCE Breakdown

You can use different math behind the scenes:

VECTOR_DISTANCE(v1, v2 USING COSINE)
VECTOR_DISTANCE(v1, v2 USING EUCLIDEAN)
VECTOR_DISTANCE(v1, v2 USING DOT_PRODUCT)

Cosine is the usual go-to for text. Oracle handles the rest for you.

Use Cases You’ll Actually Care About

1. Semantic Product Search — “Fast shoes for runners” => shows “Nike Vaporfly”, even if it doesn’t say “fast”.

2. Similar Document Retrieval — Find all NDAs that look like this one (even with totally different words).

3. Customer Ticket Suggestion — Auto-suggest resolutions from past tickets. Saves your support team hours.

4. Content Recommendation — “People who read this also read…” kind of stuff. Easy to build now.

5. Risk or Fraud Pattern Matching — Find transactions that feel like fraud ..even if the details don’t match 1:1.

I know it might sound little confusing .. lets do a Onwe more example : Legal Document Matching

CREATE TABLE legal_docs (
  doc_id       NUMBER PRIMARY KEY,
  title        VARCHAR2(255),
  content      CLOB,
  content_vec  VECTOR(1536)
);

Update vectors:

UPDATE legal_docs
SET content_vec = ai_vector('text_embedding', content);

Now find similar docs:

SELECT doc_id, title
FROM legal_docs
ORDER BY VECTOR_DISTANCE(content_vec, ai_vector('text_embedding', 'confidentiality in government contracts'))
FETCH FIRST 10 ROWS ONLY;

That’s it. You’re officially building an AI-powered legal search engine.

Things to Know

  • Creating vectors can be heavy .. batch it.
  • Indexing speeds up similarity search a lot.
  • Combine with normal filters for best results:
SELECT * FROM products
WHERE category = 'laptop'
ORDER BY VECTOR_DISTANCE(embedding, ai_vector('gaming laptop under 1kg'))
FETCH FIRST 5 ROWS ONLY;

Final Thoughts from fatdba

I’m honestly impressed. Oracle took something that felt like ML black magic and put it right in SQL. No external service. No complicated setups. Just regular SQL, but smater.

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: , , , , , , , , , , , , , , | Leave a Comment »

Diagnosing a MySQL database performance Issue Using MySQLTuner.

Posted by FatDBA on July 20, 2025

A few weeks ago, we ran into a pretty nasty performance issue on one of our MySQL production-like grade databases. It started with slow application response times and ended with my phone blowing up with alerts. Something was clearly wrong, and while I suspected some bad queries or config mismatches, I needed a fast way to get visibility into what was really happening under the hood.

This is where MySQLTuner came to the rescue, again 🙂 I’ve used this tool in the past, and honestly, it’s one of those underrated gems for DBAs and sysadmins. It’s a Perl script that inspects your MySQL configuration and runtime status and then gives you a human-readable report with recommendations.

Let me walk you through how I used it to identify and fix the problem ..step by step .. including actual command output, what I changed, and the final outcome.

Step 1: Getting MySQLTuner

First things first, if you don’t already have MySQLTuner installed, just download it:

bashCopyEditwget https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
chmod +x mysqltuner.pl

You don’t need to install anything. Just run it like this:

bashCopyEdit./mysqltuner.pl --user=root --pass='YourStrongPassword'

(Note: Avoid running this in peak traffic hours on prod unless you’re sure about your load and risk.)

Step 2: Sample Output Snapshot

Here’s a portion of what I got when I ran it:

 >>  MySQLTuner 2.6.20 
 >>  Run with '--help' for additional options and output filtering

[OK] Currently running supported MySQL version 5.7.43
[!!] Switch to 64-bit OS - MySQL cannot use more than 2GB of RAM on 32-bit systems
[OK] Operating on 64-bit Linux

-------- Performance Metrics -------------------------------------------------
[--] Up for: 3d 22h 41m  (12M q [35.641 qps], 123K conn, TX: 92G, RX: 8G)
[--] Reads / Writes: 80% / 20%
[--] Binary logging is enabled (GTID MODE: ON)
[--] Total buffers: 3.2G global + 2.8M per thread (200 max threads)
[OK] Maximum reached memory usage: 4.2G (27.12% of installed RAM)
[!!] Slow queries: 15% (1M/12M)
[!!] Highest connection usage: 98% (197/200)
[!!] Aborted connections: 2.8K
[!!] Temporary tables created on disk: 37% (1M on disk / 2.7M total)

-------- MyISAM Metrics ------------------------------------------------------
[!!] Key buffer used: 17.2% (89M used / 512M cache)
[!!] Key buffer size / total MyISAM indexes: 512.0M/800.0M

-------- InnoDB Metrics ------------------------------------------------------
[OK] InnoDB buffer pool / data size: 2.0G/1.5G
[OK] InnoDB buffer pool instances: 1
[--] InnoDB Read buffer efficiency: 99.92% (925M hits / 926M total)
[!!] InnoDB Write log efficiency: 85.10% (232417 hits / 273000 total)
[!!] InnoDB log waits: 28

-------- Recommendations -----------------------------------------------------
General recommendations:
    Control warning line(s) size by reducing joins or increasing packet size
    Increase max_connections slowly if needed
    Reduce or eliminate persistent connections
    Enable the slow query log to troubleshoot bad queries
    Consider increasing the InnoDB log file size
    Query cache is deprecated and should be disabled

Variables to adjust:
    max_connections (> 200)
    key_buffer_size (> 512M)
    innodb_log_file_size (>= 512M)
    tmp_table_size (> 64M)
    max_heap_table_size (> 64M)

Step 3: What I Observed

Here’s what stood out for me:

1. Too many slow queries — 15% of all queries were slow. That’s a huge red flag. This wasn’t being logged properly either — the slow query log was off.

2. Disk-based temporary tables — 37% of temporary tables were being written to disk. This kills performance during joins and sorts.

3. Connections hitting limit — 197 out of 200 max connections used at peak. Close to saturation ..possibly causing application timeouts.

4. MyISAM key buffer inefficient — Key buffer was too small for the amount of MyISAM index data (yes, we still have a couple legacy MyISAM tables..

5. InnoDB log file too small — Frequent log flushing and waits were indicated, meaning innodb_log_file_size wasn’t enough for our write load.

Step 4: Actions I Took

Here’s what I changed based on the output and a quick double-check of our workload patterns:

– Enabled Slow Query Log

sqlCopyEditSET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

And updated /etc/my.cnf:

iniCopyEditslow_query_log = 1
slow_query_log_file = /var/log/mysql-slow.log
long_query_time = 1

– Increased tmp_table_size and max_heap_table_size:

iniCopyEdittmp_table_size = 128M
max_heap_table_size = 128M

(This reduced the % of temp tables going to disk.)

– Raised innodb_log_file_size:

iniCopyEditinnodb_log_file_size = 512M
innodb_log_files_in_group = 2

Caution: You need to shut down MySQL cleanly and delete old redo logs before applying this change.

– Raised key_buffer_size:

iniCopyEditkey_buffer_size = 1G

We still had some legacy MyISAM usage and this definitely helped reduce read latency.

– Upped the max_connections a bit (but also discussed with devs about app-level connection pooling):

iniCopyEditmax_connections = 300

Step 5: Post-Change Observations

After making these changes and restarting MySQL (for some of the changes to take effect), here’s what I observed:

  • CPU dropped by ~15% at peak hours.
  • Threads_running dropped significantly, meaning less contention.
  • Temp table usage on disk dropped to 12%.
  • Slow query log started capturing some really bad queries, which were fixed in the app code within a few days.
  • No more aborted connections or connection errors from the app layer.

Final Thoughts

MySQLTuner is not a magic bullet, but it’s one of those tools that gives you quick, actionable insights without the need to install big observability stacks or pay for enterprise APM tools. I’d strongly suggest any MySQL admin or engineer dealing with production performance issues keep this tool handy.

It’s also good for periodic health checks, even if you’re not in a crisis. Run it once a month or so, and you’ll catch slow config drifts or usage pattern changes.

Resources

If you’ve had a similar experience or used MySQLTuner in your infra, would love to hear what kind of findings you had. Drop them in the comments or message me directly .. Want to know more 🙂 Happy tuning!

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: , , , , , , , , , | Leave a Comment »