Tales From A Lazy Fat DBA

It's all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Archive for March, 2026

Why Oracle's optimizer has been getting smarter for 15 years, and what 26ai actually adds

Posted by FatDBA on March 31, 2026

Every bad execution plan you’ve ever debugged traces back to the same root cause. The optimizer made a wrong guess about how many rows an operation would return ..and built an entire plan on top of that wrong number.

That number is called cardinality. It’s the estimated row count for each operation in your plan. Get it right and the optimizer picks the right join order, the right join method, the right access path. Get it wrong and you get a nested loops join against a table that returns 500,000 rows when the optimizer thought it was 12. You’ve seen this plan. It hurt.

Oracle has been progressively solving this problem for 15+ years. The story isn’t a single breakthrough ..it’s a series of increasingly smarter mechanisms, each one handling a class of estimation problem that the previous one couldn’t.

Here’s the full honest picture, ending with what 26ai actually adds.

Oracle 10g .. Dynamic Sampling: The optimizer noticed when stats were missing or insufficient and sampled the data at parse time to get a rough estimate. Controlled by OPTIMIZER_DYNAMIC_SAMPLING. A blunt instrument, but better than pure guesswork.

Oracle 11g .. Cardinality Feedback: The optimizer started comparing its estimates to reality after execution. If it estimated 50 rows and got 50,000, it stored the real number in the SGA and flagged the statement for re-optimization on the next execution. The estimate corrected itself over time. The problem: the correction lived in the SGA only, so it was lost on restart and lost when the cursor aged out.

Oracle 12c .. The Big Jump: Three things landed together:

  • Statistics Feedback (renamed from Cardinality Feedback): same learning mechanism, better persistence
  • Adaptive Plans: the optimizer could now switch join methods mid-execution .. starting with Nested Loops based on its estimate, and switching to Hash Join live if actual rows exceeded the threshold. The final plan was then fixed for subsequent executions
  • SQL Plan Directives: when a misestimate was detected, the optimizer created a persistent directive (stored in SYSAUX, survives restarts) that told future parses: “when you see this predicate pattern, gather dynamic statistics first”. Directives are cross-statement … one query’s lesson protects another with the same predicate pattern

Oracle 19c/23ai .. Automation at Scale: Automatic SQL Tuning Sets (ASTS), Automatic SPM, and Real-Time SPM turned the individual learning mechanisms into a system-level feedback loop. The database wasn’t just learning from single statements .. it was maintaining plan stability across the entire workload automatically.

Oracle 26ai (23.8 RU) .. The Specific New Additions: Two documented, named improvements to cardinality estimation:

  1. Dynamic Statistics for PL/SQL Functions: a new parameter, plsql_function_dynamic_stats, giving fine-grained control over whether PL/SQL functions called inside SQL can participate in dynamic statistics sampling at parse time. Previously the optimizer treated PL/SQL functions as black boxes with unknowable return cardinality. Now it can sample them.
  2. PL/SQL to SQL Transpiler: when enabled, the optimizer inlines eligible PL/SQL functions directly into SQL at parse time, eliminating the black box entirely. The optimizer can now see and estimate the underlying SQL expression rather than guessing at what a function returns.
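
As a point of reference, the transpiler is switchable via an initialization parameter that already exists in 23ai (session-level example shown; I'm only showing the switch here, not the 26ai-specific behavior):

```sql
-- Enable the PL/SQL-to-SQL transpiler for this session (default is OFF)
ALTER SESSION SET sql_transpiler = 'ON';
```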

Plus the general continued improvement of ML-informed cost models inside the optimizer engine .. real, but not a named switchable feature.

Now let’s see all of this in the plan output, where it actually matters, with a quick demo. Comparing estimates to actuals is the single most important diagnostic habit in Oracle performance work: the GATHER_PLAN_STATISTICS hint tells Oracle to collect actual row counts during execution, and the ALLSTATS LAST format in DBMS_XPLAN surfaces them alongside the estimates.

-- Prereqs (run as SYS)
GRANT ADVISOR TO sh;
GRANT ADMINISTER SQL MANAGEMENT OBJECT TO sh;

CONN sh/sh

-- Set output format for readable plans
SET LINESIZE 200
SET PAGESIZE 10000
SET LONG 100000

-- Run the query with stats collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;



SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));

Output before extended statistics exist:

Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |     17 |    127 |  1890 |    1533 |  <- E:17, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |     17 |    127 |   406 |    1213 |  <- E:17, A:127
|  4 |    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

Predicate Information:
   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - filter("C"."CUST_STATE_PROVINCE"='CA'
          AND "C"."CUST_INCOME_LEVEL"='G: 130,000 - 149,999')

E-Rows: 17. A-Rows: 127. That’s a 7.5x underestimate.

The optimizer assumed cust_state_province = 'CA' and cust_income_level = 'G: 130,000 - 149,999' were independent. They’re not — they’re correlated. California has a disproportionate number of high-income customers in this dataset. The optimizer applied the selectivity of each predicate independently, multiplied them, and got the wrong answer.
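
Before fixing it, you can inspect the raw ingredients of that wrong answer. A quick sketch against the standard dictionary view (SH schema assumed; histograms can change the exact arithmetic, but the independence multiplication is the core issue):

```sql
-- Per-column stats the optimizer works from: single-column selectivity is
-- roughly 1/NUM_DISTINCT, and without a column group the optimizer simply
-- multiplies the two selectivities together -- the failing independence assumption.
SELECT column_name,
       num_distinct,
       ROUND(1 / num_distinct, 6) AS approx_selectivity
FROM   all_tab_col_statistics
WHERE  owner       = 'SH'
AND    table_name  = 'CUSTOMERS'
AND    column_name IN ('CUST_STATE_PROVINCE', 'CUST_INCOME_LEVEL');
```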

This is the classic multi-column predicate correlation problem. The fix is extended statistics.

Let’s fix it with extended statistics. Extended statistics (column groups) teach the optimizer about correlated columns. One DBMS_STATS call, no schema changes.

-- Create a column group for the two correlated columns
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
  ownname  => 'SH',
  tabname  => 'CUSTOMERS',
  extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
) AS col_group_name
FROM DUAL;

-- COL_GROUP_NAME
-- SYS_STUFBF#JKQM8F3GTPA7XDE9  (system-generated name)

-- Now gather stats to populate the column group
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
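
Before re-running, it’s worth confirming the column group actually exists and is visible to the optimizer; a quick check against the standard dictionary view:

```sql
-- List extensions (column groups) on SH.CUSTOMERS; the system-generated
-- SYS_... name returned by CREATE_EXTENDED_STATS should appear here.
SELECT extension_name, extension, droppable
FROM   all_stat_extensions
WHERE  owner      = 'SH'
AND    table_name = 'CUSTOMERS';
```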


-- Let's re-run the same SQL.
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


After extended statistics:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |    124 |    127 |  1890 |    1533 |  <- E:124, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |    124 |    127 |   406 |    1213 |  <- Near perfect
|  4 |    PARTITION RANGE   |           |      1 |    918K|    918K|  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |    918K|    918K|  1459 |     320 |
-------------------------------------------------------------------------------------------

E-Rows went from 17 to 124. Actual is 127. That’s less than 3% off.

Same plan hash .. same shape. But now the cost model is working from accurate numbers. In a more complex query, this difference in estimated rows would change join order, join method, and index decisions.

Next, let’s look at SQL Plan Directives and watch the optimizer learn.

When the optimizer detects a cardinality misestimate during execution, it creates a SQL Plan Directive — a persistent instruction stored in SYSAUX telling future parses to gather dynamic statistics for this predicate pattern. You can watch this happen.

First, drop the extended stats so the misestimate recurs:

-- Reset: drop the column group
BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS(
    ownname   => 'SH',
    tabname   => 'CUSTOMERS',
    extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)');
END;
/

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH', 'CUSTOMERS');

-- Let's flush the shared pool and re-run.

-- As SYS (flush shared pool in test environment only)
ALTER SYSTEM FLUSH SHARED_POOL;

CONN sh/sh

-- Run with stats collection
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;


-- Let's see if the directive was created.

-- Check for new SQL Plan Directives on CUSTOMERS
SELECT d.directive_id,
       d.type,
       d.state,
       d.auto_drop,
       d.created,
       o.object_name,
       o.subobject_name  AS column_name
FROM   dba_sql_plan_directives     d
JOIN   dba_sql_plan_dir_objects    o
       ON d.directive_id = o.directive_id
WHERE  o.object_name = 'CUSTOMERS'
ORDER  BY d.created DESC;


Output — directive created after the misestimate:

DIRECTIVE_ID  TYPE             STATE   AUTO_DROP CREATED              OBJECT  COLUMN_NAME
------------  ---------------  ------  --------- -------------------  ------  -------------------
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_STATE_PROVINCE
8273641920    DYNAMIC_SAMPLING USABLE  YES        2026-03-29 14:33:12 CUSTOMERS CUST_INCOME_LEVEL
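
One practical note: if the dictionary query comes back empty right after the run, that’s expected. Directives are buffered in the SGA and written to SYSAUX on a background cycle (every 15 minutes by default). In a test environment you can force the write:

```sql
-- Persist buffered SQL Plan Directives to SYSAUX immediately (test use only)
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
```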

The optimizer created a directive covering both columns … it noticed the multi-column predicate correlation caused a misestimate and now knows to sample dynamically next time it sees this pattern. Run the query a second time:

SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


At the bottom of the plan output:
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

The optimizer is now dynamically sampling at parse time because the directive told it to. The cardinality estimate will be much closer to reality on this execution.

Now, new in 26ai: the SQL Analysis Report. This is the part that’s genuinely new in 26ai. Previously you had to know to look at E-Rows vs A-Rows yourself. The SQL Analysis Report, surfaced directly in DBMS_XPLAN.DISPLAY_CURSOR output, flags these problems inline without you having to hunt for them.

-- The standard DISPLAY_CURSOR call — no extra parameters needed
-- SQL Analysis Report appears automatically in 26ai when issues exist

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


In Oracle 26ai, after the standard execution plan output, you now see:

SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
--------------------------------------------------------------------------------
3 - SEL$1 / "C"@"SEL$1"
  - The following columns have predicates which prevent their use as keys
    in an index range scan. Consider rewriting the predicates or creating
    column group statistics.
    "CUST_STATE_PROVINCE", "CUST_INCOME_LEVEL"

The optimizer is telling you directly: these two columns in combination are causing an estimation problem, and column group statistics would fix it. You no longer have to derive this by comparing E-Rows and A-Rows yourself. It’s surfaced automatically in the plan output.

That’s a real DBA quality-of-life improvement. The diagnosis that used to take 10 minutes of plan reading is now one line in your standard plan output.

Okay, next is Dynamic Stats for PL/SQL Functions. This is the specific new documented feature in 26ai (RU 23.8). Consider a query that filters through a PL/SQL function:

-- A function the optimizer previously couldn't estimate
CREATE OR REPLACE FUNCTION sh.get_high_value_threshold
RETURN NUMBER DETERMINISTIC IS
BEGIN
  RETURN 1000;
END;
/

-- Query using the function in a predicate
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();


Before 26ai (or with `plsql_function_dynamic_stats = 'OFF'`):

The optimizer treats `get_high_value_threshold()` as a black box. It has no idea what value the function returns, 
so it can't estimate selectivity. It either guesses based on defaults or uses a very conservative estimate.

| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |   9188 |  12116 |  <- Rough guess
|*  3|    TABLE ACCESS FULL | SALES |   9188 |  12116 |

In 26ai with plsql_function_dynamic_stats = 'ON':

-- Enable dynamic stats for PL/SQL functions (session level)
ALTER SESSION SET plsql_function_dynamic_stats = 'ON';

-- Rerun
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |  12203 |  12116 |  <- Near accurate
|*  3|    TABLE ACCESS FULL | SALES |  12203 |  12116 |

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - PL/SQL function sampled for dynamic statistics

The optimizer called the function during dynamic statistics gathering at parse time, got the actual return value (1000), and used it to estimate selectivity accurately. E-Rows 12,203 vs A-Rows 12,116 — less than 1% off.

You can also control this at the object level, which is the right approach in production .. turn it on only for specific functions you trust:

-- Prefer object-level control in production
-- Allow dynamic stats for a specific function
BEGIN
  DBMS_STATS.SET_FUNCTION_PREFS(
    ownname    => 'SH',
    funcname   => 'GET_HIGH_VALUE_THRESHOLD',
    pref_name  => 'PLSQL_FUNCTION_DYNAMIC_STATS',
    pref_value => 'ON');
END;
/

-- Check current settings
SELECT function_name, preference_name, preference_value
FROM   all_stat_extensions
WHERE  object_type = 'FUNCTION'
AND    owner = 'SH';

So, in short, Oracle’s optimizer hasn’t made one big leap … it’s made fifteen years of deliberate, incremental improvements, each one handling a class of cardinality problem the previous release couldn’t.

What 26ai specifically adds isn’t magic. It’s two concrete, named, documented improvements … dynamic statistics for PL/SQL functions, and the SQL Analysis Report surfacing optimizer advice inline .. plus the PL/SQL transpiler removing the problem class entirely for eligible functions. These are real. They’re testable. They’re in the docs.

The underlying ML-enhanced cost modelling is also real, but it’s an evolutionary improvement without a named switch … Oracle’s engineering continues to get better at estimating costs, particularly for complex workloads, vector queries, and correlated predicates. That’s not hype. It’s just not a single feature you can point to in the docs either.

Know your E-Rows vs A-Rows. Know your SQL Plan Directives. Know your extended statistics. And in 26ai, let the SQL Analysis Report do the first pass for you.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Leave a Comment »

Oracle 26ai Assertions – The Database constraint that took years to ship

Posted by FatDBA on March 29, 2026

If you’ve been working with Oracle for a while, you’ve probably hit a wall with constraints at some point. Primary keys, foreign keys, check constraints .. they’re great, but they all have one thing in common: they only work within a single row, or between two specific tables in a fixed relationship.

What happens when your business rule spans multiple rows? Or crosses two unrelated tables? You write a trigger. And then you spend the next hour making sure the trigger fires correctly on INSERT, UPDATE, and DELETE. And then six months later someone disables it during a data load and forgets to re-enable it. And then your data is a mess.

Enter SQL assertions, an idea that has been in the SQL standard since SQL-92. The idea is simple: define a boolean condition at the schema level, and the database guarantees that condition is always true, no matter what DML runs. Think of it as a check constraint, but with no limits … it can look across rows, across tables, anything.

The problem? For 30+ years, no major database engine actually implemented it. It existed in textbooks, in SQL courses, in Oracle certification material with a footnote saying “not supported” .. but never in production.

Until now. Oracle AI Database 26ai (Release Update 23.26.1, January 2026) is the first major database engine to ship SQL Assertions. That’s not marketing. That’s genuinely a first in this industry.

An assertion is a schema-level integrity constraint that contains a boolean SQL expression. The database evaluates that expression after every relevant DML operation and rolls back the statement if it returns false. The simplest way to think about it: a CHECK constraint with the limits removed. A CHECK constraint can only see the row being modified. An assertion can see anything … the whole table, multiple tables, aggregates, subqueries.

CREATE ASSERTION assertion_name
CHECK ( <your boolean SQL condition here> )
[ DEFERRABLE | NOT DEFERRABLE ]
[ INITIALLY IMMEDIATE | INITIALLY DEFERRED ];

The moment any DML would violate the condition, Oracle raises:

ORA-08601: SQL assertion (SCHEMA.ASSERTION_NAME) violated.

And the statement rolls back. Clean, declarative, engine-enforced.

Let’s walk through a real scenario that shows where traditional constraints fail and assertions step in.

You have a FLIGHTS table with a capacity column and a SEATS table with one row per assigned seat per flight. Business rule: the number of rows in SEATS for a given flight can never exceed the flight’s capacity.

CREATE ASSERTION no_overbooking
CHECK (
  NOT EXISTS (
    SELECT f.flight_id
    FROM   flights f
    WHERE  (SELECT COUNT(*) FROM seats s WHERE s.flight_id = f.flight_id)
           > f.capacity
  )
);

Try to insert a seat assignment that pushes a flight past capacity …

INSERT INTO seats (seat_id, flight_id, passenger_id)
VALUES (9999, 101, 4455);

-- ORA-08601: SQL assertion (APP.NO_OVERBOOKING) violated.

The check happens inside the database engine, not in your booking API. That means it works even if someone runs a direct INSERT from SQL Developer, a data migration script, or any other path that bypasses your application layer.

Assertions work exactly like other Oracle constraints in terms of state management. You have four states:

State                 Meaning
ENABLE VALIDATE       Existing data checked. All future DML checked. Default when you create.
ENABLE NOVALIDATE     Existing data NOT checked. Future DML is checked.
DISABLE VALIDATE      Existing data was validated. DML not currently checked.
DISABLE NOVALIDATE    Fully off. No checking at all.

You can also make assertions deferrable, which is critical when you need to temporarily violate a rule mid-transaction and fix it before commit:

-- Deferrable assertion: checked at COMMIT, not after each statement
CREATE ASSERTION president_salary_range
CHECK (
  EXISTS (
    SELECT 1 FROM emp
    WHERE  job = 'PRESIDENT'
    AND    sal BETWEEN 4000 AND 10000
  )
)
DEFERRABLE INITIALLY DEFERRED;


This means within a transaction you can update the president’s salary to something temporarily outside the range, as long as it’s corrected before COMMIT. The assertion only fires at commit time. For non-deferrable assertions (the default), the check happens immediately after each DML statement.
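
A minimal sketch of that flow, assuming the president_salary_range assertion above and a classic EMP table with a single PRESIDENT row:

```sql
-- Within one transaction the rule may be broken temporarily:
UPDATE emp SET sal = 20000 WHERE job = 'PRESIDENT';  -- outside 4000-10000; no error yet
UPDATE emp SET sal = 9000  WHERE job = 'PRESIDENT';  -- back inside the range
COMMIT;                                              -- assertion evaluated here and passes
```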

But there are some limitations as well … Oracle’s implementation is solid but it’s version 1. These are real constraints to be aware of:

No non-deterministic functions : You cannot use SYSDATE, SYSTIMESTAMP, SYS_CONTEXT, USER, USERENV, CURRENT_SCHEMA, or any session-dependent context. Assertions must evaluate identically regardless of who runs the DML or when.
No PL/SQL : You can’t call PL/SQL functions inside an assertion … not even deterministic ones. Pure SQL only.
Read Committed only : Assertions are not supported under SERIALIZABLE isolation level.
No synonyms for cross schema references: If your assertion references tables in another schema, you must use the explicit SCHEMA.TABLE_NAME notation. Synonyms are not resolved.
No GROUP BY or analytic functions: At least in the current release. Aggregate conditions need to be expressed using subqueries.
No EXCEPTIONS INTO clause: Unlike constraint violations on existing data, there’s no way to route assertion violations to an exceptions table ..yet.

So, what’s the engineering behind it? What happens internally?

When you create an assertion, Oracle creates internal change tracking auxiliary tables prefixed ORA$SA$TE_ … alongside your regular tables. These are internal global temporary tables with restricted read access (ORA-08709 if you try to query them directly). Their job is to track which rows changed in a DML operation so Oracle doesn’t have to re-evaluate the entire assertion from scratch on every statement.

You can see the locking strategy Oracle uses via the dictionary views; here are the related privileges and views.


-- Privilege to create assertions in your own schema
GRANT CREATE ASSERTION TO your_user;

-- Or just use the built-in developer role
GRANT DB_DEVELOPER_ROLE TO your_user;

-- Dictionary views
SELECT assertion_name, status, validated, deferrable, deferred
FROM   user_assertions;

SELECT * FROM user_assertion_dependencies;
SELECT * FROM user_assertion_lock_matrix;

For example, if your assertion involves EMPNO as a join column, Oracle knows it only needs to revalidate the assertion for the specific EMPNO values touched by the DML, not the entire table. This is what makes assertions practical at scale. It’s essentially fine-grained, row-level re-validation rather than a full table scan on every INSERT or UPDATE.

If you manage data integrity rules through triggers today .. especially cross-table or multi row rules ..this is worth a serious look. The syntax is clean, the behavior is predictable, and Oracle’s internal change-tracking implementation means it’s not just a theoretical feature that kills performance on every write.

It’s real, it’s available on-prem, and it’s the most quietly significant data integrity improvement Oracle has shipped in a very long time. It requires Oracle AI Database 26ai, Release Update 23.26.1 (January 2026) or later; on-premises Linux x86-64 is supported.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Leave a Comment »

Real Time Data Replication from Oracle to DataStax Cassandra using Oracle GoldenGate 23ai DAA

Posted by FatDBA on March 16, 2026

Modern enterprise architectures rarely live inside a single database engine anymore. It is very common to see operational data stored in relational systems like Oracle while analytics or distributed applications rely on NoSQL platforms such as Cassandra.

In several real-world projects I have worked on, Cassandra clusters were used to power high-volume applications while Oracle remained the authoritative system of record. In these cases, Oracle GoldenGate 23ai DAA becomes a powerful bridge that streams transactional changes from Oracle redo logs directly into Cassandra in near real time.

In this article we will walk through a complete end-to-end implementation of Oracle → DataStax Cassandra replication using Oracle GoldenGate.

The goal is to demonstrate:

• Initial data load
• Continuous change capture
• Replication of DML operations
• Validation on the Cassandra side

Source System : Oracle Database Server
Hostname: ora-prod01.lab.internal
IP: 10.10.10.11

Operating System: Oracle Linux 8
Database Version: Oracle 19c (19.21 RU)
GoldenGate Version: 23ai Classic Deployment

Target System: DataStax Enterprise Cassandra Node
Hostname: cass-node01.lab.internal
IP: 10.10.10.21

Operating System: Rocky Linux 8
Cassandra Distribution: DataStax Enterprise 6.8.50

GoldenGate Big Data / Java Adapter is installed on the Cassandra side.

Step 1 – Prepare Oracle Source Database

Oracle GoldenGate captures data changes directly from the Oracle redo logs, allowing near real-time replication without querying tables. To support this, the database must have ARCHIVELOG mode, force logging, and supplemental logging enabled so that all row-level changes are fully recorded and available for capture.

sqlplus / as sysdba

archive log list;
Database log mode              No Archive Mode
Automatic archival             Disabled

-- enable archivelog mode
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;

-- enable force logging
alter database force logging;

-- enable supplemental logging
alter database add supplemental log data;

-- enable the GoldenGate replication parameter in the source database
alter system set enable_goldengate_replication=true scope=both;

-- streams pool size 
alter system set streams_pool_size=2G scope=both;

Step 2 – Create GoldenGate User in Oracle

A dedicated database user is required for GoldenGate to capture changes from the Oracle redo logs and access the necessary metadata. This user is granted the required privileges to read transaction logs, access dictionary information, and manage replication objects.

CREATE TABLESPACE ogg_ts 
DATAFILE '/u02/oradata/ogg_ts01.dbf' 
SIZE 500M AUTOEXTEND ON;

CREATE USER ggadmin IDENTIFIED BY "Welcome#123"
DEFAULT TABLESPACE ogg_ts
TEMPORARY TABLESPACE temp;

GRANT CONNECT, RESOURCE TO ggadmin;
GRANT SELECT ANY TABLE TO ggadmin;
GRANT SELECT ANY TRANSACTION TO ggadmin;
GRANT SELECT ANY DICTIONARY TO ggadmin;
GRANT FLASHBACK ANY TABLE TO ggadmin;
GRANT ALTER SYSTEM TO ggadmin;
GRANT UNLIMITED TABLESPACE TO ggadmin;


-- Grant OGG admin privileges
BEGIN
  dbms_goldengate_auth.grant_admin_privilege(
     grantee                 => 'GGADMIN',
     privilege_type          => 'CAPTURE',
     grant_select_privileges => TRUE,
     do_grants               => TRUE);
END;
/

Step 3 – Create Source Table in Oracle

To validate the replication pipeline, create a sample table and insert a few test records in the Oracle source database. These records will later be used to verify that GoldenGate successfully captures and replicates the data to the DataStax Cassandra target.

CREATE TABLE SALES_TRANSACTIONS
(
   TXN_ID NUMBER PRIMARY KEY,
   CUSTOMER_NAME VARCHAR2(100),
   PRODUCT_CODE VARCHAR2(50),
   TXN_AMOUNT NUMBER,
   TXN_TIMESTAMP DATE
);

INSERT INTO SALES_TRANSACTIONS VALUES
(101,'John Carter','PRD-A',1200,SYSDATE);

INSERT INTO SALES_TRANSACTIONS VALUES
(102,'Alice Brown','PRD-B',850,SYSDATE);

INSERT INTO SALES_TRANSACTIONS VALUES
(103,'Michael Lee','PRD-C',430,SYSDATE);

COMMIT;


select * from SALES_TRANSACTIONS;

TXN_ID CUSTOMER_NAME PRODUCT_CODE TXN_AMOUNT TXN_TIMESTAMP
-----------------------------------------------------------
101    John Carter   PRD-A        1200       05-MAR-26
102    Alice Brown   PRD-B        850        05-MAR-26
103    Michael Lee   PRD-C        430        05-MAR-26

Step 4 – Install DataStax Cassandra

On the target node install DataStax Enterprise. Example repository configuration:

sudo vi /etc/yum.repos.d/datastax.repo

[datastax]
name=DataStax Repo
baseurl=https://rpm.datastax.com/enterprise
enabled=1
gpgcheck=0

-- Install DSE 
sudo yum install dse-full-6.8.50 -y

-- Start DSE 
sudo systemctl start dse

-- Verify cluster/node status
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal

--  Address       Load       Tokens Owns  Host ID                               Rack
UN  10.10.10.21   120 KB     256    100%  8c34a0c1-b8c1-4b35-a6a7-9c2d8bfae111   rack1

Step 5 – Create Cassandra Keyspace

In DataStax Cassandra, a keyspace is the top-level namespace that defines how data is replicated across the cluster. Before creating tables for replication, we first create a keyspace and configure its replication strategy and replication factor.

cqlsh 10.10.10.21 <port> -u xxxx -p xxx

CREATE KEYSPACE retail_ks
WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': 1
};

-- verify 
DESCRIBE KEYSPACES;


-- Create the target table in Cassandra
USE retail_ks;

CREATE TABLE sales_transactions
(
   txn_id int,
   customer_name text,
   product_code text,
   txn_amount decimal,
   txn_timestamp timestamp,
   PRIMARY KEY(txn_id)
);

-- verify table structure
DESCRIBE TABLE retail_ks.sales_transactions;

Step 6 – Install GoldenGate on Oracle

Oracle GoldenGate software must be installed on the source server to capture database changes. In production environments, the silent installation method is commonly used because it allows automated and repeatable deployments without manual interaction.

-- Unpack GoldenGate binaries.
unzip 231000_fbo_ggs_Linux_x64_Oracle.zip

-- Run silent installation. Response file (ogg.rsp):
INSTALL_OPTION=ora23ai
SOFTWARE_LOCATION=/u01/app/ogg
START_MANAGER=true
MANAGER_PORT=7809
DATABASE_LOCATION=/u01/app/oracle/product/19c


-- Install
./runInstaller -silent -responseFile ogg.rsp

-- Launch OGG
cd /u01/app/ogg
./ggsci

GGSCI> INFO ALL
Program     Status
MANAGER     RUNNING

Step 7 – Enable Table Level Replication

Log in to the database and enable supplemental logging for the table.

GGSCI> DBLOGIN USERID ggadmin PASSWORD Welcome#123

GGSCI> ADD TRANDATA SALES_TRANSACTIONS
OGG-15132 Logging supplemental redo data enabled
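
To double-check what was enabled, the matching GGSCI command is:

```
GGSCI> INFO TRANDATA SALES_TRANSACTIONS
```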

Step 8 – Configure Extract

Create parameter file.

GGSCI> EDIT PARAMS EXT_SALES

EXTRACT EXT_SALES
USERID ggadmin PASSWORD Welcome#123
EXTTRAIL ./dirdat/st
TRANLOGOPTIONS INTEGRATEDPARAMS (PARALLELISM 4)
TABLE SALES_TRANSACTIONS;


-- Create extract
ADD EXTRACT EXT_SALES INTEGRATED TRANLOG BEGIN NOW
ADD EXTTRAIL ./dirdat/st EXTRACT EXT_SALES

-- Start and verify its status
START EXTRACT EXT_SALES
INFO EXTRACT EXT_SALES

EXTRACT    EXT_SALES     Last Started 2026-03-15 21:48   Status RUNNING
Checkpoint Lag           00:00:00 (updated 00:00:04 ago)
Process ID               32418
Log Read Checkpoint      Oracle Redo Logs
                        SCN  845732918 (0x0000.3267a3b6)
                        2026-03-15 21:48:12

Step 9 – Configure Data Pump

The GoldenGate Data Pump reads captured changes from the local trail files and forwards them to the target GoldenGate deployment. This process improves reliability and allows data to be securely transported to the target environment.

EDIT PARAMS PUMP_SALES

EXTRACT PUMP_SALES
RMTHOST 10.10.10.21, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE RETAIL_KS.SALES_TRANSACTIONS;

-- add pump
ADD EXTRACT PUMP_SALES EXTTRAILSOURCE ./dirdat/st
ADD RMTTRAIL ./dirdat/rt EXTRACT PUMP_SALES

-- Start the Pump and verify its status
START EXTRACT PUMP_SALES
INFO EXTRACT PUMP_SALES

EXTRACT    PUMP_SALES    Last Started 2026-03-15 21:48   Status RUNNING
Checkpoint Lag           00:00:00 (updated 00:00:02 ago)
Process ID               32455
Log Read Checkpoint      File ./dirdat/st000000000
                        RBA 18425

Step 10 – Configure Cassandra Handler

On the GoldenGate Big Data / DAA node, configure the Cassandra handler properties so that the Replicat process knows how to connect to the DataStax Cassandra cluster and apply incoming changes. This configuration defines connection details, consistency level, datacenter information, and the Java classpath required for the Cassandra driver.

This configuration allows the GoldenGate Replicat process to establish a connection with the Cassandra cluster and apply transactional changes coming from the Oracle source database.

[oggadmin@cass-hub01 ~]$ vi /u01/app/ogg_23ai/dirprm/cassandra.props
gg.handlerlist=cassandra

gg.handler.cassandra.type=cassandra
gg.handler.cassandra.mode=op
gg.handler.cassandra.contactPoints=10.20.30.41
gg.handler.cassandra.port=9042
gg.handler.cassandra.keyspace=retail_ops_ks
gg.handler.cassandra.consistencyLevel=LOCAL_QUORUM
gg.handler.cassandra.cassandraMode=async
gg.handler.cassandra.compressedUpdates=true
gg.handler.cassandra.datacenter=DC_MAIN
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
gg.classpath=/opt/datastax/cassandra/resources/cassandra/lib/*:/u01/app/ogg_23ai/DependencyDownloader/dependencies/cassandra_4.1/*
javawriter.stats.full=true
javawriter.stats.display=TRUE

-- If authentication is enabled in DataStax Cassandra, the property file can
-- also include credentials, depending on the adapter version and security model.
-- For example:
gg.handler.cassandra.username=oggsvc
gg.handler.cassandra.password=Str0ngPassw0rd
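A typo in this properties file usually only surfaces when the Replicat abends at startup, so a quick pre-flight check can save a restart cycle. An illustrative Python sketch (the required-key list below is my own assumption based on the properties used in this post, not an official schema):

```python
def load_props(path: str) -> dict:
    """Parse a simple key=value properties file, skipping comments and blanks."""
    props = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                props[key.strip()] = value.strip()
    return props

# keys this post's configuration relies on (assumed minimum, not exhaustive)
REQUIRED = [
    "gg.handlerlist",
    "gg.handler.cassandra.contactPoints",
    "gg.handler.cassandra.port",
    "gg.handler.cassandra.keyspace",
    "gg.classpath",
]

def missing_keys(props: dict) -> list:
    """Return the required keys absent from a parsed properties dict."""
    return [k for k in REQUIRED if k not in props]
```

Run it against dirprm/cassandra.props before starting the Replicat; an empty result from missing_keys means the minimum set of handler properties is present.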

Step 11 – Configure Replicat

At this stage, the source-side Extract and Data Pump are already running and writing trail records to the remote trail on the target GoldenGate deployment. The next step is to configure the Replicat process on the Cassandra side so that trail records can be applied into the target DataStax Cassandra keyspace and table.

In a GoldenGate Big Data or DAA style deployment, the first thing you should always do is connect to the deployment administration interface and verify that the deployment is up before creating the Replicat.

-- Log in to the deployment
[oggadmin@cass-hub01 ~]$ cd /u01/app/ogg_23ai/bin
[oggadmin@cass-hub01 bin]$ ./adminclient

Oracle GoldenGate Admin Client for Oracle
Version 23.10.0.25.10 OGGCORE_23.10.0.0.0_PLATFORMS_251018.0830

Copyright (C) 1995, 2025, Oracle and/or its affiliates. All rights reserved.

Linux, x64, 64bit (optimized), Generic on Oct 18 2025 14:00:54
Operating system character set identified as UTF-8.


-- connect to the deployment
OGG (not connected) 1> CONNECT http://10.10.40.25:9001 deployment CassandraDAA as oggadmin password "xxxxxx";
Successfully connected to CassandraDAA.

Before creating the Replicat, verify that the remote trail files are arriving from the Data Pump. This is a simple but important validation step. If the trail is not being written, the Replicat will start but will have nothing to process.

[oggadmin@cass-hub01 ~]$ cd /u01/app/ogg_23ai/var/lib/data
[oggadmin@cass-hub01 data]$ ls -ltr rt*

-rw-r----- 1 oggadmin oinstall  1048576 Mar 15 20:11 rt000000000
-rw-r----- 1 oggadmin oinstall   524288 Mar 15 20:16 rt000000001

This confirms that the remote trail prefix rt is present and the target deployment is receiving records from the source-side Pump process.
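Beyond confirming the files exist, it is worth checking that the sequence numbers are contiguous, since a gap usually means a trail file was purged or lost before the Replicat processed it. A small illustrative Python sketch (`trail_sequence_gaps` is my own helper, not GoldenGate tooling):

```python
import os
import re

def trail_sequence_gaps(directory: str, prefix: str = "rt") -> list:
    """List missing sequence numbers in a trail file series.

    Trail files are named <prefix> plus a 9-digit sequence number,
    e.g. rt000000000, rt000000001. Returns the missing numbers, if any.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d{{9}})$")
    seqs = sorted(
        int(m.group(1))
        for name in os.listdir(directory)
        if (m := pattern.match(name))
    )
    if not seqs:
        return []
    present = set(seqs)
    return [n for n in range(seqs[0], seqs[-1] + 1) if n not in present]
```

Pointed at /u01/app/ogg_23ai/var/lib/data in this setup, an empty list means the rt series is unbroken.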

Next, let's create the Replicat.

OGG (http://10.10.40.25:9001 CassandraDAA as oggadmin@CassandraDAA) 3> EDIT PARAMS REP_CASS

REPLICAT REP_CASS
TARGETDB LIBFILE libggjava.so SET property=dirprm/cassandra.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
DISCARDFILE ./var/lib/data/REP_CASS.dsc, APPEND, MEGABYTES 500
DISCARDROLLOVER AT 02:00
MAP APP_OWNER.SALES_ORDERS_SRC,
TARGET retail_ops_ks.sales_orders_rt;

-- add the repl process
OGG (http://10.10.40.25:9001 CassandraDAA as oggadmin@CassandraDAA) 4> ADD REPLICAT REP_CASS, EXTTRAIL ./var/lib/data/rt
REPLICAT added.

-- Start the process
OGG (http://10.10.40.25:9001 CassandraDAA as oggadmin@CassandraDAA) 5> START REPLICAT REP_CASS
Sending START request to Administration Service ...
REPLICAT REP_CASS starting.

REPLICAT   REP_CASS      Last Started 2026-03-15 21:49   Status RUNNING
Checkpoint Lag           00:00:00 (updated 00:00:01 ago)
Process ID               32504
Log Read Checkpoint      File ./dirdat/rt000000000
                        RBA 18874


Parameter explanations …

REPLICAT REP_CASS defines the Replicat group name.

TARGETDB LIBFILE libggjava.so SET property=dirprm/cassandra.props tells GoldenGate to use the Java-based Big Data adapter library and load all Cassandra handler settings from the referenced properties file.

REPORTCOUNT EVERY 1 MINUTES, RATE prints transaction statistics regularly into the report file, which is very useful during initial testing and performance validation.

GROUPTRANSOPS 1000 batches operations to improve throughput. This value can be tuned depending on transaction profile, row size, and Cassandra write latency.

DISCARDFILE captures records that fail processing, while DISCARDROLLOVER keeps the discard file manageable.

The MAP statement defines the source Oracle table and the target DataStax Cassandra table.

In this example:
Oracle source table: APP_OWNER.SALES_ORDERS_SRC
Cassandra target table: retail_ops_ks.sales_orders_rt

Step 12 – Testing End-to-End Replication (Oracle to DataStax Cassandra)

In this section we will validate that the replication pipeline is working correctly by performing a few test transactions on the Oracle source database and confirming that the changes are applied on the DataStax Cassandra target table.

I am not going to discuss the initial load process here since it is relatively straightforward and typically performed using standard GoldenGate initial load methods or bulk data movement tools. For the purpose of this test, we assume that the base dataset has already been loaded into Cassandra and that GoldenGate is responsible only for capturing and replicating incremental changes.

-- Source DB

SQL*Plus: Release 19.0.0.0.0 - Production
Version 19.21.0.0.0

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

SELECT * FROM SALES_ORDERS_SRC;
ORDER_ID   CUSTOMER_NAME   ORDER_STATUS   ORDER_AMOUNT
--------   -------------   ------------   ------------
1001       NORTH RETAIL    CREATED        450
1002       ALPHA STORES    SHIPPED        720
1003       OMEGA MART      CREATED        610

On Target/Cassandra cluster

cqlsh 10.20.30.41 9042 -u cassandra -p cassandra
Connected to RetailCluster at 10.20.30.41:9042
[cqlsh 6.1.0 | DSE 6.8.x | CQL spec 3.4.6]

USE retail_ops_ks;

SELECT * FROM sales_orders_rt;
order_id | customer_name | order_status | order_amount
---------+---------------+--------------+--------------
1001     | NORTH RETAIL  | CREATED      | 450
1002     | ALPHA STORES  | SHIPPED      | 720
1003     | OMEGA MART    | CREATED      | 610

-- This confirms that the initial dataset already exists on both systems.
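Eyeballing two result sets works for three rows; for larger tables a keyed set comparison is more practical. An illustrative Python sketch (rows shown as plain tuples mirroring the output above; in practice you would fetch them through an Oracle driver on one side and the Cassandra driver on the other):

```python
def parity_diff(source_rows, target_rows, key_index=0):
    """Compare two result sets keyed on one column.

    Returns (missing_on_target, extra_on_target) as sets of key values.
    Rows are plain tuples, as fetched by any database driver.
    """
    src = {row[key_index] for row in source_rows}
    tgt = {row[key_index] for row in target_rows}
    return src - tgt, tgt - src

# sample rows mirroring the tables shown above
oracle_rows = [(1001, "NORTH RETAIL"), (1002, "ALPHA STORES"), (1003, "OMEGA MART")]
cass_rows = [(1001, "NORTH RETAIL"), (1002, "ALPHA STORES"), (1003, "OMEGA MART")]
missing, extra = parity_diff(oracle_rows, cass_rows)
```

Both sets coming back empty is the programmatic equivalent of the side-by-side check above: every order_id on the source exists on the target, and nothing extra has appeared.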

Let's make a change on the source and see how it reaches the target.

INSERT INTO SALES_ORDERS_SRC
VALUES (1004,'PACIFIC TRADING','CREATED',980);
COMMIT;

1 row created.

Commit complete.



-- Verify capture on the OGG side
INFO EXTRACT EXT_SALES
EXTRACT EXT_SALES Last Started 2026-03-16 20:12 Status RUNNING
Checkpoint Lag 00:00:00
Log Read Checkpoint
SCN 845736122

-- The SCN moved, confirming that the Extract process captured the transaction from the Oracle redo logs.

Now let's check it on the target.

SELECT * FROM sales_orders_rt
WHERE order_id = 1004;

order_id | customer_name    | order_status | order_amount
---------+------------------+--------------+--------------
1004     | PACIFIC TRADING  | CREATED      | 980

The inserted record has successfully replicated.

By performing these test operations, we validated that:

• GoldenGate Extract captured Oracle redo changes
• Data Pump delivered trail files to the target deployment
• Replicat applied changes into DataStax Cassandra

The Oracle -> GoldenGate -> Cassandra replication pipeline is now functioning correctly.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Leave a Comment »