Posts Tagged ‘troubleshooting’

Why Oracle’s optimizer has been getting smarter for 15 years and what 26ai actually adds

Posted by FatDBA on March 31, 2026

Every bad execution plan you’ve ever debugged traces back to the same root cause. The optimizer made a wrong guess about how many rows an operation would return ..and built an entire plan on top of that wrong number.

That number is called cardinality. It’s the estimated row count for each operation in your plan. Get it right and the optimizer picks the right join order, the right join method, the right access path. Get it wrong and you get a nested loops join against a table that returns 500,000 rows when the optimizer thought it was 12. You’ve seen this plan. It hurt.

Oracle has been progressively solving this problem for 15+ years. The story isn’t a single breakthrough ..it’s a series of increasingly smarter mechanisms, each one handling a class of estimation problem that the previous one couldn’t.

Here’s the full honest picture, ending with what 26ai actually adds.

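Oracle 10g .. Dynamic Sampling The optimizer noticed when stats were missing or insufficient and sampled the data at parse time to get a rough estimate. The feature first shipped in 9i Release 2; 10g raised the default level to 2, which is when it started to matter for everyday workloads. Controlled by OPTIMIZER_DYNAMIC_SAMPLING. Blunt instrument, but better than pure guesswork.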

Oracle 11g .. Cardinality Feedback Starting with 11g Release 2, the optimizer compared its estimates to reality after execution. If it estimated 50 rows and got 50,000, it stored the real number in the SGA and flagged the statement for re-optimization on the next execution. The estimate corrected itself over time. The problem: the correction lived in the SGA only, so it was lost on restart and lost when the cursor aged out.

Oracle 12c …The Big Jump Three things landed together:

Statistics Feedback (renamed from Cardinality Feedback): same learning mechanism, better persistence
Adaptive Plans: the optimizer could now switch join methods mid-execution .. starting with Nested Loops based on its estimate, and switching to Hash Join live if actual rows exceeded the threshold. The final plan was then fixed for subsequent executions
SQL Plan Directives: when a misestimate was detected, the optimizer created a persistent directive (stored in SYSAUX, survives restarts) that told future parses: “when you see this predicate pattern, gather dynamic statistics first”. Directives are cross-statement: one query’s lesson protects another with the same predicate pattern
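The adaptive plan idea in the list above is easier to see as a toy model. This is a minimal Python sketch, not Oracle's implementation: the function name and the `threshold` argument are made up for illustration, with `threshold` standing in for the row count at which the optimizer would rather hash join.

```python
def choose_join_method(outer_rows, threshold):
    """Count rows from the outer row source and commit to a join method.

    Toy model only: `threshold` is an illustrative stand-in for the
    inflection point computed at parse time.
    """
    seen = 0
    for _ in outer_rows:
        seen += 1
        if seen > threshold:
            # The estimate was too low: switch to the hash join subplan.
            return "HASH JOIN", seen
    # The row source ended at or below the threshold: keep nested loops.
    return "NESTED LOOPS", seen


print(choose_join_method(range(12), threshold=100))       # ('NESTED LOOPS', 12)
print(choose_join_method(range(500_000), threshold=100))  # ('HASH JOIN', 101)
```

The second call stops counting at row 101, which is the point of the mechanism: the decision is made as soon as the estimate is proven wrong, not after the damage is done.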

Oracle 19c/23ai ..Automation at Scale Automatic SQL Tuning Sets (ASTS), Automatic SPM, and Real-Time SPM turned the individual learning mechanisms into a system-level feedback loop. The database wasn’t just learning from single statements .. it was maintaining plan stability across the entire workload automatically.

Oracle 26ai (23.8 RU) .. The Specific New Additions Two documented, named improvements to cardinality estimation:

Dynamic Statistics for PL/SQL Functions : a new parameter plsql_function_dynamic_stats giving fine-grained control over whether PL/SQL functions called inside SQL can participate in dynamic statistics sampling at parse time. Previously the optimizer treated PL/SQL functions as black boxes with unknowable return cardinality. Now it can sample them.
PL/SQL to SQL Transpiler : when enabled, the optimizer inlines eligible PL/SQL functions directly into SQL at parse time, eliminating the black box entirely. The optimizer can now see and estimate the underlying SQL expression rather than guessing at what a function returns.

Plus the general continued improvement of ML-informed cost models inside the optimizer engine .. real, but not a named switchable feature.

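Now let’s see all of this in the plan output, where it actually matters, with a quick demo. Comparing estimated and actual row counts is the single most important diagnostic habit in Oracle performance work. The GATHER_PLAN_STATISTICS hint makes the execution engine record actual row counts for each plan operation, and the ALLSTATS LAST format in DBMS_XPLAN.DISPLAY_CURSOR surfaces them alongside the estimates. Keep SERVEROUTPUT off in SQL*Plus while you do this, otherwise DISPLAY_CURSOR with a NULL sql_id reports on the DBMS_OUTPUT call instead of your query.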

-- Prereqs (run as SYS)
GRANT ADVISOR TO sh;
GRANT ADMINISTER SQL MANAGEMENT OBJECT TO sh;

CONN sh/sh

-- Set output format for readable plans
SET LINESIZE 200
SET PAGESIZE 10000
SET LONG 100000

-- Run the query with stats collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;



SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));

Output, before extended statistics exist:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |     17 |    127 |  1890 |    1533 |  <- E:17, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |     17 |    127 |   406 |    1213 |  <- E:17, A:127
|  4 |    PARTITION RANGE   |           |      1 |   918K |   918K |  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |   918K |   918K |  1459 |     320 |
-------------------------------------------------------------------------------------------

Predicate Information:
   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - filter("C"."CUST_STATE_PROVINCE"='CA'
          AND "C"."CUST_INCOME_LEVEL"='G: 130,000 - 149,999')

E-Rows: 17. A-Rows: 127. That’s a 7.5x underestimate.

The optimizer assumed cust_state_province = 'CA' and cust_income_level = 'G: 130,000 - 149,999' were independent. They’re not — they’re correlated. California has a disproportionate number of high-income customers in this dataset. The optimizer applied the selectivity of each predicate independently, multiplied them, and got the wrong answer.

This is the classic multi-column predicate correlation problem. The fix is extended statistics.
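Here is the independence arithmetic on a tiny made-up dataset. A minimal sketch in Python: the rows, the state codes and the income bands are hypothetical and are not the SH data, they just make the multiplication visible.

```python
# Hypothetical toy rows of (state, income_band); not the SH.CUSTOMERS data.
rows = (
    [("CA", "HIGH")] * 40 + [("CA", "LOW")] * 10
    + [("TX", "HIGH")] * 10 + [("TX", "LOW")] * 140
)


def independent_estimate(rows, state, income):
    """Estimate rows the way single-column stats do: N * sel(a) * sel(b)."""
    n = len(rows)
    sel_state = sum(1 for s, _ in rows if s == state) / n
    sel_income = sum(1 for _, i in rows if i == income) / n
    return n * sel_state * sel_income


def actual_rows(rows, state, income):
    """Count the rows that really match both predicates."""
    return sum(1 for r in rows if r == (state, income))


print(independent_estimate(rows, "CA", "HIGH"))  # 12.5
print(actual_rows(rows, "CA", "HIGH"))           # 40
```

Each single-column selectivity is 25%, so the independent estimate is 200 x 0.25 x 0.25 = 12.5 rows, while 40 rows actually match. Nothing is wrong with either column statistic; the error comes purely from multiplying them.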

Let’s fix it with extended statistics. Extended statistics (column groups) teach the optimizer about correlated columns. One DBMS_STATS call, no schema changes.

-- Create a column group for the two correlated columns
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
  ownname  => 'SH',
  tabname  => 'CUSTOMERS',
  extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
) AS col_group_name
FROM DUAL;

-- COL_GROUP_NAME
-- SYS_STUFBF#JKQM8F3GTPA7XDE9  (system-generated name)

-- Now gather stats to populate the column group
-- EXEC is a single-line SQL*Plus command, so a multi-line call goes in a block
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO'
  );
END;
/


-- Let's re-run the same SQL.
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*)           AS num_orders,
  SUM(s.amount_sold) AS revenue
FROM   sales     s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


After extended statistics:
Plan hash value: 3421987654

-------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |    124 |    127 |  1890 |    1533 |  <- E:124, A:127
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |    124 |    127 |   406 |    1213 |  <- Near perfect
|  4 |    PARTITION RANGE   |           |      1 |   918K |   918K |  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |   918K |   918K |  1459 |     320 |
-------------------------------------------------------------------------------------------

E-Rows went from 17 to 124. Actual is 127. That’s less than 3% off.

Same plan hash .. same shape. But now the cost model is working from accurate numbers. In a more complex query, this difference in estimated rows would change join order, join method, and index decisions.
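The E-Rows vs A-Rows comparison is mechanical enough to script. The sketch below is a rough Python helper, assuming you have the ALLSTATS LAST output as text; the function names and the 5x cut-off are my own choices for illustration, not anything Oracle ships.

```python
import re

_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

# The "before" plan from this post, pasted as text.
PLAN = """\
| Id | Operation            | Name      | Starts | E-Rows | A-Rows | Cost  | Buffers |
|  0 | SELECT STATEMENT     |           |      1 |        |      1 |  1891 |    1533 |
|  1 |  HASH GROUP BY       |           |      1 |      1 |      1 |  1891 |    1533 |
|* 2 |   HASH JOIN          |           |      1 |     17 |    127 |  1890 |    1533 |
|* 3 |    TABLE ACCESS FULL | CUSTOMERS |      1 |     17 |    127 |   406 |    1213 |
|  4 |    PARTITION RANGE   |           |      1 |   918K |   918K |  1459 |     320 |
|  5 |     TABLE ACCESS FULL| SALES     |     28 |   918K |   918K |  1459 |     320 |
"""


def parse_count(cell):
    """Turn '17' or '918K' into an int; return None for blank cells."""
    m = re.fullmatch(r"(\d+)([KMG]?)", cell.strip())
    return int(m.group(1)) * _SUFFIX[m.group(2)] if m else None


def misestimates(plan_text, factor=5.0):
    """Return (id, operation, e_rows, a_rows, ratio) for rows off by >= factor."""
    cols = None
    hits = []
    for line in plan_text.splitlines():
        if not line.lstrip().startswith("|"):
            continue  # skip separators, predicate section, notes
        cells = [c.strip() for c in line.split("|")]
        if "E-Rows" in cells and "A-Rows" in cells:
            # Header row: remember where each column of interest sits.
            cols = {n: cells.index(n) for n in ("Id", "Operation", "E-Rows", "A-Rows")}
            continue
        if cols is None or len(cells) <= max(cols.values()):
            continue
        e_rows = parse_count(cells[cols["E-Rows"]])
        a_rows = parse_count(cells[cols["A-Rows"]])
        if e_rows is None or a_rows is None:
            continue
        ratio = max(e_rows, a_rows) / max(min(e_rows, a_rows), 1)
        if ratio >= factor:
            op_id = cells[cols["Id"]].lstrip("* ")
            hits.append((op_id, cells[cols["Operation"]], e_rows, a_rows, round(ratio, 1)))
    return hits


for hit in misestimates(PLAN):
    print(hit)
```

On the "before" plan this prints operations 2 and 3 with a ratio of 7.5. Run it on the "after" plan and it prints nothing, because 124 vs 127 is well inside the cut-off.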

Next, let’s look at SQL Plan Directives and watch the optimizer learn.

When the optimizer detects a cardinality misestimate during execution, it creates a SQL Plan Directive — a persistent instruction stored in SYSAUX telling future parses to gather dynamic statistics for this predicate pattern. You can watch this happen.

First, drop the extended stats so the misestimate recurs:

-- Reset: drop the column group
-- The procedure is DROP_EXTENDED_STATS; a multi-line call needs a PL/SQL block
BEGIN
  DBMS_STATS.DROP_EXTENDED_STATS(
    ownname   => 'SH',
    tabname   => 'CUSTOMERS',
    extension => '(CUST_STATE_PROVINCE, CUST_INCOME_LEVEL)'
  );
END;
/

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH', 'CUSTOMERS');

-- Let's flush the shared pool and re-run.

-- As SYS (flush shared pool in test environment only)
ALTER SYSTEM FLUSH SHARED_POOL;

CONN sh/sh

-- Run with stats collection
SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;


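-- Let's see if the directive was created.
-- Directives are written from the SGA to SYSAUX periodically, so force the
-- write first or this query can come back empty on a fresh test.
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;

-- Check for new SQL Plan Directives on CUSTOMERS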
SELECT d.directive_id,
       d.type,
       d.state,
       d.auto_drop,
       d.created,
       o.object_name,
       o.subobject_name  AS column_name
FROM   dba_sql_plan_directives     d
JOIN   dba_sql_plan_dir_objects    o
       ON d.directive_id = o.directive_id
WHERE  o.object_name = 'CUSTOMERS'
ORDER  BY d.created DESC;


Output — directive created after the misestimate:

DIRECTIVE_ID  TYPE              STATE   AUTO_DROP  CREATED              OBJECT_NAME  COLUMN_NAME
------------  ----------------  ------  ---------  -------------------  -----------  -------------------
8273641920    DYNAMIC_SAMPLING  USABLE  YES        2026-03-29 14:33:12  CUSTOMERS    CUST_STATE_PROVINCE
8273641920    DYNAMIC_SAMPLING  USABLE  YES        2026-03-29 14:33:12  CUSTOMERS    CUST_INCOME_LEVEL

The optimizer created a directive covering both columns … it noticed the multi-column predicate correlation caused a misestimate and now knows to sample dynamically next time it sees this pattern. Run the query a second time:

SELECT /*+ GATHER_PLAN_STATISTICS */
  c.cust_state_province,
  COUNT(*), SUM(s.amount_sold)
FROM   sales s
JOIN   customers c ON s.cust_id = c.cust_id
WHERE  c.cust_state_province = 'CA'
AND    c.cust_income_level   = 'G: 130,000 - 149,999'
GROUP  BY c.cust_state_province;

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


At the bottom of the plan output:
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

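The optimizer is now dynamically sampling at parse time because the directive told it to. The cardinality estimate will be much closer to reality on this execution. One caveat: from 12.2 onward the optimizer only acts on directives when OPTIMIZER_ADAPTIVE_STATISTICS is TRUE. With the default of FALSE, directives are still created but not used, so you won't see this Note section until you enable it.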
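The cross-statement part is the bit people miss, so here is a toy model of it. This is a Python sketch under loud assumptions: the dictionary, the key shape (table plus column set) and the 2x trigger are all invented for illustration and say nothing about how Oracle stores or triggers directives internally.

```python
# Toy model only: the key shape and the 2x trigger are illustrative assumptions.
directives = {}


def _key(table, columns):
    """A directive is keyed by table and column set, not by SQL text."""
    return (table.upper(), frozenset(c.upper() for c in columns))


def record_misestimate(table, columns, e_rows, a_rows, factor=2.0):
    """Create a directive when estimate and actual differ by at least `factor`."""
    low, high = sorted((max(e_rows, 1), max(a_rows, 1)))
    if high / low >= factor:
        directives[_key(table, columns)] = "DYNAMIC_SAMPLING"
        return True
    return False


def needs_dynamic_sampling(table, columns):
    """Any later statement with the same predicate columns picks this up."""
    return _key(table, columns) in directives


# First statement runs, misestimates 17 vs 127, and leaves a directive behind.
record_misestimate("CUSTOMERS", ["CUST_STATE_PROVINCE", "CUST_INCOME_LEVEL"], 17, 127)

# A different statement filtering on the same two columns benefits from it.
print(needs_dynamic_sampling("customers", ["cust_income_level", "cust_state_province"]))  # True
print(needs_dynamic_sampling("customers", ["cust_city"]))                                 # False
```

Because the key ignores the SQL text, the second lookup succeeds even though that statement never ran before. That is the difference from cardinality feedback, which was tied to one cursor.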

Now, new in 26ai: the SQL Analysis Report. This is the part that’s genuinely new in 26ai. Previously you had to know to look at E-Rows vs A-Rows yourself. The SQL Analysis Report, surfaced directly in DBMS_XPLAN.DISPLAY_CURSOR output, flags these problems inline without you having to hunt for them.

-- The standard DISPLAY_CURSOR call — no extra parameters needed
-- SQL Analysis Report appears automatically in 26ai when issues exist

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id          => NULL,
  cursor_child_no => 0,
  format          => 'ALLSTATS LAST +COST'
));


In Oracle 26ai, after the standard execution plan output, you now see:

SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
--------------------------------------------------------------------------------
3 - SEL$1 / "C"@"SEL$1"
  - The following columns have predicates which prevent their use as keys
    in an index range scan. Consider rewriting the predicates or creating
    column group statistics.
    "CUST_STATE_PROVINCE", "CUST_INCOME_LEVEL"

The optimizer is telling you directly: these two columns in combination are causing an estimation problem, and column group statistics would fix it. You no longer have to derive this by comparing E-Rows and A-Rows yourself. It’s surfaced automatically in the plan output.

That’s a real DBA quality-of-life improvement. The diagnosis that used to take 10 minutes of plan reading is now one line in your standard plan output.

Next is Dynamic Statistics for PL/SQL Functions. This is the specific new documented feature in 26ai (RU 23.8). Consider a query that filters through a PL/SQL function:

-- A function the optimizer previously couldn't estimate
CREATE OR REPLACE FUNCTION sh.get_high_value_threshold
RETURN NUMBER DETERMINISTIC IS
BEGIN
  RETURN 1000;
END;
/

-- Query using the function in a predicate
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();


Before 26ai (or with plsql_function_dynamic_stats = 'OFF'):

The optimizer treats get_high_value_threshold() as a black box. It has no idea what value the function returns, so it can't estimate selectivity. It either guesses based on defaults or uses a very conservative estimate.

| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |   9188 |  12116 |  <- Rough guess
|* 3 |    TABLE ACCESS FULL | SALES |   9188 |  12116 |

In 26ai with plsql_function_dynamic_stats = 'ON':

-- Enable dynamic stats for PL/SQL functions (session level)
ALTER SESSION SET plsql_function_dynamic_stats = 'ON';

-- Rerun
SELECT /*+ GATHER_PLAN_STATISTICS */
  COUNT(*),
  SUM(amount_sold)
FROM   sh.sales
WHERE  amount_sold > sh.get_high_value_threshold();

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  format => 'ALLSTATS LAST +NOTE'
));


| Id | Operation            | Name  | E-Rows | A-Rows |
|  0 | SELECT STATEMENT     |       |        |      1 |
|  1 |  SORT AGGREGATE      |       |      1 |      1 |
|* 2 |   PARTITION RANGE ALL|       |  12203 |  12116 |  <- Near accurate
|* 3 |    TABLE ACCESS FULL | SALES |  12203 |  12116 |

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - PL/SQL function sampled for dynamic statistics

The optimizer called the function during dynamic statistics gathering at parse time, got the actual return value (1000), and used it to estimate selectivity accurately. E-Rows 12,203 vs A-Rows 12,116 — less than 1% off.
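The mechanics are simple once the function is no longer a black box: evaluate it, then test a sample of rows against the real value. Below is a minimal Python sketch of that idea, assuming a hypothetical list of amounts and a Python stand-in for the PL/SQL function; it is not how Oracle samples blocks, only the shape of the estimate.

```python
import random


def get_high_value_threshold():
    """Python stand-in for the PL/SQL function in this post."""
    return 1000


def sampled_estimate(values, predicate, sample_size, seed=42):
    """Scale the hit rate of a random sample up to the full row count."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    hits = sum(1 for v in sample if predicate(v))
    return round(len(values) * hits / len(sample))


# Hypothetical data: 2,000 amounts, half of them above the threshold.
amounts = list(range(1, 2001))

# The function is called once up front, the way it would be at parse time.
threshold = get_high_value_threshold()

estimate = sampled_estimate(amounts, lambda v: v > threshold, sample_size=200)
actual = sum(1 for v in amounts if v > threshold)
print(f"estimated rows: {estimate}, actual rows: {actual}")
```

The estimate lands near the actual count because the predicate was evaluated with the real threshold. Without the function call, all the sampler has is a default guess, which is exactly the situation in the first plan.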

You can also control this at the object level, which is the right approach in production .. turn it on only for specific functions you trust:

-- Prefer object-level control in production
-- Allow dynamic stats for a specific function
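-- Check the exact procedure name and arguments in the DBMS_STATS reference
-- for your release update before relying on this call.
BEGIN
  DBMS_STATS.SET_FUNCTION_PREFS(
    ownname    => 'SH',
    funcname   => 'GET_HIGH_VALUE_THRESHOLD',
    pref_name  => 'PLSQL_FUNCTION_DYNAMIC_STATS',
    pref_value => 'ON'
  );
END;
/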

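-- Check current settings
-- Note: ALL_STAT_EXTENSIONS is documented with the columns OWNER, TABLE_NAME,
-- EXTENSION_NAME, EXTENSION, CREATOR and DROPPABLE, so adapt this query to
-- whichever view your release exposes function preferences through.
SELECT function_name, preference_name, preference_value
FROM   all_stat_extensions
WHERE  object_type = 'FUNCTION'
AND    owner = 'SH';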

So, in short, Oracle’s optimizer hasn’t made one big leap … it’s made fifteen years of deliberate, incremental improvements, each one handling a class of cardinality problem the previous release couldn’t.

What 26ai specifically adds isn’t magic. It’s concrete, named, documented improvements: dynamic statistics for PL/SQL functions and the SQL Analysis Report surfacing optimizer advice inline, plus the PL/SQL transpiler removing the problem class entirely for eligible functions. These are real. They’re testable. They’re in the docs.

The underlying ML-enhanced cost modelling is also real, but it’s an evolutionary improvement without a named switch. Oracle’s engineering continues to get better at estimating costs, particularly for complex workloads, vector queries, and correlated predicates. That’s not hype. It’s just not a single feature you can point to in the docs either.

Know your E-Rows vs A-Rows. Know your SQL Plan Directives. Know your extended statistics. And in 26ai, let the SQL Analysis Report do the first pass for you.

Hope It Helped!
Prashant Dixit


Parquet, Hadoop, and a quietly dying process: lessons from a migration test using GoldenGate 23ai DAA

Posted by FatDBA on February 8, 2026

I was doing some hands-on testing with Oracle GoldenGate 23ai DAA, trying to move data from an old but reliable Oracle 11g database into Microsoft Fabric. The idea was simple enough. Capture changes from Oracle 11g, push them through GoldenGate 23ai, and land them in Fabric OneLake so they could be used by a Lakehouse or a Mirrored Database. On paper, it sounded clean. In real life… well, it took a bit of digging.

The source side was boring in a good way. Oracle 11g behaved exactly as expected. Extracts were running, trails were getting generated, no drama there. The real work was on the target side. I configured a Replicat using the File Writer with Parquet output, since Parquet is the natural fit for Microsoft Fabric. Fabric loves Parquet. Lakehouse loves Parquet. Mirrored databases too. So far, so good.

I started the Replicat and GoldenGate politely told me it had started. That tiny moment of relief you get when a command doesn’t fail right away. But then I checked the status… and it was STOPPED. No lag, no progress, nothing. That’s usually when you know something went wrong very early, before any real work even started.

So I opened the report file. And there it was. A Java error staring right back at me:

OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 18> START REPLICAT FATD11D
2025-12-12T21:25:18Z  INFO    OGG-00975  Replicat group FATD11D starting.
2025-12-12T21:25:18Z  INFO    OGG-15445  Replicat group FATD11D started.


OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 20> info replicat FATD11D

Replicat   FATD11D    Initialized  2025-12-12 16:24   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:00:55 ago)
Log Read Checkpoint  File dirdat/i1000000000
                     First Record  RBA 0
Encryption Profile   LocalWallet





OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 21> view report FATD11D

***********************************************************************
     Oracle GoldenGate for Distributed Applications and Analytics
                   Version 23.10.0.25.10 (Build 001)

                      Oracle GoldenGate Delivery
 Version 23.10.1.25.10 OGGCORE_23.10.0.0.0OGGRU_LINUX.X64_251018.0830
    Linux, x64, 64bit (optimized), Generic on Oct 18 2025 14:00:54

Copyright (C) 1995, 2025, Oracle and/or its affiliates. All rights reserved.

                    Starting at 2025-12-12 16:25:18
***********************************************************************

2025-12-12 16:25:19  INFO    OGG-15052  Using Java class path: /testgg/app/ogg/ogg23ai/ogg23aidaa_MA//ggjava/ggjava.jar:/testgg/app/ogg/ogg23ai/ogg23aidaa_DEPLOYMENT/etc/conf/ogg:/u01/app/ogg/ogg23ai/ogg23aidaa_MA/.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/metadata/CompressionCodecName
        at oracle.goldengate.eventhandler.parquet.ParquetEventHandlerProperties.<init>(ParquetEventHandlerProperties.java:43)
        at oracle.goldengate.eventhandler.parquet.ParquetEventHandler.<init>(ParquetEventHandler.java:53)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at java.base/java.lang.Class.newInstance(Class.java:587)
        at oracle.goldengate.datasource.eventhandler.EventHandlerFramework.instantiateEventHandler(EventHandlerFramework.java:219)
        at oracle.goldengate.datasource.eventhandler.EventHandlerFramework.initEventHandler(EventHandlerFramework.java:163)
        at oracle.goldengate.datasource.eventhandler.EventHandlerFramework.init(EventHandlerFramework.java:58)
        at oracle.goldengate.handler.filewriter.FileWriterHandlerEO.init(FileWriterHandlerEO.java:627)
        at oracle.goldengate.datasource.AbstractDataSource.addDataSourceListener(AbstractDataSource.java:602)
        at oracle.goldengate.datasource.factory.DataSourceFactory.getDataSource(DataSourceFactory.java:164)
        at oracle.goldengate.datasource.UserExitDataSourceLauncher.<init>(UserExitDataSourceLauncher.java:45)
        at oracle.goldengate.datasource.UserExitMain.main(UserExitMain.java:109)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.hadoop.metadata.CompressionCodecName
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
        ... 15 more
2025-12-12 16:25:22  WARNING OGG-00869  java.lang.ClassNotFoundException: org.apache.parquet.hadoop.metadata.CompressionCodecName.

Source Context :
  SourceFile              : [/ade/aime_phxdbifa87/oggcore/OpenSys/src/gglib/ggdal/Adapter/Java/JavaAdapter.cpp]
  SourceMethod            : [HandleJavaException]
  SourceLine              : [350]
  ThreadBacktrace         : [19] elements
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libgglog.so(CMessageContext::AddThreadContext())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libgglog.so(_MSG_String(CSourceContext*, int, char const*, CMessageFactory::MessageDisposition))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libggjava.so()]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libggjava.so(ggs::gglib::ggdal::CJavaAdapter::Open())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::ggdal::CDALAdapter::Open())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(GenericImpl::Open())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(GenericImpl::GetWriter())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(GenericImpl::GetGenericDBType())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::er::ReplicatContext::ReplicatContext(ggs::gglib::ggapp::ReplicationContextParams const&, bool, ggs::gglib::ggmetadata::MetadataContext*, ggs::er::ReplicatContext::LogBSNManager*))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::er::ReplicatContext::createReplicatContext(ggs::gglib::ggapp::ReplicationContextParams const&, ggs::gglib::ggdatasource::DataSourceParams const&, ggs::gglib::ggmetadata::MetadataContext*))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat()]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(main)]
                          : [/lib64/libc.so.6()]
                          : [/lib64/libc.so.6(__libc_start_main)]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(_start)]

2025-12-12 16:25:22  ERROR   OGG-15051  Java or JNI exception:
java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/metadata/CompressionCodecName.

2025-12-12 16:25:22  ERROR   OGG-01668  PROCESS ABENDING.

At that point it clicked. GoldenGate itself was fine. Oracle 11g was fine. Fabric wasn’t even in the picture yet. The problem was simpler. The Parquet libraries were missing.
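A NoClassDefFoundError like this one can be confirmed before touching anything, by checking whether any JAR on the class path actually contains the class. The Python sketch below does that with the standard library only. The function name and the GG_CLASSPATH environment variable are my own inventions for illustration; the class path itself is whatever the OGG-15052 line in your report file printed.

```python
import glob
import os
import zipfile


def find_class(class_name, classpath):
    """Return the class path entries that contain `class_name`.

    Handles plain directories, single JARs, and Java-style 'dir/*' wildcards.
    """
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for element in classpath.split(os.pathsep):
        if not element:
            continue
        if element.endswith("*"):
            # 'dir/*' on a Java class path means every JAR in that directory.
            candidates = sorted(glob.glob(os.path.join(os.path.dirname(element), "*.jar")))
        else:
            candidates = [element]
        for path in candidates:
            if os.path.isdir(path):
                if os.path.isfile(os.path.join(path, entry)):
                    found.append(path)
            elif zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        found.append(path)
    return found


if __name__ == "__main__":
    # Hypothetical: export GG_CLASSPATH with the value from the OGG-15052 line.
    cp = os.environ.get("GG_CLASSPATH", "")
    print(find_class("org.apache.parquet.hadoop.metadata.CompressionCodecName", cp))
```

An empty list means the class is nowhere on the class path, which matches what the Replicat report was saying. After the dependency download, the same check against the updated class path should name the JAR that provides it.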

All of the prerequisites are in the DependencyDownloader directory, which holds a script for each target: Parquet, Hadoop, OneLake, Kafka, and more. Before touching anything, I checked Java. Java 17 was already installed. I ran the Parquet dependency script. Maven kicked in, downloaded a bunch of JARs, and finished successfully. I restarted the Replicat, feeling pretty confident. And… it failed again. Different error this time, though, which honestly felt like progress.

[oggadmin@D-ADON-01-CC-VM bin]$
[oggadmin@D-ADON-01-CC-VM bin]$ find /u01/app/ogg/ogg23ai -name "*.properties" | egrep -i "sample|example|handler|parquet|filewriter" | head -n 20
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/oci.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/kafka.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/hbase.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/parquet.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/kafkaconnect.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/azureservicebus.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/mongo.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/filewriter.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/bigquery.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/nosql.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/hdfs.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/synapse.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/redshift.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/pubsub.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/s3.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/redis.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/elasticsearch.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/jdbc.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/adw.properties
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/AdapterExamples/templates/jms.properties
[oggadmin@D-ADON-01-CC-VM bin]$




[oggadmin@D-ADON-01-CC-VM bin]$
[oggadmin@D-ADON-01-CC-VM bin]$ ls -ltrh /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/ggjava
total 60K
-rwxrwxr-x. 1 oggadmin ogg  34K Jun  5  2024 NOTICES.txt
-rwxrwxr-x. 1 oggadmin ogg   95 Oct 21 10:50 ggjava-version.txt
-rwxrwxr-x. 1 oggadmin ogg 9.5K Oct 21 10:50 ggjava.jar
drwxr-xr-x. 5 oggadmin ogg 4.0K Jan 29 16:51 resources
drwxr-xr-x. 6 oggadmin ogg 4.0K Jan 29 16:51 maven-3.9.6



[oggadmin@D-ADON-01-CC-VM bin]$ find /u01/app/ogg/ogg23ai -iname "onelake.sh" -o -iname "*parquet*.sh" -o -iname "*dependency*.sh"
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/onelake.sh
/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/parquet.sh
[oggadmin@D-ADON-01-CC-VM bin]$ /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/onelake.sh


[oggadmin@D-ADON-01-CC-VM bin]$ cd /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$ ls
aws.sh                    cassandra_dse.sh          gcs.sh                    hbase_hortonworks.sh         kafka.sh             orc.sh             snowflake.sh
azure_blob_storage.sh     cassandra.sh              googlepubsub.sh           hbase.sh                     kinesis.sh           parquet.sh         snowflakestreaming.sh
bigquery.sh               config_proxy.sh           hadoop_azure_cloudera.sh  internal_scripts             mongodb_capture.sh   project            synapse.sh
bigquerystreaming.sh      databricks.sh             hadoop_cloudera.sh        kafka_cloudera.sh            mongodb.sh           redis.sh           velocity.sh
cassandra_capture_3x.sh   docs                      hadoop_hortonworks.sh     kafka_confluent_protobuf.sh  onelake.sh           redshift.sh        xmls
cassandra_capture_4x.sh   download_dependencies.sh  hadoop.sh                 kafka_confluent.sh           oracle_nosql_sdk.sh  s3.sh
cassandra_capture_dse.sh  elasticsearch_java.sh     hbase_cloudera.sh         kafka_hortonworks.sh         oracle_oci.sh        snowflake-fips.sh
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$






[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$ java -version
openjdk version "17.0.18" 2026-01-20 LTS
OpenJDK Runtime Environment (Red_Hat-17.0.18.0.8-1.0.1) (build 17.0.18+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-17.0.18.0.8-1.0.1) (build 17.0.18+8-LTS, mixed mode, sharing)
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$






[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$ ./onelake.sh
openjdk version "17.0.18" 2026-01-20 LTS
Java is installed.
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven is accessible.
Root Configuration Script
INFO: This is the Maven binary [../../ggjava/maven-3.9.6/bin/mvn].
INFO: This is the location of the settings.xml file [./docs/settings_np.xml].
INFO: This is the location of the toolchains.xml file [./docs/toolchains.xml].
INFO: The dependencies will be written to the following directory[../dependencies/onelake].
INFO: The Maven coordinates are the following:
INFO: Dependency 1
INFO: Group ID [com.azure].
INFO: Artifact ID [azure-storage-file-datalake].
INFO: Version [12.20.0]
INFO: Dependency 2
INFO: Group ID [com.azure].
INFO: Artifact ID [azure-identity].
INFO: Version [1.13.1]
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------< oracle.goldengate:dependencyDownloader >---------------
[INFO] Building dependencyDownloader 1.0
[INFO]   from pom_central_v2.xml
[INFO] --------------------------------[ pom ]---------------------------------
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/3.2.0/maven-clean-plugin-3.2.0.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/3.2.0/maven-clean-plugin-3.2.0.pom (5.3 kB at 24 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-plugins/35/maven-plugins-35.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-plugins/35/maven-plugins-35.pom (9.9 kB at 431 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/35/maven-parent-35.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/35/maven-parent-35.pom (45 kB at 1.7 MB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/apache/25/apache-25.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/apache/25/apache-25.pom (21 kB at 1.0 MB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/3.2.0/maven-clean-plugin-3.2.0.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/3.2.0/maven-clean-plugin-3.2.0.jar (36 kB at 1.4 MB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-dependency-plugin/2.9/maven-dependency-plugin-2.9.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-dependency-plugin/2.9/maven-dependency-plugin-2.9.pom (13 kB at 602 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins
.........
...............
...................
[INFO] Copying netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/onelake/netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar
[INFO] Copying reactive-streams-1.0.4.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/onelake/reactive-streams-1.0.4.jar
[INFO] Copying oauth2-oidc-sdk-11.9.1.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/onelake/oauth2-oidc-sdk-11.9.1.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  8.334 s
[INFO] Finished at: 2025-12-12T16:45:52-05:00
[INFO] ------------------------------------------------------------------------
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$

[oggadmin@D-ADON-01-CC-VM templates]$ cd /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader
[oggadmin@D-ADON-01-CC-VM templates]$   ./parquet.sh 1.13.1
openjdk version "17.0.18" 2026-01-20 LTS
Java is installed.
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven is accessible.
Root Configuration Script
INFO: This is the Maven binary [../../ggjava/maven-3.9.6/bin/mvn].
INFO: This is the location of the settings.xml file [./docs/settings_np.xml].
INFO: This is the location of the toolchains.xml file [./docs/toolchains.xml].
INFO: The dependencies will be written to the following directory[../dependencies/parquet_1.13.1].
.....
...........
.................
.....
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-hadoop/1.13.1/parquet-hadoop-1.13.1.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-hadoop/1.13.1/parquet-hadoop-1.13.1.pom (15 kB at 69 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet/1.13.1/parquet-1.13.1.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet/1.13.1/parquet-1.13.1.pom (25 kB at 790 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-column/1.13.1/parquet-column-1.13.1.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-column/1.13.1/parquet-column-1.13.1.pom (6.0 kB at 238 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-common/1.13.1/parquet-common-1.13.1.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-common/1.13.1/parquet-common-1.13.1.pom (3.4 kB at 143 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-format-structures/1.13.1/parquet-format-structures-1.13.1.pom
......
..............
...............
[INFO] Copying jackson-annotations-2.12.7.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/parquet_1.13.1/jackson-annotations-2.12.7.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  2.119 s
[INFO] Finished at: 2025-12-12T16:52:03-05:00
[INFO] ------------------------------------------------------------------------

Once again the Replicat on the target side failed to start, this time with a different error.

OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 8>  info REPLICAT FATD11D

Replicat   FATD11D    Initialized  2025-12-12 16:24   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:34:28 ago)
Log Read Checkpoint  File dirdat/i1000000000
                     First Record  RBA 0
Encryption Profile   LocalWallet


OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 9> view report FATD11D

***********************************************************************
     Oracle GoldenGate for Distributed Applications and Analytics
                   Version 23.10.0.25.10 (Build 001)

                      Oracle GoldenGate Delivery
 Version 23.10.1.25.10 OGGCORE_23.10.0.0.0OGGRU_LINUX.X64_251018.0830
    Linux, x64, 64bit (optimized), Generic on Oct 18 2025 14:00:54

Copyright (C) 1995, 2025, Oracle and/or its affiliates. All rights reserved.

                    Starting at 2025-12-12 16:58:47
***********************************************************************

2025-12-12 16:58:47  INFO    OGG-15052  Using Java class path: /testgg/app/ogg/ogg23ai/ogg23aidaa_MA//ggjava/ggjava.jar:/testgg/app/ogg/ogg23ai/ogg23aidaa_DEPLOYMENT/etc/conf/ogg:/u01/app/ogg/ogg23ai/ogg23aidaa_MA/:/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/onelake/*:/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/parquet_1.13.1/*.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at oracle.goldengate.eventhandler.parquet.GGParquetWriter.init(GGParquetWriter.java:72)
        at oracle.goldengate.eventhandler.parquet.ParquetEventHandler.init(ParquetEventHandler.java:219)
        at oracle.goldengate.datasource.eventhandler.EventHandlerFramework.initEventHandler(EventHandlerFramework.java:168)
        at oracle.goldengate.datasource.eventhandler.EventHandlerFramework.init(EventHandlerFramework.java:58)
        at oracle.goldengate.handler.filewriter.FileWriterHandlerEO.init(FileWriterHandlerEO.java:627)
        at oracle.goldengate.datasource.AbstractDataSource.addDataSourceListener(AbstractDataSource.java:602)
        at oracle.goldengate.datasource.factory.DataSourceFactory.getDataSource(DataSourceFactory.java:164)
        at oracle.goldengate.datasource.UserExitDataSourceLauncher.<init>(UserExitDataSourceLauncher.java:45)
        at oracle.goldengate.datasource.UserExitMain.main(UserExitMain.java:109)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
        ... 9 more

2025-12-12 16:58:48  WARNING OGG-00869  java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration.

Source Context :
  SourceFile              : [/ade/aime_phxdbifa87/oggcore/OpenSys/src/gglib/ggdal/Adapter/Java/JavaAdapter.cpp]
  SourceMethod            : [HandleJavaException]
  SourceLine              : [350]
  ThreadBacktrace         : [19] elements
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libgglog.so(CMessageContext::AddThreadContext())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libgglog.so(_MSG_String(CSourceContext*, int, char const*, CMessageFactory::MessageDisposition))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libggjava.so()]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/../lib/libggjava.so(ggs::gglib::ggdal::CJavaAdapter::Open())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::ggdal::CDALAdapter::Open())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(GenericImpl::Open())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(GenericImpl::GetWriter())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(GenericImpl::GetGenericDBType())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::er::ReplicatContext::ReplicatContext(ggs::gglib::ggapp::ReplicationContextParams const&, bool, ggs::gglib::ggmetadata::MetadataContext*, ggs::er::ReplicatContext::LogBSNManager*))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::er::ReplicatContext::createReplicatContext(ggs::gglib::ggapp::ReplicationContextParams const&, ggs::gglib::ggdatasource::DataSourceParams const&, ggs::gglib::ggmetadata::MetadataContext*))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat()]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain())]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**))]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(main)]
                          : [/lib64/libc.so.6()]
                          : [/lib64/libc.so.6(__libc_start_main)]
                          : [/testgg/app/ogg/ogg23ai/ogg23aidaa_MA/bin/replicat(_start)]

2025-12-12 16:58:48  ERROR   OGG-15051  Java or JNI exception:
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration.

2025-12-12 16:58:48  ERROR   OGG-01668  PROCESS ABENDING.

That one made me pause for a second. The target wasn’t HDFS. I wasn’t running Hadoop. This was Microsoft Fabric. But here’s the catch. Parquet depends on Hadoop, even when you’re not using Hadoop directly. Some core Parquet classes expect Hadoop configuration classes to exist. No Hadoop libs, no Parquet writer.
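
A quick way to confirm this kind of gap before restarting the Replicat is to scan the class path for the missing class. Below is a minimal Python sketch, not a GoldenGate tool: the helper name find_class_in_classpath is made up for illustration, and it assumes a Unix-style class path where a trailing `/*` means "the JARs in this directory". It only looks inside JAR files, not at loose .class files.

```python
import glob
import os
import zipfile


def find_class_in_classpath(classpath, class_name):
    """Return the JAR files on `classpath` that contain `class_name`.

    Hypothetical helper for illustration; it only inspects JAR files.
    """
    # org.apache.hadoop.conf.Configuration -> org/apache/hadoop/conf/Configuration.class
    entry_name = class_name.replace(".", "/") + ".class"
    hits = []
    for entry in classpath.split(os.pathsep):
        if entry.endswith("*"):
            # Wildcard entry: look at every .jar file in that directory.
            jars = sorted(glob.glob(os.path.join(os.path.dirname(entry), "*.jar")))
        elif entry.endswith(".jar"):
            jars = [entry]
        else:
            jars = []  # plain directories are skipped in this sketch
        for jar in jars:
            if not zipfile.is_zipfile(jar):
                continue  # missing or unreadable file
            with zipfile.ZipFile(jar) as zf:
                if entry_name in zf.namelist():
                    hits.append(jar)
    return hits
```

Calling it with the class path from the report above and "org.apache.hadoop.conf.Configuration" should come back empty until the Hadoop client JARs are staged and added to the class path.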

So back to the DependencyDownloader I went, this time running the Hadoop script. More downloads, more JARs, more waiting.

[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$ cd /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader

[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$ ./hadoop.sh 3.4.2
openjdk version "17.0.18" 2026-01-20 LTS
Java is installed.
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven is accessible.
Root Configuration Script
INFO: This is the Maven binary [../../ggjava/maven-3.9.6/bin/mvn].
INFO: This is the location of the settings.xml file [./docs/settings_np.xml].
INFO: This is the location of the toolchains.xml file [./docs/toolchains.xml].
INFO: The dependencies will be written to the following directory[../dependencies/hadoop_3.4.2].
[INFO] ---------------< oracle.goldengate:dependencyDownloader >---------------
[INFO] Building dependencyDownloader 1.0
[INFO]   from pom_central_v2.xml
[INFO] --------------------------------[ pom ]---------------------------------
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client/3.4.2/hadoop-client-3.4.2.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client/3.4.2/hadoop-client-3.4.2.pom (11 kB at 58 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-project-dist/3.4.2/hadoop-project-dist-3.4.2.pom
Downloaded from central: https://repo.maven.apach
..........
................
.....................
[INFO] Copying netty-codec-stomp-4.1.118.Final.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/hadoop_3.4.2/netty-codec-stomp-4.1.118.Final.jar
[INFO] Copying dnsjava-3.6.1.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/hadoop_3.4.2/dnsjava-3.6.1.jar
[INFO] Copying netty-transport-native-unix-common-4.1.118.Final.jar to /testgg/app/ogg/ogg23ai/ogg23aidaa_MA/opt/DependencyDownloader/dependencies/hadoop_3.4.2/netty-transport-native-unix-common-4.1.118.Final.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  7.627 s
[INFO] Finished at: 2025-12-12T18:02:30-05:00
[INFO] ------------------------------------------------------------------------
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$
[oggadmin@D-ADON-01-CC-VM DependencyDownloader]$

Once that finished, I restarted the Replicat again. No big expectations, but this time it stayed up.

OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 2> START REPLICAT FATD11D
2025-12-12T23:07:54Z  INFO    OGG-00975  Replicat group FATD11D starting.
2025-12-12T23:07:54Z  INFO    OGG-15445  Replicat group FATD11D started.

OGG (http://192.168.10.10:9001 OGG23AIDAA as BigData@) 3> info FATD11D
No Extract groups exist.

Replicat   FATD11D    Last Started 2025-12-12 18:07   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
Process ID           47420
Log Read Checkpoint  File dirdat/i10000000001
                     First Record  RBA 167873
Encryption Profile   LocalWallet

The big takeaway from this whole exercise is pretty simple. When you’re replicating from an Oracle database to Microsoft Fabric using GoldenGate 23ai DAA, the tricky part is not Oracle, and not Fabric. It’s the middle layer. Parquet is the bridge, and Parquet brings Hadoop with it, whether you like it or not. If those dependencies aren’t staged correctly, the OGG processes will start, smile at you, and then quietly fall over 😀

Once everything was in place, though, the setup worked exactly the way it should. A clean path from a legacy Oracle 11g database into a modern Microsoft Fabric Lakehouse. No magic. Just the right pieces, in the right order… and a bit of patience.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: fatdba, goldengate, repli, replication, troubleshooting

When GoldenGate decides to throw OGG-02912 just before New Year’s Eve.

Posted by FatDBA on December 31, 2025

Happy New Year! 🎉
Because nothing says “end of the year” like firing up a test lab, breaking a GoldenGate extract, and realizing that Oracle 11g still has unfinished business with you. I spent the last hours of the year chasing an error that politely reminded me: old databases never really retire — they just wait 😀

Nothing fancy. Just a simple setup. Or at least… that’s what I thought at the beginning.

The goal was straightforward: capture data from an Oracle 11gR2 (11.2.0.4) database with Oracle GoldenGate Integrated Extract, running remote integrated capture from a centralized GoldenGate extract hub on a newer GoldenGate build (21c).

I’ve done this dozens of times with 12c and above. 11g though… well, 11g always has a way of reminding you that it’s old, but not that old 🙂

The Setup (Quick Context)

Source database: Oracle 11g Enterprise Edition 11.2.0.4 (OEL 7.x 64)
Capture mode: Integrated Extract
GoldenGate binaries: 21.x
Capture host: centralized GoldenGate extract hub using remote integrated capture (Linux OEL 8.X 64)
Simple test table, simple inserts.

Everything registered fine. Extract attached to LogMiner. No privilege errors. No Streams issues.
So far, so good. And then…

The Symptom:
Out of nowhere, the extract stopped. Running info all showed it as STOPPED, and opening the report file made it very clear this wasn’t a generic failure. The real error was right at the bottom:

GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 31> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     EXT11G      00:00:00      00:11:14




GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 32> view report EXT11G
2025-12-30 15:41:53  INFO    OGG-06604  Connected to database DB11G, CPU info: CPU Count 1, CPU Core Count 1, CPU Socket Count 1.

2025-12-30 15:41:53  INFO    OGG-06618  Database DB11G Platform: Linux x86 64-bit.

2025-12-30 15:41:57  INFO    OGG-02248  Logmining server DDL filtering enabled.

2025-12-30 15:41:59  INFO    OGG-02068  Integrated capture successfully attached to logmining server OGG$CAP_EXT11G using OGGCapture API.

2025-12-30 15:41:59  INFO    OGG-02089  Source redo compatibility version is: 11.2.0.4.0.

2025-12-30 15:41:59  INFO    OGG-15446  Extract configured as  resource group.

2025-12-30 15:41:59  INFO    OGG-02086  Integrated Dictionary will be used.

2025-12-30 15:41:59  INFO    OGG-02710  Database metadata information is obtained from source database.

2025-12-30 15:41:59  WARNING OGG-02901  Replication of UDT and ANYDATA from redo logs is not supported with the Oracle compatible parameter setting. Using fetch instead.

2025-12-30 15:41:59  INFO    OGG-02776  Native data capture is enabled for Oracle NUMBER data type.

2025-12-30 15:41:59  INFO    OGG-01971  The previous message, 'INFO OGG-02776', repeated 1 times.

Source Context :
  SourceModule            : [ggdb.ora.ddl]
  SourceID                : [../gglib/ggdbora/ddlora.c]
  SourceMethod            : [metadata_from_logminer]
  SourceLine              : [1270]
  ThreadBacktrace         : [15] elements
                          : [/home/gg_adminremote/ogghome_21c/libgglog.so(CMessageContext::AddThreadContext())]
                          : [/home/gg_adminremote/ogghome_21c/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]
                          : [/home/gg_adminremote/ogghome_21c/libgglog.so(_MSG_(CSourceContext*, int, CMessageFactory::MessageDisposition))]
                          : [/home/gg_adminremote/ogghome_21c/extract()]
                          : [/home/gg_adminremote/ogghome_21c/extract(RedoAPI::createInstance(ggs::gglib::ggdatasource::DataSource*, ggs::gglib::ggapp::ReplicationContext*))]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::er::OraTranLogDataSource::setup())]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::ggapp::ReplicationContext::establishStartPoints(char, ggs::gglib::ggdatasource::DataSourceParams const&))]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::ggapp::ReplicationContext::initializeDataSources(ggs::gglib::ggdatasource::DataSourceParams&))]
                          : [/home/gg_adminremote/ogghome_21c/extract()]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::MultiThreading::MainThread::ExecMain())]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]
                          : [/home/gg_adminremote/ogghome_21c/extract(ggs::gglib::MultiThreading::MainThread::Run(int, char**))]
                          : [/home/gg_adminremote/ogghome_21c/extract(main)]
                          : [/lib64/libc.so.6(__libc_start_main)]
                          : [/home/gg_adminremote/ogghome_21c/extract()]

2025-12-30 15:41:59  ERROR   OGG-02912  Patch 17030189 is required on your Oracle mining database for trail format RELEASE 12.2 or later.

2025-12-30 15:41:59  ERROR   OGG-01668  PROCESS ABENDING.

Understanding What Actually Went Wrong
This is one of those GoldenGate errors that looks scary but is actually very precise once you read it slowly. GoldenGate was telling me: “Hey, I’m trying to write trail records using a 12.2+ trail format, but your 11g database can’t mine redo in that format unless you patch it.”

Specifically: Integrated Extract defaults to newer trail formats, and an 11g mining database cannot support trail format RELEASE 12.2 or later unless you apply Patch 17030189 (logminer GG Dictionary support: missing attributes) to the 11g database home. And in most environments… patching 11g is not happening.

Here’s the subtle trap: You install GoldenGate 19c / 21c and configure Integrated Extract. You don’t explicitly set a trail format, so GoldenGate assumes: “Modern source, modern trail”. But 11g is not modern. Even 11.2.0.4, the best version of 11g, still has limits. So GoldenGate happily starts… and then politely crashes.

The Options on the Table .. At this point, there were only three real choices:

Option 1: Patch the 11g database “Apply Patch 17030189 to the database home”.

Pros: Allows newer trail formats

Cons: Risky, operationally heavy, often blocked by policy, definitely not “lab friendly”

Option 2: Force an Older Trail Format. Tell GoldenGate to behave like it’s 2012 again.

Pros: No database patching, fully supported, safe and predictable

Cons: You give up newer trail features (more on that later).

For me, Option 2 was the obvious choice. It is also the usual choice when a client doesn’t want to change anything on an 11g database that is old but so far stable, or when patching would require additional planning, change requests, and operational risk.

Option 3: Use the workaround script prvtlmpg.plb that ships with OGG.

Pros: Simple, straightforward, fast.

Cons: In production environments, this workaround introduces additional operational and audit risk, requires database-side intervention, and often triggers formal change and approval processes. It is particularly inconvenient in remote or centralized GoldenGate architectures, where GoldenGate is intentionally decoupled from the source database host. Since it alters mining-side database behavior, it is less clean and less maintainable than applying the official Oracle patch or avoiding the issue altogether by enforcing a compatible trail format.

The Fix That I Used
The fix itself was simple, but order matters. Stop the impacted Extract, delete the existing trail (the trail header stores the format), add “FORMAT RELEASE 11.2” to the EXTTRAIL line in the Extract parameter file, then recreate the trail and start the Extract.

GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 36> DELETE EXTTRAIL ./dirdat/e1
Deleting extract trail ./dirdat/e1 for Extract group EXT11G.



GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 38>  ADD EXTTRAIL ./dirdat/e1, EXTRACT ext11g
EXTTRAIL added.



GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 41> view params EXT11G

EXTRACT ext11g
USERIDALIAS ogg_11g
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512)

EXTTRAIL ./dirdat/e1, FORMAT RELEASE 11.2
DISCARDFILE ./dirrpt/ext11g.dsc, APPEND, MEGABYTES 50
REPORTCOUNT EVERY 30 MINUTES, RATE

TABLE ELEVENGTOFABRIC.TESTREPLTAB;






GGSCI (postgrequebec.quebdomain as ggreplication@DB11G) 52> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT11G      00:00:03      00:00:06

The Moment of Truth: Lag was moving. SCNs were advancing. Trail RBAs were increasing. No more abends. No more patch complaints. That’s when you know you’re done.

Why FORMAT RELEASE 11.2 Is Safe (and When It’s Not)
Let’s be clear, this isn’t a hack. This is documented, supported behavior.

What You Lose: Newer GoldenGate metadata, some advanced DDL capture details, newer datatype handling.

What You Keep: Full DML capture (INSERT / UPDATE / DELETE), stability, compatibility, and your sanity.

For 11g source systems, especially ones you don’t want to touch, this is the correct trade off.
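
Since the whole failure came down to one missing clause, a small pre-flight check on the parameter file can catch it before the Extract is started. Below is a minimal Python sketch, not a GoldenGate utility: the function name trail_formats is made up for illustration, and it assumes each EXTTRAIL / RMTTRAIL declaration sits on a single line, as in the parameter file shown above.

```python
import re

# Lines that declare a trail, e.g. "EXTTRAIL ./dirdat/e1, FORMAT RELEASE 11.2"
TRAIL_LINE = re.compile(r"^\s*(EXTTRAIL|RMTTRAIL)\b(.*)$", re.IGNORECASE)
FORMAT_CLAUSE = re.compile(r"\bFORMAT\s+RELEASE\s+(\d+\.\d+)", re.IGNORECASE)


def trail_formats(param_text):
    """Map each trail in an Extract parameter file to its FORMAT RELEASE value.

    Hypothetical helper for illustration. A trail with no explicit
    FORMAT RELEASE clause maps to None. Declarations continued over
    several lines are not handled by this sketch.
    """
    result = {}
    for line in param_text.splitlines():
        match = TRAIL_LINE.match(line)
        if not match:
            continue
        rest = match.group(2)
        trail = rest.split(",")[0].strip()  # text before the first comma is the trail name
        fmt = FORMAT_CLAUSE.search(rest)
        result[trail] = fmt.group(1) if fmt else None
    return result
```

Any trail that maps to None on an 11g source is a candidate for the same OGG-02912 abend, so that is the one to look at before starting the process.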

Final Thoughts
This issue is a perfect example of why GoldenGate work is never just about syntax. Everything was “correct”: Privileges, Integrated capture, Registration ..but one missing line quietly broke the entire pipeline. If you’re running 11g with modern GoldenGate, remember this: Old database. Old trail format or be ready to patch.

And honestly… forcing FORMAT RELEASE 11.2 was the smarter move in this case: we avoided any modification to the source system and kept remote extraction running.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Bugs, golden gate, oracle, renaps, troubleshooting

When Linux Swaps Away My Sleep – MySQL, RHEL8, and the Curious Case of High Swap Usage

Posted by FatDBA on December 12, 2025

I remember an old incident where I got an alert that one of the production MySQL servers had suddenly gone sluggish after moving from RHEL 7 to RHEL 8. On checking, I found something odd … the system was consuming swap heavily, even though there was plenty of physical memory free.

Whoever did the original deployment years earlier had left THP enabled and swappiness at its default. That setup had worked perfectly for years on RHEL 7, but after the upgrade to RHEL 8.10 the behavior was completely different.

This post is about how that small OS level change turned into a real performance headache, and what we found after some deep digging.

The server in question was a MySQL 8.0.43 instance running on a VMware VM with 16 CPUs and 64 GB RAM. When the issue began, users complained that the database was freezing randomly, and monitoring tools were throwing high load average and slow query alerts.

Let’s take a quick look at the environment … It was a pretty decent VM, nothing undersized.

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.10 (Ootpa)

$ uname -r
4.18.0-553.82.1.el8_10.x86_64

$ uptime
11:20:24 up 3 days, 10:57,  2 users,  load average: 4.34, 3.15, 3.63

$ grep ^CPU\(s\) sos_commands/processor/lscpu
CPU(s): 16

When I pulled the SAR data for that morning, the pattern was clear. There were long stretches where %iowait sat above 20-25%, and load averages crossed 400 during peak time! The 09:50 slot looked particularly suspicious: load average jumped to 464 and remained high for several minutes.

09:00:01 %usr=26.08  %iowait=22.78  %idle=46.67
09:40:01 %usr=29.04  %iowait=24.43  %idle=40.11
09:50:01 %usr=7.55   %iowait=10.07  %idle=80.26
10:00:01 %usr=38.53  %iowait=19.54  %idle=35.32

Here’s what the memory and swap stats looked like:

# Memory Utilization
%memused ≈ 99.3%
Free memory ≈ 400 MB (on a 64 GB box)
Swap usage ≈ 85% average, hit 100% at 09:50 AM

That was confusing.. MySQL was not leaking memory, and there was still >10 GB available for cache and buffers. The system was clearly pushing pages to swap even though it didn’t need to. That was the turning point in the investigation.

At the same time, the reporting agent began logging MySQL timeouts:

 09:44:09 [mysql] read tcp xxx.xx.xx.xx:xxx->xxx.xxx.xx.xx:xxxx: i/o timeout
 09:44:14 [mysql] read tcp xx.xx.xx.xxxx:xxx->xx.xx.xx.xx.xx:xxx: i/o timeout

And the system kernel logs showed the familiar horror lines for every DBA .. MySQL threads were being stalled by the OS. This aligned perfectly with the time when swap usage peaked.

 09:45:34 kernel: INFO: task mysqld:5352 blocked for more than 120 seconds.
 09:45:34 kernel: INFO: task ib_pg_flush_co:9435 blocked for more than 120 seconds.
 09:45:34 kernel: INFO: task connection:10137 blocked for more than 120 seconds.

I double-checked the swappiness configuration:

$ cat /proc/sys/vm/swappiness
1

So theoretically, swap usage should have been minimal. But the system was still paging aggressively. Then I checked the cgroup configuration (a trick I learned from a Red Hat note), and there it was: 115 cgroups still using the default value of 60! On RHEL 8, most processes run inside memory cgroups, and in the cgroup v1 hierarchy under /sys/fs/cgroup/memory/ each cgroup carries its own memory.swappiness value.

So even if /proc/sys/vm/swappiness is set to 1, processes inside those cgroups can still follow their own default value (60). This explained why the system was behaving like swappiness=60 even though the global value was 1.

$ find /sys/fs/cgroup/memory/ -name '*swappiness' -exec cat {} \; | sort | uniq -c
      1 1
    115 60
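
The same check can be sketched in Python, which is handy if you want to fold it into a health-check script. This is only an illustrative sketch: the function name count_swappiness is made up, and it assumes the cgroup v1 memory hierarchy is mounted at /sys/fs/cgroup/memory, as on the host above.

```python
import os
from collections import Counter


def count_swappiness(root="/sys/fs/cgroup/memory"):
    """Count how many memory cgroups use each memory.swappiness value.

    Hypothetical helper for illustration; assumes a cgroup v1 layout
    where every cgroup directory holds a memory.swappiness file.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.swappiness" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "memory.swappiness")) as fh:
                counts[int(fh.read().strip())] += 1
        except (OSError, ValueError):
            continue  # cgroup vanished mid-walk or held an unexpected value
    return counts
```

Anything other than a single entry matching the global vm.swappiness value is the signal to look for.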

Once the root cause was identified, the fix was straightforward: enforce the global swappiness across all cgroups.

Add this to /etc/sysctl.conf:

vm.force_cgroup_v2_swappiness = 1

Then reload:
sysctl -p

This forces the kernel to apply the global swappiness value to all cgroups, ensuring consistent behavior. Next, we handled THP, which is known to cause intermittent fragmentation and stalls in memory-intensive workloads such as MySQL, Oracle, PostgreSQL, and even non-relational stores like Cassandra. We disabled transparent huge pages and rebooted the host.
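
To confirm the THP state after the reboot, you can read the kernel's sysfs file, where the active mode is the one shown in square brackets (for example "always madvise [never]"). Here is a minimal Python sketch; the function names are made up for illustration, and the default path is the standard location on mainline kernels, so adjust it if your distribution differs.

```python
import re


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs 'enabled' file content."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read and parse the THP setting; the path is an assumption about the host."""
    with open(path) as fh:
        return active_thp_mode(fh.read())
```

On a host where THP was disabled as described, read_thp_mode() would be expected to report never.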

In short, here is what happened and what the root cause was:

RHEL8 introduced a change in how swappiness interacts with cgroups.
The old /proc/sys/vm/swappiness setting no longer applies universally.
Unless explicitly forced, MySQL’s cgroup keeps the default swappiness (60).
Combined with THP and background I/O, this created severe page cache churn.

So the OS upgrade, not MySQL, was the real root cause.

Note: https://access.redhat.com/solutions/6785021

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: mysql, opertingsystem, optimization, OS, performance, rhel, troubleshooting, Tuning

Oracle 23ai Tip: Use SESSION_EXIT_ON_PACKAGE_STATE_ERROR to Prevent Silent Data Corruption

Posted by FatDBA on December 28, 2024

Oracle Database 23ai introduces a new parameter, SESSION_EXIT_ON_PACKAGE_STATE_ERROR, designed to enhance session management and prevent potential data corruption by enforcing a hard session exit when the session state becomes invalidated.

Why SESSION_EXIT_ON_PACKAGE_STATE_ERROR Matters?

In typical Oracle database environments, stateful PL/SQL packages, MLE modules, or environments may be modified while sessions actively use them. This can lead to errors such as:

ORA-04068: Can occur when a PL/SQL package body is recompiled, invalidating the session state.
ORA-4106 / ORA-4107: Can be raised when an MLE module or environment is altered via DDL, invalidating the session.

By default, the session remains active and throws an error when the invalid package or module is called. However, many applications may not properly handle these errors, leading to silent data corruption or unexpected behavior.

The SESSION_EXIT_ON_PACKAGE_STATE_ERROR parameter mitigates this risk by forcing an immediate session exit instead of raising an error.

Some of the benefits of using the parameter.

Prevents Data Corruption: By terminating sessions with invalid state, the risk of silent data corruption is reduced.
Simplifies Error Handling: Many applications are better at handling session disconnects than catching specific errors like ORA-04068.
Consistency Across Sessions: Ensures that all sessions dealing with modified packages or MLE modules are treated consistently, minimizing inconsistencies.

How SESSION_EXIT_ON_PACKAGE_STATE_ERROR Works

When SESSION_EXIT_ON_PACKAGE_STATE_ERROR is set to TRUE, the following behavior is enforced:

PL/SQL Package Modification:
- If a stateful PL/SQL package is modified, any active session that tries to invoke the package receives ORA-04068. With this parameter set to TRUE, the session exits immediately instead of raising the error.
MLE Module or Environment Modification:
- If an MLE module or environment is modified via DDL, active sessions receive ORA-4106 or ORA-4107. With SESSION_EXIT_ON_PACKAGE_STATE_ERROR = TRUE, these sessions are forcibly disconnected.
Application Handling:
- Most applications are designed to capture session disconnects and reestablish connections, streamlining recovery from session invalidation.
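The reconnect pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SessionLost, FakeConnection and run_with_reconnect are hypothetical names made up for this sketch, and a real driver raises its own exception types when the server ends the session.

```python
# Hypothetical stand-ins so the sketch runs without a database.
# A real driver raises its own "connection lost" exception instead.
class SessionLost(Exception):
    """Raised by the fake connection when the server ended the session."""


class FakeConnection:
    """Simulates a session that may already have been terminated."""

    def __init__(self, healthy):
        self.healthy = healthy

    def call_package(self):
        if not self.healthy:
            raise SessionLost("session terminated by server")
        return "ok"


def run_with_reconnect(connect, work, max_attempts=3):
    """Run work(conn); on a lost session, open a fresh connection and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        conn = connect()  # a new session starts with fresh package state
        try:
            return work(conn)
        except SessionLost as exc:
            last_error = exc
    raise last_error


# The first connection is already invalidated, the second one is fresh.
_pool = iter([FakeConnection(healthy=False), FakeConnection(healthy=True)])
result = run_with_reconnect(lambda: next(_pool), lambda c: c.call_package())
print(result)
```

The point of the design is that the application only needs one generic "connection lost" handler, instead of a dedicated branch for every ORA- code that signals invalid state.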

Use Cases

High-Availability Environments: In systems where continuous uptime is critical, preventing data corruption is paramount.
Distributed Applications: Applications spread across multiple environments that frequently modify PL/SQL packages or MLE modules benefit from session termination to maintain data integrity.
Oracle RAC Deployments: Different instances in an Oracle RAC environment can independently configure this parameter, allowing fine-grained control based on workload requirements.

Configuring SESSION_EXIT_ON_PACKAGE_STATE_ERROR:

Examples:
ALTER SYSTEM SET SESSION_EXIT_ON_PACKAGE_STATE_ERROR = TRUE;
ALTER SESSION SET SESSION_EXIT_ON_PACKAGE_STATE_ERROR = TRUE;
ALTER SYSTEM SET SESSION_EXIT_ON_PACKAGE_STATE_ERROR = TRUE SCOPE = SPFILE;

Considerations

Default Behavior: By default, this parameter is set to FALSE, meaning sessions will raise errors rather than exit.
Testing and Validation: Test this configuration in lower environments to ensure application compatibility.
Session Management: Monitor session disconnects to ensure that forced exits do not disrupt critical workflows.

Conclusion

SESSION_EXIT_ON_PACKAGE_STATE_ERROR is a powerful new feature in Oracle Database 23ai that enhances session management by enforcing session termination on package or module state invalidation. By using this parameter, Oracle environments can significantly reduce the risk of data corruption and streamline error handling processes across diverse applications. Whether managing PL/SQL packages or MLE modules, this parameter offers greater control and reliability for database administrators and developers alike.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: 23ai, Database, new feature, oracle, oracle-database, SQL, troubleshooting

Data Pump Troubleshooting Tips – My favorite 6

Posted by FatDBA on October 26, 2024

There are numerous utilities, options, and methods available for migrating and moving data between Oracle databases, yet Oracle Data Pump remains one of the most widely used tools. A significant number of DBAs are very comfortable with Data Pump, as it has been a trusted utility for a long time (originally as exp and imp). Its stability, user-friendliness, and robust capabilities make it a top choice for handling large data migrations, backup, and restore operations.

However, one area where DBAs still often face challenges is troubleshooting when issues arise. When a Data Pump job fails, performs poorly, or behaves unexpectedly, it can be unclear where to start, what logs to review, or what checks to perform. Many find it difficult to pinpoint the source of the problem and make adjustments to optimize performance or resolve issues.

Today’s post focuses on troubleshooting Data Pump performance and functionality issues, sharing the steps I typically follow when diagnosing problems. We’ll cover key areas to investigate, like log file analysis, parameter tuning, network considerations, and common bottlenecks. These steps aim to provide a practical guide to understanding and resolving Data Pump issues and optimizing your data movement processes.

Option 1: Generate an AWR Report to Assess Database Performance

Start by generating an AWR (Automatic Workload Repository) report to gain insight into the database’s overall performance during the relevant period. Adjusting the AWR snapshot interval to 15 minutes is recommended for a more granular view. This approach reduces the chances of averaging out short performance spikes, allowing you to capture transient issues more effectively.

exec dbms_workload_repository.modify_snapshot_settings(null, 15);
exec dbms_workload_repository.create_snapshot;
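To see why a shorter snapshot interval helps, here is a small back-of-the-envelope sketch in Python. The numbers are invented for illustration (a 5-minute spike at 90 against a baseline of 10, in whatever unit you care about), and window_average is a hypothetical helper, not anything AWR exposes.

```python
def window_average(spike_minutes, spike_value, baseline, window_minutes):
    """Average of a metric over a window that contains one short spike."""
    quiet_minutes = window_minutes - spike_minutes
    total = spike_minutes * spike_value + quiet_minutes * baseline
    return total / window_minutes


# Same 5-minute spike, seen through a 60-minute and a 15-minute window.
hourly = window_average(5, 90, 10, 60)
quarter_hour = window_average(5, 90, 10, 15)
print(round(hourly, 1), round(quarter_hour, 1))  # 16.7 36.7
```

In the hourly window the spike barely moves the average, while in the 15-minute window it stands out clearly.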

Option 2: Enable SQL Trace for Data Pump Processes or Specific SQL IDs
Optionally, you can enable SQL trace for the Data Pump processes (dm for the master process and dw for worker processes) or for specific SQL statements by SQL ID. This will help isolate SQL-level performance issues affecting the Data Pump job.

alter system set events 'sql_trace {process: pname = dw | process: pname = dm} level=8';
alter system set events 'sql_trace[SQL: 8krc88r46raff]';

Option 3: Run Data Pump Job with Detailed Trace Enabled
For enhanced tracing, run the Data Pump job with additional trace options, which provide more comprehensive output. Including metrics=yes, logtime=all, and trace=1FF0300 in the command enables detailed logging of both timing and activity metrics. Tracing can be enabled by specifying a 7-digit hexadecimal mask in the TRACE parameter of Export Data Pump (expdp) or Import Data Pump (impdp). The first three digits enable tracing for a specific Data Pump component, while the last four digits are usually 0300.

expdp ... metrics=yes logtime=all trace=1FF0300
impdp ... metrics=yes logtime=all trace=1FF0300
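The structure of the TRACE value (three component digits followed by four more digits) can be checked with a tiny helper. This is a minimal sketch; build_trace_mask is a hypothetical function name and only validates the shape of the value, it knows nothing about which component codes exist.

```python
def build_trace_mask(component_digits, suffix="0300"):
    """Join a 3-digit component prefix and a 4-digit suffix into a TRACE value."""
    if len(component_digits) != 3 or len(suffix) != 4:
        raise ValueError("expected 3 component digits and 4 suffix digits")
    mask = (component_digits + suffix).upper()
    int(mask, 16)  # raises ValueError if any character is not hexadecimal
    return mask


print(build_trace_mask("1FF"))  # 1FF0300
```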

Data Pump tracing can also be started with a line with EVENT 39089 in the initialization parameter file. This method should only be used to trace the Data Pump calls at an early stage, e.g. if details are needed about the DBMS_DATAPUMP.OPEN API call. Trace level 0x300 will trace all Data Pump client processes.

-- Enable event
ALTER SYSTEM SET EVENTS = '39089 trace name context forever, level 0x300' ;
-- Disable event
ALTER SYSTEM SET EVENTS = '39089 trace name context off' ;

Option 4: Review Data Pump Trace Files
Locate and analyze the Data Pump trace files stored in the Oracle trace directory. The master control process file names typically contain *dm*, while worker process files include *dw*. These files provide insights into the processes, job details, and potential error sources during execution.
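Sorting trace files into master and worker groups is easy to script. The sketch below assumes file names that carry a process tag such as _dm00_ or _dw01_; the sample names are made up, and classify_trace_files is a hypothetical helper, so adjust the pattern to what you actually see in your trace directory.

```python
import re

# Assumed naming: a "_dm00_" / "_dw01_" style tag somewhere in the file name.
_PROCESS_TAG = re.compile(r"_(dm|dw)[0-9a-z]{2}_", re.IGNORECASE)


def classify_trace_files(names):
    """Group trace file names into master (dm), worker (dw) and other."""
    groups = {"master": [], "worker": [], "other": []}
    for name in names:
        match = _PROCESS_TAG.search(name)
        if match is None:
            groups["other"].append(name)
        elif match.group(1).lower() == "dm":
            groups["master"].append(name)
        else:
            groups["worker"].append(name)
    return groups


# Hypothetical file names, for illustration only.
sample = ["ORCL_dm00_1234.trc", "ORCL_dw01_1240.trc", "ORCL_ora_999.trc"]
print(classify_trace_files(sample))
```

To run it against a real directory, feed it the output of os.listdir() for your trace location.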

Option 5: Activate SQL_TRACE on specific Data Pump process with higher trace level.
Let's assume we see that the Data Pump Master process (DM00) has SID: 143 and serial#: 50 and the Data Pump Worker process (DW01) has SID: 150 and serial#: 17. These details can be used to activate SQL tracing in SQL*Plus with DBMS_SYSTEM.SET_EV, e.g.:

-- In SQL*Plus, activate SQL tracing with DBMS_SYSTEM and SID/SERIAL#  
-- Syntax: DBMS_SYSTEM.SET_EV([SID],[SERIAL#],[EVENT],[LEVEL],'') 

-- Example to SQL_TRACE Worker process with level 4 (Bind values):   
execute sys.dbms_system.set_ev(150,17,10046,4,''); 

-- and stop tracing: 
execute sys.dbms_system.set_ev(150,17,10046,0,'');  


-- Example to SQL_TRACE Master Control process with level 8 (Waits):  
execute sys.dbms_system.set_ev(143,50,10046,8,'');  

-- and stop tracing:  
execute sys.dbms_system.set_ev(143,50,10046,0,'');

Option 6: Use the Data Pump Log Analyzer

I’ve personally used the Data Pump Log Analyzer for some time and have found it to be incredibly user-friendly, making it simple to understand the performance and runtime statistics of Data Pump jobs. This tool is highly effective in streamlining troubleshooting efforts, quickly identifying bottlenecks, and delivering clear insights into job performance. It’s a fantastic addition to a DBA’s toolkit and provides valuable capabilities that aren’t typically found in standard scripts. The Data Pump Log Analyzer has been tested with Data Pump log files across various database versions, including those generated by Data Pump client (expdp/impdp), Zero Downtime Migration (ZDM), OCI Database Migration Service (DMS), and Data Pump API (DBMS_DATAPUMP).

The Data Pump Log Analyzer is a Python-based command-line utility designed for in-depth analysis of Oracle Data Pump log files. It goes beyond basic log review by offering detailed, structured insights into key performance metrics, errors, and process details. This tool can be particularly useful for DBAs needing a quick and comprehensive view of Data Pump job behavior, helping with issue diagnosis and performance optimization. Link to read and download or a more detailed guide on its usage Link

With the Data Pump Log Analyzer, you get:

Detailed Operations and Processing Metrics: Granular information on data operations for pinpoint analysis.
Error and ORA- Code Analysis: Summaries and explanations of encountered errors for easier troubleshooting.
Object-Type Breakdown and Processing Times: Insight into performance by object type, aiding in performance tuning.
Data Pump Worker Performance: Analyzes individual worker processes for any lagging tasks.
Summarized Schema, Table, Partition Details: Overview of data handled by each schema, table, or partition.
Instance-Based Data Analysis (for Oracle 21c and later): Statistics by instance for performance evaluation in multitenant setups.
Flexible Output Options: Filter, sort, and export analysis results to text or HTML for efficient sharing and record-keeping.

The example below uses the basic syntax to get operational details.

$ python3 dpla.py import.log
========================
Data Pump Log Analyzer
========================
...
Operation Details
~~~~~~~~~~~~~~~~~
Operation: Import
Data Pump Version: 19.23.0.0.0
DB Info: Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0
Job Name: FATDBAJOB1
Status: COMPLETED
 Processing: -
Errors: 1301
 ORA- Messages: 1267
Start Time: 2024-08-21 01:30:45
End Time: 2024-08-21 11:43:11
Runtime: 35:03:06
Data Processing
~~~~~~~~~~~~~~~
Parallel Workers: 104
Schemas: 47
Objects: 224718
Data Objects: 188131
Overall Size: 19.11 TB

Use flag ‘-e’ to view all ORA- messages encountered during the Data Pump operation, or optionally filter for specific errors, e.g. ‘-e ORA-39082 ORA-31684’.

python3 dpla.py import.log -e
========================
Data Pump Log Analyzer
========================
...
ORA- MESSAGES DETAILS
~~~~~~~~~~~~~~~~~~~~~
(sorted by count):
Message Count
--------------------------------------------------------------------------------------------------- ---------
ORA-39346: data loss in character set conversion for object COMMENT 919
ORA-39082: Object type PACKAGE BODY created with compilation warnings 136
ORA-39346: data loss in character set conversion for object PACKAGE_BODY 54
ORA-39082: Object type TRIGGER created with compilation warnings 36
ORA-39082: Object type PROCEDURE created with compilation warnings 29
ORA-31684: Object type USER already exists 27
ORA-39111: Dependent object type PASSWORD_HISTORY skipped, base object type USER already exists 27
ORA-39346: data loss in character set conversion for object PACKAGE 18
ORA-39082: Object type PACKAGE created with compilation warnings 10
ORA-39082: Object type VIEW created with compilation warnings 7
ORA-39346: data loss in character set conversion for object PROCEDURE 2
ORA-39082: Object type FUNCTION created with compilation warnings 2
--------------------------------------------------------------------------------------------------- ---------
Total 1267
--------------------------------------------------------------------------------------------------- ---------
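If you only need a quick tally and don't have the tool at hand, the same idea can be approximated in a few lines of Python. This is a rough sketch and not how dpla.py is implemented: count_ora_codes is a hypothetical helper, it groups by ORA- code only (not by full message text), and the sample lines are invented.

```python
import re
from collections import Counter

ORA_CODE = re.compile(r"ORA-\d{5}")


def count_ora_codes(lines):
    """Count the first ORA- code on each log line, most frequent first."""
    counts = Counter()
    for line in lines:
        match = ORA_CODE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts.most_common()


# Hypothetical log lines, shaped like the messages in the report above.
sample_log = [
    "ORA-39082: Object type TRIGGER created with compilation warnings",
    "ORA-31684: Object type USER already exists",
    "ORA-39082: Object type VIEW created with compilation warnings",
    "Processing object type SCHEMA_EXPORT/TABLE/TABLE",
]
for code, count in count_ora_codes(sample_log):
    print(code, count)
```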

Use flag ‘-o’ to see details about which types of database objects were involved in the Data Pump operation.

python3 dpla.py import.log -o
========================
Data Pump Log Analyzer
========================
...
Object                                  Count      Seconds      Workers     Duration
----------------------------------      ---------- -----------  ----------- ------------
SCHEMA_EXPORT/TABLE/TABLE_DATA             188296    6759219         128       6759219
CONSTRAINT                                    767      37253           1         37253
TABLE                                        2112       3225          51           156
COMMENT                                     26442        639         128            18
PACKAGE_BODY                                  197        125         128             5
OBJECT_GRANT                                 5279         25           1            25
TYPE                                          270          6           1             6
ALTER_PROCEDURE                               149          5           2             3
ALTER_PACKAGE_SPEC                            208          4           3             2
PACKAGE                                       208          3           3             1
PROCEDURE                                     149          2           2             1

...
---------------------------------- ---------- ----------- ----------- ------------
Total 224755 6800515 128 6796697
---------------------------------- ---------- ----------- ----------- ------------

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: data pump, Database, Errors, migration, oracle, oracle-database, performance, SQL, sql-server, tracing, troubleshooting, Tuning

Addressing Stuck Undo Segments : How to Safely Drop Problematic Undo Segments

Posted by FatDBA on October 14, 2024

Hi All,

This post discusses an intriguing issue we encountered recently on a 19.22 Oracle database following a CDB restart. After the restart, we observed a peculiar problem where all sessions performing DDL commands were getting locked and hung at the PDB level. This behavior was affecting the entire database, essentially halting all DDL operations.

During our analysis, we discovered that the SMON process was waiting on a latch, leading to high CPU resource consumption. Furthermore, we noticed that the MMON process was blocking SMON, causing additional delays. The alert log revealed multiple error messages, which further complicated the diagnosis.

This issue required a deep dive into Oracle’s background processes and system-level contention to resolve, as it was causing a significant disruption to database operations.

-- Fragments from alert log, smon/mmon process logs and standard diag traces.
kcbzib: encounter logical error ORA-1410, try re-reading from other mirror..
TRCMIR:kcf_reread     :start:3722012:0:+DATA/CDBMONKEY/AA82C21DD440449FE053B4146E0AA55B/DATAFILE/tablespace_test_dataaa.xxx.xxxx
TRCMIR:kcf_reread     :done :3722012:0:+DATA/CDBMONKEY/AA82C21DD440449FE053B4146E0AA55B/DATAFILE/tablespace_test_dataaa.xxx.xxxxx
kcbzibmlt: encounter logical error ORA-1410, try re-reading from other mirror..

---> SMON: Parallel transaction recovery tried
30317 error message received from server=1.70(P01Y) qref:0x8de103cf0 qrser:5121 qrseq:3 mh:0x97fdf9460
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317

*** 2024-08-19T20:38:23.297997-04:00 (PWS1E(3))
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319

*** 2024-08-19T20:38:50.613855-04:00 (PWS1E(3))
30317 error message received from server=1.57(P01L) qref:0x8de109fe8 qrser:11265 qrseq:3 mh:0x95fccd3c8
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317
Parallel Transaction recovery caught exception 30319

TEST1E(3):about to recover undo segment 98 status:6 inst:0
TEST1E(3):mark undo segment 98 as available status:6 ret:1

TEST1E(3):about to recover undo segment 46 status:6 inst:0
TEST1E(3):mark undo segment 46 as available status:6 ret:1

The logs and trace files also highlighted an issue with two specific undo segments, identified by segment numbers 98 and 46, from the UNDO tablespace. Upon further investigation, we found that both segments were in a ‘RECOVERING’ state. What was particularly concerning was that the recovery process for these segments was progressing extremely slowly, with the v$fast_start_transactions view showing an unusually high estimated recovery time.

In fact, based on the progress we monitored, it seemed like the recovery process wasn’t moving forward at all and appeared to be stuck in some kind of loop. This stagnation in recovery added to the overall system’s delay, compounding the performance issues we were already facing. It became clear that this problem was a significant bottleneck in restoring the database to normal operation.

SQL> select * from V$FAST_START_TRANSACTIONS;

USN  SLT  SEQ      STATE       UNDOBLOCKSDONE  UNDOBLOCKSTOTAL  PID  CPUTIME  PARENTUSN  PARENTSLT  PARENTSEQ  XID               PXID              RCVSERVERS  CON_ID
---  ---  -------  ----------  --------------  ---------------  ---  -------  ---------  ---------  ---------  ----------------  ----------------  ----------  ------
 46   46  2313064  RECOVERING             505         24992423   77     5586          0          0          0  10001000684B2300  0000000000000000           1       0
 98   25  1352150  RECOVERING               0           226231   78     5586          0          0          0  30001900D6A11400  0000000000000000           1       0
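To put the UNDOBLOCKSDONE and UNDOBLOCKSTOTAL figures into perspective, a quick calculation helps. The sketch below is plain arithmetic on the two columns; recovery_progress and estimate_seconds_left are hypothetical helper names, and the two-sample estimate assumes you query the view twice and note the seconds elapsed in between.

```python
def recovery_progress(done_blocks, total_blocks):
    """Percentage of undo blocks already recovered."""
    return 100.0 * done_blocks / total_blocks


def estimate_seconds_left(done_then, done_now, total_blocks, elapsed_seconds):
    """Remaining time from two samples; None when no progress was made."""
    rate = (done_now - done_then) / elapsed_seconds  # blocks per second
    if rate <= 0:
        return None
    return (total_blocks - done_now) / rate


# Segment 46 from the output above: 505 of 24992423 blocks done.
print(round(recovery_progress(505, 24992423), 4))
# Two identical samples taken 10 minutes apart: no usable estimate.
print(estimate_seconds_left(505, 505, 24992423, 600))
```

A progress figure that stays this close to zero between samples is what told us the recovery was effectively not moving.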
	

SQL> SELECT segment_name, tablespace_name  FROM dba_rollback_segs  WHERE segment_id IN (98, 46);

SEGMENT_NAME		       TABLESPACE_NAME
------------------------------ ------------------------------
_SYSSMU46_5249279471$	       UNDOTEST1
_SYSSMU98_5249279471$	       UNDOTEST1

We attempted to take the segments offline and ultimately drop them, as they were associated with a materialized view (MV) refresh and a bulk insert statement. These operations were part of an ad-hoc activity, so it was acceptable for them to be missed. However, despite our efforts, the segments remained in a ‘PARTLY AVAILABLE’ state, leaving us with no option to drop or take them offline. This left us in a situation where we were essentially stuck, unable to proceed with dropping the segments or the associated tablespace. The inability to release these segments further complicated our recovery efforts.

We even checked the status of those two undo segments using the base table x$ktuxe: KTUXESTA (Status) came back as ‘ACTIVE’ with KTUXECFL (Flags) showing ‘DEAD’, which means the transaction has failed but is still holding resources, and that gave us more confidence about what happened under the hood.

SQL> select min(sample_time), max(sample_time), sql_id, xid, count(1) from dba_hist_active_sess_history 
where xid in ('10001000684B2300','30001900D6A11400') group by sql_id, xid;

MIN(SAMPLE_TIME)           MAX(SAMPLE_TIME)           SQL_ID         XID               COUNT(1)
-------------------------  -------------------------  -------------  ----------------  --------
15-SEP-24 01.22.25.446 PM  15-SEP-24 05.51.22.340 PM                 30001900D6A11400      3213
15-SEP-24 10.22.46.218 AM  15-SEP-24 01.22.15.440 PM  ac5hhandj9fh1  30001900D6A11400      2158  -------------->
13-SEP-24 08.31.54.374 PM  14-SEP-24 02.53.45.723 AM  annqr822no0a1  10001000684B2300      4578  -------------->
14-SEP-24 02.53.55.731 AM  15-SEP-24 05.51.22.340 PM                 10001000684B2300     27781

SQL> select sql_id, sql_text from dba_hist_sqltext where sql_id in ('annqr822no0a1','ac5hhandj9fh1');

SQL_ID SQL_TEXT
------------- --------------------------------------------------------------------------------
annqr822no0a1 INSERT INTO monkey.ah_ah3_xaa_131C (
ac5hhandj9fh1 /* MV_REFRESH (INS) */INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "monkey"."test_


SQL> ALTER ROLLBACK SEGMENT "_SYSSMU46_5249279471$" offline;

Rollback segment altered.

SQL> ALTER ROLLBACK SEGMENT "_SYSSMU98_5249279471$" offline;

Rollback segment altered.


SQL> SELECT segment_name, status, tablespace_name
FROM dba_rollback_segs
WHERE segment_name IN ('_SYSSMU98_5249279471$', '_SYSSMU46_5249279471$');
SEGMENT_NAME		       STATUS		TABLESPACE_NAME
------------------------------ ---------------- ------------------------------
_SYSSMU46_5249279471$	       PARTLY AVAILABLE UNDOTEST1
_SYSSMU98_5249279471$	       PARTLY AVAILABLE UNDOTEST1


SQL> SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA Status, KTUXECFL Flags FROM x$ktuxe 
WHERE ktuxesta!='INACTIVE' AND ktuxeusn=98;

KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS
---------- ---------- ---------- ---------------- ------------------------
98 25 1352150 ACTIVE DEAD

SQL> SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA Status, KTUXECFL Flags FROM x$ktuxe 
WHERE ktuxesta!='INACTIVE' AND ktuxeusn=46;

KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS
---------- ---------- ---------- ---------------- ------------------------
46 46 2313064 ACTIVE DEAD

Given that this is a critical production system, we couldn’t afford to wait for a complete recovery of the affected undo segments. To mitigate the issue, we created a new undo tablespace and designated it as the default for the database. This action enabled us to resume normal operations while the recovery of the problematic segments continued in the background.

However, the underlying mystery remains: why are we unable to drop these segments in the production environment? To investigate further, we cloned the production database and set up a test instance. To our surprise, we replicated the same situation, where both segments 46 and 98 appeared again in a ‘PARTLY AVAILABLE’ state, providing no options for us to drop them.

In our exploration, we first enabled the FAST_START_PARALLEL_ROLLBACK parameter, which determines the number of processes that participate in parallel rollback during transaction rollbacks, typically following an instance failure or a large manual rollback. We set this parameter to HIGH, as it significantly accelerates the rollback process for large transactions, particularly in scenarios involving instance failures or extensive operations requiring manual rollback.

Additionally, we experimented with the undocumented parameter _OFFLINE_ROLLBACK_SEGMENTS, which is intended to control the state of rollback segments.
Note: When dealing with hidden or undocumented parameters, it’s crucial to consult with Oracle support or rely on prior experience, as these settings can lead to unforeseen consequences in production environments.

We ran the query below to dynamically generate the ALTER statement for the segments that we need to set offline.

SQL>  select 'ALTER SYSTEM SET "_OFFLINE_ROLLBACK_SEGMENTS"='||listagg(''''||segment_name||'''',',') WITHIN GROUP (ORDER BY segment_name)||' scope=spfile;' from dba_rollback_segs 
where tablespace_name='UNDOTEST1' and status in ('NEEDS RECOVERY','PARTLY AVAILABLE');

ALTER SYSTEM SET "_OFFLINE_ROLLBACK_SEGMENTS"='_SYSSMU46_5249279471$','_SYSSMU98_5249279471$' scope=spfile;
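The LISTAGG query above builds one statement with a comma-separated list of quoted segment names. That matters because setting the parameter a second time with scope=spfile replaces the earlier value rather than adding to it. The same string building can be sketched in Python; build_offline_statement is a hypothetical helper, shown only to make the expected shape of the statement obvious.

```python
def build_offline_statement(segment_names):
    """Build one ALTER SYSTEM statement listing every segment to keep offline."""
    if not segment_names:
        raise ValueError("at least one segment name is required")
    quoted = ",".join("'" + name + "'" for name in sorted(segment_names))
    return ('ALTER SYSTEM SET "_OFFLINE_ROLLBACK_SEGMENTS"='
            + quoted + " scope=spfile;")


segments = ["_SYSSMU98_5249279471$", "_SYSSMU46_5249279471$"]
print(build_offline_statement(segments))
```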

Shut down the database and start it up as normal after setting the above parameter.

shutdown immediate;
startup;

Finally, generate and run the drop statements.
SQL> select 'drop rollback segment "'||segment_name||'";' from dba_rollback_segs 
where tablespace_name='UNDOTEST1' and status in ('NEEDS RECOVERY','PARTLY AVAILABLE');

drop rollback segment "_SYSSMU98_5249279471$";
drop rollback segment "_SYSSMU46_5249279471$";

Run the two drop rollback segment statements above, bounce the database again, and finally drop the problematic undo tablespace. Do not forget to reset the ‘_OFFLINE_ROLLBACK_SEGMENTS’ parameter afterwards and bounce the database one more time.

SQL>  shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>  startup;


SQL>  drop tablespace UNDOTEST1;
Tablespace dropped.


SQL>  Alter System reset "_OFFLINE_ROLLBACK_SEGMENTS";
System altered.

SQL>  shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>  startup;

Although it was a lengthy and demanding process involving numerous experiments, the results were ultimately positive. We encountered no errors and successfully dropped the problematic segments, freeing the database from the issues that had plagued it. This experience not only resolved our immediate concerns but also provided valuable insights into managing similar challenges in the future.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Database, oracle, oracle-database, performance, SQL, sql-server, troubleshooting

Some weirdness with V$DIAG_ALERT_EXT and magical support from Oracle Customer Support.

Posted by FatDBA on December 13, 2023

Hi All,

Recently, one of our custom monitoring scripts, which comprises more than 2000 lines of PL/SQL code covering all possible performance areas of the Oracle database, started behaving strangely on some 19c (19.15.0) databases. Usually, the report completes within 10-12 minutes on databases running version 12.2 and above, but it began taking approximately 30 minutes for a single run, with worst times exceeding 40 minutes. This issue seems to occur specifically on our 19c instances.

During analysis, we identified one of the underlying SQL queries in our script that queries V$DIAG_ALERT_EXT as the culprit, consuming most of the time and significantly exceeding average execution times. We utilize V$DIAG_ALERT_EXT to access the XML-formatted alert log from the Automatic Diagnostic Repository (ADR), particularly in cases where we can only access the database through integrated development environments (IDEs) like SQL Developer, and direct access to the databases is unavailable.

The SQL queries are straightforward. One of the simple ones we used focuses on capturing INCIDENT_ERROR and ERROR type codes. We have implemented filter conditions to select rows where the message_type is either 2 or 3 and where the originating_timestamp is within the last 8 hours (sysdate - 8/24 represents the current date and time minus 8 hours).

SELECT TO_CHAR(originating_timestamp,'DD/MM/YYYY HH24:MI:SS') AS time, message_text
FROM v$diag_alert_ext
WHERE message_type IN (2, 3) AND originating_timestamp > sysdate - 8/24
ORDER BY RECORD_ID;
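For readers less used to Oracle date arithmetic, the filter logic of this query can be mirrored in Python. This is just an illustration of the predicate: recent_errors is a hypothetical helper and the records are invented dictionaries, not rows fetched from V$DIAG_ALERT_EXT.

```python
from datetime import datetime, timedelta


def recent_errors(records, now, hours=8):
    """Keep message types 2 and 3 newer than the cutoff, ordered by record id."""
    cutoff = now - timedelta(hours=hours)  # sysdate - 8/24 is "now minus 8 hours"
    kept = [r for r in records
            if r["message_type"] in (2, 3) and r["originating_timestamp"] > cutoff]
    return sorted(kept, key=lambda r: r["record_id"])


# Invented sample rows, for illustration only.
now = datetime(2023, 12, 13, 12, 0, 0)
sample = [
    {"record_id": 3, "message_type": 3, "originating_timestamp": now - timedelta(hours=1)},
    {"record_id": 1, "message_type": 2, "originating_timestamp": now - timedelta(hours=9)},
    {"record_id": 2, "message_type": 5, "originating_timestamp": now - timedelta(hours=2)},
    {"record_id": 4, "message_type": 2, "originating_timestamp": now - timedelta(minutes=30)},
]
print([r["record_id"] for r in recent_errors(sample, now)])  # [3, 4]
```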

Initially, we were perplexed by the problem but were confident that it was specific to 19c databases, as all other versions where we had this script scheduled were working perfectly. After exhausting all optimization attempts and with no insights into what was happening with this specific dynamic view on this database version, we decided to engage Oracle support. They promptly confirmed that the issue was due to a known bug in Oracle 19c database, Bug 33513906, which is resolved in the 19.16.0.0.220719 (July 2022) Database Release Update (DB RU).

Oracle Database support stands out as one of the best I have worked with. They possess extensive knowledge about their products, provide comprehensive private and public documentation, and, in the presence of all diagnostic files, swiftly identify issues. Moreover, they offer both temporary and permanent fixes to problems.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Database, oracle, performance, troubleshooting

A story of Advanced Queues mayhem … ORA-24002 ORA-04063

Posted by FatDBA on October 9, 2023

Hi Everyone,

Today’s post is about an interesting scenario faced by my colleagues while they were doing an application upgrade and migration. For various reasons, the upgrade had to be rolled back, and as part of the strategy, they had to drop specific schemas touched by the failed upgrade and import a healthy backup taken just before the activity.

The steps were simple and had been well discussed and vetted. While doing the import, they started getting errors related to AQ (Oracle Advanced Queuing). The errors indicated that the import job had failed to create the AQs with a ‘no data found’ message. Many of the AQs were not imported, and the queues were in an INVALID state.

21-SEPT-23 17:20:40.797: ORA-39083: Object type PROCDEPOBJ:"TESTDBSC"."DPF_OUT" failed to create with error:
ORA-01403: no data found
ORA-01403: no data found
	
Failing sql is:
BEGIN
SYS.DBMS_AQ_IMP_INTERNAL.IMPORT_QUEUE(HEXTORAW('ASHQ18371NDDU1842NXXXXXXXXXXX'),
'AQTST_PTT_OUT','AQ$_AQTST_PTT_OUT_E',1,0,0,0,0,'exception queue');COMMIT; END;

We even tried to drop the schema, but it failed with errors reporting missing advanced queue tables. We then tried to drop the queue table and force all queues to be stopped and dropped by the system, but ran into a different set of errors.

SQL> drop user TESTDBSC cascade;
drop user TESTDBSC cascade
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-24002: QUEUE_TABLE TESTDBSC.AQTST_XXX does not exist
ORA-06512: at "SYS.DBMS_AQADM", line 707
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 7735
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 7070
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 7402
ORA-06512: at "SYS.DBMS_AQADM", line 702
ORA-06512: at line 1




begin dbms_aqadm.stop_queue (queue_name => 'AQ$_AQTST_PTT_IN_E', enqueue => TRUE, dequeue => FALSE); end;
*
ERROR at line 1:
ORA-04063: TESTDBSC.AQTST_XXX_IN has errors
ORA-06512: at "SYS.DBMS_AQADM", line 788
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8702
ORA-06512: at "SYS.DBMS_AQADM_SYSCALLS", line 926
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8687
ORA-06512: at "SYS.DBMS_AQADM", line 783
ORA-06512: at line 1

During the investigation, we found there were ~70 individual queue tables in the db that needed to be cleared. There were no entries in DBA_QUEUE_TABLES, so we couldn’t use the AQ API to force drop them. A few child or orphan objects still existed and needed to be cleared. These child or dependent objects were Views, Sequences, EVALUATION CONTEXT etc., apart from Queues and Tables.

OBJECT_ID	   OBJECT_NAME	                    OBJECT_TYPE
-----------    ------------------------------   --------------------------------------- 
124238	       AQ$_AQTST_FATDBA_UPD_V	        EVALUATION CONTEXT
124239	       AQ$_AQTST_FATDBA_UPD_N	        SEQUENCE
124259	       AQ$_AQTST_MONKEYINDC_UPD_N	    VIEW

As a breakthrough, we found that all of the objects have the prefix AQTST, so it would be a little easier to clean them up.

SQL> select QUEUE_TABLE from dba_queue_tables where owner='TESTDBSC';

QUEUE_TABLE
------------------------------------------------------------
AQTST_XXXXXXXXXXXX_UPD
AQTST_XXXXXX_IN_UPD
AQTST_XXXXXXXXXXXX_UPD
AQTST_XXXXXX_IN_UPD
AQTST_XXXXXX_OUT_UPD

We first started dropping SEQUENCES, just to lower the number of elements, followed by Views and Queue Tables. For Queue tables, we set the 10851 debugging event, which allows the drop command to drop queue tables. This was a last resort to drop the queue tables if all other options failed. It made sense here, as we were going to manually clean up the AQ metadata after this operation.

select 'drop view "'||view_name||'";' as statements from user_views;

drop sequence AQ$_AQTST_XXXXXXXXXXXX_UPD_N;	
drop sequence AQ$_AQTST_XXXXXXXXXXXX_UPD_N;
drop sequence AQ$_AQTST_XXXX_XXXXX_IN_N;
.....
........

ALTER SESSION SET EVENTS '10851 trace name context forever, level 2';

-- Above command went fine, hence moved to the below step
-- Connected with the same schema that we tried to drop earlier and were failing 
select 'drop table "'||table_name||'" cascade constraints;' as statements from user_tables;
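When generating drop statements like these, the closing double quote has to come right after the object name, with CASCADE CONSTRAINTS outside the quotes. A small Python sketch of the same generator makes the quoting and the drop order (sequences, then views, then tables) explicit. The helper names and the sample object names are hypothetical.

```python
DROP_ORDER = ["sequence", "view", "table"]


def drop_statement(object_type, name):
    """Build one drop statement; only tables get CASCADE CONSTRAINTS."""
    suffix = " cascade constraints" if object_type == "table" else ""
    return 'drop {} "{}"{};'.format(object_type, name, suffix)


def build_cleanup_script(objects):
    """objects: iterable of (object_type, name); statements follow DROP_ORDER."""
    ordered = sorted(objects, key=lambda item: DROP_ORDER.index(item[0]))
    return [drop_statement(kind, name) for kind, name in ordered]


# Invented object names carrying the AQTST prefix, for illustration only.
sample_objects = [
    ("table", "AQTST_DEMO_UPD"),
    ("sequence", "AQ$_AQTST_DEMO_UPD_N"),
    ("view", "AQ$AQTST_DEMO_UPD"),
]
for statement in build_cleanup_script(sample_objects):
    print(statement)
```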

Next, we need to manually start the cleanup.
Note: Take a valid backup before proceeding with the manual cleanup as it involves deleting from SYS objects.

We started the cleanup for remaining AQ objects related to %AQTST% from user # (TESTDBSC). First we deleted all object IDs of orphan objects from system tables i.e. SYSTEM.AQ$_QUEUES, SYS.OBJ$, SYSTEM.AQ$_QUEUE_TABLES etc.

Note: You can use Oracle’s provided AQ & MGW Health Check and Configuration Script (Doc ID 1193854.1) which will help you to quickly point out INVALID objects, object IDs and AQs consistency information.

SQL>  DELETE FROM SYSTEM.AQ$_QUEUES WHERE table_objno in (<object_id>,<object_id>,<object_id>,<object_id>,<object_id>);
1 rows deleted.

SQL>  DELETE FROM SYS.OBJ$ WHERE obj# IN (<object_id> ,<object_id>,<object_id>,<object_id>,
<object_id>,<object_id>,<object_id>,<object_id>);
34 rows deleted.

DELETE FROM SYSTEM.AQ$_QUEUE_TABLES where objno in (<object_id>,<object_id>,<object_id>,
<object_id>,<object_id>,<object_id>,<object_id>,<object_id>)
/

SQL> commit;
Commit complete.

Next we decided to drop the rule sets and rule evaluation contexts.

execute DBMS_RULE_ADM.DROP_EVALUATION_CONTEXT ('TESTDBSC.AQ$_AQTST_XXXXXXXXXXXX_UPD_V',TRUE)
execute DBMS_RULE_ADM.DROP_EVALUATION_CONTEXT ('TESTDBSC.AQ$_AQTST_XXXX_XXXX_V',TRUE)
..
....
execute DBMS_RULE_ADM.DROP_RULE_SET ('TESTDBSC.AWARD_XXX_XXXX_N',TRUE);
execute DBMS_RULE_ADM.DROP_RULE_SET ('TESTDBSC.AWARD_XXXX_XXXX_R',TRUE);
...
....

Now we have all of the queues named %AQTST% properly cleaned from the impacted schema. Just as a precautionary measure, we flushed our shared pool. We tried to drop the user after manual cleanup and were able to drop the schema which was earlier throwing AQ related errors.

So, in short, all of this happened because of an AQ metadata mismatch, which led to a failed import because the schema contained queues, and we had to manually clean up the advanced queue metadata.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Database, oracle, troubleshooting | Leave a Comment »

A simple lovely Python script to get complete row locking details in Oracle database …

Posted by FatDBA on August 12, 2023

Hi Guys,

Recently I was working on a row locking contention issue where a particular row was locked in exclusive mode, which caused all subsequent sessions trying to modify that row to wait on the ‘enq: TX - row lock contention’ wait event. This was a classic pessimistic row locking scenario caused by an application design problem.

The situation is not new for most DBAs: they know what is causing the block, the relationship between parent and child blockers, the blocking and blocked SIDs, the lock modes and so on. Things are harder for non-DBA users, who don’t know where to go, whom to call or what to check when locking in the database is stopping their program from finishing and they are left scratching their heads.

I have written a Python script which connects to the Oracle database through the cx_Oracle module, using the connection details set at the top of the script, and executes the blocking-specific SQL statements embedded inside the Python code. I have also added exception handling for the cases where a statement fails, for example because of a syntax error. I added color coding and result separators too, to make the output easy to read. Embedding the SQL makes the script self-contained and very easy to run on any system.

The code is pasted below and is also available on my GitHub website. Here is the link to download the source code —> https://github.com/fatdba/OtherScripts/blob/main/Python-Locking-Main.py

We only have to make sure that we have Python, PIP and Python Module cx_Oracle installed on the server/host.

[oracle@fatdba ~]$ python --version
Python 2.7.5
[root@fatdba ~]# pip install cx_Oracle==7.3
Collecting cx_Oracle==7.3
  Downloading https://files.pythonhosted.org/packages/71/2a/91eb1ccb37a249772a93620ee0539a3f902b187ddb82978d8519abe03899/cx_Oracle-7.3.0-cp27-cp27mu-manylinux1_x86_64.whl (728kB)
    100% |████████████████████████████████| 737kB 1.3MB/s
Installing collected packages: cx-Oracle
Successfully installed cx-Oracle-7.3.0
You are using pip version 8.1.2, however version 23.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@fatdba ~]#
[root@fatdba ~]$ pip --version
pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7)
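
If you want to confirm the prerequisite from Python itself before running the tool, a quick check like the one below can help. This is only a sketch: module_available is a hypothetical helper name, and it only tells you that the module can be imported, not that the Oracle Client libraries behind it are usable.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


print("cx_Oracle installed: {}".format(module_available("cx_Oracle")))
```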


Here is the code of the tool.

import cx_Oracle

# Database connection details
db_username = "system"
db_password = "xxxx"
db_host = "hostname-fqdn"
db_port = "1521"
db_service = "xxxx"

bold_start = "\033[1m"
color_green = "\033[32m"
reset_format = "\033[0m"

def print_colored(text, color_code):
    colored_text = "{}{}{}".format(color_code, text, reset_format)
    print(colored_text)

# Define the SQL statements
sql_statements = [
'''
SELECT rpad(instance_name, 17) current_instance, status, STARTUP_TIME, HOST_NAME, version, blocked, DATABASE_STATUS FROM v$instance
''',
"""
alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS'
""",
"""
SELECT
    OUTPUT || CHR(10) || RPAD('-', LENGTH(OUTPUT) - LENGTH(REPLACE(OUTPUT, CHR(10), '')), '-') AS OUTPUT
FROM (
    SELECT
    'INST_ID -->  '||x.INST_ID || CHR(10) ||
    'Serial ID -->  '||x.sid || CHR(10) ||
    'Serial Num -->  '||x.serial# || CHR(10) ||
    'User Name -->  '||x.username || CHR(10) ||
    'Session Status -->  '||x.status || CHR(10) ||
    'Program -->  '||x.program || CHR(10) ||
    'Module -->  '||x.Module || CHR(10) ||
    'Action -->  '||x.action || CHR(10) ||
    'Machine -->  '||x.machine || CHR(10) ||
    'OS_USER -->  '||x.OSUSER || CHR(10) ||
    'Process -->  '||x.process || CHR(10) ||
    'State -->  '||x.State || CHR(10) ||
    'EVENT -->  '||x.event || CHR(10) ||
    'SECONDS_IN_WAIT -->  '||x.SECONDS_IN_WAIT || CHR(10) ||
    'LAST_CALL_ET -->  '||x.LAST_CALL_ET || CHR(10) ||
    'SQL_PROFILE --> '||SQL_PROFILE || CHR(10) ||
    'ROWS_PROCESSED --> '||ROWS_PROCESSED || CHR(10) ||
    'BLOCKING_SESSION_STATUS --> '||BLOCKING_SESSION_STATUS || CHR(10) ||
    'BLOCKING_INSTANCE --> '||BLOCKING_INSTANCE || CHR(10) ||
    'BLOCKING_SESSION --> '||BLOCKING_SESSION || CHR(10) ||
    'FINAL_BLOCKING_SESSION_STATUS --> '||FINAL_BLOCKING_SESSION_STATUS || CHR(10) ||
    'SQL_ID -->  '||x.sql_id || CHR(10) ||
    'SQL_TEXT -->  '||SQL_TEXT || CHR(10) ||
    'Logon Time -->  '||TO_CHAR(x.LOGON_TIME, 'MM-DD-YYYY HH24MISS') || CHR(10) ||
    'RunTime -->  '||ltrim(to_char(floor(x.LAST_CALL_ET/3600), '09')) || ':'
    || ltrim(to_char(floor(mod(x.LAST_CALL_ET, 3600)/60), '09')) || ':'
    || ltrim(to_char(mod(x.LAST_CALL_ET, 60), '09')) || CHR(10) AS OUTPUT,
    x.LAST_CALL_ET AS RUNT
    FROM   gv$sqlarea sqlarea
    ,gv$session x
    WHERE  x.sql_hash_value = sqlarea.hash_value
    AND    x.sql_address    = sqlarea.address
    AND    x.status='ACTIVE'
    AND x.event like '%row lock contention%'
    AND SQL_TEXT not like '%SELECT     OUTPUT || CHR(10)%'
    AND x.USERNAME IS NOT NULL
)
ORDER BY RUNT DESC
""",
"""
with lk as (select blocking_instance||'.'||blocking_session blocker, inst_id||'.'||sid waiter
 from gv$session where blocking_instance is not null and blocking_session is not null and username is not null)
 select lpad(' ',2*(level-1))||waiter lock_tree from
 (select * from lk
 union all
 select distinct 'root', blocker from lk
 where blocker not in (select waiter from lk))
 connect by prior waiter=blocker start with blocker='root'
""",
"""
SELECT DECODE (l.BLOCK, 0, 'Waiting', 'Blocking ->') user_status
,CHR (39) || s.SID || ',' || s.serial# || CHR (39) sid_serial
,(SELECT instance_name FROM gv$instance WHERE inst_id = l.inst_id)
conn_instance
,s.SID
,s.PROGRAM
,s.inst_id
,s.osuser
,s.machine
,DECODE (l.TYPE,'RT', 'Redo Log Buffer','TD', 'Dictionary'
,'TM', 'DML','TS', 'Temp Segments','TX', 'Transaction'
,'UL', 'User','RW', 'Row Wait',l.TYPE) lock_type
,DECODE (l.lmode,0, 'None',1, 'Null',2, 'Row Share',3, 'Row Excl.'
,4, 'Share',5, 'S/Row Excl.',6, 'Exclusive'
,LTRIM (TO_CHAR (lmode, '990'))) lock_mode
,ctime
,object_name
FROM
   gv$lock l
JOIN
   gv$session s
ON (l.inst_id = s.inst_id
AND l.SID = s.SID)
JOIN gv$locked_object o
ON (o.inst_id = s.inst_id
AND s.SID = o.session_id)
JOIN dba_objects d
ON (d.object_id = o.object_id)
WHERE (l.id1, l.id2, l.TYPE) IN (SELECT id1, id2, TYPE
FROM gv$lock
WHERE request > 0)
ORDER BY id1, id2, ctime DESC
""",
"""
select l1.sid, ' IS BLOCKING ', l2.sid
from gv$lock l1, gv$lock l2 where l1.block =1 and l2.request > 0
and l1.id1=l2.id1
and l1.id2=l2.id2
""",
"""
select s2.inst_id,s1.username || '@' || s1.machine
 || ' ( SID=' || s1.sid || ' )  is blocking '
 || s2.username || '@' || s2.machine || ' ( SID=' || s2.sid || ' ) ' AS blocking_status
  from gv$lock l1, gv$session s1, gv$lock l2, gv$session s2
  where s1.sid=l1.sid and s2.sid=l2.sid and s1.inst_id=l1.inst_id and s2.inst_id=l2.inst_id
  and l1.BLOCK=1 and l2.request > 0
  and l1.id1 = l2.id1
  and l1.id2 = l2.id2
order by s1.inst_id
""",
"""
SELECT sid,
                                TYPE,
                                DECODE( block, 0, 'NO', 'YES' ) BLOCKER,
        DECODE( request, 0, 'NO', 'YES' ) WAITER,
        decode(LMODE,1,'    ',2,'RS',3,'RX',4,'S',5,'SRX',6,'X','NONE') lmode,
                                 decode(REQUEST,1,'    ',2,'RS',3,'RX',4,'S',5,'SRX',6,'X','NONE') request,
                                TRUNC(CTIME/60) MIN ,
                                ID1,
                                ID2,
        block
                        FROM  gv$lock
      where request > 0 OR block =1
""",
"""
select  sn.USERNAME,
        m.SID,
        sn.SERIAL#,
        m.TYPE,
        decode(LMODE,
                0, 'None',
                1, 'Null',
                2, 'Row-S (SS)',
                3, 'Row-X (SX)',
                4, 'Share',
                5, 'S/Row-X (SSX)',
                6, 'Exclusive') lock_type,
        decode(REQUEST,
                0, 'None',
                1, 'Null',
                2, 'Row-S (SS)',
                3, 'Row-X (SX)',
                4, 'Share',
                5, 'S/Row-X (SSX)',
                6, 'Exclusive') lock_requested,
        m.ID1,
        m.ID2,
        t.SQL_TEXT
from    v$session sn,
        v$lock m ,
        v$sqltext t
where   t.ADDRESS = sn.SQL_ADDRESS
and     t.HASH_VALUE = sn.SQL_HASH_VALUE
and     ((sn.SID = m.SID and m.REQUEST != 0)
or      (sn.SID = m.SID and m.REQUEST = 0 and LMODE != 4 and (ID1, ID2) in
        (select s.ID1, s.ID2
         from   gv$lock S
         where  REQUEST != 0
         and    s.ID1 = m.ID1
         and    s.ID2 = m.ID2)))
order by sn.USERNAME, sn.SID, t.PIECE
"""
]

banner = """
=========================================================
      Locking Stats Report
        Author: Prashant Dixit
        Version : 1.0
       Date : 2023-August-11
========================================================
"""
print_colored(banner, color_green)

connection = None
try:
    # Establishing a database connection
    dsn = cx_Oracle.makedsn(db_host, db_port, service_name=db_service)
    connection = cx_Oracle.connect(user=db_username, password=db_password, dsn=dsn)

    # Executing each SQL statement
    cursor = connection.cursor()
    for statement in sql_statements:
        statement = statement.strip()
        if statement:
            try:
                cursor.execute(statement)

                # Only statements that start with SELECT have their rows printed;
                # anything else (ALTER SESSION, the WITH lock-tree query) is just executed.
                if statement.upper().startswith("SELECT"):
                    result = cursor.fetchall()
                    if result:
                        column_names = [desc[0] for desc in cursor.description]
                        print_colored("Column Headers: " + ", ".join(column_names), color_green)
                        print("Results:")

                        for row in result:
                            print("Row:")
                            for col_name, col_value in zip(column_names, row):
                                print("{}: {}".format(col_name, col_value))
                            print("\n")

                    else:
                        print("No results.")

            except Exception as e:
                # A single formatted string prints the same way on Python 2 and Python 3
                print("Error executing statement: {}".format(statement))
                print("Error details: {}".format(str(e)))

    connection.commit()
    print("SQL statements execution completed.")

except Exception as e:
    # connection is still None if the connect call itself failed
    if connection is not None:
        connection.rollback()
    print("An error occurred: {}".format(str(e)))

finally:
    if connection is not None:
        connection.close()
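
One thing to keep in mind about the loop above: it decides whether to fetch and print rows by checking if the statement starts with SELECT, so the lock-tree query, which starts with WITH, gets executed but its rows are never shown. A possible alternative check is sketched below. returns_rows is a hypothetical helper and not part of the published script; another option under the Python DB-API is to look at cursor.description after execute(), which is None for statements that do not return rows.

```python
def returns_rows(statement):
    """Guess whether a SQL statement is a query by looking at its first keyword."""
    words = statement.strip().split(None, 1)
    if not words:
        return False
    return words[0].upper() in ("SELECT", "WITH")


# The statements below are shortened examples, not the full queries from the script.
for sql in ("select 1 from dual",
            "with lk as (select 1 x from dual) select * from lk",
            "alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS'"):
    print("{!r:70} -> {}".format(sql, returns_rows(sql)))
```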

…..

……

The output will look like this.

[oracle@fatdbatest ~]$ python locking.py

=========================================================
      Locking Stats Report
        Author: Prashant Dixit 
        Version : 1.0
       Date : 2023-August-11
========================================================

Column Headers: CURRENT_INSTANCE, STATUS, STARTUP_TIME, HOST_NAME, VERSION, BLOCKED, DATABASE_STATUS
Results:
Row:
CURRENT_INSTANCE: fatdba
STATUS: OPEN
STARTUP_TIME: 2023-06-03 19:42:57
HOST_NAME: fatdba.com
VERSION: 19.0.0.0.0
BLOCKED: NO
DATABASE_STATUS: ACTIVE


Column Headers: OUTPUT
Results:
Row:
OUTPUT: INST_ID -->  1
Serial ID -->  5065
Serial Num -->  17982
User Name -->  SYS
Session Status -->  ACTIVE
Program -->  sqlplus@fatdba.com (TNS V1-V3)
Module -->  sqlplus@fatdba.com (TNS V1-V3)
Action -->
Machine -->  fatdba.com
OS_USER -->  oracle
Process -->  6873
State -->  WAITING
EVENT -->  enq: TX - row lock contention
SECONDS_IN_WAIT -->  3
LAST_CALL_ET -->  4
SQL_PROFILE -->
ROWS_PROCESSED --> 0
BLOCKING_SESSION_STATUS --> VALID
BLOCKING_INSTANCE --> 1
BLOCKING_SESSION --> 5062
FINAL_BLOCKING_SESSION_STATUS --> VALID
SQL_ID -->  vdksq7js9b0cp
SQL_TEXT -->  update locking set id=100
Logon Time -->  08-11-2023 215112
RunTime -->  00:00:04

-------------------------

Column Headers: USER_STATUS, SID_SERIAL, CONN_INSTANCE, SID, PROGRAM, INST_ID, OSUSER, MACHINE, LOCK_TYPE, LOCK_MODE, CTIME, OBJECT_NAME
Results:
Row:
USER_STATUS: Blocking ->
SID_SERIAL: '5062,50729'
CONN_INSTANCE: fatdba
SID: 5062
PROGRAM: sqlplus@fatdba.com (TNS V1-V3)
INST_ID: 1
OSUSER: oracle
MACHINE: fatdba.com
LOCK_TYPE: Transaction
LOCK_MODE: Exclusive
CTIME: 9241
OBJECT_NAME: LOCKING


Row:
USER_STATUS: Waiting
SID_SERIAL: '5065,17982'
CONN_INSTANCE: fatdba
SID: 5065
PROGRAM: sqlplus@fatdba.com (TNS V1-V3)
INST_ID: 1
OSUSER: oracle
MACHINE: fatdba.com
LOCK_TYPE: Transaction
LOCK_MODE: None
CTIME: 5
OBJECT_NAME: LOCKING

Column Headers: SID, 'ISBLOCKING', SID
Results:
Row:
SID: 5062
'ISBLOCKING':  IS BLOCKING
SID: 5065


Column Headers: INST_ID, BLOCKING_STATUS
Results:
Row:
INST_ID: 1
BLOCKING_STATUS: SYS@fatdba.com ( SID=5062 )  is blocking SYS@fatdba.com ( SID=5065 )


Column Headers: SID, TYPE, BLOCKER, WAITER, LMODE, REQUEST, MIN, ID1, ID2, BLOCK
Results:
Row:
SID: 5062
TYPE: TX
BLOCKER: YES
WAITER: NO
LMODE: X
REQUEST: NONE
MIN: 154
ID1: 327688
ID2: 5988830
BLOCK: 1


Row:
SID: 5065
TYPE: TX
BLOCKER: NO
WAITER: YES
LMODE: NONE
REQUEST: X
MIN: 0
ID1: 327688
ID2: 5988830
BLOCK: 0


Column Headers: USERNAME, SID, SERIAL#, TYPE, LOCK_TYPE, LOCK_REQUESTED, ID1, ID2, SQL_TEXT
Results:
Row:
USERNAME: SYS
SID: 5065
SERIAL#: 17982
TYPE: TX
LOCK_TYPE: None
LOCK_REQUESTED: Exclusive
ID1: 327688
ID2: 5988830
SQL_TEXT: update locking set id=100
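
The lock_tree query in the script builds an indented blocker/waiter tree with CONNECT BY. For readers more at home in Python than in hierarchical SQL, here is a rough equivalent of that logic. It is a sketch under assumptions: lock_tree is a hypothetical function, the input is a list of (blocker, waiter) pairs in the same 'inst_id.sid' form the query uses, and each waiter is assumed to have a single blocker. The sample pair is taken from the two sessions shown in the output above.

```python
from collections import defaultdict


def lock_tree(edges):
    """Render (blocker, waiter) pairs as indented lines, root blockers first."""
    children = defaultdict(list)
    waiters = set()
    for blocker, waiter in edges:
        children[blocker].append(waiter)
        waiters.add(waiter)

    # Root blockers hold a lock but are not themselves waiting on anyone.
    roots = sorted(b for b in children if b not in waiters)

    lines = []

    def walk(session, depth):
        # Two spaces of indentation per level, like lpad(' ', 2*(level-1)).
        lines.append("  " * depth + session)
        for child in sorted(children.get(session, [])):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Instance 1: SID 5062 is blocking SID 5065 (from the sample output).
for line in lock_tree([("1.5062", "1.5065")]):
    print(line)
```

The root blocker is printed without indentation and every waiter sits one level below the session it is waiting on, which is the same shape the SQL version produces.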

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: oracle, performance, tools, troubleshooting | 1 Comment »


Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯
