Hi Everyone,
Recently I was asked to check a mono-directional Integrated Golden Gate (Version 12.3.0.1.2) replication problem where the replicat was not processing the data and got struck with a huge lag of ~ 3.5 hours. Customer reported that the GG stuck due to CREATE INDEX statements running from last 3 hours and want to skip those transactions.
This being an ad-hoc request, I didn’t had any direct access to their systems and can only ask for files or outputs from their operations team. And as per reports shared, yes, there was a delay of around 3.5 hours for the replicat and was showing in RUNNING state.
-- Process stats on the target
GGSCI (cana01db66664b.prod.sdt.monkey12.se) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING TORONTO 03:29:11 00:00:03
-- From replicat parameter file
--DDL Options Used --
DDL INCLUDE MAPPED OBJNAME PDIXIT.*
DDLERROR DEFAULT IGNORE
DDLERROR 942 IGNORE
DDLERROR 955 IGNORE
DDLERROR 1435 IGNORE INCLUDE OPTYPE ALTER OBJTYPE SESSION
DDLERROR 904 IGNORE
DDLERROR 1430 IGNORE
While I was examining their parameter file saw they’re using BATCHSQL and HANDLECOLLISIONS to improve performance. Updated them that the use of the HANDLECOLLISIONS parameter can also cause performance issues if there are collisions since there needs to be additional processing done with those records. Hence, It is recommended to remove this parameter, if you are “not doing your initial load” for your tables. Most of transactions are happening in normal mode, hence, asked them to remove BATCHSQL parameter too.
Next, I asked them to try with DDLOPTIONS REPORT to be set in the parameter file, as this settings causes Integrated Extract to write a step-by-step history of all DDL operations captured to it’s report file; which is very useful when troubleshooting DDL replication issues. After they made a change to the parameter file and a bounce of replicat, I checked the logs, and per logs, below DDL was already executed
--> 2021-10-20 08:19:30 INFO OGG-00489 DDL is of mapped scope, after mapping new operation [create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607) (size 82)].
However , DDL error ignored due to ORA-04021 and as they have DDLERROR ignore parameters in parameter file.
--> 2021-10-20 08:34:32 INFO OGG-00492 DDL error ignored: error code [DEFAULT], filter [include all (default)], error text [Error code [4021], ORA-04021: timeout occurred while waiting to lock object PDIXIT.IDX_TEST121 SQL create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607) /* GOLDENGATE_DDL_REPLICATION */].
Asked them to check whether the index “PDIXIT”.”IDX_TEST121″ has been created on the target. If not then they may manually need to create this index on the target, but as per them, the index statement can be ignored and they only want to move the GG further, as their entire processing system was stopped and caused a massive delay.
Sid,Ser# USERNAME STATUS SQL_ID SQL_CHILD_NUMBER SQL_HASH_VALUE Logon Time OSUser@Machine pid
TERMINAL PROGRAM MODULE
------------- ------------- --------------- --------------- ---------------- -------------- ------------- -------------------- -------
-------- ----------------------------------- -----------------------------------
6781,54621 GGGTEST ACTIVE u52csv8kcnb34 0 617229321 23-AUG 06:55 oracle@cana01db66664 33329 r
eplicat@cana01db66664b.prod.sdt.er OGG-RCITSA00-CANA_DAIS_GREATC
b.prod.sdt.monkey12.
se
SQL> select sql_fulltext from gv$sql where sql_id='u52csv8kcnb34';
SQL_FULLTEXT
--------------------------------------------------------------------------------
create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607
) /* GOLDENGATE_DDL_REPLICATION */
create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607
) /* GOLDENGATE_DDL_REPLICATION */
Locking Object Object Lock
SID SER# STATUS User Os user Os Proc Owner Name Mode
------ ------ --------------- -------------------- -------------------- -------------------- -------------------- ------------------------- ----------
6781 54621 ACTIVE GGGTEST oracle 33329 SYS OBJ$ Row-X (SX)
6781 54621 ACTIVE GGGTEST oracle 33329 PDIXIT TAB12 Share
I immediately asked them to kill those database session which was running above DDLs, and the lock issue got resolved the moment they killed them the lag was drained in few seconds. Next we added below parameter in to the replicat to exclude CREATE INDEX statements.
DDL INCLUDE MAPPED OBJNAME PDIXIT.* exclude INSTR 'CREATE INDEX'
Customer also wanted to remove few of the rebuild online index DDLs, since those are taking too much time and causing lag in replicat, so we tried one more parameter to exclude INDEX REBUILD ONLINE operations from the specific user to happen on the target
DDL INCLUDE MAPPED OBJNAME PDIXIT.* exclude optype alter objtype 'INDEX' instr 'rebuild online'
And told them to remove added parameters to IGNORE DDLs and revert back to original once lag is zero and go back to original parameter of
DDL INCLUDE MAPPED OBJNAME PDIXIT.*
So, that’s how we fixed the issue and the flow we adopted to handle the situation.
Hope It Helped!
Prashant Dixit