Tales From A Lazy Fat DBA

Its all about Databases & their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

DDL stalled Golden Gate replicat : A quick damage control !

Posted by FatDBA on October 31, 2021

Hi Everyone,

Recently I was asked to check a mono-directional Integrated Golden Gate (Version 12.3.0.1.2) replication problem where the replicat was not processing the data and got struck with a huge lag of ~ 3.5 hours. Customer reported that the GG stuck due to CREATE INDEX statements running from last 3 hours and want to skip those transactions.

This being an ad-hoc request, I didn’t had any direct access to their systems and can only ask for files or outputs from their operations team. And as per reports shared, yes, there was a delay of around 3.5 hours for the replicat and was showing in RUNNING state.

-- Process stats on the target
GGSCI (cana01db66664b.prod.sdt.monkey12.se) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
REPLICAT    RUNNING     TORONTO       03:29:11      00:00:03


-- From replicat parameter file
--DDL Options Used --
DDL INCLUDE MAPPED OBJNAME PDIXIT.*
DDLERROR DEFAULT IGNORE
DDLERROR 942 IGNORE
DDLERROR 955 IGNORE
DDLERROR 1435 IGNORE INCLUDE OPTYPE ALTER OBJTYPE SESSION
DDLERROR 904 IGNORE
DDLERROR 1430 IGNORE

While I was examining their parameter file saw they’re using BATCHSQL and HANDLECOLLISIONS to improve performance. Updated them that the use of the HANDLECOLLISIONS parameter can also cause performance issues if there are collisions since there needs to be additional processing done with those records. Hence, It is recommended to remove this parameter, if you are “not doing your initial load” for your tables. Most of transactions are happening in normal mode, hence, asked them to remove BATCHSQL parameter too.

Next, I asked them to try with DDLOPTIONS REPORT to be set in the parameter file, as this settings causes Integrated Extract to write a step-by-step history of all DDL operations captured to it’s report file; which is very useful when troubleshooting DDL replication issues. After they made a change to the parameter file and a bounce of replicat, I checked the logs, and per logs, below DDL was already executed

--> 2021-10-20 08:19:30 INFO OGG-00489 DDL is of mapped scope, after mapping new operation [create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607) (size 82)]. 

However , DDL error ignored due to ORA-04021 and as they have DDLERROR ignore parameters in parameter file.

--> 2021-10-20 08:34:32 INFO OGG-00492 DDL error ignored: error code [DEFAULT], filter [include all (default)], error text [Error code [4021], ORA-04021: timeout occurred while waiting to lock object PDIXIT.IDX_TEST121 SQL create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607) /* GOLDENGATE_DDL_REPLICATION */].

Asked them to check whether the index “PDIXIT”.”IDX_TEST121″ has been created on the target. If not then they may manually need to create this index on the target, but as per them, the index statement can be ignored and they only want to move the GG further, as their entire processing system was stopped and caused a massive delay.

Sid,Ser#      USERNAME      STATUS          SQL_ID          SQL_CHILD_NUMBER SQL_HASH_VALUE Logon Time    OSUser@Machine         pid   
TERMINAL PROGRAM                            MODULE
------------- ------------- --------------- --------------- ---------------- -------------- ------------- -------------------- ------- 
-------- ----------------------------------- -----------------------------------
6781,54621    GGGTEST       ACTIVE          u52csv8kcnb34                  0      617229321 23-AUG 06:55  oracle@cana01db66664 33329  r
eplicat@cana01db66664b.prod.sdt.er OGG-RCITSA00-CANA_DAIS_GREATC
                                                                                                          b.prod.sdt.monkey12.
                                                                                                          se

SQL> select sql_fulltext from gv$sql where sql_id='u52csv8kcnb34';


SQL_FULLTEXT
--------------------------------------------------------------------------------
create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607
)  /* GOLDENGATE_DDL_REPLICATION */

create index "PDIXIT"."IDX_TEST121" on "PDIXIT"."TAB12"(C1666600563,C737666607
)  /* GOLDENGATE_DDL_REPLICATION */


                                    Locking                                                         Object                 Object            Lock
 SID     SER# STATUS                  User               Os user              Os Proc               Owner                   Name             Mode
------ ------ --------------- -------------------- -------------------- -------------------- -------------------- ------------------------- ----------
  6781  54621 ACTIVE          GGGTEST              oracle               33329                SYS                  OBJ$                    Row-X (SX)
  6781  54621 ACTIVE          GGGTEST              oracle               33329                PDIXIT               TAB12                   Share
 

I immediately asked them to kill those database session which was running above DDLs, and the lock issue got resolved the moment they killed them the lag was drained in few seconds. Next we added below parameter in to the replicat to exclude CREATE INDEX statements.

DDL INCLUDE MAPPED OBJNAME PDIXIT.* exclude INSTR 'CREATE INDEX' 

Customer also wanted to remove few of the rebuild online index DDLs, since those are taking too much time and causing lag in replicat, so we tried one more parameter to exclude INDEX REBUILD ONLINE operations from the specific user to happen on the target

DDL INCLUDE MAPPED OBJNAME PDIXIT.* exclude optype alter objtype 'INDEX' instr 'rebuild online'

And told them to remove added parameters to IGNORE DDLs and revert back to original once lag is zero and go back to original parameter of

DDL INCLUDE MAPPED OBJNAME PDIXIT.*

So, that’s how we fixed the issue and the flow we adopted to handle the situation.

Hope It Helped!
Prashant Dixit

Advertisement

4 Responses to “DDL stalled Golden Gate replicat : A quick damage control !”

  1. Saibal Ghosh said

    Very good analysis and provides a great insight on what to look out for!

  2. Revathi Ballani said

    Hi Sir,

    It will be great if you could provide full query to check golden gate session at database level , As whenever i try to find golde gate session ,I could see sqlid as null , Please help me

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

 
%d bloggers like this: