Tales From A Lazy Fat DBA

Its all about Databases & their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Posts Tagged ‘Tuning’

How to trace DBMS_STATS and see what is under the hood for a slow sluggish stats collection ..

Posted by FatDBA on July 10, 2022

Recently while working on a slow stats gathering case, someone asked if there is a way to know what’s happening beneath the surface ? what all flags and calculations its doing internally while on the front end the DBMS_STATS still running ? Yes, there is a way! As with most of the Oracle utilities, DBMS_STATS too comes with its own tracing facility which you can call using dbms_stats.set_global_prefs. Today’s post is all about enabling tracing on DBMS_STATS package.

Few of the commonly used DBMS_STATS flags/options are given below, you can always do a combination if wanted to club …

4 = This is to trace table stats
8 = This is to trace index stats
16 = This is to trace columnar stats
512 = auto stats job
1024 = This is to trace parallel executions
4096 = This is to trace partition prunes
16384 = This one traces extended column stats
32768 = To trace approximate NDV (number distinct values) ….

Let’s quickly collect traces for both Indexes and Tables followed by the stas collection for a table and its dependent Indices.

SQL>
SQL> exec dbms_stats.set_global_prefs('TRACE',4+8);

PL/SQL procedure successfully completed.

SQL>

SQL> set time on
13:28:55 SQL>
13:28:56 SQL>
13:28:56 SQL>
13:28:56 SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYS' , tabname => 'BIGTAB',cascade => true, estimate_percent => 10,method_opt=>'for all indexed columns size 1', granularity => 'ALL', degree => 4);

PL/SQL procedure successfully completed.

13:28:59 SQL>
13:29:00 SQL>


SQL> select name, value from v$diag_info where name='Diag Trace';

NAME                                     VALUE
---------------------------------------- ------------------------------------------------------------------------------------
Diag Trace                               /u01/app/oracle/diag/rdbms/localdb/localdb/trace

Alright, we have our trace ready, let’s dig in deep what’s all inside the trace that we have collected for both the Table and it’s Index.
It starts with a standard header in the trace files, followed by all by default explicitly mentioned flags/options with DBMS_STATS in XML format.

-- Header
DBMS_STATS: Record gather table stats operation on table : BIGTAB
DBMS_STATS:   job name: 
DBMS_STATS:    |--> Operation id: 1341
DBMS_STATS: gather_table_stats: <params><param name="block_sample" val="FALSE"/><param name="cascade" val="TRUE"/>
<param name="concurrent" val="FALSE"/><param name="degree" val="4"/><param name="estimate_percent" val="10"/>
<param name="force" val="FALSE"/><param name="granularity" val="ALL"/><param name="method_opt" val="for all indexed columns size 1"/><param name="no_invalidate" val="NULL"/>
<param name="ownname" val="SYS"/><param name="partname" val=""/><param name="reporting_mode" val="FALSE"/><param name="statid" val=""/><param name="statown" val=""/>
<param name="stattab" val=""/><param name="stattype" val="DATA"/><param name="tabname" val="BIGTAB"/></params>
DBMS_STATS: Start gather table stats -- tabname: BIGTAB

Next comes the preferences set for the called Table, ‘BIGTAB’ in our case. This will give you more idea about what all options were set overall and if there is any scope to tune and tweak anyone of them.

DBMS_STATS: Preferences for table SYS.BIGTAB
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         - 
DBMS_STATS: STATS_RETENTION                                   - 
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             - 
DBMS_STATS: SNAPSHOT_UPD_TIME                                 - 
DBMS_STATS: TRACE                                             - 12
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 1
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: STATS_MODEL_INTERNAL_MINRSQ                       - 0.9
DBMS_STATS: STATS_MODEL_INTERNAL_CONTROL                      - 0
DBMS_STATS: STATS_MODEL                                       - ON
DBMS_STATS: AUTO_STATS_ADVISOR_TASK                           - TRUE

Next sections is where it monitors the stats gathering for Table and its Index, their start and end times were captured, for example : “APPROX_NDV_ALGORITHM => Non-Incremental” was chosen. Table level stats were used i.e. row count, blocks, average row length, sample size, number of indexes etc.

Next it jumps to the Index where it tries to calculate the global statistics for the partitioned Index by aggregating the partition-level statistics but as its a non-partitioned Index, it simples moves to the next stage. Next it triggers the Indexing options (just enables them at the start) i.e. nlb (number of leaf blocks), ndk (number of distinct keys), clf (clustering factor), nblks (number of blocks), sample percentage, degree (parallelism), nrw (key counts) etc by allocating them value 1 which means to consider them.
Finally it starts to analyze the Index structure (In this case it’s not using approximate_ndv algorithm) and calculates the final values of nrw, nlb, ndk, clf etc.

Few Internal options were also used i.e. ctx.conc_ctx, ctx.batching_coeff etc.

-- Table Stats
DBMS_STATS: Started table SYS.BIGTAB. at 28-JUN-22 01.28.58.499744000 PM -04:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting
DBMS_STATS: reporting_man_log_task: target: "SYS"."BIGTAB" objn: 108254 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: ALL global_requested: NULL pfix: 
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: HLL in incremental (no stats exist)
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: ALL execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: SYS tabname: BIGTAB pname:  spname:  execution phase: 1
DBMS_STATS: Number of rows in the table = 492190, blocks = , average row length = 19, chain count = , scan rate = 0, sample size = 49219, cstats.count = 3, cind = 2

-- Index part 
DBMS_STATS: Started index SYS.IDX_DIXIT at 28-JUN-22 01.28.58.816819000 PM -04:00 granularity: ALL gIdxGran: 
DBMS_STATS: Specified granularity = , New granularity = ALL, Fixed granularity = ALL
DBMS_STATS: granularity ALL
DBMS_STATS: reporting_man_log_task: target: "SYS"."IDX_DIXIT" objn: 108255 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS:  Gather index subpartition stats...
DBMS_STATS:  Gather index partition stats...
DBMS_STATS:  Gather global index stats...
DBMS_STATS: Start analyze_index_using_sql
DBMS_STATS: tab_stats_stale: BIGTAB not analyzed using full compute
DBMS_STATS:   Number of blocks in the index: 1030
DBMS_STATS: Start execute_analyze_index(owner= SYS, indname= IDX_DIXIT, fobjn= , sample_pct= 10, seed= 0, degree= 4, collect_nrw= 1, collect_nlb= 1, collect_ndk= 1, collect_clf= 1, bmi= 0, iot_ov= 0, iot_sec= 0, ppredtxt= , nblks= 1030)
DBMS_STATS: execute_analyze_index(): Not using approximate_ndv, pct=111.529126213592233009708737864077669903,sample_pct=10
DBMS_STATS: End execute_analyze_index(ssize= 494100, nrw= 494100, nlb= 1027, ndk= 1000, clf= 494100)
DBMS_STATS: target_size: 1027
DBMS_STATS: Finished index SYS.IDX_DIXIT at 28-JUN-22 01.28.59.464468000 PM -04:00

DBMS_STATS: Finished table SYS.BIGTAB. at 28-JUN-22 01.28.59.519495000 PM -04:00

Once you are done with the stats tracing, close it immediately!

SQL> exec dbms_stats.set_global_prefs('TRACE',0);

PL/SQL procedure successfully completed.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

High stats collection time for partitioned tables after upgrade to 19c

Posted by FatDBA on July 2, 2022

Recently, while working on a database upgrade from 12c to 19c (19.15) one of my friend encountered a strange issue on the newly upgraded 19c database where the stats gathering on the full database started taking huge time. It used to take ~ 3 hours to complete the full database statistics, but the same stats collection job after the upgrade to 19c started taking close to 10 hours. The stats collection script they were using was quite simple and with minimal parameters used.

EXEC DBMS_STATS.GATHER_DATABASE_STATS(ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE,degree => 8);

During the analysis he observed that the top 3-4 in-flight transactions during stats collection were related to the Index Statistics and were found doing ‘Index Fast Full Scan’, and all of them are on few of the large partitioned tables in the database. He discussed the case with me and together tried few thing i.e. recollected dictionary and fixed object statistics, did some comparative study of parameters between 12c and 19c but none of them worked. At last we tried to set debugging levels on DBMS_STATS to see what’s happening under the hood, and that gave us some hint when set it with level/flag 8 (trace index stats) and with level 32768 to trace approximate NDV (number distinct values) gatherings. Traces gave us some idea that its surely with the index stats and NDV or number of distinct keys and is taking time.

But even after that we both were totally clueless as these Tables and its dependent objects are there in the system for a very long time. So. the big question was – What’s new in 19c that has slowed down stats collection ?

Finally we decided to contact OCS! And they quickly responded to the problem as its a known problem with the 19c. As per them, there was an enhancement in 19c that is related to Index stats gathering, and that had lead to the longer stats times. It was all due to an unpublished Bug 33427856 which is an enhancement to improve the calculation of index NDK (Number of Distinct Keys). This new feature with the approx_count_distinct function and fully scans indexes to calculate NDK. This has a significant benefit because NDK is now accurate. It also means that gathering statistics can take longer (for example, updating global index statistics if incremental stats is used). So, In general, this is expected behavior, since DBMS_STATS is doing more work in 19c than it did in previously unenhanced versions.

And the solution to this new 19c index-stats feature (a problem) off by setting fix control to disable ‘Enhance Index NDK Statistics’ – 27268249

alter system set "_fix_control"='27268249:0';

And as soon as we deleted existing statistics and regather them, the time dropped drastically and got completed under 3 hours.

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , , | Leave a Comment »

Exceptionally high stats collection time on FIXED OBJECTS during an upgrade …

Posted by FatDBA on June 26, 2022

Someone recently asked about a situation where they were trying to upgrade their database to 19c and as a part of their upgrade plan, they were trying to run fixed object statistics but it was going on forever, and they were totally clueless why and where its taking time. This being a mandatory step, they tried several times, but same result.

About fixed object stats, It is recommended that you re-gather them if you do a major database or application upgrade, implement a new module, or make changes to the database configuration. For example if you increase the SGA size then all of the X$ tables that contain information about the buffer cache and shared pool may change significantly, such as X$ tables used in v$buffer_pool or v$shared_pool_advice.

About fixed objects stats collection idle time, I mean anything between 1-10 minutes is I will say normal and average, but anything that goes beyond 20 minutes or even more or even in hours is abnormally high and points to a situation.

So, I was asked to take a look on ad-hoc basis and during the analysis I found a SQL trying to do a count all on unified_audit_trail, and was running from the same time since they called the DBMS_STATS for FIXED OBJECTS on the database. When asked, they told me that they’d enabled auditing on the database some 6 months back and haven’t purged anything since then, the audit trail had grown behemoth and has ~ 880 Million records. I immediately offered them two approaches to handle the situation – Either lock your unified table statistics (using dbms_stats.lock_table_stats) or else take backup of the table and purge audit records before calling the stats gathering job again. They agreed with the second approach, they took backup of audit table and purged audit trail. As soon as they purged audit table, the stats collection on fixed objects got finished in ~ 3 minutes.

This was the situation and what we did …

SQL> select * from dba_audit_mgmt_last_arch_ts;

AUDIT_TRAIL RAC_INSTANCE LAST_ARCHIVE_TS
-------------------- ------------ ------------------------------
STANDARD AUDIT TRAIL 0 22-MAY-22 06.00.00.000000 AM +00:00


SQL> select count(*) from aud$;

COUNT(*)
----------
885632817

BEGIN
DBMS_AUDIT_MGMT.clean_audit_trail(
audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
use_last_arch_timestamp => FALSE);
END;
/

SQL> select count(*) from aud$;

COUNT(*)
----------
0


SQL> SET TIMING ON
SQL> BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/

Elapsed: 00:03:10.81

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , , | 2 Comments »

What is a short stack dump ?

Posted by FatDBA on May 23, 2022

Hi Everyone,

Short stacks are a rich source of information which can help you to do analyzing and deconstructing a process which is part of a problem in Oracle databases, and the good part is you don’t have to rely on any OS utilities like gdb, pstack etc. to extract stacks or traces. Its pretty simple and quick to generate the short stacks, you have to pass on the OSPID of the process, and issue SHORT_STACK option for ORADEBUG, and will generate a sequence of all functions involved and their current position in a calls stack. You can also use errorstack for the same, but short_stacks are fast and point-to-point!

Now one can ask – What is a ‘call stack’ ?
A call stack is a list of names of methods called at run time from the beginning of a program until the execution of the current statement. A call stack is mainly intended to keep track of the point to which each active subroutine should return control when it finishes executing. Call stack acts as a tool to debug an application when the method to be traced can be called in more than one context. This forms a better alternative than adding tracing code to all methods that call the given method.

Let me run it on a real time system where we are facing issues where due to adaptive switching between Log write methods LGWR was causing excessive ‘log file sync‘ Waits

[oracle@oracleontario ~]$ ps -ef|grep lg
oracle     4402      1  0 00:25 ?        00:00:00 ora_lgwr_dixitdb
oracle     4719   3201  0 00:25 pts/1    00:00:00 grep --color=auto lg

-- Here in the example I am generating short stacks for LGWR process
SQL>
SQL> oradebug setospid 4402
Oracle pid: 19, Unix process pid: 4402, image: oracle@oracleontario.ontadomain (LGWR)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+187
<-ksliwat()+2233<-kslwaitctx()+200<-ksarcv()+320<-ksbabs()+670<-ksbrdp()+1167<-opirip()+541<-opidrv()+581<-sou2o()+165
<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
SQL>
SQL>



-- After a gap of ~ 10-15 Mins
-- Let me generate the short stack once again just to see if there is any difference.
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+992 <-sigacthandler()+104
<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212 <-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680
<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380


So, as you’ve seen in above code, the clear change is there in the LGWR short stacks, we did few log switches, modified a big table in the database followed by multiple commits. If you carefully look at the stack, it was function ksedsts() where it was at the time we generated the stack, and function ksdxfstk () called it, followed by ksdxcb(), sspuser() and immediate change after functions called by LGWR internal call stacks. They matched one of the known BUG which gave us a hint that we resolved after modified a undocumented parameter.

So, it can help us locating the buggy functions called by any user session or background processes and is very useful in case when you are battling with a strange looking problem in your Oracle database.

Few of the functions that I am able to recall are given below.
semtimedop() is one of the function for semaphore operations also known as semop and is used to perform atomically an array of semaphore operations on the set of semaphores associated with the semaphore identifier specified by semid.
‘ksdxcb’ is a function that’s called usually when the command oradebug is being used.
sighandler is the programming signal handler and is used to locate an exception. When a signal is sent to a process, various register and stack operations occur that make it look as though the leaf PC at the time of the signal is the return address for a call to a system function, sigacthandler(). sigacthandler() calls the user-specified signal handler just as any function would call another.
The sigaltstack() function allows a thread to define and examine the state of an alternate stack area on which signals are processed.

opidrv() is ORACLE Program Interface DRiVer (IGNORE)
opiodr is ORACLE code request driver – route the current request
main() is the standard executable entry point
ksedst() is the KSE dump the call stack
skgmstack() is the call specified function with extra STACK space
rpidrv() or the RPI is theRecursive Program Interface DRiVer
pfrrun() or PSDEVN is the PL/SQL Interpreter Main Instruction Loop
kkxexe() or KKX is to execute plsql

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

An interesting case of ‘enq: CR – block range reuse ckpt’, CKPT blocking user sessions …

Posted by FatDBA on May 2, 2022

Hi All,

Last week we faced an interesting issue with one of the production system which was recently migrated from Oracle 12.2. to 19.15. The setup was running on a VMWare machine with limited resources. It all started when the application team started reporting slowness in their daily scheduled jobs and other ad-hoc operations, when checked at the database layer, it was all ‘enq: CR – block range reuse ckpt‘ wait event. Same can be seen in the below ORATOP output, where the BLOCKER ID 3817 is the CKPT or the checkpoint process.

The strange part was, the blocker was CKPT process and it was all for a particular SQL ID (an INSERT operation), see below in next oratop screen fix.

As far as this wait event (other classed), This comes just before you delete or truncate a table, where we need a level segment checkpoint. This is because it must maintain the consistency of the blocks there may be in the buffer memory and what’s on the disc. As per the definition, this wait event happens due to contention on blocks caused by multiple processes trying to update same blocks. This may seem issues from the application logic resulting into this concurrency bottleneck, but interestingly this was happening on a simple INSERT operation, not a DELETE or TRUNCATE.

Same can be seen in the AWR and ASH reports too! There are CBC (Cache Buffer Chains) latching and latch free events too along with ‘enq: CR – block range reuse ckpt‘, but the initial focus was to understand the event and its reasons. As far as ‘direct path read temp‘ it was happening due to couple of SELECT statement which we resolved after attaching a better plan with the SQLs.

Wait event source (from ASH)

SQL Text was quite simple, an INSERT INTO statement.

 INSERT INTO xx_xxx_xx(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) VALUES (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)

I’ve tried first with the CKPT process traces, just to see if there is anything interesting or useful caught within the process logs or traces. The trace was short and has only got some strange and obscure looking content, but which at least gave us an idea that the reuse block checkpoint of the RBR was failed due to enqueue, and its entry was failed due to abandoned parent. Still, that doesn’t helped us anything, we were unsure about the reason.

--> Info in CKPT trace file ---> XXXXX_ckpt_110528.trc.
RBR_CKPT: adding rbr ckpt failed for 65601 due to enqueue
RBR_CKPT: rbr mark the enqueue for 65601 as failed
RBR_CKPT: adding ckpt entry failed due to abandoned parent 0x1b57b4a88
RBR_CKPT: rbr mark the enqueue for 65601 as failed

There were few things logged in the alert.log, multiple deadlocks (ORA 0060), too many parse errors for one SELECT statement and some checkpoint incomplete errors (log switching was high >35)

-- deadlocks in alert log.
Errors in file /opt/u01/app/oracle/diag/rdbms/xxxxxx/xxxx/trace/xxxx_ora_73010.trc:
2022-04-19T13:41:22.489551+05:30 ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ 
ORA-00060: Deadlock detected. See Note 60.1 at My Oracle Support for Troubleshooting ORA-60 Errors. More info in file /opt/u01/app/oracle/diag/rdbms/pwfmfsm/PWFMFSM/trace/PWFMFSM_ora_73010.trc.

-- From systemstatedumps
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph: 
------------Blocker(s)----------- ------------Waiter(s)------------
Resource Name process session holds waits serial process session holds waits serial
TX-03AC0014-0000B33A-00000000-00000000 1562 691 X 1632 946 5441 X 56143
TX-01460020-0001A5C2-00000000-00000000 946 5441 X 56143 1562 691 X 1632



-- too many parse errors
2022-04-19T13:57:26.261176+05:30
WARNING: too many parse errors, count=2965 SQL hash=0x8ce1e2ff ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
PARSE ERROR: ospid=68036, error=942 for statement:
2022-04-19T13:57:26.272805+05:30
SELECT * FROM ( SELECT xxxxxxxxxxxxxxxxxxxxxx FROM xxxxxxxxxxxxxxxxxxxxxx AND xxxxxxxxxxxxxxxxxxxxxx ORDER BY xxxxxxxxxxxxxxxxxxxxxx ASC ) WHERE ROWNUM <= 750
Additional information: hd=0x546ba1be8 phd=0x61edab798 flg=0x20 cisid=113 sid=113 ciuid=113 uid=113 sqlid=ccmkzhy6f3srz
...Current username=xxxxxxxxxxxxxxxxxxxxxx
...Application: xxxxxxxxxxxxxxxxxxxxxx.exe Action:


-- Checkpoint incomplete
2022-04-19T15:03:16.964470+05:30 ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
Thread 1 cannot allocate new log, sequence 186456
Checkpoint not complete
Current log# 11 seq# 186455 mem# 0: +ONLINE_REDO/xxxxxxxxxxxxxxxxxxxxxx/ONLINELOG/group_11.256.1087565999
2022-04-19T15:03:17.785113+05:30

But the alert log was not sufficient to give us any concrete evidences or reasons for CKPT bloking user sessions. So, next we decided to generate the HANGANALYZE and SYSTEMSTATEDUMPs to understand what’s all happening under the hood, through its wait chains. We noticed few interesting things there

  • Wait chain 1 where a session was waiting on ‘buffer busy waits‘ while doing the “bitmapped file space header” which talks about the space management blocks (nothing to with bitmap indexes) and was related with one SELECT statement.
  • Wait chain 2 where a session was found waiting on ‘enq: CR – block range reuse ckpt‘ event and was blocked by CKPT process (3817) which was further waiting on ‘control file sequential read
  • Wait chain 4 where SID 1670, was found waiting on ‘buffer busy waits‘ while doing ‘bitmapped file space header’ operations.
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 46794
process id: 2014, oracle@monkeymachine.prod.fdt.swedish.se
session id: 15
session serial #: 19322
module name: 0 (xxxx.exe)
}
is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ bitmapped file space header
time in wait: 2 min 20 sec
timeout after: never
wait id: 10224
blocking: 0 sessions
current sql_id: 1835193535
current sql: SELECT * FROM ( SELECT xxxxx FROM task JOIN request ON xxxxx = xxxxx JOIN xxxxx ON xxxxx = xxxxx JOIN c_task_assignment_view ON xxxxx = xxxxx
.
.
.
and is blocked by
=> Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 23090
process id: 3231, oracle@monkeymachine.prod.fdt.swedish.se
session id: 261
session serial #: 39729
module name: 0 (xxxx.exe)
}
which is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄  bitmapped file space header
time in wait: 18.999227 sec ◄◄◄
timeout after: never
wait id: 47356
blocking: 25 sessions ◄◄◄
current sql_id: 0
current sql: <none>
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+910<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000084 sec
1. event: 'SQL*Net message from client'
time waited: 0.008136 sec
wait id: 47355 p1: 'driver id'=0x28444553
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000043 sec
2. event: 'SQL*Net message to client'
time waited: 0.000002 sec
wait id: 47354 p1: 'driver id'=0x28444553
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000093 sec
3. event: 'SQL*Net message from client'
time waited: 2.281674 sec
wait id: 47353 p1: 'driver id'=0x28444553
p2: '#bytes'=0x1
}
and may or may not be blocked by another session.
.
.
.
Chain 2:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 10122
process id: 2850, oracle@monkeymachine.prod.fdt.swedish.se
session id: 97
session serial #: 47697
module name: 0 (xxxx.exe)
}
is waiting for 'enq: CR - block range reuse ckpt' with wait info: ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
{
p1: 'name|mode'=0x43520006
p2: '2'=0x10b22
p3: '0'=0x1
time in wait: 2.018004 sec
timeout after: never
wait id: 81594
blocking: 0 sessions
current sql_id: 1335044282
current sql: <none>
.
.
.
and is blocked by
=> Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 110528
process id: 24, oracle@monkeymachine.prod.fdt.swedish.se
session id: 3817 ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄  CKPT background process
session serial #: 46215
}
which is waiting for 'control file sequential read' with wait info: ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
{
p1: 'file#'=0x0
p2: 'block#'=0x11e
p3: 'blocks'=0x1
px1: 'disk number'=0x4
px2: 'au'=0x34
px3: 'offset'=0x98000
time in wait: 0.273981 sec
timeout after: never
wait id: 17482450
blocking: 45 sessions ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
current sql_id: 0
current sql: <none>
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2199<-kslwaitctx()+205<-ksarcv()+320<-ksbabs()+602<-ksbrdp()+1167<-opirip()+541<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000012 sec
1. event: 'control file sequential read'
time waited: 0.005831 sec
wait id: 17482449 p1: 'file#'=0x0
p2: 'block#'=0x11
p3: 'blocks'=0x1
* time between wait #1 and #2: 0.000012 sec
2. event: 'control file sequential read'
time waited: 0.011667 sec
wait id: 17482448 p1: 'file#'=0x0
p2: 'block#'=0xf
p3: 'blocks'=0x1
* time between wait #2 and #3: 0.000017 sec
3. event: 'control file sequential read'
time waited: 0.009160 sec
wait id: 17482447 p1: 'file#'=0x0
p2: 'block#'=0x1
p3: 'blocks'=0x1
}
.
.
.
Chain 4:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 46479
process id: 1036, oracle@monkeymachine.prod.fdt.swedish.se
session id: 1670
session serial #: 6238
module name: 0 (xxxx.exe)
}
is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄  bitmapped file space header
time in wait: 18.954206 sec ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
timeout after: never
wait id: 20681
blocking: 0 sessions
current sql_id: 343919375
current sql: SELECT * FROM ( SELECT xxxxx FROM task JOIN request ON xxxxx = xxxxx JOIN xxxxx ON xxxxx = xxxxx JOIN c_task_assignment_view ON xxxxx = xxxxx
.
.
.
and is blocked by
=> Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 44958
process id: 523, oracle@monkeymachine.prod.fdt.swedish.se
session id: 4681
session serial #: 41996
module name: 0 (xxxx.exemonkeymachine.prod.fdt.swedish.se (TNS)
}
which is waiting for 'buffer busy waits' with wait info: ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ bitmapped file space header
time in wait: 18.959429 sec ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
timeout after: never
wait id: 153995
blocking: 101 sessions ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
current sql_id: 343919375
current sql:  INSERT INTO xx_xxx_xx(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) VALUES (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2199<-kslwaitctx()+205<-ksqcmi()+21656<-ksqcnv()+809<-ksqcov()+95<-kcbrbr_int()+2476<-kcbrbr()+47<-ktslagsp()+2672<-ktslagsp_main()+945<-kdliAllocCache()+37452<-kdliAllocBlocks()+1342<-kdliAllocChunks()+471<-kdliWriteInit()+1249<-kdliWriteV()+967<-kdli_fwritev()+904<-kdlxNXWrite()+577<-kdlx_write()+754<-kdlxdup_write1()+726<-kdlwWriteCallbackOld_pga()+1982<-kdlw_write()+1321<-kdld_write()+410<-kdl
wait history:
* time between current wait and wait #1: 0.756195 sec
1. event: 'direct path write temp'
time waited: 0.406543 sec
wait id: 153994 p1: 'file number'=0xc9
p2: 'first dba'=0xb28fc
p3: 'block cnt'=0x4
* time between wait #1 and #2: 0.000001 sec
2. event: 'ASM IO for non-blocking poll'
time waited: 0.000000 sec
wait id: 153993 p1: 'count'=0x4
p2: 'where'=0x2
p3: 'timeout'=0x0
* time between wait #2 and #3: 0.000002 sec
3. event: 'ASM IO for non-blocking poll'
time waited: 0.000001 sec
wait id: 153992 p1: 'count'=0x4
p2: 'where'=0x2
p3: 'timeout'=0x0
}
and may or may not be blocked by another session.
.
.

Though, we wanted to try couple of hidden parameters to enable fast object level truncation and checkpointing, as they had helped us a lot in the past in similar scenarios, but had to involve Oracle support and after carefully analyzing the situation, they too agreed and want us to try them as they started suspecting it as an aftermath of a known bug of 19c.

[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Apr 24 00:24:47 2022
Version 19.15.0.0.0
Copyright (c) 1982, 2022, Oracle.  All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0

SQL>
SQL> @hidden
Enter value for param: db_fast_obj_truncate
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%db_fast_obj_truncate%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_db_fast_obj_truncate                         TRUE                      TRUE                      enable fast object truncate
_db_fast_obj_ckpt                             TRUE                      TRUE                      enable fast object checkpoint


SQL> ALTER SYSTEM SET "_db_fast_obj_truncate" = false sid = '*';

System altered.

SQL>
SQL> ALTER SYSTEM SET "_db_fast_obj_ckpt" = false sid = '*';

System altered.

SQL>

And soon after setting them, we saw a drastic drop in the waits and system seemed better, much better. But it all happened during an off-peak hour, so there wasn’t much of a workload to see anything odd.

And as we suspected, the issue repeated itself, and next day during peak business hours we started seeing the same issue, same set of events back into existence. This time the ‘latch: cache buffers chains‘ was quite high and prominent which was not that much earlier.

Initially we tried to fix some of the expensive statements on ‘logical IOs’ or memory reads, but that hardly helped. The issue persisted even after setting a higher value for db_block_hash_latches and decreasing cursor_db_buffers_pinned. AWR continues to show ‘latch: cache buffers chains’ in the top ten, foreground timed events, and ‘latch free‘ in first place in the background timed events.
Oracle support confirmed the behavior was due to published bug 33025005 where excessive Latch CBC waits were seen after upgrading from 12c to 19c, and suggested to apply patch 33025005 and then to set hidden parameter “_post_multiple_shared_waiters” to value FALSE (in MEMORY only to test) which disables multiple shared waiters in the database.

-- After applying patch 33025005

SQL> ALTER SYSTEM SET "_post_multiple_shared_waiters" = FALSE SCOPE = MEMORY;

System altered.

And even after applying the patch and setting the recommended undocumented parameter, the issue persisted and we were totally clueless.

And as a last resort, we tried to flush the buffer cache of the database, and bingo that crude method of purging the cache helped to drastically to reduce the load on ‘CBC Latching‘ and for ‘enq: cr block range reuse ckpt‘, and the system ran fine soon after the flush of the DB Buffer cache.

So, nothing worked for us, we changed multiple checkpointing and shared writers related parameters in the database, applied a bug fix patch (33025005), but of no use. Finally, the flush of buffer cache worked for us! Oracle support agreed that this was happening due to a new/unpublished bug (33125873 or 31844316) which is not yet fixed in 19.15 and will be included in 23.1, and they are in status 11 that means still being worked by Development so there is no fix for it yet.

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , | 5 Comments »

How to prioritize an Oracle Database background process ?

Posted by FatDBA on April 18, 2022

Recently while looking into a system (was running on 19.3.0.0.0 standalone) where ‘log file sync’ was bugging the database, and after we tried all other possible solutions, we thought to increase the priroty of the LGWR background process to see if that helps.

Increasing the LGWR priority is putting the LGWR process in the Round-Robin (SCHED_RR) class. You can increase process’s priority both using OS (renice, nice commands) or Database methods, but this post is about setting the priority using ‘_high_priority_process’ an undocumented/hidden parameter that prioritizes your database managed processes.

I am using Oracle 19.3 for the test where the LGWR is not by default comes with any priority in the DB, starting from 21.3.0.0.0 LGWR process is part of _high_priority_processes group along with VKTM & LMS* processes.
Note: This being a hidden/undocumented parameter I advise to consult with Oracle support before going and changing the parameter value. Try other possible ways to reduce log file sync, before jumping into this crude method of prioritizing LGWR over others.

[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Apr 10 03:36:06 2022
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> @hidden

Enter value for param: high_prio
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%priority_processes%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_highest_priority_processes                   VKTM                      VKTM                      Highest Priority Process Name Mask
_high_priority_processes                      LMS*|VKTM                 LMS*|VKTM                 High Priority Process Name Mask 

And by default in Oracle version 19.3.0.0 the parameter is set to prioritize VKTM (Virtual keeper of time) and LMS (Lock Manager, a RAC process). Let me check VKTM’s current priority class, and it is set to RR class (SCHED_RR scheduling class) for the process as its defined via _high_priority_processes parameter.

[oracle@oracleontario ~]$ ps -eo pid,class,pri,nice,time,args |grep vktm
 23871 RR   41   - 00:00:53 ora_vktm_dixitdb

About the LGWR process, and it is set to TS (SCHED_OTHER) class and it has no priority class attached to it.

[oracle@oracleontario ~]$ ps -eo pid,class,pri,nice,time,args |grep ora_lg*
 23990 TS   19   0 00:00:07 ora_lgwr_dixitdb

Let’s change the priority and reboot the database to persistent the change!

SQL> alter system set "_high_priority_processes"='LMS*|VKTM|LGWR' scope=spfile;

System altered.

SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area 1593831936 bytes
Fixed Size                  8897024 bytes
Variable Size            1107296256 bytes
Database Buffers          469762048 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.

SQL> @hidden
Enter value for param: high_priority_processes
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%high_priority_processes%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_high_priority_processes                      LMS*|VKTM|LGWR            LMS*|VKTM|LGWR            High Priority Process Name Mask

At the same time I can see the same was logged into the Alert log file.

2022-04-10T03:54:31.488732-04:00
LGWR started with pid=8, OS id=26058 at elevated (RT) priority

So, we have reniced the priority of LGWR on the system, I mean the higher value of priority actually makes the process lower priority; it means the process demands fewer system resources (and therefore is a “nicer” process). Now lets check the scheduling class of the process at the OS, it should be now changed to RR from TS.

SQL> !ps -eo pid,class,pri,nice,time,args |grep ora_lm*
 26058 RR   41   - 00:00:00 ora_lgwr_dixitdb

Let me check at the OS Level what has changed now.

[oracle@oracleontario 26058]$ pwd
/proc/26058
[oracle@oracleontario 26058]$ more sched
ora_lgwr_dixitd (26058, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :      26820431.663015
se.vruntime                                  :            -2.963799
se.sum_exec_runtime                          :          1858.211503
se.nr_migrations                             :                    0
nr_switches                                  :                 4038
nr_voluntary_switches                        :                 4023
nr_involuntary_switches                      :                   15
se.load.weight                               :                 1024
policy                                       :                    2      -----> Policy, the 0-99 are real-time priorities
prio                                         :                   98
clock-delta                                  :                   59
mm->numa_scan_seq                            :                    0
numa_migrations, 0
numa_faults_memory, 0, 0, 1, 0, -1
numa_faults_memory, 1, 0, 0, 0, -1


-- output from top utility
top - 05:09:14 up  7:32,  3 users,  load average: 0.14, 0.10, 0.11
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
GiB Mem :      4.3 total,      0.0 free,      1.1 used,      3.1 buff/cache
GiB Swap:      3.9 total,      3.9 free,      0.0 used.      2.1 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 26027 oracle    -2   0 2016104  20096  17064 S  1.7  0.4   1:28.22 ora_vktm_dixitdb      ---> Look at the PR (priority) column with value -2 (higher pri)
 26058 oracle    -2   0 2017136  30360  26768 S  0.0  0.7   0:01.86 ora_lgwr_dixitdb      ---> Look at the PR (priority) column with value -2 (higher pri)

So, when nothing was working for us, this workaround helped and we were able to reduce LFS waits by more than 80% …

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , | 2 Comments »

Adaptive Log File Sync is not always good for you …

Posted by FatDBA on April 11, 2022

Last week I was part of one system/database stability short term assignment where customer running a critical Telco application on 2-Node RAC Cluster (RHEL) on 11.2.0.3.1 reported slowness in few of the critical data processing modules, which in turn had slowed down their entire system. Todays post is about buggy behavior of an adaptive feature which had caused a huge mess in the system in the form of ‘Log File Sync‘ waits.

After few of the initial calls with the customer, we come to know that the database has a history of high ‘log file sync‘ waits, but they simply come and go, and it seemed that the events were never handled correctly or being analyzed! And interestingly, restarting the cluster resolves the issue for few hours, sometimes for few days. LFS event was quite prominently seen in this database and on an average found consuming > 38% of the total DB Time% available with very high average wait times (248 ms).

Below are few of the database statistics captured from the problem time.

About the event, When a user session commits, all redo records generated by that session’s transaction need to be flushed from memory to the redo logfile to insure changes to the database made by that transaction become permanent. The time between the user session posting the LGWR and the LGWR posting the user after the write has completed is the wait time for ‘log file sync’ that the user session will show. This event is known as the time lost as a result of the LGWR process waiting while users initiate a Transaction Commit or Rollback.

Next in order to get more idea about the event, I ran the lfsdiag.sql (Oracle provided script to catch diag info on it – Doc ID 1064487.1)). The script will look at the important parameters involved in log file sync waits, wait histogram data, and at the worst average LFS times in the active session history data and AWR data and dump information to help determine why those times were the highest.

Below are the ASH LFS background process waits caught during its worst minute and it had some really bad stats captured for the worst minute for the same time when customer had worst application performance.

MINUTE          INST_ID EVENT                            TOTAL_WAIT_TIME      WAITS   AVG_TIME_WAITED
------------ ---------- ------------------------------ ----------------- ---------- -----------------
Apr06_1742            2 log file sync                        2819744.300       1973          1429.166
Apr06_1745            2 log file sync                        1219765.027       1200          1016.471
Apr06_1747            2 log file sync                        2045077.213       1745          1171.964
Apr06_1748            2 log file sync                        1417639.236       1170          1211.657
Apr06_1749            2 log file sync                        2202804.958       2190          1005.847
Apr06_1753            2 log file sync                        1967863.159       1871          1051.771
Apr06_1756            2 log file sync                        1096747.638        336          3264.130
Apr06_1843            2 log file sync                        1710602.016        531          3221.473
Apr06_1846            2 log file sync                         814607.205        247          3298.005

Another intersting section was the Histogram data for LFS and other related events. Here was can see the LFS waits at “wait_time_milli” and specially the high wait times to correlate them with other wait events. From below stats its evident that Node 2 of this RAC cluster was severely impacted with the LFS waits., with very high wait counts and wait times (ms), and lot of ‘gcs log flush sync’ along with LFS events that pushes LGWR process to write data to the disk.

The stats are very bad, with highest wait time of 1048576 ms on Node 2 and average of 99864 ms (1.6 mins), both ‘log file parallel write’ and ‘gcs log flush sync’ were quite high too.

   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1   76126281
         1 log file sync                                          2   31805429
         1 log file sync                                          4   18893320
         1 log file sync                                          8   29604071
         1 log file sync                                         16   21903517
         1 log file sync                                         32    7252292
         1 log file sync                                         64    3692615
         1 log file sync                                        128    1615552
         1 log file sync                                        256     262632
         1 log file sync                                        512      25341
         1 log file sync                                       1024       5431
         1 log file sync                                       2048        901
         1 log file sync                                       4096        188
         1 log file sync                                       8192        102
         1 log file sync                                      16384         33
         2 log file sync                                          1   56003954
         2 log file sync                                          2   19903760
         2 log file sync                                          4   12749437
         2 log file sync                                          8   17572031
         2 log file sync                                         16   12266342
         2 log file sync                                         32    3209853
         2 log file sync                                         64    1124852
         2 log file sync                                        128    2912977
         2 log file sync                                        256    5516739
         2 log file sync                                        512    3226632
         2 log file sync                                       1024     783055
         2 log file sync                                       2048     119096
         2 log file sync                                       4096      24974
         2 log file sync                                       8192       8841
         2 log file sync                                      16384       3782
         2 log file sync                                      32768       1640
         2 log file sync                                      65536        578
         2 log file sync                                     131072        259
         2 log file sync                                     262144         78
         2 log file sync                                     524288         22
         2 log file sync                                    1048576          4


         1 log file parallel write                                1  198650048
         1 log file parallel write                                2   16777612
         1 log file parallel write                                4    9934905
         1 log file parallel write                                8    4511957
         1 log file parallel write                               16     808734
         1 log file parallel write                               32     107699
         1 log file parallel write                               64      17255
         1 log file parallel write                              128       1837
         1 log file parallel write                              256        139
         1 log file parallel write                              512         24
         1 log file parallel write                             1024          2
         2 log file parallel write                                1  150805280
         2 log file parallel write                                2    9299589
         2 log file parallel write                                4    4816280
         2 log file parallel write                                8    3145522
         2 log file parallel write                               16     793765
         2 log file parallel write                               32     116312
         2 log file parallel write                               64      12469
         2 log file parallel write                              128       1298
         2 log file parallel write                              256        118
         2 log file parallel write                              512         14
         2 log file parallel write                             1024          1


         1 gcs log flush sync                                     1   85952301
         1 gcs log flush sync                                     2    8052174
         1 gcs log flush sync                                     4    4224838
         1 gcs log flush sync                                     8    4017059
         1 gcs log flush sync                                    16    1226469
         1 gcs log flush sync                                    32      88359
         1 gcs log flush sync                                    64       4263
         1 gcs log flush sync                                   128         90
         1 gcs log flush sync                                   256          1
         2 gcs log flush sync                                     1   60282021
         2 gcs log flush sync                                     2    5102517
         2 gcs log flush sync                                     4    1951187
         2 gcs log flush sync                                     8    2470197
         2 gcs log flush sync                                    16    1088285
         2 gcs log flush sync                                    32     214273
         2 gcs log flush sync                                    64      26585
         2 gcs log flush sync                                   128        147

As a possible solution, we tried few of the quick ones i.e. batching redo (commit_logging = batch) for the log writer (I know it has its own risks) to reduce LFS, but that didn’t worked either.

Next thing I’d generated the system state dump to understand the system and situation bettern, and the LFS events were caught in the system state dump as well with wait chains pointing to ‘rdbms ipc message'<=’log file sync’.

Process traces are always considered a wealth of diagnostic information, So I’d checked the LGWR process traces and thats where I saw some strangeness with frequent entries related with switching between post/wait and polling method which is an adaptive way to control switching between post/wait (older way) and polling (new method) for log file syncs. This gave me little hint about the possible reason on why so many LFS waits.

Talking about the adaptive log file sync, there are 2 methods by which LGWR and foreground processes can communicate in order to acknowledge that a commit has completed:
Post/wait: traditional method available in previous Oracle releases LGWR explicitly posts all processes waiting for the commit to complete. The advantage of the post/wait method is that sessions should find out almost immediately when the redo has been flushed to disk.
Polling: Foreground processes sleep and poll to see if the commit is complete, this was introduced to free high CPU usage by the LGWR.

This behavior is controlled by the parameter “_use_adaptive_log_file_sync” and was introduced in 11gR2 and controls whether adaptive switching between post/wait and polling is enabled. In 11.2.0.1 and 11.2.0.2 the default value for the parameter is false. From 11.2.0.3, the default value has been changed to true.

-- LGWR traces 
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=1048576 AIO-NR=169402

*** 2022-04-07 06:03:31.916
Warning: log write broadcast wait time 2612477ms (SCN 0xad1.f8c170fe)

*** 2022-04-07 06:03:31.916
Warning: log write broadcast wait time 2598008ms (SCN 0xad1.f8c21251)
kcrfw_update_adaptive_sync_mode: post->poll long#=33 sync#=202 sync=5963 poll=8730 rw=383 rw+=383 ack=3982 min_sleep=1135

*** 2022-04-07 07:46:20.018
Log file sync switching to polling --------------------->>>>>>> It shows current method is polling 
Current scheduling delay is 1 usec 
Current approximate redo synch write rate is 67 per sec

*** 2022-04-07 07:47:13.877
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=63 switch_sync_count_delta=202

*** 2022-04-07 07:47:13.877
Log file sync switching to post/wait ------------------>>>>>>>> It shows current method is post/wait
Current approximate redo synch write rate is 21 per sec


-- Below stats shows that the POLLING is happening on the database for LGWR wrtes 
SQL> select name,value from v$sysstat where name in ('redo synch poll writes','redo synch polls');

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo synch poll writes                                             10500129
redo synch polls                                                   10773618

As there were no other symptoms of issues with I/O or in other areas, the problem could be with excessive switching between post/wait and polling wait methods.

After consulting with Oracle support, we found the issue was happening due to BUG (25178179) and the issue gets severe when only while using log file sync “polling mode“. To prevent the problem from happening, they suggested us to turn off the _use_adaptive_log_file_sync by setting it to FALSE in either in the spfile and restarting the database or dynamically in memory.
This will force the log file sync waits to use (the traditional) “post/wait mode” rather than the automatically switching between “post/wait mode” and “polling mode” based on performance statistics. Changing the said parameter to FALSE disabled adaptive LFS in the database and that resolved the issue and system performance was restored.

-- Set the parameter _use_adaptive_log_file_sync = false and restart the database:
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync" = FALSE;

-- In cases where a restart is not feasible, then you can set in memory and also in the SP file for when a restart does occur:
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE; -- to set in memory and spfile as by default scope=both
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE scope=sfile sid='*'; -- to set on spfile, so parameter is used at next restart.

Though In the vast majority of cases adaptive log file sync improves the overall performance of log file synchronization, but there are few bugs associated with this feature i.e. 13707904, 13074706 and 25178179.

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , | 1 Comment »

Demystifying OUTLINE DATA in an Execution Plan …

Posted by FatDBA on March 7, 2022

‘Outline Data’ section displays the list of hints that would be needed to replicate the execution plan, even if the statistics change., but in case of a complex execution plans it comes with lot of strange looking and obscure terms used. As there isn’t any published explanation of stored outline hints, so today’s post is to decipher few of the terms that you see in an outline data of an execution plan.

If you want to read more about outlines, what are they, their purpose, please read my previous post on the same subject at –> https://fatdba.com/2017/11/30/how-to-fix-sql-plan-issues-using-outline-data/

Okay, coming back to the post, let me quickly generate the outline data for one of the test SQL and will try to explain about each of the hints, query blocks, aliases and other representations used.

SQL>
SQL> explain plan for 
SELECT d.department_name,e.employee_name
FROM departments d
LEFT OUTER JOIN employees e ON d.department_id = e.department_id
WHERE d.department_id >= 30
ORDER BY d.department_name, e.employee_name;  

Explained.

SQL> select * from table(dbms_xplan.display('PLAN_TABLE',NULL,'+alias +outline'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 3871261979
-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |     4 |   168 |     7  (15)| 00:00:01 |
|   1 |  SORT ORDER BY      |             |     4 |   168 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN OUTER   |             |     4 |   168 |     6   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| DEPARTMENTS |     2 |    44 |     3   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| EMPLOYEES   |     6 |   120 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$2BFA4EE4
   3 - SEL$2BFA4EE4 / D@SEL$1
   4 - SEL$2BFA4EE4 / E@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SEL$2BFA4EE4" "E"@"SEL$1")
      LEADING(@"SEL$2BFA4EE4" "D"@"SEL$1" "E"@"SEL$1")
      FULL(@"SEL$2BFA4EE4" "E"@"SEL$1")
      FULL(@"SEL$2BFA4EE4" "D"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      ANSI_REARCH(@"SEL$1")
      OUTLINE(@"SEL$8812AA4E")
      ANSI_REARCH(@"SEL$2")
      OUTLINE(@"SEL$948754D7")
      MERGE(@"SEL$8812AA4E" >"SEL$948754D7")
      OUTLINE_LEAF(@"SEL$2BFA4EE4")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID"(+))
   3 - filter("D"."DEPARTMENT_ID">=30)
   4 - filter("E"."DEPARTMENT_ID"(+)>=30)

As per the above outline data captured for the execution plan, the very first entry is USE_HASH(@”SEL$2BFA4EE4″ “E”@”SEL$1”) This represents the use of HASH join methods used in the query. Other join method outline options you might see are USE_NL and USE_MERGE. Here in our example we’ve used the LEFT OUTER JOIN which will return all valid rows from the table on the left side of the JOIN keyword, that’s table EMPLOYEE (alias ‘E’) along with the values from the table on the right side.

i.e LEFT OUTER JOIN employees e ON d.department_id = e.department_id

LEADING(@”SEL$2BFA4EE4″ “D”@”SEL$1” “E”@”SEL$1”) The LEADING hints specifies the exact join order for the SQL that is followed by the optimizer in the plan. It displays the join order as an ordered list of table aliases and query block names. The aliases appears in the ordering in which we access those tables in the query.

The first part of the hint @”SEL$2BFA4EE4″ represents the main SELECT block and if you closely see the full hint definition you will see “D”@”SEL$1”, where “SEL$1” is the query block name for table alias ‘D’ for Department table, which is followed by “E”@”SEL$1”, that represents query block name “SEL$1” on second table joined ‘E’ for Employee table.

FULL(@”SEL$2BFA4EE4″ “E”@”SEL$1”) & FULL(@”SEL$2BFA4EE4″ “D”@”SEL$1”) are what you see next and these two hints for FULL TABLE SCANS on table alias “E”, that is EMPLOYEE table, followed by FTS on table alias “D”, that is DEPARTMENT table. Same you can see in the execution plan too.

OUTLINE(@”SEL$1″) & OUTLINE(@”SEL$2″) These two OUTLINE hints correspond to initial and the intermediate query blocks.

ANSI_REARCH(@”SEL$1″) This is the hint that instructs the optimizer to re-architecture of ANSI left, right, and full outer joins. In our case this was to re-arch the left outer join.

MERGE(@”SEL$8812AA4E” >”SEL$948754D7″) is the MERGE query block hint. SEL$8812AA4E and SEL$948754D7 are the transformed query blocks.

OUTLINE_LEAF(@”SEL$2BFA4EE4″) This hint builds an outline leaf for the specified query block. In our example it represents that the query block SEL$2BFA4EE4 is a “final” query block that has actually been subject to independent optimization. Outline leaf cannot be transformed. You can see multiple outline_leaf hints for the query blocks.

IGNORE_OPTIM_EMBEDDED_HINTS is a special hint instructs the CBO to ignore most of all the other supplied hints.

ALL_ROWS This hint instructs to optimize the query/statement block for best throughput with lowest resource utilization.

DB_VERSION(‘19.1.0’) & OPTIMIZER_FEATURES_ENABLE(‘19.1.0’) these two hints allows for the CBO to process the SQL on the said version of the database, 19.1.0 in our case.

Few others that you might see in OUTLINE DATA sections and are easy to decipher are INDEX_RS_ASC (Index Range Scan in ascending order) and it happens when the INDEX RANGE SCAN is used as an access path/method in SQL execution plan. Few others that you can see in case of Nested Loop Joins are NLJ_BATCHING which happens when Oracle batches multiple physical I/O requests and process them using a vector I/O (array) instead of processing them one at a time, and batching improves performance because it uses asynchronous reads..
Few others that you can see in case of NL’s are USE_NL, NLJ_PREFETCH etc.

INDEX_FFS in case of Index Fast Full Scans and the list is long …

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , | Leave a Comment »

Script to display latency wait times using DBA_HIST_EVENT_HISTOGRAM using Heat Maps

Posted by FatDBA on March 2, 2022

Today’s post if about a brilliant tool/script that I frequently use to get the outputs/graphs for customer meetings, presentations and for quick analysis of any performance problem …

This is one of the awesome Oracle provided tool/script that generates a heat map of latency times for a given wait event. The script takes the output from DBA_HIST_EVENT_HISTOGRAM and produces a heat map and a JPG version of the heat map is also produced representing the wait time latency for the given wait event over a specific time frame identified by range of snapshots.

Source : Script to Display Latency Wait Time From DBA_HIST_EVENT_HISTOGRAM Using Heat Map (Doc ID 1931492.1)

Note: This script queries ASH views, specially DBA_HIST_EVENT_HISTOGRAM which requires license as its part of the Diagnostics Pack on Oracle EE

The scripts takes the following input parameters:

  • Snapshot id range (first snapshot id and last snapshot id)
  • Database id
  • Instance number
  • Exact name of wait event
perl ./lhm.pl --begin [first snapid] --end [last snapid] --dbid [database id] --instance [instance num] --wait_event "wait event"

perl ./lhm.pl  --dbid 1266075800 --begin 19907 --end 19991 --event "log file parallel write" --instance 4


-- Example Output
database id:     1266075800
wait event:      log file parallel write
Matrix:          [26x85]
snap range:      [19907:19991]
instance id:     4
jpg file   :     latency_log_file_parallel_write.jpg

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , , , | 1 Comment »

A stubborn after upgrade issue, and how I handled it!

Posted by FatDBA on February 16, 2022

Writing this one from the misty mountains .. 🙂

Recently I had to part-time support a system where customer reported slowness issues with few of the critical SQLs which uses few IN-Memory tables and were impacted exactly since they moved from Oracle 12.1 to 12.2. We tried few things but none of them worked, and with that I started suspecting if the change in CBO version from 12.1 to 12.2 the reason behind those problems with SQLs.

So, today’s post will give you some idea about such scenarios when you’re not sure about the things that broke the performance after the upgrade, and you have to do the try and error approach to find out the cause. So, I decided to give it a try with one of those SQLs to set the OFE (Optimizer Feature Enable) to earlier DB version 12.1 and verify query performance. So I started my test with the optimizer_features_enable parameter to set it to the older version.

SQL> alter session set optimizer_features_enable='12.1.0.2';

And the query ran fine, At this stage we had some sort of understanding of the problem as it was clear that some changes in 12.2 that has trigger this problem. But now I have to identify among all those hidden/underscore parameters and fix control’s that one final parameter which is influencing the optimizer for that odd behavior. So, I started something like this …
We extract all the fixes and underscore parameters introduced in oracle 12.2 and I started with the hidden Parameters followed by Fix Control settings in 12.2.0.1 Optimizer environment to come to 12.1.0.2 OFE level

-- Hidden parameters added into Oracle version 12.2 
-- Setting them back to Oracle version 12.1 
alter session set "_optimizer_undo_cost_change"="12.1.0.2";
alter session set "_optimizer_cbqt_or_expansion"=off;
alter session set "_optimizer_ads_use_partial_results"=false;
alter session set "_query_rewrite_use_on_query_computation"=false;
alter session set "_px_scalable_invdist_mcol"=false;
alter session set "_optimizer_eliminate_subquery"=false;
alter session set "_sqlexec_hash_based_distagg_ssf_enabled"=false;
alter session set "_optimizer_union_all_gsets"=false;
alter session set "_optimizer_enhanced_join_elimination"=false;
alter session set "_optimizer_multicol_join_elimination"=false;
alter session set "_key_vector_create_pushdown_threshold"=0;
alter session set "_optimizer_enable_plsql_stats"=false;
alter session set "_recursive_with_parallel"=false;
alter session set "_recursive_with_branch_iterations"=1;
alter session set "_px_dist_agg_partial_rollup_pushdown"=off;
alter session set "_optimizer_key_vector_pruning_enabled"=false;
alter session set "_pwise_distinct_enabled"=false;
alter session set "_vector_encoding_mode"=off;
alter session set "_ds_xt_split_count"=0;
alter session set "_ds_sampling_method"=NO_QUALITY_METRIC;
alter session set "_optimizer_ads_use_spd_cache"=false;
alter session set "_optimizer_use_table_scanrate"=OFF;
alter session set "_optimizer_use_xt_rowid"=false;
alter session set "_xt_sampling_scan_granules"=off;
alter session set "_optimizer_band_join_aware"=false;
alter session set "_optimizer_vector_base_dim_fact_factor"=0;
alter session set "_ds_enable_view_sampling"=false;
alter session set "_optimizer_inmemory_use_stored_stats"=NEVER;
alter session set "_mv_access_compute_fresh_data"=off;
alter session set "_bloom_filter_ratio"=30;
alter session set "_optimizer_control_shard_qry_processing"=65535;
alter session set "_optimizer_interleave_or_expansion"=false;

Placed all above underscore parameters with the query and ran it again and observed the benefits (make sure to flush the previous plans from the shared pool). In my case this actually worked, but question was which underscore parameter helped. Do same with all fixes (controls).

-- Fix Controls added in Oracle version 12.2 
-- Setting them back to Oracle version 12.1 
alter session set "_fix_control"="16515789:0";
alter session set "_fix_control"="17491018:0";
alter session set "_fix_control"="17986549:0";
alter session set "_fix_control"="18115594:0";
alter session set "_fix_control"="18182018:0";
alter session set "_fix_control"="18302923:0";
alter session set "_fix_control"="18377553:0";
alter session set "_fix_control"="5677419:0";
alter session set "_fix_control"="18134680:0";
alter session set "_fix_control"="18636079:0";
alter session set "_fix_control"="18415557:0";
alter session set "_fix_control"="18385778:0";
alter session set "_fix_control"="18308329:0";
alter session set "_fix_control"="17973658:0";
alter session set "_fix_control"="18558952:0";
alter session set "_fix_control"="18874242:0";
alter session set "_fix_control"="18765574:0";
alter session set "_fix_control"="18952882:0";
alter session set "_fix_control"="18924221:0";
alter session set "_fix_control"="18422714:0";
alter session set "_fix_control"="18798414:0";
alter session set "_fix_control"="18969167:0";
alter session set "_fix_control"="19055664:0";
alter session set "_fix_control"="18898582:0";
alter session set "_fix_control"="18960760:0";
alter session set "_fix_control"="19070454:0";
alter session set "_fix_control"="19230097:0";
alter session set "_fix_control"="19063497:0";
alter session set "_fix_control"="19046459:0";
alter session set "_fix_control"="19269482:0";
alter session set "_fix_control"="18876528:0";
alter session set "_fix_control"="19227996:0";
alter session set "_fix_control"="18864613:0";
alter session set "_fix_control"="19239478:0";
alter session set "_fix_control"="19451895:0";
alter session set "_fix_control"="18907390:0";
alter session set "_fix_control"="19025959:0";
alter session set "_fix_control"="16774698:0";
alter session set "_fix_control"="19475484:0";
alter session set "_fix_control"="19287919:0";
alter session set "_fix_control"="19386746:0";
alter session set "_fix_control"="19774486:0";
alter session set "_fix_control"="18671960:0";
alter session set "_fix_control"="19484911:0";
alter session set "_fix_control"="19731940:0";
alter session set "_fix_control"="19604408:0";
alter session set "_fix_control"="14402409:0";
alter session set "_fix_control"="16486095:0";
alter session set "_fix_control"="19563657:0";
alter session set "_fix_control"="19632232:0";
alter session set "_fix_control"="19889960:0";
alter session set "_fix_control"="17208933:0";
alter session set "_fix_control"="19710102:0";
alter session set "_fix_control"="18697515:0";
alter session set "_fix_control"="18318631:0";
alter session set "_fix_control"="20078639:0";
alter session set "_fix_control"="19503668:0";
alter session set "_fix_control"="20124288:0";
alter session set "_fix_control"="19847091:0";
alter session set "_fix_control"="12618642:0";
alter session set "_fix_control"="19779920:0";
alter session set "_fix_control"="20186282:0";
alter session set "_fix_control"="20186295:0";
alter session set "_fix_control"="20265690:0";
alter session set "_fix_control"="16047938:0";
alter session set "_fix_control"="19507904:0";
alter session set "_fix_control"="18915345:0";
alter session set "_fix_control"="20329321:0";
alter session set "_fix_control"="20225191:0";
alter session set "_fix_control"="18776755:0";
alter session set "_fix_control"="19882842:0";
alter session set "_fix_control"="20010996:0";
alter session set "_fix_control"="20379571:0";
alter session set "_fix_control"="20129763:0";
alter session set "_fix_control"="19899588:0";
alter session set "_fix_control"="10098852:0";
alter session set "_fix_control"="19663421:0";
alter session set "_fix_control"="20465582:0";
alter session set "_fix_control"="16732417:0";
alter session set "_fix_control"="20732410:0";
alter session set "_fix_control"="20289688:0";
alter session set "_fix_control"="20543684:0";
alter session set "_fix_control"="20506136:0";
alter session set "_fix_control"="20830312:0";
alter session set "_fix_control"="19768896:0";
alter session set "_fix_control"="19814541:0";
alter session set "_fix_control"="17443547:0";
alter session set "_fix_control"="19123152:0";
alter session set "_fix_control"="19899833:0";
alter session set "_fix_control"="20754928:0";
alter session set "_fix_control"="20808265:0";
alter session set "_fix_control"="20808192:0";
alter session set "_fix_control"="20340595:0";
alter session set "_fix_control"="18949550:0";
alter session set "_fix_control"="14775297:0";
alter session set "_fix_control"="17497847:0";
alter session set "_fix_control"="20232513:0";
alter session set "_fix_control"="20587527:0";
alter session set "_fix_control"="19186783:0";
alter session set "_fix_control"="19653920:0";
alter session set "_fix_control"="21211786:0";
alter session set "_fix_control"="21057343:0";
alter session set "_fix_control"="21503478:0";
alter session set "_fix_control"="21476032:0";
alter session set "_fix_control"="20859246:0";
alter session set "_fix_control"="21639419:0";
alter session set "_fix_control"="20951803:0";
alter session set "_fix_control"="21683982:0";
alter session set "_fix_control"="20216500:0";
alter session set "_fix_control"="20906162:0";
alter session set "_fix_control"="20854798:0";
alter session set "_fix_control"="21509656:0";
alter session set "_fix_control"="21833220:0";
alter session set "_fix_control"="21802552:0";
alter session set "_fix_control"="21452843:0";
alter session set "_fix_control"="21800590:0";
alter session set "_fix_control"="21273039:0";
alter session set "_fix_control"="16750133:0";
alter session set "_fix_control"="22013607:0";
alter session set "_fix_control"="22152372:0";
alter session set "_fix_control"="22077191:0";
alter session set "_fix_control"="22123025:0";
alter session set "_fix_control"="16913734:0";
alter session set "_fix_control"="8357294:0";
alter session set "_fix_control"="21979983:0";
alter session set "_fix_control"="22158526:0";
alter session set "_fix_control"="21971099:0";
alter session set "_fix_control"="22090662:0";
alter session set "_fix_control"="21300129:0";
alter session set "_fix_control"="21339278:0";
alter session set "_fix_control"="20270511:0";
alter session set "_fix_control"="21424812:0";
alter session set "_fix_control"="22114090:0";
alter session set "_fix_control"="22159570:0";
alter session set "_fix_control"="22272439:0";
alter session set "_fix_control"="22372694:0";
alter session set "_fix_control"="22514195:0";
alter session set "_fix_control"="22520315:0";
alter session set "_fix_control"="22649054:0";
alter session set "_fix_control"="8617254:0";
alter session set "_fix_control"="22020067:0";
alter session set "_fix_control"="22864730:0";
alter session set "_fix_control"="21099502:0";
alter session set "_fix_control"="22904304:0";
alter session set "_fix_control"="22967807:0";
alter session set "_fix_control"="22879002:0";
alter session set "_fix_control"="23019286:0";
alter session set "_fix_control"="22760704:0";
alter session set "_fix_control"="20853506:0";
alter session set "_fix_control"="22513493:0";
alter session set "_fix_control"="22518491:0";
alter session set "_fix_control"="23103096:0";
alter session set "_fix_control"="22143411:0";
alter session set "_fix_control"="23180670:0";
alter session set "_fix_control"="23002609:0";
alter session set "_fix_control"="23210039:0";
alter session set "_fix_control"="23102649:0";
alter session set "_fix_control"="23071621:0";
alter session set "_fix_control"="23136865:0";
alter session set "_fix_control"="23176721:0";
alter session set "_fix_control"="23223113:0";
alter session set "_fix_control"="22258300:0";
alter session set "_fix_control"="22205301:0";
alter session set "_fix_control"="23556483:0";
alter session set "_fix_control"="21305617:0";
alter session set "_fix_control"="22533539:0";
alter session set "_fix_control"="23596611:0";
alter session set "_fix_control"="22937293:0";
alter session set "_fix_control"="23565188:0";
alter session set "_fix_control"="24654471:0";
alter session set "_fix_control"="24845754:0";
ALTER session set "_fix_control"='5483301:OFF;

In my case none of fix control gives any benefits. This was now clear there was some feature in 12.2 which when disabled in the form of underscore parameter helped to get us back to previous state. But the problem is, they are total 32 different parameters, so I thought to divide the parameter list into half and execute the query after setting first half parameters and see if I get the correct result, if not then tried again setting next half parameter list and executed the query.
And with that trial and error approach, I was able to reach that one parameter which caused issues with those set of SQLs. It was ‘_optimizer_inmemory_use_stored_stats‘ which was causing issues with those IN-Memory tables references by those SQLs, as the optimizer NEVER uses the stored statistics for in-memory tables in 12.1, but the same parameter in 12.2 is with default value of AUTO and that causes the issue.

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , , | Leave a Comment »

 
%d bloggers like this: