Recently while working on a slow stats gathering case, someone asked if there is a way to know what’s happening beneath the surface ? what all flags and calculations its doing internally while on the front end the DBMS_STATS still running ? Yes, there is a way! As with most of the Oracle utilities, DBMS_STATS too comes with its own tracing facility which you can call using dbms_stats.set_global_prefs. Today’s post is all about enabling tracing on DBMS_STATS package.
Few of the commonly used DBMS_STATS flags/options are given below, you can always do a combination if wanted to club …
4 = This is to trace table stats 8 = This is to trace index stats 16 = This is to trace columnar stats 512 = auto stats job 1024 = This is to trace parallel executions 4096 = This is to trace partition prunes 16384 = This one traces extended column stats 32768 = To trace approximate NDV (number distinct values) ….
Let’s quickly collect traces for both Indexes and Tables followed by the stas collection for a table and its dependent Indices.
SQL>
SQL> exec dbms_stats.set_global_prefs('TRACE',4+8);
PL/SQL procedure successfully completed.
SQL>
SQL> set time on
13:28:55 SQL>
13:28:56 SQL>
13:28:56 SQL>
13:28:56 SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYS' , tabname => 'BIGTAB',cascade => true, estimate_percent => 10,method_opt=>'for all indexed columns size 1', granularity => 'ALL', degree => 4);
PL/SQL procedure successfully completed.
13:28:59 SQL>
13:29:00 SQL>
SQL> select name, value from v$diag_info where name='Diag Trace';
NAME VALUE
---------------------------------------- ------------------------------------------------------------------------------------
Diag Trace /u01/app/oracle/diag/rdbms/localdb/localdb/trace
Alright, we have our trace ready, let’s dig in deep what’s all inside the trace that we have collected for both the Table and it’s Index. It starts with a standard header in the trace files, followed by all by default explicitly mentioned flags/options with DBMS_STATS in XML format.
Next comes the preferences set for the called Table, ‘BIGTAB’ in our case. This will give you more idea about what all options were set overall and if there is any scope to tune and tweak anyone of them.
Next sections is where it monitors the stats gathering for Table and its Index, their start and end times were captured, for example : “APPROX_NDV_ALGORITHM => Non-Incremental” was chosen. Table level stats were used i.e. row count, blocks, average row length, sample size, number of indexes etc.
Next it jumps to the Index where it tries to calculate the global statistics for the partitioned Index by aggregating the partition-level statistics but as its a non-partitioned Index, it simples moves to the next stage. Next it triggers the Indexing options (just enables them at the start) i.e. nlb (number of leaf blocks), ndk (number of distinct keys), clf (clustering factor), nblks (number of blocks), sample percentage, degree (parallelism), nrw (key counts) etc by allocating them value 1 which means to consider them. Finally it starts to analyze the Index structure (In this case it’s not using approximate_ndv algorithm) and calculates the final values of nrw, nlb, ndk, clf etc.
Few Internal options were also used i.e. ctx.conc_ctx, ctx.batching_coeff etc.
-- Table Stats
DBMS_STATS: Started table SYS.BIGTAB. at 28-JUN-22 01.28.58.499744000 PM -04:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting
DBMS_STATS: reporting_man_log_task: target: "SYS"."BIGTAB" objn: 108254 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: ALL global_requested: NULL pfix:
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: HLL in incremental (no stats exist)
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: ALL execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix: ownname: SYS tabname: BIGTAB pname: spname: execution phase: 1
DBMS_STATS: Number of rows in the table = 492190, blocks = , average row length = 19, chain count = , scan rate = 0, sample size = 49219, cstats.count = 3, cind = 2
-- Index part
DBMS_STATS: Started index SYS.IDX_DIXIT at 28-JUN-22 01.28.58.816819000 PM -04:00 granularity: ALL gIdxGran:
DBMS_STATS: Specified granularity = , New granularity = ALL, Fixed granularity = ALL
DBMS_STATS: granularity ALL
DBMS_STATS: reporting_man_log_task: target: "SYS"."IDX_DIXIT" objn: 108255 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Gather index subpartition stats...
DBMS_STATS: Gather index partition stats...
DBMS_STATS: Gather global index stats...
DBMS_STATS: Start analyze_index_using_sql
DBMS_STATS: tab_stats_stale: BIGTAB not analyzed using full compute
DBMS_STATS: Number of blocks in the index: 1030
DBMS_STATS: Start execute_analyze_index(owner= SYS, indname= IDX_DIXIT, fobjn= , sample_pct= 10, seed= 0, degree= 4, collect_nrw= 1, collect_nlb= 1, collect_ndk= 1, collect_clf= 1, bmi= 0, iot_ov= 0, iot_sec= 0, ppredtxt= , nblks= 1030)
DBMS_STATS: execute_analyze_index(): Not using approximate_ndv, pct=111.529126213592233009708737864077669903,sample_pct=10
DBMS_STATS: End execute_analyze_index(ssize= 494100, nrw= 494100, nlb= 1027, ndk= 1000, clf= 494100)
DBMS_STATS: target_size: 1027
DBMS_STATS: Finished index SYS.IDX_DIXIT at 28-JUN-22 01.28.59.464468000 PM -04:00
DBMS_STATS: Finished table SYS.BIGTAB. at 28-JUN-22 01.28.59.519495000 PM -04:00
Once you are done with the stats tracing, close it immediately!
Hello All, my blog FatDBA have been awarded with the ‘Top 100 Oracle Blogs on the planet‘ (Rank 27) by feedspot. The best Oracle blog list curated from thousands of blogs on the web and ranked by traffic, social media followers, domain authority & freshness.
Short stacks are a rich source of information which can help you to do analyzing and deconstructing a process which is part of a problem in Oracle databases, and the good part is you don’t have to rely on any OS utilities like gdb, pstack etc. to extract stacks or traces. Its pretty simple and quick to generate the short stacks, you have to pass on the OSPID of the process, and issue SHORT_STACK option for ORADEBUG, and will generate a sequence of all functions involved and their current position in a calls stack. You can also use errorstack for the same, but short_stacks are fast and point-to-point!
Now one can ask – What is a ‘call stack’ ? A call stack is a list of names of methods called at run time from the beginning of a program until the execution of the current statement. A call stack is mainly intended to keep track of the point to which each active subroutine should return control when it finishes executing. Call stack acts as a tool to debug an application when the method to be traced can be called in more than one context. This forms a better alternative than adding tracing code to all methods that call the given method.
Let me run it on a real time system where we are facing issues where due to adaptive switching between Log write methods LGWR was causing excessive ‘log file sync‘ Waits
[oracle@oracleontario ~]$ ps -ef|grep lg
oracle 4402 1 0 00:25 ? 00:00:00 ora_lgwr_dixitdb
oracle 4719 3201 0 00:25 pts/1 00:00:00 grep --color=auto lg
-- Here in the example I am generating short stacks for LGWR process
SQL>
SQL> oradebug setospid 4402
Oracle pid: 19, Unix process pid: 4402, image: oracle@oracleontario.ontadomain (LGWR)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+187
<-ksliwat()+2233<-kslwaitctx()+200<-ksarcv()+320<-ksbabs()+670<-ksbrdp()+1167<-opirip()+541<-opidrv()+581<-sou2o()+165
<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
SQL>
SQL>
-- After a gap of ~ 10-15 Mins
-- Let me generate the short stack once again just to see if there is any difference.
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+992 <-sigacthandler()+104
<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212 <-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680
<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
So, as you’ve seen in above code, the clear change is there in the LGWR short stacks, we did few log switches, modified a big table in the database followed by multiple commits. If you carefully look at the stack, it was function ksedsts() where it was at the time we generated the stack, and function ksdxfstk () called it, followed by ksdxcb(), sspuser() and immediate change after functions called by LGWR internal call stacks. They matched one of the known BUG which gave us a hint that we resolved after modified a undocumented parameter.
So, it can help us locating the buggy functions called by any user session or background processes and is very useful in case when you are battling with a strange looking problem in your Oracle database.
Few of the functions that I am able to recall are given below. semtimedop() is one of the function for semaphore operations also known as semop and is used to perform atomically an array of semaphore operations on the set of semaphores associated with the semaphore identifier specified by semid. ‘ksdxcb’ is a function that’s called usually when the command oradebug is being used. sighandler is the programming signal handler and is used to locate an exception. When a signal is sent to a process, various register and stack operations occur that make it look as though the leaf PC at the time of the signal is the return address for a call to a system function, sigacthandler(). sigacthandler() calls the user-specified signal handler just as any function would call another. The sigaltstack()function allows a thread to define and examine the state of an alternate stack area on which signals are processed.
opidrv() is ORACLE Program Interface DRiVer (IGNORE) opiodr is ORACLE code request driver – route the current request main() is the standard executable entry point ksedst()is the KSE dump the call stack skgmstack() is the call specified function with extra STACK space rpidrv()or the RPI is theRecursive Program Interface DRiVer pfrrun() or PSDEVN is the PL/SQL Interpreter Main Instruction Loop kkxexe()or KKX is to execute plsql
Recently we were facing one performance issue with one of the SQL which references a highly volatile table in the entire database, and one of the largest too. The tables goes through massive DMLs each minute and that has lead to huge randomness in the data distribution, and expectedly few of its critical indexes has a very high (bad) clustering factor (CF).
A quick explanation on the clustering factor, it is a measure of the ordered-ness of an index in comparison to the table that it is based on. That would be a simple explanation on Oracle’s clustering factor (CF), but if you want to read more about it I would recommend to read blog posts by Richard Foote, Jonathan Lewis and few other great articles by Randolf Geist.
Okay, coming back to the case – We tried few methods (fixed BLevel, Leaf_Blocks, degree, stats recollection etc.), but none of them helped much to stabilize performance of the query, and finally we tried something that was introduced in Oracle 12.1 the ‘Attribute Clustering‘ that helped us to resolve the problem. The attribute clustering improves physical IOs for tables and its partitions. An attribute-clustered table stores data in close proximity on disk in an ordered way based on the values of a certain set of columns in the table or a set of columns in the other tables.
Attribute clustering is a user-defined table directive that provides data clustering on one or more columns in a table. The directives can be specified when the table is created or modified. There are two types of attribute clustering:
With Linear Ordering : Linear ordering stores the data according to the order of specified columns. This is the default type of clustering.
With Interleaved Ordering : It accurately determines exactly where data is located on the disk. This enabled I/O Pruning. This uses a special multidimensional clustering technique based on Z-order curve fitting.
Note: Zone mapping is a separately licensed feature.
Let’s do some test to understand the feature better. Here in the demo I would be using linear order clustering without any zonemaps. You can try try pairing the LO clustering with zone maps too. All tests I am doing it on a 19.15.0.0.0 sandbox environment.
For testing purpose, I have created a new tables with some 2000000 random values/rows inserted into it, this is to mimic a bad clustering factor (CF) scenario. I have also created an Index on the top of a column (date) and have collected the statistics.
[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Apr 27 00:13:08 2022
Version 19.15.0.0.0
Copyright (c) 1982, 2022, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0
SQL>
SQL> create table new_test (id number, DOB date, text varchar2(40));
Table created.
SQL> insert into new_test select rownum, sysdate-trunc(dbms_random.value(0, 20000)), 'PRASHANT DIXIT'
from dual connect by level <= 2000000; 2
2000000 rows created.
SQL>
SQL> commit;
Commit complete.
SQL> create index idx_newtest on new_test(dob);
Index created.
SQL> EXEC DBMS_STATS.gather_table_stats('DIXDROID', 'NEW_TEST', estimate_percent => null, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
-- Here you can see the extremely bad CLUSTERING FACTOR of the Index.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST'; 2
TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST IDX_NEWTEST 10097 2000000 1989246
SQL>
With such a bad clustering factor of the Index, let me try to run a query on the same column and pass a date range and see the execution plan.
SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';
Explained.
SQL>
SQL> SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 1847055510
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3100 | 86800 | 2750 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| NEW_TEST | 3100 | 86800 | 2750 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
15 rows selected.
With a bad clustering factor, the query is going for a full table scan on NEW_TEST table, and the overall cost of the access path is very high too. Let’s alter the table using Linear attribute clustering – I won’t be using materialized zonemaps here
SQL> alter table NEW_TEST add clustering by linear order(DOB) without materialized zonemap;
Table altered.
SQL> alter table NEW_TEST move online;
Table altered.
-- Now if you check you will the improved CF of the Index.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST';
TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST IDX_NEWTEST 10097 2000000 9277
The clustering factor of the Index was greatly improved after altering the table column (DOB) with linear order clustering enabled. Let’s try to run the same SQL and see if there is any improvements or not.
SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------
Plan hash value: 1446839462
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3100 | 86800 | 26 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NEW_TEST | 3100 | 86800 | 26 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_NEWTEST | 3100 | | 11 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
15 rows selected.
SQL>
SQL>
And we can see the FTS was replaced with a quick INDEX RANGE SCAN on IDX_NEWTEST index (created to cover DOB column). Look at the great reduction in the cost per step and final cost of the access path and plan tree.
Today’s post is about one of the good featured added into Oracle 19c for DBMS_XPLAN package that provides an easy way to display the output of the EXPLAIN PLAN using a new function called COMPARE_PLANS. This function compares each plan in the list to a reference plan and returns a report and makes life of a DBA little easy.
The following example illustrates the usage of the COMPARE_PLANS function in 19c.
-- Lets create a sample table
SQL> CREATE TABLE bigtab (
id NUMBER,
weight NUMBER,
adate DATE
);
Table created.
-- Insert some dummy rows into it
SQL> INSERT INTO bigtab (id, weight, adate)
SELECT MOD(ROWNUM,1000),
DBMS_RANDOM.RANDOM,
SYSDATE-1000+DBMS_RANDOM.VALUE(0,1000)
FROM all_objects
/
84217 rows created.
SQL>
-- Lets query the table with our conditions
SQL> select count(*) from bigtab where ID between 280 and 500;
COUNT(*)
----------
18564
SQL>
SQL> select count(*) from bigtab where ID not between 280 and 500;
COUNT(*)
----------
65653
-- All set lets get the SQLID for both of the SQLs that we ran
SQL> set linesize 400 pagesize 400
SQL> select sql_id, sql_text from v$sqlarea where sql_text like '%from bigtab where ID%';
SQL>
SQL_ID SQL_TEXT
------------- --------------------------------------------------------------------------------
3k1b0y7scc8bd select count(*) from bigtab where ID not between :"SYS_B_0" and :"SYS_B_1"
51tfy4wqb5sg3 select count(*) from bigtab where ID between :"SYS_B_0" and :"SYS_B_1"
Now we have SQLIDs for both of the SQLs I ran in last session, let’s call DBMS_XPLAN.COMPARE_PLANS and compare both of the plans and see how the compare reports looks like and what all information it presents. There are multiple parameters available with the reference_plan and compare_plan_list, but to keep it simple I will be using only cursor_cache_object (to get all plans in the cursor cache generated for SQL ID) and plan_object_list(cursor_cache_object) which is a TYPE
Other available plan sources are: plan_table_object(owner, plan_table_name, statement_id, plan_id) : for schedules cursor_cache_object(sql_id, child_number) : For cursor cache awr_object(sql_id, dbid, con_dbid, plan_hash_value) : for Hours or if you want to check for two different PHVs sqlset_object (sqlset_owner, sqlset_name, sql_id, plan_hash_value) : for SQL Tuning Set spm_object (sql_handle, plan_name) : For SQL Plan management sql_profile_object (profile_name) : For SQL configuration file
Let’s go simple and will use cursor_cache_object with plan_object_list to get a simple plain execution plan compare report.
SQL>
SQL> VARIABLE v_rep CLOB
SQL>
SQL> BEGIN
:v_rep := DBMS_XPLAN.COMPARE_PLANS(
reference_plan => cursor_cache_object('3k1b0y7scc8bd', NULL),
compare_plan_list => plan_object_list(cursor_cache_object('51tfy4wqb5sg3', NULL)),
type => 'TEXT',
level => 'TYPICAL',
section => 'ALL');
END;
/
PL/SQL procedure successfully completed.
SQL>
-- Now lets call the variable to see contents inside, the compare report.
SQL> set long 100000
SQL> COLUMN report FORMAT a200
SQL>
SQL> SELECT :v_rep REPORT FROM DUAL;
REPORT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
COMPARE PLANS REPORT
---------------------------------------------------------------------------------------------
Current user : SYS
Total number of plans : 2
Number of findings : 2
---------------------------------------------------------------------------------------------
COMPARISON DETAILS
---------------------------------------------------------------------------------------------
Plan Number : 1 (Reference Plan)
Plan Found : Yes
Plan Source : Cursor Cache
SQL ID : 3k1b0y7scc8bd
Child Number : 0
Plan Database Version : 19.0.0.0
Parsing Schema : "SYS"
SQL Text : select count(*) from bigtab where ID not between
:"SYS_B_0" and :"SYS_B_1"
Plan
-----------------------------
Plan Hash Value : 2140185107
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 82 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| * 2 | TABLE ACCESS FULL | BIGTAB | 54028 | 702364 | 82 | 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - filter(("ID"<:SYS_B_0 OR "ID">:SYS_B_1))
Notes
-----
- Dynamic sampling used for this statement ( level = 2 )
---------------------------------------------------------------------------------------------
Plan Number : 2
Plan Found : Yes
Plan Source : Cursor Cache
SQL ID : 51tfy4wqb5sg3
Child Number : 0
Plan Database Version : 19.0.0.0
Parsing Schema : "SYS"
SQL Text : select count(*) from bigtab where ID between
:"SYS_B_0" and :"SYS_B_1"
Plan
-----------------------------
Plan Hash Value : 3865534252
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 79 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| * 2 | FILTER | | | | | |
| * 3 | TABLE ACCESS FULL | BIGTAB | 17969 | 233597 | 79 | 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - filter(:SYS_B_1>=:SYS_B_0)
* 3 - filter(("ID">=:SYS_B_0 AND "ID"<=:SYS_B_1))
Notes
-----
- Dynamic sampling used for this statement ( level = 2 )
Comparison Results (2):
-----------------------------
1. Query block SEL$1, Alias "BIGTAB"@"SEL$1": Some lines (id: 2) in the
current plan are missing in the reference plan.
2. Query block SEL$1, Alias "BIGTAB"@"SEL$1": Some columns (ID, PARENT_ID,
DEPTH) do not match between the reference plan (id: 2) and the current plan
(id: 3).
---------------------------------------------------------------------------------------------
SQL>
SQL>
This is great feature when the execution plans are messy and behemoth and just a glance is not enough and you want to compare two such execution plans. This will present a ‘comparison result’ at the end of the report like in our example the report says the Query Block with name SEL$1 which is an alias of BIGTAB in the current plan doesn’t have the line ID 2 (FILTER method in our case) and similar other observations.
Recently someone asked me, How to run oratop utility on Oracle database version that is above 12.1 ? He was using Oracle version 19.3.0.0.0 and was getting error ‘error while loading shared libraries’ when tried to call the oratop, and it was quite an obvious error as the script trying to locate a shared object file called libclntshcore.so under $ORACLE_HOME/lib directory for version 12.1, whereas it was version 19.1 and hence the shared object version was libclntshcore.so.19.1
[oracle@oracleontario ~]$ ./oratop.RDBMS_12.1_LINUX_X64 -d -f -i 3 / as sysdba
./oratop.RDBMS_12.1_LINUX_X64: error while loading shared libraries: libclntshcore.so.12.1: cannot open shared object file: No such file or directory
[oracle@oracleontario ~]$
Solution to the problem is, starting from Oracle 19c, oratop is part of Oracle supplied tool and is present under ORACLE_HOME/suptools directory, and is not version specific. Other tools that you will see along with oratop in suptools directory are orachk and tfa. Let’s call it and see if it runs!
[oracle@oracleontario ~]$ cd $ORACLE_HOME/suptools/
[oracle@oracleontario suptools]$ ls
orachk oratop tfa
[oracle@oracleontario suptools]$ cd oratop/
[oracle@oracleontario oratop]$ ls
oratop
[oracle@oracleontario oratop]$
[oracle@oracleontario oratop]$
[oracle@oracleontario oratop]$ ./oratop -f -i 3 / as sysdba
oratop: Release 15.0.0 Production on Thu Mar 10 07:33:49 2022
Copyright (c) 2011, Oracle. All rights reserved.
Connecting ..
Processing ...
Oracle 19c - Primary dixitd 07:33:45 up: 5.2h, 1 ins, 0 sn, 0 us, 1.5G sga, 0% fra, 1 er, 0.5%db
ID CPU %CPU %DCP LOAD AAS ASC ASI ASW IDL ASP LAT MBPS IOPS R/S W/S LIO GCPS %FR PGA TEMP UTPS UCPS RT/X DCTR DWTR
1 1 9.1 0.2 0.2 0.0 0 0 0 0 0 0 0.1 3 3 0 3 0 45 153M 0 0 0 110m 118 8
EVENT (C) TOTAL WAITS TIME(s) AVG_MS PCT WAIT_CLASS
db file sequential read 5175 65 12.6 32 User I/O
DB CPU 42 21
oracle thread bootstrap 81 40 503.8 20 Other
db file scattered read 593 29 50.4 15 User I/O
external table read 1 23 23470.8 12 User I/O
ID SID SPID USERNAME PROGRAM SRV SERVICE PGA SQLID/BLOCKER OPN E/T STA STE WAIT_CLASS EVENT/*LATCH W/T
1 58 14884 SYS sqlplus@o DED SYS$USE 1.4M 1:77 UPD 8.0s ACT WAI Applicatio enq: TX - row lock co 7.8s
[oracle@oracleontario oratop]$
[oracle@oracleontario oratop]$
Recently one of the customer asked me about their critical SQL on a production system running on Oracle 12cR2, if there is a way to pin it in the AWR. I mean the query was already super fast, gets completed very fast and doesn’t consume any system resources, or isn’t a ‘Top SQL’. He still want to track this low resource SQL historically to check few stats, the guys was coming from an Oracle DBA background, and got the impression that the AWR only captures statements considered to be Top N (DBA_HIST_WR_CONTROL.TOPNSQL) which is by default 30. So, If you require any SQL to be always available in the top SQL, then you need to modify the default value to store those many SQLs. But this will create a AWR snapshot stress and that he don’t want.
The answer was ‘Yes’, there is a way how this can be achieved, and that’s too quite easily, and is even present since 11gR1 without making any modifications to your snapshot settings of topnsql. This is by using DBMS_WORKLOAD_REPOSITORY.ADD_COLORED_SQL API to pin any SQL into the AWR report as long as the SQL is in memory. So, if you mark your SQL or color it, it will be captured in every snapshot and it doesn’t have to be a TOP-N SQL to present in an AWR.
Let’s do a demo to understand how we can do that. I am going to run a test SQL to count number of rows into the table.
SQL> select count(*) from TABLE_GROWTH_MB;
COUNT(*)
----------
2868
SQL> select sql_id, plan_hash_value, sql_text from v$sqlarea where sql_text like '%TABLE_GROWTH_MB%';
SQL_ID PLAN_HASH_VALUE SQL_TEXT
------------- --------------- --------------------------------------------------
4n3x6w1vhn1k3 1134407048 select count(*) from TABLE_GROWTH_MB
Okay, so we have executed the SQL and we have its SQL_ID with us now. Lets try to use awr_sql_report_text function that displays an Automatic Workload Repository (AWR) SQL report as plain text to see if the SQL is present under the latest snapshot range.
SQL>
SQL> select output from table (dbms_workload_repository.awr_sql_report_text(l_dbid => 2592296819 , l_inst_num => 2,
l_bid => 36708, l_eid => 36718, l_sqlid => '4n3x6w1vhn1k3'));
OUTPUT
--------------------------------------------------------------------------------
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name DB Id Unique Name DB Role Edition Release RAC CDB
------------ ----------- ----------- ---------------- ------- ---------- --- ---
FATDBACA 2592296819 FATDBACA PRIMARY EE 12.2.0.1.0 YES NO
Instance Inst Num Startup Time
------------ -------- ---------------
FATDBACA2 2 26-Oct-21 23:10
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 36708 10-Mar-22 16:00:37 190 2.5
End Snap: 36718 10-Mar-22 18:30:06 181 2.6
Elapsed: 149.48 (mins)
DB Time: 91.13 (mins)
SQL Summary DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36718
No data exists for this section of the report.
-------------------------------------------------------------
SQL ID: 4n3x6w1vhn1k3 DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36718
No data exists for this section of the report.
-------------------------------------------------------------
Full SQL Text
No data exists for this section of the report.
SQL>
SQL>
SQL>
SQL>
And the report has no data about the SQL as the SQL was pretty fast, and hardly took few milliseconds to finish and is expected as it didn’t qualified as a TOP-SQL hence found no place in the AWR report. I am now going to color the SQLID and will re-run the SQL and check if its there in AWR now or not …
-- to PIN the SQLID in question
SQL> exec dbms_workload_repository.add_colored_sql('4n3x6w1vhn1k3');
PL/SQL procedure successfully completed.
SQL>
-- To UNPIN or remove the COLOR SQLID from repo
SQL> exec dbms_workload_repository.remove_colored_sql('<SQL_ID>');
-- Query below view to see list of all colored SQLs
SQL> select * from DBA_HIST_COLORED_SQL;
DB Id SQL_ID CREATE_TI CON_ID
----------- ------------- --------- ----------
2592296819 4n3x6w1vhn1k3 10-MAR-22 0
-- Re ran the SQL
SQL> select count(*) from TABLE_GROWTH_MB;
COUNT(*)
----------
2868
-- Let me generate a manual snap to capture fresh data
SQL>
SQL>
SQL> var n_snapid number
set feed off term on head off
set serveroutput on size unlimited
begin
:n_snapid := dbms_workload_repository.create_snapshot();
dbms_output.put_line('snap_id: ' || to_char(:n_snapid));
end;
/
snap_id: 36719
SQL>
-- Let me retry again to see if the SQLID details are there in the AWR after I have colored it
SQL> SQL> select output from table (dbms_workload_repository.awr_sql_report_text(l_dbid => 2592296819 , l_inst_num => 2,
l_bid => 36718, l_eid => 36719, l_sqlid => '4n3x6w1vhn1k3'));
OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name DB Id Unique Name DB Role Edition Release RAC CDB
------------ ----------- ----------- ---------------- ------- ---------- --- ---
FATDBACA 2592296819 FATDBACA PRIMARY EE 12.2.0.1.0 YES NO
Instance Inst Num Startup Time
------------ -------- ---------------
FATDBACA2 2 26-Oct-21 23:10
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 36708 10-Mar-22 16:00:37 190 2.5
End Snap: 36719 10-Mar-22 18:39:40 177 2.6
Elapsed: 159.06 (mins)
DB Time: 93.43 (mins)
SQL Summary DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36719
Elapsed
SQL Id Time (ms)
------------- ----------
Container DB
------------
4n3x6w1vhn1k3 0
Module: SQL*Plus
select count(*) from TABLE_GROWTH_MB
2592296819
-------------------------------------------------------------
SQL ID: 4n3x6w1vhn1k3 DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36719
-> 1st Capture and Last Capture Snap IDs
refer to Snapshot IDs witin the snapshot range
-> select count(*) from TABLE_GROWTH_MB
Plan Hash Total Elapsed 1st Capture Last Capture
# Value Time(ms) Executions Snap ID Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1 1134407048 0 1 36719 36719
-------------------------------------------------------------
Plan 1(PHV: 1134407048)
-----------------------
Plan Statistics DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36719
-> % Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
Stat Name Statement Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms) 0 0.2 0.0
CPU Time (ms) 0 0.2 0.0
Executions 1 N/A N/A
Buffer Gets 22 22.0 0.0
Disk Reads 0 0.0 0.0
Parse Calls 1 1.0 0.0
Rows 1 1.0 N/A
User I/O Wait Time (ms) 0 N/A N/A
Cluster Wait Time (ms) 0 N/A N/A
Application Wait Time (ms) 0 N/A N/A
Concurrency Wait Time (ms) 0 N/A N/A
Invalidations 0 N/A N/A
Version Count 1 N/A N/A
Sharable Mem(KB) 19 N/A N/A
-------------------------------------------------------------
Execution Plan
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 10 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TABLE_GROWTH_MB | 2624 | 10 (0)| 00:00:01 |
------------------------------------------------------------------------------
Full SQL Text
SQL ID SQL Text
------------- -----------------------------------------------------------------
4n3x6w1vhn1k3 select count(*) from TABLE_GROWTH_MB
Perfect, the SQLID is there in the AWR now, and we can see all runtime stats, execution plan, SQL Text etc.
Oracle 19c has a great enhancement added to the DBMS_XPLAN package called the ‘Hint Usage Report’, which is pretty helpful in reporting if an incorrect hint was used, or there was any syntax error with the hint or if there was an unresolved hint used in the SQL.
This you can use with the DBMS_XPLAN package and its functions like DISPLAY, DISPLAY_CURSOR, DISPLAY_WORKLOAD_REPOSITORY or DISPLAY_SQL_PLAN_BASELINE. By default, DBMS_XPLAN in the default TYPICAL format will report only invalid hints. The report shows the text of the hints. The hint may also have one of the following annotations:
E : indicates a syntax error. N : indicates an unresolved hint. U : indicates that the corresponding hint was not used in the final plan.
Lets do some tests to understand it better. For our tests I have a test table created called BIGTAB and loaded it with sample data. First case, I will try the PARALLEL_INDEX hint with one of the Index (non-partitioned) to see what the Hint Report says.
SQL> explain plan for select /*+ PARALLEL_INDEX(BIGTAB,WEIGHT_IDX,2) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'TYPICAL'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 441133017
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13613 | 252K| 70 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| BIGTAB | 13613 | 252K| 70 (2)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("WEIGHT">=(-168444405) AND "WEIGHT"<=639413572)
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 - SEL$1 / BIGTAB@SEL$1
U - PARALLEL_INDEX(BIGTAB,WEIGHT_IDX,4)
20 rows selected.
SQL>
Here we are getting UNUSED (U) for PARALLEL_INDEX hint that we have used against index WEIGHT_INDEX, but it wasn’t honored because the PARALLEL_INDEX hint parallelizes an index range scan of a partitioned index. This is a common mistake with parallel_index hint is using it against a non-partitioned index, where it will be ignored.
Let me try the next case where I will make a syntax error with the hint INDEX, will pass it as INDEXX and see what the report says.
06:04:55 SQL>
06:04:55 SQL>
06:04:55 SQL> explain plan for select /*+ INDEXX(BIGTAB,WEIGHT_IDX) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
Elapsed: 00:00:00.05
06:08:35 SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
1 - SEL$1
E - INDEXX
15 rows selected.
Elapsed: 00:00:00.17
06:08:40 SQL>
And it says that there is was a syntax error (E) with the HINT, hence ignored by the optimizer.
Now, I will be trying a correct/valid hint with the SQL Statement and see what the hint report says about it.
SQL> explain plan for select /*+ INDEX(BIGTAB,WEIGHT_IDX) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL>
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2994241301
----------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BIGTAB |
| 2 | INDEX RANGE SCAN | WEIGHT_IDX |
----------------------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
1 - SEL$1 / BIGTAB@SEL$1
- INDEX(BIGTAB,WEIGHT_IDX)
16 rows selected.
And the report says that no error was reported and hint was correctly used, same can be seen in the change of access path of the execution plan.
Next, let me try any optimizer related hint, I will try the optimizer_index_cost_adj HINT to make the SQL more or less index friendly
03:32:43 SQL> explain plan for select /*+ opt_param('optimizer_index_cost_adj',20) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
Elapsed: 00:00:00.00
03:32:53 SQL>
03:32:58 SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
0 - STATEMENT
- opt_param('optimizer_index_cost_adj',20)
15 rows selected.
Elapsed: 00:00:00.02
And that worked too as no error or any status flag was reported in the hint report.
Next, I will try a non existing Index and will see what the report says about it.
SQL> explain plan for select /*+ INDEX(FATDBA_INDEX) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (N - Unresolved (1))
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------------------
1 - SEL$1
N - INDEX(FATDBA_INDEX)
15 rows selected.
SQL>
And the report says it was UNRESOLVED (N) as the Index doesn’t exists.
Let me try one more case where I will pass two bad or contradictory hints, will use both first_rows and all_rows together and will see what happens.
SQL> explain plan for select /*+ first_rows(1) all_rows */ * from BIGTAB;
Explained.
SQL> select * from dbms_xplan.display(format=>'basic +alias +hint_report');
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - SEL$1 / BIGTAB@SEL$1
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2 (U - Unused (2))
---------------------------------------------------------------------------
0 - STATEMENT
U - all_rows / hint conflicts with another in sibling query block
U - first_rows(1) / hint conflicts with another in sibling query block
21 rows selected.
SQL>
The preceding report shows that the all_rows and first_rows(1) CBO mode hints conflict with one other. As the First_rows attempts to optimize the query to get the very first row back to the client as fast as possible, whereas All_rows attempts to optimize the query to get the very last row as fast as possible.
Last example I will try to pass multiple hints where few of them will be valid and rest invalid. I will pass FULL & PARALLEL valid hints, and will use two INDEX unresolved hints and one UNRESOLVED hint of MERGE with query block.
SQL> explain plan for select /*+ FULL(bigtab) INDEX(WEIGHT_IDX) INDEX(ID_IDX) MERGE(@SEL$1) PARALLEL(2) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL> select * from dbms_xplan.display(format=>'basic +alias +hint_report');
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 305217024
-----------------------------------------
| Id | Operation | Name |
-----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | PX COORDINATOR | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 |
| 3 | PX BLOCK ITERATOR | |
| 4 | TABLE ACCESS FULL| BIGTAB |
-----------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
4 - SEL$1 / BIGTAB@SEL$1
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 5 (U - Unused (1), N - Unresolved (2))
---------------------------------------------------------------------------
0 - STATEMENT
- PARALLEL(2)
1 - SEL$1
N - INDEX(ID_IDX)
N - INDEX(WEIGHT_IDX)
U - MERGE(@SEL$1)
4 - SEL$1 / BIGTAB@SEL$1
- FULL(bigtab)
32 rows selected.
One of the very common scenarios that we DBAs has encountered are when the database is hanging, freeze or totally non responsive, or when your database crashes with a serious performance problem, or when the SQLPlus cannot connect to the server or when you noticed that we can not save the instance/node then most of the time DBA take the system state dump for further analysis or you can say for RCA purpose. In this case, if you want to obtain the current status of the database for later diagnosis, then we need to use SYSTEMSTATE dump to know what the process is doing, what it is waiting for, who is the resource owner, and who is blocking others. In case of the above problems, it is very helpful to analyze the cause of the problem by collecting SYSTEMSTATE dump in time.
In order to understand the purpose, let me create a situation, a problem, I will simulate a row locking (pessimistic type) case in to the database. From the first session I had created a table and insert few rows into it and didn’t commit and opened few others sessions and tried to update same row. And I can see a good amount of blocking is created in the database!
LOCK_TREE
----------
1.97
1.58
1.60
1.75
USERNAME SID SERIAL# TY LOCK_TYPE LOCK_REQUESTE ID1 ID2 SQL_TEXT
----------- ---------- ---------- -- ------------- ------------- ---------- ---------- ----------------------------------------
SYS 58 17036 TX None Exclusive 589839 1777 update test set id=09 where ID=10
SYS 60 21943 TX None Exclusive 589839 1777 update test set id=09 where ID=10
SYS 75 91 TX None Exclusive 589839 1777 update test set id=09 where ID=10
Right time to generate the SYSTEMSTATE dump, below is how you can generate that. Do system state dump few times in a row, so we can determine whether the processes are hung or active.
-- connect with the database if its not hang or freeze
SQL> connect / as sysdba
-- Connect using prelim option if its stalled or freeze
SQL> sqlplus -prelim / as sysdba
-- Use SETMYPID command to select the current process as the ORADEBUG process
SQL> oradebug setmypid
Statement processed.
-- This is to remove the limitation on the size of the trace file use.
-- In Oracle 8.1.6 and above the maximum size of the trace file defaults to UNLIMITED
SQL> oradebug unlimit
Statement processed.
-- This is to take the systemstate dumps with 266 level (short stack + dump)
SQL> oradebug dump systemstate 266
Statement processed.
-- Or use below in case running on a RAC setup
SQL> oradebug -g all dump systemstate 266
SQL> oradebug dump systemstate 266
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
-- This command prints the name of the current trace file
SQL> Oradebug tracefile_name
/u01/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_191381.trc
There are few other levels which you can use and are discussed below. Note: I highly recommend to check with Oracle support to use there levels as few of them with higher level are little risky and known to crash instances.
Level 2 : Takes a dump (not including the lock element)
Level 10 : Takes a dump including lock details
Level 11 : Takes a dump + global cache of RAC
Level 256 : Takes short stacks (function stack)
Level 258 (256 + 2) : Takes a short stack + dump (not including the lock element)
Level 266 (256 + 10) : Takes a short stack + dump
Level 267 (256 + 11) : Takes a short stack + dump + global cache of RAC
So we have artificially created a locking scenario on the database and we are aware about the sessions involved in this pessimistic type locking. Now we will use the same information and that will help us to navigate through the massive SYSTEMSTATE dumps. Lets dig into the trace file that we have generated for the session.
The first part or the header contains details like hostname, machine info, service, module and client details, followed by command processing’s that we did through ORADEBUG.
Next part is where you will see process summary and its dumps, including all user OSPIDs, their commands and blocking details with wait information and is the most interesting one and important section of the dump to understand the wait/blocking chain.
Dumping the Process Summary
1: PSEUDO process
2: PMON ospid 3904 sid 2 ser 9826, waiting for 'pmon timer'
3: CLMN ospid 3906 sid 3 ser 8008, waiting for 'pmon timer'
4: PSP0 ospid 3908 sid 4 ser 46393, waiting for 'rdbms ipc message'
5: VKTM ospid 3910 sid 5 ser 26025, waiting for 'VKTM Logical Idle Wait'
6: GEN0 ospid 3915 sid 6 ser 36023, waiting for 'rdbms ipc message'
7: MMAN ospid 3917 sid 7 ser 17720, waiting for 'rdbms ipc message'
.....
......
.......
47: USER ospid 5516 sid 79 ser 52211,
49: USER ospid 4429 sid 58 ser 17036, waiting for 'enq: TX - row lock contention'
Cmd: UPDATE
Blocked by inst: 1, sid: 97, ser: 2507
Final Blocker inst: 1, sid: 97, ser: 2507
50: USER ospid 4538 sid 75 ser 91, waiting for 'enq: TX - row lock contention'
Cmd: UPDATE
Blocked by inst: 1, sid: 97, ser: 2507
Final Blocker inst: 1, sid: 97, ser: 2507
51: W006 ospid 4290 sid 76 ser 17641, waiting for 'Space Manager: slave idle wait'
52: USER ospid 4636 sid 60 ser 21943, waiting for 'enq: TX - row lock contention'
Cmd: UPDATE
Blocked by inst: 1, sid: 97, ser: 2507
Final Blocker inst: 1, sid: 97, ser: 2507
55: M004 ospid 4722 sid 74 ser 6422, waiting for 'class slave wait'
And from the above trace section we understood that the final blocker is with SID 97, serial 2507 where SIDs 58,75 and 60 all waiting on event ‘enq: TX – row lock contention’.
Next section is where it dumps buffer cache working set information which contains size of the BC, number of pools, state information and other information. Followed by process level statistics – state, session, latching, post information along with short stack dumps which is debugging/triggering events that generates a trace file in case of any error and full chain. The trace also contains details about wait stacks, blocking chains (waiters), session event history,
Here in below section you will also see the SQL text (update test set id=09 where ID=10) and its details which has caused the blocking, its statistics, execution counts, active locking info, SQL ID, Table Info, HashValues and namespace info.
Next we will check more details about the blocker or the session that has blocked others, here SID 97 is the case.
(session) sid: 97 ser: 2507 trans: 0x7815a9a8, creator: 0x7d3d9878
flags: (0x41) USR/- flags2: (0x40009) -/-/INC
flags_idl: (0x0) status: -/-/-/- kill: -/-/-/-
DID: 0001-002E-000000010000-0000-00000000, short-term DID:
txn branch: (nil)
edition#: 134 user#/name: 0/SYS
oct: 0, prv: 0, sql: (nil), psql: 0x71cc7dc8
stats: 0x705a2930, PX stats: 0x128f8d84
service name: SYS$USERS
client details:
O/S info: user: oracle, term: pts/1, ospid: 4321
machine: oracleontario.ontadomain program: sqlplus@oracleontario.ontadomain (TNS V1-V3)
application name: sqlplus@oracleontario.ontadomain (TNS V1-V3), hash value=2432467899
.....
........
.........
Current Wait Stack:
0: waiting for 'SQL*Net message from client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=650 seq_num=652 snap_id=1
wait times: snap=11 min 38 sec, exc=11 min 38 sec, total=11 min 38 sec
wait times: max=infinite, heur=11 min 38 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x1a0
There are 3 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 58, ser: 17036
wait event: 'enq: TX - row lock contention'
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x9000f
p3: 'sequence'=0x6f1
row_wait_obj#: 77878, block#: 122337, row#: 0, file# 1
.........
............
.............
Session Wait History:
elapsed time of 0.000001 sec since current wait
0: waited for 'SQL*Net message to client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=649 seq_num=651 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000040 sec of elapsed time
1: waited for 'SQL*Net message from client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=648 seq_num=650 snap_id=1
wait times: snap=0.000557 sec, exc=0.000557 sec, total=0.000557 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000017 sec of elapsed time
2: waited for 'SQL*Net message to client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=647 seq_num=649 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000015 sec of elapsed time
3: waited for 'SQL*Net message from client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=646 seq_num=648 snap_id=1
wait times: snap=0.000280 sec, exc=0.000280 sec, total=0.000280 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000014 sec of elapsed time
4: waited for 'PGA memory operation'
=0x0, =0x0, =0x0
wait_id=645 seq_num=647 snap_id=1
wait times: snap=0.000349 sec, exc=0.000349 sec, total=0.000349 sec
wait times: max=infinite
wait counts: calls=0 os=0
....
......
........
You will also see details on SQLs that we’ve executed using same SID (97), here in my case I was running few SQLs to identify locking in the database and were captured in the dump along with its parent and child cursor details, followed by some object level stats.
Other important section is the ‘Enqueue Dump‘. Here in our case this section discloses that enqueue name was “TX-0009000F-000006F1-00000000-00000000” and was acquired in ‘X’ (Exclusive) mode by the holder session ID 97.
At the end it prints details about the redo transport, ASM File Info, SGA Information, PDB/CDB details …
KGSKSGA info:
is the system running: yes
session that stopped the system: (nil)
number of plans 1 classes 4
low threshold delta 0
high threshold delta 2
number of cpus[0]: 1
thresholds - low[0]: 1, high[0]: 3
dynamic threshold adjustments - misbehave_adjust[0]: 0, cpu_util_adj[0]: 2
dynamic thresholds - low[0]: 3, high[0]: 5
running cnt[0]: 1, runnable cnt[0]: 3
flags: 0x0
debug: 0x0
all pointers:
num_latches 300 current 222 chg_class 0x74e364a0 chg_state 0x74e42020
class_list 0x74dfaa88,top_plan 0x7084e9d8, plan_list 0x74e03d38
heaps - kgsk subheap 0x74e142f0, plan heap 0x7b5d7438
backgrounds in_sched (pids):
.....
..........
...............
Redo Log Management (from krsk_dump_rlm)
SRL min count: 0
SRL max count: 0
SRL min avail: 0
SRL file size: 0
SRL block size: 0
SRL use FRA : FALSE
So, if you want to dig in deep and want more proofs and details about the incident, SYSTEMSTATE dumps is a great source of all internals that were involved at the time of the problem. This isn’t limited to row locking problems, you can generate it for all sort of events, problems or issues i.e. Row cache, latch free, LC Locks, CBC are few of the common ones …
Finally, I’ve got a working copy of my favorite 10053 optimizer trace viewer from one of my connection. This one was written by Sergei Romanenko. I love it because it allows direct jumps to the most important parts of the trace and also uses highlighter to improve the readability of the trace and optionally formats the final query after transformations. It’s pretty easy to use, and you can directly search the keywords within these big thumping traces. And you can also wrap your text and can also format the final query.