Recently during one of the performance taskforce on a newly migrated system, customer DBA asked me to use one of their legacy tool to get more idea about database’s performance, that one of their expert DBA written to collect performance metrics. I’d seen their previous reports collected through the same tool for other systems, and it was good. But, got a runtime exception with an error while calling the script/tool which says ‘PLAN_TABLE physical table present in user schema SYS‘. The error means the user executing it (SYS) owns the table PLAN_TABLE that is the not the Oracle seeded GTT (Global Temporary Table) plan table owned by SYS (PLAN_TABLE$ table with a PUBLIC synonym PLAN_TABLE).
This was little odd to the customer DBAs as they had never experienced this error with the tool, and now when its there, question was Shall we drop the PLAN_TABLE ? Is it risky to do that ? If we drop it, will it impact the execution plan generation or not ? Any other associated risk with drop of plan_table ?
Next when I’d queried DBA_OBJECTS, I saw the table is there in SYS schema, though this system was migrated from 12.2 to 19c, but the table should not be there as the table only by default existed in older versions of Oracle. The object creation date was coming for the time when database was upgraded. It had appeared that someone after upgrade/migration, called the utlrp.sql explicitly (maybe any old 8i/9i DBA) and that’d created the table. Now the question is – It’s safe to dropthis table ?
SQL> select owner, object_name, object_type, created from dba_objects where object_name like '%PLAN_TABLE%'
and owner not in ('SQLTXPLAIN','SQLTXADMIN') ORDER BY 1;
OWNER OBJECT_NAME OBJECT_TYPE CREATED
---------- -------------------- ----------------------- ---------
PUBLIC PLAN_TABLE SYNONYM 17-APR-19
PUBLIC SQL_PLAN_TABLE_TYPE SYNONYM 17-APR-19
PUBLIC PLAN_TABLE_OBJECT SYNONYM 17-APR-19
SYS SQL_PLAN_TABLE_TYPE TYPE 17-APR-19
SYS PLAN_TABLE TABLE 13-MAR-22 ----->>>>> OLD PLAN_TABLE created during the UPGRADE
SYS SQL_PLAN_TABLE_TYPE TYPE 17-APR-19
SYS PLAN_TABLE_OBJECT TYPE 17-APR-19
SYS PLAN_TABLE$ TABLE 17-APR-19
SYS PLAN_TABLE_OBJECT TYPE BODY 17-APR-19
9 rows selected.
-- Look at the difference between the two, PLAN_TABLE$ is a GLOBAL TEMP TABLE and old PLAN_TABLE is not.
SQL> SELECT TABLE_NAME, owner, temporary from dba_tables where table_name like '%PLAN_TABLE%'
AND owner not in ('SQLTXPLAIN','SQLTXADMIN') ORDER BY 1;
TABLE_NAME OWNER T
------------------------------ -------------------- -
PLAN_TABLE SYS N
PLAN_TABLE$ SYS Y ---> Y represents GTT
Let’s first see what’s there inside the PLAN_TABLE and what’s its purpose. Will generate few SQL execution plans will observe changes that happens in PLAN_TABLE.
-- Table columns and details
SQL> desc plan_table
Name Null? Type
----------------------------------------- -------- ----------------------------
STATEMENT_ID VARCHAR2(30)
PLAN_ID NUMBER
TIMESTAMP DATE
REMARKS VARCHAR2(4000)
OPERATION VARCHAR2(30)
OPTIONS VARCHAR2(255)
OBJECT_NODE VARCHAR2(128)
OBJECT_OWNER VARCHAR2(128)
OBJECT_NAME VARCHAR2(128)
OBJECT_ALIAS VARCHAR2(261)
OBJECT_INSTANCE NUMBER(38)
OBJECT_TYPE VARCHAR2(30)
OPTIMIZER VARCHAR2(255)
SEARCH_COLUMNS NUMBER
ID NUMBER(38)
PARENT_ID NUMBER(38)
DEPTH NUMBER(38)
POSITION NUMBER(38)
COST NUMBER(38)
CARDINALITY NUMBER(38)
BYTES NUMBER(38)
OTHER_TAG VARCHAR2(255)
PARTITION_START VARCHAR2(255)
PARTITION_STOP VARCHAR2(255)
PARTITION_ID NUMBER(38)
OTHER LONG
DISTRIBUTION VARCHAR2(30)
CPU_COST NUMBER(38)
IO_COST NUMBER(38)
TEMP_SPACE NUMBER(38)
ACCESS_PREDICATES VARCHAR2(4000)
FILTER_PREDICATES VARCHAR2(4000)
PROJECTION VARCHAR2(4000)
TIME NUMBER(38)
QBLOCK_NAME VARCHAR2(128)
OTHER_XML CLOB
-- Let me check other stats or details about the PLAN_TABLE
SQL> select index_name, table_name from dba_indexes where table_name='PLAN_TABLE'
And owner not in ('SQLTXPLAIN','SQLTXADMIN') ORDER BY 1;
INDEX_NAME TABLE_NAME
-------------------------------------------------- ------------------------------
SYS_IL0000078251C00036$$ PLAN_TABLE
SQL> select table_name, owner, TABLESPACE_NAME from dba_tables where table_name like '%PLAN_TABLE%'
and owner not in ('SQLTXPLAIN','SQLTXADMIN') ORDER BY 1;
TABLE_NAME OWNER TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------
PLAN_TABLE SYS SYSTEM
PLAN_TABLE$ SYS
SQL>
-- The OLD PLAN_TABLE is empty at the moment
SQL> select count(*) from plan_table;
COUNT(*)
----------
0
-- Lets explain a test SQL to see what happens to the OLD PLAN_TABLE
SQL> explain plan for select count(*) from bigtab;
Explained.
-- And immediately 3 rows related to the plan line ids added to it
SQL> select count(*) from plan_table;
COUNT(*)
----------
3
-- Three entries for below 3 IDs.
SQL> select * FROM TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2140185107
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| BIGTAB | 72358 | 69 (0)| 00:00:01 |
---------------------------------------------------------------------
9 rows selected.
-- But the new PLAN_TABLE$ is still empty
SQL> select count(*) from PLAN_TABLE$ ;
COUNT(*)
----------
0
So, the question is – Is it safe to drop this table PLAN_TABLE ?
SQL> drop table PLAN_TABLE;
Table dropped.
SQL>
-- And the table is gone
SQL> select owner, object_name, object_type, created from dba_objects where object_name like '%PLAN_TABLE%'
and owner not in ('SQLTXPLAIN','SQLTXADMIN') ORDER BY 1;
OWNER OBJECT_NAME OBJECT_TYPE CREATED
------------------------------ -------------------- ----------------------- ---------
PUBLIC PLAN_TABLE SYNONYM 17-APR-19
PUBLIC SQL_PLAN_TABLE_TYPE SYNONYM 17-APR-19
PUBLIC PLAN_TABLE_OBJECT SYNONYM 17-APR-19
SYS PLAN_TABLE_OBJECT TYPE BODY 17-APR-19
SYS SQL_PLAN_TABLE_TYPE TYPE 17-APR-19
SYS PLAN_TABLE_OBJECT TYPE 17-APR-19
SYS PLAN_TABLE$ TABLE 17-APR-19
SYS SQL_PLAN_TABLE_TYPE TYPE 17-APR-19
8 rows selected.
Now when the table is gone, lets check if we are still able to generate the execution plan.
SQL>
SQL> explain plan for select count(*) from bigtab;
Explained.
SQL> select * FROM TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2140185107
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| BIGTAB | 72358 | 69 (0)| 00:00:01 |
---------------------------------------------------------------------
9 rows selected.
SQL> select count(*) from plan_table$;
COUNT(*)
----------
3
And yes, no issues at all. The plan now started to sit inside PLAN_TABLE$ that has a PUBLIC SYNONYM called PLAN_TABLE. So, it’s totally safe to drop the PLAN_TABLE from your schema if it still exists and Oracle has now a public synonym for the same purpose. WARNING: Don’t drop the PLAN_TABLE$ nor the PLAN_TABLE public synonym, these need to exist for the new PLAN_TABLE to work properly.
Last week we faced an interesting issue with one of the production system which was recently migrated from Oracle 12.2. to 19.15. The setup was running on a VMWare machine with limited resources. It all started when the application team started reporting slowness in their daily scheduled jobs and other ad-hoc operations, when checked at the database layer, it was all ‘enq: CR – block range reuse ckpt‘ wait event. Same can be seen in the below ORATOP output, where the BLOCKER ID 3817 is the CKPT or the checkpoint process.
The strange part was, the blocker was CKPT process and it was all for a particular SQL ID (an INSERT operation), see below in next oratop screen fix.
As far as this wait event (other classed), This comes just before you delete or truncate a table, where we need a level segment checkpoint. This is because it must maintain the consistency of the blocks there may be in the buffer memory and what’s on the disc. As per the definition, this wait event happens due to contention on blocks caused by multiple processes trying to update same blocks. This may seem issues from the application logic resulting into this concurrency bottleneck, but interestingly this was happening on a simple INSERT operation, not a DELETE or TRUNCATE.
Same can be seen in the AWR and ASH reports too! There are CBC (Cache Buffer Chains) latching and latch free events too along with ‘enq: CR – block range reuse ckpt‘, but the initial focus was to understand the event and its reasons. As far as ‘direct path read temp‘ it was happening due to couple of SELECT statement which we resolved after attaching a better plan with the SQLs.
Wait event source (from ASH)
SQL Text was quite simple, an INSERT INTO statement.
I’ve tried first with the CKPT process traces, just to see if there is anything interesting or useful caught within the process logs or traces. The trace was short and has only got some strange and obscure looking content, but which at least gave us an idea that the reuse block checkpoint of the RBR was failed due to enqueue, and its entry was failed due to abandoned parent. Still, that doesn’t helped us anything, we were unsure about the reason.
--> Info in CKPT trace file ---> XXXXX_ckpt_110528.trc.
RBR_CKPT: adding rbr ckpt failed for 65601 due to enqueue
RBR_CKPT: rbr mark the enqueue for 65601 as failed
RBR_CKPT: adding ckpt entry failed due to abandoned parent 0x1b57b4a88
RBR_CKPT: rbr mark the enqueue for 65601 as failed
There were few things logged in the alert.log, multiple deadlocks (ORA 0060), too many parse errors for one SELECT statement and some checkpoint incomplete errors (log switching was high >35)
-- deadlocks in alert log.
Errors in file /opt/u01/app/oracle/diag/rdbms/xxxxxx/xxxx/trace/xxxx_ora_73010.trc:
2022-04-19T13:41:22.489551+05:30 ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
ORA-00060: Deadlock detected. See Note 60.1 at My Oracle Support for Troubleshooting ORA-60 Errors. More info in file /opt/u01/app/oracle/diag/rdbms/pwfmfsm/PWFMFSM/trace/PWFMFSM_ora_73010.trc.
-- From systemstatedumps
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
------------Blocker(s)----------- ------------Waiter(s)------------
Resource Name process session holds waits serial process session holds waits serial
TX-03AC0014-0000B33A-00000000-00000000 1562 691 X 1632 946 5441 X 56143
TX-01460020-0001A5C2-00000000-00000000 946 5441 X 56143 1562 691 X 1632
-- too many parse errors
2022-04-19T13:57:26.261176+05:30
WARNING: too many parse errors, count=2965 SQL hash=0x8ce1e2ff ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
PARSE ERROR: ospid=68036, error=942 for statement:
2022-04-19T13:57:26.272805+05:30
SELECT * FROM ( SELECT xxxxxxxxxxxxxxxxxxxxxx FROM xxxxxxxxxxxxxxxxxxxxxx AND xxxxxxxxxxxxxxxxxxxxxx ORDER BY xxxxxxxxxxxxxxxxxxxxxx ASC ) WHERE ROWNUM <= 750
Additional information: hd=0x546ba1be8 phd=0x61edab798 flg=0x20 cisid=113 sid=113 ciuid=113 uid=113 sqlid=ccmkzhy6f3srz
...Current username=xxxxxxxxxxxxxxxxxxxxxx
...Application: xxxxxxxxxxxxxxxxxxxxxx.exe Action:
-- Checkpoint incomplete
2022-04-19T15:03:16.964470+05:30 ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
Thread 1 cannot allocate new log, sequence 186456
Checkpoint not complete
Current log# 11 seq# 186455 mem# 0: +ONLINE_REDO/xxxxxxxxxxxxxxxxxxxxxx/ONLINELOG/group_11.256.1087565999
2022-04-19T15:03:17.785113+05:30
But the alert log was not sufficient to give us any concrete evidences or reasons for CKPT bloking user sessions. So, next we decided to generate the HANGANALYZE and SYSTEMSTATEDUMPs to understand what’s all happening under the hood, through its wait chains. We noticed few interesting things there
Wait chain 1 where a session was waiting on ‘buffer busy waits‘ while doing the “bitmapped file space header” which talks about the space management blocks (nothing to with bitmap indexes) and was related with one SELECT statement.
Wait chain 2 where a session was found waiting on ‘enq: CR – block range reuse ckpt‘ event and was blocked by CKPT process (3817) which was further waiting on ‘control file sequential read‘
Wait chain 4 where SID 1670, was found waiting on ‘buffer busy waits‘ while doing ‘bitmapped file space header’ operations.
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 46794
process id: 2014, oracle@monkeymachine.prod.fdt.swedish.se
session id: 15
session serial #: 19322
module name: 0 (xxxx.exe)
}
is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ bitmapped file space header
time in wait: 2 min 20 sec
timeout after: never
wait id: 10224
blocking: 0 sessions
current sql_id: 1835193535
current sql: SELECT * FROM ( SELECT xxxxx FROM task JOIN request ON xxxxx = xxxxx JOIN xxxxx ON xxxxx = xxxxx JOIN c_task_assignment_view ON xxxxx = xxxxx
.
.
.
and is blocked by
=> Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 23090
process id: 3231, oracle@monkeymachine.prod.fdt.swedish.se
session id: 261
session serial #: 39729
module name: 0 (xxxx.exe)
}
which is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ bitmapped file space header
time in wait: 18.999227 sec ◄◄◄
timeout after: never
wait id: 47356
blocking: 25 sessions ◄◄◄
current sql_id: 0
current sql: <none>
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+910<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000084 sec
1. event: 'SQL*Net message from client'
time waited: 0.008136 sec
wait id: 47355 p1: 'driver id'=0x28444553
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000043 sec
2. event: 'SQL*Net message to client'
time waited: 0.000002 sec
wait id: 47354 p1: 'driver id'=0x28444553
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000093 sec
3. event: 'SQL*Net message from client'
time waited: 2.281674 sec
wait id: 47353 p1: 'driver id'=0x28444553
p2: '#bytes'=0x1
}
and may or may not be blocked by another session.
.
.
.
Chain 2:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 10122
process id: 2850, oracle@monkeymachine.prod.fdt.swedish.se
session id: 97
session serial #: 47697
module name: 0 (xxxx.exe)
}
is waiting for 'enq: CR - block range reuse ckpt' with wait info: ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
{
p1: 'name|mode'=0x43520006
p2: '2'=0x10b22
p3: '0'=0x1
time in wait: 2.018004 sec
timeout after: never
wait id: 81594
blocking: 0 sessions
current sql_id: 1335044282
current sql: <none>
.
.
.
and is blocked by
=> Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 110528
process id: 24, oracle@monkeymachine.prod.fdt.swedish.se
session id: 3817 ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ CKPT background process
session serial #: 46215
}
which is waiting for 'control file sequential read' with wait info: ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
{
p1: 'file#'=0x0
p2: 'block#'=0x11e
p3: 'blocks'=0x1
px1: 'disk number'=0x4
px2: 'au'=0x34
px3: 'offset'=0x98000
time in wait: 0.273981 sec
timeout after: never
wait id: 17482450
blocking: 45 sessions ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
current sql_id: 0
current sql: <none>
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2199<-kslwaitctx()+205<-ksarcv()+320<-ksbabs()+602<-ksbrdp()+1167<-opirip()+541<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000012 sec
1. event: 'control file sequential read'
time waited: 0.005831 sec
wait id: 17482449 p1: 'file#'=0x0
p2: 'block#'=0x11
p3: 'blocks'=0x1
* time between wait #1 and #2: 0.000012 sec
2. event: 'control file sequential read'
time waited: 0.011667 sec
wait id: 17482448 p1: 'file#'=0x0
p2: 'block#'=0xf
p3: 'blocks'=0x1
* time between wait #2 and #3: 0.000017 sec
3. event: 'control file sequential read'
time waited: 0.009160 sec
wait id: 17482447 p1: 'file#'=0x0
p2: 'block#'=0x1
p3: 'blocks'=0x1
}
.
.
.
Chain 4:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 46479
process id: 1036, oracle@monkeymachine.prod.fdt.swedish.se
session id: 1670
session serial #: 6238
module name: 0 (xxxx.exe)
}
is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ bitmapped file space header
time in wait: 18.954206 sec ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
timeout after: never
wait id: 20681
blocking: 0 sessions
current sql_id: 343919375
current sql: SELECT * FROM ( SELECT xxxxx FROM task JOIN request ON xxxxx = xxxxx JOIN xxxxx ON xxxxx = xxxxx JOIN c_task_assignment_view ON xxxxx = xxxxx
.
.
.
and is blocked by
=> Oracle session identified by:
{
instance: 1 (kpkpkpkpk.kpkpkpkpk)
os id: 44958
process id: 523, oracle@monkeymachine.prod.fdt.swedish.se
session id: 4681
session serial #: 41996
module name: 0 (xxxx.exemonkeymachine.prod.fdt.swedish.se (TNS)
}
which is waiting for 'buffer busy waits' with wait info: ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
{
p1: 'file#'=0xca
p2: 'block#'=0x2
p3: 'class#'=0xd ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄ bitmapped file space header
time in wait: 18.959429 sec ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
timeout after: never
wait id: 153995
blocking: 101 sessions ◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄◄
current sql_id: 343919375
current sql: INSERT INTO xx_xxx_xx(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) VALUES (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2199<-kslwaitctx()+205<-ksqcmi()+21656<-ksqcnv()+809<-ksqcov()+95<-kcbrbr_int()+2476<-kcbrbr()+47<-ktslagsp()+2672<-ktslagsp_main()+945<-kdliAllocCache()+37452<-kdliAllocBlocks()+1342<-kdliAllocChunks()+471<-kdliWriteInit()+1249<-kdliWriteV()+967<-kdli_fwritev()+904<-kdlxNXWrite()+577<-kdlx_write()+754<-kdlxdup_write1()+726<-kdlwWriteCallbackOld_pga()+1982<-kdlw_write()+1321<-kdld_write()+410<-kdl
wait history:
* time between current wait and wait #1: 0.756195 sec
1. event: 'direct path write temp'
time waited: 0.406543 sec
wait id: 153994 p1: 'file number'=0xc9
p2: 'first dba'=0xb28fc
p3: 'block cnt'=0x4
* time between wait #1 and #2: 0.000001 sec
2. event: 'ASM IO for non-blocking poll'
time waited: 0.000000 sec
wait id: 153993 p1: 'count'=0x4
p2: 'where'=0x2
p3: 'timeout'=0x0
* time between wait #2 and #3: 0.000002 sec
3. event: 'ASM IO for non-blocking poll'
time waited: 0.000001 sec
wait id: 153992 p1: 'count'=0x4
p2: 'where'=0x2
p3: 'timeout'=0x0
}
and may or may not be blocked by another session.
.
.
Though, we wanted to try couple of hidden parameters to enable fast object level truncation and checkpointing, as they had helped us a lot in the past in similar scenarios, but had to involve Oracle support and after carefully analyzing the situation, they too agreed and want us to try them as they started suspecting it as an aftermath of a known bug of 19c.
[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Sun Apr 24 00:24:47 2022
Version 19.15.0.0.0
Copyright (c) 1982, 2022, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0
SQL>
SQL> @hidden
Enter value for param: db_fast_obj_truncate
old 5: and a.ksppinm like '%¶m%'
new 5: and a.ksppinm like '%db_fast_obj_truncate%'
Parameter Session Value Instance Value descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_db_fast_obj_truncate TRUE TRUE enable fast object truncate
_db_fast_obj_ckpt TRUE TRUE enable fast object checkpoint
SQL> ALTER SYSTEM SET "_db_fast_obj_truncate" = false sid = '*';
System altered.
SQL>
SQL> ALTER SYSTEM SET "_db_fast_obj_ckpt" = false sid = '*';
System altered.
SQL>
And soon after setting them, we saw a drastic drop in the waits and system seemed better, much better. But it all happened during an off-peak hour, so there wasn’t much of a workload to see anything odd.
And as we suspected, the issue repeated itself, and next day during peak business hours we started seeing the same issue, same set of events back into existence. This time the ‘latch: cache buffers chains‘ was quite high and prominent which was not that much earlier.
Initially we tried to fix some of the expensive statements on ‘logical IOs’ or memory reads, but that hardly helped. The issue persisted even after setting a higher value for db_block_hash_latches and decreasing cursor_db_buffers_pinned. AWR continues to show ‘latch: cache buffers chains’ in the top ten, foreground timed events, and ‘latch free‘ in first place in the background timed events. Oracle support confirmed the behavior was due to published bug 33025005 where excessive Latch CBC waits were seen after upgrading from 12c to 19c, and suggested to apply patch 33025005 and then to set hidden parameter “_post_multiple_shared_waiters” to value FALSE (in MEMORY only to test) which disables multiple shared waiters in the database.
-- After applying patch 33025005
SQL> ALTER SYSTEM SET "_post_multiple_shared_waiters" = FALSE SCOPE = MEMORY;
System altered.
And even after applying the patch and setting the recommended undocumented parameter, the issue persisted and we were totally clueless.
And as a last resort, we tried to flush the buffer cache of the database, and bingo that crude method of purging the cache helped to drastically to reduce the load on ‘CBC Latching‘ and for ‘enq: cr block range reuse ckpt‘, and the system ran fine soon after the flush of the DB Buffer cache.
So, nothing worked for us, we changed multiple checkpointing and shared writers related parameters in the database, applied a bug fix patch (33025005), but of no use. Finally, the flush of buffer cache worked for us! Oracle support agreed that this was happening due to a new/unpublished bug (33125873 or 31844316) which is not yet fixed in 19.15 and will be included in 23.1, and they are in status 11 that means still being worked by Development so there is no fix for it yet.
Today’s post is about one of the good featured added into Oracle 19c for DBMS_XPLAN package that provides an easy way to display the output of the EXPLAIN PLAN using a new function called COMPARE_PLANS. This function compares each plan in the list to a reference plan and returns a report and makes life of a DBA little easy.
The following example illustrates the usage of the COMPARE_PLANS function in 19c.
-- Lets create a sample table
SQL> CREATE TABLE bigtab (
id NUMBER,
weight NUMBER,
adate DATE
);
Table created.
-- Insert some dummy rows into it
SQL> INSERT INTO bigtab (id, weight, adate)
SELECT MOD(ROWNUM,1000),
DBMS_RANDOM.RANDOM,
SYSDATE-1000+DBMS_RANDOM.VALUE(0,1000)
FROM all_objects
/
84217 rows created.
SQL>
-- Lets query the table with our conditions
SQL> select count(*) from bigtab where ID between 280 and 500;
COUNT(*)
----------
18564
SQL>
SQL> select count(*) from bigtab where ID not between 280 and 500;
COUNT(*)
----------
65653
-- All set lets get the SQLID for both of the SQLs that we ran
SQL> set linesize 400 pagesize 400
SQL> select sql_id, sql_text from v$sqlarea where sql_text like '%from bigtab where ID%';
SQL>
SQL_ID SQL_TEXT
------------- --------------------------------------------------------------------------------
3k1b0y7scc8bd select count(*) from bigtab where ID not between :"SYS_B_0" and :"SYS_B_1"
51tfy4wqb5sg3 select count(*) from bigtab where ID between :"SYS_B_0" and :"SYS_B_1"
Now we have SQLIDs for both of the SQLs I ran in last session, let’s call DBMS_XPLAN.COMPARE_PLANS and compare both of the plans and see how the compare reports looks like and what all information it presents. There are multiple parameters available with the reference_plan and compare_plan_list, but to keep it simple I will be using only cursor_cache_object (to get all plans in the cursor cache generated for SQL ID) and plan_object_list(cursor_cache_object) which is a TYPE
Other available plan sources are: plan_table_object(owner, plan_table_name, statement_id, plan_id) : for schedules cursor_cache_object(sql_id, child_number) : For cursor cache awr_object(sql_id, dbid, con_dbid, plan_hash_value) : for Hours or if you want to check for two different PHVs sqlset_object (sqlset_owner, sqlset_name, sql_id, plan_hash_value) : for SQL Tuning Set spm_object (sql_handle, plan_name) : For SQL Plan management sql_profile_object (profile_name) : For SQL configuration file
Let’s go simple and will use cursor_cache_object with plan_object_list to get a simple plain execution plan compare report.
SQL>
SQL> VARIABLE v_rep CLOB
SQL>
SQL> BEGIN
:v_rep := DBMS_XPLAN.COMPARE_PLANS(
reference_plan => cursor_cache_object('3k1b0y7scc8bd', NULL),
compare_plan_list => plan_object_list(cursor_cache_object('51tfy4wqb5sg3', NULL)),
type => 'TEXT',
level => 'TYPICAL',
section => 'ALL');
END;
/
PL/SQL procedure successfully completed.
SQL>
-- Now lets call the variable to see contents inside, the compare report.
SQL> set long 100000
SQL> COLUMN report FORMAT a200
SQL>
SQL> SELECT :v_rep REPORT FROM DUAL;
REPORT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
COMPARE PLANS REPORT
---------------------------------------------------------------------------------------------
Current user : SYS
Total number of plans : 2
Number of findings : 2
---------------------------------------------------------------------------------------------
COMPARISON DETAILS
---------------------------------------------------------------------------------------------
Plan Number : 1 (Reference Plan)
Plan Found : Yes
Plan Source : Cursor Cache
SQL ID : 3k1b0y7scc8bd
Child Number : 0
Plan Database Version : 19.0.0.0
Parsing Schema : "SYS"
SQL Text : select count(*) from bigtab where ID not between
:"SYS_B_0" and :"SYS_B_1"
Plan
-----------------------------
Plan Hash Value : 2140185107
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 82 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| * 2 | TABLE ACCESS FULL | BIGTAB | 54028 | 702364 | 82 | 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - filter(("ID"<:SYS_B_0 OR "ID">:SYS_B_1))
Notes
-----
- Dynamic sampling used for this statement ( level = 2 )
---------------------------------------------------------------------------------------------
Plan Number : 2
Plan Found : Yes
Plan Source : Cursor Cache
SQL ID : 51tfy4wqb5sg3
Child Number : 0
Plan Database Version : 19.0.0.0
Parsing Schema : "SYS"
SQL Text : select count(*) from bigtab where ID between
:"SYS_B_0" and :"SYS_B_1"
Plan
-----------------------------
Plan Hash Value : 3865534252
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 79 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| * 2 | FILTER | | | | | |
| * 3 | TABLE ACCESS FULL | BIGTAB | 17969 | 233597 | 79 | 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - filter(:SYS_B_1>=:SYS_B_0)
* 3 - filter(("ID">=:SYS_B_0 AND "ID"<=:SYS_B_1))
Notes
-----
- Dynamic sampling used for this statement ( level = 2 )
Comparison Results (2):
-----------------------------
1. Query block SEL$1, Alias "BIGTAB"@"SEL$1": Some lines (id: 2) in the
current plan are missing in the reference plan.
2. Query block SEL$1, Alias "BIGTAB"@"SEL$1": Some columns (ID, PARENT_ID,
DEPTH) do not match between the reference plan (id: 2) and the current plan
(id: 3).
---------------------------------------------------------------------------------------------
SQL>
SQL>
This is great feature when the execution plans are messy and behemoth and just a glance is not enough and you want to compare two such execution plans. This will present a ‘comparison result’ at the end of the report like in our example the report says the Query Block with name SEL$1 which is an alias of BIGTAB in the current plan doesn’t have the line ID 2 (FILTER method in our case) and similar other observations.
Recently while looking into a system (was running on 19.3.0.0.0 standalone) where ‘log file sync’ was bugging the database, and after we tried all other possible solutions, we thought to increase the priroty of the LGWR background process to see if that helps.
Increasing the LGWR priority is putting the LGWR process in the Round-Robin (SCHED_RR) class. You can increase process’s priority both using OS (renice, nice commands) or Database methods, but this post is about setting the priority using ‘_high_priority_process’ an undocumented/hidden parameter that prioritizes your database managed processes.
I am using Oracle 19.3 for the test where the LGWR is not by default comes with any priority in the DB, starting from 21.3.0.0.0 LGWR process is part of _high_priority_processes group along with VKTM & LMS* processes. Note: This being a hidden/undocumented parameter I advise to consult with Oracle support before going and changing the parameter value. Try other possible ways to reduce log file sync, before jumping into this crude method of prioritizing LGWR over others.
[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Sun Apr 10 03:36:06 2022
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> @hidden
Enter value for param: high_prio
old 5: and a.ksppinm like '%¶m%'
new 5: and a.ksppinm like '%priority_processes%'
Parameter Session Value Instance Value descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_highest_priority_processes VKTM VKTM Highest Priority Process Name Mask
_high_priority_processes LMS*|VKTM LMS*|VKTM High Priority Process Name Mask
And by default in Oracle version 19.3.0.0 the parameter is set to prioritize VKTM (Virtual keeper of time) and LMS (Lock Manager, a RAC process). Let me check VKTM’s current priority class, and it is set to RR class (SCHED_RR scheduling class) for the process as its defined via _high_priority_processes parameter.
Let’s change the priority and reboot the database to persistent the change!
SQL> alter system set "_high_priority_processes"='LMS*|VKTM|LGWR' scope=spfile;
System altered.
SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1593831936 bytes
Fixed Size 8897024 bytes
Variable Size 1107296256 bytes
Database Buffers 469762048 bytes
Redo Buffers 7876608 bytes
Database mounted.
Database opened.
SQL> @hidden
Enter value for param: high_priority_processes
old 5: and a.ksppinm like '%¶m%'
new 5: and a.ksppinm like '%high_priority_processes%'
Parameter Session Value Instance Value descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_high_priority_processes LMS*|VKTM|LGWR LMS*|VKTM|LGWR High Priority Process Name Mask
At the same time I can see the same was logged into the Alert log file.
2022-04-10T03:54:31.488732-04:00
LGWR started with pid=8, OS id=26058 at elevated (RT) priority
So, we have reniced the priority of LGWR on the system, I mean the higher value of priority actually makes the process lower priority; it means the process demands fewer system resources (and therefore is a “nicer” process). Now lets check the scheduling class of the process at the OS, it should be now changed to RR from TS.
Last week I was part of one system/database stability short term assignment where customer running a critical Telco application on 2-Node RAC Cluster (RHEL) on 11.2.0.3.1 reported slowness in few of the critical data processing modules, which in turn had slowed down their entire system. Todays post is about buggy behavior of an adaptive feature which had caused a huge mess in the system in the form of ‘Log File Sync‘ waits.
After few of the initial calls with the customer, we come to know that the database has a history of high ‘log file sync‘ waits, but they simply come and go, and it seemed that the events were never handled correctly or being analyzed! And interestingly, restarting the cluster resolves the issue for few hours, sometimes for few days. LFS event was quite prominently seen in this database and on an average found consuming > 38% of the total DB Time% available with very high average wait times (248 ms).
Below are few of the database statistics captured from the problem time.
About the event, When a user session commits, all redo records generated by that session’s transaction need to be flushed from memory to the redo logfile to insure changes to the database made by that transaction become permanent. The time between the user session posting the LGWR and the LGWR posting the user after the write has completed is the wait time for ‘log file sync’ that the user session will show. This event is known as the time lost as a result of the LGWR process waiting while users initiate a Transaction Commit or Rollback.
Next in order to get more idea about the event, I ran the lfsdiag.sql (Oracle provided script to catch diag info on it – Doc ID 1064487.1)). The script will look at the important parameters involved in log file sync waits, wait histogram data, and at the worst average LFS times in the active session history data and AWR data and dump information to help determine why those times were the highest.
Below are the ASH LFS background process waits caught during its worst minute and it had some really bad stats captured for the worst minute for the same time when customer had worst application performance.
Another intersting section was the Histogram data for LFS and other related events. Here was can see the LFS waits at “wait_time_milli” and specially the high wait times to correlate them with other wait events. From below stats its evident that Node 2 of this RAC cluster was severely impacted with the LFS waits., with very high wait counts and wait times (ms), and lot of ‘gcs log flush sync’ along with LFS events that pushes LGWR process to write data to the disk.
The stats are very bad, with highest wait time of 1048576 ms on Node 2 and average of 99864 ms (1.6 mins), both ‘log file parallel write’ and ‘gcs log flush sync’ were quite high too.
As a possible solution, we tried few of the quick ones i.e. batching redo (commit_logging = batch) for the log writer (I know it has its own risks) to reduce LFS, but that didn’t worked either.
Next thing I’d generated the system state dump to understand the system and situation bettern, and the LFS events were caught in the system state dump as well with wait chains pointing to ‘rdbms ipc message'<=’log file sync’.
Process traces are always considered a wealth of diagnostic information, So I’d checked the LGWR process traces and thats where I saw some strangeness with frequent entries related with switching between post/wait and polling method which is an adaptive way to control switching between post/wait (older way) and polling (new method) for log file syncs. This gave me little hint about the possible reason on why so many LFS waits.
Talking about the adaptive log file sync, there are 2 methods by which LGWR and foreground processes can communicate in order to acknowledge that a commit has completed: Post/wait: traditional method available in previous Oracle releases LGWR explicitly posts all processes waiting for the commit to complete. The advantage of the post/wait method is that sessions should find out almost immediately when the redo has been flushed to disk. Polling: Foreground processes sleep and poll to see if the commit is complete, this was introduced to free high CPU usage by the LGWR.
This behavior is controlled by the parameter “_use_adaptive_log_file_sync” and was introduced in 11gR2 and controls whether adaptive switching between post/wait and polling is enabled. In 11.2.0.1 and 11.2.0.2 the default value for the parameter is false. From 11.2.0.3, the default value has been changed to true.
-- LGWR traces
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=1048576 AIO-NR=169402
*** 2022-04-07 06:03:31.916
Warning: log write broadcast wait time 2612477ms (SCN 0xad1.f8c170fe)
*** 2022-04-07 06:03:31.916
Warning: log write broadcast wait time 2598008ms (SCN 0xad1.f8c21251)
kcrfw_update_adaptive_sync_mode: post->poll long#=33 sync#=202 sync=5963 poll=8730 rw=383 rw+=383 ack=3982 min_sleep=1135
*** 2022-04-07 07:46:20.018
Log file sync switching to polling --------------------->>>>>>> It shows current method is polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 67 per sec
*** 2022-04-07 07:47:13.877
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=63 switch_sync_count_delta=202
*** 2022-04-07 07:47:13.877
Log file sync switching to post/wait ------------------>>>>>>>> It shows current method is post/wait
Current approximate redo synch write rate is 21 per sec
-- Below stats shows that the POLLING is happening on the database for LGWR wrtes
SQL> select name,value from v$sysstat where name in ('redo synch poll writes','redo synch polls');
NAME VALUE
---------------------------------------------------------------- ----------
redo synch poll writes 10500129
redo synch polls 10773618
As there were no other symptoms of issues with I/O or in other areas, the problem could be with excessive switching between post/wait and polling wait methods.
After consulting with Oracle support, we found the issue was happening due to BUG (25178179) and the issue gets severe when only while using log file sync “polling mode“. To prevent the problem from happening, they suggested us to turn off the _use_adaptive_log_file_sync by setting it to FALSE in either in the spfile and restarting the database or dynamically in memory. This will force the log file sync waits to use (the traditional) “post/wait mode” rather than the automatically switching between “post/wait mode” and “polling mode” based on performance statistics. Changing the said parameter to FALSE disabled adaptive LFS in the database and that resolved the issue and system performance was restored.
-- Set the parameter _use_adaptive_log_file_sync = false and restart the database:
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync" = FALSE;
-- In cases where a restart is not feasible, then you can set in memory and also in the SP file for when a restart does occur:
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE; -- to set in memory and spfile as by default scope=both
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE scope=sfile sid='*'; -- to set on spfile, so parameter is used at next restart.
Though In the vast majority of cases adaptive log file sync improves the overall performance of log file synchronization, but there are few bugs associated with this feature i.e. 13707904, 13074706 and 25178179.
Recently someone asked me, How to run oratop utility on Oracle database version that is above 12.1 ? He was using Oracle version 19.3.0.0.0 and was getting error ‘error while loading shared libraries’ when tried to call the oratop, and it was quite an obvious error as the script trying to locate a shared object file called libclntshcore.so under $ORACLE_HOME/lib directory for version 12.1, whereas it was version 19.1 and hence the shared object version was libclntshcore.so.19.1
[oracle@oracleontario ~]$ ./oratop.RDBMS_12.1_LINUX_X64 -d -f -i 3 / as sysdba
./oratop.RDBMS_12.1_LINUX_X64: error while loading shared libraries: libclntshcore.so.12.1: cannot open shared object file: No such file or directory
[oracle@oracleontario ~]$
Solution to the problem is, starting from Oracle 19c, oratop is part of Oracle supplied tool and is present under ORACLE_HOME/suptools directory, and is not version specific. Other tools that you will see along with oratop in suptools directory are orachk and tfa. Let’s call it and see if it runs!
[oracle@oracleontario ~]$ cd $ORACLE_HOME/suptools/
[oracle@oracleontario suptools]$ ls
orachk oratop tfa
[oracle@oracleontario suptools]$ cd oratop/
[oracle@oracleontario oratop]$ ls
oratop
[oracle@oracleontario oratop]$
[oracle@oracleontario oratop]$
[oracle@oracleontario oratop]$ ./oratop -f -i 3 / as sysdba
oratop: Release 15.0.0 Production on Thu Mar 10 07:33:49 2022
Copyright (c) 2011, Oracle. All rights reserved.
Connecting ..
Processing ...
Oracle 19c - Primary dixitd 07:33:45 up: 5.2h, 1 ins, 0 sn, 0 us, 1.5G sga, 0% fra, 1 er, 0.5%db
ID CPU %CPU %DCP LOAD AAS ASC ASI ASW IDL ASP LAT MBPS IOPS R/S W/S LIO GCPS %FR PGA TEMP UTPS UCPS RT/X DCTR DWTR
1 1 9.1 0.2 0.2 0.0 0 0 0 0 0 0 0.1 3 3 0 3 0 45 153M 0 0 0 110m 118 8
EVENT (C) TOTAL WAITS TIME(s) AVG_MS PCT WAIT_CLASS
db file sequential read 5175 65 12.6 32 User I/O
DB CPU 42 21
oracle thread bootstrap 81 40 503.8 20 Other
db file scattered read 593 29 50.4 15 User I/O
external table read 1 23 23470.8 12 User I/O
ID SID SPID USERNAME PROGRAM SRV SERVICE PGA SQLID/BLOCKER OPN E/T STA STE WAIT_CLASS EVENT/*LATCH W/T
1 58 14884 SYS sqlplus@o DED SYS$USE 1.4M 1:77 UPD 8.0s ACT WAI Applicatio enq: TX - row lock co 7.8s
[oracle@oracleontario oratop]$
[oracle@oracleontario oratop]$
Recently one of the customer asked me about their critical SQL on a production system running on Oracle 12cR2, if there is a way to pin it in the AWR. I mean the query was already super fast, gets completed very fast and doesn’t consume any system resources, or isn’t a ‘Top SQL’. He still want to track this low resource SQL historically to check few stats, the guys was coming from an Oracle DBA background, and got the impression that the AWR only captures statements considered to be Top N (DBA_HIST_WR_CONTROL.TOPNSQL) which is by default 30. So, If you require any SQL to be always available in the top SQL, then you need to modify the default value to store those many SQLs. But this will create a AWR snapshot stress and that he don’t want.
The answer was ‘Yes’, there is a way how this can be achieved, and that’s too quite easily, and is even present since 11gR1 without making any modifications to your snapshot settings of topnsql. This is by using DBMS_WORKLOAD_REPOSITORY.ADD_COLORED_SQL API to pin any SQL into the AWR report as long as the SQL is in memory. So, if you mark your SQL or color it, it will be captured in every snapshot and it doesn’t have to be a TOP-N SQL to present in an AWR.
Let’s do a demo to understand how we can do that. I am going to run a test SQL to count number of rows into the table.
SQL> select count(*) from TABLE_GROWTH_MB;
COUNT(*)
----------
2868
SQL> select sql_id, plan_hash_value, sql_text from v$sqlarea where sql_text like '%TABLE_GROWTH_MB%';
SQL_ID PLAN_HASH_VALUE SQL_TEXT
------------- --------------- --------------------------------------------------
4n3x6w1vhn1k3 1134407048 select count(*) from TABLE_GROWTH_MB
Okay, so we have executed the SQL and we have its SQL_ID with us now. Lets try to use awr_sql_report_text function that displays an Automatic Workload Repository (AWR) SQL report as plain text to see if the SQL is present under the latest snapshot range.
SQL>
SQL> select output from table (dbms_workload_repository.awr_sql_report_text(l_dbid => 2592296819 , l_inst_num => 2,
l_bid => 36708, l_eid => 36718, l_sqlid => '4n3x6w1vhn1k3'));
OUTPUT
--------------------------------------------------------------------------------
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name DB Id Unique Name DB Role Edition Release RAC CDB
------------ ----------- ----------- ---------------- ------- ---------- --- ---
FATDBACA 2592296819 FATDBACA PRIMARY EE 12.2.0.1.0 YES NO
Instance Inst Num Startup Time
------------ -------- ---------------
FATDBACA2 2 26-Oct-21 23:10
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 36708 10-Mar-22 16:00:37 190 2.5
End Snap: 36718 10-Mar-22 18:30:06 181 2.6
Elapsed: 149.48 (mins)
DB Time: 91.13 (mins)
SQL Summary DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36718
No data exists for this section of the report.
-------------------------------------------------------------
SQL ID: 4n3x6w1vhn1k3 DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36718
No data exists for this section of the report.
-------------------------------------------------------------
Full SQL Text
No data exists for this section of the report.
SQL>
SQL>
SQL>
SQL>
And the report has no data about the SQL as the SQL was pretty fast, and hardly took few milliseconds to finish and is expected as it didn’t qualified as a TOP-SQL hence found no place in the AWR report. I am now going to color the SQLID and will re-run the SQL and check if its there in AWR now or not …
-- to PIN the SQLID in question
SQL> exec dbms_workload_repository.add_colored_sql('4n3x6w1vhn1k3');
PL/SQL procedure successfully completed.
SQL>
-- To UNPIN or remove the COLOR SQLID from repo
SQL> exec dbms_workload_repository.remove_colored_sql('<SQL_ID>');
-- Query below view to see list of all colored SQLs
SQL> select * from DBA_HIST_COLORED_SQL;
DB Id SQL_ID CREATE_TI CON_ID
----------- ------------- --------- ----------
2592296819 4n3x6w1vhn1k3 10-MAR-22 0
-- Re ran the SQL
SQL> select count(*) from TABLE_GROWTH_MB;
COUNT(*)
----------
2868
-- Let me generate a manual snap to capture fresh data
SQL>
SQL>
SQL> var n_snapid number
set feed off term on head off
set serveroutput on size unlimited
begin
:n_snapid := dbms_workload_repository.create_snapshot();
dbms_output.put_line('snap_id: ' || to_char(:n_snapid));
end;
/
snap_id: 36719
SQL>
-- Let me retry again to see if the SQLID details are there in the AWR after I have colored it
SQL> SQL> select output from table (dbms_workload_repository.awr_sql_report_text(l_dbid => 2592296819 , l_inst_num => 2,
l_bid => 36718, l_eid => 36719, l_sqlid => '4n3x6w1vhn1k3'));
OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name DB Id Unique Name DB Role Edition Release RAC CDB
------------ ----------- ----------- ---------------- ------- ---------- --- ---
FATDBACA 2592296819 FATDBACA PRIMARY EE 12.2.0.1.0 YES NO
Instance Inst Num Startup Time
------------ -------- ---------------
FATDBACA2 2 26-Oct-21 23:10
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 36708 10-Mar-22 16:00:37 190 2.5
End Snap: 36719 10-Mar-22 18:39:40 177 2.6
Elapsed: 159.06 (mins)
DB Time: 93.43 (mins)
SQL Summary DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36719
Elapsed
SQL Id Time (ms)
------------- ----------
Container DB
------------
4n3x6w1vhn1k3 0
Module: SQL*Plus
select count(*) from TABLE_GROWTH_MB
2592296819
-------------------------------------------------------------
SQL ID: 4n3x6w1vhn1k3 DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36719
-> 1st Capture and Last Capture Snap IDs
refer to Snapshot IDs witin the snapshot range
-> select count(*) from TABLE_GROWTH_MB
Plan Hash Total Elapsed 1st Capture Last Capture
# Value Time(ms) Executions Snap ID Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1 1134407048 0 1 36719 36719
-------------------------------------------------------------
Plan 1(PHV: 1134407048)
-----------------------
Plan Statistics DB/Inst: FATDBACA/FATDBACA2 Snaps: 36708-36719
-> % Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
Stat Name Statement Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms) 0 0.2 0.0
CPU Time (ms) 0 0.2 0.0
Executions 1 N/A N/A
Buffer Gets 22 22.0 0.0
Disk Reads 0 0.0 0.0
Parse Calls 1 1.0 0.0
Rows 1 1.0 N/A
User I/O Wait Time (ms) 0 N/A N/A
Cluster Wait Time (ms) 0 N/A N/A
Application Wait Time (ms) 0 N/A N/A
Concurrency Wait Time (ms) 0 N/A N/A
Invalidations 0 N/A N/A
Version Count 1 N/A N/A
Sharable Mem(KB) 19 N/A N/A
-------------------------------------------------------------
Execution Plan
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 10 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TABLE_GROWTH_MB | 2624 | 10 (0)| 00:00:01 |
------------------------------------------------------------------------------
Full SQL Text
SQL ID SQL Text
------------- -----------------------------------------------------------------
4n3x6w1vhn1k3 select count(*) from TABLE_GROWTH_MB
Perfect, the SQLID is there in the AWR now, and we can see all runtime stats, execution plan, SQL Text etc.
Oracle 19c has a great enhancement added to the DBMS_XPLAN package called the ‘Hint Usage Report’, which is pretty helpful in reporting if an incorrect hint was used, or there was any syntax error with the hint or if there was an unresolved hint used in the SQL.
This you can use with the DBMS_XPLAN package and its functions like DISPLAY, DISPLAY_CURSOR, DISPLAY_WORKLOAD_REPOSITORY or DISPLAY_SQL_PLAN_BASELINE. By default, DBMS_XPLAN in the default TYPICAL format will report only invalid hints. The report shows the text of the hints. The hint may also have one of the following annotations:
E : indicates a syntax error. N : indicates an unresolved hint. U : indicates that the corresponding hint was not used in the final plan.
Lets do some tests to understand it better. For our tests I have a test table created called BIGTAB and loaded it with sample data. First case, I will try the PARALLEL_INDEX hint with one of the Index (non-partitioned) to see what the Hint Report says.
SQL> explain plan for select /*+ PARALLEL_INDEX(BIGTAB,WEIGHT_IDX,2) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'TYPICAL'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 441133017
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13613 | 252K| 70 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| BIGTAB | 13613 | 252K| 70 (2)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("WEIGHT">=(-168444405) AND "WEIGHT"<=639413572)
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 - SEL$1 / BIGTAB@SEL$1
U - PARALLEL_INDEX(BIGTAB,WEIGHT_IDX,4)
20 rows selected.
SQL>
Here we are getting UNUSED (U) for PARALLEL_INDEX hint that we have used against index WEIGHT_INDEX, but it wasn’t honored because the PARALLEL_INDEX hint parallelizes an index range scan of a partitioned index. This is a common mistake with parallel_index hint is using it against a non-partitioned index, where it will be ignored.
Let me try the next case where I will make a syntax error with the hint INDEX, will pass it as INDEXX and see what the report says.
06:04:55 SQL>
06:04:55 SQL>
06:04:55 SQL> explain plan for select /*+ INDEXX(BIGTAB,WEIGHT_IDX) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
Elapsed: 00:00:00.05
06:08:35 SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
1 - SEL$1
E - INDEXX
15 rows selected.
Elapsed: 00:00:00.17
06:08:40 SQL>
And it says that there is was a syntax error (E) with the HINT, hence ignored by the optimizer.
Now, I will be trying a correct/valid hint with the SQL Statement and see what the hint report says about it.
SQL> explain plan for select /*+ INDEX(BIGTAB,WEIGHT_IDX) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL>
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2994241301
----------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BIGTAB |
| 2 | INDEX RANGE SCAN | WEIGHT_IDX |
----------------------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
1 - SEL$1 / BIGTAB@SEL$1
- INDEX(BIGTAB,WEIGHT_IDX)
16 rows selected.
And the report says that no error was reported and hint was correctly used, same can be seen in the change of access path of the execution plan.
Next, let me try any optimizer related hint, I will try the optimizer_index_cost_adj HINT to make the SQL more or less index friendly
03:32:43 SQL> explain plan for select /*+ opt_param('optimizer_index_cost_adj',20) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
Elapsed: 00:00:00.00
03:32:53 SQL>
03:32:58 SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
0 - STATEMENT
- opt_param('optimizer_index_cost_adj',20)
15 rows selected.
Elapsed: 00:00:00.02
And that worked too as no error or any status flag was reported in the hint report.
Next, I will try a non existing Index and will see what the report says about it.
SQL> explain plan for select /*+ INDEX(FATDBA_INDEX) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (N - Unresolved (1))
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------------------
1 - SEL$1
N - INDEX(FATDBA_INDEX)
15 rows selected.
SQL>
And the report says it was UNRESOLVED (N) as the Index doesn’t exists.
Let me try one more case where I will pass two bad or contradictory hints, will use both first_rows and all_rows together and will see what happens.
SQL> explain plan for select /*+ first_rows(1) all_rows */ * from BIGTAB;
Explained.
SQL> select * from dbms_xplan.display(format=>'basic +alias +hint_report');
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 441133017
------------------------------------
| Id | Operation | Name |
------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| BIGTAB |
------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - SEL$1 / BIGTAB@SEL$1
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2 (U - Unused (2))
---------------------------------------------------------------------------
0 - STATEMENT
U - all_rows / hint conflicts with another in sibling query block
U - first_rows(1) / hint conflicts with another in sibling query block
21 rows selected.
SQL>
The preceding report shows that the all_rows and first_rows(1) CBO mode hints conflict with one other. As the First_rows attempts to optimize the query to get the very first row back to the client as fast as possible, whereas All_rows attempts to optimize the query to get the very last row as fast as possible.
Last example I will try to pass multiple hints where few of them will be valid and rest invalid. I will pass FULL & PARALLEL valid hints, and will use two INDEX unresolved hints and one UNRESOLVED hint of MERGE with query block.
SQL> explain plan for select /*+ FULL(bigtab) INDEX(WEIGHT_IDX) INDEX(ID_IDX) MERGE(@SEL$1) PARALLEL(2) */ * from bigtab where weight between -168444405 and 639413572;
Explained.
SQL> select * from dbms_xplan.display(format=>'basic +alias +hint_report');
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 305217024
-----------------------------------------
| Id | Operation | Name |
-----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | PX COORDINATOR | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 |
| 3 | PX BLOCK ITERATOR | |
| 4 | TABLE ACCESS FULL| BIGTAB |
-----------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
4 - SEL$1 / BIGTAB@SEL$1
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 5 (U - Unused (1), N - Unresolved (2))
---------------------------------------------------------------------------
0 - STATEMENT
- PARALLEL(2)
1 - SEL$1
N - INDEX(ID_IDX)
N - INDEX(WEIGHT_IDX)
U - MERGE(@SEL$1)
4 - SEL$1 / BIGTAB@SEL$1
- FULL(bigtab)
32 rows selected.
One of the very common scenarios that we DBAs has encountered are when the database is hanging, freeze or totally non responsive, or when your database crashes with a serious performance problem, or when the SQLPlus cannot connect to the server or when you noticed that we can not save the instance/node then most of the time DBA take the system state dump for further analysis or you can say for RCA purpose. In this case, if you want to obtain the current status of the database for later diagnosis, then we need to use SYSTEMSTATE dump to know what the process is doing, what it is waiting for, who is the resource owner, and who is blocking others. In case of the above problems, it is very helpful to analyze the cause of the problem by collecting SYSTEMSTATE dump in time.
In order to understand the purpose, let me create a situation, a problem, I will simulate a row locking (pessimistic type) case in to the database. From the first session I had created a table and insert few rows into it and didn’t commit and opened few others sessions and tried to update same row. And I can see a good amount of blocking is created in the database!
LOCK_TREE
----------
1.97
1.58
1.60
1.75
USERNAME SID SERIAL# TY LOCK_TYPE LOCK_REQUESTE ID1 ID2 SQL_TEXT
----------- ---------- ---------- -- ------------- ------------- ---------- ---------- ----------------------------------------
SYS 58 17036 TX None Exclusive 589839 1777 update test set id=09 where ID=10
SYS 60 21943 TX None Exclusive 589839 1777 update test set id=09 where ID=10
SYS 75 91 TX None Exclusive 589839 1777 update test set id=09 where ID=10
Right time to generate the SYSTEMSTATE dump, below is how you can generate that. Do system state dump few times in a row, so we can determine whether the processes are hung or active.
-- connect with the database if its not hang or freeze
SQL> connect / as sysdba
-- Connect using prelim option if its stalled or freeze
SQL> sqlplus -prelim / as sysdba
-- Use SETMYPID command to select the current process as the ORADEBUG process
SQL> oradebug setmypid
Statement processed.
-- This is to remove the limitation on the size of the trace file use.
-- In Oracle 8.1.6 and above the maximum size of the trace file defaults to UNLIMITED
SQL> oradebug unlimit
Statement processed.
-- This is to take the systemstate dumps with 266 level (short stack + dump)
SQL> oradebug dump systemstate 266
Statement processed.
-- Or use below in case running on a RAC setup
SQL> oradebug -g all dump systemstate 266
SQL> oradebug dump systemstate 266
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
-- This command prints the name of the current trace file
SQL> Oradebug tracefile_name
/u01/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_191381.trc
There are few other levels which you can use and are discussed below. Note: I highly recommend to check with Oracle support to use there levels as few of them with higher level are little risky and known to crash instances.
Level 2 : Takes a dump (not including the lock element)
Level 10 : Takes a dump including lock details
Level 11 : Takes a dump + global cache of RAC
Level 256 : Takes short stacks (function stack)
Level 258 (256 + 2) : Takes a short stack + dump (not including the lock element)
Level 266 (256 + 10) : Takes a short stack + dump
Level 267 (256 + 11) : Takes a short stack + dump + global cache of RAC
So we have artificially created a locking scenario on the database and we are aware about the sessions involved in this pessimistic type locking. Now we will use the same information and that will help us to navigate through the massive SYSTEMSTATE dumps. Lets dig into the trace file that we have generated for the session.
The first part or the header contains details like hostname, machine info, service, module and client details, followed by command processing’s that we did through ORADEBUG.
Next part is where you will see process summary and its dumps, including all user OSPIDs, their commands and blocking details with wait information and is the most interesting one and important section of the dump to understand the wait/blocking chain.
Dumping the Process Summary
1: PSEUDO process
2: PMON ospid 3904 sid 2 ser 9826, waiting for 'pmon timer'
3: CLMN ospid 3906 sid 3 ser 8008, waiting for 'pmon timer'
4: PSP0 ospid 3908 sid 4 ser 46393, waiting for 'rdbms ipc message'
5: VKTM ospid 3910 sid 5 ser 26025, waiting for 'VKTM Logical Idle Wait'
6: GEN0 ospid 3915 sid 6 ser 36023, waiting for 'rdbms ipc message'
7: MMAN ospid 3917 sid 7 ser 17720, waiting for 'rdbms ipc message'
.....
......
.......
47: USER ospid 5516 sid 79 ser 52211,
49: USER ospid 4429 sid 58 ser 17036, waiting for 'enq: TX - row lock contention'
Cmd: UPDATE
Blocked by inst: 1, sid: 97, ser: 2507
Final Blocker inst: 1, sid: 97, ser: 2507
50: USER ospid 4538 sid 75 ser 91, waiting for 'enq: TX - row lock contention'
Cmd: UPDATE
Blocked by inst: 1, sid: 97, ser: 2507
Final Blocker inst: 1, sid: 97, ser: 2507
51: W006 ospid 4290 sid 76 ser 17641, waiting for 'Space Manager: slave idle wait'
52: USER ospid 4636 sid 60 ser 21943, waiting for 'enq: TX - row lock contention'
Cmd: UPDATE
Blocked by inst: 1, sid: 97, ser: 2507
Final Blocker inst: 1, sid: 97, ser: 2507
55: M004 ospid 4722 sid 74 ser 6422, waiting for 'class slave wait'
And from the above trace section we understood that the final blocker is with SID 97, serial 2507 where SIDs 58,75 and 60 all waiting on event ‘enq: TX – row lock contention’.
Next section is where it dumps buffer cache working set information which contains size of the BC, number of pools, state information and other information. Followed by process level statistics – state, session, latching, post information along with short stack dumps which is debugging/triggering events that generates a trace file in case of any error and full chain. The trace also contains details about wait stacks, blocking chains (waiters), session event history,
Here in below section you will also see the SQL text (update test set id=09 where ID=10) and its details which has caused the blocking, its statistics, execution counts, active locking info, SQL ID, Table Info, HashValues and namespace info.
Next we will check more details about the blocker or the session that has blocked others, here SID 97 is the case.
(session) sid: 97 ser: 2507 trans: 0x7815a9a8, creator: 0x7d3d9878
flags: (0x41) USR/- flags2: (0x40009) -/-/INC
flags_idl: (0x0) status: -/-/-/- kill: -/-/-/-
DID: 0001-002E-000000010000-0000-00000000, short-term DID:
txn branch: (nil)
edition#: 134 user#/name: 0/SYS
oct: 0, prv: 0, sql: (nil), psql: 0x71cc7dc8
stats: 0x705a2930, PX stats: 0x128f8d84
service name: SYS$USERS
client details:
O/S info: user: oracle, term: pts/1, ospid: 4321
machine: oracleontario.ontadomain program: sqlplus@oracleontario.ontadomain (TNS V1-V3)
application name: sqlplus@oracleontario.ontadomain (TNS V1-V3), hash value=2432467899
.....
........
.........
Current Wait Stack:
0: waiting for 'SQL*Net message from client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=650 seq_num=652 snap_id=1
wait times: snap=11 min 38 sec, exc=11 min 38 sec, total=11 min 38 sec
wait times: max=infinite, heur=11 min 38 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x1a0
There are 3 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 58, ser: 17036
wait event: 'enq: TX - row lock contention'
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x9000f
p3: 'sequence'=0x6f1
row_wait_obj#: 77878, block#: 122337, row#: 0, file# 1
.........
............
.............
Session Wait History:
elapsed time of 0.000001 sec since current wait
0: waited for 'SQL*Net message to client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=649 seq_num=651 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000040 sec of elapsed time
1: waited for 'SQL*Net message from client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=648 seq_num=650 snap_id=1
wait times: snap=0.000557 sec, exc=0.000557 sec, total=0.000557 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000017 sec of elapsed time
2: waited for 'SQL*Net message to client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=647 seq_num=649 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000015 sec of elapsed time
3: waited for 'SQL*Net message from client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=646 seq_num=648 snap_id=1
wait times: snap=0.000280 sec, exc=0.000280 sec, total=0.000280 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000014 sec of elapsed time
4: waited for 'PGA memory operation'
=0x0, =0x0, =0x0
wait_id=645 seq_num=647 snap_id=1
wait times: snap=0.000349 sec, exc=0.000349 sec, total=0.000349 sec
wait times: max=infinite
wait counts: calls=0 os=0
....
......
........
You will also see details on SQLs that we’ve executed using same SID (97), here in my case I was running few SQLs to identify locking in the database and were captured in the dump along with its parent and child cursor details, followed by some object level stats.
Other important section is the ‘Enqueue Dump‘. Here in our case this section discloses that enqueue name was “TX-0009000F-000006F1-00000000-00000000” and was acquired in ‘X’ (Exclusive) mode by the holder session ID 97.
At the end it prints details about the redo transport, ASM File Info, SGA Information, PDB/CDB details …
KGSKSGA info:
is the system running: yes
session that stopped the system: (nil)
number of plans 1 classes 4
low threshold delta 0
high threshold delta 2
number of cpus[0]: 1
thresholds - low[0]: 1, high[0]: 3
dynamic threshold adjustments - misbehave_adjust[0]: 0, cpu_util_adj[0]: 2
dynamic thresholds - low[0]: 3, high[0]: 5
running cnt[0]: 1, runnable cnt[0]: 3
flags: 0x0
debug: 0x0
all pointers:
num_latches 300 current 222 chg_class 0x74e364a0 chg_state 0x74e42020
class_list 0x74dfaa88,top_plan 0x7084e9d8, plan_list 0x74e03d38
heaps - kgsk subheap 0x74e142f0, plan heap 0x7b5d7438
backgrounds in_sched (pids):
.....
..........
...............
Redo Log Management (from krsk_dump_rlm)
SRL min count: 0
SRL max count: 0
SRL min avail: 0
SRL file size: 0
SRL block size: 0
SRL use FRA : FALSE
So, if you want to dig in deep and want more proofs and details about the incident, SYSTEMSTATE dumps is a great source of all internals that were involved at the time of the problem. This isn’t limited to row locking problems, you can generate it for all sort of events, problems or issues i.e. Row cache, latch free, LC Locks, CBC are few of the common ones …
‘Outline Data’ section displays the list of hints that would be needed to replicate the execution plan, even if the statistics change., but in case of a complex execution plans it comes with lot of strange looking and obscure terms used. As there isn’t any published explanation of stored outline hints, so today’s post is to decipher few of the terms that you see in an outline data of an execution plan.
Okay, coming back to the post, let me quickly generate the outline data for one of the test SQL and will try to explain about each of the hints, query blocks, aliases and other representations used.
SQL>
SQL> explain plan for
SELECT d.department_name,e.employee_name
FROM departments d
LEFT OUTER JOIN employees e ON d.department_id = e.department_id
WHERE d.department_id >= 30
ORDER BY d.department_name, e.employee_name;
Explained.
SQL> select * from table(dbms_xplan.display('PLAN_TABLE',NULL,'+alias +outline'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 3871261979
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 168 | 7 (15)| 00:00:01 |
| 1 | SORT ORDER BY | | 4 | 168 | 7 (15)| 00:00:01 |
|* 2 | HASH JOIN OUTER | | 4 | 168 | 6 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| DEPARTMENTS | 2 | 44 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| EMPLOYEES | 6 | 120 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$2BFA4EE4
3 - SEL$2BFA4EE4 / D@SEL$1
4 - SEL$2BFA4EE4 / E@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
USE_HASH(@"SEL$2BFA4EE4" "E"@"SEL$1")
LEADING(@"SEL$2BFA4EE4" "D"@"SEL$1" "E"@"SEL$1")
FULL(@"SEL$2BFA4EE4" "E"@"SEL$1")
FULL(@"SEL$2BFA4EE4" "D"@"SEL$1")
OUTLINE(@"SEL$1")
OUTLINE(@"SEL$2")
ANSI_REARCH(@"SEL$1")
OUTLINE(@"SEL$8812AA4E")
ANSI_REARCH(@"SEL$2")
OUTLINE(@"SEL$948754D7")
MERGE(@"SEL$8812AA4E" >"SEL$948754D7")
OUTLINE_LEAF(@"SEL$2BFA4EE4")
ALL_ROWS
DB_VERSION('19.1.0')
OPTIMIZER_FEATURES_ENABLE('19.1.0')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID"(+))
3 - filter("D"."DEPARTMENT_ID">=30)
4 - filter("E"."DEPARTMENT_ID"(+)>=30)
As per the above outline data captured for the execution plan, the very first entry is USE_HASH(@”SEL$2BFA4EE4″ “E”@”SEL$1”) This represents the use of HASH join methods used in the query. Other join method outline options you might see are USE_NL and USE_MERGE. Here in our example we’ve used the LEFT OUTER JOIN which will return all valid rows from the table on the left side of the JOIN keyword, that’s table EMPLOYEE (alias ‘E’) along with the values from the table on the right side.
i.e LEFT OUTER JOIN employees e ON d.department_id = e.department_id
LEADING(@”SEL$2BFA4EE4″ “D”@”SEL$1” “E”@”SEL$1”) The LEADING hints specifies the exact join order for the SQL that is followed by the optimizer in the plan. It displays the join order as an ordered list of table aliases and query block names. The aliases appears in the ordering in which we access those tables in the query.
The first part of the hint @”SEL$2BFA4EE4″ represents the main SELECT block and if you closely see the full hint definition you will see “D”@”SEL$1”, where “SEL$1” is the query block name for table alias ‘D’ for Department table, which is followed by “E”@”SEL$1”, that represents query block name “SEL$1” on second table joined ‘E’ for Employee table.
FULL(@”SEL$2BFA4EE4″“E”@”SEL$1”) & FULL(@”SEL$2BFA4EE4″ “D”@”SEL$1”) are what you see next and these two hints for FULL TABLE SCANS on table alias “E”, that is EMPLOYEE table, followed by FTS on table alias “D”, that is DEPARTMENT table. Same you can see in the execution plan too.
OUTLINE(@”SEL$1″) & OUTLINE(@”SEL$2″) These two OUTLINE hints correspond to initial and the intermediate query blocks.
ANSI_REARCH(@”SEL$1″) This is the hint that instructs the optimizer to re-architecture of ANSI left, right, and full outer joins. In our case this was to re-arch the left outer join.
MERGE(@”SEL$8812AA4E” >”SEL$948754D7″) is the MERGE query block hint. SEL$8812AA4E and SEL$948754D7 are the transformed query blocks.
OUTLINE_LEAF(@”SEL$2BFA4EE4″) This hint builds an outline leaf for the specified query block. In our example it represents that the query block SEL$2BFA4EE4 is a “final” query block that has actually been subject to independent optimization. Outline leaf cannot be transformed. You can see multiple outline_leaf hints for the query blocks.
IGNORE_OPTIM_EMBEDDED_HINTS is a special hint instructs the CBO to ignore most of all the other supplied hints.
ALL_ROWS This hint instructs to optimize the query/statement block for best throughput with lowest resource utilization.
DB_VERSION(‘19.1.0’) & OPTIMIZER_FEATURES_ENABLE(‘19.1.0’) these two hints allows for the CBO to process the SQL on the said version of the database, 19.1.0 in our case.
Few others that you might see in OUTLINE DATA sections and are easy to decipher are INDEX_RS_ASC (Index Range Scan in ascending order) and it happens when the INDEX RANGE SCAN is used as an access path/method in SQL execution plan. Few others that you can see in case of Nested Loop Joins are NLJ_BATCHING which happens when Oracle batches multiple physical I/O requests and process them using a vector I/O (array) instead of processing them one at a time, and batching improves performance because it uses asynchronous reads.. Few others that you can see in case of NL’s are USE_NL, NLJ_PREFETCH etc.
INDEX_FFS in case of Index Fast Full Scans and the list is long …