Tales From A Lazy Fat DBA

Its all about Databases & their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Archive for the ‘Uncategorized’ Category

An Oracle Deadlock scenario and the importance of Event 10027 trace …

Posted by FatDBA on February 16, 2023

Hi Guys,

Recently someone asked me about Oracle debug 10027 trace event which we use in case of a deadlock scenario i.e. ORA-0060. Though the deadlock itself creates a trace file in DIAG directory, but 10027 trace event gives you a better control over the amount and type of DIAG information generated in response to the deadlock case. I mean the default trace file for deadlock (ora-60) contains cached cursors, a deadlock graph, process state info, current SQL Statements of the session involved and session wait history.
Event 10027 may be used to augment the trace information with a system state dump or a call stack in an attempt to find the root cause of the deadlocks. The minimum amount of trace information is written at level 1, at this level it will hardly contain deadlock graph and current AQL statements of the sessions involved.

In todays post I will try to simulate a deadlock scenario in one of my test box and will generate the 10027 trace with level 2 to get more information. Level 2 will give you cached cursors, process state info, session wait history for all sessions and the system state which is not possible in case of level 1. I am going to try with the Level 4 here as I want to get system state dump and don’t want to complicate this scenario.

I am going to create two tables – TableB is child table of TableA and some supporting objects to simulate the deadlock case.
Will try to simulate a deadlock scenario when insert/update/delete happens on TableB we need to sum the amount(amt) and then update it in TableA.total_amt column.

[oracle@oracleontario ~]$ sqlplus dixdroid/oracle90

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Feb 16 03:58:12 2023
Version 19.15.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.

Last Successful login time: Thu Feb 16 2023 03:04:03 -05:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0

SQL> create table tableA (pk_id number primary key, total_amt number);

Table created.

SQL> create table tableB (pk_id number primary key, fk_id number references tableA(pk_id) not null, amt number);

Table created.

SQL>
SQL> CREATE OR REPLACE PACKAGE global_pkg
IS
fk_id tableA.pk_id%TYPE;
END global_pkg;  
/

Package created.

SQL>
SQL> CREATE OR REPLACE TRIGGER tableB_ROW_TRG
BEFORE INSERT OR UPDATE OR DELETE
ON tableB
FOR EACH ROW
BEGIN
IF INSERTING OR UPDATING
THEN
global_pkg.fk_id := :new.fk_id;
ELSE
global_pkg.fk_id := :old.fk_id;
END IF;
END tableB_ROW_TRG;
/

Trigger created.

SQL>
SQL>
SQL>
SQL> CREATE OR REPLACE TRIGGER tableB_ST_trg
AFTER INSERT OR UPDATE OR DELETE
ON tableB
BEGIN
IF UPDATING OR INSERTING
THEN
UPDATE tableA
SET total_amt =
(SELECT SUM (amt)
FROM tableB
WHERE fk_id = global_pkg.fk_id)
WHERE pk_id = global_pkg.fk_id;
ELSE
UPDATE tableA
SET total_amt =
(SELECT SUM (amt)
FROM tableB
WHERE fk_id = global_pkg.fk_id)
WHERE pk_id = global_pkg.fk_id;
END IF;
END tableB_ST_trg;

Trigger created.

SQL>
SQL>
SQL> insert into tableA values (1, 0);

1 row created.

SQL> insert into tableA values (2, 0);

1 row created.

SQL> insert into tableB values (123, 1, 100);

1 row created.

SQL> insert into tableB values (456,1, 200);

1 row created.

SQL> insert into tableB values (789, 1, 100);

1 row created.

SQL> insert into tableB values (1011, 2, 50);

1 row created.

SQL> insert into tableB values (1213,2, 150);

1 row created.

SQL> insert into tableB values (1415, 2, 50);

1 row created.

SQL> commit;

Commit complete.

SQL>

Lets query the table and see the record count and next will delete an entry from tableB and won’t commit. At the same time will check the locks — TM lock on tableA.

SQL>
SQL> select * from tableA;

     PK_ID  TOTAL_AMT
---------- ----------
         1        400
         2        250

SQL> delete tableB where pk_id = 1415;

1 row deleted.

SQL> SELECT sid,
(SELECT username
FROM v$session s
WHERE s.sid = v$lock.sid)
uname,
TYPE,
id1,
id2,
(SELECT object_name
FROM user_objects
WHERE object_id = v$lock.id1)
nm
FROM v$lock
WHERE sid IN (SELECT sid
FROM v$session
WHERE username IN (USER)); 


       SID UNAME                          TY        ID1        ID2 NM
---------- ------------------------------ -- ---------- ---------- ------------------------------
       440 DIXDROID                       AE        134          0
       440 DIXDROID                       TX     589853       3078
       440 DIXDROID                       TM      82284          0 TABLEB
       440 DIXDROID                       TM      82282          0 TABLEA


I will now connect to another session (sesion 2) and delete from tableB and that will induce a locking scenario in the database and session 2 will go into hung/waiting state.

---- from session 2:
SQL>
SQL>
SQL> delete tableB where pk_id = 1213;
.....
.........
............. <HUNG > <HUNG > <HUNG > <HUNG > <HUNG > 

Lets see more stats on the blocking session.

SQL>
SQL> SELECT (SELECT username
FROM v$session
WHERE sid = a.sid)
blocker,
a.sid,
' is blocking ',
(SELECT username
FROM v$session
WHERE sid = b.sid)
blockee,
b.sid
FROM v$lock a, v$lock b
WHERE a.block = 1 AND b.request > 0 AND a.id1 = b.id1 AND a.id2 = b.id2;  


BLOCKER                     SID 'ISBLOCKING'  BLOCKEE                               SID
-------------------- ---------- ------------- ------------------------------ ----------
DIXDROID                    440  is blocking  DIXDROID                              427

And as expected 440 SID is now blocking 427. Now lets go back to the original (session 1) and will try to delete from tableB again and this will snipe the session 2 which is still in hung/wait state and will throw a deadlock error (ORA-00060). Though we have already collected blocking information from v$lock + v$session but to get more clarify about blocking sessions, I will set the 10027 trace event with level 2.

This will increase our chance of finding the root cause of the deadlocks. If the setting shall persist across instance startups, you need to use the initialization parameter EVENT.

i.e. EVENT=”10027 trace name contex forever, level 2″ otherwise use ALTER SYSTEM version of it.

------------------------------------------------------------
-- Execute below statement in session1
------------------------------------------------------------
delete tableB where pk_id = 1213;





---------------------------------------------------------------------------------------
-- Oracle is throwing deadlock error as below in session2
---------------------------------------------------------------------------------------
SQL>
SQL> ALTER SYSTEM SET EVENTS '10027 trace name context forever, level 2';

System altered.

SQL> delete tableB where pk_id = 1213;
delete tableB where pk_id = 1213
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "DIXDROID.TABLEB_ST_TRG", line 11
ORA-04088: error during execution of trigger 'DIXDROID.TABLEB_ST_TRG'

-- same captured in alert log too.
2023-02-16T04:02:57.648156-05:00
Errors in file /u01/app/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_29389.trc:
2023-02-16T04:03:00.019794-05:00
ORA-00060: Deadlock detected. See Note 60.1 at My Oracle Support for Troubleshooting ORA-60 Errors. More info in file /u01/app/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_29389.trc.

And we done, we were able to simulate the deadlock case in the database. Now lets dig into the trace file generated by the deadlock along with information flushed by 10027 trace event. It has all crucial information associated with the deadlock.

2023-02-16 04:13:34.115*:ksq.c@13192:ksqdld_hdr_dump():
DEADLOCK DETECTED ( ORA-00060 )
See Note 60.1 at My Oracle Support for Troubleshooting ORA-60 Errors

[Transaction Deadlock]


Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00080009-00000C62-00000000-00000000         45     427     X        13193      24     440           X  22765
TX-0004001A-00000C1E-00000000-00000000         24     440     X        22765      45     427           X  13193




----- Information for waiting sessions -----
Session 427:
  sid: 427 ser: 13193 audsid: 3390369 user: 115/DIXDROID
    flags: (0x41) USR/- flags2: (0x40009) -/-/INC
    flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/-
  pid: 45 O/S info: user: oracle, term: UNKNOWN, ospid: 30174
    image: oracle@oracleontario.ontadomain (TNS V1-V3)
  client details:
    O/S info: user: oracle, term: pts/1, ospid: 30173
    machine: oracleontario.ontadomain program: sqlplus@oracleontario.ontadomain (TNS V1-V3)
    application name: SQL*Plus, hash value=3669949024
  current SQL:
  UPDATE TABLEA SET TOTAL_AMT = (SELECT SUM (AMT) FROM TABLEB WHERE FK_ID = :B1 ) WHERE PK_ID = :B1

Session 440:
  sid: 440 ser: 22765 audsid: 3380369 user: 115/DIXDROID
    flags: (0x41) USR/- flags2: (0x40009) -/-/INC
    flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/-
  pid: 24 O/S info: user: oracle, term: UNKNOWN, ospid: 29847
    image: oracle@oracleontario.ontadomain (TNS V1-V3)
  client details:
    O/S info: user: oracle, term: pts/2, ospid: 29846
    machine: oracleontario.ontadomain program: sqlplus@oracleontario.ontadomain (TNS V1-V3)
    application name: SQL*Plus, hash value=3669949024
  current SQL:
  delete tableB where pk_id = 1213

.....
.......


----- Current SQL Statement for this session (sql_id=duw5q5rpd5xvs) -----
UPDATE TABLEA SET TOTAL_AMT = (SELECT SUM (AMT) FROM TABLEB WHERE FK_ID = :B1 ) WHERE PK_ID = :B1
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x8ed34c98        11  DIXDROID.TABLEB_ST_TRG
.......
.........


----- VKTM Time Drifts Circular Buffer -----
session 427: DID 0001-002D-000000D9     session 440: DID 0001-0018-0000001C
session 440: DID 0001-0018-0000001C     session 427: DID 0001-002D-000000D9

Rows waited on:
  Session 427: obj - rowid = 0001416A - AAAUFqAAHAAAMgvAAB
  (dictionary objn - 82282, file - 7, block - 51247, slot - 1)
  Session 440: obj - rowid = 0001416C - AAAUFsAAHAAAMg/AAE
  (dictionary objn - 82284, file - 7, block - 51263, slot - 4)
.....
.......


Current Wait Stack:
     0: waiting for 'enq: TX - row lock contention'
        name|mode=0x54580006, usn<<16 | slot=0x4001a, sequence=0xc1e
        wait_id=69 seq_num=70 snap_id=1
        wait times: snap=18.173838 sec, exc=18.173838 sec, total=18.173838 sec
        wait times: max=infinite, heur=18.173838 sec
        wait counts: calls=6 os=6
        in_wait=1 iflags=0x15a0
    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 440, ser: 22765
      Dumping final blocker:
        inst: 1, sid: 440, ser: 22765
    There are 1 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 440, ser: 22765
      wait event: 'enq: TX - row lock contention'
        p1: 'name|mode'=0x54580006
        p2: 'usn<<16 | slot'=0x80009
        p3: 'sequence'=0xc62
      row_wait_obj#: 82284, block#: 51263, row#: 4, file# 7
      min_blocked_time: 0 secs, waiter_cache_ver: 33942
    Wait State:
      fixed_waits=0 flags=0x22 boundary=(nil)/-1
    Session Wait History:
        elapsed time of 0.004677 sec since current wait
........
............


24: USER ospid 29847 sid 440 ser 22765, waiting for 'enq: TX - row lock contention'
          Cmd: DELETE
          Blocked by inst: 1, sid: 427, ser: 13193
          Final Blocker inst: 1, sid: 427, ser: 13193
.....
.............


45: USER ospid 30174 sid 427 ser 13193, waiting for 'enq: TX - row lock contention'
          Cmd: UPDATE
          Blocked by inst: 1, sid: 440, ser: 22765
          Final Blocker inst: 1, sid: 440, ser: 22765
.........
.............


 The history is displayed in reverse chronological order.

    sample interval: 1 sec, max history 120 sec
    ---------------------------------------------------
      [19 samples,                                         04:13:16 - 04:13:34]
        waited for 'enq: TX - row lock contention', seq_num: 70
          p1: 'name|mode'=0x54580006
          p2: 'usn<<16 | slot'=0x4001a
          p3: 'sequence'=0xc1e
      [14 samples,                                         04:13:01 - 04:13:15]
        idle wait at each sample
      [session created at: 04:13:01]
    ---------------------------------------------------
    Sampled Session History Summary:
      longest_non_idle_wait: 'enq: TX - row lock contention'
      [19 samples, 04:13:16 - 04:13:34]
    ---------------------------------------------------
.........
..........


      Virtual Thread:
      kgskvt: 0x9b0e7f10, sess: 0x9ccdd4b8, pdb: 0, sid: 427 ser: 13193
      vc: (nil), proc: 0x9db42028, idx: 427
      consumer group cur: OTHER_GROUPS (pdb 0) (upd? 0)
mapped: DEFAULT_CONSUMER_GROUP, orig:  (pdb 0)
      vt_state: 0x2, vt_flags: 0xE030, blkrun: 0, numa: 1
      inwait: 1, wait event: 307, posted_run: 0
      lastmodrngcnt: 0, lastmodrngcnt_loc: '(null)'
      lastmodrblcnt: 0, lastmodrblcnt_loc: '(null)'
      location where insched last set: kgskbwt
      location where insched last cleared: kgskbwt
      location where inwait last set: kgskbwt
      location where inwait last cleared: NULL
      is_assigned: 1, in_scheduler: 0, insched: 0
      vt_active: 0 (pending: 1)
      vt_pq_active: 0, dop: 0, pq_servers (cur: 0 cg: 0)
      ps_allocs: 0, pxstmts (act: 0, done: 0 cg: 0)
      used quanta (usecs):
      stmt: 57272, accum: 0, mapped: 0, tot: 57272
      exec start consumed time lapse: 164377 usec
      exec start elapsed time lapse: 18179246 usec
      idle time: 0 ms, active time: 57272 (cg: 57272) usec
      last updnumps: 0 usec, active time (pq: 0 ps: 0) ms
      cpu yields:       stmt: 0, accum: 0, mapped: 0, tot: 0
      cpu waits:       stmt: 0, accum: 0, mapped: 0, tot: 0
      cpu wait time (usec):       stmt: 0, accum: 0, mapped: 0, tot: 0
      ASL queued time outs: 0, time: 0 (cur 0, cg 0)
      PQQ queued time outs: 0, time: 0 (cur 0, cg 0)
      Queue timeout violation: 0
      calls aborted: 0, num est exec limit hit: 0
      KTU Session Commit Cache Dump for IDLs:
      KTU Session Commit Cache Dump for Non-IDLs:
      ----------------------------------------
........
.............

Hope It Helped!
Prashant Dixit

Advertisement

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

Finally, I am an Oracle ACE :)

Posted by FatDBA on December 21, 2022

Christmas came a little early!

Finally, after associated with Oracle Databases and technologies for more than 15 years, and contributing towards the database community for more than 12 years, The FatDBA has finally got the golden email from Oracle Ace Program – “I am officially an Oracle Ace now ♠ ♠ ♠ “ 🙏 🤘 😊

Link to my profile: https://apexapps.oracle.com/pls/apex/r/ace_program/oracle-aces/ace?ace_id=2364&clear=2

Posted in Uncategorized | Leave a Comment »

Some of the new exotic wait events in Oracle 21c …

Posted by FatDBA on November 12, 2022

Hi Guys,


Lately I was doing tests on Oracle 21c and came across few of the exotic new wait events added to Oracle 21c. I was using 21.3.0.0.0 and have noticed few new Exadata, ASM, Dataguard, AI/ML, Exascale, Cluster, exa persistent memory, CTWR etc. specific waits were added to the database. I have also noticed some strange looking waits as well i.e. ‘Spare1′,’Spare2′,’Spare3’ ….

Being an innovation release, and since this is a beta version, there is no document available or published for the customer for any of these new waits on Metalink

Couple of new ‘Bloom Filter‘ related waits added to 21c
Bloom Filter EOF
Bloom Filter Fetch

Bunch of new Exadata ‘smart Index|Table scan’ related, User I/O classed wait events added to 21c are
cell smart index scan request
cell smart index scan: db timezone upgrade
cell smart index scan: disabled by user
cell smart index scan: pass through
cell smart table scan request
cell smart table scan: db timezone upgrade
cell smart table scan: disabled by user
cell smart table scan: pass through
cell worker online completion
cell worker retry

Few of the new machine learning & Artificial Intellegence related wait events added in 21c are
enq: AI - Seek operation
enq: AI - dequeue operation
enq: AI - lwm update operation
enq: AI - remove subscriber
enq: AI - start cross operation
enq: AI - stop cross operation
enq: AI - truncate operation

Some of the new RAC (cluster class), global cache (cache fusion) block related waits added to 21c are
gc cr block direct read
gc current block direct read

Some of the new EDSLib/EBSLib latch waits, which are used to gum the code between RDBMS and Exascale storage added in 21c are given below.
latch: EDSLib File
latch: EDSLib Message
latch: EDSLib Root
latch: EGSLib Root
latch: EGSLibDS Root
latch: EGSLibDataStore Object
latch: EGSLibDataStoreShard Object
latch: EGSLibForum Object
latch: EGSLibNet Root
latch: EGSLibSP Root
latch: EGSLibStoragePool Object
latch: EGSLibStoragePoolRing Object
latch: FOB s.o list parent latch

This is by far the most interesting one, with no name and they seem some sort of free or extra wait events names. I am familiar with few of the other ‘spare’ i.e. “enq: BS – krb spare”, “enq: SP..xx”, “enq: SX.xx.xxx“, but these are different
spare 10
spare 2
spare 3
spare 4
spare 5
spare 6
spare 7
spare 8
spare 9

Couple of new ASM related wait events were added too
enq: AM - ASM Scrubbing
enq: AM - ASM used quota calculation

Some of the new enqueue waits on Change Tracking file are
enq: CT - CTWR DBA buffer resize
enq: CT - CTWR datafile sync
enq: CT - CTWR thread checkpoint

Some of the new enqueue related waits on Exadata Persistent Memory (PMEM)
enq: FF - PMEMfs - ODM
enq: FF - PMEMfs - Param
enq: FF - PMEMfs - Resize
enq: FF - PMEMfs - aXtnd pool
enq: FF - PMEMfs - reSze pool

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

Kafka Java Client for Transactional Event Queues (TEQ), Another great feature added to Oracle 21c …

Posted by FatDBA on November 6, 2022

Hi Guys,

Recently I was testing some advance queuing functions to setup an asynchronous communication (using event queues) between few applications and users on Oracle 21c database, to build a messaging platform. I was looking for a way to connect my client side libraries to allow Kafka APIs to connect to Oracle DB.

I had no clue how to do this integration, and then I came across this official documentation which talks about a new functionality in Oracle Database Version 21c – Kafka Java Client for Transactional Event Queues (TEQ) that enables Kafka application compatibility with Oracle Database. This provides easy migration of Kafka applications to TEQ.

Oracle Transactional Event Queue (TEQ) makes it easy to implement event-based applications. It is also highly integrated with Apache Kafka. Apart from enabling apps that use Kafka APIs to transparently operate on Oracle TEQ, Oracle TEQ also supports bi-directional information flow between TEQ and Kafka, so that changes are available in TEQ or Kafka as soon as possible in near-real-time.

Apache Kafka Connect is a framework included in Apache Kafka that integrates Kafka with other systems. Oracle TEQ will provide standard JMS package and related JDBC, Transaction packages to establish the connection and complete the transactional data flow. Oracle TEQ configures standard Kafka JMS connectors to establish interoperability and complete the data flow between the two messaging systems.

Another great blog post on the topic : https://docs.oracle.com/en/database/oracle/oracle-database/21/adque/Kafka_cient_interface_TEQ.html#GUID-94589C97-F323-4607-8C3A-10A0EDF9DA0D

Hope It Helps!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

Are you looking for a method to stop automatic SQL quarantine without disabling the entire SQL Quarantine feature ? Welcome to Oracle 21c …

Posted by FatDBA on September 23, 2022

Hi All,

Recently I was working on a 21c database for a POC where at one point I want to disable automatic creation of SQL Quarantine, but without disabling the entire statement Quarantine feature. This 21.3.0.0.0 Database had got the resource manager enabled, and one of the SQL was taking long time to complete, longer than the allowed directive limits on IO & CPU TIME, and as expected the SQL was killed with message “ORA-00040: active time limit exceeded – call aborted” and the SQL plan was quarantined. I wanted to stop or disable the auto creation of SQL Quarantines for the SQL in question, after RM terminates the SQL.

If you want to read more about SQL Quarantine, please click this link to my earlier post with a demo about it.

I remember in 19c there wasn’t any way to achieve that and can only regulate behavior using two of the underscore parameters _quarantine_enabled or _optimizer_quarantine_sql. Oracle 21c has introduced two two new parameters to control the behavior of SQL Quarantine, and that specially solves this issue.

First one is optimizer_capture_sql_quarantine, if set to FALSE, would disable the automatic creation of SQL Quarantine configurations after RM termination of a SQL query execution. This is FALSE by default.

The second parameter is optimizer_use_sql_quarantine, if set to FALSE would disable the use of existing SQL Quarantine configurations in a database. This parameter determines whether the optimizer considers SQL Quarantine configurations when choosing an execution plan for a SQL statement. This is TRUE by default, thereby allowing users to manually create and use SQL Quarantine configurations.

Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0

-- Default Setting
SQL> show parameter OPTIMIZER_CAPTURE_SQL_QUARANTINE

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_quarantine     boolean     FALSE
SQL>
SQL>

-- Default Setting
SQL> sho parameter OPTIMIZER_USE_SQL_QUARANTINE

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_use_sql_quarantine         boolean     TRUE
SQL>
SQL>

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

FRA full with archives ORA-38706 ORA-38708 ? Not always think of a BUG … And when that proud little DBA in me missed common sense

Posted by FatDBA on September 11, 2022

Hi All,
Some time back I was working on an 19.16 database where I’d to enable FLASHBACK on a database, but immediately kicked out with an error “ORA-38706: Cannot turn on FLASHBACK DATABASE logging. ORA-38708: not enough space for first flashback database log file”. So, I’d tried to expand the FRA size, but thought to check what there inside the FRA, it was 99.9% full with 270 archive log files occupying 99.39% of the total allocated space. So, everything was good till that point.

[oracle@fatdba ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Sep 5 19:07:28 2022
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL>
SQL> show parameter db_recovery_file_dest_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size           big integer 251G


NAME         SPACE_LIMIT_GB SPACE_AVAILABLE_GB PERCENT_FULL
------------ -------------- ------------------ ------------
+RECO                   251         .240234375         99.9


FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               1          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         99.39                         0             270          0
BACKUP PIECE                           .01                         0               2          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

But I thought to check space at the ASM Level. I queried v$asm_diskgroup and results were totally opposite with what I saw with V$RECOVERY_FILE_DEST and v$flash_recovery_area_usage. RECO Disk Group (FRA location) was almost 100% free and only 0.54% was consumed. Same results were there when I’d queried RECO DG via asmcmd.

-- results from v$asm_diskgroup view
Disk Group            Sector   Block   Allocation
Name                    Size    Size    Unit Size State       Type   Total Size (MB) Used Size (MB) Pct. Used
-------------------- ------- ------- ------------ ----------- ------ --------------- -------------- ---------
DATA                     512   4,096    1,048,576 CONNECTED   EXTERN         691,197        356,322     51.55
OCRVFDG                  512   4,096    4,194,304 MOUNTED     EXTERN          25,596            100       .39
RECO                     512   4,096    1,048,576 CONNECTED   EXTERN       1,048,575          5,645       .54
                                                                     --------------- --------------
Grand Total:                                                               1,765,368        362,067



ASMCMD> lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  1048576    691197   334875                0          334875              0             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  4194304     25596    25496                0           25496              0             N  OCRVFDG/
MOUNTED  EXTERN  N         512             512   4096  1048576   1048575  1043233                0         1043233              0             N  RECO/
ASMCMD>

At this point I’d started thinking about some kind of a BUG in the database, I know about few FRA related bugs in earlier Oracle versions. Just before I’d tried anything else, I thought to execute dbms_backup_restore.refreshagedfiles which refreshes the view. At the same time tried kra_options event which resets v$recovery_file_dest, but no luck 😦

SQL>
SQL> alter session set events 'immediate trace name kra_options level 1';

Session altered.

SQL>  execute dbms_backup_restore.refreshagedfiles;

PL/SQL procedure successfully completed.

SQL>

And right at that point I was very sure it was happening all due to a bug and was thinking to contact Oracle support. But just before that, that proud little DBA in me thought, Am I missing something ? Any other possible areas I should have explored before connecting with OCS ?

I’d checked RMAN to see if it still has any expired archivelog details and ran ‘crosscheck archivelog all’, and it identified exactly 270 older archivelogs which were non existent. So, that proud little DBA was wrong 🙂 .. When the count was matched exactly with the v$flash_recovery_area_usage, I’d deleted all of those expired archivelogs from catalog.

-- To Crosscheck all archivelog files present in the RMAN catalog.
RMAN> crosscheck archivelog all;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=258 device type=DISK
validation failed for archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_1.261.1109109619 RECID=1 STAMP=1109109620
validation failed for archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_2.264.1109401205 RECID=2 STAMP=1109401206
validation failed for archived log
.....
........
..........
..............
validation failed for archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_268.463.1114203831 RECID=268 STAMP=1114203829
validation failed for archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_269.469.1114207455 RECID=269 STAMP=1114207454
validation succeeded for archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_321.531.1114624193 RECID=270 STAMP=1114624193
Crosschecked 270 objects


-- Delete expired archivelog files
RMAN>
RMAN>
RMAN> delete expired archivelog all;
...
.......
270     1    270     X 31-AUG-22
        Name: +RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_268.463.1114203831 RECID=268 STAMP=1114203829
..
.......
Do you really want to delete the above objects (enter YES or NO)? YES
deleted archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_1.261.1109109619 RECID=1 STAMP=1109109620
deleted archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_2.264.1109401205 RECID=2 STAMP=1109401206
deleted archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_3.265.1109671797 RECID=3 STAMP=1109671799
deleted archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_4.266.1109955617 RECID=4 STAMP=1109955618
deleted archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_5.267.1110233333 RECID=5 STAMP=1110233334
deleted archived log
archived log file name=+RECO/FATDBA/ARCHIVELOG/2022_07_02/thread_1_seq_6.268.1110485231 RECID=6 STAMP=1110485232
deleted archived log
Deleted 270 EXPIRED objects

RMAN>
RMAN>

And immediately I saw the change in FRA related dynamic views (V$RECOVERY_FILE_DEST and v$flash_recovery_area_usage) and matched the genuine utilization at the ASM level.

[oracle@fatdba ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Sep 5 19:07:28 2022
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL>
SQL>
SQL> SELECT NAME,
       (SPACE_LIMIT / 1024 / 1024 / 1024) SPACE_LIMIT_GB,
         ((SPACE_LIMIT - SPACE_USED + SPACE_RECLAIMABLE) / 1024 / 1024 / 1024) AS SPACE_AVAILABLE_GB,
       ROUND((SPACE_USED - SPACE_RECLAIMABLE) / SPACE_LIMIT * 100, 1) AS PERCENT_FULL
  FROM V$RECOVERY_FILE_DEST;  

NAME         SPACE_LIMIT_GB SPACE_AVAILABLE_GB PERCENT_FULL
------------ -------------- ------------------ ------------
+RECO                   200          197.59082          1.2

SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                           .01                         0               1          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                           .57                         0               2          0
BACKUP PIECE                           .01                       .01               2          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

8 rows selected.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

How to trace DBMS_STATS and see what is under the hood for a slow sluggish stats collection ..

Posted by FatDBA on July 10, 2022

Recently while working on a slow stats gathering case, someone asked if there is a way to know what’s happening beneath the surface ? what all flags and calculations its doing internally while on the front end the DBMS_STATS still running ? Yes, there is a way! As with most of the Oracle utilities, DBMS_STATS too comes with its own tracing facility which you can call using dbms_stats.set_global_prefs. Today’s post is all about enabling tracing on DBMS_STATS package.

Few of the commonly used DBMS_STATS flags/options are given below, you can always do a combination if wanted to club …

4 = This is to trace table stats
8 = This is to trace index stats
16 = This is to trace columnar stats
512 = auto stats job
1024 = This is to trace parallel executions
4096 = This is to trace partition prunes
16384 = This one traces extended column stats
32768 = To trace approximate NDV (number distinct values) ….

Let’s quickly collect traces for both Indexes and Tables followed by the stas collection for a table and its dependent Indices.

SQL>
SQL> exec dbms_stats.set_global_prefs('TRACE',4+8);

PL/SQL procedure successfully completed.

SQL>

SQL> set time on
13:28:55 SQL>
13:28:56 SQL>
13:28:56 SQL>
13:28:56 SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYS' , tabname => 'BIGTAB',cascade => true, estimate_percent => 10,method_opt=>'for all indexed columns size 1', granularity => 'ALL', degree => 4);

PL/SQL procedure successfully completed.

13:28:59 SQL>
13:29:00 SQL>


SQL> select name, value from v$diag_info where name='Diag Trace';

NAME                                     VALUE
---------------------------------------- ------------------------------------------------------------------------------------
Diag Trace                               /u01/app/oracle/diag/rdbms/localdb/localdb/trace

Alright, we have our trace ready, let’s dig in deep what’s all inside the trace that we have collected for both the Table and it’s Index.
It starts with a standard header in the trace files, followed by all by default explicitly mentioned flags/options with DBMS_STATS in XML format.

-- Header
DBMS_STATS: Record gather table stats operation on table : BIGTAB
DBMS_STATS:   job name: 
DBMS_STATS:    |--> Operation id: 1341
DBMS_STATS: gather_table_stats: <params><param name="block_sample" val="FALSE"/><param name="cascade" val="TRUE"/>
<param name="concurrent" val="FALSE"/><param name="degree" val="4"/><param name="estimate_percent" val="10"/>
<param name="force" val="FALSE"/><param name="granularity" val="ALL"/><param name="method_opt" val="for all indexed columns size 1"/><param name="no_invalidate" val="NULL"/>
<param name="ownname" val="SYS"/><param name="partname" val=""/><param name="reporting_mode" val="FALSE"/><param name="statid" val=""/><param name="statown" val=""/>
<param name="stattab" val=""/><param name="stattype" val="DATA"/><param name="tabname" val="BIGTAB"/></params>
DBMS_STATS: Start gather table stats -- tabname: BIGTAB

Next comes the preferences set for the called Table, ‘BIGTAB’ in our case. This will give you more idea about what all options were set overall and if there is any scope to tune and tweak anyone of them.

DBMS_STATS: Preferences for table SYS.BIGTAB
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         - 
DBMS_STATS: STATS_RETENTION                                   - 
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             - 
DBMS_STATS: SNAPSHOT_UPD_TIME                                 - 
DBMS_STATS: TRACE                                             - 12
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 1
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: STATS_MODEL_INTERNAL_MINRSQ                       - 0.9
DBMS_STATS: STATS_MODEL_INTERNAL_CONTROL                      - 0
DBMS_STATS: STATS_MODEL                                       - ON
DBMS_STATS: AUTO_STATS_ADVISOR_TASK                           - TRUE

Next sections is where it monitors the stats gathering for Table and its Index, their start and end times were captured, for example : “APPROX_NDV_ALGORITHM => Non-Incremental” was chosen. Table level stats were used i.e. row count, blocks, average row length, sample size, number of indexes etc.

Next it jumps to the Index where it tries to calculate the global statistics for the partitioned Index by aggregating the partition-level statistics but as its a non-partitioned Index, it simples moves to the next stage. Next it triggers the Indexing options (just enables them at the start) i.e. nlb (number of leaf blocks), ndk (number of distinct keys), clf (clustering factor), nblks (number of blocks), sample percentage, degree (parallelism), nrw (key counts) etc by allocating them value 1 which means to consider them.
Finally it starts to analyze the Index structure (In this case it’s not using approximate_ndv algorithm) and calculates the final values of nrw, nlb, ndk, clf etc.

Few Internal options were also used i.e. ctx.conc_ctx, ctx.batching_coeff etc.

-- Table Stats
DBMS_STATS: Started table SYS.BIGTAB. at 28-JUN-22 01.28.58.499744000 PM -04:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting
DBMS_STATS: reporting_man_log_task: target: "SYS"."BIGTAB" objn: 108254 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: ALL global_requested: NULL pfix: 
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: HLL in incremental (no stats exist)
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: ALL execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: SYS tabname: BIGTAB pname:  spname:  execution phase: 1
DBMS_STATS: Number of rows in the table = 492190, blocks = , average row length = 19, chain count = , scan rate = 0, sample size = 49219, cstats.count = 3, cind = 2

-- Index part 
DBMS_STATS: Started index SYS.IDX_DIXIT at 28-JUN-22 01.28.58.816819000 PM -04:00 granularity: ALL gIdxGran: 
DBMS_STATS: Specified granularity = , New granularity = ALL, Fixed granularity = ALL
DBMS_STATS: granularity ALL
DBMS_STATS: reporting_man_log_task: target: "SYS"."IDX_DIXIT" objn: 108255 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS:  Gather index subpartition stats...
DBMS_STATS:  Gather index partition stats...
DBMS_STATS:  Gather global index stats...
DBMS_STATS: Start analyze_index_using_sql
DBMS_STATS: tab_stats_stale: BIGTAB not analyzed using full compute
DBMS_STATS:   Number of blocks in the index: 1030
DBMS_STATS: Start execute_analyze_index(owner= SYS, indname= IDX_DIXIT, fobjn= , sample_pct= 10, seed= 0, degree= 4, collect_nrw= 1, collect_nlb= 1, collect_ndk= 1, collect_clf= 1, bmi= 0, iot_ov= 0, iot_sec= 0, ppredtxt= , nblks= 1030)
DBMS_STATS: execute_analyze_index(): Not using approximate_ndv, pct=111.529126213592233009708737864077669903,sample_pct=10
DBMS_STATS: End execute_analyze_index(ssize= 494100, nrw= 494100, nlb= 1027, ndk= 1000, clf= 494100)
DBMS_STATS: target_size: 1027
DBMS_STATS: Finished index SYS.IDX_DIXIT at 28-JUN-22 01.28.59.464468000 PM -04:00

DBMS_STATS: Finished table SYS.BIGTAB. at 28-JUN-22 01.28.59.519495000 PM -04:00

Once you are done with the stats tracing, close it immediately!

SQL> exec dbms_stats.set_global_prefs('TRACE',0);

PL/SQL procedure successfully completed.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

FatDBA ranked 27th among top 100 Oracle database blogs on the planet … :)

Posted by FatDBA on June 4, 2022

Hello All, my blog FatDBA have been awarded with the ‘Top 100 Oracle Blogs on the planet‘ (Rank 27) by feedspot. The best Oracle blog list curated from thousands of blogs on the web and ranked by traffic, social media followers, domain authority & freshness.

Thank you so much guys for your support \,,/

https://blog.feedspot.com/oracle_blogs/

Posted in Uncategorized | Tagged: | Leave a Comment »

What is a short stack dump ?

Posted by FatDBA on May 23, 2022

Hi Everyone,

Short stacks are a rich source of information which can help you to do analyzing and deconstructing a process which is part of a problem in Oracle databases, and the good part is you don’t have to rely on any OS utilities like gdb, pstack etc. to extract stacks or traces. Its pretty simple and quick to generate the short stacks, you have to pass on the OSPID of the process, and issue SHORT_STACK option for ORADEBUG, and will generate a sequence of all functions involved and their current position in a calls stack. You can also use errorstack for the same, but short_stacks are fast and point-to-point!

Now one can ask – What is a ‘call stack’ ?
A call stack is a list of names of methods called at run time from the beginning of a program until the execution of the current statement. A call stack is mainly intended to keep track of the point to which each active subroutine should return control when it finishes executing. Call stack acts as a tool to debug an application when the method to be traced can be called in more than one context. This forms a better alternative than adding tracing code to all methods that call the given method.

Let me run it on a real time system where we are facing issues where due to adaptive switching between Log write methods LGWR was causing excessive ‘log file sync‘ Waits

[oracle@oracleontario ~]$ ps -ef|grep lg
oracle     4402      1  0 00:25 ?        00:00:00 ora_lgwr_dixitdb
oracle     4719   3201  0 00:25 pts/1    00:00:00 grep --color=auto lg

-- Here in the example I am generating short stacks for LGWR process
SQL>
SQL> oradebug setospid 4402
Oracle pid: 19, Unix process pid: 4402, image: oracle@oracleontario.ontadomain (LGWR)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+187
<-ksliwat()+2233<-kslwaitctx()+200<-ksarcv()+320<-ksbabs()+670<-ksbrdp()+1167<-opirip()+541<-opidrv()+581<-sou2o()+165
<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
SQL>
SQL>



-- After a gap of ~ 10-15 Mins
-- Let me generate the short stack once again just to see if there is any difference.
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+992 <-sigacthandler()+104
<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212 <-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680
<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380


So, as you’ve seen in above code, the clear change is there in the LGWR short stacks, we did few log switches, modified a big table in the database followed by multiple commits. If you carefully look at the stack, it was function ksedsts() where it was at the time we generated the stack, and function ksdxfstk () called it, followed by ksdxcb(), sspuser() and immediate change after functions called by LGWR internal call stacks. They matched one of the known BUG which gave us a hint that we resolved after modified a undocumented parameter.

So, it can help us locating the buggy functions called by any user session or background processes and is very useful in case when you are battling with a strange looking problem in your Oracle database.

Few of the functions that I am able to recall are given below.
semtimedop() is one of the function for semaphore operations also known as semop and is used to perform atomically an array of semaphore operations on the set of semaphores associated with the semaphore identifier specified by semid.
‘ksdxcb’ is a function that’s called usually when the command oradebug is being used.
sighandler is the programming signal handler and is used to locate an exception. When a signal is sent to a process, various register and stack operations occur that make it look as though the leaf PC at the time of the signal is the return address for a call to a system function, sigacthandler(). sigacthandler() calls the user-specified signal handler just as any function would call another.
The sigaltstack() function allows a thread to define and examine the state of an alternate stack area on which signals are processed.

opidrv() is ORACLE Program Interface DRiVer (IGNORE)
opiodr is ORACLE code request driver – route the current request
main() is the standard executable entry point
ksedst() is the KSE dump the call stack
skgmstack() is the call specified function with extra STACK space
rpidrv() or the RPI is theRecursive Program Interface DRiVer
pfrrun() or PSDEVN is the PL/SQL Interpreter Main Instruction Loop
kkxexe() or KKX is to execute plsql

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

Bad clustering factor on >12.1 ? Have you tried Attribute Clustering ?

Posted by FatDBA on May 16, 2022

Recently we were facing one performance issue with one of the SQL which references a highly volatile table in the entire database, and one of the largest too. The tables goes through massive DMLs each minute and that has lead to huge randomness in the data distribution, and expectedly few of its critical indexes has a very high (bad) clustering factor (CF).

A quick explanation on the clustering factor, it is a measure of the ordered-ness of an index in comparison to the table that it is based on. That would be a simple explanation on Oracle’s clustering factor (CF), but if you want to read more about it I would recommend to read blog posts by Richard Foote, Jonathan Lewis and few other great articles by Randolf Geist.

Okay, coming back to the case – We tried few methods (fixed BLevel, Leaf_Blocks, degree, stats recollection etc.), but none of them helped much to stabilize performance of the query, and finally we tried something that was introduced in Oracle 12.1 the ‘Attribute Clustering‘ that helped us to resolve the problem. The attribute clustering improves physical IOs for tables and its partitions. An attribute-clustered table stores data in close proximity on disk in an ordered way based on the values of a certain set of columns in the table or a set of columns in the other tables.

Attribute clustering is a user-defined table directive that provides data clustering on one or more columns in a table. The directives can be specified when the table is created or modified. There are two types of attribute clustering:

  • With Linear Ordering : Linear ordering stores the data according to the order of specified columns. This is the default type of clustering.
  • With Interleaved Ordering : It accurately determines exactly where data is located on the disk. This enabled I/O Pruning. This uses a special multidimensional clustering technique based on Z-order curve fitting.

Note: Zone mapping is a separately licensed feature.

Let’s do some test to understand the feature better. Here in the demo I would be using linear order clustering without any zonemaps. You can try try pairing the LO clustering with zone maps too. All tests I am doing it on a 19.15.0.0.0 sandbox environment.

For testing purpose, I have created a new tables with some 2000000 random values/rows inserted into it, this is to mimic a bad clustering factor (CF) scenario. I have also created an Index on the top of a column (date) and have collected the statistics.

[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Apr 27 00:13:08 2022
Version 19.15.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0

SQL>
SQL> create table new_test (id number, DOB date, text varchar2(40));

Table created.

SQL> insert into new_test select rownum, sysdate-trunc(dbms_random.value(0, 20000)), 'PRASHANT DIXIT'
from dual connect by level <= 2000000;  2

2000000 rows created.

SQL>
SQL> commit;
Commit complete.

SQL> create index idx_newtest on new_test(dob);

Index created.

SQL> EXEC DBMS_STATS.gather_table_stats('DIXDROID', 'NEW_TEST', estimate_percent => null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.



-- Here you can see the extremely bad CLUSTERING FACTOR of the Index. 
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST';  2

TABLE_NAME INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST   IDX_NEWTEST               10097    2000000           1989246

SQL>

With such a bad clustering factor of the Index, let me try to run a query on the same column and pass a date range and see the execution plan.

SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';

Explained.

SQL> 


SQL> SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 1847055510

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |  3100 | 86800 |  2750   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| NEW_TEST |  3100 | 86800 |  2750   (1)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

With a bad clustering factor, the query is going for a full table scan on NEW_TEST table, and the overall cost of the access path is very high too. Let’s alter the table using Linear attribute clustering – I won’t be using materialized zonemaps here

SQL> alter table NEW_TEST add clustering by linear order(DOB) without materialized zonemap;

Table altered.

SQL> alter table NEW_TEST move online;

Table altered.

-- Now if you check you will the improved CF of the Index.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST';  

TABLE_NAME INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST   IDX_NEWTEST               10097    2000000              9277

The clustering factor of the Index was greatly improved after altering the table column (DOB) with linear order clustering enabled. Let’s try to run the same SQL and see if there is any improvements or not.

SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';

Explained.


SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------
Plan hash value: 1446839462

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |  3100 | 86800 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NEW_TEST    |  3100 | 86800 |    26   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_NEWTEST |  3100 |       |    11   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

15 rows selected.

SQL>
SQL>

And we can see the FTS was replaced with a quick INDEX RANGE SCAN on IDX_NEWTEST index (created to cover DOB column). Look at the great reduction in the cost per step and final cost of the access path and plan tree.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Leave a Comment »

 
%d bloggers like this: