Tales From A Lazy Fat DBA

Its all about Databases & their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Autopsy of a non responsive database using SYSTEMSTATE dumps

Posted by FatDBA on March 14, 2022

One of the very common scenarios that we DBAs has encountered are when the database is hanging, freeze or totally non responsive, or when your database crashes with a serious performance problem, or when the SQLPlus cannot connect to the server or when you noticed that we can not save the instance/node then most of the time DBA take the system state dump for further analysis or you can say for RCA purpose.
In this case, if you want to obtain the current status of the database for later diagnosis, then we need to use SYSTEMSTATE dump to know what the process is doing, what it is waiting for, who is the resource owner, and who is blocking others. In case of the above problems, it is very helpful to analyze the cause of the problem by collecting SYSTEMSTATE dump in time.

In order to understand the purpose, let me create a situation, a problem, I will simulate a row locking (pessimistic type) case in to the database. From the first session I had created a table and insert few rows into it and didn’t commit and opened few others sessions and tried to update same row. And I can see a good amount of blocking is created in the database!

LOCK_TREE
----------
1.97
  1.58
  1.60
  1.75

USERNAME           SID    SERIAL# TY LOCK_TYPE     LOCK_REQUESTE       ID1        ID2 SQL_TEXT                           
----------- ---------- ---------- -- ------------- ------------- ---------- ---------- ----------------------------------------
SYS                 58      17036 TX None               Exclusive    589839       1777 update test set id=09 where ID=10
SYS                 60      21943 TX None               Exclusive    589839       1777 update test set id=09 where ID=10
SYS                 75         91 TX None               Exclusive    589839       1777 update test set id=09 where ID=10

Right time to generate the SYSTEMSTATE dump, below is how you can generate that. Do system state dump few times in a row, so we can determine whether the processes are hung or active.

-- connect with the database if its not hang or freeze
SQL>  connect / as sysdba

-- Connect using prelim option if its stalled or freeze
SQL>  sqlplus -prelim / as sysdba

-- Use SETMYPID command to select the current process as the ORADEBUG process
SQL>  oradebug setmypid
Statement processed. 

-- This is to remove the limitation on the size of the trace file use. 
-- In Oracle 8.1.6 and above the maximum size of the trace file defaults to UNLIMITED
SQL> oradebug unlimit
Statement processed. 

-- This is to take the systemstate dumps with 266 level (short stack + dump)
SQL>  oradebug dump systemstate 266
Statement processed. 

-- Or use below in case running on a RAC setup
SQL>  oradebug -g all dump systemstate 266

SQL>  oradebug dump systemstate 266
Statement processed. 

SQL>  oradebug dump systemstate 266
Statement processed. 

-- This command prints the name of the current trace file
SQL>  Oradebug tracefile_name
/u01/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_191381.trc

There are few other levels which you can use and are discussed below.
Note: I highly recommend to check with Oracle support to use there levels as few of them with higher level are little risky and known to crash instances.

Level 2 : Takes a dump (not including the lock element)
Level 10 : Takes a dump including lock details
Level 11 : Takes a dump + global cache of RAC
Level 256 : Takes short stacks (function stack)
Level 258 (256 + 2) : Takes a short stack + dump (not including the lock element)
Level 266 (256 + 10) : Takes a short stack + dump
Level 267 (256 + 11) : Takes a short stack + dump + global cache of RAC

So we have artificially created a locking scenario on the database and we are aware about the sessions involved in this pessimistic type locking. Now we will use the same information and that will help us to navigate through the massive SYSTEMSTATE dumps. Lets dig into the trace file that we have generated for the session.

The first part or the header contains details like hostname, machine info, service, module and client details, followed by command processing’s that we did through ORADEBUG.

Trace file /u01/app/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_5516.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Build label:    RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME:    /u01/app/oracle/product/19.0.0/dbhome_1
System name:	Linux
Node name:	oracleontario.ontadomain
Release:	3.10.0-1062.el7.x86_64
Version:	#1 SMP Fri Aug 9 18:36:11 GMT 2019
Machine:	x86_64
Instance name: dixitdb
Redo thread mounted by this instance: 1
Oracle process number: 47
Unix process pid: 5516, image: oracle@oracleontario.ontadomain (TNS V1-V3)


*** 2022-03-07T23:29:00.706787-05:00
*** SESSION ID:(79.52211) 2022-03-07T23:29:00.706811-05:00
*** CLIENT ID:() 2022-03-07T23:29:00.706818-05:00
*** SERVICE NAME:(SYS$USERS) 2022-03-07T23:29:00.706824-05:00
*** MODULE NAME:(sqlplus@oracleontario.ontadomain (TNS V1-V3)) 2022-03-07T23:29:00.706830-05:00
*** ACTION NAME:() 2022-03-07T23:29:00.706836-05:00
*** CLIENT DRIVER:(SQL*PLUS) 2022-03-07T23:29:00.706840-05:00

Processing Oradebug command 'setmypid'

*** 2022-03-07T23:29:00.706867-05:00
Oradebug command 'setmypid' console output: <none>

*** 2022-03-07T23:29:05.127755-05:00
Processing Oradebug command 'unlimit'

*** 2022-03-07T23:29:05.127963-05:00
Oradebug command 'unlimit' console output: <none>

*** 2022-03-07T23:29:10.346290-05:00
Processing Oradebug command 'dump systemstate 266'

Next part is where you will see process summary and its dumps, including all user OSPIDs, their commands and blocking details with wait information and is the most interesting one and important section of the dump to understand the wait/blocking chain.

Dumping the Process Summary
       1: PSEUDO process
       2: PMON ospid 3904 sid 2 ser 9826, waiting for 'pmon timer' 
       3: CLMN ospid 3906 sid 3 ser 8008, waiting for 'pmon timer' 
       4: PSP0 ospid 3908 sid 4 ser 46393, waiting for 'rdbms ipc message' 
       5: VKTM ospid 3910 sid 5 ser 26025, waiting for 'VKTM Logical Idle Wait' 
       6: GEN0 ospid 3915 sid 6 ser 36023, waiting for 'rdbms ipc message' 
       7: MMAN ospid 3917 sid 7 ser 17720, waiting for 'rdbms ipc message' 
.....
......
.......
      47: USER ospid 5516 sid 79 ser 52211, 
      49: USER ospid 4429 sid 58 ser 17036, waiting for 'enq: TX - row lock contention' 
          Cmd: UPDATE
          Blocked by inst: 1, sid: 97, ser: 2507
          Final Blocker inst: 1, sid: 97, ser: 2507
      50: USER ospid 4538 sid 75 ser 91, waiting for 'enq: TX - row lock contention' 
          Cmd: UPDATE
          Blocked by inst: 1, sid: 97, ser: 2507
          Final Blocker inst: 1, sid: 97, ser: 2507
      51: W006 ospid 4290 sid 76 ser 17641, waiting for 'Space Manager: slave idle wait' 
      52: USER ospid 4636 sid 60 ser 21943, waiting for 'enq: TX - row lock contention' 
          Cmd: UPDATE
          Blocked by inst: 1, sid: 97, ser: 2507
          Final Blocker inst: 1, sid: 97, ser: 2507
      55: M004 ospid 4722 sid 74 ser 6422, waiting for 'class slave wait' 

And from the above trace section we understood that the final blocker is with SID 97, serial 2507 where SIDs 58,75 and 60 all waiting on event ‘enq: TX – row lock contention’.

Next section is where it dumps buffer cache working set information which contains size of the BC, number of pools, state information and other information. Followed by process level statistics – state, session, latching, post information along with short stack dumps which is debugging/triggering events that generates a trace file in case of any error and full chain. The trace also contains details about wait stacks, blocking chains (waiters), session event history,

Here in below section you will also see the SQL text (update test set id=09 where ID=10) and its details which has caused the blocking, its statistics, execution counts, active locking info, SQL ID, Table Info, HashValues and namespace info.

.....
........
............
 (WS) size: 48950 (0) wsid: 3 state: 0 pool: 3
 (WS) bsi: 0 bsz: 8192 pg: 0 dbwr: 0
 (WS) bgotten: 334899 fbwanted: 0
 (WS) sumwrt: 21641 sumscn: 0
 (WS) Waits fb: 0 wc: 0 bb: 33 sl: 0
 (WS) inspected fb: 296603 db: 9136 pn: 154
.....
...
.....

*** 2022-03-07T23:29:12.285778-05:00
PROCESS 46: 
  ----------------------------------------
  SO: 0x7deed748, type: process (2), map: 0x7d3d9878
      state: LIVE (0x4532), flags: 0x1
      owner: (nil), proc: 0x7deed748
      link: 0x7deed768[0x7deed768, 0x7deed768]
      child list count: 6, link: 0x7deed7b8[0x7def8f68, 0x74f28318]
      pg: 0
  SOC: 0x7d3d9878, type: process (2), map: 0x7deed748
       state: LIVE (0x99fc), flags: INIT (0x1)
  (process) Oracle pid:46, ser:2, calls cur/top: (nil)/0x7024aec8
            flags : (0x0) -  icon_uid:0 logon_pdbid=0
            flags2: (0x0),  flags3: (0x510) 
            call error: 0, sess error: 0, txn error 0
            intr queue: empty
    (post info) last post received: 166 0 3
                last post received-location: ksl2.h LINE:4395 ID:kslpsr
                last process to post me: 0x7d3b5758 1 6
...
....
    Short stack dump: 
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-read()+14<-sntpread()+28<-ntpfprd()+126<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+905<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
    KGL-UOL (Process state object)
    KGX Atomic Operation Log 0x7d3da9f8
     Mutex (nil)(0, 0) idn 0 oper NONE(0)
     FSO mutex uid 262142 efd 0 whr 0 slp 0
    KGX Atomic Operation Log 0x7d3daa60
...
....
.....
      LibraryHandle:  Address=0x6b734070 Hash=f528c430 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
        ObjectName:  Name=update test set id=09 where ID=10 
          FullHashValue=d45588e2ae2d8ef0798380f7f528c430 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=4113089584 OwnerIdn=0 
        Statistics:  InvalidationCount=0 ExecutionCount=4 LoadCount=2 ActiveLocks=4 TotalLockCount=8 TotalPinCount=1 
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=3 HandleInUse=3 HandleReferenceCount=0 
        Concurrency:  DependencyMutex=0x6b734120(0, 2, 0, 0) Mutex=0x6b7341c0(0, 95, 0, 0) 
        Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000] 
        WaitersLists:  
          Lock=0x6b734100[0x6b734100,0x6b734100] 
          Pin=0x6b7340e0[0x6b7340e0,0x6b7340e0] 
          LoadLock=0x6b734158[0x6b734158,0x6b734158] 
        Timestamp:  Current=03-07-2022 23:12:11 
        HandleReference:  Address=0x6b734258 Handle=(nil) Flags=[00] 
        LibraryObject:  Address=0x6b732f28 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] Flags3=[0000] PublicFlags=[0000] 
          ChildTable:  size='16' 
            Child:  id='0' Table=0x6b733da0 Reference=0x6b7338a8 Handle=0x6b732918 
        NamespaceDump:  
          Parent Cursor:  sql_id=7m0w0yzukjj1h parent=0x6b732ff8 maxchild=1 plk=y ppn=n prsfcnt=0 obscnt=0

Next we will check more details about the blocker or the session that has blocked others, here SID 97 is the case.

    (session) sid: 97 ser: 2507 trans: 0x7815a9a8, creator: 0x7d3d9878
              flags: (0x41) USR/- flags2: (0x40009) -/-/INC
              flags_idl: (0x0) status: -/-/-/- kill: -/-/-/-
              DID: 0001-002E-000000010000-0000-00000000, short-term DID: 
              txn branch: (nil)
              edition#: 134              user#/name: 0/SYS
              oct: 0, prv: 0, sql: (nil), psql: 0x71cc7dc8
              stats: 0x705a2930, PX stats: 0x128f8d84
    service name: SYS$USERS
    client details:
      O/S info: user: oracle, term: pts/1, ospid: 4321
      machine: oracleontario.ontadomain program: sqlplus@oracleontario.ontadomain (TNS V1-V3)
      application name: sqlplus@oracleontario.ontadomain (TNS V1-V3), hash value=2432467899
.....
........
.........
    Current Wait Stack:
     0: waiting for 'SQL*Net message from client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=650 seq_num=652 snap_id=1
        wait times: snap=11 min 38 sec, exc=11 min 38 sec, total=11 min 38 sec
        wait times: max=infinite, heur=11 min 38 sec
        wait counts: calls=0 os=0
        in_wait=1 iflags=0x1a0
    There are 3 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 58, ser: 17036
      wait event: 'enq: TX - row lock contention'
        p1: 'name|mode'=0x54580006
        p2: 'usn<<16 | slot'=0x9000f
        p3: 'sequence'=0x6f1
      row_wait_obj#: 77878, block#: 122337, row#: 0, file# 1
.........
............
.............
    Session Wait History:
        elapsed time of 0.000001 sec since current wait
     0: waited for 'SQL*Net message to client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=649 seq_num=651 snap_id=1
        wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.000040 sec of elapsed time
     1: waited for 'SQL*Net message from client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=648 seq_num=650 snap_id=1
        wait times: snap=0.000557 sec, exc=0.000557 sec, total=0.000557 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.000017 sec of elapsed time
     2: waited for 'SQL*Net message to client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=647 seq_num=649 snap_id=1
        wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.000015 sec of elapsed time
     3: waited for 'SQL*Net message from client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=646 seq_num=648 snap_id=1
        wait times: snap=0.000280 sec, exc=0.000280 sec, total=0.000280 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.000014 sec of elapsed time
     4: waited for 'PGA memory operation'
        =0x0, =0x0, =0x0
        wait_id=645 seq_num=647 snap_id=1
        wait times: snap=0.000349 sec, exc=0.000349 sec, total=0.000349 sec
        wait times: max=infinite
        wait counts: calls=0 os=0

....
......
........

You will also see details on SQLs that we’ve executed using same SID (97), here in my case I was running few SQLs to identify locking in the database and were captured in the dump along with its parent and child cursor details, followed by some object level stats.

      ----------------------------------------
      SO: 0x74f5c378, type: LIBRARY OBJECT LOCK (118), map: 0x6bd886b0
          state: LIVE (0x4532), flags: 0x1
          owner: 0x7def8f48, proc: 0x7deed748
          link: 0x74f5c398[0x74eec918, 0x74f44c98]
          child list count: 0, link: 0x74f5c3e8[0x74f5c3e8, 0x74f5c3e8]
          pg: 0
      SOC: 0x6bd886b0, type: LIBRARY OBJECT LOCK (118), map: 0x74f5c378
           state: LIVE (0x99fc), flags: INIT (0x1)

      LibraryObjectLock:  Address=0x6bd886b0 Handle=0x71cc7dc8 Mode=N 
        CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1 
        
        User=0x7d658150 Session=0x7d658150 ReferenceCount=1 
        Flags=CNB/[0001] SavepointNum=558 Time=03/07/2022 23:17:34 
      LibraryHandle:  Address=0x71cc7dc8 Hash=3b6daacc LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
        ObjectName:  Name=select	sn.USERNAME,
	m.SID,
	sn.SERIAL#,
	m.TYPE,
	decode(LMODE,
		0, 'None',
		1, 'Null',
		2, 'Row-S (SS)',
		3, 'Row-X (SX)',
		4, 'Share',
		5, 'S/Row-X (SSX)',
		6, 'Exclusive') lock_type,
	decode(REQUEST,
		0, 'None',
		1, 'Null',
		2, 'Row-S (SS)',
		3, 'Row-X (SX)',
		4, 'Share',
		5, 'S/Row-X (SSX)',
		6, 'Exclusive') lock_requested,
	m.ID1,
	m.ID2,
	t.SQL_TEXT
from 	v$session sn,
	v$lock m ,
	v$sqltext t
where 	t.ADDRESS = sn.SQL_ADDRESS
and 	t.HASH_VALUE = sn.SQL_HASH_VALUE
and 	((sn.SID = m.SID  
          FullHashValue=245c9e2e0077915a5f18e8bc3b6daacc Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=997042892 OwnerIdn=0 
        Statistics:  InvalidationCount=0 ExecutionCount=1 LoadCount=2 ActiveLocks=1 TotalLockCount=1 TotalPinCount=1 
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 
        Concurrency:  DependencyMutex=0x71cc7e78(0, 2, 0, 0) Mutex=0x71cc7f18(0, 28, 0, 0) 
        Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000] 
        WaitersLists:  
          Lock=0x71cc7e58[0x71cc7e58,0x71cc7e58] 
          Pin=0x71cc7e38[0x71cc7e38,0x71cc7e38] 
          LoadLock=0x71cc7eb0[0x71cc7eb0,0x71cc7eb0] 
        Timestamp:  Current=03-07-2022 23:17:33 
        HandleReference:  Address=0x71cc8298 Handle=(nil) Flags=[00] 
        LibraryObject:  Address=0x70f48488 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] Flags3=[0000] PublicFlags=[0000] 
          ChildTable:  size='16' 
            Child:  id='0' Table=0x70f49300 Reference=0x70f48e08 Handle=0x70e14070 
        NamespaceDump:  
          Parent Cursor:  sql_id=5y678rhxqvaqc parent=0x70f48558 maxchild=1 plk=y ppn=n prsfcnt=0 obscnt=0 
      ----------------------------------------
      SO: 0x74eec8f8, type: LIBRARY OBJECT LOCK (118), map: 0x717c4470
          state: LIVE (0x4532), flags: 0x1
          owner: 0x7def8f48, proc: 0x7deed748
          link: 0x74eec918[0x74eec998, 0x74f5c398]
          child list count: 0, link: 0x74eec968[0x74eec968, 0x74eec968]
          pg: 0
      SOC: 0x717c4470, type: LIBRARY OBJECT LOCK (118), map: 0x74eec8f8
           state: LIVE (0x99fc), flags: INIT (0x1)

      LibraryObjectLock:  Address=0x717c4470 Handle=0x73516078 Mode=N 
        CanBeBrokenCount=3 Incarnation=3 ExecutionCount=0 
        Context=0x7f6fe229eb68 
        User=0x7d658150 Session=0x7d5ecd58 ReferenceCount=1 
        Flags=[0000] SavepointNum=0 Time=03/07/2022 23:17:33 
      LibraryHandle:  Address=0x73516078 Hash=0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
        Name:  Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 
        Statistics:  InvalidationCount=1 ExecutionCount=1146 LoadCount=2 ActiveLocks=6 TotalLockCount=354 TotalPinCount=1506 
        Counters:  BrokenCount=3 RevocablePointer=3 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 
        Concurrency:  DependencyMutex=0x73516128(0, 0, 0, 0) Mutex=0x73517920(0, 1552, 0, 0) 
        Flags=RON/PIN/PN0/EXP/CHD/[10012111] Flags2=XRIV/[0010] 
        WaitersLists:  
          Lock=0x73516108[0x73516108,0x73516108] 
          Pin=0x735160e8[0x735160e8,0x735160e8] 
          LoadLock=0x73516160[0x73516160,0x73516160] 
        LibraryObject:  Address=0x73514f30 HeapMask=0000-0001-0001-0000 Flags=EXS/RIV[0200] Flags2=/FRI[400000] Flags3=[0000] PublicFlags=[0000] 
        NamespaceDump:  
          Child Cursor:  Heap0=0x73515000 Heap6=0x72b8cc00 Heap0 Load Time=03-07-2022 23:10:39 Heap6 Load Time=03-07-2022 23:10:39

Other important section is the ‘Enqueue Dump‘. Here in our case this section discloses that enqueue name was “TX-0009000F-000006F1-00000000-00000000” and was acquired in ‘X’ (Exclusive) mode by the holder session ID 97.

2022-03-07 23:29:12.319*:ksq.c@10787:ksqdmc(): Enqueue Dump      (enqueue) TX-0009000F-000006F1-00000000-00000000  DID: ksqlkdid: 0001-002E-00000001

      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x16
      mode: X, lock_flag: 0x20, lock: 0x7815a9e0, res: 0x7b39e9e0
      own: 0x7d658150, sess: 0x7d658150, proc: 0x7d3d9878, prv: 0x7b39e9f0
       xga: (nil), heap: UGA

Now let’s check the same info for waiter processes (49, 52).

-- SID 49
2022-03-07 23:29:12.445*:ksq.c@10787:ksqdmc(): Enqueue Dump      (enqueue) TX-0009000F-000006F1-00000000-00000000  DID: ksqlkdid: 0001-0031-00000009

      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x16
      req: X, lock_flag: 0x30, lock: 0x7b272110, res: 0x7b39e9e0
      own: 0x7d5f6cf8, sess: 0x7d5f6cf8, proc: 0x7d3dd898, prv: 0x7b39ea00


-- SID 52
2022-03-07 23:30:33.181*:ksq.c@10787:ksqdmc(): Enqueue Dump      (enqueue) TX-0009000F-000006F1-00000000-00000000  DID: ksqlkdid: 0001-0032-00000005

      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x16
      req: X, lock_flag: 0x30, lock: 0x7b271a78, res: 0x7b39e9e0
      own: 0x7d621360, sess: 0x7d621360, proc: 0x7d3dedf8, prv: 0x7b272120

At the end of the process dumps it prints details about the Hex/heap dump of the block, block row dumps involved.

-- For Process 52 (PID 52).
Dump of memory from 0x000000009012E000 to 0x0000000090130000
09012E000 0000A206 0041DDE1 0042614F 00000000  [......A.OaB.....]
09012E010 00000000 00110001 00013036 0042614E  [........60..NaB.]
09012E020 00008000 00031E02 00000000 00130005  [................]
09012E030 000006D9 0100068B 001C0190 80008000  [................]
09012E040 00423B71 00000000 00000000 00000000  [q;B.............]
09012E050 00000000 00000000 00000000 000D0100  [................]
09012E060 002CFFFF 1EFF1F52 00001EFF 1F9A000D  [..,.R...........]
09012E070 1F8E1F94 1F821F88 1F761F7C 1F6A1F70  [........|.v.p.j.]
09012E080 1F5E1F64 1F521F58 0284024C 02F302BC  [d.^.X.R.L.......]
09012E090 0363032C 03D1039A 04470408 04BD047E  [,.c.......G.~...]
09012E0A0 053304FC 05B00571 061C05E6 06900652  [..3.q.......R...]
09012E0B0 070006C8 076C0736 07DA07A3 08480810  [....6.l.......H.]
09012E0C0 08B5087F 092108EB 098E0958 0A0A09CC  [......!.X.......]
09012E0D0 0A7E0A40 0AF30ABC 0B610B2A 0BDF0BA0  [@.~.....*.a.....]
09012E0E0 0C4F0C18 0CBE0C86 0D2D0CF6 0D9D0D66  [..O.......-.f...]
09012E0F0 0E0B0DD4 0E810E42 0EF70EB8 0F6D0F36  [....B.......6.m.]
09012E100 0FEB0FAC 10591022 10C71090 113F1106  [....".Y.......?.]
09012E110 11AD1176 121D11E5 128D1254 12FB12C4  [v.......T.......]
09012E120 13691332 13DE13A7 145C141D 14D21493  [2.i.......\.....]
09012E130 15471511 15B3157D 162715E9 1695165F  [..G.}.....'._...]
09012E140 170216CB 176F1739 17DD17A7 18491813  [....9.o.......I.]
09012E150 18BD187F 193118F3 199F1969 1A1B19DD  [......1.i.......]
09012E160 1A871A51 1AF31ABD 1B641B2C 1BD01B9A  [Q.......,.d.....]
09012E170 1C3E1C07 1CAC1C74 1D191CE3 1D851D4F  [..>.t.......O...]
09012E180 1DF91DC3 74007000 6D006900 7A006900  [.....p.t.i.m.i.z]
09012E190 72006500 69005F00 6D006E00 6D006500  [.e.r._.i.n.m.e.m]
09012E1A0 72006F00 5F007900 77006100 72006100  [.o.r.y._.a.w.a.r]
09012E1B0 27006500 27002000 61006600 73006C00  [.e.'. .'.f.a.l.s]
09012E1C0 27006500 20002900 6F006E00 73005F00  [.e.'.). .n.o._.s]
09012E1D0 62007500 74007300 62007200 70005F00  [.u.b.s.t.r.b._.p]
....
......
...........
09012FFD0 002C12C1 11C10201 0201002C 002C10C1  [..,.....,.....,.]
09012FFE0 0FC10201 0201002C 002C0EC1 0DC10201  [....,.....,.....]
09012FFF0 0201002C 002C0CC1 0BC10201 614F0600  [,.....,.......Oa]
Block header dump:  0x0041dde1
 Object id on Block? Y
 seg/obj: 0x13036  csc:  0x000000000042614e  itc: 2  flg: O  typ: 1 - DATA


block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 0b
tab 0, row 1, @0x1f94
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 0c
tab 0, row 2, @0x1f8e
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 0d
tab 0, row 3, @0x1f88
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 0e
.....
.......
col  0: [ 2]  c1 16
end_of_block_dump

At the end it prints details about the redo transport, ASM File Info, SGA Information, PDB/CDB details …

KGSKSGA info:
is the system running: yes
  session that stopped the system: (nil)
number of plans 1 classes 4
low threshold delta 0
high threshold delta 2
number of cpus[0]: 1
thresholds - low[0]: 1, high[0]: 3
dynamic threshold adjustments - misbehave_adjust[0]: 0, cpu_util_adj[0]: 2
dynamic thresholds - low[0]: 3, high[0]: 5
running cnt[0]: 1, runnable cnt[0]: 3
flags: 0x0
debug: 0x0
all pointers:
num_latches 300 current 222 chg_class 0x74e364a0 chg_state 0x74e42020
class_list 0x74dfaa88,top_plan 0x7084e9d8, plan_list 0x74e03d38
heaps - kgsk subheap 0x74e142f0, plan heap 0x7b5d7438
backgrounds in_sched (pids): 

.....
..........
...............
Redo Log Management (from krsk_dump_rlm)
  SRL min count:  0
  SRL max count:  0
  SRL min avail:  0
  SRL file size:  0
  SRL block size: 0
  SRL use FRA :   FALSE

So, if you want to dig in deep and want more proofs and details about the incident, SYSTEMSTATE dumps is a great source of all internals that were involved at the time of the problem. This isn’t limited to row locking problems, you can generate it for all sort of events, problems or issues i.e. Row cache, latch free, LC Locks, CBC are few of the common ones …

Hope It Helped!
Prashant Dixit

Advertisement

3 Responses to “Autopsy of a non responsive database using SYSTEMSTATE dumps”

  1. Saibal Ghosh said

    Excellent article Prashant! The level of detail provided is very insightful! Keep on writing such fantastic articles.

  2. The reason is, Siebel is trying to launch the Licence agreement and if the user is not a adminstrator some of the DLL’s will not be registered.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

 
%d bloggers like this: