Tales From A Lazy Fat DBA

Den of an Oracle DB Performance freak & a fan of PostgreSQL & Cassandra …. \,,/

  • Likes

    • 167,493
  • Archives

  • Categories

  • Subscribe

  • Advertisements
  • Interested in Database Performance Tuning ?

    Learn Oracle Performance Tuning from experts

  • Prashant Dixit is the FatDBA

  • Follow Tales From A Lazy Fat DBA on WordPress.com
  • My Twitter Feeds

  • Disclaimer!

    FatDBA or Oracle ‘Ant’ is an independent web-blog/site.The experiences, Test cases, views, and opinions expressed in this website are my own and does not reflect the views or opinions of my employer.

    This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content.
    Product and company names mentioned in this website may be the trademarks of their respective owners.

Posts Tagged ‘troubleshooting’

OPatch – Error occurred during initialization of VM, Could not reserve enough space for XXXXXXKB object heap

Posted by FatDBA on February 19, 2019

Hi Guys,

Disucssing a random issue what i’ve encountered few hours back, is a problem related with the new version of the OPatch which when unzipped generating a weird error and is discussed below.



[oracle@gunna:~/app/oracle/product/12.2.0/dbhome_1/OPatch/28822515]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./

Error occurred during initialization of VM
Could not reserve enough space for 39957221KB object heap


On OCS download page for OPatch, the auto version is set to 32-bit (Linux X86).
Check if the name of the downloaded file is something similar ‘p6880880_122010_LINUX.zip’. If yes, then you have downloaded the 32 bit version. Choose ‘Linux x86-64’ as the right vrsion and try again

Let’s try again.



[oracle@gunna:~/app/oracle/product/12.2.0/dbhome_1/OPatch/28822515]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 12.2.0.1.16
Copyright (c) 2018, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /home/oracle/app/oracle/product/12.2.0/dbhome_1
Central Inventory : /home/oracle/app/oraInventory
   from           : /home/oracle/app/oracle/product/12.2.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.16
OUI version       : 12.2.0.1.4
Log file location : /home/oracle/app/oracle/product/12.2.0/dbhome_1/cfgtoollogs/opatch/opatch2018-12-24_00-46-02AM_1.log

Invoking prereq "checkconflictagainstohwithdetail"

Prereq "checkConflictAgainstOHWithDetail" passed.

OPatch succeeded.


All good now!

Hope It Helps
Prashant ‘Fatdba’ Dixit

Advertisements

Posted in troubleshooting | Tagged: | Leave a Comment »

How to freeze your Oracle Database – Yes, you heard it right, for fun!

Posted by FatDBA on February 13, 2019

Hi Everyone,

I’m back after a long time, was little occupied. Now lets’s bring the forum/blog back to life after this long break and would like to post about something really interesting and is *Only* for your test or sandboxes. Okay let me quickly unwrap what i have this time with me.

Ever imagine how i can *Freeze* my database to simulate some test scenarios or to study/understand database behavior and troubleshoot ? – I think most of us does. So, here goes my first post of this year with something ill-famed 🙂

Test: I will try to lock/freeze the log writer process (LGWR) on one of my test machine. We only have one LGWR process in this database.
There are few other ways as well to do this but will discuss one of the easiest way to do this using OS commands.
Intention: Will see what exactly happens in the database during the time when system is busy with log writer freezed.
Situation: I hve generated some artificial load in order to understand the full impact of this act.
Note: THIS IS NOT AT ALL TO TEST ON YOUR PRODUCTION SYSTEMS AND IS ONLY FOR TEST SYSTEMS OR FOR FUN! – NO MALEVOLENCE

Step 1: Let’s first check details about our Log writer process on this test database.
Okay so its running fine with PID 5302



[oracle@gunna scripts]$ ps -ef|grep lgwr
oracle     5302      1  0 18:11 ?        00:00:26 ora_lgwr_gunnadb
oracle    47463  47411  0 22:43 pts/4    00:00:00 grep lgwr


Step 2: Now i will put this OS process on HOLD by using the KILL command with its argument -STOP.



[oracle@gunna scripts]$ kill -STOP 5302
[oracle@gunna scripts]$ ps -ef|grep lgwr
oracle     5302      1  0 18:11 ?        00:00:26 ora_lgwr_gunnadb
oracle    47499  47411  0 22:43 pts/4    00:00:00 grep lgwr


Step 3: Now lets see what’s going on on the database.
Below is the screenshot that i have captured during the time when the system was doing work with its LGWR process on HOLD and is a screes-fix from ORATOP for some real time monitoring.
So few of the things that we understand are:
– We are getting lots of ‘Log File Sync’ & ‘Log Buffer Space’ waits in the database and is pretty obvious due to lack if inactivity of Log Writer. This happens when server processes write data into the log buffer faster than the LGWR process can write it out. Just want you to update that Log buffer is an area in SGA where redo is temporarily stored before it can be written to disk.
‘Buffer Busy waits’ are quite visible too during the probe period due to constant DMLs operations happening in the background by user SOE.
– Lot’s of blockers coming in RED and point to SID 19 and is of LGWR process.
– The number of Average Active Sessions or AAS or system workload was getting higher.

Step 4: So during above step (3), the system was extremely slow and is almost dead and it failed to flush what’s there in the belly of the log writer and this causing this entire fiasco in the system. So let’s try to resume the process, but before that lets quickly check what is the status of this process on OS.

Let’s check the process details which are available under proc directory with process ID.



[oracle@gunna 5302]$ pwd
/proc/5302/task/5302


Okay, so the process is in mode T which means STOPPED.



[oracle@gunna 5302]$ more status
Name:   ora_lgwr_gunnad
State:  T (stopped)
Tgid:   5302
Pid:    5302




[oracle@gunna trace]$ ps aux |grep lgwr
oracle     5302  0.1  1.2 1392316 28304 ?       Ts   Nov19   0:34 ora_lgwr_gunnadb ---> T represents in HOLD/PAUSED.
oracle    75583  0.0  0.0 103376   808 pts/1    S+   01:25   0:00 grep lgwr


Next, let’s see if there is something captured by TOP utility.



[oracle@gunna 5302]$ top -p 5302
top - 00:41:39 up  6:46,  8 users,  load average: 3.69, 3.69, 2.63
Tasks:   1 total,   0 running,   1 sleeping,   1 stopped,   0 zombie   --------> 1 Stopped means one process is stopped.
Cpu(s):  0.0%us,  6.2%sy,  0.0%ni,  0.0%id, 93.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2346844k total,  2192512k used,   154332k free,     3820k buffers
Swap:  2031612k total,   281356k used,  1750256k free,   932548k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  5302 oracle    20   0 1359m  29m  26m S  0.0  1.3   0:34.20 ora_lgwr_gunnad                ----> TIME+ column was freezed and was not moving, reflects that the process was freeze.


PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a
       process.
       D    Uninterruptible sleep (usually IO)
       R    Running or runnable (on run queue)
       S    Interruptible sleep (waiting for an event to complete)
       T    Stopped, either by a job control signal or because it is being traced.
       W    paging (not valid since the 2.6.xx kernel)
       X    dead (should never be seen)
       Z    Defunct ("zombie") process, terminated but not reaped by its parent.


Now i guess enough of mess we spilled, time to resume the process.



[oracle@gunna scripts]$
[oracle@gunna scripts]$ kill -CONT 5302


Quickly the status of the process gets changed.



[oracle@gunna 5302]$ more status |grep State
State:  D (disk sleep)
[oracle@gunna 5302]$

[oracle@gunna 5302]$ more status |grep State
State:  S (sleeping)


Soon after we bring the process back to normal mode its all good and system was stable back again.
Once again, this is only for the test or play to understand performance.

Hope It Helps
Prashant ‘The Fatdba’ Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

DBMS_XPLAN and its different options/arguments available

Posted by FatDBA on May 10, 2018

Hi Mates,

I am back after a long time to write something that is very vital piece of information when you are about to start any troubleshooting task, yes that is the EXECUTION PLAN, well there are many ways to generate the CBO plans (i.e. AUTOTRACE, extended/debug traces, utlxpls.sql, V$SQL_PLAN, Few of the specialized SQL specific AWR reports like awrsqrpt.sql, STS etc.) but the most common and best way of doing thisng in this subecjt is to use DBMS_XPLAN & its function DISPLAY_CURSOR.

So, yes today’s topic is to understand what all options do we have to generate a more interactive, detailed, elaborative plans.

Okay will start it with very rudimentary (BASIC) styled plan and slowly will use rest of the arguments/options available. The plan includes the operation, options, and the object name (table, index, MV, etc)


SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0h79fq6vx4p99',0,'BASIC'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

------------------------------------
| Id  | Operation         | Name   |
------------------------------------
|   0 | SELECT STATEMENT  |        |
|   1 |  TABLE ACCESS FULL| DIXEMP |
------------------------------------

Next is the most common way of generating the plans that’s with the DISPLAY function which allows us to display the execution plan stored in the plan table. First we explain a SQL statement.
Next we use the DBMS_XPLAN.DISPLAY function to display the execution plan.


SQL> explain plan for select * from dixemp;
Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3980174453

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |    14 |   532 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DIXEMP |    14 |   532 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Next is the ‘ALLSTATS LAST’ option for FORMAT parameter. Lets see how to use it.
ALLSTATS is a shortcut for IOSTATS and MEMSTATS (e.q. BUFFER information) information and the other keyword LAST can be specified to see only the statistics for the last execution.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

--------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |     14 |00:00:00.01 |      10 |
|   1 |  TABLE ACCESS FULL| DIXEMP |      1 |     14 |     14 |00:00:00.01 |      10 |
--------------------------------------------------------------------------------------

But if you take a look at the above plan you’ll see that the plan doesn’t contain few of the vital columns or stats like COST and bytes processed (BYTES) as it doesn’t comes by default with that, you actually have to add few more predicates to get that info. That is +cost and +bytes with your FORMAT parameter.

Below is how you can get that missing info from the plan.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'0h79fq6vx4p99',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

-----------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |       |     3 (100)|     14 |00:00:00.01 |      10 |
|   1 |  TABLE ACCESS FULL| DIXEMP |      1 |     14 |   532 |     3   (0)|     14 |00:00:00.01 |      10 |
-----------------------------------------------------------------------------------------------------------

And the ALL parameter will give you the general plan but rest all of the details like Query Block Name / Object Alias, Predicate Information, column projection details.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0h79fq6vx4p99',0,'ALL'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS FULL| DIXEMP |    14 |   532 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / DIXEMP@SEL$1

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "DIXEMP"."EMPNO"[NUMBER,22], "DIXEMP"."ENAME"[VARCHAR2,10],
       "DIXEMP"."JOB"[VARCHAR2,9], "DIXEMP"."MGR"[NUMBER,22],
       "DIXEMP"."HIREDATE"[DATE,7], "DIXEMP"."SAL"[NUMBER,22],
       "DIXEMP"."COMM"[NUMBER,22], "DIXEMP"."DEPTNO"[NUMBER,22]

ALLSTATS is a shortcut for IOSTATS and MEMSTATS (e.q. BUFFER information) with your plan.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0h79fq6vx4p99',0,'ALLSTATS'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

--------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      7 |        |     98 |00:00:00.01 |      70 |
|   1 |  TABLE ACCESS FULL| DIXEMP |      7 |     14 |     98 |00:00:00.01 |      70 |
--------------------------------------------------------------------------------------

Now, if you want to view additional details of your plan, for example set of hints to reproduce the statement or OUTLINE, you can use it in your format parameter.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'0h79fq6vx4p99',format=>'ALLSTATS LAST +outline'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

--------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |     14 |00:00:00.01 |      10 |
|   1 |  TABLE ACCESS FULL| DIXEMP |      1 |     14 |     14 |00:00:00.01 |      10 |
--------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "DIXEMP"@"SEL$1")
      END_OUTLINE_DATA
  */

Adding or removing any data/stats from the plan, that’s possible using + or – signs follwed by argument.
Example: if you want t view cost and bytes information use +cost, +bytes in your plan or if you want to remove the same info in your run of dbms_xplan use -cost, -bytes.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'0h79fq6vx4p99',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

-----------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |       |     3 (100)|     14 |00:00:00.01 |      10 |
|   1 |  TABLE ACCESS FULL| DIXEMP |      1 |     14 |   532 |     3   (0)|     14 |00:00:00.01 |      10 |
-----------------------------------------------------------------------------------------------------------

You can write a mix of both as below


SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(sql_id => '0h79fq6vx4p99', cursor_child_no => 0, FORMAT => 'TYPICAL -ROWS -BYTES +COST +PARALLEL +PARTITION +IOSTATS +MEMSTATS +ALIAS +PEEKED_BINDS +OUTLINE +PREDICATE -PROJECTION +REMOTE +NOTE'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  0h79fq6vx4p99, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from dixemp

Plan hash value: 3980174453

--------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      7 |        |     3 (100)|          |     98 |00:00:00.01 |      70 |
|   1 |  TABLE ACCESS FULL| DIXEMP |      7 |     14 |     3   (0)| 00:00:01 |     98 |00:00:00.01 |      70 |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / DIXEMP@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "DIXEMP"@"SEL$1")
      END_OUTLINE_DATA
  */

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , | Leave a Comment »

How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! – Part 2

Posted by FatDBA on January 13, 2018

Repeat: I think I’ve mistakenly deleted the post, so re-posting the same.

Hey Folks,

Back with second edition of my previous post on troubleshooting performance issues using Linux in-built profiler named ‘perf’. Many of the users requested to write about few use cases situations on when and how to use this tool.

So, this post is all about discussing some cases that i have faced while working on few performance tuning projects and few of them are from the tests that I’ve performed on my Lab systems.

Okay coming back to the reason on why we need this, perf i mean ?
The answer is sometimes Oracle wait interface is not enough and you need to dig deeper inside the system to understand the problem. That point you have to use some third party, in-built dedicated tools for performance investigation. There perf might help you to understand what your resource intensive query is doing on OS layers.
So, using the tool you can monitor your process on what’s its doing!

And in case if you are using perf on Virtualised system, you might get error “perf.data file has no samples“.
In order to fix it try with “-e cpu-clock” arguments to collect the sample data and then interpret the file.

Okay so one fine day, you saw a spike in server’s CPU consumption using TOP, Oratop, OEM, Scheduled scripts or by any possible monitoring techniques and you have identified the process and its other attributes.
Using the PID you have reached the SID, SQL_ID, SQL_TEXT and other statistics. So now you might want to analyze and understand the oracle’s execution.

 
Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  DIXIT (1:53089)
 SQL ID              :  71aa5ju8pwtf2
 SQL Execution ID    :  16777216
 Execution Started   :  01/09/2018 06:10:32
 First Refresh Time  :  01/09/2018 06:10:32
 Last Refresh Time   :  01/09/2018 06:13:03
 Duration            :  152s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus@dixitlab.localdomain (TNS V1-V3)

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|     163 |     150 |      115 |     5M | 648K |   5GB |
========================================================

SQL Plan Monitoring Details (Plan Hash Value=12102956)
===================================================================================================================================================================
| Id   |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |       Activity Detail       | Progress |
|      |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |         (# samples)         |          |
===================================================================================================================================================================
| -> 0 | SELECT STATEMENT     |      |         |       |       150 |     +2 |     1 |        0 |      |       |          |                             |          |
| -> 1 |   SORT AGGREGATE     |      |       1 |       |       150 |     +2 |     1 |        0 |      |       |          |                             |          |
| -> 2 |    TABLE ACCESS FULL | T294 |   96523 | 56397 |       151 |     +1 |     1 |      306 | 648K |   5GB |    99.34 | Cpu (69)                    |      31% |
|      |                      |      |         |       |           |        |       |          |      |       |          | db file sequential read (6) |          |
|      |                      |      |         |       |           |        |       |          |      |       |          | direct path read (126)      |          |
===================================================================================================================================================================

Above results points to high CPU Time (150 Seconds). Now to look more deeper in to things and to understand what exactly the session is doing.

[root@dixitlab ~]#  perf top -e cpu-clock -p 3505 

   PerfTop:     349 irqs/sec  kernel:42.7%  exact:  0.0% [1000Hz cpu-clock],  (target_pid: 3505)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ __________________________________________________

             1981.00 46.2% _raw_spin_unlock_irqrestore [kernel.kallsyms]
              906.00 21.1% _intel_fast_memcmp          /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
              270.00  6.3% kole_simple_string_match    /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 
               96.00  2.2% copy_user_generic_unrolled  [kernel.kallsyms]
               96.00  2.2% kcbgtcr                     /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               54.00  1.3% __intel_new_memset          /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 
               37.00  0.9% __intel_ssse3_rep_memcpy    /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               30.00  0.7% kghfrf                      /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               28.00  0.7% kghalf                      /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               27.00  0.6% kcbldio                     /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               24.00  0.6% kdxbrs1                     /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               24.00  0.6% kspgvc                      /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               21.00  0.5% kksMapCursor                /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle



Results shows system was mostly busy processing kernel calls (kernel.kallsyms) and its function ‘_raw_spin_unlock_irqrestore‘ with 46% of its time spend using CPU cycles. It’s coming with huge overhead and with large samples and in general irq_restore shows up because re-enabling interrupts is costly, but this is not the real CPU consumption but is how the tool, Interrupt and system works in few of the Virtual systems – I was testing it on my Lab VM.

Here i would like to thank Tanel Poder for reviewing the document and highlighting the issue with the perf top command when executed inside a VM environment.

…… perf top that concluded _raw_spin_unlock_irqrestore taking lots of CPU time. I’ve been through this myself in past – and this is likely not actual CPU usage but rather how perf, OS and interrupts work in some VMs (basically measurement bias/errors by these tools). I assume that you tested in a VM? ……..

Next in the list is function/object ‘_intel_fast_memcmp‘ called by oracle with 21.1% which i believe is for fast memory compilations.

So the conclusion of the analysis:
Most of the CPU by process was spend processing kernel calls and for for fast memory compilations.

Purpose Revisit: This gives you a glimpse of what happens with the process calls and monitors a cpu-bound, database process is pass its time.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

Using Flame Graphs to analyze performance & workloads: Part 1

Posted by FatDBA on January 13, 2018

Hi Guys,

Now after my last two posts on Perf tool/profiler, i guess this the right time to move ahead with interpretation on stack trace data.
Back with another post, this time as promised about ‘Flame Graphs’!
So, would like to first start with the very standard question – Why, what this Flame Graphs are ?

Flame Graphs is tool developed by Kernel/System Performance maestro Brendan Gregg to project or visualize strack traces for both user/all processes and kernel-level stacks for CPU, Off-CPU etc. You can use it on stacks/traces collected by various profiler i.e. perf, Dtrace, SystemTap etc.

In this post we will be using PERF for statistics collection and project them using Flame Graphs and will understand the complexity and the code paths of many Oracle’s internal functions. This being the very first edition for this subject we will start with basics and will first discuss about CPU graphs and try to understand why and where CPUs are busy using stack traces and identify hot code-paths.This can be a really helpful tool for fast identification of performance problems where conventional oracle performance tools failed.

Okay, let’s jump on how you will do it. First you have to download the toolkit (Download Link)
This tool generates results in SVG format using below three steps.

  • Stats Collection using any of the profiler e.g. Perf, Dtrace, Systemtap etc.
  • Compress or Fold your stats
  • Finally use the core (flamegraph.pl) script to generate the SVG file.

Okay, so now after enough description on tool, let’s start out first case – “CPU Flame Graphs for an expensive SQL Statement”.

Scenario: We have a resource intensive query “COUNT on one of the big table “ running from session with SPID 19280
So, we will be collecting process strack traces while the SQL statement was in run.

SQL> explain plan for select count(*) from dwh.**********_*** where ****TIMESTAMP > '23-JUN-16' and ******KEY > 789999999;

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1112565023

------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |    14 |  2604K  (1)| 08:40:53 |
|   1 |  SORT AGGREGATE    |                     |     1 |    14 |            |          |
|*  2 |   TABLE ACCESS FULL| **************_**** |    39M|   524M|  2604K  (1)| 08:40:53 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("*****KEY">789999999 AND "*****TIMESTAMP">'23-JUN-16')

14 rows selected.

SQL>

Now collecting perf statistics for the process at low-frequency of 997 Hertz.

[root@dselimw5862 perftest]# perf record -a -g -F997 -p 19280 

perf record: Woken up 3926 times to write data ]
[ perf record: Captured and wrote 981.809 MB perf.data (9751081 samples) ]

This will create perf data file under the same directory.

[root@dselimw5862 perftest]# ls -ltrh
-rw-------. 1 root root 982M Jan 12 14:58 perf.data

Next, collapse the stacks using below

[root@dselimw5862 perftest]# /root/perftest/FlameGraph-master/stackcollapse-perf.pl 

Finally, time to render the SVG file.

[root@dselimw5862 perftest]# /root/perftest/FlameGraph-master/flamegraph.pl --title "Flame Graph: Selective data search on dwh.W6AUDITHISTORY_FULL Table" > ImageAuditHistory.svg 

Let’s open the SVG file in browser and see what’s it’s got!

Important Note: The x-axis reflects stack profile population, and the y-axis represents the stack depth

Here you will see many of the familiar Oracle specific functions/processes opiodr, opitsk, opiino, opidrv etc.
Okay so, If you take a look at the map, you will see few of the Linux processes spawned first which called Oracle Internal functions next which moved to the query execution (i.e SELECT FETCH -> GROUP BY SORT -> TABLE ACCESS) and then to other functions. For example you can see the execution time during the sampling was spent in kdstf0100101000km functions which was called by kdsttgr which is Kernel Data Scan Table Get Row.

So, now question ‘Which function is on CPU the most ??
The top edge shows who is on CPU directly, copy_user_generic_unrolled function which is used when there is no optimization on CPU level.
Remember Flame Graphs follows ancestry, means copy_user_generic_unrolled was called by __pread_nocancel and __pread_nocancel was called by ksfd_skgfqio and which called … and then it points to …

Next look little above from the bottom of the graph with three functions __libc_start_main, __pread_nocancel and kaf4reasrp1km. If we visually compare them it is clear that __pread_nocancel function was running more often than other two. We see one of the linux function __pread_nocancel() was sampled a lot during the query execution and found repeated 2 times in the graph. This is used by libc when a cancellation point (a POSIX threads concept) is not permitted.

In Short: So, the Flame Graphs might help you when you have a problem in hand ‘My Production Database had poor performance”.
It was a heavy CPU consumer, so i used to CPU profiler to see WHY ? – But do you thing reading those lengthy, messy, long, raw traces that easy. So, here at this point there is a need of some data representation tool to capture the problems and sources and to visualize them. Here comes Flame Graphs to picture, The hero! It also gives you flexibility to move your cursor to any of the object to understand how much percent the object was present during the sample time.

Well, in later editions for this subject i will try to cover more scenarios. The sole purpose of this post is to give audience an idea, a gist of the topic.

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , | Leave a Comment »

How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! – Part 1

Posted by FatDBA on January 8, 2018

Hi Pals,

Some time back i was part of a performance task-force for one of our customers in Czech where i did a complete full 360 degrees auditing and troubleshooting of their full database stack. But you know most of the times it not just the software or the database but the real problem lies underneath the OS layer.

Okay, so back to the discussion – So during the assignment at one point where i had to collect some statistics for the OS to prove my point that ‘Problem is not with the Database‘ but with something else, i had used few of the tools like DTrace, Linux Trace Toolkit or LTT, Systemtap and one of the profiler which i hardly used before ‘Perf‘.

Yes, so today’s topic of discussion is to give you some basic idea (Will cover the deep analysis using the tool in future) about the ‘Perf’ profiler which helps to dissect and understand hardware events and measures performance.

Okay, let me take a real time scenario where a sudden surge in CPU in a production environment raised questions.
During the initial probing of issues we have found that that about 25% of the CPU is consumed by SQLPLUS command with PID 6720, lets see what and why its doing that ….

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6720 oracle    20   0  113m  14m 9.8m R 25.8  0.5   0:20.32 sqlplus   as sysdba

Lets first Sample on-CPU functions for the process ID on 99 Hertz using below command for PID 6720.

perf record -F 99 -p 6720 

This will produce a raw file named ‘perf.data’ under the same directory from where you called it.
Let it run for some time. You can cancel the execution by pressing CTRL+C.

[root@dixitlab perf]# ls -ltrh
total 12K
-rw-------. 1 root root 12K Jan  8 01:14 perf.data

Now lets see what’s there inside the raw file —- lets Dump raw contents from perf.data as hex for debugging.
For this we will use perf script with flag D, it will automatically considers the perf.data file present under the directory to show charts.


[root@dixitlab perf]# perf script -D

0x110 [0x48]: event: 1
.
. ... raw event: size 72 bytes
.  0000:  01 00 00 00 01 00 48 00 ff ff ff ff 00 00 00 00  ......H.........
.  0010:  00 00 00 00 00 00 00 00 ff ff ff 9f ff ff ff ff  ................
.  0020:  00 00 00 81 ff ff ff ff 5b 6b 65 72 6e 65 6c 2e  ........[kernel.
.  0030:  6b 61 6c 6c 73 79 6d 73 5d 5f 74 65 78 74 00 00  kallsyms]_text..
.  0040:  00 00 00 00 00 00 00 00                          ........
.
0x110 [0x48]: PERF_RECORD_MMAP -1/0: [0(0xffffffff9fffffff) @ 0xffffffff81000000]: [kernel.kallsyms]_text

0x158 [0x78]: event: 1
.
. ... raw event: size 120 bytes
.  0000:  01 00 00 00 01 00 78 00 ff ff ff ff 00 00 00 00  ......x.........
.  0010:  00 00 00 a0 ff ff ff ff ff bf 01 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64  ......../lib/mod
.  0030:  75 6c 65 73 2f 32 2e 36 2e 33 39 2d 34 30 30 2e  ules/2.6.39-400.
.  0040:  32 39 37 2e 33 2e 65 6c 36 75 65 6b 2e 78 38 36  297.3.el6uek.x86
.  0050:  5f 36 34 2f 6b 65 72 6e 65 6c 2f 64 72 69 76 65  _64/kernel/drive
.  0060:  72 73 2f 6d 64 2f 64 6d 2d 6d 6f 64 2e 6b 6f 00  rs/md/dm-mod.ko.
.  0070:  00 00 00 00 00 00 00 00                          ........
.


0x2bf8 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f6062943000(0x4c000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libons.so

0x2c60 [0x68]: event: 1
.
. ... raw event: size 104 bytes
.  0000:  01 00 00 00 02 00 68 00 23 11 00 00 23 11 00 00  ......h.#...#...
.  0010:  00 80 ba 62 60 7f 00 00 00 90 4d 00 00 00 00 00  ...b`.....M.....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 6e 6e 7a 31 32 2e 73 6f 00  lib/libnnz12.so.
.  0060:  00 00 00 00 00 00 00 00                          ........
.
0x2c60 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f6062ba8000(0x4d9000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libnnz12.so

0x2cc8 [0x68]: event: 1
.
. ... raw event: size 104 bytes
.  0000:  01 00 00 00 02 00 68 00 23 11 00 00 23 11 00 00  ......h.#...#...
.  0010:  00 10 2f 63 60 7f 00 00 00 c0 22 00 00 00 00 00  ../c`.....".....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 69 70 63 31 2e 73 6f 00 00  lib/libipc1.so..
.  0060:  00 00 00 00 00 00 00 00                          ........
.
0x2cc8 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f60632f1000(0x22c000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libipc1.so

0x2d30 [0x68]: event: 1
.
. ... raw event: size 104 bytes
.  0000:  01 00 00 00 02 00 68 00 23 11 00 00 23 11 00 00  ......h.#...#...
.  0010:  00 50 72 63 60 7f 00 00 00 20 07 00 00 00 00 00  .Prc`.... ......
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 6d 71 6c 31 2e 73 6f 00 00  lib/libmql1.so..
.  0060:  00 00 00 00 00 00 00 00                          ........
.
0x2d30 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f6063725000(0x72000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmql1.so

0x2d98 [0x78]: event: 1
.
. ... raw event: size 120 bytes
.  0000:  01 00 00 00 02 00 78 00 23 11 00 00 23 11 00 00  ......x.#...#...
.  0010:  00 c0 99 63 60 7f 00 00 00 90 3a 00 00 00 00 00  ...c`.....:.....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 63 6c 6e 74 73 68 63 6f 72  lib/libclntshcor
.  0060:  65 2e 73 6f 2e 31 32 2e 31 00 00 00 00 00 00 00  e.so.12.1.......
.  0070:  00 00 00 00 00 00 00 00                          ........
.
0x2d98 [0x78]: PERF_RECORD_MMAP 4387/4387: [0x7f606399c000(0x3a9000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libclntshcore.so.12.1

0x2e10 [0x70]: event: 1
.
. ... raw event: size 112 bytes
.  0000:  01 00 00 00 02 00 70 00 23 11 00 00 23 11 00 00  ......p.#...#...
.  0010:  00 a0 f6 63 60 7f 00 00 00 30 65 03 00 00 00 00  ...c`....0e.....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 63 6c 6e 74 73 68 2e 73 6f  lib/libclntsh.so
.  0060:  2e 31 32 2e 31 00 32 2e 00 00 00 00 00 00 00 00  .12.1.2.........
.
0x2e10 [0x70]: PERF_RECORD_MMAP 4387/4387: [0x7f6063f6a000(0x3653000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libclntsh.so.12.1

0x2e80 [0x70]: event: 1
.
. ... raw event: size 112 bytes
.  0000:  01 00 00 00 02 00 70 00 23 11 00 00 23 11 00 00  ......p.#...#...
.  0010:  00 20 a1 67 60 7f 00 00 00 40 0e 00 00 00 00 00  . .g`....@......
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 73 71 6c 70 6c 75 73 2e 73  lib/libsqlplus.s
.  0060:  6f 00 32 2e 31 00 32 2e 00 00 00 00 00 00 00 00  o.2.1.2.........
.
0x2e80 [0x70]: PERF_RECORD_MMAP 4387/4387: [0x7f6067a12000(0xe4000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libsqlplus.so

0x2ef0 [0x38]: event: 1
.
. ... raw event: size 56 bytes
.  0000:  01 00 00 00 02 00 38 00 23 11 00 00 23 11 00 00  ......8.#...#...
.  0010:  00 90 8f 09 ff 7f 00 00 00 10 00 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00 5b 76 64 73 6f 5d 00 70  ........[vdso].p
.  0030:  00 00 00 00 00 00 00 00                          ........
.

Now lets get back to the point where we had the process (PID: 6720) consuming highest CPU resources.

now profiling CPU’s with sample rate of 99 Hertz/second, -a for all CPU’s, -g for stack traces, sleep of 60 seconds for duration of run.

[root@dixitlab perf]# perf record -F 99 -a -g -- sleep 60 
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16948 samples) ]

Now as i discussed above, it will generate a new perf.data under the directory.

[root@dixitlab perf]# ls -ltrh
total 768K
-rw-------. 1 root root 379K Jan  8 01:30 perf.data.old
-rw-------. 1 root root 386K Jan  8 01:31 perf.data

Next, we will interpret results from the newly created ‘perf.data’ file.
I have used ‘perf report -n –stdio’, you can simply use ‘perf report’ too.


[root@dixitlab perf]#
[root@dixitlab perf]# perf report -n --stdio 
# Events: 2K cpu-clock
#
# Overhead  Samples            Command          Shared Object                                    Symbol
# ........ ..........  ...............  .....................  ........................................
#
    50.05%       1001             sshd  [e1000]                [k] e1000_phy_read_status
                       |
                       --- e1000_phy_read_status
                          |
                          |--99.60%-- dev_hard_start_xmit
                          |          sch_direct_xmit
                          |          dev_queue_xmit
                          |          ip_finish_output
                          |          ip_output
                          |          ip_local_out
                          |          ip_queue_xmit
                          |          tcp_transmit_skb
                          |          tcp_write_xmit
                          |          __tcp_push_pending_frames
                          |          tcp_push
                          |          tcp_sendmsg
                          |          inet_sendmsg
                          |          sock_aio_write
                          |          do_sync_write
                          |          vfs_write
                          |          sys_write
                          |          system_call_fastpath
                          |          __write_nocancel
                           --0.40%-- [...]

     8.35%        167          swapper  [kernel.kallsyms]      [k] native_safe_halt
                    |
                    --- native_safe_halt
                        default_idle
                        cpu_idle
                        rest_init
                        start_kernel
                        x86_64_start_reservations
                        x86_64_start_kernel

     4.45%         89          sqlplus  [kernel.kallsyms]      [k] finish_task_switch
                    |
                    --- finish_task_switch
                        __schedule
                       |
                       |--87.64%-- __cond_resched
                       |          _cond_resched
                       |          mutex_lock
                       |          |
                       |          |--91.03%-- process_output_block
                       |          |          n_tty_write
                       |          |          tty_write
                       |          |          vfs_write
                       |          |          sys_write
                       |          |          system_call_fastpath
                       |          |          __write_nocancel
                       |          |
                       |           --8.97%-- process_output
                       |                     n_tty_write
                       |                     tty_write
                       |                     vfs_write
                       |                     sys_write
                       |                     system_call_fastpath
                       |                     __write_nocancel
                       |
                        --12.36%-- schedule
                                  sysret_careful
                                  |
                                  |--81.82%-- __write_nocancel
                                  |
                                   --18.18%-- __write_nocancel


.................
...........................
..................................

Note:
You can further sort results based on cpu, shared objects etc.
perf report –sort=cpu
perf report –sort=dso

Above results reflects the division. While taking a look at the tree, it shows a combination or a club of few OS Layer components (Starting from dev_hard_start_xmit) was sampled of time 50.05*99.60 = 49.84%
with parent sshd command and shared object [e1000] and is a physical read operation.

Rest of all might seem gibberish but needs time and patience to understand. I will try to cover the profiler in more detail with more use case scenarios in future. Till the time you can check the WIKI page of the tool.

Note: In case if you are using perf on VMWare system, you might get error “perf.data file has no samples“.
In order to fix it try using the ‘perf record’ option with -e cpu-clock arguments to collect the sample data and then use ‘perf report’ to interpret the file.

example:
perf record -F 99 -a -g -e cpu-clock — sleep 20


Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , | 3 Comments »

DB Upgrade Failed! :( – Have to start it from the scratch, Really ??

Posted by FatDBA on December 4, 2017

Hi Mates,

Lets recall the list of per-requisites that we have to perform before doing a database upgrade!
Few of them i quickly recall are – Full RMAN Backup, Creating Guaranteed Restore points etc.

But are these options allows us to resume any failed upgrade activity ? — NO

Let’s go back to time before Oracle 12c (Ex: 10g, 11g etc.)

Resuming of failed upgrade was not possible and you have to recall the catupgrd.sql and open the database in UPGRADE mode but this will only start it from the beginning and won’t resume from the point it got failed last time. So same amount of time we be consumed once again for the upgrade

SQL> startup upgrade
SQL> spool /home/dixit/upgradedir/catupgrdscriptout.log
SQL> @?/rdbms/admin/catupgrd.sql

Now lets discuss about 12c, starting from 12c Release 1 we have the all new parallel upgrade utility (catctl.pl). For this you still have to open your database in UPGRADE mode and will be controlled by this new perl script.
If talk about restarting of failed upgrade, here we’ve got a small relief as you will be able to restart the upgrade from the phase where it was crashed/stopped using -P option.

Example:

Serial Phase #:56 [UPGR] Files:1 Time: 3s
******************* Migration ******************
Restart Phase #:57 Files: 1 Time: 0s
Serial Phase #:58 Files: 1 Time: 102s
Restart Phase #:59 Files: 1 Time: 1s
Serial Phase #:60 Files: 1 Time: 221s
Restart Phase #:61 Files: 1 Time: 0s
Serial Phase #:62 Files: 1 Time: 25s
Restart Phase #:63 Files: 1 Time: 0s
Serial Phase #:64 Files: 1 Time: 2s
Serial Phase #:65 Files: 1 Calling sqlpatch with LD_LIBRARY_PATH=/u01/app/oracle/product/12.1.0/db_1/lib; export LD_LIBRARY_PATH;/u01/app/oracle/product/12.1.0/db_1/perl/bin/perl -I /u01/app/oracle/product/12.1.0/db_1/rdbms/admin -I /u01/app/oracle/product/12.1.0/db_1/rdbms/admin/../../sqlpatch /u01/app/oracle/product/12.1.0/db_1/rdbms/admin/../../sqlpatch/sqlpatch.pl -verbose -upgrade_mode_only > catupgrd_datapatch_upgrade.log 2> catupgrd_datapatch_upgrade.err FAILED

As we see the upgrade failed at phase number 65, so we can restart the upgrade mentioning the phase number (-P) to resume it from that phase and upgrade process won’t repeat steps happened successfully.

$ORACLE_HOME/perl/bin/perl catctl.pl -p 65 -l /home/dixit/upgradedir/catupgrd.sql

******************* Migration ******************
Serial Phase #:65 Files: 1 Time: 28s
Serial Phase #:66 Files: 1 Time: 3921s
Serial Phase #:68 Files: 1 Time: 0s
Serial Phase #:69 Files: 1 Time: 57s
Serial Phase #:70 Files: 1 Time: 1027s
Serial Phase #:71 Files: 1 Time: 3s
Serial Phase #:72 Files: 1 Time: 0s
Serial Phase #:73 Files: 1 Time: 18s
Grand Total Time: 6459s
LOG FILES: (catupgrd*.log)
Upgrade Summary Report Located in:
/u01/app/oracle/product/12.1.0/db_1/cfgtoollogs/JANI/upgrade/upg_summary.log
Grand Total Upgrade Time: [0d:0h:40m:39s]

Now, lets discuss the main objective of writing this new blog entry.
Starting from Oracle 12c Release 2 we have got more control over upgrade activities as with this latest release you will be able to resume the upgrade from where it got stopped last time. Oracle introduced a new flag -R with catctl.pl to achieve this, so don’t worry about the upgrade phases 🙂

$ORACLE_HOME/perl/bin/perl catctl.pl -R -l /home/dixit/upgradedir/catupgrd.sql

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , , | Leave a Comment »

Optimizer making wrong decisions. Histogram is that you ?

Posted by FatDBA on December 4, 2017

Hey Mates,

Would discuss a scenario that i faced some time back with one of the customer where one of the query (Part of Monthly application maintenance) started taking huge amount of time to complete.
During the analysis I’ve found it doing some extremely expensive Full Table Scans on a fat table when it shouldn’t. Let me elaborate what was the issue.

After reading the execution plan and the extended traces of Optimizer (10053) to understand the decisions made by the CBO.

Traces gave me a hint that the optimizer was able to accurately estimate the cardinality for all of the predicates as long as string is 32 bytes or less. But at one of the operation step in execution plan the estimations were wrong and crooked. I saw the CBO decided to ‘Return each of the row’ irrespective of strings and it was actually happening because the histogram contains only the first 32 characters of the string data.

So the histogram was considering only first 32 characters of the column value and the optimizer this particular value i.e “siebelm01-vip@2002:101:11f::101:1” is happening 5776 times in the table. So it ignored the index and used full table scan. Oracle will only consider the first 32 characters in the column in generating histograms and in such cases what we faced as a aftermath, the optimizer will make wrong decisions.

So as a immediate solution to the problem I’ve deleted the histogram on this specific column and this plan fixed this query plan change issue.

BEGIN
dbms_stats.delete_column_stats(ownname=>'SIEBELADMIN', tabname=>'TX2121', colname=>'C131HJA12', col_stat_type=>'HISTOGRAM');
END;
/

Other fixes:
– In case of longer strings (32 or more) in case of text search use Oracle Text or CONTAINS operator in SQL logic.

Hope That Helps!
Prashant Dixit

Posted in Advanced | Tagged: , , | Leave a Comment »

Why my ASM Command Line (ASMCMD) is so slow, How to make ASMCMD run faster ?

Posted by FatDBA on November 1, 2017

ASMCMD is a command-line utility that you can use to easily view and manipulate files and directories within Automatic Storage Management (ASM) disk groups. It can list the contents of disk groups, perform searches, create and remove directories and aliases, display space utilization, and more.

But some of the times i have noticed some errors or slowness in command executions with ASMCMD and i believe you guys have too faced the same in the past. And the problem with ASMCMD errors are that they are not much detailed and are obscure which makes the troubleshooting more complicated and direction less.

There are few of the methods or the ways that i follow to handle performance issues with the asmcmd command line are given below.

1. Use ORADEBUG
What happens when you connect with ASMCMD ?
It actually connects with the ASM instance with SYSASM privilege and the same moment a background local process spawns with name BEQ.
Now once you recognize the process using ps -ef commands you can bind it to the ORADEBUG with errostack flag.

2. Truss or STRACE of ASMCMD and its processes.

example:

$ strace -aeft -o /dixit/labtest/asmcmdtrbsst.log asmcmd
ASMCMD>

3. Set the DBI_TRACE for ASMCMD perl tracing
Asmcmd is a wrapper for asmcmdcore script which is a shell script that starts a Perl program. If you are a Perl programmer, you can easily extend this script to add additional commands and security checks. We can use the DBI_TRACE argument to collect more diagnostic information on asm command line.

$ export DBI_TRACE=1
ASMCMD>

Hope That Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

CKPT process blocking table gather stats session intermittently … Why ?

Posted by FatDBA on November 1, 2017

Hi Folks,
Today i would like to share one of the experience that we had while working in one of the production system with a customer with a weird situation where the Gather stats session getting intermittently blocked by CKPT database background process in database and sometimes stays as it is for more than 30 mins.

We were getting the “enq: RO – fast object reuse” wait contention when gathering schema/table statistics in parallel using DBMS_STATS package with DEGREE>1

During the analysis i’ve generated the System State dump and saw a clear blocking situation on object Enq RO-00010059-00000001 .

Snippet from SS Dump.

Resource Holder State
Enq RO-00010059-00000001 14: waiting for ‘rdbms ipc message’
Enq RO-00010059-00000001 89: 89: is waiting for 14: 89:

Workaround for the problem is either of the two solutions
– We can try flush the Buffer Cache.
Though flushing the buffer cache causes dirty blocks to be written to disk and will have some performance impact.
– Setting the parameter “_db_fast_obj_truncate” to FALSE.
This will revert back to 9i way of invalidating buffers in buffer cache.

Hope That Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

 
%d bloggers like this: