Tales From A Lazy Fat DBA

Fan of Oracle DB & Performance, PostgreSQL, Cassandra & much more … \,,/

Disclaimer!

    FatDBA or Oracle ‘Ant’ is an independent web blog/site. The experiences, test cases, views, and opinions expressed on this website are my own and do not reflect the views or opinions of my employer.

    This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content.
    Product and company names mentioned in this website may be the trademarks of their respective owners.

Posts Tagged ‘Tuning’

Quarantine a SQL? This can save you from a system performance pandemic …

Posted by FatDBA on July 6, 2021

Hi Guys,

Last weekend I was asked to examine a performance problem where the customer was observing sporadic resource spikes on CPU and physical I/O, and had no idea how to handle the situation. They had already identified the problematic SQL and asked me to take a look. While doing the initial investigation on the system, I found that the resource manager was enabled and was killing the ill-behaving SQL every time it tried to breach the limits set on CPU and other resources.

I spent some time understanding the query’s behavior and performance, and found it to be a scenario of a SQL plan (PHV) flip, where the query optimizer toggles between a good and a bad plan. This being a time-sensitive issue, the customer needed a quick fix before I could identify the reason behind the PHV flip, so we had to think of a technique to stop the SQL from executing with the bad PHVs. This was an Oracle EE 19c system running on Exadata, so I immediately proposed a solution using the new ‘SQL Quarantine’ feature, which helps prevent reuse of an execution plan that was terminated by the resource manager due to resource limits.

Let me show you how to do that. This is just one use case: you can quarantine a SQL for a single execution plan (PHV), for all of its execution plans, or even by its SQL text. Here I will demonstrate how to quarantine a SQL and all of its execution plans.

Note: This feature is currently available only with Enterprise Edition on Engineered Systems, like Exadata and Exadata Cloud Service, but for this demo I will be enabling the ‘_exadata_feature_on‘ parameter on my sandbox setup.

** Please don’t touch this parameter in your production environments if you’re not running Oracle EE on top of Exadata or Exadata Cloud Service.

[oracle@canttowinsec ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Jul 3 23:33:41 2021
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


SQL>
SQL> @hidden
Enter value for param: exadata_feature_on
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%exadata_feature_on%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_exadata_feature_on                           FALSE                     FALSE                     Exadata Feature On

SQL>
SQL>
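
The parameter is still FALSE above. The transcript doesn’t show the change itself, so here is a minimal sketch of the step that has to happen before the bounce (sandbox only; it is a static underscore parameter, so the double quotes and scope=spfile are required):

SQL> alter system set "_exadata_feature_on" = TRUE scope=spfile;

System altered.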


SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>


SQL> startup
ORACLE instance started.

Total System Global Area  922745216 bytes
Fixed Size                  8903040 bytes
Variable Size             843055104 bytes
Database Buffers           67108864 bytes
Redo Buffers                3678208 bytes
Database mounted.
Database opened.
SQL>
SQL>
SQL>



SQL>
SQL> @hidden
Enter value for param: exadata_feature_on
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%exadata_feature_on%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_exadata_feature_on                           TRUE                      TRUE                      Exadata Feature On

SQL>




-- For this demo I have created one table with 1000000 rows. 

SQL> select count(*) from bigtab;

  COUNT(*)
----------
   1000000

Next, check the SQL_ID, PHV, and text of the SQL for which you want to set up the QUARANTINE feature; we will then create the quarantine configuration for it. You can set it up for a SQL ID, a SQL ID + PHV, or a SQL text.

SQL> select sql_id, plan_hash_value, sql_text from v$sqlarea where sql_text like '%select * from bigtab%';

SQL_ID        PLAN_HASH_VALUE
------------- ---------------
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6fwqzurbc8y7k       441133017
select * from bigtab

SQL>


-- You can create a quarantine configuration by SQL_ID (optionally pinned to one plan):
quarantine_config := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(SQL_ID => '&sql_id');
quarantine_config := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(SQL_ID => '&sql_id', PLAN_HASH_VALUE => '&plan_hash_value');

-- Or you can create it by SQL text:
quarantine_config := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_TEXT(SQL_TEXT => to_clob('select count(*) from emp'));



-- This command creates a quarantine configuration for SQL ID '6fwqzurbc8y7k' and all its execution plans.
SQL>
SQL> DECLARE
quarantine_sql VARCHAR2(30);
BEGIN
quarantine_sql :=   DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(SQL_ID => '6fwqzurbc8y7k');
END;
/  

PL/SQL procedure successfully completed.

SQL>



-- Check if quarantine config/profile is created.

SQL> select NAME, SQL_TEXT, ELAPSED_TIME, cpu_time,CREATED, ENABLED from dba_sql_quarantine;


NAME                           SQL_TEXT                       ELAPSED_TIME         CPU_TIME                       CREATED                        ENA
------------------------------ ------------------------------ -------------------- ------------------------------ ------------------------------ ---
SQL_QUARANTINE_2akx3bq44wvc5   select * from bigtab           ALWAYS               ALWAYS                         04-JUL-21 12.09.25.567422 AM   YES


Next, we will use the DBMS_SQLQ.ALTER_QUARANTINE procedure to put limits on resource usage, i.e. elapsed time, CPU time, I/O in megabytes, number of physical I/O (PIO) requests, and number of logical I/O (LIO) requests.

SQL>
SQL> BEGIN
    DBMS_SQLQ.ALTER_QUARANTINE(
       QUARANTINE_NAME => 'SQL_QUARANTINE_2akx3bq44wvc5',
       PARAMETER_NAME  => 'CPU_TIME',
       PARAMETER_VALUE => '5');

    DBMS_SQLQ.ALTER_QUARANTINE(
       QUARANTINE_NAME => 'SQL_QUARANTINE_2akx3bq44wvc5',
       PARAMETER_NAME  => 'ELAPSED_TIME',
       PARAMETER_VALUE => '10');
END;
/

PL/SQL procedure successfully completed.

SQL>


-- Let's verify the limits that we have set against this SQL Quarantine profile.

SQL> select NAME, SQL_TEXT, ELAPSED_TIME, cpu_time, CREATED, ENABLED from dba_sql_quarantine;

NAME                           SQL_TEXT                       ELAPSED_TIME         CPU_TIME                       CREATED                        ENA
------------------------------ ------------------------------ -------------------- ------------------------------ ------------------------------ ---
SQL_QUARANTINE_2akx3bq44wvc5   select * from bigtab           10                   5                              04-JUL-21 12.12.41.918609 AM   YES

Next, we will set up the RESOURCE MANAGER.

-- Create a pending area.
SQL> begin
 dbms_resource_manager.create_pending_area();
 end;
 /

PL/SQL procedure successfully completed.

-- Create a consumer group with name 'DIXITPOC_GROUP'.
SQL> begin
 dbms_resource_manager.create_consumer_group(CONSUMER_GROUP=>'DIXITPOC_GROUP',COMMENT=>'To test SQL quarantine feature for one SQL');
 end;
 /

PL/SQL procedure successfully completed.


-- Now bind the consumer group to the user 'DIXIT'; this could be any user through which you will execute the query.
SQL> begin
 dbms_resource_manager.set_consumer_group_mapping(attribute => 'ORACLE_USER',value => 'DIXIT',consumer_group =>'DIXITPOC_GROUP' );
 end;
 /

PL/SQL procedure successfully completed.


-- Create a resource plan with name 'POC_FOR_QUARANTINE'.
SQL> begin
dbms_resource_manager.create_plan(plan => 'POC_FOR_QUARANTINE',comment => 'Kill SQL and its other execution plans when exceeding CPU and Elapsed seconds');
end;
/

PL/SQL procedure successfully completed.


-- Create a plan directive by allocating resource plan 'POC_FOR_QUARANTINE' to consumer group 'DIXITPOC_GROUP'. I am setting an execution limit of 5 seconds for the SQL.
SQL> begin
 dbms_resource_manager.create_plan_directive(
 plan => 'POC_FOR_QUARANTINE',
 group_or_subplan => 'DIXITPOC_GROUP',
 comment => 'Kill SQL and its other execution plans when exceeding CPU and Elapsed seconds',
 switch_group=>'CANCEL_SQL',
 switch_time => 5,
 switch_estimate=>false);
 end;
 /

PL/SQL procedure successfully completed.




-- Allocate full resources to the rest of the sessions, which are not part of this plan.
SQL> begin
 dbms_resource_manager.create_plan_directive(PLAN=> 'POC_FOR_QUARANTINE', GROUP_OR_SUBPLAN=>'OTHER_GROUPS',COMMENT=>'leave rest as is');
 end;
 /

PL/SQL procedure successfully completed.



-- Validate and submit pending area.
SQL> begin
 dbms_resource_manager.validate_pending_area();
 end;
 /

PL/SQL procedure successfully completed.

SQL> begin
 dbms_resource_manager.submit_pending_area();
 end;
 /

PL/SQL procedure successfully completed.

SQL>


-- Grant the switch privilege to the DIXIT user, which will switch it to the 'DIXITPOC_GROUP' group.
SQL> begin
dbms_resource_manager_privs.grant_switch_consumer_group('DIXIT','DIXITPOC_GROUP',false);
end;
/

PL/SQL procedure successfully completed.

-- Set the initial consumer group for the DIXIT user to 'DIXITPOC_GROUP'.
SQL> begin
dbms_resource_manager.set_initial_consumer_group('DIXIT','DIXITPOC_GROUP');
end;
/

PL/SQL procedure successfully completed.

SQL>

Next, set the resource manager at the database level: we have to set RESOURCE_MANAGER_PLAN to the name of the plan, ‘POC_FOR_QUARANTINE’.

SQL> show parameter resource

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
resource_limit                       boolean     TRUE
resource_manage_goldengate           boolean     FALSE
resource_manager_cpu_allocation      integer     5
resource_manager_plan                string
SQL> 


SQL> alter system set RESOURCE_MANAGER_PLAN = 'POC_FOR_QUARANTINE';

System altered.

SQL> show parameter resource

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
resource_limit                       boolean     TRUE
resource_manage_goldengate           boolean     FALSE
resource_manager_cpu_allocation      integer     5
resource_manager_plan                string      POC_FOR_QUARANTINE
SQL>

Execute the SQL for the first time; you will get ‘active time limit exceeded’ due to the resource manager.

SQL>
SQL> select * from bigtab;

ERROR at line 1:
ORA-00040: active time limit exceeded - call aborted

Now, if you run the SQL for a second time, you will see that your quarantine plan/profile is used and the execution of the SQL is aborted.

SQL>
SQL> select * from bigtab;

ERROR at line 1:
ORA-56955: quarantined plan used
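
To confirm that the quarantine really kicked in, the details can be checked in V$SQL. A small sketch, assuming the 19c columns SQL_QUARANTINE and AVOIDED_EXECUTIONS are present:

SQL> select sql_id, sql_quarantine, avoided_executions, last_avoided_timestamp
       from v$sql
      where sql_quarantine is not null;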

Hope It Helped
Prashant Dixit

Posted in Advanced, troubleshooting

Getting ‘SQL*Net message from dblink’ waits? Have you tried the DRIVING_SITE hint?

Posted by FatDBA on June 27, 2021

Hi Guys,

Last week I was asked to take a look into a performance problem where the customer DBA reported excessive ‘SQL*Net message from dblink’ waits in the system. As I didn’t have direct access to the box, I asked him to share both AWR and ASH performance reports for the time when they observed the problem. And yes, he was right: there were a few waits on this event, but with very high average single-execution times. Next, I verified the ASH data for the period and was able to identify the SQL (a SELECT) that was frequently waiting on this wait event.

About this event: as you guys know, SQL*Net messages are classified as Network-class events and happen when some part of the final dataset comes from a remote database accessed via a DBLink. Alright, so we know the problem and we have the problematic SQL; let’s simulate the problem and the solution.
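
If you want to do the same triage yourself, the top contributors to this event can be pulled straight from ASH. A minimal sketch (assuming ASH/Diagnostics Pack licensing is in place):

SQL> select sql_id, count(*) samples
       from v$active_session_history
      where event = 'SQL*Net message from dblink'
      group by sql_id
      order by samples desc;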

I will first create the DBLink that I will be using to access the remote data, which is present in a different database on a different server/host, and then I will create sample tables with some test data on both Site A and Site B.

Reference used:
BIGTAB: table on Site 1, the local site.
BIGTAB2: table on Site 2, the remote location.

SQL>
SQL> CREATE DATABASE LINK dixitdlink1
  2     CONNECT TO dixit IDENTIFIED BY oracle90
   USING '(DESCRIPTION=
            (ADDRESS = (PROTOCOL = TCP)(HOST = canttowinsec.quebecdomain)(PORT = 1521))
            (CONNECT_DATA=(SERVICE_NAME=dhavaldb.quebecdomain))
          )';  

Database link created.

SQL>


-- On source database, on local host.
CREATE TABLE BIGTAB (
   id         NUMBER,
   weight     NUMBER,
   adate      DATE
);

Table created.

INSERT INTO BIGTAB (id, weight, adate)
SELECT MOD(ROWNUM,1000),
       DBMS_RANDOM.RANDOM,
       SYSDATE-1000+DBMS_RANDOM.VALUE(0,1000)
 FROM all_objects
/

67447 rows created.


SQL> delete from prashant.BIGTAB where rownum < 60000;

59999 rows deleted.




-- This I have created on target database, on a remote host.
CREATE TABLE BIGTAB2 (
   id         NUMBER,
   weight     NUMBER,
   adate      DATE
);

Table created.


INSERT INTO BIGTAB2 (id, weight, adate)
SELECT MOD(ROWNUM,1000),
       DBMS_RANDOM.RANDOM,
       SYSDATE-1000+DBMS_RANDOM.VALUE(0,1000)
 FROM all_objects
/

67447 rows created.

Alright, the test data is created on both sites, and in order to mimic the exact case that I’d faced, I have intentionally deleted 59,999 rows from the table that exists on Site 1. So now we have table BIGTAB with 7,448 records, and on the remote database we have table BIGTAB2 with 67,447 rows. Let’s execute a few queries and do an inner join on both tables.

Test 1: With no hint or tuning/tweaking of any kind. This asks Oracle to bring all the data from the remote site to the local site and then perform the join locally. The same would also happen if we put the hint on the local (smaller) table, i.e. /*+ DRIVING_SITE(BIGTAB) */.

SQL> select * from prashant.BIGTAB, dixit.BIGTAB2@dixitdlink1 where ADATE=DATEOFREGISTER;

        ID     WEIGHT ADATE             ID     WEIGHT DATEOFREG
---------- ---------- --------- ---------- ---------- ---------
       502 -1.363E+09 09-MAR-20        989 1353403341 09-MAR-20
       577 -185863752 22-FEB-21        247 -626839962 22-FEB-21
........
............
...............
.................


Execution Plan
----------------------------------------------------------
Plan hash value: 2705338834

----------------------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 19931 |  1362K|   144   (0)| 00:00:01 |        |      |
|*  1 |  HASH JOIN         |         | 19931 |  1362K|   144   (0)| 00:00:01 |        |      |
|   2 |   TABLE ACCESS FULL| BIGTAB  |  6948 |   237K|    68   (0)| 00:00:01 |        |      |
|   3 |   REMOTE           | BIGTAB2 | 19930 |   681K|    76   (0)| 00:00:01 | DIXIT~ | R->S |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ADATE"="DATEOFREGISTER")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   3 - SELECT "ID","WEIGHT","DATEOFREGISTER" FROM "DIXIT"."BIGTAB2" "BIGTAB2"
       (accessing 'DIXITDLINK1.ONTADOMAIN' )


Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

So, this is what was happening on that day, on that system, with that query waiting on ‘SQL*Net message from dblink‘. Now is the right time to introduce the DRIVING_SITE hint; I have been experimenting with this hint for a long time, maybe since my 9i days.

This hint is useful for distributed queries, to force the optimizer to use a particular site as the driving site; that is, it instructs the optimizer to execute the query at a different site than the one the database would select. As in the previous test case, all rows from the remote host which match the condition are sent to the local site, and the join is finally executed on the local site.
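
One caveat worth noting: if the query gives the remote table an alias, the hint must reference the alias, not the table name, or it is silently ignored. A hypothetical sketch against the same demo tables:

SQL> select /*+ DRIVING_SITE(r) */ l.id, r.weight
       from prashant.bigtab l, dixit.bigtab2@dixitdlink1 r
      where l.adate = r.dateofregister;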

Now I am going to use DRIVING_SITE (mentioning the name of the remote table) and force the optimizer to perform all join operations at the remote site; the query will be executed there, and only the result set is returned to the local site.

So, you should try the DRIVING_SITE hint pointing at the site where the huge table resides!

Test 2: With the DRIVING_SITE hint on remote table BIGTAB2, which is big and has 67,447 records.

SQL> select /*+ DRIVING_SITE(BIGTAB2) */ * from prashant.BIGTAB, dixit.BIGTAB2@dixitdlink1 where ADATE=DATEOFREGISTER;

        ID     WEIGHT ADATE             ID     WEIGHT DATEOFREG
---------- ---------- --------- ---------- ---------- ---------
       502 -1.363E+09 09-MAR-20        989 1353403341 09-MAR-20
       577 -185863752 22-FEB-21        247 -626839962 22-FEB-21
.....
.......
........
..........



Execution Plan
----------------------------------------------------------
Plan hash value: 2214131741

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|         |    82 |  5740 |    70   (0)| 00:00:01 |        |      |
|*  1 |  HASH JOIN             |         |    82 |  5740 |    70   (0)| 00:00:01 |        |      |
|   2 |   REMOTE               | BIGTAB  |    82 |  2870 |     2   (0)| 00:00:01 |      ! | R->S |
|   3 |   TABLE ACCESS FULL    | BIGTAB2 | 53126 |  1815K|    68   (0)| 00:00:01 | DHAVA~ |      |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A2"."ADATE"="A1"."DATEOFREGISTER")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   2 - SELECT "ID","WEIGHT","ADATE" FROM "PRASHANT"."BIGTAB" "A2" (accessing '!' )


Note
-----
   - fully remote statement
   - dynamic statistics used: dynamic sampling (level=2)

Look at the cost, rows processed, and bytes (data processed): all of them are reduced to a great extent. Of course, you don’t see much elapsed-time difference in this example, as the dataset is small and the query is simple; try it yourself on a bigger case and see the magic!

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting

Parse CPU to Parse Elapsd % – Let’s clear the perplexity!

Posted by FatDBA on January 26, 2018

Happy Republic Day!

 

Hi Mates,

I see a lot of confusion, mix-ups, and perplexity among DBAs about a few of the metrics under the ‘Instance Efficiency %’ section of AWR reports, especially the metric “Parse CPU to Parse Elapsd %”. In today’s post I will try to explain this metric in detail, which will help you understand it in depth and clear the muddiness.

A few of the lines you might have read about this metric:
“we should always look for as low a number as possible for this metric …”
“Try to achieve the impossible value of zero for this one.”

Even some of the Metalink notes are misleading.

Scenario:

Below is the snippet from one of my test box.

Note: Just taking a look at the instance efficiency ratios can be very dangerous, and I advise starting with the Load Profile, the top 5 waits, and so on.

Okay, so my definition of this statistic differs from what you might judge after reading its name: this datum signals the delay/wait in parsing SQL queries during the snap interval.
In our example the value is 1.37%, which means that for every CPU second spent parsing, we spent about 72.99 (100/1.37) seconds of wall-clock time. This can happen for various reasons, e.g. latching or other contention between the sessions.

The ideal value for this stat, like the rest of the ratios, is 100% (yes, that’s correct!!).

Let’s see how this value was calculated and what it considers while deducing that final figure of 1.37%.
It takes “parse time cpu / parse time elapsed * 100” to get the ‘Parse CPU to Parse Elapsd %’ figure.

 

Statistic	                              Total	per Second     per Trans
------------------------------------------------------------------------------------------------
parse time cpu                                  398	      0.11	0.01
parse time elapsed                           29,055	      8.07	0.81
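
Plugging in the numbers above: 398 / 29,055 * 100 ≈ 1.37%, which is exactly the ratio reported in the Instance Efficiency section.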

 

So, in short, each drop in this metric below 100% means the database was waiting for something that slowed down the parse times.
If you want to dig in deep, you should trace the session using a 10046 trace at level 8 to see where the other percentage of parse time is being spent; I am leaving this for readers to test.
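
For completeness, a minimal sketch of that 10046 tracing (standard event syntax; reproduce the parse-heavy workload between the two statements):

SQL> alter session set events '10046 trace name context forever, level 8';
-- run the workload here, then switch the trace off
SQL> alter session set events '10046 trace name context off';
SQL> select value from v$diag_info where name = 'Default Trace File';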

 

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting

Shared Pool Management in 12c, What’s new!

Posted by FatDBA on January 17, 2018

Hi Everyone,

A few days ago, while working on a shared pool issue where we were getting ORA-04031 on an 11gR2 database, I discovered something interesting and new regarding SGA duration management. Here I won’t discuss the problem that I faced and how we fixed it, but I will try to show the architectural change that happened in 12c which can now fix these errors/issues.

Let me explain what ‘durations’ are first. The shared pool is made up of a number of granules. The shared pool is then split into sub-pools if you have a large enough SGA, and each sub-pool consists of a number of non-overlapping granules. In 11g, each sub-pool is also split into four sub-sub-pools known as durations.
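
As a quick side check, you can see how many shared pool sub-pools your own instance runs with. A hypothetical sketch (as SYSDBA, since it reads x$ views), using the hidden parameter that reflects the sub-pool count:

SQL> select a.ksppinm parameter, b.ksppstvl value
       from x$ksppi a, x$ksppsv b
      where a.indx = b.indx
        and a.ksppinm = '_kghdsidx_count';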

What was there before 12c arrived?
Starting from Oracle 10g, each sub-pool in the SGA was divided into four durations.
Let’s check the distribution by generating a heap dump of the shared pool; here I used oradebug at level 2 (this provides you the full SGA summary, or you can try level 2050 to get the full summary with contents).

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_8127.trc

.....
******************************************************
HEAP DUMP heap name="sga heap(1,0)"  desc=380030610
Total heap size    =218102664
Total free space   =  1066928
Total reserved free space   =  8439520
Unpinned space     = 38812528  rcr=11971 trn=17906
Permanent space    =208595160
HEAP DUMP heap name="sga heap(1,1)"  desc=380031e68
Total heap size    = 67108512
Total free space   =  2912528
Total reserved free space   =  1382816
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,2)"  desc=3800336c0
Total heap size    =167771280
Total free space   = 92743480
Total reserved free space   =  3852856
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,3)"  desc=380034f18
Total heap size    =268434048
Total free space   = 74547592
Total reserved free space   = 13497472
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,0)"  desc=380039e38
Total heap size    =201325536
Total free space   =    17200
Total reserved free space   =  8435920
Unpinned space     = 26474112  rcr=7934 trn=8094
Permanent space    =192871456
HEAP DUMP heap name="sga heap(2,1)"  desc=38003b690
Total heap size    = 83885640
Total free space   = 48723768
Total reserved free space   =  1035792
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,2)"  desc=38003cee8
Total heap size    =369096816
Total free space   =258674312
Total reserved free space   = 16982464
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,3)"  desc=38003e740
Total heap size    =218102664
Total free space   = 17202608
Total reserved free space   = 10966696
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,0)"  desc=380043660
Total heap size    =184548408
Total free space   =    13008
Total reserved free space   =  5061928
Unpinned space     = 26943408  rcr=4930 trn=9425
Permanent space    =179472608
HEAP DUMP heap name="sga heap(3,1)"  desc=380044eb8
Total heap size    = 67108512
Total free space   = 27568352
Total reserved free space   =     4744
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,2)"  desc=380046710
Total heap size    =352319688
Total free space   =233302736
Total reserved free space   = 15981216
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,3)"  desc=380047f68
Total heap size    =385873944
Total free space   =143746536
Total reserved free space   = 19402616
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
.....
******************************************************

So the stats above show that we have three sub-pools [sub-pool (1,0), (1,1), (1,2), (1,3) ….. (3,0), (3,1), (3,2), (3,3)] of SGA heaps, with four durations per sub-pool. Every duration has its own size, free space, and reserved free space. This type of distribution can cause ORA-04031 even when you have enough free space in the other durations, and this was the actual cause in my earlier case, but here we won’t discuss how we fixed that.

Okay, so now let’s do the same on a 12c database. Let’s generate the heap dump for the SGA in 12c and see the distribution of durations here. Once again we will use oradebug to dump a heapdump at level 2.

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/tunedb/tunedb/trace/tunedb_ora_11054.trc


******************************************************
HEAP DUMP heap name="sga heap(1,0)"  desc=0x60103678
 extent sz=0xfe0 alt=304 het=32767 rec=9 flg=0x82 opc=0
 parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000 heap=(nil)
 fl2=0x24, nex=(nil), idx=1, dsxvers=1, dsxflg=0x0
 dsx first ext=0x73000000
 dsx empty ext bytes=0  subheap rc link=0x730000c0,0x730000c0

******************************************************
HEAP DUMP heap name="sga heap(1,3)"  desc=0x60107f80
 extent sz=0xfe0 alt=304 het=32767 rec=9 flg=0x82 opc=0
 parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000 heap=(nil)
 fl2=0x24, nex=(nil), idx=1, dsxvers=1, dsxflg=0x0
 dsx first ext=0x7e000000
 dsx empty ext bytes=0  subheap rc link=0x7e0000c0,0x7e0000c0

Alright, so here we only have two durations per sub-pool, ‘sga heap(1,0)’ and ‘sga heap(1,3)’, for improved shareability and to avoid ORA-04031 errors.

Hope It helps
Prashant Dixit

Posted in Advanced

Script to measure the Source & Contribution of any Wait Event within AWR snapshots.

Posted by FatDBA on January 1, 2018

🤘 Happy New Year Everyone! 🤘

So here I am with the maiden post of this year.
Today I will discuss a script that might be handy and useful while you are investigating a performance problem with the database.
In fact, this is one of my favorite scripts, which I always consider running at the point when we have identified the leading wait event and the time frame of the spike, or of any transient variation in system behavior. This script helps you measure the source and the contribution, or the impact, of any specific wait event.

The script is a join between the ASH history views (dba_hist_active_sess_history and dba_hist_snapshot) and provides stats on how many ASH samples, a proxy for the Average number of Active Sessions (AAS), were waiting on this event during the period, along with each source’s total contribution in percentage terms.

Let’s work on a use case.
Suppose you have a system with huge User I/O happening, as confirmed by a report or tool (AWR, ASH, a script, OEM, etc.).

For example, using AWR we found that the User I/O wait-class event ‘db file scattered read’ was leading the ‘Top Foreground Wait Events’ chart at a specific time when we had high User I/O load on the system.

                                           Total Wait       Avg   % DB Wait
Event                                Waits Time (sec)      Wait   time Class
------------------------------ ----------- ---------- --------- ------ --------
db file scattered read              10,282       82.8    41.50ms  43.9 User I/O

So, now that you have identified the leading wait event and the time frame with maximum intensity or frequency, let’s call the script.

Script inputs, for example:

Enter value for event_class:
User I/O
Enter value for event_name: db file scattered read
Enter value for begin_snap: 193
Enter value for end_snap: 200
Enter value for dbid: 2896132084

WITH
events AS (
SELECT /*+ MATERIALIZE NO_MERGE */
       SUBSTR(TRIM(h.sql_id||' '||h.program||' '||
       CASE h.module WHEN h.program THEN NULL ELSE h.module END), 1, 128) source,
       h.dbid,
       COUNT(*) samples
  FROM dba_hist_active_sess_history h,
       dba_hist_snapshot s
 WHERE h.wait_class = TRIM('&Event_Class') AND h.event = TRIM('&Event_Name')
   AND h.snap_id BETWEEN &Begin_Snap AND &End_Snap
   AND h.dbid = &dbid
   AND s.snap_id = h.snap_id
   AND s.dbid = h.dbid
   AND s.instance_number = h.instance_number
 GROUP BY
       h.sql_id,
       h.program,
       h.module,
       h.dbid
 ORDER BY
       3 DESC
),
total AS (
SELECT SUM(samples) samples,
       SUM(CASE WHEN ROWNUM > 15 THEN samples ELSE 0 END) others
  FROM events
)
SELECT e.source,
       e.samples,
       ROUND(100 * e.samples / t.samples, 1) percent,
       (SELECT DBMS_LOB.SUBSTR(s.sql_text, 1000, 1) FROM dba_hist_sqltext s WHERE s.sql_id = SUBSTR(e.source, 1, 13) AND s.dbid = e.dbid AND ROWNUM = 1) sql_text
  FROM events e,
       total t
 WHERE ROWNUM <= 15
   AND ROUND(100 * e.samples / t.samples, 1) > 0.1
 UNION ALL
SELECT 'Others',
       others samples,
       ROUND(100 * others / samples, 1) percent,
       NULL sql_text
  FROM total
 WHERE others > 0
   AND ROUND(100 * others / samples, 1) > 0.1;

Below is the output of the query.

SOURCE                                                                      SAMPLES      PERCENT       SQL_TEXT
------------------------------------------------------------------------  ---------- ---------------  -------------------------------
5av23g8w7f3ka sqlplus@dixitlab.localdomain (TNS V1-V3)                          1          45.23       select * from dixit.bigtab
9a7gbkahasj1a sqlplus@dixitlab.localdomain (TNS V1-V4)                          1           5          select * from dixit.gianttable
OTHERS                                                                          14         49.77

Okay, so the result shows that one SQL statement, with SQL ID 5av23g8w7f3ka, was responsible for more than 45% of these waits.
So now you have the evidence and can start troubleshooting this specific statement to reduce the User I/O.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting

Auto Stats Gathering in Oracle 12c & Something Interesting :)

Posted by FatDBA on December 14, 2017

Hi Fellas,
Starting from Oracle 12c, there is a new feature that collects statistics automatically when you perform bulk loads using either of two methods:
– CREATE TABLE AS SELECT (CTAS)
– INSERT INTO … SELECT (into an empty table using a direct-path insert).

SQL> explain plan for create table dixittab as select * from scottisdead;
Explained.
 
SQL> select * from table(DBMS_XPLAN.DISPLAY);
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 14312189
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |               |   500K|  8812K|   612   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | DIXITTAB      |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |               |   500K|  8812K|   371   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SCOTTISDEAD   |   500K|  8812K|   371   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------
10 rows selected. 

Above, in the execution plan, you’ll see the new operation named “OPTIMIZER STATISTICS GATHERING” at ID 2.
Let’s verify that the stats were collected.

SQL> select table_name, last_analyzed from user_tables where table_name = 'DIXITTAB';
 
TABLE_NAME       LAST_ANALYZED
---------------- -------------
DIXITTAB         12-DEC-17

Yup, stats were collected!
In the same way, stats will be auto-collected during the other type of bulk-load method (INSERT INTO … SELECT).

There may be times when you want to disable this feature, in situations like:
– long/huge insert operations that spend a lot of time on STATS GATHERING;
– an extremely large dataset where you don’t want to collect stats.

To achieve that, we have the option of a hint which instructs Oracle not to gather table statistics.

SQL> create table dixittab as select /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * from scottisdead;

Now, something interesting I would like to discuss …..
Is there any other condition under which the stats won’t be collected automatically, apart from barring it with the NO_GATHER_OPTIMIZER_STATISTICS hint?

Let’s try some conventional bulk loading using the INSERT INTO .. SELECT method.
To run some tests, I am intentionally commenting out a column in both tables: the DATE_VAL column of the newly created table TABLE1, and the DATE_VALUE column of the SAMPLE table being selected from.


SQL> create table table1 (ident number, date_val date, text_val varchar2(4000));
Table created.


SQL> explain plan for insert /*+ append */ into table1
(IDENT
--, DATE_VAL
, TEXT_VAL)
SELECT ID
--, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE; 

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------

Plan hash value: 1523099961
-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |        |   100K|  2539K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT    | TABLE1 |       |       |            |          |
|   2 |   TABLE ACCESS FULL| SAMPLE |   100K|  2539K|   154   (1)| 00:00:01 |
-----------------------------------------------------------------------------

9 rows selected.

😦 😦 Why was the auto stats gathering behavior not repeated this time??

This happened because Oracle needs all the columns of the table to be included in order to kick in the OPTIMIZER STATISTICS GATHERING operation. Let me show you what I mean:

SQL> explain plan for insert /*+ append */ into table1
(IDENT
, DATE_VAL
, TEXT_VAL)
SELECT ID
, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE;  

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------

Plan hash value: 1523099961
-------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | TABLE1 |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SAMPLE |   100K|  3320K|   154   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------

10 rows selected.

Yes, the stats were collected this time, when we included all the columns of the tables.
I haven’t seen any documentation of this restriction on this new 12c feature; hope Oracle adds it to the documentation soon 🙂 …..

Hope It Helps!
Prashant Dixit

Posted in Advanced, troubleshooting

Optimizer making wrong decisions. Histogram, is that you?

Posted by FatDBA on December 4, 2017

Hey Mates,

I would like to discuss a scenario I faced some time back with one of the customers, where a query (part of monthly application maintenance) started taking a huge amount of time to complete.
During the analysis, I found it doing some extremely expensive full table scans on a fat table when it shouldn’t. Let me elaborate on what the issue was.

I read the execution plan and the extended optimizer traces (event 10053) to understand the decisions made by the CBO.

The traces gave me a hint that the optimizer was able to accurately estimate the cardinality for all of the predicates as long as the string was 32 bytes or less. But at one of the operation steps in the execution plan, the estimates were wrong and crooked: the CBO decided to ‘return every row’ irrespective of the string, and that was actually happening because the histogram contained only the first 32 characters of the string data.

So the histogram was considering only the first 32 characters of the column value, and the optimizer believed this particular value, i.e. “siebelm01-vip@2002:101:11f::101:1”, occurred 5776 times in the table. So it ignored the index and used a full table scan. Oracle will only consider the first 32 characters of the column when generating histograms, and in such cases, as we saw in the aftermath, the optimizer will make wrong decisions.

So, as an immediate solution to the problem, I deleted the histogram on this specific column, and this fixed the query plan change issue.

-- Drop only the histogram (leaving the base column statistics in place):
BEGIN
dbms_stats.delete_column_stats(ownname=>'SIEBELADMIN', tabname=>'TX2121', colname=>'C131HJA12', col_stat_type=>'HISTOGRAM');
END;
/
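
To double-check the outcome, the dictionary should now report no histogram on the column. A simple check, plus a hypothetical follow-up so the next stats run doesn’t silently re-create it:

SQL> select column_name, histogram, num_buckets
       from dba_tab_col_statistics
      where owner = 'SIEBELADMIN' and table_name = 'TX2121'
        and column_name = 'C131HJA12';

-- keep histograms off this one column on future gathers (assumed preference):
SQL> exec dbms_stats.set_table_prefs('SIEBELADMIN','TX2121','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 1 C131HJA12')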

Other fixes:
– In the case of longer strings (32 characters or more), for text searches consider Oracle Text and the CONTAINS operator in the SQL logic.

Hope That Helps!
Prashant Dixit

Posted in Advanced

How to use “SQL Patch” technique to fix Query performance issues.

Posted by FatDBA on November 30, 2017

Hi Guys,

In my last post, “How to fix SQL Plan issues using OUTLINE DATA.”, we learned how to fix wrong cost estimates by the CBO using the OUTLINE DATA of an available optimal plan. But there are a few issues with that approach, as it only works with ad-hoc statements and fails to handle situations such as:

– What to do when the SQL comes from a bundled application and cannot be modified?
– How to get by with a smaller number of hints that still lead to the better execution plan?
– What if it’s not possible to use SQL BASELINES?

As discussed in that same post, where we added the full list of OUTLINE DATA, it is a little complicated, dirty, and disordered to add the full outline set to the SQL statement,
e.g.: select /*+ …full outline here…*/

So, how do we attach just the required set of HINTS to force the better plan for the SQL?

Let’s do a test to explain this subject in detail.

SQL TEXT: select count(*) from dixit.test;
Problem statement: the query takes a huge amount of time to complete the COUNT of rows for the TEST table, and does a FULL TABLE SCAN despite available indexes.

Now, after querying the DBA_HIST_SQLSTAT view, I saw a total of 2 execution plans available, and the query was somehow picking the wrong, costliest plan.
So, let’s grab the OUTLINE data of the better plan first.

select * from table(dbms_xplan.display_awr('SQL_ID', PHV, format => 'ADVANCED'));
OR
EXPLAIN PLAN for select count(*) from dixit.test;
select * from table(dbms_xplan.display(format=>'+OUTLINE'));

Outline Data
------------
/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@"SEL$1" "TEST"@"SEL$1")
OUTLINE_LEAF(@"SEL$1")
ALL_ROWS
DB_VERSION('11.2.0.4')
OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

Let’s create the SQL PATCH for the statement to force the SQL to use the required hints.

begin
dbms_sqldiag_internal.i_create_patch(
sql_text => 'select count(*) from dixit.test',
hint_text => 'INDEX_RS_ASC(@"SEL$1" "TEST"@"SEL$1")',
name => 'test_sql_patch_dixit');
end;
/
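
A side note: i_create_patch is an internal, undocumented call. From 12.2 onward there is a public equivalent, DBMS_SQLDIAG.CREATE_SQL_PATCH, a function that returns the patch name; a minimal sketch under that assumption:

DECLARE
patch_name VARCHAR2(128);
BEGIN
patch_name := DBMS_SQLDIAG.CREATE_SQL_PATCH(
sql_text  => to_clob('select count(*) from dixit.test'),
hint_text => to_clob('INDEX_RS_ASC(@"SEL$1" "TEST"@"SEL$1")'),
name      => 'test_sql_patch_dixit');
END;
/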

I ran the query once again and, as expected, it completed with much less elapsed time and used an index scan in place of the expensive FTS.
I also noticed a new NOTE coming just below the execution plan.

Note
--------
- SQL patch "test_sql_patch_dixit" used for this statement

In case you want to delete the patch:

begin
DBMS_SQLDIAG.DROP_SQL_PATCH(name => 'test_sql_patch_dixit');
end;
/


HOPE IT HELPS!
Prashant Dixit

Posted in Advanced

How to fix SQL Plan issues using OUTLINE DATA.

Posted by FatDBA on November 30, 2017

Hi Mates,
Today I would like to discuss ‘SQL Plan Stability’, especially situations when the performance of a SQL statement degrades and it performs poorly in one environment while working absolutely fine in the other spheres.
I have experienced such issues a lot during the past few years while working as a performance consultant on many projects, but recently I encountered a similar issue and would like to discuss it.
In short, the situation is given below for the problematic SQL:

The same request, they say, takes 268 sec in SIT:
• I ran it in SIT again –> 500 secs
• I also ran it in DEV –> 20 secs
• I ran it in preSIT –> 5 secs

Below are the details for the given ill SQL.

SQL Text:
SELECT a.port_inst_id AS mdf_es_port_inst_id, a.status AS mdf_es_port_status, d.port_inst_id AS mdf_ls_port_inst_id, d.status AS mdf_ls_port_status, regexp_substr( c.descr, :"SYS_B_00", :"SYS_B_01", :"SYS_B_02" ) AS naming_area, a.status AS mdf_es_port_status2, DECODE( a.reservation_inst_id, NULL, :"SYS_B_03", :"SYS_B_04" ) AS mdf_es_active_reservation, a.port_access_id AS lic_identity, egi_get_uda_value( a.equip_inst_id, :"SYS_B_05", :"SYS_B_06", :"SYS_B_07" ) AS kc, b.name AS coordinate, a.port_hum_id egi_port_name, egi_get_uda_value( d.equip_inst_id, :"SYS_B_08", :"SYS_B_09", :"SYS_B_10" ) AS cabinet_name, e.name AS termination_block_name, d.port_access_id AS pair_name, egi_get_uda_value( d.equip_inst_id, :"SYS_B_11", :"SYS_B_12", :"SYS_B_13" ) AS signing_system, egi_get_uda_value( a.port_inst_id, :"SYS_B_14", :"SYS_B_15", :"SYS_B_16" ) AS processor_exchange, a.circ_path_inst_id AS es_circ_path_inst_id, a.next_path_inst_id AS es_next_path_inst_id, d.circ_path_inst_id AS ls_circ_path_inst_id, d.next_path_inst_id AS ls_next_path_inst_id, DECODE( d.role, :"SYS_B_17", :"SYS_B_18", :"SYS_B_19", :"SYS_B_20", :"SYS_B_21", :"SYS_B_22", :"SYS_B_23", :"SYS_B_24", NULL ) AS cabling_type FROM epa a, card_inst b, equip_inst c, epa d, card_inst e, xxx_xxxx_xxxxxxxx f WHERE a.port_inst_id = f.port_inst_id AND f.val_attr_inst_id = ( SELECT val_attr_inst_id FROM xxx_xxxx_xxxxxxxx WHERE group_name = :"SYS_B_25" AND attr_name = :"SYS_B_26" ) AND a.card_inst_id = b.card_inst_id AND b.type = :"SYS_B_27" AND a.equip_inst_id = c.equip_inst_id AND a.z_wired_port_inst_id = d.port_inst_id (+) AND d.card_inst_id = e.card_inst_id (+) AND regexp_substr( a.port_access_id, :"SYS_B_28", :"SYS_B_29", :"SYS_B_30" ) = :"SYS_B_31" AND to_number(regexp_substr( a.port_access_id, :"SYS_B_32", :"SYS_B_33", :"SYS_B_34" ) ) >= to_number(:"SYS_B_35") AND to_number(regexp_substr( a.port_access_id, :"SYS_B_36", :"SYS_B_37", :"SYS_B_38" ) ) <= to_number(:"SYS_B_39") AND f.attr_value = :"SYS_B_40" ORDER BY :"SYS_B_41"

I see around 5 different PHVs, or plans, coming for this statement; the last two, 18733286 and 2202641467, are the worst plans here, and the first one, with PHV 1663132818, is the BEST among all the others.
1663132818 comes with very low elapsed time, CPU usage, IO time, LIOs, disk reads, rows processed, and cost.
** From DBA_HIST_SQLSTAT view.

Let’s collect the OUTLINE and QUERY BLOCK details of the statement; this can be collected using:
select * from table(dbms_xplan.display_awr('SQL_ID', PHV, format => 'ADVANCED'));

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$841DDE77
10 - SEL$841DDE77 / VAL_ATTR_NAME@SEL$2
11 - SEL$841DDE77 / VAL_ATTR_NAME@SEL$2
12 - SEL$841DDE77 / F@SEL$1
13 - SEL$841DDE77 / F@SEL$1
14 - SEL$841DDE77 / A@SEL$1
15 - SEL$841DDE77 / A@SEL$1
16 - SEL$841DDE77 / C@SEL$1
17 - SEL$841DDE77 / C@SEL$1
18 - SEL$841DDE77 / D@SEL$1
19 - SEL$841DDE77 / D@SEL$1
20 - SEL$841DDE77 / E@SEL$1
21 - SEL$841DDE77 / E@SEL$1
22 - SEL$841DDE77 / B@SEL$1
23 - SEL$841DDE77 / B@SEL$1

Outline Data
------------

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
OPT_PARAM('optimizer_index_cost_adj' 10)
FIRST_ROWS(1)
OUTLINE_LEAF(@"SEL$841DDE77")
MERGE(@"SEL$683B0107")
OUTLINE(@"SEL$C772B8D1")
UNNEST(@"SEL$2")
OUTLINE(@"SEL$683B0107")
OUTLINE(@"SEL$7511BFD2")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$1")
INDEX_RS_ASC(@"SEL$841DDE77" "VAL_ATTR_NAME"@"SEL$2" ("VAL_ATTR_NAME"."GROUP_NAME" "VAL_ATTR_NAME"."ATTR_NAME"))
INDEX_RS_ASC(@"SEL$841DDE77" "F"@"SEL$1" ("PORT_ATTR_SETTINGS"."ATTR_VALUE"))
BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$841DDE77" "F"@"SEL$1")
INDEX_RS_ASC(@"SEL$841DDE77" "A"@"SEL$1" ("EPA"."PORT_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "C"@"SEL$1" ("EQUIP_INST"."EQUIP_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "D"@"SEL$1" ("EPA"."PORT_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "E"@"SEL$1" ("CARD_INST"."CARD_INST_ID"))
INDEX(@"SEL$841DDE77" "B"@"SEL$1" ("CARD_INST"."CARD_INST_ID"))
LEADING(@"SEL$841DDE77" "VAL_ATTR_NAME"@"SEL$2" "F"@"SEL$1" "A"@"SEL$1" "C"@"SEL$1" "D"@"SEL$1" "E"@"SEL$1"
"B"@"SEL$1")
USE_NL(@"SEL$841DDE77" "F"@"SEL$1")
USE_NL(@"SEL$841DDE77" "A"@"SEL$1")
USE_NL(@"SEL$841DDE77" "C"@"SEL$1")
USE_NL(@"SEL$841DDE77" "D"@"SEL$1")
USE_NL(@"SEL$841DDE77" "E"@"SEL$1")
USE_NL(@"SEL$841DDE77" "B"@"SEL$1")
NLJ_BATCHING(@"SEL$841DDE77" "B"@"SEL$1")
END_OUTLINE_DATA
*/

Let’s use the outline of the good plan as a hint to force the good execution plan to be used, and check the execution time for the query.
Run the query from SQL*Plus using the hint:

SELECT
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
OPT_PARAM('optimizer_index_cost_adj' 10)
FIRST_ROWS(1)
OUTLINE_LEAF(@"SEL$841DDE77")
MERGE(@"SEL$683B0107")
OUTLINE(@"SEL$C772B8D1")
UNNEST(@"SEL$2")
OUTLINE(@"SEL$683B0107")
OUTLINE(@"SEL$7511BFD2")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$1")
INDEX_RS_ASC(@"SEL$841DDE77" "VAL_ATTR_NAME"@"SEL$2" ("VAL_ATTR_NAME"."GROUP_NAME" "VAL_ATTR_NAME"."ATTR_NAME"))
INDEX_RS_ASC(@"SEL$841DDE77" "F"@"SEL$1" ("PORT_ATTR_SETTINGS"."ATTR_VALUE"))
BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$841DDE77" "F"@"SEL$1")
INDEX_RS_ASC(@"SEL$841DDE77" "A"@"SEL$1" ("EPA"."PORT_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "C"@"SEL$1" ("EQUIP_INST"."EQUIP_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "D"@"SEL$1" ("EPA"."PORT_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "E"@"SEL$1" ("CARD_INST"."CARD_INST_ID"))
INDEX(@"SEL$841DDE77" "B"@"SEL$1" ("CARD_INST"."CARD_INST_ID"))
LEADING(@"SEL$841DDE77" "VAL_ATTR_NAME"@"SEL$2" "F"@"SEL$1" "A"@"SEL$1" "C"@"SEL$1" "D"@"SEL$1" "E"@"SEL$1" "B"@"SEL$1")
USE_NL(@"SEL$841DDE77" "F"@"SEL$1")
USE_NL(@"SEL$841DDE77" "A"@"SEL$1")
USE_NL(@"SEL$841DDE77" "C"@"SEL$1")
USE_NL(@"SEL$841DDE77" "D"@"SEL$1")
USE_NL(@"SEL$841DDE77" "E"@"SEL$1")
USE_NL(@"SEL$841DDE77" "B"@"SEL$1")
NLJ_BATCHING(@"SEL$841DDE77" "B"@"SEL$1")
END_OUTLINE_DATA
*/

A.PORT_INST_ID AS MDF_ES_PORT_INST_ID, A.STATUS AS MDF_ES_PORT_STATUS, D.PORT_INST_ID AS MDF_LS_PORT_INST_ID, D.STATUS AS MDF_LS_PORT_STATUS, REGEXP_SUBSTR(C.DESCR,'[^ ]+', 1,1) AS NAMING_AREA, A.STATUS AS MDF_ES_PORT_STATUS2, DECODE(A.RESERVATION_INST_ID, NULL, 'N', 'Y') AS MDF_ES_ACTIVE_RESERVATION, A.PORT_ACCESS_ID AS LIC_IDENTITY, EGI_GET_UDA_VALUE(A.EQUIP_INST_ID,'EQUIPMENT','Eqp Legacy Names','TAD-N') AS KC, B.NAME AS COORDINATE, A.PORT_HUM_ID EGI_PORT_NAME, EGI_GET_UDA_VALUE(D.EQUIP_INST_ID,'EQUIPMENT','Eqp Legacy Names','TAD-N') AS CABINET_NAME, E.NAME AS TERMINATION_BLOCK_NAME, D.PORT_ACCESS_ID AS PAIR_NAME, EGI_GET_UDA_VALUE(D.EQUIP_INST_ID,'EQUIPMENT','Eqp Legacy Names','SIGNSYST') AS SIGNING_SYSTEM, EGI_GET_UDA_VALUE(A.PORT_INST_ID,'PORT','Copper Port Info','Processor Exchange') AS PROCESSOR_EXCHANGE, A.CIRC_PATH_INST_ID AS ES_CIRC_PATH_INST_ID, A.NEXT_PATH_INST_ID AS ES_NEXT_PATH_INST_ID, D.CIRC_PATH_INST_ID AS LS_CIRC_PATH_INST_ID, D.NEXT_PATH_INST_ID AS LS_NEXT_PATH_INST_ID, DECODE(D.ROLE,'Primary','P','Secondary','S','Spare','R','Junction','F',NULL) AS CABLING_TYPE FROM EPA A, CARD_INST B, EQUIP_INST C, EPA D, CARD_INST E, PORT_ATTR_SETTINGS F WHERE A.PORT_INST_ID =F.PORT_INST_ID AND F.VAL_ATTR_INST_ID= (SELECT VAL_ATTR_INST_ID FROM VAL_ATTR_NAME WHERE GROUP_NAME='Copper Port Info' AND ATTR_NAME ='Processor Exchange' ) AND A.CARD_INST_ID =B.CARD_INST_ID AND B.TYPE ='ES' AND A.EQUIP_INST_ID =C.EQUIP_INST_ID AND A.Z_WIRED_PORT_INST_ID =D.PORT_INST_ID(+) AND D.CARD_INST_ID =E.CARD_INST_ID(+) AND REGEXP_SUBSTR(A.PORT_ACCESS_ID,'[^-]+', 1,1) =:B4 AND TO_NUMBER(REGEXP_SUBSTR(A.PORT_ACCESS_ID,'[^-]+', 1,2))>=TO_NUMBER(:B3 ) AND TO_NUMBER(REGEXP_SUBSTR(A.PORT_ACCESS_ID,'[^-]+', 1,2))<=TO_NUMBER(:B2 ) AND F.ATTR_VALUE =:B1 ORDER BY 8

I’ve executed the query in SIT and it’s taking no more than 5 seconds now with those OUTLINES; that is reeeeally good 😊

Now that the best plan is identified and tested, the question becomes: “HOW TO FIX THIS PERMANENTLY??”

This can be done using an Oracle-provided script named “coe_xfr_sql_profile.sql”, which is part of the popular SQLTXPLAIN/SQLT tool.
Download it from the Metalink website (Direct Link: https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=458593899434965&parent=DOCUMENT&sourceId=1955195.1&id=215187.1&_afrWindowMode=0&_adf.ctrl-state=11co9htqxp_387#aref_section13)

In order to fix or correct the cost estimates so the CBO adopts a good execution plan, we can follow the steps below.
• Run the script coe_xfr_sql_profile.sql as the SYSDBA user, providing the sql_id and the good plan hash value (PHV):
SQL> START coe_xfr_sql_profile.sql SQL_ID PHV

• The step above generates a script named in the format “coe_xfr_sql_profile_SQL_ID_PLAN_HASH_VALUE.sql”, i.e. including the sql_id and plan hash value.
Example: “coe_xfr_sql_profile_SQLID_PHV.sql”

• If you would like the profile to be used for all literals, modify the force_match parameter in the generated script from false to true: force_match => TRUE
This will ensure the profile is enabled even when different literals are used in the SQL query.

• Run the generated script as the SYSDBA user in order to correct the optimizer cost estimates used in the execution plan for the sql_id and encourage it to use the desired plan obtained from the stated plan hash value.

• Next, we can check whether the profile was accepted successfully:
select SQL_ID, SQL_PROFILE, PLAN_HASH_VALUE from V$SQL where SQL_ID='SQL ID';

• In case the profile needs to be disabled or dropped, it can be done as follows, using the SQL_PROFILE name returned from the query above:

EXEC DBMS_SQLTUNE.ALTER_SQL_PROFILE(Name => '', Attribute_Name => 'STATUS', Value => 'DISABLED');

To drop the profile:
EXEC DBMS_SQLTUNE.DROP_SQL_PROFILE(Name => '');

This way we can test and fix plan stability issues with any SQL statement!

Hope It Helps!
Prashant Dixit

Posted in Advanced

Are the Cardinality Estimates Correct in my Execution Plan ?

Posted by FatDBA on September 26, 2017

Stuck in a difficult performance issue related to a SQL, and you have to verify whether the cardinality estimates made by the MIGHTY CBO are correct, but have no idea how to do that? 😦 😦

Let’s make things a little easier for ourselves!
Let me take an example and explain how to do this.

SQL Statement (From my Personal Test Environment):
SELECT COUNT (DISTINCT SB_NO) FROM OPS$EXP.C_AL_SB WHERE SB_NO IN (SELECT DISTINCT SB_NO FROM OPS$EXP.C_AL_AWB WHERE EGM_DT BETWEEN :1 AND :1 ) AND ERR_MESG ='S'

Below is the execution plan for the SQL (let’s forget about the behemoth elapsed time, cost, and rows processed in the plan for a minute 🙂) ….

The plan above doesn’t show any estimation or cardinality details that the optimizer considered during the creation of the plan. But starting from 10g we have the GATHER_PLAN_STATISTICS hint, which tells Oracle to collect execution statistics for a SQL statement.

These execution statistics are then shown next to the original optimizer estimates in the execution plan if you use the function DBMS_XPLAN.DISPLAY_CURSOR to display the plan. You also have to set the FORMAT parameter to 'ALLSTATS LAST' (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')).

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT (DISTINCT SB_NO) FROM OPS$EXP.C_AL_SB WHERE SB_NO IN (SELECT DISTINCT SB_NO FROM OPS$EXP.C_AL_AWB WHERE EGM_DT BETWEEN :1 AND :1 ) AND ERR_MESG ='S';
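
Then, right after executing the hinted statement in the same session, pull the plan with the actuals. A minimal sketch (NULL, NULL means the last cursor executed by this session):

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));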

The execution plan for the query is as follows:

The original optimizer estimates are shown in the E-Rows column, while the actual statistics gathered during execution are shown in the A-Rows column.

Posted in Advanced

 