Tales From A Lazy Fat DBA

Den of an Oracle DB Performance freak & a fan of PostgreSQL & Cassandra …. \,,/

  • Likes

    • 145,586
  • Archives

  • Categories

  • Subscribe

  • Advertisements

Posts Tagged ‘Tuning’

Parse CPU to Parse Elapsd % – Lets clear the Perplex!

Posted by FatDBA on January 26, 2018

Happy Republic Day!

 

Hi Mates,

I see a lot of confusion, mix-ups and perplexity in between the DBAs on few of the metrics under ‘Instance Efficiency % section’ in AWR reports. Specially for one of the metric “Parse CPU to Parse Elapsd %”. In today’s post i will try to explain the metrics in detail which will help you to understand it in depth and clear the muddiness.

Few of the lines you might have read about this metric i.e.
“we should always look for as low as possible numbers for this metric …”
“Try to achieve the impossible value of zero for this one “

– Even some of the Metalink notes are misleading too.

Scenario:

Below is the snippet from one of my test box.

Note: Just taking a look at the instance efficiency ratios can be very dangerous and i advise to first start with Load Profile, top 5 waits and there on …

Okay, so my definition of this statistic differ from what you judge after reading the name of the metric – This datum signals the delay/wait in parsing of SQL queries during the snap interval.
In our example the value is 1.37% this means that for every CPU second spend parsing we spent about 72.99 (100/1.37) Seconds of clock time. It can happen due to various reasons i.e Latch or any contention between the sessions etc.

The ideal value for this stat should also be 100% like rest of the ratios (Yes, that’s correct!!).

Let’s see how this value was calculated, what all it considers while deducing that final figure of 1.37%.
It takes the “parse time cpu/parse time elapsed * 100” to get the ‘Parse CPU to Parse Elapsed’ figure.

 

Statistic	                              Total	per Second     per Trans
------------------------------------------------------------------------------------------------
parse time cpu                                  398	      0.11	0.01
parse time elapsed                           29,055	      8.07	0.81

 

So, in short each time there is a drop in this metric from 100%, means the database was waiting for something which slowed down the parse times.
If you want to dig in deep then you should trace the session using 10046 tracing with level 8 to see where the other % of parse time is being spent and leaving this for readers to test.

 

Hope It Helps
Prashant Dixit

Advertisements

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

Shared Pool Management in 12c, What’s new!

Posted by FatDBA on January 17, 2018

Hi Everyone,

Few days while working on Shared Pool issue where we were getting ORA-4031 on one of the 11gR2 database i discovered something interesting and new regarding the SGA duration management. Here i wont discuss the problem that i faced and how we fixed but would try to show the architectural changes that has happened with 12c that can now fix these errors/issues.

Let me explain what are ‘Durations’ first. The shared pool is made up of a number of granules. The shared pool then split into sub-pools if you have a large enough SGA, and each sub-pool consists of a number of non-overlapping granules. In 11g each sub-pool also split into four sub-sub-pools known as durations.

What was there before 12c arrived
Starting from Oracle 10g each sub-pool in SGA was divided in to four durations.
Let’s check the distribution by generating the Heap Dump for shared pool, here i used oradebug with level 2 (This provides you the full SGA Summary or you can try with level 2050 to get full summary with contents).

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_8127.trc

.....
******************************************************
HEAP DUMP heap name="sga heap(1,0)"  desc=380030610
Total heap size    =218102664
Total free space   =  1066928
Total reserved free space   =  8439520
Unpinned space     = 38812528  rcr=11971 trn=17906
Permanent space    =208595160
HEAP DUMP heap name="sga heap(1,1)"  desc=380031e68
Total heap size    = 67108512
Total free space   =  2912528
Total reserved free space   =  1382816
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,2)"  desc=3800336c0
Total heap size    =167771280
Total free space   = 92743480
Total reserved free space   =  3852856
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,3)"  desc=380034f18
Total heap size    =268434048
Total free space   = 74547592
Total reserved free space   = 13497472
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,0)"  desc=380039e38
Total heap size    =201325536
Total free space   =    17200
Total reserved free space   =  8435920
Unpinned space     = 26474112  rcr=7934 trn=8094
Permanent space    =192871456
HEAP DUMP heap name="sga heap(2,1)"  desc=38003b690
Total heap size    = 83885640
Total free space   = 48723768
Total reserved free space   =  1035792
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,2)"  desc=38003cee8
Total heap size    =369096816
Total free space   =258674312
Total reserved free space   = 16982464
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,3)"  desc=38003e740
Total heap size    =218102664
Total free space   = 17202608
Total reserved free space   = 10966696
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,0)"  desc=380043660
Total heap size    =184548408
Total free space   =    13008
Total reserved free space   =  5061928
Unpinned space     = 26943408  rcr=4930 trn=9425
Permanent space    =179472608
HEAP DUMP heap name="sga heap(3,1)"  desc=380044eb8
Total heap size    = 67108512
Total free space   = 27568352
Total reserved free space   =     4744
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,2)"  desc=380046710
Total heap size    =352319688
Total free space   =233302736
Total reserved free space   = 15981216
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,3)"  desc=380047f68
Total heap size    =385873944
Total free space   =143746536
Total reserved free space   = 19402616
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
.....
******************************************************

So above stats shows that we have three sub-pools [Sub pool (1,0), (1,1), (1,2), (1,3) ….. (3,0), (3,1), (3,2), (3,3)] of SGA Heaps with Four Durations each heap. And every duration has its own size, free space and reserved free space. This type of distribution possibly causes the ORA 4031 even when you have enough free space in other durations and this is what the actual cause in my earlier case, but here we won’t discuss how we fixed that.

Okay so now lets do the same with 12c database. Lets generate the Heap Dump for SGA in 12c database and see the distributions of durations here. Once again we will use the oradebug to dump heapdump with Level 2.

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/tunedb/tunedb/trace/tunedb_ora_11054.trc


******************************************************
HEAP DUMP heap name="sga heap(1,0)"  desc=0x60103678
 extent sz=0xfe0 alt=304 het=32767 rec=9 flg=0x82 opc=0
 parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000 heap=(nil)
 fl2=0x24, nex=(nil), idx=1, dsxvers=1, dsxflg=0x0
 dsx first ext=0x73000000
 dsx empty ext bytes=0  subheap rc link=0x730000c0,0x730000c0

******************************************************
HEAP DUMP heap name="sga heap(1,3)"  desc=0x60107f80
 extent sz=0xfe0 alt=304 het=32767 rec=9 flg=0x82 opc=0
 parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000 heap=(nil)
 fl2=0x24, nex=(nil), idx=1, dsxvers=1, dsxflg=0x0
 dsx first ext=0x7e000000
 dsx empty ext bytes=0  subheap rc link=0x7e0000c0,0x7e0000c0

Alright, so here we only have two groups of SGA durations – ‘Sub pool 1, duration 0’ and ‘Sub Pool 1 duration 3’ for improved sharability and to avoid ORA 4031 errors.

Hope It helps
Prashant Dixit

Posted in Advanced | Tagged: , , , | Leave a Comment »

Script to measure the Source & Contribution of any Wait Event within AWR snapshots.

Posted by FatDBA on January 1, 2018

🤘 Happy New Year Everyone! 🤘

So, here i am with the maiden post of this year.
Today i will discuss about a script that might be handy and useful while you are investigating any performance problem with the database.
In fact this is one of my favorite script which i always considers to run at a point when we have identified the leading wait event and the time-frame of the spike or for any transient variation in system behavior. This script helps you to measure the source and contribution or the impact of any specific wait event.

The script is a join between ASH views and historical views (dba_hist_active_sess_history AND dba_hist_snapshot) and provides you stats based on Average Number of Active Sessions (AAS) were waiting on this event during the period and with what total contribution in terms of percentage.

Lets work on a use case.
Suppose you have a system with huge User IOs happening, as conformed by any report or tool (AWR, ASH, Via any Script, OEM etc.)

For example using AWR we found one of the User IO wait class event ‘db file scattered read’ was leading the ‘Top Foreground Wait Events’ charts on a specific time when we have the high User IO load on system.

                                           Total Wait       Avg   % DB Wait
Event                                Waits Time (sec)      Wait   time Class
------------------------------ ----------- ---------- --------- ------ --------
db file scattered read              10,282       82.8    41.50ms  43.9 User I/O

So, now you have identified the leading wait event and the timeframe with maximum intensity or frequency, Lets call the script.

Script Inputs:
Example:

Enter value for event_class:
User I/O
Enter value for event_name: db file scattered read
Enter value for begin_snap: 193
Enter value for end_snap: 200
Enter value for dbid: 2896132084

WITH
events AS (
SELECT /*+ MATERIALIZE NO_MERGE */
       SUBSTR(TRIM(h.sql_id||' '||h.program||' '||
       CASE h.module WHEN h.program THEN NULL ELSE h.module END), 1, 128) source,
       h.dbid,
       COUNT(*) samples
  FROM dba_hist_active_sess_history h,
       dba_hist_snapshot s
 WHERE h.wait_class = TRIM('&Event_Class') AND h.event = TRIM('&Event_Name')
   AND h.snap_id BETWEEN &Begin_Snap AND &End_Snap
   AND h.dbid = &dbid
   AND s.snap_id = h.snap_id
   AND s.dbid = h.dbid
   AND s.instance_number = h.instance_number
 GROUP BY
       h.sql_id,
       h.program,
       h.module,
       h.dbid
 ORDER BY
       3 DESC
),
total AS (
SELECT SUM(samples) samples,
       SUM(CASE WHEN ROWNUM > 15 THEN samples ELSE 0 END) others
  FROM events
)
SELECT e.source,
       e.samples,
       ROUND(100 * e.samples / t.samples, 1) percent,
       (SELECT DBMS_LOB.SUBSTR(s.sql_text, 1000, 1) FROM dba_hist_sqltext s WHERE s.sql_id = SUBSTR(e.source, 1, 13) AND s.dbid = e.dbid AND ROWNUM = 1) sql_text
  FROM events e,
       total t
 WHERE ROWNUM  0.1
 UNION ALL
SELECT 'Others',
       others samples,
       ROUND(100 * others / samples, 1) percent,
       NULL sql_text
  FROM total
 WHERE others > 0
   AND ROUND(100 * others / samples, 1) > 0.1;

Below is the output of the query.

SOURCE                                                                      SAMPLES      PERCENT       SQL_TEXT
------------------------------------------------------------------------  ---------- ---------------  -------------------------------
5av23g8w7f3ka sqlplus@dixitlab.localdomain (TNS V1-V3)                          1          45.23       select * from dixit.bigtab
9a7gbkahasj1a sqlplus@dixitlab.localdomain (TNS V1-V4)                          1           5          select * from dixit.gianttable
OTHERS                                                                          14         49.77

Okay so result shows one of the SQL statement with SQL id 5av23g8w7f3ka was responsible for more than 45% of these waits.
So, now you have the evidence and can start troubleshooting this specific statement to reduce the USER IOs.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | 3 Comments »

Auto Stats Gathering in Oracle 12c & Something Interesting :)

Posted by FatDBA on December 14, 2017

Hi Fellas,
Starting from Oracle 12c there is a new feature added which collects the statistics when you perform the Bulk Loads when using any of the two methods:
– CREATE TABLE AS SELECT (CTAS)
– INSERT INTO … SELECT (Into an empty table using DPR or Direct Path Read).

SQL> explain plan for create table dixittab as select * from scottisdead;
Explained.
 
SQL> select * from table(DBMS_XPLAN.DISPLAY);
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 14312189
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |               |   500K|  8812K|   612   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | DIXITTAB      |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |               |   500K|  8812K|   371   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SCOTTISDEAD   |   500K|  8812K|   371   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------
10 rows selected. 

Above in execution plan you’ll see the new operation named “OPTIMIZER STATISTICS GATHERING” at ID 2.
Lets verify if the stats are collected.

SQL> select table_name, last_analyzed from user_tables where table_name = 'DIXITTAB';
 
TABLE_NAME       LAST_ANALYZED
---------------- -------------
DIXITTAB         12-DEC-17

Yup, stats were collected!
Same way stats will be auto collected during the other type of Bulk Load method (INSERT INTO .. SELECT).

There may be times when you want to disable this feature, situations like.
– Long/Huge Insert operations which is taking huge time on STATS GATHERING.
– With an extremely large dataset where you don’t want to collect stats.

In order to achieve that we have the option, with the use of a hint which instructs oracle to not gather table statistics.

SQL> create table dixittab as select /*+NO_GATHER_OPTIMIZER_STATISTICS */* from scottisdead;

Now, something interesting i would like to discuss …..
Is there any other condition when the stats won’t be collected automatically except barring it using NO_GATHER_OPTIMIZER_STATISTICS Hint ?

Lets try to do some conventional bulk loading using INSERT INTO .. SELECT method.
To do some tests – I am intentionally commenting few of the columns both the tables. I’ve commented DATE_VAL column of newly created table TABLE1 and DATE_VALUE of the table selected SAMPLE.


SQL> create table table1 (ident number, date_val date, text_val varchar2(4000));
Table created.


SQL> insert /*+ append */ into table1
(IDENT
--, DATE_VAL
, TEXT_VAL)
SELECT ID
--, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE; 

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------

Plan hash value: 1523099961
-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |        |   100K|  2539K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT    | TABLE1 |       |       |            |          |
|   2 |   TABLE ACCESS FULL| SAMPLE |   100K|  2539K|   154   (1)| 00:00:01 |
-----------------------------------------------------------------------------

9 rows selected.

😦 😦 Why, the auto stats gathering behavior not repeated this time ??

This happened because Oracle needs inclusion of all the columns of a table in order to kick in the OPTIMIZER STATISTICS GATHERING operation —> Let me show you what i said in above statement.

SQL> insert /*+ append */ into table1
(IDENT
, DATE_VAL
, TEXT_VAL)
SELECT ID
, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE;  

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------

Plan hash value: 1523099961
-------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | TABLE1 |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SAMPLE |   100K|  3320K|   154   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------

10 rows selected.

Yes, the the stats were collected this time when we’ve included all the columns of the tables.
I haven’t seen any documentation on this restriction on this new feature of Oracle 12c. Hope Oracle adds this soon this to their documentation 🙂 …..

Hope It Helps!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

Optimizer making wrong decisions. Histogram is that you ?

Posted by FatDBA on December 4, 2017

Hey Mates,

Would discuss a scenario that i faced some time back with one of the customer where one of the query (Part of Monthly application maintenance) started taking huge amount of time to complete.
During the analysis I’ve found it doing some extremely expensive Full Table Scans on a fat table when it shouldn’t. Let me elaborate what was the issue.

After reading the execution plan and the extended traces of Optimizer (10053) to understand the decisions made by the CBO.

Traces gave me a hint that the optimizer was able to accurately estimate the cardinality for all of the predicates as long as string is 32 bytes or less. But at one of the operation step in execution plan the estimations were wrong and crooked. I saw the CBO decided to ‘Return each of the row’ irrespective of strings and it was actually happening because the histogram contains only the first 32 characters of the string data.

So the histogram was considering only first 32 characters of the column value and the optimizer this particular value i.e “siebelm01-vip@2002:101:11f::101:1” is happening 5776 times in the table. So it ignored the index and used full table scan. Oracle will only consider the first 32 characters in the column in generating histograms and in such cases what we faced as a aftermath, the optimizer will make wrong decisions.

So as a immediate solution to the problem I’ve deleted the histogram on this specific column and this plan fixed this query plan change issue.

BEGIN
dbms_stats.delete_column_stats(ownname=>'SIEBELADMIN', tabname=>'TX2121', colname=>'C131HJA12', col_stat_type=>'HISTOGRAM');
END;
/

Other fixes:
– In case of longer strings (32 or more) in case of text search use Oracle Text or CONTAINS operator in SQL logic.

Hope That Helps!
Prashant Dixit

Posted in Advanced | Tagged: , , | Leave a Comment »

How to use “SQL Patch” technique to fix Query performance issues.

Posted by FatDBA on November 30, 2017

Hi Guys,

In my last post “How to fix SQL Plan issues using OUTLINE DATA.” where we learned how to fix wrong cost estimates by the CBO using OUTLINE DATA of optimal plan available. But there are few issues with the approach and only worked with ad-hoc statements, as failed to handle situations when

– How to deal with situations when the SQL is coming from bundled applications and cannot be modified ?
– How to look for less number of hints leading to better execution plan?
– How, if it’s not possible to use the SQL BASELINES.

As discussed in the same last post where we added the full list of OUTLINE DATA, it seems little complicated, dirty and disordered to add the full outline set to the SQL statement.
ex: select /*+ …full outline here…*/

So, how to add required set of HINTS to force the better plan for the SQL.

Lets do a test to explain this subject in detail.

SQL TEXT: select count(*) from dixit.test;
Problem Statement: Query taking huge time to complete the COUNT of rows for TEST table and doing a FULL TABLE SCAN despite of Indexes.

Now after querying the DBA_HIST_SQLSTAT view i saw a total of 2 execution plans available and query is somehow picking the wrong plan or the costliest plan.
So, lets grab the OUTLINE data of the better plan first.

select * from table(dbms_xplan.display_awr(‘SQL_ID’, PHV, format => ‘ADVANCED’));
OR
EXPLAIN PLAN for select count(*) from dixit.test;
select * from table(dbms_xplan.display(format=>’+OUTLINE’));

Outline Data
————-
/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@"SEL$1" "TEST"@"SEL$1")
OUTLINE_LEAF(@"SEL$1")
ALL_ROWS
DB_VERSION('11.2.0.4')
OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

Lets create the SQL PATCH for the statement to force the SQL to use it with required number of HINTS.

begin
dbms_sqldiag_internal.i_create_patch(
sql_text => 'select count(*) from dixit.test',
hint_text => 'INDEX_RS_ASC(@"SEL$1" "STAGE"@"SEL$1")',
name => 'test_sql_patch_dixit');
end; /

I ran the query once again and as expected it ran this time with very less elapsed time and used Index Scan in place of expensive FTS.
I also noticed a new NOTE coming just below the execution plan.

Note
--------
- SQL patch "test_parallel_patch" used for this statement

In case if want to delete the patch.

begin
DBMS_SQLDIAG.DROP_SQL_PATCH(name => 'test_sql_patch_dixit');
end;
/


HOPE IT HELPS!
Prashant Dixit

Posted in Advanced | Tagged: , , , | Leave a Comment »

How to fix SQL Plan issues using OUTLINE DATA.

Posted by FatDBA on November 30, 2017

Hi Mates,
Today would like to discuss about the ‘SQL Plan Stability’, specially situations when the performance of a SQL statement degrades and performs poor in one environment and working absolutely good in other spheres.
I experienced such issues a lot during past few years while working as a performance consultant for many of the projects, but recently i encountered similar issue and would like to discuss about the same.
In short the situation is given below for that problematic SQL

Same request they say takes in SIT 268 sec:
• I run in SIT again –> 500 secs
• I run also in DEV –> 20 secs
• I run in preSIT –> 5 secs

Below are the details for the given Ill-SQL.

SQL Text:
SELECT a.port_inst_id AS mdf_es_port_inst_id, a.status AS mdf_es_port_status, d.port_inst_id AS mdf_ls_port_inst_id, d.status AS mdf_ls_port_status, regexp_substr( c.descr, :"SYS_B_00", :"SYS_B_01", :"SYS_B_02" ) AS naming_area, a.status AS mdf_es_port_status2, DECODE( a.reservation_inst_id, NULL, :"SYS_B_03", :"SYS_B_04" ) AS mdf_es_active_reservation, a.port_access_id AS lic_identity, egi_get_uda_value( a.equip_inst_id, :"SYS_B_05", :"SYS_B_06", :"SYS_B_07" ) AS kc, b.name AS coordinate, a.port_hum_id egi_port_name, egi_get_uda_value( d.equip_inst_id, :"SYS_B_08", :"SYS_B_09", :"SYS_B_10" ) AS cabinet_name, e.name AS termination_block_name, d.port_access_ id AS pair_name, egi_get_uda_value( d.equip_inst_id, :"SYS_B_11", :"SYS_B_12", :"SYS_B_13" ) AS signing_system, egi_get_uda_value( a.port_inst_id, :"SYS_B_14", :"SYS_B_15", :"SYS_B_16" ) AS processor_exchange, a.circ_path_inst_id AS es_circ_path_inst_id, a.next_path_inst_id AS es_next_path_inst_id, d.circ_path_inst_id AS ls_circ_path_inst_id, d.next_path_inst_id AS ls_next_path_inst_id, DECODE( d.role, :"SYS_B_17", :"SYS_B_18", :"SYS_B_19", :"SYS_B_20", :"SYS_B_21", :"SYS_B_22", :"SYS_B_23", :"SYS_B_24", NULL ) AS cabling_type FROM epa a, card_inst b, equip_inst c, epa d, card_inst e, xxx_xxxx_xxxxxxxx f WHERE a.port_inst_id = f.port_inst_id AND f.val_attr_inst_id = ( SELECT val_attr_inst_id FROM xxx_xxxx_xxxxxxxx WHERE group_name = :"SYS_B_25" AND attr_name = :"SYS_B_26" ) AND a.card_inst_id = b.card_inst_id AND b.type = :"SYS_B_27" AND a.equip_inst_id = c.equip_inst_id AND a.z_wired_port_inst_id = d.port_inst_id (+) AND d.card_inst_id = e.card_inst_id (+) AND regexp_substr( a.port_access_id, :"SYS_B_28", :"SYS_B_29", :"SYS_B_30" ) = :"SYS_B_31" AND to_number(regexp_substr( a.port_access_id, :"SYS_B_32", :"SYS_B_33", :"SYS_B_34" ) ) >= to_number(: "SYS_B_35") AND to_number(regexp_substr( a.port_access_id, :"SYS_B_36", :"SYS_B_37", :"SYS_B_38" ) ) <= to_number(:"SYS_B_39") AND f.attr_value = :"SYS_B_40" ORDER BY :"SYS_B_41"

I see around 5 different PHV’s or plans coming for this statement and last two 18733286 and 2202641467 are the worst plans here, first one with PHV 1663132818 is the BEST among all others.
1663132818 is with very less elapsed time, CPU Usage, IO time, LIOs, Disk reads, Rows Processed and Cost.
** From DBA_HIST_SQLSTAT view.

Let’s collect the OUTLINE and QUERY Block details of the statement, and this can be collected using
select * from table(dbms_xplan.display_awr(‘SQL_ID’, PHV, format => ‘ADVANCED’));

Query Block Name / Object Alias (identified by operation id):
————————————————————-

1 - SEL$841DDE77
10 - SEL$841DDE77 / VAL_ATTR_NAME@SEL$2
11 - SEL$841DDE77 / VAL_ATTR_NAME@SEL$2
12 - SEL$841DDE77 / F@SEL$1
13 - SEL$841DDE77 / F@SEL$1
14 - SEL$841DDE77 / A@SEL$1
15 - SEL$841DDE77 / A@SEL$1
16 - SEL$841DDE77 / C@SEL$1
17 - SEL$841DDE77 / C@SEL$1
18 - SEL$841DDE77 / D@SEL$1
19 - SEL$841DDE77 / D@SEL$1
20 - SEL$841DDE77 / E@SEL$1
21 - SEL$841DDE77 / E@SEL$1
22 - SEL$841DDE77 / B@SEL$1
23 - SEL$841DDE77 / B@SEL$1

Outline Data
————-

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
OPT_PARAM('optimizer_index_cost_adj' 10)
FIRST_ROWS(1)
OUTLINE_LEAF(@"SEL$841DDE77")
MERGE(@"SEL$683B0107")
OUTLINE(@"SEL$C772B8D1")
UNNEST(@"SEL$2")
OUTLINE(@"SEL$683B0107")
OUTLINE(@"SEL$7511BFD2")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$1")
INDEX_RS_ASC(@"SEL$841DDE77" "VAL_ATTR_NAME"@"SEL$2" ("VAL_ATTR_NAME"."GROUP_NAME" "VAL_ATTR_NAME"."ATTR_NAME"))
INDEX_RS_ASC(@"SEL$841DDE77" "F"@"SEL$1" ("PORT_ATTR_SETTINGS"."ATTR_VALUE"))
BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$841DDE77" "F"@"SEL$1")
INDEX_RS_ASC(@"SEL$841DDE77" "A"@"SEL$1" ("EPA"."PORT_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "C"@"SEL$1" ("EQUIP_INST"."EQUIP_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "D"@"SEL$1" ("EPA"."PORT_INST_ID"))
INDEX_RS_ASC(@"SEL$841DDE77" "E"@"SEL$1" ("CARD_INST"."CARD_INST_ID"))
INDEX(@"SEL$841DDE77" "B"@"SEL$1" ("CARD_INST"."CARD_INST_ID"))
LEADING(@"SEL$841DDE77" "VAL_ATTR_NAME"@"SEL$2" "F"@"SEL$1" "A"@"SEL$1" "C"@"SEL$1" "D"@"SEL$1" "E"@"SEL$1"
"B"@"SEL$1")
USE_NL(@"SEL$841DDE77" "F"@"SEL$1")
USE_NL(@"SEL$841DDE77" "A"@"SEL$1")
USE_NL(@"SEL$841DDE77" "C"@"SEL$1")
USE_NL(@"SEL$841DDE77" "D"@"SEL$1")
USE_NL(@"SEL$841DDE77" "E"@"SEL$1")
USE_NL(@"SEL$841DDE77" "B"@"SEL$1")
NLJ_BATCHING(@"SEL$841DDE77" "B"@"SEL$1")
END_OUTLINE_DATA
*/

Let’s use the outline of the good plan as a hint to force the good execution plan to be used and check the execution time for the query.
Run the query from SQLPLUS using the hint:

SELECT
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE(‘12.1.0.2’)
DB_VERSION(‘12.1.0.2’)
OPT_PARAM(‘optimizer_index_cost_adj’ 10)
FIRST_ROWS(1)
OUTLINE_LEAF(@”SEL$841DDE77″)
MERGE(@”SEL$683B0107″)
OUTLINE(@”SEL$C772B8D1″)
UNNEST(@”SEL$2″)
OUTLINE(@”SEL$683B0107″)
OUTLINE(@”SEL$7511BFD2″)
OUTLINE(@”SEL$2″)
OUTLINE(@”SEL$1″)
INDEX_RS_ASC(@”SEL$841DDE77″ “VAL_ATTR_NAME”@”SEL$2” (“VAL_ATTR_NAME”.”GROUP_NAME” “VAL_ATTR_NAME”.”ATTR_NAME”))
INDEX_RS_ASC(@”SEL$841DDE77″ “F”@”SEL$1” (“PORT_ATTR_SETTINGS”.”ATTR_VALUE”))
BATCH_TABLE_ACCESS_BY_ROWID(@”SEL$841DDE77″ “F”@”SEL$1″)
INDEX_RS_ASC(@”SEL$841DDE77” “A”@”SEL$1” (“EPA”.”PORT_INST_ID”))
INDEX_RS_ASC(@”SEL$841DDE77″ “C”@”SEL$1” (“EQUIP_INST”.”EQUIP_INST_ID”))
INDEX_RS_ASC(@”SEL$841DDE77″ “D”@”SEL$1” (“EPA”.”PORT_INST_ID”))
INDEX_RS_ASC(@”SEL$841DDE77″ “E”@”SEL$1” (“CARD_INST”.”CARD_INST_ID”))
INDEX(@”SEL$841DDE77″ “B”@”SEL$1” (“CARD_INST”.”CARD_INST_ID”))
LEADING(@”SEL$841DDE77″ “VAL_ATTR_NAME”@”SEL$2” “F”@”SEL$1” “A”@”SEL$1” “C”@”SEL$1” “D”@”SEL$1” “E”@”SEL$1”
“B”@”SEL$1″)
USE_NL(@”SEL$841DDE77” “F”@”SEL$1″)
USE_NL(@”SEL$841DDE77” “A”@”SEL$1″)
USE_NL(@”SEL$841DDE77” “C”@”SEL$1″)
USE_NL(@”SEL$841DDE77” “D”@”SEL$1″)
USE_NL(@”SEL$841DDE77” “E”@”SEL$1″)
USE_NL(@”SEL$841DDE77” “B”@”SEL$1″)
NLJ_BATCHING(@”SEL$841DDE77” “B”@”SEL$1”)
END_OUTLINE_DATA
*/

A.PORT_INST_ID AS MDF_ES_PORT_INST_ID, A.STATUS AS MDF_ES_PORT_STATUS, D.PORT_INST_ID AS MDF_LS_PORT_INST_ID, D.STATUS AS MDF_LS_PORT_STATUS, REGEXP_SUBSTR(C.DESCR,'[^ ]+’, 1,1) AS NAMING_AREA, A.STATUS AS MDF_ES_PORT_STATUS2, DECODE(A.RESERVATION_INST_ID, NULL, ‘N’, ‘Y’) AS MDF_ES_ACTIVE_RESERVATION, A.PORT_ACCESS_ID AS LIC_IDENTITY, EGI_GET_UDA_VALUE(A.EQUIP_INST_ID,’EQUIPMENT’,’Eqp Legacy Names’,’TAD-N’) AS KC, B.NAME AS COORDINATE, A.PORT_HUM_ID EGI_PORT_NAME, EGI_GET_UDA_VALUE(D.EQUIP_INST_ID,’EQUIPMENT’,’Eqp Legacy Names’,’TAD-N’) AS CABINET_NAME, E.NAME AS TERMINATION_BLOCK_NAME, D.PORT_ACCESS_ID AS PAIR_NAME, EGI_GET_UDA_VALUE(D.EQUIP_INST_ID,’EQUIPMENT’,’Eqp Legacy Names’,’SIGNSYST’) AS SIGNING_SYSTEM, EGI_GET_UDA_VALUE(A.PORT_INST_ID,’PORT’,’Copper Port Info’,’Processor Exchange’) AS PROCESSOR_EXCHANGE, A.CIRC_PATH_INST_ID AS ES_CIRC_PATH_INST_ID, A.NEXT_PATH_INST_ID AS ES_NEXT_PATH_INST_ID, D.CIRC_PATH_INST_ID AS LS_CIRC_PATH_INST_ID, D.NEXT_PATH_INST_ID AS LS_NEXT_PATH_INST_ID, DECODE(D.ROLE,’Primary’,’P’,’Secondary’,’S’,’Spare’,’R’,’Junction’,’F’,NULL) AS CABLING_TYPE FROM EPA A, CARD_INST B, EQUIP_INST C, EPA D, CARD_INST E, PORT_ATTR_SETTINGS F WHERE A.PORT_INST_ID =F.PORT_INST_ID AND F.VAL_ATTR_INST_ID= (SELECT VAL_ATTR_INST_ID FROM VAL_ATTR_NAME WHERE GROUP_NAME=’Copper Port Info’ AND ATTR_NAME =’Processor Exchange’ ) AND A.CARD_INST_ID =B.CARD_INST_ID AND B.TYPE =’ES’ AND A.EQUIP_INST_ID =C.EQUIP_INST_ID AND A.Z_WIRED_PORT_INST_ID =D.PORT_INST_ID(+) AND D.CARD_INST_ID =E.CARD_INST_ID(+) AND REGEXP_SUBSTR(A.PORT_ACCESS_ID,'[^-]+’, 1,1) =:B4 AND TO_NUMBER(REGEXP_SUBSTR(A.PORT_ACCESS_ID,'[^-]+’, 1,2))>=TO_NUMBER(:B3 ) AND TO_NUMBER(REGEXP_SUBSTR(A.PORT_ACCESS_ID,'[^-]+’, 1,2))<=TO_NUMBER(:B2 ) AND F.ATTR_VALUE =:B1 ORDER BY 8

I’ve executed the query in SIT and it’s taking no more than 5 seconds now with those OUTLINES, that is reeeeally good 😊

Now when the best plan is identified and tested, Question comes “HOW TO FIX THIS PERMANENTLY ??”

This can be done using one of the Oracle provided script named “coe_xfr_sql_profile.sql” which is part of popular SQLTXPLAIN/SQLT tool.
Download it from Metalink website (Direct Link: https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=458593899434965&parent=DOCUMENT&sourceId=1955195.1&id=215187.1&_afrWindowMode=0&_adf.ctrl-state=11co9htqxp_387#aref_section13)

In order to fix or correct the cost estimates for CBO to adopt a good execution plan we can follow below steps.
• Run the script coe_xfr_sql_profile.sql as SYSDBA user providing the sql_id and the good Plan Hash Value (PHV)
SQL> START coe_xfr_sql_profile.sql SQL_ID PHV

• Above step generates a script named in the format: “coe_xfr_sql_profile_SQL_ID_PLAN_HASH_VALUE.sql” i.e. including the sql_id and plan hash value.
Example: “coe_xfr_sql_profile_ SQLID_PHV.sql”

• If you like to use the profile for all literals, you can modify the force_match parameter in the generated script from false to true: force_match => TRUE
This will ensure the profile will be enabled even when different literals are used in the SQL query.

• Run the generated script as SYSDBA user in order to correct the optimizer cost estimates used in the execution plan for the sql_id and encourage it to use the desired plan obtained from the stated plan hash value.

• Next we can check if profile is accepted successfully
select SQL_ID, SQL_PROFILE,PLAN_HASH_VALUE from V$SQL where SQL_ID=’ SQL ID’;

• In case if profile is required to be disabled/dropped then do following steps.
The profile can be disabled or dropped as follows using the the SQL_PROFILE name returned from the query above:

EXEC DBMS_SQLTUNE.ALTER_SQL_PROFILE(Name => ”, Attribute_Name => ‘STATUS’, Value => ‘DISABLED’);

To Drop the Profile:
EXEC DBMS_SQLTUNE.DROP_SQL_PROFILE(Name => ”);

This way we can test and fix a Plan Stability issues with any SQL statement!

Hope It Helps!
Prashant Dixit

Posted in Advanced | Tagged: , , | 3 Comments »

Are the Cardinality Estimates Correct in my Execution Plan ?

Posted by FatDBA on September 26, 2017

Struck in a difficult performance issue related with a SQL and you have to verify if the Cardinality estimates made by the MIGHTY CBO are correct, No idea how t0 do that 😦 😦

Lets things make little easy for ourselves!
Let me take an example and explain how to do this.

SQL Statement (From my Personal Test Environment):
SELECT COUNT (DISTINCT SB_NO) FROM OPS$EXP.C_AL_SB WHERE SB_NO IN (SELECT DISTINCT SB_NO FROM OPS$EXP.C_AL_AWB WHERE EGM_DT BETWEEN :1 AND :1 ) AND ERR_MESG =’S’

Below is the execution plan for the SQL (Lets forgot about the behemoth elapsed time and Cost and Rows Processed in the plan for a minute 🙂 ) ….

So the above plan doesn’t show any estimations or Cardinality details what it considered during the creation of the plan, But starting from 10g we have GATHER_PLAN_STATISTICS hint. The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement.

These execution statistics are then shown next to the original Optimizer estimates in the execution plan if you use the function DBMS_XPLAN.DISPLAY_CURSOR to display the plan. You also have to set the FORMAT parameter to ‘ALLSTATS LAST’ (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>’ALLSTATS LAST’)).

SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT (DISTINCT SB_NO) FROM OPS$EXP.C_AL_SB WHERE SB_NO IN (SELECT DISTINCT SB_NO FROM OPS$EXP.C_AL_AWB WHERE EGM_DT BETWEEN :1 AND :1 ) AND ERR_MESG =’S’;

The execution plan for the query is as follows:

The original Optimizer estimates are shown in the E-Rows column while the actual statistics gathered during execution are shown in the A-Rows column.

Posted in Advanced | Tagged: | 2 Comments »

How to tune the IO contentions related with the Compaction in Cassandra ?

Posted by FatDBA on August 20, 2017

Hi Fellas,
Back and this time with some performance tuning scopes for Cassandra DB during the ‘Compaction’ process.
Before i proceed, would like to explain a bit about the compaction in Cassandra and what exactly is this and why a necessary evil …

Compaction in Cassandra refers to the operation of merging multiple SSTables into a single new one. Typically, compaction is done in a database for two primary reasons:

– To reduce the storage usage.
– To improve read performance by merging keys and obtaining a consolidated index.

For example, in Apache Cassandra, data files are merged periodically to form compacted SSTables.

There is a good chance of contention happening in database due to Compaction activity as the Compaction increases I/O contention on SSTable data read. Writing data in Cassandra database is generally fast and the write impacts may not be seen but reading data from SSTables will be slow in case when I/O contention increases due to compaction activities and degrades the performance of the database.

First would like to discuss how to identify the compaction related contentions on the database.
– We can use the “nodetool tablestats” or the old “nodetool cfstats” command to
monitor or watch-keep SSTables.
Below is a sample result from one of the Cassandra database server, here we need to check
– Check if the count is keep on growing, because that points out that there may be contention between reading SST
and the compaction process.
– Read generally slows down due to an obvious reason of data distributed or fragmented across many SSTs and
Compaction running continuous in the background.

%nodetool tablestats -H dixit.playlist
Keyspace: dixit
Read Count: 182849
Read Latency: 0.11363755339104945 ms.
Write Count: 435355
Write Latency: 0.01956930550929701 ms.
Pending Flushes: 0
Table: standard1
SSTable count: 2
Space used (live): 51.62 MB
Space used (total): 51.62 MB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 302.36 KB
SSTable Compression Ratio: 0.0
Number of keys (estimate): 376390
Memtable cell count: 200120
Memtable data size: 45.16 MB
Memtable off heap memory used: 0 bytes
Memtable switch count: 2
Local read count: 182849
Local read latency: 0.125 ms
Local write count: 435355
Local write latency: 0.022 ms
Pending flushes: 0
Bloom filter false positives: 11
Bloom filter false ratio: 0.00000
Bloom filter space used: 265.81 KB
Bloom filter off heap memory used: 265.8 KB
Index summary off heap memory used: 36.57 KB
Compression metadata off heap memory used: 0 bytes
Compacted partition minimum bytes: 216 bytes
Compacted partition maximum bytes: 258 bytes
Compacted partition mean bytes: 258 bytes
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1

Below is the command that can be used to check for compaction statistics, here you need to look at the ‘pending tasks’, and ‘bytes total in progress’.

$ nodetool compactionstats
pending tasks: 5
compaction type keyspace table completed total unit progress
Compaction Keyspace1 Standard1 282310680 302170540 bytes 93.43%
Compaction Keyspace1 Standard1 58457931 307520780 bytes 19.01%
Active compaction remaining time : 0h00m16s

Solution to the problem
1. First one is quite simple – Avoid merging of update/delete requests.
2. Reduce the frequency of in-memory objects (In Memtables) flush.

This can be done by increasing the size of the memtables to avoid or stop database to perform frequent flushes.
– Less number of flushes leads to fewer SSTs compaction.
– Less Compaction reduces the I/Contentions and this in turn improve reads.
– There are couple of parameters that you can adjust in your cassandra.yaml file to control the flushing.
i.e. memtable_flush_after_mins, memtable_throughput_in_mb , memtable_operations_in_millions.

3. One more solution but that only applies on systems where this stress in IO is not much frequent, we can reduce
the “thread priority” which reduces the IOs.
As lowering the priority slows down the compaction writes but only applies if it doesn’t happen frequently.

Add below lines in cassandra-env.conf file (Under /conf folder) to lower the compaction priority.

JVM_OPTS=”$JVM_OPTS -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Dcassandra.compaction.priority=1″

One last line would like to add.
In case when the IO is a genuine problem, you will need to add more nodes or replace disks with better performing one’s or high IO disks.

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , , | Leave a Comment »

EXECUTION PLAN: “automatic DOP: skipped because of IO calibrate statistics are missing”

Posted by FatDBA on September 23, 2014

Recently during one Performance Problem i have attached one better SQL Profile to the statement which includes to add DOP (Degree Of Parallelism) to reduce the impacts of a definite FTS (Full Table Scan) but found one NOTE coming during the execution plan generation which reads
“automatic DOP: skipped because of IO calibrate statistics are missing”

SQL> execute dbms_sqltune.accept_sql_profile(task_name => ‘TASK_33935’,task_owner => ‘SYS’, replace => TRUE, profile_type =>DBMS_SQLTUNE.PX_PROFILE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.86
SQL> explain plan for select count(*) from DIXIT_EVW_ETAILQ;

Explained.

Elapsed: 00:00:00.03
SQL> @xplan

PLAN_TABLE_OUTPUT
———————————————————————————————————————————————————————–

————————————————————————————————————————————-
Plan hash value: 584586630

—————————————————————————————————————-
| Id  | Operation              | Name             | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
—————————————————————————————————————-
|   0 | SELECT STATEMENT       |                  |     1 | 36776   (1)| 00:07:22 |        |      |            |
|   1 |  SORT AGGREGATE        |                  |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |                  |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000         |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |                  |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |                  |   177M| 36776   (1)| 00:07:22 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| DIXIT_EVW_ETAILQ |   177M| 36776   (1)| 00:07:22 |  Q1,00 | PCWP |            |
—————————————————————————————————————-

Note
—–
   – automatic DOP: skipped because of IO calibrate statistics are missing
   – SQL profile “SYS_SQLPROF_0148a0b0821b0005” used for this statement

The ‘AUTOMATIC DOP’ is skipped because I/O calibration is not run to gather the required statistics. Required statistics can be collected using DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure.

SQL> select status from V$IO_CALIBRATION_STATUS;

STATUS
——————————-
NOT AVAILABLE

DECLARE
lat INTEGER;
iops INTEGER;
mbps INTEGER;
BEGIN
–DBMS_RESOURCE_MANAGER.CALIBRATE_IO(, ,iops, mbps, lat);
DBMS_RESOURCE_MANAGER.CALIBRATE_IO (28, 10, iops, mbps, lat);
DBMS_OUTPUT.PUT_LINE (‘max_iops = ‘ || iops);
DBMS_OUTPUT.PUT_LINE (‘latency = ‘ || lat);
DBMS_OUTPUT.PUT_LINE (‘max_mbps = ‘ || mbps);
end;
/

max_iops = 5944
latency = 9
max_mbps = 75

18 rows selected.

Elapsed: 00:00:00.11

Issue:
If using DBMS_RESOURCE_MANAGER.CALIBRATE_IO there are times when you might recieve beloe error message
ORA-56708: Could not find any datafiles with asynchronous i/o capability

Resolution:
Then we need to enable asynch I/O, set below two values to mentioned settings in the init.ora file.

disk_asynch_io = true
filesystemio_options = asynch

SQL> select status from V$IO_CALIBRATION_STATUS;

STATUS
——————–
AVAILABLE

Now you can implement the DOP for the query and this way you can reduce FTS impacts.

Hope That Helps
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

 
%d bloggers like this: