Tales From A Lazy Fat DBA

Love all databases! – It's all about performance, troubleshooting & much more …. ¯\_(ツ)_/¯


Posts Tagged ‘Tuning’

10053 Trace Viewer : A life saver when handling colossal optimizer traces

Posted by FatDBA on October 18, 2021

Hi Everyone,

I am sure that my last post about 10053 debug traces has sparked some interest in optimizer cost calculations and estimations 🙂 As you guys are familiar, these traces aren't that easy to digest and interpret; they are pretty complicated, a humongous pile of cryptic internal information. One of the readers asked – is there any tool that can help to at least format the trace and its sections? Yes, there are a few, and one of my favorites is the 10053 viewer, which I have been using for the last few years now (lucky that I found that great blog post by Jonathan Lewis).

Click here if you want to download it!

The tool is pretty easy to use! You have to click the 'open trace file' button, browse to the 10053 trace on your system, and then click the 'show trace file' button next to it.

Now load the trace file.

Now you'll have a drop-down view to select from. Once the trace is loaded, you can access sections by using '+' to expand and '-' to minimize each section.

Expand to get more details about any particular section.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | Leave a Comment »

Oracle event 10046 debug traces, they really aren't as 'complicated' as we think – a 10046 trace broken apart!

Posted by FatDBA on October 16, 2021

Hi Everyone,

Oracle has a long list of internal debug codes, and this kind of tracing is an art, a real craft. The 10046 debug event is one of the popular methods for collecting extended SQL trace information (like SQL_TRACE=TRUE) for Oracle sessions. We especially use it to determine or distinguish the nature of a SQL tuning problem. By setting this event, you get detailed trace information about Oracle's internal execution: parse/execute/fetch calls, waits, and bind variables, which plays a very important role in analyzing the performance of the system. It provides a great source of different levels of detail about SQLs.

This post is all about breaking the trace into parts and understanding some of the critical sections, to help you read the SQL stats it captures. I am not going to show how to generate the traces, as the steps are pretty straightforward and are available on Metalink.
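
Still, for quick reference, this is a minimal way to switch the event on and off for your own session; the tracefile identifier is just a label (matching the trace file name used below), and the level controls what gets captured (4 = binds, 8 = waits, 12 = binds + waits):

ALTER SESSION SET tracefile_identifier = '10046_for_spdixitM';
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- ... run the SQL or workload you want traced ...

ALTER SESSION SET EVENTS '10046 trace name context off';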

Though there are multiple use cases for 10046, I recently used it to understand a complicated and costly PL/SQL program which calls more than 1000 different SQLs inside it, and I was interested in finding the costliest among them and why. There are surely other ways to get details of a similar level, like SQL Profiler, SQL traces etc., but none of them provides the level of detail that 10046 gives. The raw trace was then formatted with TKPROF, sorted on elapsed time executing (exeela), elapsed time fetching (fchela) and elapsed time parsing (prsela).
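
A TKPROF invocation along these lines produces that sorted report (the trace file name matches the one shown below; the output file name is illustrative):

tkprof dixitdb_ora_28282_10046_for_spdixitM.trc dixitdb_10046_sorted.prf sort=(exeela,fchela,prsela) sys=no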

The snippet below is from a live sorted 10046 trace from a production system running 10.2.0.5.0 (yes, an old application). There isn't much difference if you generate it on a newer database version; you will notice only a few changes on the latest Oracle DB versions. Okay, let's first understand a few of the keywords or column names used in the result.

TKPROF: Release 10.2.0.5.0 - Production on Fri Sep 27 03:31:42 2021

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Trace file: dixitdb_ora_28282_10046_for_spdixitM.trc
Sort options: exeela  fchela  prsela  
********************************************************************************

SELECT COUNT(*) 
FROM
 CANONTALAB.DIXIT1_SAMPLE WHERE DIXIT1_SAMPLE_NO=:B1 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      134      0.00       0.00          0          0          0           0
Execute    862      0.03       0.03          0          0          0           0
Fetch      862     46.59      45.52          0    1235246          0         862
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1858     46.63      45.55          0    1235246          0         862

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 296  (CANONTALAB)   (recursive depth: 3)

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  SORT AGGREGATE (cr=2866 pr=0 pw=0 time=114898 us)
      2   INDEX FAST FULL SCAN PK_DIXIT1_SAMPLE (cr=2866 pr=0 pw=0 time=114888 us)(object id 125001)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      2   SORT (AGGREGATE)
      2    INDEX   MODE: ANALYZED (UNIQUE SCAN) OF 'PK_DIXIT1_SAMPLE' 
               (INDEX (UNIQUE))

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                        60       12.17         43.54
  
********************************************************************************

COUNT – Represents the number of times a SQL statement was parsed, executed, or fetched.
CPU – Total CPU time in seconds for all parse, execute, or fetch calls for the SQL.
ELAPSED – Total elapsed time in seconds for all parse, execute, or fetch calls for the SQL statement.
DISK – Total number of data blocks physically read from the datafiles on disk for all parse, execute, or fetch calls.
QUERY – Total number of buffers retrieved in consistent mode for all parse, execute, or fetch calls. Usually, buffers are retrieved in consistent mode for queries. A consistent get is where Oracle returns a block from the buffer cache, checking whether it is the version that was current when the query started; in other words, a normal read of a block from the buffer cache, with a check on whether the data needs reconstructing from rollback information to give you a consistent view.
CURRENT – Total number of buffers retrieved in current mode. Buffers are retrieved in current mode for statements such as INSERT, UPDATE, and DELETE. A DB block get (or current get in TKPROF) not only gets the block as it is right now, it also stops anyone else from getting that block (in current mode!) until we change it and release it. If someone else got there first, we wait.
ROWS – Total number of rows processed by the SQL statement. This total does not include rows processed by subqueries of the SQL statement. The row statistics are reported for each of the three call types: Parse, Execute & Fetch.
PARSE – Translates the SQL statement into an execution plan, including checks for proper security authorization and checks for the existence of tables, columns, and other referenced objects. This is where the physical and logical transformations and optimizations happen.
EXECUTE – Actual execution of the statement by Oracle. For INSERT, UPDATE, and DELETE statements, this modifies the data. For SELECT statements, this identifies the selected rows.
FETCH – Retrieves rows returned by a query. Fetches are only performed for SELECT statements.

Okay, now that all the column names and table entries are explained, let me try to explain what those numbers represent.

It says 1235246 blocks were retrieved in consistent mode during the fetch operation. Since this is a SELECT statement, the blocks show up under the Fetch call; for a DML statement they would show up under Execute. Next come the misses in the library cache for each call; if there is no miss, the line won't be mentioned at all. The 1 miss for this SQL is very much acceptable, since when a SQL runs for the first time it needs to be parsed and executed and its execution plan stored, so the parse and execute calls will each show 1 miss. If you look at the statement, the parse call happened 134 times but the miss count is only 1, meaning the statement was hard parsed only once and then stored in the library cache; for the next 133 parses the already parsed statement from the library cache was reused. So we have 1 miss and 133 hits. Similarly, the execution plan was built only once, and 861 times Oracle reused the same execution plan from the library cache.

Now let's jump to the row source operations and the codes they use: cr = consistent reads, pr = physical reads, pw = physical writes, time = time taken by this step in microseconds. You might see some other codes too, i.e. cost = cost incurred by the step, size = the size of data in that step and card = cardinality.

So, the query was found doing a UNIQUE SCAN on its primary key index PK_DIXIT1_SAMPLE in ALL_ROWS mode, which is well understood, as an equality predicate was used and the unique or primary key constraint was by itself sufficient to produce an index unique scan. And finally it shows the wait event details, which are pretty straightforward: the statement waited on 'db file sequential read' with a max wait time of 12.17 seconds and a total wait of 43.54 seconds.

With the values above we need to make a decision whether to tune the SQL or not. Unless we have a locking problem or badly performing SQLs, we shouldn't worry too much about the CPU time or the elapsed time; timings come into consideration only when we have badly performing SQLs. The important factor is the number of block visits, both query (that is, subject to read consistency) and current (that is, not subject to read consistency). Segment headers and blocks that are going to be updated are acquired in current mode, but all query and subquery processing requests the data in query mode.

Hope It Helped
Prashant Dixit

Posted in Uncategorized | Tagged: , , , | 4 Comments »

A ‘simple’ dissection of an optimizer decision and cost using Oracle 10053 trace event … Yes, it’s simple!

Posted by FatDBA on October 10, 2021

Hi Guys,

In today's post I will try to explain the cost, stats (system & object level), join order, cardinality & selectivity calculations behind execution plan entries, with the help of the infamous 10053 optimizer traces. 10053 traces are notoriously complicated, detailed and extensive even for simple SQL statements, but they always hold a wealth of information about optimizer decisions like plan directives, peeked bind values, predicate move-around, join elimination stats, query transformations, fix controls, join orders (easily the most complicated part) & other computations or calculations. So, I will try to keep it as simple as I can.

As for ways of generating the trace, you can do it via ALTER SESSION, ORADEBUG or even through DBMS_SQLDIAG.DUMP_TRACE (which does not require the diagnostic pack). There are multiple use cases for 10053, but the most common one is when you want to compare a best and a worst plan and see the difference in their cost calculations.
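
For reference, a rough sketch of the three options (the file identifier is illustrative, and the DBMS_SQLDIAG parameter names are to the best of my notes, so verify against your version's documentation):

-- 1. Event in your own session; the trace is written on the next hard parse
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- ... run the SQL ...
ALTER SESSION SET EVENTS '10053 trace name context off';

-- 2. ORADEBUG, as SYSDBA
-- ORADEBUG SETMYPID
-- ORADEBUG EVENT 10053 trace name context forever, level 1
-- ORADEBUG TRACEFILE_NAME

-- 3. DBMS_SQLDIAG, for a statement already present in the cursor cache
BEGIN
  DBMS_SQLDIAG.DUMP_TRACE(p_sql_id       => '&sql_id',
                          p_child_number => 0,
                          p_component    => 'Optimizer',
                          p_file_id      => 'MY_10053');
END;
/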

Alright, so this is the complete SQL statement followed up by its execution plan and predicate details.

SELECT * FROM (SELECT n.object_inst_id, n.name, n.type, rownum idx
FROM testsch.PDI_FBI_HASHAH1 n join testsch.PDI_FBI_HASHAH1_b on  TESTID_name = :"SYS_B_00"  and owning_object_id = n.object_inst_id
and n.name = NVL(null, n.name) AND n.status = NVL(:"SYS_B_01", :"SYS_B_02")
AND n.type = NVL(:"SYS_B_03", n.type) AND NVL(n.test_allocation,:"SYS_B_04") = NVL(NVL(null, n.test_allocation),:"SYS_B_05")
AND  NVL(n.number_usage, :"SYS_B_06")=:"SYS_B_07" AND ( n.owner IS NULL OR instr(lower(n.owner), :"SYS_B_08") > :"SYS_B_09" )
AND n.assigned_csn_id IS NULL AND n.locked_on IS NULL AND n.lock_owner IS NULL
AND (n.lock_id IS NULL OR n.LOCK_DATE <= (sysdate - NVL(n.lock_period,
(SELECT dixi_trait_values FROM testsch.PDI_HASHIS_MERGEDINFO_DATA WHERE HASH_BATTR_SHA = :"SYS_B_10")) / :"SYS_B_11" / :"SYS_B_12"))
AND target_object_id = :"SYS_B_13" ORDER BY TO_NUMBER(n.name)) OUTPUTS WHERE OUTPUTS.idx between :"SYS_B_14" and :"SYS_B_15"


----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                               |     2 |   364 |   107   (0)| 00:00:01 |
|*  1 |  VIEW                          |                               |     2 |   364 |   107   (0)| 00:00:01 |
|   2 |   COUNT                        |                               |       |       |            |          |
|   3 |    NESTED LOOPS                |                               |     2 |   340 |   107   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| PDI_FBI_HASHAH1               |  6719 |   505K|   106   (0)| 00:00:01 |
|*  5 |      INDEX FULL SCAN           | PDI_FBI_NNAME_UTYPE_STA_NPFIX |  3065 |       |     7   (0)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL         | PDI_HASHIS_MERGEDINFO_DATA    |     1 |    28 |     3   (0)| 00:00:01 | --> How the CBO decided cost 3 ? 
|*  7 |     INDEX RANGE SCAN           | IDX_TRNNUMBERB_SMOKE          |     1 |    93 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OUTPUTS"."IDX"<=10 AND "OUTPUTS"."IDX">=1)
   4 - filter("N"."TYPE"='MOBILE' AND ("N"."OWNER" IS NULL OR INSTR(LOWER("N"."OWNER"),'dna oyj')>0)
              AND NVL("N"."NUMBER_USAGE",'External')='Internal' AND "N"."ASSIGNED_CSN_ID" IS NULL AND
              NVL("N"."TEST_ALLOCATION",'EAI')=NVL(NVL(NULL,"N"."TEST_ALLOCATION"),'EAI') AND "N"."LOCKED_ON" IS NULL
              AND "N"."LOCK_OWNER" IS NULL AND ("N"."LOCK_ID" IS NULL OR
              "N"."LOCK_DATE"<=SYSDATE@!-NVL("N"."LOCK_PERIOD", (SELECT "DIXI_TRAIT_VALUES" FROM
              "TESTSCH"."PDI_HASHIS_MERGEDINFO_DATA" "PDI_HASHIS_MERGEDINFO_DATA" WHERE
              "HASH_BATTR_SHA"='MY_CALL_DURATION'))/60/24))
   5 - access("N"."STATUS"='Available')
       filter("N"."STATUS"='Available')
   6 - filter("HASH_BATTR_SHA"='MY_CALL_DURATION')
   7 - access("TESTID_NAME"='ind.fatdba.application.dss.more.production.BESTID.PDINumberBestSubClass'
               AND "OWNING_OBJECT_ID"="N"."OBJECT_INST_ID" AND "TARGET_OBJECT_ID"=21)

Let's try to understand how the optimizer comes up with the cost numbers, how it derives these values, on what basis and what all it considers during the calculations. I will explain the access operation at Id 6 of the plan, where it's doing a TABLE ACCESS FULL on table PDI_HASHIS_MERGEDINFO_DATA with a final cost of 3 associated with it.

So, what is it? What all happened under the hood for the optimizer to arrive at this value of 3 for this operation?

This is the subquery for which we will be checking the cost calculations within 10053 trace files.

(SELECT dixi_trait_values FROM testsch.PDI_HASHIS_MERGEDINFO_DATA WHERE HASH_BATTR_SHA = :"SYS_B_10")) / :"SYS_B_11" / :"SYS_B_12"))

Let me replace the sys-generated binds with their original values; for this I will check under the "Peeked values of the binds in SQL statement" section. You can get the same from the 'Predicate Information' section of the execution plan too, but let's stick to 10053 for the moment.

*******************************************
Peeked values of the binds in SQL statement
*******************************************
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=01 mxl=128(73) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0100 frm=01 csi=873 siz=128 off=0
  kxsbbbfp=7f3791745ee0  bln=128  avl=73  flg=09
  value="ind.fatdba.application.dss.more.production.BESTID.PDINumberBestSubClass"
...
......
........

 Bind#10
  oacdty=01 mxl=32(19) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0100 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3791745d48  bln=32  avl=19  flg=09
  value="MY_CALL_DURATION"
 Bind#11
  oacdty=02 mxl=22(02) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0100 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f3791745d20  bln=22  avl=02  flg=09
  value=60
 Bind#12
  oacdty=02 mxl=22(02) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0100 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f3791745cf8  bln=22  avl=02  flg=09
  value=24

Okay, let's place them in the subquery. Note that only :SYS_B_10 sits inside the subquery; :SYS_B_11 (60) and :SYS_B_12 (24) are the divisors applied to its result in the outer predicate.

SELECT dixi_trait_values FROM testsch.PDI_HASHIS_MERGEDINFO_DATA WHERE HASH_BATTR_SHA = 'MY_CALL_DURATION';

These are the contents of the column.

SQL> SELECT dixi_trait_values FROM testsch.PDI_HASHIS_MERGEDINFO_DATA;

DIXI_TRAIT_VALUES
-----------------
             60
            120
            200
             10
             60

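Just to close the loop on those divisors, here is a purely hypothetical illustration: if the row matching 'MY_CALL_DURATION' holds the value 60, the outer filter effectively reduces to a one-hour lock age check.

-- NVL(n.lock_period, 60) / 60 / 24 = 1/24 of a day, so the filter becomes:
n.LOCK_DATE <= SYSDATE - 1/24    -- i.e. locked more than one hour ago
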
Let's check the access path analysis for this SQL.

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: PDI_HASHIS_MERGEDINFO_DATA  Alias: PDI_HASHIS_MERGEDINFO_DATA
  #Rows: 5  SSZ: 0  LGR: 0  #Blks:  5  AvgRowLen:  31.00  NEB: 0  ChainCnt:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
try to generate single-table filter predicates from ORs for query block SEL$3 (#0)
finally: "PDI_HASHIS_MERGEDINFO_DATA"."DIXI_TRAIT_VALUES"=:B1

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$3 (#0)
Applicable DS directives:
Return code in qosdSetupDirCtx4QB: NODIR
=====================================
SPD: END context at query block level
=====================================
Access path analysis for PDI_HASHIS_MERGEDINFO_DATA
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for PDI_HASHIS_MERGEDINFO_DATA[PDI_HASHIS_MERGEDINFO_DATA] 
  SPD: Return code in qosdDSDirSetup: NOQBCTX, estType = TABLE
  Column (#1): CONF_ATTR_NAME(VARCHAR2)
    AvgLen: 24 NDV: 5 Nulls: 0 Density: 0.200000
  Table: PDI_HASHIS_MERGEDINFO_DATA  Alias: PDI_HASHIS_MERGEDINFO_DATA
    Card: Original: 5.000000  Rounded: 1  Computed: 1.000000  Non Adjusted: 1.000000
  Scan IO  Cost (Disk) =   3.000000
  Scan CPU Cost (Disk) =   36377.200000
  Total Scan IO  Cost  =   3.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 5.000000 (#rows))
                       =   3.000000
  Total Scan CPU  Cost =   36377.200000 (scan (Disk))
                         + 250.000000 (cpu filter eval) (= 50.000000 (per row) * 5.000000 (#rows))
                       =   36627.200000
  Access Path: TableScan
    Cost:  3.002419  Resp: 3.002419  Degree: 0
      Cost_io: 3.000000  Cost_cpu: 36627
      Resp_io: 3.000000  Resp_cpu: 36627
  Best:: AccessPath: TableScan
         Cost: 3.002419  Degree: 1  Resp: 3.002419  Card: 1.000000  Bytes: 0.000000

.....
.......

OPTIMIZER STATISTICS AND COMPUTATIONS
PJE:  Bypassed; QB has a single table SEL$3 (#0)
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  PDI_HASHIS_MERGEDINFO_DATA[PDI_HASHIS_MERGEDINFO_DATA]#0
***********************
Best so far:  Table#: 0  cost: 3.002419  card: 1.000000  bytes: 28.000000
***********************

If you check above, you will see it considers both CPU & IO to generate the final cost value, and that's what we already know: cost is the amount of work the optimizer estimates it will take to run your query via a specific plan, the optimizer generally prefers lower-cost plans, and its cost model accounts for the IO, CPU, and network resources that will be used by the query.
IO_COST: I/O cost of the operation as estimated by the query optimizer’s approach. The value of this column is proportional to the number of data blocks read by the operation. For statements that use the rule-based approach, this column is null.
CPU_COST: CPU cost of the operation as estimated by the query optimizer’s approach. The value of this column is proportional to the number of machine cycles required for the operation. For statements that use the rule-based approach, this column is null.
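
For completeness, the usual CPU costing formula ties the two together. The cpuspeednw and sreadtim values below are assumptions (that part of the trace isn't shown here), picked purely so the arithmetic reconciles with the 3.002419 above:

Cost = IO_Cost + CPU_Cost / (cpuspeednw * sreadtim * 1000)
     = 3 + 36627 / (1893 * 8 * 1000)    -- assuming cpuspeednw ~ 1893 MHz, sreadtim ~ 8 ms
     = 3 + 0.002419
     = 3.002419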

At the same time it checks the dictionary stats for table-level information, i.e. average row length, blocks, total rows, and takes the help of metrics like NDV (number of distinct values), nulls, density etc. to compute the original cardinality of 5 (the table's total row count) and the rounded cardinality of 1 – this subquery results in only 1 row as output, hence 1 is the rounded cardinality.
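
The cardinality arithmetic itself is simple; with a single equality predicate and no histogram on the filter column, the optimizer uses 1/NDV as the selectivity (all numbers below are from the trace section above):

selectivity          = 1 / NDV = 1 / 5 = 0.200000   (= Density)
computed cardinality = #Rows * selectivity = 5 * 0.2 = 1.000000  -> Rounded: 1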

So, the main elements are 'Scan IO Cost (Disk)' and 'Scan CPU Cost (Disk)', which together derive the 'Total Scan IO Cost' of 3 and, once the CPU component is scaled in, the final access path cost of 3.002419 that you see rounded to 3 in the execution plan.

That's how the optimizer calculated this value of 3 for this specific access operation; in the same way you can understand any of the other access entries in the plan and their associated costs.

If you want to read more on 10053 traces, I recommend the great papers by Wolfgang Breitling, Jonathan Lewis (his book Cost-Based Oracle Fundamentals) & a few posts from Maria Colgan.

Hope It Helped
Prashant Dixit

Posted in Uncategorized | Tagged: , | 1 Comment »

Quarantine a SQL ? This can save you from a system performance pandemic …

Posted by FatDBA on July 6, 2021

Hi Guys,

Last weekend I was asked to examine a performance problem where the customer was observing sporadic resource spikes on CPU & PIOs (physical IO) and had no idea how to handle the situation. They had already identified the problematic SQL and asked me to take a look. While doing the initial investigation on the system, I found the resource manager was enabled and was killing the ill SQL every time it tried to breach the limits set on CPU & other resources.

I spent some time understanding the query behavior and performance, and found this to be a scenario of a SQL plan (PHV) flip, where the query optimizer toggles between a good and a bad plan. This being a time-sensitive issue, the customer needed a quick fix before I could identify the reason behind the PHV flip, so we had to think of a technique to stop the SQL from executing with the bad PHVs. This was an Oracle EE 19c system running on Exadata, so I immediately proposed a solution using the new 'SQL Quarantine' feature, which prevents reuse of an execution plan that was terminated by the resource manager due to resource limits.

Let me show you how to do that. This is just one use case; you can quarantine a SQL for a specific execution plan (SQL ID + PHV), for all its execution plans, or even by the SQL text. Here I will demonstrate how to quarantine a SQL and all its execution plans.

Note: This feature is currently available only on Enterprise Edition on engineered systems, like Exadata and Exadata Cloud Service, but for this demo I will be enabling the '_exadata_feature_on' parameter on my sandbox setup.

** Please don't touch this parameter in your production environments if you're not running Oracle EE on top of Exadata or Exadata Cloud Service.

[oracle@canttowinsec ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Jul 3 23:33:41 2021
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


SQL>
SQL> @hidden
Enter value for param: exadata_feature_on
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%exadata_feature_on%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_exadata_feature_on                           FALSE                     FALSE                     Exadata Feature On

SQL>
SQL>


SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>


SQL> startup
ORACLE instance started.

Total System Global Area  922745216 bytes
Fixed Size                  8903040 bytes
Variable Size             843055104 bytes
Database Buffers           67108864 bytes
Redo Buffers                3678208 bytes
Database mounted.
Database opened.
SQL>
SQL>
SQL>



SQL>
SQL> @hidden
Enter value for param: exadata_feature_on
old   5:    and a.ksppinm like '%&param%'
new   5:    and a.ksppinm like '%exadata_feature_on%'

Parameter                                     Session Value             Instance Value            descr
--------------------------------------------- ------------------------- ------------------------- ------------------------------------------------------------
_exadata_feature_on                           TRUE                      TRUE                      Exadata Feature On

SQL>




-- For this demo I have created one table with 1000000 rows. 

SQL> select count(*) from bigtab;

  COUNT(*)
----------
   1000000

Next, check the SQL ID, PHV and text of the SQL for which you want to set up the QUARANTINE feature; we will then create the quarantine configuration for it. You can set it up for SQL ID, SQL ID + PHV, or SQL TEXT.

SQL> select sql_id, plan_hash_value, sql_text from v$sqlarea where sql_text like '%select * from bigtab%';

SQL_ID        PLAN_HASH_VALUE
------------- ---------------
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6fwqzurbc8y7k       441133017
select * from bigtab

SQL>


-- You can create a quarantine configuration by SQL_ID, optionally for a specific plan:
quarantine_config := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(SQL_ID => '&sql_id');
quarantine_config := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(SQL_ID => '&sql_id', PLAN_HASH_VALUE => '&PLAN_HASH_VALUE');

-- Or by SQL text:
quarantine_config := DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_TEXT(SQL_TEXT => to_clob('select count(*) from emp'));



-- This command creates a quarantine configuration for SQL ID '6fwqzurbc8y7k' and all its execution plans.
SQL>
SQL> DECLARE
quarantine_sql VARCHAR2(30);
BEGIN
quarantine_sql :=   DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID(SQL_ID => '6fwqzurbc8y7k');
END;
/  

PL/SQL procedure successfully completed.

SQL>



-- Check if quarantine config/profile is created.

SQL> select NAME, SQL_TEXT, ELAPSED_TIME, cpu_time,CREATED, ENABLED from dba_sql_quarantine;


NAME                           SQL_TEXT                       ELAPSED_TIME         CPU_TIME                       CREATED                        ENA
------------------------------ ------------------------------ -------------------- ------------------------------ ------------------------------ ---
SQL_QUARANTINE_2akx3bq44wvc5   select * from bigtab           ALWAYS               ALWAYS                         04-JUL-21 12.09.25.567422 AM   YES


Next, we will use the DBMS_SQLQ.ALTER_QUARANTINE procedure to put limits on resource usage, i.e. elapsed time, CPU time, IO (MB), number of physical IO (PIO) requests, and number of logical IO (LIO) requests.

SQL>
SQL> BEGIN
    DBMS_SQLQ.ALTER_QUARANTINE(
       QUARANTINE_NAME => 'SQL_QUARANTINE_2akx3bq44wvc5',
       PARAMETER_NAME  => 'CPU_TIME',
       PARAMETER_VALUE => '5');

    DBMS_SQLQ.ALTER_QUARANTINE(
       QUARANTINE_NAME => 'SQL_QUARANTINE_2akx3bq44wvc5',
       PARAMETER_NAME  => 'ELAPSED_TIME',
       PARAMETER_VALUE => '10');
END;
/  2    3    4    5    6    7    8    9   10   11   12

PL/SQL procedure successfully completed.

SQL>


-- Let's verify the limits that we have set against this SQL quarantine profile.

NAME                           SQL_TEXT                       ELAPSED_TIME         CPU_TIME                       CREATED                        ENA
------------------------------ ------------------------------ -------------------- ------------------------------ ------------------------------ ---
SQL_QUARANTINE_2akx3bq44wvc5   select * from bigtab           10                   5                              04-JUL-21 12.12.41.918609 AM   YES

Next, we will set up the RESOURCE MANAGER.

-- Create a pending area.
SQL> begin
 dbms_resource_manager.create_pending_area();
 end;
 /  2    3    4

PL/SQL procedure successfully completed.

-- Create a consumer group with name 'DIXITPOC_GROUP'.
SQL> begin
 dbms_resource_manager.create_consumer_group(CONSUMER_GROUP=>'DIXITPOC_GROUP',COMMENT=>'To test SQL quarantine feature for one SQL');
 end;
 /  2    3    4

PL/SQL procedure successfully completed.


-- Now map the user 'DIXIT' to the consumer group 'DIXITPOC_GROUP'; this could be any user through which you will execute the query.
SQL> begin
 dbms_resource_manager.set_consumer_group_mapping(attribute => 'ORACLE_USER',value => 'DIXIT',consumer_group =>'DIXITPOC_GROUP' );
 end;
  /  2    3    4

PL/SQL procedure successfully completed.


-- Create a resource plan with name 'POC_FOR_QUARANTINE'.
SQL> begin
dbms_resource_manager.create_plan(plan => 'POC_FOR_QUARANTINE',comment => 'Kill SQL and its other execution plans when exceeding CPU and Elapsed seconds');
end;
/  2    3    4

PL/SQL procedure successfully completed.


-- Create a plan directive allocating resource plan 'POC_FOR_QUARANTINE' to consumer group 'DIXITPOC_GROUP'. I am setting an execution limit of 5 seconds for the SQL.
SQL> begin
 dbms_resource_manager.create_plan_directive(
 plan => 'POC_FOR_QUARANTINE',
 group_or_subplan => 'DIXITPOC_GROUP',
 comment => 'Kill SQL and its other execution plans when exceeding CPU and Elapsed seconds',
 switch_group=>'CANCEL_SQL',
 switch_time => 5,
 switch_estimate=>false);
 end;
 /  2    3    4    5    6    7    8    9   10

PL/SQL procedure successfully completed.




-- Allocate full resources to rest of the sessions which are not part of this plan.
SQL> begin
 dbms_resource_manager.create_plan_directive(PLAN=> 'POC_FOR_QUARANTINE', GROUP_OR_SUBPLAN=>'OTHER_GROUPS',COMMENT=>'leave rest as is');
 end;
 /  2    3    4

PL/SQL procedure successfully completed.



-- Validate and submit pending area.
SQL> begin
 dbms_resource_manager.validate_pending_area();
 end;
 /  2    3    4

PL/SQL procedure successfully completed.

SQL> begin
 dbms_resource_manager.submit_pending_area();
 end;
 /  2    3    4

PL/SQL procedure successfully completed.

SQL>


-- Grant the switch privilege to the DIXIT user so it can switch to the 'DIXITPOC_GROUP' group.
SQL> begin
dbms_resource_manager_privs.grant_switch_consumer_group('DIXIT','DIXITPOC_GROUP',false);
end;
/  2    3    4

PL/SQL procedure successfully completed.

-- Set the initial consumer group of the DIXIT user to 'DIXITPOC_GROUP'.
SQL> begin
dbms_resource_manager.set_initial_consumer_group('DIXIT','DIXITPOC_GROUP');
end;
/  2    3    4

PL/SQL procedure successfully completed.

SQL>

Next, set the resource manager plan at the database level: we have to set RESOURCE_MANAGER_PLAN to the name of the plan, 'POC_FOR_QUARANTINE'.

SQL> show parameter resource

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
resource_limit                       boolean     TRUE
resource_manage_goldengate           boolean     FALSE
resource_manager_cpu_allocation      integer     5
resource_manager_plan                string
SQL> 


SQL> alter system set RESOURCE_MANAGER_PLAN = 'POC_FOR_QUARANTINE';
System altered.

SQL> show parameter resource

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
resource_limit                       boolean     TRUE
resource_manage_goldengate           boolean     FALSE
resource_manager_cpu_allocation      integer     5
resource_manager_plan                string      POC_FOR_QUARANTINE
SQL>

Execute the SQL for the first time: you will get 'active time limit exceeded' due to the resource manager.

SQL>
SQL> select * from emp;

ERROR at line 2:
ORA-00040: active time limit exceeded - call aborted

Now, if you run the SQL a second time, you will see your quarantine plan/profile being used, and it will abort the execution of the SQL.

SQL>
SQL> select * from emp;

ERROR at line 2:
ORA-56955: quarantined plan used
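
To see the quarantine doing its job afterwards, and to clean up the POC, something along these lines should work (a sketch; in 19c, V$SQL exposes the SQL_QUARANTINE and AVOIDED_EXECUTIONS columns, and DBMS_SQLQ.DROP_QUARANTINE removes a configuration):

-- Which cursors are quarantined and how many executions were avoided
SELECT sql_id, plan_hash_value, sql_quarantine, avoided_executions
  FROM v$sql
 WHERE sql_quarantine IS NOT NULL;

-- Drop the quarantine configuration once done with the POC
EXEC DBMS_SQLQ.DROP_QUARANTINE('SQL_QUARANTINE_2akx3bq44wvc5');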

Hope It Helped
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , , | Leave a Comment »

Getting 'SQL*Net message from dblink'? Have you tried the DRIVING_SITE hint?

Posted by FatDBA on June 27, 2021

Hi Guys,

Last week I was asked to look into a performance problem where a customer DBA reported excessive 'SQL*Net message from dblink' waits in the system. As I didn't have direct access to the box, I asked him to share both AWR and ASH performance reports for the time when they observed the problem. And yes, he was right: there were few waits on this event, but with very high average elapsed (single-execution) times. Next I verified the ASH data for the period and was able to identify this SQL (a SELECT) which was frequently waiting on this wait event.

About this event: as you guys know, SQL*Net messages are classified as Network class events, and this one happens when some part of the final dataset is coming from a remote database accessed via a DBLink. Alright, so we know the problem and we have the problematic SQL; let's simulate the problem and the solution.
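
By the way, if you want to catch the affected sessions live rather than through AWR/ASH reports, a simple sketch like this does it:

-- Sessions currently waiting on the dblink round-trips
SELECT sid, serial#, sql_id, event, seconds_in_wait
  FROM v$session
 WHERE event = 'SQL*Net message from dblink';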

I will first create the DBLink that I will use to access the remote data, which is present in a different database on a different server/host, and next I will create sample tables with some test data on both Site A and Site B.

Reference used:
BIGTAB: Table on Site 1, local site.
BIGTAB2: Table on site 2, remote location

SQL>
SQL> CREATE DATABASE LINK dixitdlink1
  2     CONNECT TO dixit IDENTIFIED BY oracle90
   USING '(DESCRIPTION=
            (ADDRESS = (PROTOCOL = TCP)(HOST = canttowinsec.quebecdomain)(PORT = 1521))
            (CONNECT_DATA=(SERVICE_NAME=dhavaldb.quebecdomain))
          )';  

Database link created.

SQL>


-- On source database, on local host.
CREATE TABLE BIGTAB (
   id         NUMBER,
   weight     NUMBER,
   adate      DATE
);

Table created.

INSERT INTO BIGTAB (id, weight, adate)
SELECT MOD(ROWNUM,1000),
       DBMS_RANDOM.RANDOM,
       SYSDATE-1000+DBMS_RANDOM.VALUE(0,1000)
 FROM all_objects
/

67447 rows created.


SQL> delete from prashant.BIGTAB where rownum < 60000;

59999 rows deleted.




-- This I have created on target database, on a remote host.
CREATE TABLE BIGTAB2 (
   id         NUMBER,
   weight     NUMBER,
   adate      DATE
);

Table created.


INSERT INTO BIGTAB2 (id, weight, adate)
SELECT MOD(ROWNUM,1000),
       DBMS_RANDOM.RANDOM,
       SYSDATE-1000+DBMS_RANDOM.VALUE(0,1000)
 FROM all_objects
/

67447 rows created.

Alright, the test data is created on both sites, and in order to mimic the exact case I'd faced, I have intentionally deleted 59999 rows from the table that exists on site 1. So now we have table BIGTAB with 7735 records, and on the remote database we have table BIGTAB2 with 67447 rows. Let's execute a few queries and do an inner join on both tables.

Test 1: With no hint or tuning/tweaking of any kind. This brings all the data from the remote site over to the local site and then performs the join operation here; the same behavior can also be forced by putting the hint with the name of the local (smaller) table, i.e. /*+ DRIVING_SITE(BIGTAB) */.

SQL> select * from prashant.BIGTAB, dixit.BIGTAB2@dixitdlink1 where ADATE=DATEOFREGISTER;

        ID     WEIGHT ADATE             ID     WEIGHT DATEOFREG
---------- ---------- --------- ---------- ---------- ---------
       502 -1.363E+09 09-MAR-20        989 1353403341 09-MAR-20
       577 -185863752 22-FEB-21        247 -626839962 22-FEB-21
........
............
...............
.................


Execution Plan
----------------------------------------------------------
Plan hash value: 2705338834

----------------------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 19931 |  1362K|   144   (0)| 00:00:01 |        |      |
|*  1 |  HASH JOIN         |         | 19931 |  1362K|   144   (0)| 00:00:01 |        |      |
|   2 |   TABLE ACCESS FULL| BIGTAB  |  6948 |   237K|    68   (0)| 00:00:01 |        |      |
|   3 |   REMOTE           | BIGTAB2 | 19930 |   681K|    76   (0)| 00:00:01 | DIXIT~ | R->S |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ADATE"="DATEOFREGISTER")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   3 - SELECT "ID","WEIGHT","DATEOFREGISTER" FROM "DIXIT"."BIGTAB2" "BIGTAB2"
       (accessing 'DIXITDLINK1.ONTADOMAIN' )


Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

So, this is what was happening on that day, on that system, with that query waiting on 'SQL*Net message from dblink'. Now this is the right time to introduce the DRIVING_SITE hint; I have been trying and experimenting with this hint for a long time now, maybe since my 9i days.

This hint is useful for distributed queries to force the optimizer to use a particular site as the driving site; that is, it instructs the optimizer to execute the query at a different site than the one the database would select on its own. As in the previous test case, by default all rows from the remote host which match the condition are sent to the local site and the join is finally executed on the local site.

Now I am going to use DRIVING_SITE (mentioning the name of the remote table) and force the optimizer to perform all join operations at the remote site; hence the query will be executed there and only the result set returned to the local site.

So, you should point the DRIVING_SITE hint at the table on the site where the huge table resides!

Test 2: With DRIVING_SITE on remote table BIGTAB2, which is big and has 67447 records.

SQL> select /*+ DRIVING_SITE(BIGTAB2) */ * from prashant.BIGTAB, dixit.BIGTAB2@dixitdlink1 where ADATE=DATEOFREGISTER;

        ID     WEIGHT ADATE             ID     WEIGHT DATEOFREG
---------- ---------- --------- ---------- ---------- ---------
       502 -1.363E+09 09-MAR-20        989 1353403341 09-MAR-20
       577 -185863752 22-FEB-21        247 -626839962 22-FEB-21
.....
.......
........
..........



Execution Plan
----------------------------------------------------------
Plan hash value: 2214131741

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|         |    82 |  5740 |    70   (0)| 00:00:01 |        |      |
|*  1 |  HASH JOIN             |         |    82 |  5740 |    70   (0)| 00:00:01 |        |      |
|   2 |   REMOTE               | BIGTAB  |    82 |  2870 |     2   (0)| 00:00:01 |      ! | R->S |
|   3 |   TABLE ACCESS FULL    | BIGTAB2 | 53126 |  1815K|    68   (0)| 00:00:01 | DHAVA~ |      |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A2"."ADATE"="A1"."DATEOFREGISTER")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   2 - SELECT "ID","WEIGHT","ADATE" FROM "PRASHANT"."BIGTAB" "A2" (accessing '!' )


Note
-----
   - fully remote statement
   - dynamic statistics used: dynamic sampling (level=2)

Look at the cost, rows processed and bytes (data processed): all of them are reduced to a great extent. Of course you don't see much elapsed time difference in this example, as the dataset is small and the query is not complex; try it yourself on a bigger set and see the magic!

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

Parse CPU to Parse Elapsd % – Let's clear the perplexity!

Posted by FatDBA on January 26, 2018

Happy Republic Day!


Hi Mates,

I see a lot of confusion, mix-ups and perplexity among DBAs about a few of the metrics under the 'Instance Efficiency %' section of AWR reports, especially the metric "Parse CPU to Parse Elapsd %". In today's post I will try to explain this metric in detail, which will help you understand it in depth and clear the muddiness.

A few of the lines you might have read about this metric:
“we should always look for as low as possible numbers for this metric …”
“Try to achieve the impossible value of zero for this one “

– Even some of the Metalink notes are misleading too.

Scenario:

Below is a snippet from one of my test boxes.

Note: Just taking a look at the instance efficiency ratios can be very dangerous; I advise first starting with the Load Profile, the top 5 waits and so on …

Okay, so my definition of this statistic differs from what you'd judge after reading the name of the metric – this datum signals the delay/wait in parsing of SQL queries during the snap interval.
In our example the value is 1.37%, which means that for every CPU second spent parsing we spent about 72.99 (100/1.37) seconds of clock time. This can happen due to various reasons, i.e. latch waits or any other contention between the sessions etc.

The ideal value for this stat should also be 100%, like the rest of the ratios (Yes, that's correct!!).

Let's see how this value was calculated and what all it considers while deducing that final figure of 1.37%.
It takes "parse time cpu / parse time elapsed * 100" to get the 'Parse CPU to Parse Elapsd' figure.


Statistic	                              Total	per Second     per Trans
------------------------------------------------------------------------------------------------
parse time cpu                                  398	      0.11	0.01
parse time elapsed                           29,055	      8.07	0.81
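
If you want to pull the underlying counters yourself, a sketch like this computes the ratio between two AWR snapshots (it assumes a single instance and no restart within the range; substitute your own snap IDs and DBID):

SELECT ROUND(100 * pc.delta / NULLIF(pe.delta, 0), 2) "Parse CPU to Parse Elapsd %"
  FROM (SELECT MAX(value) - MIN(value) delta
          FROM dba_hist_sysstat
         WHERE stat_name = 'parse time cpu'
           AND dbid = &dbid
           AND snap_id BETWEEN &begin_snap AND &end_snap) pc,
       (SELECT MAX(value) - MIN(value) delta
          FROM dba_hist_sysstat
         WHERE stat_name = 'parse time elapsed'
           AND dbid = &dbid
           AND snap_id BETWEEN &begin_snap AND &end_snap) pe;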


So, in short, each time there is a drop in this metric from 100%, it means the database was waiting for something which slowed down the parse times.
If you want to dig deeper, you should trace the session using 10046 with level 8 to see where the other % of parse time is being spent – I'm leaving that for the readers to test.


Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

Shared Pool Management in 12c, What’s new!

Posted by FatDBA on January 17, 2018

Hi Everyone,

A few days back, while working on a Shared Pool issue where we were getting ORA-4031 on one of our 11gR2 databases, I discovered something interesting and new regarding SGA duration management. Here I won't discuss the problem that I faced and how we fixed it, but I will try to show the architectural changes that came with 12c which can now fix these errors/issues.

Let me explain what 'durations' are first. The shared pool is made up of a number of granules. The shared pool is split into sub-pools if you have a large enough SGA, and each sub-pool consists of a number of non-overlapping granules. In 11g each sub-pool is also split into four sub-sub-pools known as durations.
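
As a side note, the number of sub-pools an instance uses is driven by the hidden parameter _kghdsidx_count; you can peek at it (as SYSDBA) with the usual hidden-parameter lookup, a sketch of which is below:

SELECT a.ksppinm parameter, b.ksppstvl value
  FROM x$ksppi a, x$ksppcv b
 WHERE a.indx = b.indx
   AND a.ksppinm = '_kghdsidx_count';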

What was there before 12c arrived?
Starting from Oracle 10g, each sub-pool in the SGA was divided into four durations.
Let's check the distribution by generating a heap dump of the shared pool; here I used oradebug with level 2 (this gives you the full SGA summary; you can try level 2050 to get the full summary with contents).

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_8127.trc

.....
******************************************************
HEAP DUMP heap name="sga heap(1,0)"  desc=380030610
Total heap size    =218102664
Total free space   =  1066928
Total reserved free space   =  8439520
Unpinned space     = 38812528  rcr=11971 trn=17906
Permanent space    =208595160
HEAP DUMP heap name="sga heap(1,1)"  desc=380031e68
Total heap size    = 67108512
Total free space   =  2912528
Total reserved free space   =  1382816
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,2)"  desc=3800336c0
Total heap size    =167771280
Total free space   = 92743480
Total reserved free space   =  3852856
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,3)"  desc=380034f18
Total heap size    =268434048
Total free space   = 74547592
Total reserved free space   = 13497472
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,0)"  desc=380039e38
Total heap size    =201325536
Total free space   =    17200
Total reserved free space   =  8435920
Unpinned space     = 26474112  rcr=7934 trn=8094
Permanent space    =192871456
HEAP DUMP heap name="sga heap(2,1)"  desc=38003b690
Total heap size    = 83885640
Total free space   = 48723768
Total reserved free space   =  1035792
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,2)"  desc=38003cee8
Total heap size    =369096816
Total free space   =258674312
Total reserved free space   = 16982464
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,3)"  desc=38003e740
Total heap size    =218102664
Total free space   = 17202608
Total reserved free space   = 10966696
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,0)"  desc=380043660
Total heap size    =184548408
Total free space   =    13008
Total reserved free space   =  5061928
Unpinned space     = 26943408  rcr=4930 trn=9425
Permanent space    =179472608
HEAP DUMP heap name="sga heap(3,1)"  desc=380044eb8
Total heap size    = 67108512
Total free space   = 27568352
Total reserved free space   =     4744
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,2)"  desc=380046710
Total heap size    =352319688
Total free space   =233302736
Total reserved free space   = 15981216
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,3)"  desc=380047f68
Total heap size    =385873944
Total free space   =143746536
Total reserved free space   = 19402616
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
.....
******************************************************

So the above stats show that we have three sub-pools [sga heap (1,0), (1,1), (1,2), (1,3) ….. (3,0), (3,1), (3,2), (3,3)] with four durations each, and every duration has its own size, free space and reserved free space. This type of distribution can cause ORA-4031 even when you have enough free space in other durations, and this was the actual cause in my earlier case, but here we won't discuss how we fixed that.

Okay, so now let's do the same on a 12c database. Let's generate the heap dump for the SGA in 12c and see the distribution of durations there. Once again we will use oradebug to dump a heapdump with level 2.

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/tunedb/tunedb/trace/tunedb_ora_11054.trc


******************************************************
HEAP DUMP heap name="sga heap(1,0)"  desc=0x60103678
 extent sz=0xfe0 alt=304 het=32767 rec=9 flg=0x82 opc=0
 parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000 heap=(nil)
 fl2=0x24, nex=(nil), idx=1, dsxvers=1, dsxflg=0x0
 dsx first ext=0x73000000
 dsx empty ext bytes=0  subheap rc link=0x730000c0,0x730000c0

******************************************************
HEAP DUMP heap name="sga heap(1,3)"  desc=0x60107f80
 extent sz=0xfe0 alt=304 het=32767 rec=9 flg=0x82 opc=0
 parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000 heap=(nil)
 fl2=0x24, nex=(nil), idx=1, dsxvers=1, dsxflg=0x0
 dsx first ext=0x7e000000
 dsx empty ext bytes=0  subheap rc link=0x7e0000c0,0x7e0000c0

Alright, so here we only have two durations per sub-pool – 'sga heap(1,0)' and 'sga heap(1,3)' – for improved shareability and to avoid ORA-4031 errors.

Hope It helps
Prashant Dixit

Posted in Advanced | Tagged: , , , | Leave a Comment »

Script to measure the Source & Contribution of any Wait Event within AWR snapshots.

Posted by FatDBA on January 1, 2018

🤘 Happy New Year Everyone! 🤘

So, here I am with the maiden post of this year.
Today I will discuss a script that might be handy and useful while you are investigating any performance problem with the database.
In fact, this is one of my favorite scripts, which I always consider running once we have identified the leading wait event and the time frame of the spike, or of any transient variation in system behavior. This script helps you measure the source and the contribution or impact of any specific wait event.

The script is a join between the ASH historical views (dba_hist_active_sess_history AND dba_hist_snapshot) and shows you, per source, how many samples (a proxy for the Average number of Active Sessions, AAS) were waiting on this event during the period, and with what total contribution in terms of percentage.

Let's work on a use case.
Suppose you have a system with huge user IO happening, as confirmed by any report or tool (AWR, ASH, any script, OEM etc.).

For example, using AWR we found the User I/O wait class event 'db file scattered read' leading the 'Top Foreground Wait Events' chart at a specific time when we had high user IO load on the system.

                                           Total Wait       Avg   % DB Wait
Event                                Waits Time (sec)      Wait   time Class
------------------------------ ----------- ---------- --------- ------ --------
db file scattered read              10,282       82.8    41.50ms  43.9 User I/O

So, now you have identified the leading wait event and the time frame with the maximum intensity or frequency. Let's call the script.

Script Inputs:
Example:

Enter value for event_class:
User I/O
Enter value for event_name: db file scattered read
Enter value for begin_snap: 193
Enter value for end_snap: 200
Enter value for dbid: 2896132084

WITH
events AS (
SELECT /*+ MATERIALIZE NO_MERGE */
       SUBSTR(TRIM(h.sql_id||' '||h.program||' '||
       CASE h.module WHEN h.program THEN NULL ELSE h.module END), 1, 128) source,
       h.dbid,
       COUNT(*) samples
  FROM dba_hist_active_sess_history h,
       dba_hist_snapshot s
 WHERE h.wait_class = TRIM('&Event_Class') AND h.event = TRIM('&Event_Name')
   AND h.snap_id BETWEEN &Begin_Snap AND &End_Snap
   AND h.dbid = &dbid
   AND s.snap_id = h.snap_id
   AND s.dbid = h.dbid
   AND s.instance_number = h.instance_number
 GROUP BY
       h.sql_id,
       h.program,
       h.module,
       h.dbid
 ORDER BY
       3 DESC
),
total AS (
SELECT SUM(samples) samples,
       SUM(CASE WHEN ROWNUM > 15 THEN samples ELSE 0 END) others
  FROM events
)
SELECT e.source,
       e.samples,
       ROUND(100 * e.samples / t.samples, 1) percent,
       (SELECT DBMS_LOB.SUBSTR(s.sql_text, 1000, 1) FROM dba_hist_sqltext s WHERE s.sql_id = SUBSTR(e.source, 1, 13) AND s.dbid = e.dbid AND ROWNUM = 1) sql_text
  FROM events e,
       total t
 WHERE ROWNUM <= 15
   AND ROUND(100 * e.samples / t.samples, 1) > 0.1
 UNION ALL
SELECT 'Others',
       others samples,
       ROUND(100 * others / samples, 1) percent,
       NULL sql_text
  FROM total
 WHERE others > 0
   AND ROUND(100 * others / samples, 1) > 0.1;

Below is the output of the query.

SOURCE                                                                      SAMPLES      PERCENT       SQL_TEXT
------------------------------------------------------------------------  ---------- ---------------  -------------------------------
5av23g8w7f3ka sqlplus@dixitlab.localdomain (TNS V1-V3)                          1          45.23       select * from dixit.bigtab
9a7gbkahasj1a sqlplus@dixitlab.localdomain (TNS V1-V4)                          1           5          select * from dixit.gianttable
OTHERS                                                                          14         49.77

Okay, so the result shows that one SQL statement, with SQL ID 5av23g8w7f3ka, was responsible for more than 45% of these waits.
So, now you have the evidence and can start troubleshooting this specific statement to reduce the user IOs.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | 3 Comments »

Auto Stats Gathering in Oracle 12c & Something Interesting :)

Posted by FatDBA on December 14, 2017

Hi Fellas,
Starting from Oracle 12c there is a new feature which collects statistics automatically when you perform bulk loads using either of two methods:
– CREATE TABLE AS SELECT (CTAS)
– INSERT INTO … SELECT (into an empty table, using a direct-path insert).

SQL> explain plan for create table dixittab as select * from scottisdead;
Explained.
 
SQL> select * from table(DBMS_XPLAN.DISPLAY);
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 14312189
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |               |   500K|  8812K|   612   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | DIXITTAB      |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |               |   500K|  8812K|   371   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SCOTTISDEAD   |   500K|  8812K|   371   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------
10 rows selected. 

Above, in the execution plan, you'll see the new operation named "OPTIMIZER STATISTICS GATHERING" at Id 2.
Let's verify that the stats were collected.

SQL> select table_name, last_analyzed from user_tables where table_name = 'DIXITTAB';
 
TABLE_NAME       LAST_ANALYZED
---------------- -------------
DIXITTAB         12-DEC-17

Yup, stats were collected!
In the same way, stats will be auto-collected with the other type of bulk load method (INSERT INTO … SELECT).

There may be times when you want to disable this feature, in situations like:
– Long/huge insert operations spending a huge amount of time on STATS GATHERING.
– An extremely large dataset where you don't want to collect stats.

To achieve that, we have a hint which instructs Oracle not to gather table statistics.

SQL> create table dixittab as select /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * from scottisdead;

Now, something interesting I would like to discuss …..
Is there any other condition when the stats won't be collected automatically, apart from barring it using the NO_GATHER_OPTIMIZER_STATISTICS hint?

Let's try to do some conventional bulk loading using the INSERT INTO … SELECT method.
To run some tests, I am intentionally commenting out one column in both tables: the DATE_VAL column of the newly created table TABLE1 and the DATE_VALUE column of the source table SAMPLE.


SQL> create table table1 (ident number, date_val date, text_val varchar2(4000));
Table created.


SQL> explain plan for insert /*+ append */ into table1
(IDENT
--, DATE_VAL
, TEXT_VAL)
SELECT ID
--, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE; 

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------

Plan hash value: 1523099961
-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |        |   100K|  2539K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT    | TABLE1 |       |       |            |          |
|   2 |   TABLE ACCESS FULL| SAMPLE |   100K|  2539K|   154   (1)| 00:00:01 |
-----------------------------------------------------------------------------

9 rows selected.

😦 😦 Why was the auto stats gathering behavior not repeated this time??

This happened because Oracle needs all of the table's columns to be included in the insert for the OPTIMIZER STATISTICS GATHERING operation to kick in —> let me show you what I mean.

SQL> explain plan for insert /*+ append */ into table1
(IDENT
, DATE_VAL
, TEXT_VAL)
SELECT ID
, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE;  

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------

Plan hash value: 1523099961
-------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | TABLE1 |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SAMPLE |   100K|  3320K|   154   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------

10 rows selected.

Yes, the stats were collected this time, when we included all the columns of the table.
I haven't seen any documentation on this restriction of this new Oracle 12c feature. Hope Oracle adds it to the documentation soon 🙂 …..

Hope It Helps!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

Optimizer making wrong decisions. Histogram, is that you?

Posted by FatDBA on December 4, 2017

Hey Mates,

I'd like to discuss a scenario I faced some time back with one of our customers, where a query (part of the monthly application maintenance) started taking a huge amount of time to complete.
During the analysis I found it doing some extremely expensive full table scans on a fat table when it shouldn't. Let me elaborate on what the issue was.

I started by reading the execution plan and the extended optimizer traces (10053) to understand the decisions made by the CBO.

The traces gave me a hint that the optimizer was able to accurately estimate the cardinality for all of the predicates as long as the string was 32 bytes or less, but at one of the operation steps in the execution plan the estimations were wrong and crooked. I saw that the CBO decided to 'return each of the rows' irrespective of the strings, and it was actually happening because the histogram contained only the first 32 characters of the string data.

So the histogram considered only the first 32 characters of the column value, and the optimizer therefore assumed this particular value, i.e. "siebelm01-vip@2002:101:11f::101:1", occurs 5776 times in the table. So it ignored the index and used a full table scan. Oracle considers only the first 32 characters of a column when generating histograms, and in such cases, as we saw as an aftermath, the optimizer will make wrong decisions.
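
You can see the truncation for yourself by looking at the recorded histogram endpoints; long strings that share the same first 32 characters collapse into the same endpoint (the owner/table/column names below are the ones from this case):

-- Inspect histogram endpoints for the problem column
SELECT endpoint_number, endpoint_actual_value
  FROM dba_tab_histograms
 WHERE owner       = 'SIEBELADMIN'
   AND table_name  = 'TX2121'
   AND column_name = 'C131HJA12'
 ORDER BY endpoint_number;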

So, as an immediate solution to the problem, I deleted the histogram on this specific column, and that fixed the query plan change issue.

BEGIN
dbms_stats.delete_column_stats(ownname=>'SIEBELADMIN', tabname=>'TX2121', colname=>'C131HJA12', col_stat_type=>'HISTOGRAM');
END;
/

Other fixes:
– In the case of longer strings (32 characters or more), for text searches use Oracle Text and the CONTAINS operator in the SQL logic.

Hope That Helps!
Prashant Dixit

Posted in Advanced | Tagged: , , | Leave a Comment »
