Tales From A Lazy Fat DBA

Den of an Oracle DB Performance freak & a fan of Cassandra, MySQL and PostgreSQL …

  • Likes

    • 129,380
  • Archives

  • Categories

  • Subscribe

  • Advertisements
  • Prashant Dixit is the FatDBA

  • Follow Tales From A Lazy Fat DBA on WordPress.com
  • My Twitter Feeds

  • Disclaimer!

    FatDBA or Oracle ‘Ant’ is an independent web-blog/site.The experiences, Test cases, views, and opinions expressed in this website are my own and does not reflect the views or opinions of my employer.

    This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content.
    Product and company names mentioned in this website may be the trademarks of their respective owners.

Archive for the ‘troubleshooting’ Category

Visualize your database performance statistics using Tableau.

Posted by FatDBA on January 28, 2018

Hi Everyone,

Today’s post is all about producing some interactive data visualizations your system/database statistics using one of the most popular BI tool Tableau.
There are lot’s of other tools which can be used for your data analysis i.e. Excel, Microstrategy Analytics, Domo, QlikView but i always find Tableau easiest way to do such things, specially during all my database audit, 360 health reviews and troubleshooting task-forces.

Tableau offers a suite of tools that include an online, desktop and server version. All of these versions provide a easy-to-use drag and drop interface that can help you quickly turn your data into business insights. Like many other data analytics and visualization tools, Tableau can connect to local or remote data of many different formats.

Okay now after that short introduction of the tool, time to do some tests using the tool.
I have divided process in to three step activity and are discussed below.

First: Data Collection
You can collect your AWR reports in TEXT format and which will be later on parsed to create a CSV file.
There are many tools/scripts available online to generate multiple AWR reports of your database. I see an awesome work was already done by FlashDBA (Download). You can use his script to generate batch AWR reports and yes in TEXT format ONLY!

Example:

[oracle@dixitlab AWR]$ ls -ltr
total 12852
-rw-r--r--. 1 oracle oinstall 225031 Jan 27 21:25 awrrpt_1_445_446.txt
-rw-r--r--. 1 oracle oinstall 255010 Jan 27 21:26 awrrpt_1_446_447.txt
-rw-r--r--. 1 oracle oinstall 253217 Jan 27 21:26 awrrpt_1_447_448.txt
-rw-r--r--. 1 oracle oinstall 253217 Jan 27 21:26 awrrpt_1_448_449.txt
-rw-r--r--. 1 oracle oinstall 244229 Jan 27 21:26 awrrpt_1_449_450.txt
........
.................

Second: Data Fold or Compression.
Now next you need a tool to fold your AWR reports in to a CSV. Here once again FlashDBA did a marvelous job, he wrote one fabulous script to parse your AWR text files and generate a final AWR report.
You can download the script from his Github (Download Link)

As far as the script, you need to pass the format of your files and direct output to a CSV as shown below.

[oracle@dixitlab AWR]$ ./awr-parser.sh awr*.txt > tunedbperftests.csv

Info : Parsing file awrrpt_1_445_446.txt at 2018-01-27 21:32:49
Info : Parsing file awrrpt_1_446_447.txt at 2018-01-27 21:32:53
Info : Parsing file awrrpt_1_447_448.txt at 2018-01-27 21:32:56
Info : Parsing file awrrpt_1_448_449.txt at 2018-01-27 21:33:01
Info : Parsing file awrrpt_1_449_450.txt at 2018-01-27 21:33:07
Info : Parsing file awrrpt_1_450_451.txt at 2018-01-27 21:33:15
Info : Parsing file awrrpt_1_451_452.txt at 2018-01-27 21:33:21
....
........
Info : Parsing file awrrpt_1_499_500.txt at 2018-01-27 21:36:56
Info : No more files found
Info :
Info : ______SUMMARY______
Info : Files found : 55
Info : Files processed : 55
Info : Processing errors : 0
Info :
Info : Completed with 0 errors
[oracle@dixitlab AWR]$

With that you are done with the parsing of reports and have got the final CSV which we will be using to play around within Tableau.
Contents inside the parsed file.

Filename	Database Name	Instance Number	Instance Name	Database Version	Cluster	Hostname	Host OS	Num CPUs	Server Memory (GB)	DB Block Size	Begin Snap	Begin Time	End Snap	End Time	Elapsed Time (mins)	DB Time (mins)	Average Active Sessions	Busy Flag	Logical Reads/sec
awrrpt_1_445_446.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	445	1/26/2018 21:57	446	1/26/2018 23:00	62.24	6.33	0.1	N	2629.5
awrrpt_1_446_447.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	446	1/26/2018 23:00	447	1/27/2018 0:00	60.19	12.18	0.2	N	13973.4
awrrpt_1_447_448.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	447	1/27/2018 0:00	448	1/27/2018 1:00	60.15	13.52	0.2	N	14055.8
awrrpt_1_448_449.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	448	1/27/2018 1:00	449	1/27/2018 2:00	60.15	10.13	0.1	N	11597.4
awrrpt_1_449_450.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	449	1/27/2018 2:00	450	1/27/2018 3:00	60.16	0.03	0	N	65.4
awrrpt_1_450_451.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	450	1/27/2018 3:00	451	1/27/2018 4:00	60.12	0.02	0	N	70.3
awrrpt_1_452_453.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	452	1/27/2018 5:00	453	1/27/2018 6:00	60.13	0.69	0	N	189.8
awrrpt_1_453_454.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	453	1/27/2018 6:00	454	1/27/2018 7:00	60.13	2.88	0	N	2439.1
awrrpt_1_454_455.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	454	1/27/2018 7:00	455	1/27/2018 8:00	60.14	12.57	0.2	N	14027.3
awrrpt_1_455_456.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	455	1/27/2018 8:00	456	1/27/2018 9:00	60.14	10.11	0.1	N	13916.6
awrrpt_1_456_457.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	456	1/27/2018 9:00	457	1/27/2018 10:00	60.14	10.26	0.1	N	13941.5
awrrpt_1_457_458.txt	TUNEDB	PRIMARY	tunedb	EE	1	dixitlab.localdo	Linux x86 64-bit	1	2.77	8K	457	1/27/2018 10:00	458	1/27/2018 10:50	50.25	18.67	0.3	N	14118.9

Third: Data Representation using Tableau.
Okay so here we have the final parsed CSV of all those TEXT AWR reports named ‘tunedbperftests.csv’ and we are ready to play around and learn.

Immediately after launching you will see couple of options available for Data Sources on the left. Choose TEXT as the source and browse the CSV to load.

Next you will see all rows of your data source (tunedbperftests.csv in our case) file.

Next click on Worksheet, your personal area to play.

Tableau then divides the data in two main types: dimensions and measures. Dimensions are usually those fields that cannot be aggregated; measures, as its name suggests, are those fields that can be measured, aggregated, or used for mathematical operations. Dimension fields are usually used for row or column headings; measures are usually used for plotting or giving values to the sizes of markers.

             

Next tab is for Analytical functions, i.e. If you want to add a constant, average, mean, median averages or any reference lines to your graph/chart.

All good now, so we all all set yto plot out performance charts for that DB historical information that we have collected in the form of CSV and loaded to Tableau. Lets, plot for average hard parse per/second, Average DB Time, Average Pareses, Average Transactions happened against Time (Hourly rate of BEGIN TIME as a measure).

So, using above Area Graph you have plotted the average metric usages on the database during a time period.

Next, i will visualize one of the most prominent db wait event observed in the database during the probe (data collection) period ‘DPR’ or ‘Direct Path Reads’ and will plot the Bar graph against the TIME (Hourly BEGIN TIME).

Some more stats visualisations, this time ‘top 5 waits‘ and their hourly frequency.

There are lot’s of other things that you can do with your statistics, i mean you can plot your data in the form of Square, side by side circle, polygon, pie char, polygons, gantt bar, line graph, area graphs, box-and-whisker plots, highlight tables and many more.

So, imagine and you can visualize your database statistics using Tableau!

Questions are welcome. Happy reading! 🙂 🙂

Hope It Helps
Prashant Dixit

Advertisements

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

AAS or Average Number Of Active Sessions – The first thing to look in an AWR & its Uses.

Posted by FatDBA on January 27, 2018

Hi All,

Today’s post is all about answering the question ‘What is the very first thing that one should look out for in an AWR report ?‘. I have got this question so many times in the past about the first statistic i prefer to look at when troubleshooting a performance problem so though of answering this by writing this post with some real-time examples.

And the answer is ‘AAS‘ or ‘Average Number of Active Sessions’ is the first thing that i always look out for while reading AWR reports because It gives you a quick idea about how busy the system is and about the workload happening.

Okay so first lets understand what is an ‘Active Session’ : In simple words It is a session currently spending time in the database (i.e. from v$session where status=’ACTIVE’).
Now, what exactly is AAS – It’s the ratio or rate of change of DB time over clock time. The value of this metric is calculated by using a standard formula of (DB Time/Elapsed Time).

Lets calculate the value for one of the system.

Host Name	        Platform	       CPUs   
dixitLab1.fatdba.com	Linux x86 64-bit	16

AAS In this case : 1024.72/60.04 => 17.067 of average active sessions during the snap interval of an hour.

Let’s further decode the magical Figures of AAS.
We always use CPU Count as a standard for comparing the AAS. Few rule of thumbs while doing this comparison are give below.
– If the AAS is higher than the number of CPU you have then there is a problem. i.e In above example we have an AAS value of 17 and CPU
count 16, hence we could have performance problems and needs investigation.
– If the value is very high than the number of CPUs then there is a choke-point in the database.

You could also use the AAS to plot your graphs, lines and Manhattan’s as one of the axis to compare it with CPU consumption and quickly pin point the pain areas and time slots. Let understand and use it through a scenario.

Assume one fine day you got a call from monitoring team that they have observed huge spikes in system resource usage and many of the other metrics set on the dashboard are in red. And as usual lot’s of fingers and eyes started pointing towards you and the DBA team.

Now you as a DBA quickly generated the AWR for that specific time frame to understand the system behavior and performance and observed a huge workload is happening on the database with AAS of 305 (For a 2 Node RAC database with 128 CPUs collectively) and some huge peaks for Application class (i.e. row lock contentions etc.), User IO classified waits (i.e. DBF Sequential Reads, read by other session etc.) and some Network class waits (i.e SQL*Net message from dblink waits) in your database.

Now you want to understand the trend for wait classes for the database during last few days. Here you can use DBA_HIST_ACTIVE_SESSION_HISTORY view to collect historical statistics for the database which you will use to plot charts using excel, tableau etc.

I have collected similar stats using ASH view and have plotted a graph using few of my data representation tools to understand this transient variation in system performance.

Here you see a sudden spike in DB wait Classes (Specially User IO, Cluster, Application and Network) on March 5th with average number of active sessions (AAS) stacked for both of RAC nodes was close around 305. Which if compare it with number of total CPUs (64+64=128) is extremely high.

After further investigation you understand that it’s application class wait ‘enq: TX – row lock contention‘ which is the primary cause of this high system resources utilization.

Below graph is a representation of AAS Waiting on Application class event ‘enq: TX – row lock contention’ per Instance on the database where we can the same happening. A constant then a sudden raise in row locking contentions.

And you have identified the major sources contributing towards this row locking during the probe period of last 7 days till now. You can do a join on dba_hist_active_sess_history and dba_hist_snapshot to get this historical information — Read my previous article on how to get this past information from AWR repository.

Now when you have narrowed down the problem and have identified the problematic SQLs with their total contribution, you can now start the query optimization/tuning to fix the issue.
There are lot of other data representations you can do by using AAS as one of the graph axis i.e. AAS on CPU and Top Wait Events and will discuss in my further posts.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | 2 Comments »

Parse CPU to Parse Elapsd % – Lets clear the Perplex!

Posted by FatDBA on January 26, 2018

Happy Republic Day!

 

Hi Mates,

I see a lot of confusion, mix-ups and perplexity in between the DBAs on few of the metrics under ‘Instance Efficiency % section’ in AWR reports. Specially for one of the metric “Parse CPU to Parse Elapsd %”. In today’s post i will try to explain the metrics in detail which will help you to understand it in depth and clear the muddiness.

Few of the lines you might have read about this metric i.e.
“we should always look for as low as possible numbers for this metric …”
“Try to achieve the impossible value of zero for this one “

– Even some of the Metalink notes are misleading too.

Scenario:

Below is the snippet from one of my test box.

Note: Just taking a look at the instance efficiency ratios can be very dangerous and i advise to first start with Load Profile, top 5 waits and there on …

Okay, so my definition of this statistic differ from what you judge after reading the name of the metric – This datum signals the delay/wait in parsing of SQL queries during the snap interval.
In our example the value is 1.37% this means that for every CPU second spend parsing we spent about 72.99 (100/1.37) Seconds of clock time. It can happen due to various reasons i.e Latch or any contention between the sessions etc.

The ideal value for this stat should also be 100% like rest of the ratios (Yes, that’s correct!!).

Let’s see how this value was calculated, what all it considers while deducing that final figure of 1.37%.
It takes the “parse time cpu/parse time elapsed * 100” to get the ‘Parse CPU to Parse Elapsed’ figure.

 

Statistic	                              Total	per Second     per Trans
------------------------------------------------------------------------------------------------
parse time cpu                                  398	      0.11	0.01
parse time elapsed                           29,055	      8.07	0.81

 

So, in short each time there is a drop in this metric from 100%, means the database was waiting for something which slowed down the parse times.
If you want to dig in deep then you should trace the session using 10046 tracing with level 8 to see where the other % of parse time is being spent and leaving this for readers to test.

 

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! – Part 2

Posted by FatDBA on January 13, 2018

Repeat: I think I’ve mistakenly deleted the post, so re-posting the same.

Hey Folks,

Back with second edition of my previous post on troubleshooting performance issues using Linux in-built profiler named ‘perf’. Many of the users requested to write about few use cases situations on when and how to use this tool.

So, this post is all about discussing some cases that i have faced while working on few performance tuning projects and few of them are from the tests that I’ve performed on my Lab systems.

Okay coming back to the reason on why we need this, perf i mean ?
The answer is sometimes Oracle wait interface is not enough and you need to dig deeper inside the system to understand the problem. That point you have to use some third party, in-built dedicated tools for performance investigation. There perf might help you to understand what your resource intensive query is doing on OS layers.
So, using the tool you can monitor your process on what’s its doing!

And in case if you are using perf on Virtualised system, you might get error “perf.data file has no samples“.
In order to fix it try with “-e cpu-clock” arguments to collect the sample data and then interpret the file.

Okay so one fine day, you saw a spike in server’s CPU consumption using TOP, Oratop, OEM, Scheduled scripts or by any possible monitoring techniques and you have identified the process and its other attributes.
Using the PID you have reached the SID, SQL_ID, SQL_TEXT and other statistics. So now you might want to analyze and understand the oracle’s execution.

 
Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  DIXIT (1:53089)
 SQL ID              :  71aa5ju8pwtf2
 SQL Execution ID    :  16777216
 Execution Started   :  01/09/2018 06:10:32
 First Refresh Time  :  01/09/2018 06:10:32
 Last Refresh Time   :  01/09/2018 06:13:03
 Duration            :  152s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus@dixitlab.localdomain (TNS V1-V3)

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|     163 |     150 |      115 |     5M | 648K |   5GB |
========================================================

SQL Plan Monitoring Details (Plan Hash Value=12102956)
===================================================================================================================================================================
| Id   |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |       Activity Detail       | Progress |
|      |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |         (# samples)         |          |
===================================================================================================================================================================
| -> 0 | SELECT STATEMENT     |      |         |       |       150 |     +2 |     1 |        0 |      |       |          |                             |          |
| -> 1 |   SORT AGGREGATE     |      |       1 |       |       150 |     +2 |     1 |        0 |      |       |          |                             |          |
| -> 2 |    TABLE ACCESS FULL | T294 |   96523 | 56397 |       151 |     +1 |     1 |      306 | 648K |   5GB |    99.34 | Cpu (69)                    |      31% |
|      |                      |      |         |       |           |        |       |          |      |       |          | db file sequential read (6) |          |
|      |                      |      |         |       |           |        |       |          |      |       |          | direct path read (126)      |          |
===================================================================================================================================================================

Above results points to high CPU Time (150 Seconds). Now to look more deeper in to things and to understand what exactly the session is doing.

[root@dixitlab ~]#  perf top -e cpu-clock -p 3505 

   PerfTop:     349 irqs/sec  kernel:42.7%  exact:  0.0% [1000Hz cpu-clock],  (target_pid: 3505)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ __________________________________________________

             1981.00 46.2% _raw_spin_unlock_irqrestore [kernel.kallsyms]
              906.00 21.1% _intel_fast_memcmp          /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
              270.00  6.3% kole_simple_string_match    /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 
               96.00  2.2% copy_user_generic_unrolled  [kernel.kallsyms]
               96.00  2.2% kcbgtcr                     /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               54.00  1.3% __intel_new_memset          /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 
               37.00  0.9% __intel_ssse3_rep_memcpy    /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               30.00  0.7% kghfrf                      /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               28.00  0.7% kghalf                      /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               27.00  0.6% kcbldio                     /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               24.00  0.6% kdxbrs1                     /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               24.00  0.6% kspgvc                      /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
               21.00  0.5% kksMapCursor                /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle



Results shows system was mostly busy processing kernel calls (kernel.kallsyms) and its function ‘_raw_spin_unlock_irqrestore‘ with 46% of its time spend using CPU cycles. It’s coming with huge overhead and with large samples and in general irq_restore shows up because re-enabling interrupts is costly, but this is not the real CPU consumption but is how the tool, Interrupt and system works in few of the Virtual systems – I was testing it on my Lab VM.

Here i would like to thank Tanel Poder for reviewing the document and highlighting the issue with the perf top command when executed inside a VM environment.

…… perf top that concluded _raw_spin_unlock_irqrestore taking lots of CPU time. I’ve been through this myself in past – and this is likely not actual CPU usage but rather how perf, OS and interrupts work in some VMs (basically measurement bias/errors by these tools). I assume that you tested in a VM? ……..

Next in the list is function/object ‘_intel_fast_memcmp‘ called by oracle with 21.1% which i believe is for fast memory compilations.

So the conclusion of the analysis:
Most of the CPU by process was spend processing kernel calls and for for fast memory compilations.

Purpose Revisit: This gives you a glimpse of what happens with the process calls and monitors a cpu-bound, database process is pass its time.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

Script to measure the Source & Contribution of any Wait Event within AWR snapshots.

Posted by FatDBA on January 1, 2018

🤘 Happy New Year Everyone! 🤘

So, here i am with the maiden post of this year.
Today i will discuss about a script that might be handy and useful while you are investigating any performance problem with the database.
In fact this is one of my favorite script which i always considers to run at a point when we have identified the leading wait event and the time-frame of the spike or for any transient variation in system behavior. This script helps you to measure the source and contribution or the impact of any specific wait event.

The script is a join between ASH views and historical views (dba_hist_active_sess_history AND dba_hist_snapshot) and provides you stats based on Average Number of Active Sessions (AAS) were waiting on this event during the period and with what total contribution in terms of percentage.

Lets work on a use case.
Suppose you have a system with huge User IOs happening, as conformed by any report or tool (AWR, ASH, Via any Script, OEM etc.)

For example using AWR we found one of the User IO wait class event ‘db file scattered read’ was leading the ‘Top Foreground Wait Events’ charts on a specific time when we have the high User IO load on system.

                                           Total Wait       Avg   % DB Wait
Event                                Waits Time (sec)      Wait   time Class
------------------------------ ----------- ---------- --------- ------ --------
db file scattered read              10,282       82.8    41.50ms  43.9 User I/O

So, now you have identified the leading wait event and the timeframe with maximum intensity or frequency, Lets call the script.

Script Inputs:
Example:

Enter value for event_class:
User I/O
Enter value for event_name: db file scattered read
Enter value for begin_snap: 193
Enter value for end_snap: 200
Enter value for dbid: 2896132084

WITH
events AS (
SELECT /*+ MATERIALIZE NO_MERGE */
       SUBSTR(TRIM(h.sql_id||' '||h.program||' '||
       CASE h.module WHEN h.program THEN NULL ELSE h.module END), 1, 128) source,
       h.dbid,
       COUNT(*) samples
  FROM dba_hist_active_sess_history h,
       dba_hist_snapshot s
 WHERE h.wait_class = TRIM('&Event_Class') AND h.event = TRIM('&Event_Name')
   AND h.snap_id BETWEEN &Begin_Snap AND &End_Snap
   AND h.dbid = &dbid
   AND s.snap_id = h.snap_id
   AND s.dbid = h.dbid
   AND s.instance_number = h.instance_number
 GROUP BY
       h.sql_id,
       h.program,
       h.module,
       h.dbid
 ORDER BY
       3 DESC
),
total AS (
SELECT SUM(samples) samples,
       SUM(CASE WHEN ROWNUM > 15 THEN samples ELSE 0 END) others
  FROM events
)
SELECT e.source,
       e.samples,
       ROUND(100 * e.samples / t.samples, 1) percent,
       (SELECT DBMS_LOB.SUBSTR(s.sql_text, 1000, 1) FROM dba_hist_sqltext s WHERE s.sql_id = SUBSTR(e.source, 1, 13) AND s.dbid = e.dbid AND ROWNUM = 1) sql_text
  FROM events e,
       total t
 WHERE ROWNUM  0.1
 UNION ALL
SELECT 'Others',
       others samples,
       ROUND(100 * others / samples, 1) percent,
       NULL sql_text
  FROM total
 WHERE others > 0
   AND ROUND(100 * others / samples, 1) > 0.1;

Below is the output of the query.

SOURCE                                                                      SAMPLES      PERCENT       SQL_TEXT
------------------------------------------------------------------------  ---------- ---------------  -------------------------------
5av23g8w7f3ka sqlplus@dixitlab.localdomain (TNS V1-V3)                          1          45.23       select * from dixit.bigtab
9a7gbkahasj1a sqlplus@dixitlab.localdomain (TNS V1-V4)                          1           5          select * from dixit.gianttable
OTHERS                                                                          14         49.77

Okay so result shows one of the SQL statement with SQL id 5av23g8w7f3ka was responsible for more than 45% of these waits.
So, now you have the evidence and can start troubleshooting this specific statement to reduce the USER IOs.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | 3 Comments »

Auto Stats Gathering in Oracle 12c & Something Interesting :)

Posted by FatDBA on December 14, 2017

Hi Fellas,
Starting from Oracle 12c there is a new feature added which collects the statistics when you perform the Bulk Loads when using any of the two methods:
– CREATE TABLE AS SELECT (CTAS)
– INSERT INTO … SELECT (Into an empty table using DPR or Direct Path Read).

SQL> explain plan for create table dixittab as select * from scottisdead;
Explained.
 
SQL> select * from table(DBMS_XPLAN.DISPLAY);
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 14312189
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |               |   500K|  8812K|   612   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | DIXITTAB      |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |               |   500K|  8812K|   371   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SCOTTISDEAD   |   500K|  8812K|   371   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------
10 rows selected. 

Above in execution plan you’ll see the new operation named “OPTIMIZER STATISTICS GATHERING” at ID 2.
Lets verify if the stats are collected.

SQL> select table_name, last_analyzed from user_tables where table_name = 'DIXITTAB';
 
TABLE_NAME       LAST_ANALYZED
---------------- -------------
DIXITTAB         12-DEC-17

Yup, stats were collected!
Same way stats will be auto collected during the other type of Bulk Load method (INSERT INTO .. SELECT).

There may be times when you want to disable this feature, situations like.
– Long/Huge Insert operations which is taking huge time on STATS GATHERING.
– With an extremely large dataset where you don’t want to collect stats.

In order to achieve that we have the option, with the use of a hint which instructs oracle to not gather table statistics.

SQL> create table dixittab as select /*+NO_GATHER_OPTIMIZER_STATISTICS */* from scottisdead;

Now, something interesting i would like to discuss …..
Is there any other condition when the stats won’t be collected automatically except barring it using NO_GATHER_OPTIMIZER_STATISTICS Hint ?

Lets try to do some conventional bulk loading using INSERT INTO .. SELECT method.
To do some tests – I am intentionally commenting few of the columns both the tables. I’ve commented DATE_VAL column of newly created table TABLE1 and DATE_VALUE of the table selected SAMPLE.


SQL> create table table1 (ident number, date_val date, text_val varchar2(4000));
Table created.


SQL> insert /*+ append */ into table1
(IDENT
--, DATE_VAL
, TEXT_VAL)
SELECT ID
--, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE; 

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------

Plan hash value: 1523099961
-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |        |   100K|  2539K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT    | TABLE1 |       |       |            |          |
|   2 |   TABLE ACCESS FULL| SAMPLE |   100K|  2539K|   154   (1)| 00:00:01 |
-----------------------------------------------------------------------------

9 rows selected.

😦 😦 Why, the auto stats gathering behavior not repeated this time ??

This happened because Oracle needs inclusion of all the columns of a table in order to kick in the OPTIMIZER STATISTICS GATHERING operation —> Let me show you what i said in above statement.

SQL> insert /*+ append */ into table1
(IDENT
, DATE_VAL
, TEXT_VAL)
SELECT ID
, DATE_VALUE
, TEXT_VALUE
FROM SAMPLE;  

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------

Plan hash value: 1523099961
-------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | TABLE1 |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |        |   100K|  3320K|   154   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | SAMPLE |   100K|  3320K|   154   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------

10 rows selected.

Yes, the the stats were collected this time when we’ve included all the columns of the tables.
I haven’t seen any documentation on this restriction on this new feature of Oracle 12c. Hope Oracle adds this soon this to their documentation 🙂 …..

Hope It Helps!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | Leave a Comment »

Why my ASM Command Line (ASMCMD) is so slow, How to make ASMCMD run faster ?

Posted by FatDBA on November 1, 2017

ASMCMD is a command-line utility that you can use to easily view and manipulate files and directories within Automatic Storage Management (ASM) disk groups. It can list the contents of disk groups, perform searches, create and remove directories and aliases, display space utilization, and more.

But some of the times i have noticed some errors or slowness in command executions with ASMCMD and i believe you guys have too faced the same in the past. And the problem with ASMCMD errors are that they are not much detailed and are obscure which makes the troubleshooting more complicated and direction less.

There are few of the methods or the ways that i follow to handle performance issues with the asmcmd command line are given below.

1. Use ORADEBUG
What happens when you connect with ASMCMD ?
It actually connects with the ASM instance with SYSASM privilege and the same moment a background local process spawns with name BEQ.
Now once you recognize the process using ps -ef commands you can bind it to the ORADEBUG with errostack flag.

2. Truss or STRACE of ASMCMD and its processes.

example:

$ strace -aeft -o /dixit/labtest/asmcmdtrbsst.log asmcmd
ASMCMD>

3. Set the DBI_TRACE for ASMCMD perl tracing
Asmcmd is a wrapper for asmcmdcore script which is a shell script that starts a Perl program. If you are a Perl programmer, you can easily extend this script to add additional commands and security checks. We can use the DBI_TRACE argument to collect more diagnostic information on asm command line.

$ export DBI_TRACE=1
ASMCMD>

Hope That Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

CKPT process blocking table gather stats session intermittently … Why ?

Posted by FatDBA on November 1, 2017

Hi Folks,
Today i would like to share one of the experience that we had while working in one of the production system with a customer with a weird situation where the Gather stats session getting intermittently blocked by CKPT database background process in database and sometimes stays as it is for more than 30 mins.

We were getting the “enq: RO – fast object reuse” wait contention when gathering schema/table statistics in parallel using DBMS_STATS package with DEGREE>1

During the analysis i’ve generated the System State dump and saw a clear blocking situation on object Enq RO-00010059-00000001 .

Snippet from SS Dump.

Resource Holder State
Enq RO-00010059-00000001 14: waiting for ‘rdbms ipc message’
Enq RO-00010059-00000001 89: 89: is waiting for 14: 89:

Workaround for the problem is either of the two solutions
– We can try flush the Buffer Cache.
Though flushing the buffer cache causes dirty blocks to be written to disk and will have some performance impact.
– Setting the parameter “_db_fast_obj_truncate” to FALSE.
This will revert back to 9i way of invalidating buffers in buffer cache.

Hope That Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

CLSRSC-351 & CRS-4000 Errors during execution of root.sh for GRID installation.

Posted by FatDBA on November 9, 2016

While doing GRID installation on one of the machine where there were few previous failed Grid installations happened I’ve got few of the error messages while running the root.sh script during my installation attempt.

This is what I’ve got during the process of executing the root.sh script.

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user ‘oracle’, privgrp ‘oinstall’..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
CRS-4000: Command Pin failed, or completed with errors.
2016/11/07 21:30:06 CLSRSC-161: Pin node using the command ‘/u01/app/oracle/product/12.1.0/grid_1/bin/crsctl pin css -n fatdba’ failed

I tried it executing the same second time: Praying for any magic happens this time 😉
But this time some more errors but have left some clues and actions.

[root@Fatdba /]#
[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
2016/11/07 21:32:21 CLSRSC-351: Improper Oracle Clusterware configuration found on this host

2016/11/07 21:32:21 CLSRSC-353: Run ‘/u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl -deconfig’ to deconfigure existing failed configuration and then re-run ‘root.sh’

The command ‘/u01/app/oracle/product/12.1.0/grid_1/perl/bin/perl -I/u01/app/oracle/product/12.1.0/grid_1/perl/lib -I/u01/app/oracle/product/12.1.0/grid_1/crs/install /u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl ‘ execution failed

Okay, so it is clear that its happened due to some previous mess happened on the system before i got this as a task to install the software.It says that there is an improper clusterware configuration identified on the host and along it says to deinstall using roothas.pl script.

So i tried, but it says the ORS or the oracle restart stack is not active on the node, and it shouldn’t be as all the files were removed manually …
so it failed!

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl -deconfig
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
2016/11/07 21:32:54 CLSRSC-39: Oracle Restart stack is not active on this node
2016/11/07 21:32:54 CLSRSC-312: Failed to verify HA resources
Died at /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsdeconfig.pm line 1358.

Let’s try the last resort, the FORCE option to remove previous bad installs.
And it worked!

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl -deconfig -force
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
2016/11/07 21:39:06 CLSRSC-337: Successfully deconfigured Oracle Restart stack

Lets try to run the root.sh script again to complete this new GRID installation.
It worked now!

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user ‘oracle’, privgrp ‘oinstall’..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
CRS-4664: Node fatdba successfully pinned.
2016/11/07 21:39:27 CLSRSC-330: Adding Clusterware entries to file ‘oracle-ohasd.conf’

fatdba 2016/11/07 21:40:01 /u01/app/oracle/product/12.1.0/grid_1/cdata/fatdba/backup_20161107_214001.olr 0
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘fatdba’
CRS-2673: Attempting to stop ‘ora.evmd’ on ‘fatdba’
CRS-2677: Stop of ‘ora.evmd’ on ‘fatdba’ succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘fatdba’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/11/07 21:41:50 CLSRSC-327: Successfully configured Oracle Restart for a standalone server

Hope That Helps!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: | Leave a Comment »

Oracle Tracing Capabilities – Part 1

Posted by FatDBA on August 31, 2016

Hi Mates,
There are so many times when we exhausted all of our known tips, tricks and techniques of troubleshooting and after trying everything under the sun finally raised requests to Oracle Support, then they asked us to perform some unknown, uncanny, alien steps to troubleshoot the problem in hand and asks us to share the trace files (Which we, most of the times don’t understand) which they analyze and makes conclusion based on them.

Today i would like to start a series of posts where i will share steps to troubleshoot some of Oracle’s in-built tools and software. These steps will help to understand – What and how to enable tracing for them.

Trace Data Pump:
Sometimes while importing/exporting a dump it takes a long time to complete and hangs or session just ‘Freeze’ with no reason. Oracle provides an option to trace import export sessions too by using a parameter TRACE, using this option you can decipher sessions, master & slave processes and other control processes.
You can enable tracing by using the seven digit long hexadecimal argument for the trace option. Below is the complete list of tracing levels.

SHDW: To trace the Shadow process (API) (expdp/impdp)
20300 KUPV: To trace Fixed table
40300 ‘div’ To trace Process services
80300 KUPM: To trace Master Control Process (MCP) (DM)
100300 KUPF: To trace File Manager
200300 KUPC: To trace Queue services
400300 KUPW: To trace Worker process(es) (DW)
800300 KUPD: To trace Data Package
1000300 META: To trace Metadata Package
1FF0300 ‘all’ To trace all components (full tracing)

How to use it:

impdp \’/ as sysdba\’ SCHEMAS=DIXIT PARALLEL=8 JOB_NAME=testing_tracedpump TRACE=1FF0300
********
KUPP:10:58:22.050: Input trace/debug flags: 01FF0300 = 11818181
KUPP:10:58:22.050: Current trace/debug flags: 01FF0300 = 11818181
SHDW:10:58:22.050: Current user = SYS
SHDW:10:58:22.050: Current schema = SYS
SHDW:10:58:22.050: Current language = AMERICAN_AMERICA.AL32UTF8
SHDW:10:58:22.052: Current session address = 000000007TYBGGG0
SHDW:10:58:22.052: *** OPEN call ***
SHDW:10:58:22.052: operation = IMPORT
SHDW:10:58:22.052: job_mode = schema
SHDW:10:58:22.052: version =
SHDW:10:58:22.052: compression = 2
KUPV:10:58:22.058: Master Table create statement: CREATE TABLE “SYS”.”testing_tracedpump” (process_order NUMBER, duplicate NUMBER, dump_fileid NUMBER, dump_position NUMBER, dump_length NUMBER, dump_orig_length NUMBER
,,,,,,,,,,,,,,,,,,,,,,,,,

This will create some trace files for master control process (DM), shadow processes, slave/worker processes (DW) and under trace directory.
Next will write about more undocumented and hidden features available for some of the tools we use of daily basis.

Hope It Helps
Prashant Dixit

Posted in troubleshooting | Leave a Comment »

 
%d bloggers like this: