How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! – Part 2
Posted by FatDBA on January 13, 2018
Repeat: I think I’ve mistakenly deleted the post, so re-posting the same.
Hey Folks,
Back with second edition of my previous post on troubleshooting performance issues using Linux in-built profiler named ‘perf’. Many of the users requested to write about few use cases situations on when and how to use this tool.
So, this post is all about discussing some cases that i have faced while working on few performance tuning projects and few of them are from the tests that I’ve performed on my Lab systems.
Okay coming back to the reason on why we need this, perf i mean ?
The answer is sometimes Oracle wait interface is not enough and you need to dig deeper inside the system to understand the problem. That point you have to use some third party, in-built dedicated tools for performance investigation. There perf might help you to understand what your resource intensive query is doing on OS layers.
So, using the tool you can monitor your process on what’s its doing!
And in case if you are using perf on Virtualised system, you might get error “perf.data file has no samples“.
In order to fix it try with “-e cpu-clock” arguments to collect the sample data and then interpret the file.
Okay so one fine day, you saw a spike in server’s CPU consumption using TOP, Oratop, OEM, Scheduled scripts or by any possible monitoring techniques and you have identified the process and its other attributes.
Using the PID you have reached the SID, SQL_ID, SQL_TEXT and other statistics. So now you might want to analyze and understand the oracle’s execution.
Global Information
------------------------------
Status : EXECUTING
Instance ID : 1
Session : DIXIT (1:53089)
SQL ID : 71aa5ju8pwtf2
SQL Execution ID : 16777216
Execution Started : 01/09/2018 06:10:32
First Refresh Time : 01/09/2018 06:10:32
Last Refresh Time : 01/09/2018 06:13:03
Duration : 152s
Module/Action : SQL*Plus/-
Service : SYS$USERS
Program : sqlplus@dixitlab.localdomain (TNS V1-V3)
Global Stats
========================================================
| Elapsed | Cpu | IO | Buffer | Read | Read |
| Time(s) | Time(s) | Waits(s) | Gets | Reqs | Bytes |
========================================================
| 163 | 150 | 115 | 5M | 648K | 5GB |
========================================================
SQL Plan Monitoring Details (Plan Hash Value=12102956)
===================================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Activity | Activity Detail | Progress |
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | (%) | (# samples) | |
===================================================================================================================================================================
| -> 0 | SELECT STATEMENT | | | | 150 | +2 | 1 | 0 | | | | | |
| -> 1 | SORT AGGREGATE | | 1 | | 150 | +2 | 1 | 0 | | | | | |
| -> 2 | TABLE ACCESS FULL | T294 | 96523 | 56397 | 151 | +1 | 1 | 306 | 648K | 5GB | 99.34 | Cpu (69) | 31% |
| | | | | | | | | | | | | db file sequential read (6) | |
| | | | | | | | | | | | | direct path read (126) | |
===================================================================================================================================================================
Above results points to high CPU Time (150 Seconds). Now to look more deeper in to things and to understand what exactly the session is doing.
[root@dixitlab ~]# perf top -e cpu-clock -p 3505 PerfTop: 349 irqs/sec kernel:42.7% exact: 0.0% [1000Hz cpu-clock], (target_pid: 3505) -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- samples pcnt function DSO _______ _____ ___________________________ __________________________________________________ 1981.00 46.2% _raw_spin_unlock_irqrestore [kernel.kallsyms] 906.00 21.1% _intel_fast_memcmp /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 270.00 6.3% kole_simple_string_match /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 96.00 2.2% copy_user_generic_unrolled [kernel.kallsyms] 96.00 2.2% kcbgtcr /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 54.00 1.3% __intel_new_memset /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 37.00 0.9% __intel_ssse3_rep_memcpy /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 30.00 0.7% kghfrf /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 28.00 0.7% kghalf /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 27.00 0.6% kcbldio /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 24.00 0.6% kdxbrs1 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 24.00 0.6% kspgvc /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle 21.00 0.5% kksMapCursor /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
Results shows system was mostly busy processing kernel calls (kernel.kallsyms) and its function ‘_raw_spin_unlock_irqrestore‘ with 46% of its time spend using CPU cycles. It’s coming with huge overhead and with large samples and in general irq_restore shows up because re-enabling interrupts is costly, but this is not the real CPU consumption but is how the tool, Interrupt and system works in few of the Virtual systems – I was testing it on my Lab VM.
Here i would like to thank Tanel Poder for reviewing the document and highlighting the issue with the perf top command when executed inside a VM environment.
“…… perf top that concluded _raw_spin_unlock_irqrestore taking lots of CPU time. I’ve been through this myself in past – and this is likely not actual CPU usage but rather how perf, OS and interrupts work in some VMs (basically measurement bias/errors by these tools). I assume that you tested in a VM? ……..“
Next in the list is function/object ‘_intel_fast_memcmp‘ called by oracle with 21.1% which i believe is for fast memory compilations.
So the conclusion of the analysis:
Most of the CPU by process was spend processing kernel calls and for for fast memory compilations.
Purpose Revisit: This gives you a glimpse of what happens with the process calls and monitors a cpu-bound, database process is pass its time.
Hope It Helps
Prashant Dixit
Leave a Reply