Tales From A Lazy Fat DBA

Its all about Databases & their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! – Part 1

Posted by FatDBA on January 8, 2018

Hi Pals,

Some time back i was part of a performance task-force for one of our customers in Czech where i did a complete full 360 degrees auditing and troubleshooting of their full database stack. But you know most of the times it not just the software or the database but the real problem lies underneath the OS layer.

Okay, so back to the discussion – So during the assignment at one point where i had to collect some statistics for the OS to prove my point that ‘Problem is not with the Database‘ but with something else, i had used few of the tools like DTrace, Linux Trace Toolkit or LTT, Systemtap and one of the profiler which i hardly used before ‘Perf‘.

Yes, so today’s topic of discussion is to give you some basic idea (Will cover the deep analysis using the tool in future) about the ‘Perf’ profiler which helps to dissect and understand hardware events and measures performance.

Okay, let me take a real time scenario where a sudden surge in CPU in a production environment raised questions.
During the initial probing of issues we have found that that about 25% of the CPU is consumed by SQLPLUS command with PID 6720, lets see what and why its doing that ….

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6720 oracle    20   0  113m  14m 9.8m R 25.8  0.5   0:20.32 sqlplus   as sysdba

Lets first Sample on-CPU functions for the process ID on 99 Hertz using below command for PID 6720.

perf record -F 99 -p 6720 

This will produce a raw file named ‘perf.data’ under the same directory from where you called it.
Let it run for some time. You can cancel the execution by pressing CTRL+C.

[root@dixitlab perf]# ls -ltrh
total 12K
-rw-------. 1 root root 12K Jan  8 01:14 perf.data

Now lets see what’s there inside the raw file —- lets Dump raw contents from perf.data as hex for debugging.
For this we will use perf script with flag D, it will automatically considers the perf.data file present under the directory to show charts.


[root@dixitlab perf]# perf script -D

0x110 [0x48]: event: 1
.
. ... raw event: size 72 bytes
.  0000:  01 00 00 00 01 00 48 00 ff ff ff ff 00 00 00 00  ......H.........
.  0010:  00 00 00 00 00 00 00 00 ff ff ff 9f ff ff ff ff  ................
.  0020:  00 00 00 81 ff ff ff ff 5b 6b 65 72 6e 65 6c 2e  ........[kernel.
.  0030:  6b 61 6c 6c 73 79 6d 73 5d 5f 74 65 78 74 00 00  kallsyms]_text..
.  0040:  00 00 00 00 00 00 00 00                          ........
.
0x110 [0x48]: PERF_RECORD_MMAP -1/0: [0(0xffffffff9fffffff) @ 0xffffffff81000000]: [kernel.kallsyms]_text

0x158 [0x78]: event: 1
.
. ... raw event: size 120 bytes
.  0000:  01 00 00 00 01 00 78 00 ff ff ff ff 00 00 00 00  ......x.........
.  0010:  00 00 00 a0 ff ff ff ff ff bf 01 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64  ......../lib/mod
.  0030:  75 6c 65 73 2f 32 2e 36 2e 33 39 2d 34 30 30 2e  ules/2.6.39-400.
.  0040:  32 39 37 2e 33 2e 65 6c 36 75 65 6b 2e 78 38 36  297.3.el6uek.x86
.  0050:  5f 36 34 2f 6b 65 72 6e 65 6c 2f 64 72 69 76 65  _64/kernel/drive
.  0060:  72 73 2f 6d 64 2f 64 6d 2d 6d 6f 64 2e 6b 6f 00  rs/md/dm-mod.ko.
.  0070:  00 00 00 00 00 00 00 00                          ........
.


0x2bf8 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f6062943000(0x4c000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libons.so

0x2c60 [0x68]: event: 1
.
. ... raw event: size 104 bytes
.  0000:  01 00 00 00 02 00 68 00 23 11 00 00 23 11 00 00  ......h.#...#...
.  0010:  00 80 ba 62 60 7f 00 00 00 90 4d 00 00 00 00 00  ...b`.....M.....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 6e 6e 7a 31 32 2e 73 6f 00  lib/libnnz12.so.
.  0060:  00 00 00 00 00 00 00 00                          ........
.
0x2c60 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f6062ba8000(0x4d9000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libnnz12.so

0x2cc8 [0x68]: event: 1
.
. ... raw event: size 104 bytes
.  0000:  01 00 00 00 02 00 68 00 23 11 00 00 23 11 00 00  ......h.#...#...
.  0010:  00 10 2f 63 60 7f 00 00 00 c0 22 00 00 00 00 00  ../c`.....".....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 69 70 63 31 2e 73 6f 00 00  lib/libipc1.so..
.  0060:  00 00 00 00 00 00 00 00                          ........
.
0x2cc8 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f60632f1000(0x22c000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libipc1.so

0x2d30 [0x68]: event: 1
.
. ... raw event: size 104 bytes
.  0000:  01 00 00 00 02 00 68 00 23 11 00 00 23 11 00 00  ......h.#...#...
.  0010:  00 50 72 63 60 7f 00 00 00 20 07 00 00 00 00 00  .Prc`.... ......
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 6d 71 6c 31 2e 73 6f 00 00  lib/libmql1.so..
.  0060:  00 00 00 00 00 00 00 00                          ........
.
0x2d30 [0x68]: PERF_RECORD_MMAP 4387/4387: [0x7f6063725000(0x72000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmql1.so

0x2d98 [0x78]: event: 1
.
. ... raw event: size 120 bytes
.  0000:  01 00 00 00 02 00 78 00 23 11 00 00 23 11 00 00  ......x.#...#...
.  0010:  00 c0 99 63 60 7f 00 00 00 90 3a 00 00 00 00 00  ...c`.....:.....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 63 6c 6e 74 73 68 63 6f 72  lib/libclntshcor
.  0060:  65 2e 73 6f 2e 31 32 2e 31 00 00 00 00 00 00 00  e.so.12.1.......
.  0070:  00 00 00 00 00 00 00 00                          ........
.
0x2d98 [0x78]: PERF_RECORD_MMAP 4387/4387: [0x7f606399c000(0x3a9000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libclntshcore.so.12.1

0x2e10 [0x70]: event: 1
.
. ... raw event: size 112 bytes
.  0000:  01 00 00 00 02 00 70 00 23 11 00 00 23 11 00 00  ......p.#...#...
.  0010:  00 a0 f6 63 60 7f 00 00 00 30 65 03 00 00 00 00  ...c`....0e.....
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 63 6c 6e 74 73 68 2e 73 6f  lib/libclntsh.so
.  0060:  2e 31 32 2e 31 00 32 2e 00 00 00 00 00 00 00 00  .12.1.2.........
.
0x2e10 [0x70]: PERF_RECORD_MMAP 4387/4387: [0x7f6063f6a000(0x3653000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libclntsh.so.12.1

0x2e80 [0x70]: event: 1
.
. ... raw event: size 112 bytes
.  0000:  01 00 00 00 02 00 70 00 23 11 00 00 23 11 00 00  ......p.#...#...
.  0010:  00 20 a1 67 60 7f 00 00 00 40 0e 00 00 00 00 00  . .g`....@......
.  0020:  00 00 00 00 00 00 00 00 2f 75 30 31 2f 61 70 70  ......../u01/app
.  0030:  2f 6f 72 61 63 6c 65 2f 70 72 6f 64 75 63 74 2f  /oracle/product/
.  0040:  31 32 2e 32 2e 30 2f 64 62 68 6f 6d 65 5f 31 2f  12.2.0/dbhome_1/
.  0050:  6c 69 62 2f 6c 69 62 73 71 6c 70 6c 75 73 2e 73  lib/libsqlplus.s
.  0060:  6f 00 32 2e 31 00 32 2e 00 00 00 00 00 00 00 00  o.2.1.2.........
.
0x2e80 [0x70]: PERF_RECORD_MMAP 4387/4387: [0x7f6067a12000(0xe4000) @ 0]: /u01/app/oracle/product/12.2.0/dbhome_1/lib/libsqlplus.so

0x2ef0 [0x38]: event: 1
.
. ... raw event: size 56 bytes
.  0000:  01 00 00 00 02 00 38 00 23 11 00 00 23 11 00 00  ......8.#...#...
.  0010:  00 90 8f 09 ff 7f 00 00 00 10 00 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00 5b 76 64 73 6f 5d 00 70  ........[vdso].p
.  0030:  00 00 00 00 00 00 00 00                          ........
.

Now lets get back to the point where we had the process (PID: 6720) consuming highest CPU resources.

now profiling CPU’s with sample rate of 99 Hertz/second, -a for all CPU’s, -g for stack traces, sleep of 60 seconds for duration of run.

[root@dixitlab perf]# perf record -F 99 -a -g -- sleep 60 
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16948 samples) ]

Now as i discussed above, it will generate a new perf.data under the directory.

[root@dixitlab perf]# ls -ltrh
total 768K
-rw-------. 1 root root 379K Jan  8 01:30 perf.data.old
-rw-------. 1 root root 386K Jan  8 01:31 perf.data

Next, we will interpret results from the newly created ‘perf.data’ file.
I have used ‘perf report -n –stdio’, you can simply use ‘perf report’ too.


[root@dixitlab perf]#
[root@dixitlab perf]# perf report -n --stdio 
# Events: 2K cpu-clock
#
# Overhead  Samples            Command          Shared Object                                    Symbol
# ........ ..........  ...............  .....................  ........................................
#
    50.05%       1001             sshd  [e1000]                [k] e1000_phy_read_status
                       |
                       --- e1000_phy_read_status
                          |
                          |--99.60%-- dev_hard_start_xmit
                          |          sch_direct_xmit
                          |          dev_queue_xmit
                          |          ip_finish_output
                          |          ip_output
                          |          ip_local_out
                          |          ip_queue_xmit
                          |          tcp_transmit_skb
                          |          tcp_write_xmit
                          |          __tcp_push_pending_frames
                          |          tcp_push
                          |          tcp_sendmsg
                          |          inet_sendmsg
                          |          sock_aio_write
                          |          do_sync_write
                          |          vfs_write
                          |          sys_write
                          |          system_call_fastpath
                          |          __write_nocancel
                           --0.40%-- [...]

     8.35%        167          swapper  [kernel.kallsyms]      [k] native_safe_halt
                    |
                    --- native_safe_halt
                        default_idle
                        cpu_idle
                        rest_init
                        start_kernel
                        x86_64_start_reservations
                        x86_64_start_kernel

     4.45%         89          sqlplus  [kernel.kallsyms]      [k] finish_task_switch
                    |
                    --- finish_task_switch
                        __schedule
                       |
                       |--87.64%-- __cond_resched
                       |          _cond_resched
                       |          mutex_lock
                       |          |
                       |          |--91.03%-- process_output_block
                       |          |          n_tty_write
                       |          |          tty_write
                       |          |          vfs_write
                       |          |          sys_write
                       |          |          system_call_fastpath
                       |          |          __write_nocancel
                       |          |
                       |           --8.97%-- process_output
                       |                     n_tty_write
                       |                     tty_write
                       |                     vfs_write
                       |                     sys_write
                       |                     system_call_fastpath
                       |                     __write_nocancel
                       |
                        --12.36%-- schedule
                                  sysret_careful
                                  |
                                  |--81.82%-- __write_nocancel
                                  |
                                   --18.18%-- __write_nocancel


.................
...........................
..................................

Note:
You can further sort results based on cpu, shared objects etc.
perf report –sort=cpu
perf report –sort=dso

Above results reflects the division. While taking a look at the tree, it shows a combination or a club of few OS Layer components (Starting from dev_hard_start_xmit) was sampled of time 50.05*99.60 = 49.84%
with parent sshd command and shared object [e1000] and is a physical read operation.

Rest of all might seem gibberish but needs time and patience to understand. I will try to cover the profiler in more detail with more use case scenarios in future. Till the time you can check the WIKI page of the tool.

Note: In case if you are using perf on VMWare system, you might get error “perf.data file has no samples“.
In order to fix it try using the ‘perf record’ option with -e cpu-clock arguments to collect the sample data and then use ‘perf report’ to interpret the file.

example:
perf record -F 99 -a -g -e cpu-clock — sleep 20


Hope It Helps
Prashant Dixit

Advertisement

3 Responses to “How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! – Part 1”

  1. […] « How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! &#8… […]

  2. […] « How to troubleshoot, understand HW events & measure performance using PERF (Linux Profiler)! &#8… […]

  3. […] with second edition of my previous post on troubleshooting performance issues using Linux in-built profiler named ‘perf’. Many […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

 
%d bloggers like this: