Tales From A Lazy Fat DBA

It's all about databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Archive for the ‘Advanced’ Category

Advanced/Troubleshooting/Error-Bug Fixing

Evaluating Storage Performance!

Posted by FatDBA on March 10, 2015

Being a DBA obviously means you need to possess outstanding knowledge of DBA subjects, but nowadays knowledge of storage/network/OS is a big plus. It helps the modern DBA troubleshoot and rectify faster, and avoids the millions of hours wasted identifying "DB issues" when in fact there aren't any.

This time I want to discuss the steps one can take when waits like log file sync, db file async io submit, log file parallel write, control file parallel write, and the many other parallel and sequential write events are observed on a system.

Quick Fast Disk Test Results (DD Command):
———————————————
[oracle@dixitdb111 datafiles]$ time sh -c "dd if=/dev/zero of=dd-test-file bs=8k count=1000000 && sync"
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 26.4959 seconds, 309 MB/s

real 0m44.318s
user 0m0.278s
sys 0m19.410s

[oracle@dixitdb111 datafiles]$ ls -ltrh
-rw-r--r-- 1 oracle oinstall 7.7G Jan 30 11:59 dd-test-file
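One caveat: dd through the page cache can flatter the storage, which is why the sync is included in the timing above. To take the cache out of the picture entirely you can use GNU dd's direct I/O flags. A quick sketch (same test file as above, smaller count for brevity):

[oracle@dixitdb111 datafiles]$ dd if=/dev/zero of=dd-test-file bs=8k count=100000 oflag=direct
[oracle@dixitdb111 datafiles]$ dd if=dd-test-file of=/dev/null bs=8k iflag=direct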

ORION
————-

ORION (ORacle IO Numbers) imitates the type of I/O performed by Oracle databases, which makes it possible to measure I/O performance for a storage configuration without actually installing Oracle. It now ships in the "$ORACLE_HOME/bin" directory of Database/Grid installations.

Orion offers many options for a run, e.g. the oltp and olap workloads and more.
Linked below is a beautiful explanation by Alex Gorbachev (ACED, IOUG, OakTable member & renowned blogger):
Link: http://www.uyoug.org.uy/eventos2013/OTNLAD2013-Benchmarking-Oracle-IO-Performance-with-ORION-by-Alex-Gorbachev.pdf

[oracle@dixitdb111 bin]$ ./orion -run oltp
ORION: ORacle IO Numbers — Version 11.2.0.3.0
orion_20150203_0800
Calibration will take approximately 2 minutes.
Using a large value for -cache_size may take longer.

The run produces a few .csv and text files with the I/O results, from which you can build some beautiful charts/graphs and tabular summaries of the run.
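Beyond the presets, an advanced run gives finer control over the workload mix. A sketch (option names as per the ORION documentation, values purely illustrative); it expects a mytest.lun file listing the devices under test, one per line:

[oracle@dixitdb111 bin]$ ./orion -run advanced -testname mytest -num_disks 4 -write 20 -matrix basic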

Reference:
--------------
_hist.csv    : Histogram of I/O latencies.
_iops.csv    : Performance results of small I/Os in IOPS.
_lat.csv     : Latency of small I/Os in microseconds.
_mbps.csv    : Performance results of large I/Os in MBPS.
_summary.txt : Summary of the input parameters, along with the minimum small I/O latency (in secs), the maximum MBPS, and the maximum IOPS observed.
_trace.txt   : Extended, unprocessed output.

I/O calibration is another magical option. It lets you assess the performance of the storage subsystem and determine whether I/O performance problems are caused by the database or by the storage itself. Unlike external I/O calibration tools that issue I/Os sequentially, the I/O calibration feature of Oracle Database issues I/Os randomly against the Oracle datafiles, producing results that more closely match the actual behaviour of the database. Note that DBMS_RESOURCE_MANAGER.CALIBRATE_IO requires asynchronous I/O to be enabled on the datafiles, which is what the query below verifies.

SELECT d.name,
       i.asynch_io
FROM   v$datafile d,
       v$iostat_file i
WHERE  d.file# = i.file_no
AND    i.filetype_name = 'Data File';

NAME ASYNCH_IO
—————————————————————– ———
/dbmnt1/dixitdb/datafiles/system01.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/undotbs1_01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/sysaux01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/users01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem02.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem03.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem04.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem05.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem06.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem07.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem08.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem09.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem10.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/unicode1tbs01.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/unicode2atbs01.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/r11testtbs.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem11.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem12.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/artest.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem13.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem14.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem15.dbf ASYNC_ON
/dbmnt3/dixitdb/datafiles/dbsystem16.dbf ASYNC_ON

* The PL/SQL block below is resource intensive; load/CPU spikes are expected during the run.

SET SERVEROUTPUT ON
DECLARE
  l_latency  PLS_INTEGER;
  l_iops     PLS_INTEGER;
  l_mbps     PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.calibrate_io (num_physical_disks => 1,
                                      max_latency        => 20,
                                      max_iops           => l_iops,
                                      max_mbps           => l_mbps,
                                      actual_latency     => l_latency);

  DBMS_OUTPUT.put_line('Max IOPS = ' || l_iops);
  DBMS_OUTPUT.put_line('Max MBPS = ' || l_mbps);
  DBMS_OUTPUT.put_line('Latency  = ' || l_latency);
END;
/

Max IOPS = 610
Max MBPS = 67
Latency = 19

==================
Calibration runs can be monitored using the V$IO_CALIBRATION_STATUS view.

SET LINESIZE 100
COLUMN start_time FORMAT A20
COLUMN end_time FORMAT A20

SELECT TO_CHAR(start_time, 'DD-MON-YYYY HH24:MI:SS') AS start_time,
TO_CHAR(end_time, 'DD-MON-YYYY HH24:MI:SS') AS end_time,
max_iops,
max_mbps,
max_pmbps,
latency,
num_physical_disks AS disks
FROM dba_rsrc_io_calibrate;

START_TIME END_TIME MAX_IOPS MAX_MBPS MAX_PMBPS LATENCY DISKS
——————– ——————– ———- ———- ———- ———- ———-
30-JAN-2015 09:49:10 30-JAN-2015 09:53:14        610         67         27         19          1

Thanks
Prashant Dixit


File is DELETED but didn’t reclaim space in filesystem – LINUX

Posted by FatDBA on March 9, 2015

There are times when, even after you delete a file in Linux, the space is not reclaimed in the filesystem.
The command below shows a few files that have been deleted from the system but are still held open and still taking up space on the disk.

[oracle@dixitdb053 /proc]# /usr/sbin/lsof |grep -i deleted
oracle 11441 oracle 19w REG 253,2 1810 3129788 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_diag_11441.trc (deleted)
oracle 11441 oracle 20w REG 253,2 126 3129795 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_diag_11441.trm (deleted)
oracle 30157 oracle 19w REG 253,2 14182 3129684 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_vkrm_30157.trc (deleted)
dd 32592 oracle 1w REG 253,2 24238593024 3129587 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_31778.trm (deleted)

Here the deleted file lives under the mount point /opt/oracle, which is still 86% full.

[oracle@dixitdb053 /proc/11441/fd]# df -hk
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
31614888 6127104 23856156 21% /
/dev/mapper/VolGroup02-LogVol05
36124288 29165480 5123800 86% /opt/oracle
/dev/sda1 101086 15711 80156 17% /boot
tmpfs 1956756 0 1956756 0% /dev/shm
113.11.88.199:/vol/dixitdb053_BO
110100480 58354656 51745824 54% /db01
113.11.88.199:/vol/vol_dixitdb053_backup
104857600 55011648 49845952 53% /backup

The space is reclaimed automatically only once the process holding the file closes its file descriptor (or exits).
Below are the steps to release it instantly from the filesystem.


Steps:
=============

1. Go to the /proc directory, which contains an entry for every active process (PID) and its open files.
2. Change into the directory of the PID holding the deleted file (here the dd process, PID 32592).

[oracle@dixitdb053 /proc/11441/fd]# cd /proc/32592
[oracle@dixitdb053 /proc/32592]# cd fd
[oracle@dixitdb053 /proc/32592/fd]# pwd
/proc/32592/fd

3. Check the file descriptor links; the one holding the deleted file says "(deleted)" at the end.

[oracle@dixitdb053 /proc/32592/fd]# ls -ltrh
total 0
lrwx------ 1 oracle oinstall 64 Mar 7 07:57 2 -> /dev/pts/1
l-wx------ 1 oracle oinstall 64 Mar 7 08:25 1 -> /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_31778.trm (deleted)
lr-x------ 1 oracle oinstall 64 Mar 7 08:25 0 -> /dev/zero

4. Truncate the file through its descriptor by redirecting nothing into it (the descriptor number is the one shown on that line); this releases the space instantly.

[oracle@dixitdb053 /proc/32592/fd]# > 1

[oracle@dixitdb053 /proc/32592/fd]# ls -ltrh
total 0
lrwx------ 1 oracle oinstall 64 Mar 7 07:57 2 -> /dev/pts/1
l-wx------ 1 oracle oinstall 64 Mar 7 08:25 1 -> /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_31778.trm (deleted)
lr-x------ 1 oracle oinstall 64 Mar 7 08:25 0 -> /dev/zero

5. Verify once again by listing the open files; the entry still shows with the deleted status (the process keeps the descriptor open), but the size has been brought down to 0 and the space is reclaimed.

[oracle@dixitdb053 /proc/32592/fd]# /usr/sbin/lsof |grep -i deleted
oracle 11441 oracle 19w REG 253,2 0 3129788 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_diag_11441.trc (deleted)
oracle 11441 oracle 20w REG 253,2 0 3129795 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_diag_11441.trm (deleted)
oracle 30157 oracle 19w REG 253,2 14792 3129684 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_vkrm_30157.trc (deleted)
dd 32592 oracle 1w REG 253,2 0 3129587 /opt/oracle/diag/rdbms/dixitdb/dixitdb/trace/dixitdb_ora_31778.trm (deleted)

And the usage of the mount point /opt/oracle went down too, dropping to 17%.

[oracle@dixitdb053 /proc/32592/fd]# df -kh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
31G 5.9G 23G 21% /
/dev/mapper/VolGroup02-LogVol05
35G 5.3G 28G 17% /opt/oracle
/dev/sda1 99M 16M 79M 17% /boot
tmpfs 1.9G 0 1.9G 0% /dev/shm
113.11.88.199:/vol/dixitdb053_BO
105G 56G 50G 54% /db01
113.11.88.199:/vol/vol_dixitdb053_backup
100G 53G 48G 53% /backup

Thanks
Prashant Dixit


Some Serious Fun with the Database!!

Posted by FatDBA on March 4, 2015

Reading Alert log using SQL Terminal:
select message_text from X$DBGALERTEXT where rownum <= 10;

MESSAGE_TEXT
—————————————————————————————————-
Adjusting the default value of parameter parallel_max_servers
from 320 to 120 due to the value of parameter processes (150)
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 64 KB

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
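
Keep in mind that X$ tables are visible only to SYS. As a sketch (X$DBGALERTEXT is the fixed table behind V$DIAG_ALERT_EXT and also exposes an ORIGINATING_TIMESTAMP column), you can filter for recent ORA- errors instead of just the first few lines:

select originating_timestamp, message_text
from   x$dbgalertext
where  message_text like '%ORA-%'
and    originating_timestamp > systimestamp - interval '1' day;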

How to print a message to the alert log:
SQL> exec dbms_system.ksdwrt(1, 'This message goes to trace file in the udump location');
PL/SQL procedure successfully completed.

SQL> exec dbms_system.ksdwrt(2, 'This message goes to the alert log');
PL/SQL procedure successfully completed.

SQL> exec dbms_system.ksdwrt(3, 'This message goes to the alert log and trace file in the udump location');
PL/SQL procedure successfully completed.

SQL> exec dbms_system.ksdwrt(2, 'ORA-0000118111: Testing monitoring tool');
PL/SQL procedure successfully completed.

Thanks
Prashant Dixit


Large Pages & use_large_pages parameter in Oracle — BMC Patrol Alerts global parameter "MEMUsedMemPerc" triggered on "MEMORY.MEMORY"

Posted by FatDBA on December 31, 2014

All of a sudden one morning we started receiving constant alerts from one of our production systems, used for SAP and other BI/BO applications, saying that the Memory Used Percentage threshold had been breached. Being one of the most rarely used and least busy production systems of all, this was never expected from that server. I have previously written about the benefits and usage of large pages in Linux.

Summary: prashantdb053 global parameter "MEMUsedMemPerc" triggered on "MEMORY.MEMORY".  95 <= 95.03 <= 100
Notes: prashantdb053 global parameter "MEMUsedMemPerc" triggered on "MEMORY.MEMORY".  95 <= 95.03 <= 100 Object=MEMORY Object Class=MEMORY Parameter=MEMUsedMemPerc
BMC Impact Manager Initiated Incident

The database I have running doesn’t use large pages, as shown in the alert.log:
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 16 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB

RECOMMENDATION:
Total Shared Global Region size is 2514 MB. For optimal performance,
prior to the next instance restart increase the number
of unused Large Pages by atleast 1257 2048 KB Large Pages (2514 MB)
system wide to get 100% of the Shared
Global Region allocated with Large pages
***********************************************************

Load statistics on the server at the time when we got the error message.

top – 07:26:24 up 30 days, 34 min,  3 users,  load average: 0.07, 0.05, 0.01
Tasks: 183 total,   1 running, 182 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.1%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3913512k total,  3886112k used,    27400k free,   149432k buffers
Swap:  4947960k total,   284392k used,  4663568k free,  3036064k cached

Memory Statistics on the server.

[oracle@prashantdb053 /backup/dixit/scripts]# free -m
total       used       free     shared    buffers     cached
Mem:          3821       3795         26          0        146       2965
-/+ buffers/cache:        684       3137
Swap:         4831        277       4554

[oracle@prashantdb053 /backup/dixit/scripts]# cat /proc/meminfo
MemTotal:      3913512 kB
MemFree:         26504 kB
Buffers:        149548 kB
Cached:        3036664 kB
SwapCached:       4620 kB
Active:        2250272 kB
Inactive:      1377312 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3913512 kB
LowFree:         26504 kB
SwapTotal:     4947960 kB
SwapFree:      4663568 kB
Dirty:             132 kB
Writeback:           0 kB
AnonPages:      437440 kB
Mapped:        1668000 kB
Slab:            86592 kB
PageTables:     143008 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6904716 kB
Committed_AS:  4125228 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    265480 kB
VmallocChunk: 34359472519 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

The stats below show that Large Pages are not utilized at all.
[oracle@prashantdb053 /backup/dixit/scripts]# grep Huge /proc/meminfo
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

[oracle@prashantdb053 /backup/dixit/scripts]# vmstat -s
3913512  total memory
3884496  used memory
2248032  active memory
1376892  inactive memory
29016  free memory
149560  buffer memory
3036676  swap cache
4947960  total swap
284392  used swap
4663568  free swap
4399893 non-nice user cpu ticks
123135 nice user cpu ticks
1173193 system cpu ticks
497863320 idle cpu ticks
9827196 IO-wait cpu ticks
156731 IRQ cpu ticks
337956 softirq cpu ticks
0 stolen cpu ticks
14560315 pages paged in
286499680 pages paged out
1086328 pages swapped in
1030486 pages swapped out
2808993403 interrupts
2839798366 CPU context switches
1417413111 boot time
333424 forks

The use_large_pages parameter is static and requires an instance restart, so that is what I did next. There is an interesting side effect of setting the parameter to "auto": it has no effect if you have not prepared the system for large pages in /etc/security/limits.conf.

07:33:48 SQL> select value,isdefault from V$PARAMETER_VALID_VALUES where name = 'use_large_pages';

VALUE                ISDEFAULT
——————– —————
TRUE                 TRUE
AUTO                 FALSE
ONLY                 FALSE
FALSE                FALSE

07:44:05 SQL> show parameter use_large_pages

NAME                                 TYPE        VALUE
———————————— ———– ——————————
use_large_pages                      string      TRUE

So let's change that in the spfile (the parameter cannot be changed in memory):
07:57:32 SQL> alter system set use_large_pages=auto scope=spfile;
System altered.

HugePages_Total is still 0, which means the system is still not using large pages. To allow Oracle to lock memory you need to grant it the privilege: I had to edit /etc/security/limits.conf and set the memlock limits.

[oracle@prashantdb053 /backup/dixit/scripts]# grep Huge /proc/meminfo
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
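
For reference, the usual recipe has two parts: the memlock limits in /etc/security/limits.conf (expressed in KB and set to at least the SGA size) and the page reservation itself via the vm.nr_hugepages kernel parameter (the alert log above asked for 1257 pages of 2048 KB). A sketch with illustrative values:

# /etc/security/limits.conf
oracle soft memlock 3145728
oracle hard memlock 3145728

# /etc/sysctl.conf
vm.nr_hugepages = 1257

Reload with 'sysctl -p' and log in again so the new memlock limit is picked up before restarting the instance.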

Being a static parameter, we need to bounce the database for the change to take effect.
07:58:00 SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

07:58:18 SQL> startup
ORACLE instance started.

07:58:54 SQL> show parameter use_large_pages

NAME                                 TYPE        VALUE
———————————— ———– ——————————
use_large_pages                      string      AUTO

The entries from the alert log below show that the parameter has been changed to "AUTO" and the instance has started using large pages.

****************** Large Pages Information *****************
Parameter use_large_pages = AUTO
Total Shared Global Region in Large Pages = 2048 KB (0%)
Large Pages used by this instance: 1 (2048 KB)
Large Pages unused system wide = 3 (6144 KB) (alloc incr 16 MB)
Large Pages configured system wide = 4 (8192 KB)
Large Page size = 2048 KB
Time taken to allocate Large Pages = 0.033721 sec

RECOMMENDATION:
Total Shared Global Region size is 2514 MB. For optimal performance,
prior to the next instance restart increase the number
of unused Large Pages by atleast 1253 2048 KB Large Pages (2506 MB)
system wide to get 100% of the Shared
Global Region allocated with Large pages
***********************************************************

Finally, after the parameter change and hard-coding the limits, the system started using large pages and the alerts from the Patrol agents stopped.

[oracle@prashantdb053 /opt/oracle/diag/rdbms/prashantdb/prashantdb/trace]# grep Huge /proc/meminfo
HugePages_Total:     4
HugePages_Free:      3
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Thanks
Prashant Dixit


Want to monitor your Exa Health — Answer is ‘Exachk’ …

Posted by FatDBA on December 30, 2014

Recently, while working on one of our test Exadata machines, I came across a very useful tool for monitoring Exadata system health, which is one of the most popular topics customers ask about.

The best tool for this is Exachk (see MOS Doc ID 1070954.1).
That document always contains the current Exachk release.

The recommendations for Exachk are:

1) Run exachk (at a minimum) quarterly, and after any changes are made to the configuration.
2) ALWAYS run the current exachk. The script is periodically updated and improved, so staying current is very important (a sample invocation follows this list).
3) Keep track of any failures so you can identify new items that appear in the report.
4) A score of 80 or above is a good score for production; it is very rare to see a score of 99+.
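
For reference, a basic run is launched from the directory where the tool was unzipped (host name and path below are just examples); it prompts for the credentials it needs and leaves behind an HTML report plus a zip of the collected data:

[oracle@exa01db01 exachk]$ ./exachk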

There is also a great whitepaper, released in September 2013, which can be downloaded here:

exadata-health-resource-usage-2021227.pdf

Thanks
Prashant Dixit


Active Session History (ASH) performed an emergency flush. ASH Undersized?

Posted by FatDBA on August 13, 2014

One day I suddenly got a warning on my warehouse system that read:
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized.

If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 16777216 bytes. Both ASH size and the total number of emergency flushes since instance start-up can be monitored by running the following query:

SQL>    select total_size,awr_flush_emergency_count from v$ash_info;

TOTAL_SIZE AWR_FLUSH_EMERGENCY_COUNT
———- ————————-
33554432                         8

Reasons:
Typically some activity on the system produces more active sessions than usual, filling the ASH buffers faster and causing this message to be displayed. It is not a problem per se; it just indicates the buffers may need to be increased to support peak activity on the database.

Fix:
Increase the size of ASH by around 50% (new value = current value + 50% of current value). As with any hidden parameter, test the change on a non-production system first.

SQL> alter system set "_ASH_SIZE"=25165824 scope=both;
System altered.
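
Afterwards, re-run the v$ash_info query from above to confirm the new size and to watch whether the emergency flush count keeps growing:

SQL> select total_size,awr_flush_emergency_count from v$ash_info;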


Where are my DBA_BLOCKERS and DBA_WAITERS ??

Posted by FatDBA on July 12, 2014

While doing performance checks and tuning on one of our databases, I found that two very well known locking views were not working, throwing "Does Not Exist". We are using Oracle Database 11g Release 11.2.0.3.0 - 64bit Production.
I was trying to call the views as the SYS user.

The purpose of the two views:
DBA_BLOCKERS displays the sessions that are not waiting for a locked object themselves but are holding a lock on an object for which another session is waiting.
DBA_WAITERS shows all the sessions that are waiting for a lock but do not hold locks for which another session is waiting.

SQL> conn / as sysdba
Connected.

SQL> desc dba_blockers;
ERROR:
ORA-04043: object dba_blockers does not exist

SQL> desc dba_waiters
ERROR:
ORA-04043: object dba_waiters does not exist

Solution:
-------------------
Run the catblock.sql script from $ORACLE_HOME/rdbms/admin; it creates the required views and synonyms and grants them to DBA-privileged users, and the views will be back.

SQL> @?/rdbms/admin/catblock.sql

View created.

Synonym created.

Grant succeeded.

—–

—–

SQL> desc dba_blockers
Name                                     Null?    Type
—————————————- ——– —————————
HOLDING_SESSION                                   NUMBER

SQL> desc dba_waiters
Name                                     Null?    Type
—————————————- ——– —————————
WAITING_SESSION                                   NUMBER
HOLDING_SESSION                                   NUMBER
LOCK_TYPE                                         VARCHAR2(26)
MODE_HELD                                         VARCHAR2(40)
MODE_REQUESTED                                    VARCHAR2(40)
LOCK_ID1                                          NUMBER
LOCK_ID2                                          NUMBER
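
With the views back, a quick blocking-tree check looks something like this (a small sketch joining to v$session for the holder's user name):

SELECT w.waiting_session,
       w.holding_session,
       s.username AS holding_user,
       w.lock_type,
       w.mode_held,
       w.mode_requested
FROM   dba_waiters w,
       v$session s
WHERE  s.sid = w.holding_session;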


Worst Execution Plan ever … Elapsed time of 999:59:59 :(

Posted by FatDBA on July 12, 2014

A few days back I came across one of the worst execution plans I have ever seen in my whole life: a query estimated to take 999 hours, 59 minutes and 59 seconds to complete (999:59:59 is the display ceiling of the plan's Time column, so the real estimate is at least that).

Look at this shocker below:
1. Estimated elapsed time for the parent UPDATE operation: 999:59:59
2. Only 167K rows to be processed.
3. The cost is huge too: 32G.

ORIGINAL PLAN:
——————
Plan hash value: 2191276670

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |                   |   167K|  7685K|    32G  (1)|999:59:59 |
|   1 |  UPDATE                      | XXXXX             |       |       |            |          |
|   2 |   NESTED LOOPS OUTER         |                   |   167K|  7685K| 16333   (1)| 00:03:49 |
|   3 |    TABLE ACCESS FULL         | XXXXX             |   167K|  5069K| 16317   (1)| 00:03:49 |
|*  4 |    INDEX UNIQUE SCAN         | IBXXXX            |     1 |    16 |     0   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| TXXXX             |     1 |    43 |   194K  (1)| 00:45:17 |
|*  6 |    INDEX SKIP SCAN           | IXXXX_490008000_1 |   191K|       |  2261   (1)| 00:00:32 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("H"."C1"="BXXXX"."C1"(+))
5 - filter("XXXX01238500"=:B1 AND "XXXX00426800"='Incident Resolution Time')
6 - access("XXXX00365100"=4)
    filter("XXXX00365100"=4)

I tried the steps below to eradicate the problem.
1. Tried to restructure the SQL statement, but it was of no use.

2. Made a composite index on the impacted table XXXXX. That brought the run down to 2 hours and 40 minutes, which was acceptable, but the team still wanted the change window as small as possible, since the two queries together would take 5 hours 20 minutes.

3. After reading a few performance documents, we tried applying some optimizer fix controls, in case we were hitting a bug during the run of the queries.
After applying all plausible fixes the problem endured.

SQL> select optimizer_feature_enable,count(*) from v$system_fix_control group by optimizer_feature_enable order by 1 asc;

OPTIMIZER_FEATURE_ENABLE    COUNT(*)
------------------------- ----------
10.1.0                             3
10.1.0.3                           1
10.1.0.5                           2
10.2.0.1                           7
10.2.0.2                          12
10.2.0.3                          11
10.2.0.4                          73
10.2.0.5                         112
11.1.0.6                          40
11.1.0.7                          19
11.2.0.1                          67

OPTIMIZER_FEATURE_ENABLE    COUNT(*)
------------------------- ----------
11.2.0.2                         106
11.2.0.3                          75
8.0.0                             63
8.1.6                              1
8.1.7                              2
9.2.0                              8
9.2.0.8                            3
47

19 rows selected.

I tried some blind applies to pin down the exact bug and ended up applying a few of the fix controls.
Note: I advocate always executing such changes on your test servers first, or performing an impact analysis, before touching hidden parameters.

SQL> select * from v$system_fix_control where optimizer_feature_enable='11.2.0.3' and description like '%skip%';

BUGNO      VALUE SQL_FEATURE                              DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT
---------- ---------- ---------------------------------------- ---------------------------------------------------------------- ------------------------- ---------- ----------
10080014          1 QKSFM_CBO_10080014                       allow skip scan costing for PRIOR join in CONNECT BY query       11.2.0.3                           0         1
9569678          1 QKSFM_CBO_9569678                        skip identity operator when generating NULL IS NOT NULL          11.2.0.3                           0         1
11744086          1 QKSFM_PLACE_GROUP_BY_11744086            skip SCG for query block with no aggregating columns             11.2.0.3                           0         1
9227576          1 QKSFM_CBO_9227576                        allow skip scan costing for semi/anti-join                       11.2.0.3                           0         1

SQL> alter session set "_fix_control"='9814067:OFF';
Session altered.

SQL> alter session set "_fix_control"='10038373:OFF';
Session altered.

SQL> alter session set "_fix_control"='8893626:OFF';
Session altered.

Ended up with no impact on the queries.

4. Used the SQL hint optimizer_features_enable('10.1.0'),
forcing the query optimizer to build the plan with the old, compatible 10.1.0 feature set.
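
Placed in the statement itself, the hint looks like the sketch below (table and column names are placeholders, in line with the masked plan above):

UPDATE /*+ optimizer_features_enable('10.1.0') */ xxxxx t
SET    t.c2 = 'processed'
WHERE  t.c1 IS NOT NULL;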

Look at the plan soon after the change.

-------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 728359962

------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT    |        |   167K|  7685K| 16361   (1)| 00:03:50 |
|   1 |  UPDATE             | XXXXX  |       |       |            |          |
|   2 |   NESTED LOOPS OUTER|        |   167K|  7685K| 16361   (1)| 00:03:50 |
|   3 |    TABLE ACCESS FULL| XXXXX  |   167K|  5069K| 16268   (1)| 00:03:48 |
|*  4 |    INDEX UNIQUE SCAN| IBXXXX |     1 |    16 |     0   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | TXXXX  |     1 |    43 |  7099   (1)| 00:01:40 |
------------------------------------------------------------------------------

WHOA!!! The execution time now shows realistic values, and the query completed in only 3 minutes after using the HINT.

Thanks
Prashant Dixit


SLOW SQL Procedure/Query: Time to use Oracle Profiler.

Posted by FatDBA on May 14, 2014

Tuning long PL/SQL programs has always been a great pain, but with DBMS_PROFILER it has become quite easy to monitor the performance of a PL/SQL program and identify the culprit statements that consume the most time during its run.

$ cd $ORACLE_HOME/rdbms/admin
$ sqlplus / as sysdba

Execute the profload.sql script; this installs the DBMS_PROFILER package.

SQL> @profload.sql

Package created.
Grant succeeded.
Synonym created.
Library created.
Package body created.

Testing for correct installation
SYS.DBMS_PROFILER successfully loaded.

PL/SQL procedure successfully completed.

/*Now create the user for a test of DBMS_PROFILER */

SQL> create user plsql_prof_test identified by test;
User created.

SQL> grant connect , resource to plsql_prof_test;
Grant succeeded.

SQL> conn plsql_prof_test/test
Connected.

/* Once connected, run proftab.sql, which creates the tables where the profiler puts its results. The DROP errors below are harmless on a first install. */

SQL> @proftab.sql
drop table plsql_profiler_data cascade constraints
*
ERROR at line 1:
ORA-00942: table or view does not exist

drop table plsql_profiler_units cascade constraints
*
ERROR at line 1:
ORA-00942: table or view does not exist

drop table plsql_profiler_runs cascade constraints
*
ERROR at line 1:
ORA-00942: table or view does not exist

drop sequence plsql_profiler_runnumber
*
ERROR at line 1:
ORA-02289: sequence does not exist

Table created.
Comment created.
Table created.
Comment created.
Table created.
Comment created.
Sequence created.

Now we are all set to use the PROFILER to find areas of code causing issues.

09:00:07 SQL> execute dbms_profiler.start_profiler('archieve_test');
PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03

09:00:24 SQL> show user
USER is "ARADMIN"

09:00:26 SQL> exec ARCHIVING.archieve_test ('Bdsss_asssas',30,20, 'Change MyDesk', 1);
PL/SQL procedure successfully completed.

Elapsed: 00:10:24.63

09:17:41 SQL> execute dbms_profiler.STOP_PROFILER;
PL/SQL procedure successfully completed.

Once profiling is complete, it is time to query the plsql_profiler_runs table, which is created when you run the two initial scripts. The query below provides:
- RUN ID (this will help us investigate further)
- Run details (run timings)
- Total time

set linesize 400 pagesize 400
col run_comment format a20
set lines 10000
column run_owner format a30
column run_comment format a20
select runid,run_owner, run_date,run_total_time/1000000000 run_total_time,run_comment from plsql_profiler_runs where run_comment like '%chieve%' and RUN_DATE LIKE '%14-04-29%';

RUNID RUN_OWNER                      RUN_DATE          RUN_TOTAL_TIME RUN_COMMENT
———- —————————— —————– ————– ——————–
63 ARAMAEK                       14-04-25 09:00:24        1078.98 archieve_test

Now that we have the RUN_ID, we can drill down into the run and isolate the part of the procedure causing the trouble.
The query below splits the run into its program units, with unit numbers and the total time taken per unit.

09:21:32 SQL> select runid,unit_number,unit_type,unit_owner, unit_name, unit_timestamp, total_time from plsql_profiler_units where runid=63;

RUNID UNIT_NUMBER UNIT_TYPE                        UNIT_OWNER                       UNIT_NAME                        UNIT_TIMESTAMP    TOTAL_TIME
———- ———– ——————————– ——————————– ——————————– —————– ———-
63           1 ANONYMOUS BLOCK                  <anonymous>                      <anonymous>                      00-00-00 01:01:01         .0
63           2 ANONYMOUS BLOCK                  <anonymous>                      <anonymous>                      00-00-00 01:01:01         .0
63           3 PACKAGE BODY                     ARADMIN                          ARCHIVING                        14-03-23 12:29:38         .0
63           4 ANONYMOUS BLOCK                  <anonymous>                      <anonymous>                      00-00-00 01:01:01         .0

Elapsed: 00:00:00.00

The query below lists the profiled line numbers (unit segments) with their total number of occurrences and timings.

COLUMN runid FORMAT 99999
COLUMN unit_number FORMAT 99999
COLUMN unit_type FORMAT A20
column unit_name format a11
column TotalTime format a10
column MinTime format a10
SELECT u.runid,
u.unit_number,
u.unit_type,
u.unit_name,
d.line#,
to_char(d.total_occur) as TotalOccur,
to_char(d.total_time) as TotalTime,
to_char(d.min_time) as MinTime,
to_char(d.max_time) as max_time
FROM   plsql_profiler_units u
JOIN plsql_profiler_data d ON u.runid = d.runid AND u.unit_number = d.unit_number
WHERE  u.runid = 1
ORDER BY TotalOccur;

RUNID UNIT_NUMBER UNIT_TYPE            UNIT_NAME        LINE# TOTALOCCUR                               TOTALTIME  MINTIME    MAX_TIME
—— ———– ——————– ———– ———- —————————————- ———- ———- —————————————-

1           3 PACKAGE BODY         ARCHIVING         8741 5                                        5000       1000       1000
1           3 PACKAGE BODY         ARCHIVING         8904 5                                        136732000  25889000   29557000
1           3 PACKAGE BODY         ARCHIVING         8908 5                                        15000      3000       3000
1           3 PACKAGE BODY         ARCHIVING         8930 5                                        54969000   145000     54211000
1           3 PACKAGE BODY         ARCHIVING         8934 5                                        3000       1000       1000
1           3 PACKAGE BODY         ARCHIVING         8958 5                                        15445000   383000     13588000
1           3 PACKAGE BODY         ARCHIVING         8962 5                                        2000       1000       1000
1           3 PACKAGE BODY         ARCHIVING         8964 5                                        59000      1000       40000
1           3 PACKAGE BODY         ARCHIVING         8967 5                                        328000     38000      123000
1           3 PACKAGE BODY         ARCHIVING         8969 5                                        6000       1000       2000
1           3 PACKAGE BODY         ARCHIVING         7780 6                                        140969000  29000      140716000
1           3 PACKAGE BODY         ARCHIVING         7782 6                                        7000       1000       2000
1           3 PACKAGE BODY         ARCHIVING         8702 6                                        5000       1000       1000
1           3 PACKAGE BODY         ARCHIVING         1463 972                                      524000     1000       12000

Almost the same query as above, but this time sorted by total time.

09:27:29 SQL> select runid,unit_number,  line#,total_occur, to_char(total_time), to_char(min_time), to_char(max_time) from plsql_profiler_data where runid=63
09:27:37   2  and total_time !=0 and min_time !=0 and max_time !=0 order by TOTAL_TIME DESC, MIN_TIME  DESC, MAX_TIME DESC, total_occur;

RUNID UNIT_NUMBER      LINE# TOTAL_OCCUR TO_CHAR(TOTAL_TIME)                      TO_CHAR(MIN_TIME)                        TO_CHAR(MAX_TIME)
—— ———– ———- ———– —————————————- —————————————- —————————————-
63           3       5090         220 226072292804                             34994                                    17049666731
63           3       5240          20 181066607554                             3979107078                               103125720715
63           3       4926          20 141296980613                             6541296441                               7361430883
63           3       5195          40 67123652951                              256962                                   3455214090
63           3       5093         200 2777416725                               3649463                                  130042883
63           3       5141          20 1573445706                               999                                      85397446
63           3       5040          26 1123533842                               6529040                                  260607691
63           3       4920          20 538701811                                1999706                                  116417886
63           3       5144           7 538252877                                96985                                    183333050
63           3       4903          20 511259845                                18179327                                 76123809
63           3        941         371 295843511                                527922                                   5997118
63           3       4900          20 243442214                                8460756                                  40755009
63           3       5198          20 223308174                                6913983                                  62818765
63           3       5154           4 197841917                                3578473                                  186799540
63           3       5147           4 173199539                                4495339                                  159311581
63           3       5037          46 111132663                                85987                                    18228320
63           3       5046          26 104860585                                315953                                   49972654
63           3        942         371 101599065                                101985                                   3873430
63           3       5103         200 69940718                                 111983                                   23141598
63           3       5156           4 67220118                                 3353507                                  35764742
63           3        956        2547 52771242                                 9998                                     2742596
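
To see at a glance where the time went, each line's share of the whole run can be computed from the same proftab.sql tables. A small sketch for run 63:

select d.unit_number, d.line#, d.total_occur,
       round(100 * d.total_time / t.grand_total, 2) as pct_of_run
from   plsql_profiler_data d,
       (select sum(total_time) grand_total
          from plsql_profiler_data
         where runid = 63) t
where  d.runid = 63
order by pct_of_run desc;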

TO FIND THE ORIGINAL TEXT OF THE OBJECT FROM THE ALL_SOURCE VIEW:
SELECT line || ' : ' || text FROM all_source WHERE owner = 'ARAMAEK' AND type = 'PACKAGE BODY' AND name = 'ARCHIVING_TEST';

* In this case the query returned an 11014-line package body named 'ARCHIVING_TEST'.

09:29:55 SQL> select OWNER#, OBJ#, NAME, NAMESPACE, TYPE#,STATUS from obj$  where name ='ARCHIVING_TEST';

OWNER#       OBJ# NAME                            NAMESPACE      TYPE#     STATUS
———- ———- —————————— ———- ———- ———-
58      85332 ARCHIVING_TEST                               1          9          1
58      85334 ARCHIVING_TEST                               2         11          1

Now you can locate the offending lines inside the package body:

select source from source$ where obj#=85334 and line=5090;
FETCH aSH INTO aStateHistoryID;

select source from source$ where obj#=85334 and line=5240;
FETCH aTR INTO aTicketRelationID;

select source from source$ where obj#=85334 and line between 5239 and 5242; /* problem line: 5240 */

----------------------------------------------------------------------------------------------------------------------------
OPEN aTR FOR 'SELECT ' || getTableColumnName(trFormName, Request_MAST) ||' FROM ' ||t_tr_table||' WHERE '|| getTableColumnName(trFormName, Parent_Request_OID) || ' = ' || db_request_oid ; LOOP
FETCH aTR INTO aRecordRelationID;
EXIT WHEN aTR%NOTFOUND;
dynInsert := 'INSERT INTO '||t_astr_table||' ('||t_tr_tableCol||') SELECT '||t_tr_tableCol||' from '||t_tr_table||' where C1 = :bind_var';

Now that we have zeroed in on just a few lines out of 11014, we can investigate the issue.
Result: after a short analysis we found a problem with a variable TYPE declaration, and the slowness disappeared once it was fixed.

Thanks
Prashant Dixit


Get DBID when Instance is in NOMOUNT mode.

Posted by FatDBA on May 5, 2014

One fine day we found that we had lost our controlfile and the catalog.
RULE: To restore the controlfile, you must know the DBID. Being a UAT/testing server, we had not noted the DBID down anywhere safe.
Method/Fix: You can extract the DBID from the header of a datafile, assuming you have access to one. The instance needs to be up in NOMOUNT mode. It has to be NOMOUNT because you haven't restored the controlfile yet, which is a prerequisite for the mount operation.

So we started the DB in NOMOUNT mode.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1337632 bytes
Variable Size             402654944 bytes
Database Buffers          113246208 bytes
Redo Buffers                5869568 bytes

SQL> show parameter db_name
NAME                                 TYPE        VALUE
———————————— ———– ——————————
db_name                              string      sairam

Let's place a unique identifier in the trace file names so we can easily spot the trace generated to find the DBID:
SQL> alter session set tracefile_identifier = dixit;
Session altered.

We'll now dump the first 12 blocks of one of the datafiles:
SQL> alter system dump datafile '/u01/app/oracle/oradata/sairam/xyz.dbf' block min 1 block max 12;
System altered.

The trace generated by the ALTER SYSTEM DUMP step contains the block header information, which includes the DBID.

-rw-r—– 1 oracle oinstall  170 May  5 21:44 sairam_ora_9192_DIXIT.trm
-rw-r—– 1 oracle oinstall 177K May  5 21:44 sairam_ora_9192_DIXIT.trc
[oracle@prashant trace]$ pwd
/u01/app/oracle/diag/rdbms/sairam/sairam/trace

Below is the excerpt from the trace file that contains the header information.
* The "Db ID=" line below carries the DBID of the database: 410122232.

System name:    Linux
Node name:      prashant
Release:        2.6.18-164.el5
Version:        #1 SMP Tue Aug 18 15:51:54 EDT 2009
Machine:        i686
Instance name: sairam
Redo thread mounted by this instance: 0 <none>
Oracle process number: 19
Unix process pid: 9192, image: oracle@prashant (TNS V1-V3)

*** 2014-05-05 21:44:16.933
*** SESSION ID:(1.3) 2014-05-05 21:44:16.933
*** CLIENT ID:() 2014-05-05 21:44:16.933
*** SERVICE NAME:() 2014-05-05 21:44:16.933
*** MODULE NAME:(sqlplus@prashant (TNS V1-V3)) 2014-05-05 21:44:16.933
*** ACTION NAME:() 2014-05-05 21:44:16.933

Start dump data block from file /u01/app/oracle/oradata/sairam/xyz.dbf minblk 1 maxblk 12
V10 STYLE FILE HEADER:
Compatibility Vsn = 186646528=0xb200000
Db ID=410122232=0x1871f7f8, Db Name=’SAIRAM’
Activation ID=0=0x0
Control Seq=22929=0x5991, File size=2560=0xa00
File Number=7, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 7 rdba: 0x01c00002 (1024/29360130)
scn: 0x0000.001a1cf6 seq: 0x01 flg: 0x04 tail: 0x1cf61d01
frmt: 0x02 chkval: 0xb87a type: 0x1d=KTFB Bitmapped File Space Header
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00518600 to 0x0051A600
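
With the DBID in hand the controlfile restore can proceed. Assuming a controlfile autobackup actually exists and is reachable, the RMAN session would look roughly like this:

RMAN> SET DBID 410122232;
RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;
RMAN> ALTER DATABASE MOUNT;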

Thanks
Prashant Dixit
