Tales From A Lazy Fat Oracle DBA

$ prashantdixit/oracle90@ace as sysdba

  • Prashant Dixit is the FatDBA

  • Follow Tales From A Lazy Fat Oracle DBA on WordPress.com
  • My Twitter Feeds

  • Oracle Radio

  • Magic Of Oracle

  • Disclaimer!

    FatDBA or Oracle 'Ant' is an independent web-blog/site.The experiences, Test cases, views, and opinions expressed in this website are my own and does not reflect the views or opinions of my employer.

    This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content.
    Product and company names mentioned in this website may be the trademarks of their respective owners.

Posts Tagged ‘troubleshooting’

CLSRSC-351 & CRS-4000 Errors during execution of root.sh for GRID installation.

Posted by FatDBA on November 9, 2016

While doing GRID installation on one of the machine where there were few previous failed Grid installations happened I’ve got few of the error messages while running the root.sh script during my installation attempt.

This is what I’ve got during the process of executing the root.sh script.

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user ‘oracle’, privgrp ‘oinstall’..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
CRS-4000: Command Pin failed, or completed with errors.
2016/11/07 21:30:06 CLSRSC-161: Pin node using the command ‘/u01/app/oracle/product/12.1.0/grid_1/bin/crsctl pin css -n fatdba’ failed

I tried it executing the same second time: Praying for any magic happens this time 😉
But this time some more errors but have left some clues and actions.

[root@Fatdba /]#
[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
2016/11/07 21:32:21 CLSRSC-351: Improper Oracle Clusterware configuration found on this host

2016/11/07 21:32:21 CLSRSC-353: Run ‘/u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl -deconfig’ to deconfigure existing failed configuration and then re-run ‘root.sh’

The command ‘/u01/app/oracle/product/12.1.0/grid_1/perl/bin/perl -I/u01/app/oracle/product/12.1.0/grid_1/perl/lib -I/u01/app/oracle/product/12.1.0/grid_1/crs/install /u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl ‘ execution failed

Okay, so it is clear that its happened due to some previous mess happened on the system before i got this as a task to install the software.It says that there is an improper clusterware configuration identified on the host and along it says to deinstall using roothas.pl script.

So i tried, but it says the ORS or the oracle restart stack is not active on the node, and it shouldn’t be as all the files were removed manually …
so it failed!

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl -deconfig
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
2016/11/07 21:32:54 CLSRSC-39: Oracle Restart stack is not active on this node
2016/11/07 21:32:54 CLSRSC-312: Failed to verify HA resources
Died at /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsdeconfig.pm line 1358.

Let’s try the last resort, the FORCE option to remove previous bad installs.
And it worked!

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/crs/install/roothas.pl -deconfig -force
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
2016/11/07 21:39:06 CLSRSC-337: Successfully deconfigured Oracle Restart stack

Lets try to run the root.sh script again to complete this new GRID installation.
It worked now!

[root@Fatdba /]# /u01/app/oracle/product/12.1.0/grid_1/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.1.0/grid_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/oracle/product/12.1.0/grid_1/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user ‘oracle’, privgrp ‘oinstall’..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
CRS-4664: Node fatdba successfully pinned.
2016/11/07 21:39:27 CLSRSC-330: Adding Clusterware entries to file ‘oracle-ohasd.conf’

fatdba 2016/11/07 21:40:01 /u01/app/oracle/product/12.1.0/grid_1/cdata/fatdba/backup_20161107_214001.olr 0
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘fatdba’
CRS-2673: Attempting to stop ‘ora.evmd’ on ‘fatdba’
CRS-2677: Stop of ‘ora.evmd’ on ‘fatdba’ succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘fatdba’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/11/07 21:41:50 CLSRSC-327: Successfully configured Oracle Restart for a standalone server

Hope That Helps!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: | Leave a Comment »

Oracle GRID install failed -INS-40404

Posted by FatDBA on November 9, 2016

Many times while doing some crude deletion or cleanup of GRID directories you have faced an error with the GUI or at the CLI during the progress when you were almost sure that you have deleted everything what is needed.
Bam! right at that moment an error appears saying “The installer has detected a configured instance of Oracle grid infrastructure software on the server” with error code INS-40404.

You tried recalling everything you did to clean and remove every trace of it …. No Luck 😦

Solution to the problem is:
Go to oracle folder under /etc directory.
Remove the ‘ocr.loc’ file which keeps on pointing the local ocr locations.

Hope It Helps!
Prashant Dixit

Posted in Advanced | Tagged: | Leave a Comment »

runInstaller Error: An unexpected error has been detected by HotSpot Virtual Machine

Posted by FatDBA on January 6, 2016

Hello Everyone,
Installing your Oracle Software using GUI Method requires to call “runInstaller” script and is always an easy step if you have proper permissions, DISPLAY settings in place.
But here i would like to discuss one of the case where I’ve spent several hours to fix one of the error that occurred every-time i called runInstaller script even after setting all required permissions and DISPLAY variables.

It fails to render the installer and creates a log file under /tmp directory with below mentioned contents.

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0x0000003e2ce14d70, pid=4000, tid=140717162321680
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_51-b10 mixed mode)
# Problematic frame:
# C [ld-linux-x86-64.so.2+0x14d70]
#

————— T H R E A D —————

Current thread (0x000000004220d3f0): JavaThread “AWT-EventQueue-0” [_thread_in_native, id=4014]

siginfo:si_signo=11, si_errno=0, si_code=128, si_addr=0x0000000000000000

Registers:
RAX=0x0000000000000001, RBX=0x000000004216ae50, RCX=0x000000009eba2203, RDX=0x000000000fabfbff
RSP=0x00007ffb44792278, RBP=0x00007ffb447923c0, RSI=0x0000000000000000, RDI=0x0000000000000058
R8 =0x0000000000000000, R9 =0x0000000000000000, R10=0x00007ffb447921f0, R11=0x000000004216ae50
R12=0x00007ffb447923e8, R13=0x0000000041f85330, R14=0x0000000000000000, R15=0x0000000000000000
RIP=0x0000003e2ce14d70, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
TRAPNO=0x000000000000000d

Top of Stack: (sp=0x00007ffb44792278)
0x00007ffb44792278: 0000003e2ce0aaea 0000000000000000

Signal Handlers:
SIGSEGV: [libjvm.so+0x67ed60], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGBUS: [libjvm.so+0x67ed60], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGFPE: [libjvm.so+0x582020], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGPIPE: [libjvm.so+0x582020], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGILL: [libjvm.so+0x582020], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: [libjvm.so+0x583ed0], sa_mask[0]=0x00000000, sa_flags=0x14000004
SIGHUP: [libjvm.so+0x5839a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGINT: [libjvm.so+0x5839a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGQUIT: [libjvm.so+0x5839a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004
SIGTERM: [libjvm.so+0x5839a0], sa_mask[0]=0x7ffbfeff, sa_flags=0x14000004

————— S Y S T E M —————

OS:Red Hat Enterprise Linux Server release 6.0 (Santiago)

uname:Linux 2.6.32-573.12.1.el6.x86_64 #1 SMP Tue Dec 15 06:42:08 PST 2015 x86_64
libc:glibc 2.12 NPTL 2.12
rlimit: STACK 10240k, CORE 0k, NPROC 16384, NOFILE 65536, AS infinity
load average:0.09 0.06 0.08

CPU:total 1 em64t

Memory: 4k page, physical 2046684k(69828k free), swap 2031612k(2031612k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (1.5.0_51-b10) for linux-amd64, built on Jun 6 2013 09:59:46 by java_re with gcc 3.2.2 (SuSE Linux)

time: Sat Jan 2 23:09:21 2016
elapsed time: 2 seconds

The workaround to the problem is to set “LD_BIND_NOW” environment variable to a value “1” as shown below and re-launch the installer.

bash-4.1$ export LD_BIND_NOW=1
bash-4.1$ ./runInstaller
Starting Oracle Universal Installer…

Checking Temp space: must be greater than 120 MB. Actual 27339 MB Passed
Checking swap space: must be greater than 150 MB. Actual 4031 MB Passed
Checking monitor: must be configured to display at least 256 colors. Actual 16777216 Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-07-01_03-29-40AM. Please wait …
bash-4.1$

This bug seems to have reported on 11.2.0.1 & 11.2.0.3.

Hope That Helps
Prashant Dixit

Posted in Advanced, Basics | Tagged: , | Leave a Comment »

Oracle GI 11.2 Installation on RHEL 7 – Error: ohasd failed to start the Clusterware.

Posted by FatDBA on January 6, 2016

Recently as a part of solution i proposed for a new infrastructure for one of the customer, project team came with an error which they encountered during the GRID Infrastructure – Especially soon after executing the ‘Root.sh’ script. This was actually 11.2.0.4 Grid Infrastructure software installation on Oracle Enterprise 7 which is the latest release from Oracle Corp.

While troubleshooting the problem i experienced much pain getting it to install. The installation process fails when the root.sh script is run.
* Sometimes to configure Grid Infrastructure for a Stand-Alone Server system asks to run the following command as the root user:
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl

The error reported is:

[root@localhost /]# /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
LOCAL ADD MODE
Creating OCR keys for user ‘oracle’, privgrp ‘oinstall’..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
CRS-4664: Node localhost successfully pinned.
Adding Clusterware entries to inittab
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2016-01-01 02:14:46.806:
[client(11401)]CRS-2101:The OLR was formatted using version 3.
2016-01-01 02:14:49.572:
[client(11424)]CRS-1001:The OCR was formatted using version 3.

ohasd failed to start at /u01/app/11.2.0/grid/crs/install/roothas.pl line 377, line 4.

I hunted though various blog posts and even Oracle Metalink initially but all of them were was of little to no use.
Finally, I stumbled across an apparently poorly indexed (and titled) support note (1951613.1) that made reference to a RHEL 7 specific patch. The patch number is: 18370031.

So a Patch download and a new installation process later, I was finally able to get the GI installer to properly register the ohasd services. In the end, I was glad it was a patch, that resolved the issue since (in theory) Oracle will support it. I was surprised that the Oracle Support tech was not able to locate the patch 🙂

In applying the patch it is a little different. You have to run the GI installer to the point where it instructs you to run root.sh. Before you run root.sh, you then use OPatch to install the provided patch. Then finally you run root.sh.

Below provided are the steps performed during the fix.
1. First i had to deinstall previous GRID configuration (Where I’ve got that error message after roo.sh execution).
– During the deinstallation process it will ask you to execute few scripts which will ultimately help you to deinstall the entire Oracle Restart stack.
2. Download, Unzip and Apply the patch using OPATCH.
3. Execute the root.sh script once you applied the patch.
4. Check the services status using crs_stat.

 

Step 1:

Deinstall previous GRID configuration
[root@localhost deinstall]# su – oracle
Last login: Fri Jan 1 02:17:02 EST 2016 on pts/1
[oracle@localhost ~]$ cd /u01/app/11.2.0/grid/deinstall
[oracle@localhost deinstall]$ ./deinstall

Checking for required files and bootstrapping …
Please wait …
Location of logs /tmp/deinstall2016-01-01_02-30-16AM/logs/
Click Here to Read More!!

Posted in Advanced | Tagged: | 14 Comments »

Evaluating Storage Performance!

Posted by FatDBA on March 10, 2015

Being a DBA means you need to posses outstanding knowledge of DBA Subjects as obvious, but nowadays knowledge on storage/network/OS is a big surplus. This helps a modern day DBA to troubleshoot/rectify and helps to avoid million of hours wasted to identify DB issues when in fact there isn’t any.

This time i tried to discuss steps which one can take when there are waits likes – log file sync, db file async io submit, log file parallel write, control file parallel write and many of the parallel and sequential write observed in system.

Quick Fast Disk Test Results (DD Command):
———————————————
[oracle@dixitdb111 datafiles]$ time sh -c “dd if=/dev/zero of=dd-test-file bs=20k count=1000000 && sync”
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 26.4959 seconds, 309 MB/s

real 0m44.318s
user 0m0.278s
sys 0m19.410s
You have new mail in /var/spool/mail/oracle

[oracle@dixitdb111 datafiles]$ ls -ltrh
-rw-r–r– 1 oracle oinstall 7.7G Jan 30 11:59 dd-test-file

ORION
————-

ORION (ORacle IO Numbers) imitates the type of I/O performed by Oracle databases, which makes possible for you to measure I/O performance for storage arrangements without actually installing Oracle. This is now included in the “$ORACLE_HOME/bin” directory of Database/Grid installations.

There are many of the options available for the orion to run.
e.g. oltp, olap and many more.
Below pasted is a beautiful explanation by Alex Gorbachev (ACED, IOUG, OAKTABLE Member & Renowned Blogger)
Link: http://www.uyoug.org.uy/eventos2013/OTNLAD2013-Benchmarking-Oracle-IO-Performance-with-ORION-by-Alex-Gorbachev.pdf

[oracle@dixitdb111 bin]$ ./orion -run oltp
ORION: ORacle IO Numbers — Version 11.2.0.3.0
orion_20150203_0800
Calibration will take approximately 2 minutes.
Using a large value for -cache_size may take longer.

This will result in few .csv and text files with IO results with some beautiful charts/graphs and tabular records for the runtime.

Reference:
————–
_hist.csv
Histogram of I/O latencies.
_iops.csv
Performance results of
small I/Os in IOPS.
_lat.csv
Latency of small I/Os in microseconds.
_mbps.csv
Performance results of
large I/Os in MBPS.
_summary.txt
Summary of the input parameters, along with the minimum small I/O latency (in secs),
the maximum MBPS, and the maximum IOPS observed.
_trace.txt
Extended, unprocessed output

I/O calibration is one of those magical option. This feature enables user to assess the performance of the storage subsystem, and determine whether I/O performance problems are caused by the database or the storage subsystem. Unlike other external I/O calibration tools that issue I/Os sequentially, the I/O calibration feature of Oracle Database issues I/Os randomly using Oracle datafiles to access the storage media, producing results that more closely match the actual performance of the database.

SELECT d.name,
i.asynch_io
FROM v$datafile d,
v$iostat_file i
WHERE d.file# = i.file_no
AND i.filetype_name = ‘Data File’;

NAME ASYNCH_IO
—————————————————————– ———
/dbmnt1/dixitdb/datafiles/system01.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/undotbs1_01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/sysaux01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/users01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem01.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem02.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem03.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem04.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem05.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem06.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem07.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem08.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem09.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem10.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/unicode1tbs01.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/unicode2atbs01.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/r11testtbs.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem11.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem12.dbf ASYNC_ON
/dbmnt2/dixitdb/datafiles/artest.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem13.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem14.dbf ASYNC_ON
/dbmnt1/dixitdb/datafiles/dbsystem15.dbf ASYNC_ON
/dbmnt3/dixitdb/datafiles/dbsystem16.dbf ASYNC_ON

* below query is resource intensive. Load/CPU spikes are expected during the run.

SET SERVEROUTPUT ON
DECLARE
l_latency PLS_INTEGER;
l_iops PLS_INTEGER;
l_mbps PLS_INTEGER;
BEGIN
DBMS_RESOURCE_MANAGER.calibrate_io (num_physical_disks => 1,
max_latency => 20,
max_iops => l_iops,
max_mbps => l_mbps,
actual_latency => l_latency);

DBMS_OUTPUT.put_line(‘Max IOPS = ‘ || l_iops);
DBMS_OUTPUT.put_line(‘Max MBPS = ‘ || l_mbps);
DBMS_OUTPUT.put_line(‘Latency = ‘ || l_latency);
END;
/

Max IOPS = 610
Max MBPS = 67
Latency = 19

==================
Calibration runs can be monitored using the V$IO_CALIBRATION_STATUS view.

SET LINESIZE 100
COLUMN start_time FORMAT A20
COLUMN end_time FORMAT A20

SELECT TO_CHAR(start_time, ‘DD-MON-YYY HH24:MI:SS’) AS start_time,
TO_CHAR(end_time, ‘DD-MON-YYY HH24:MI:SS’) AS end_time,
max_iops,
max_mbps,
max_pmbps,
latency,
num_physical_disks AS disks
FROM dba_rsrc_io_calibrate;

START_TIME END_TIME MAX_IOPS MAX_MBPS MAX_PMBPS LATENCY DISKS
——————– ——————– ———- ———- ———- ———- ———-
30-JAN-015 09:49:10 30-JAN-015 09:53:14 610 67 27 19 1

Thanks
Prashant Dixit

Posted in Advanced | Tagged: , | Leave a Comment »

Opatch Failed error code 73: OUI-67073: UtilSession failed: Prerequisite check “CheckActiveFilesAndExecutables”

Posted by FatDBA on March 2, 2015

Issue:
Upgrade error from 11.2.0.2 to 11.2.0.4

Error Description:
Oracle SPU / CPU patch deployment using Opatch filed with following error message.
Following executables are active :
/u01/app/oracle/product/11.2.0.2/home/lib/libclntsh.so.11.1
UtilSession failed: Prerequisite check “CheckActiveFilesAndExecutables” failed.
Log file location: /u01/app/oracle/product/11.2.0.2/home/cfgtoollogs/opatch/opatch2014-9-14_12-10-00PM.log

OPatch failed with error code 73

Cause:
There are some files which are locked or some processes still running while applying patch . those should be avoided

Full Error in log:

[Mar 1, 2015 4:19:20 PM] Finish fuser command /sbin/fuser /u01/app/oracle/product/11.2.0.2/home/lib/libclntsh.so.11.1 at Fri Nov 22 14:10:20 CET 2014
[Mar 1, 2015 4:19:20 PM] Following executables are active:
/u01/app/oracle/product/11.2.0.2/home/lib/libclntsh.so.11.1
[Mar 1, 2015 4:19:20 PM] Prerequisite check “CheckActiveFilesAndExecutables” failed.
The details are:
Following executables are active:
/u01/app/oracle/product/11.2.0.2/home/lib/libclntsh.so.11.1
[Mar 1, 2015 4:19:20 PM] OUI-67073:UtilSession failed: Prerequisite check “CheckActiveFilesAndExecutables” failed.
[Mar 1, 2015 4:19:20 PM] Finishing UtilSession at Fri Nov 22 14:10:20 CET 2014

Solution Description
==================================

This error is simple. Firstly make sure DB and listener are down .

Solution 1:
some processes are still in running .to find out them try
ps -ef|grep db_name
then kill each process by using kill -9 1196 command

Solution 2:
check which process is locking this library file using below command and kill it:

$ /sbin/fuser /u01/app/oracle/product/11.2.0.2/home/lib/libclntsh.so.11.1
/u01/app/oracle/product/11.2.0.2/home/lib/libclntsh.so.11.1: 1196m 2215m

$ kill -9 1196

now run opatch apply and it will run without any issues this time.


Hope That Helps
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

Using _ALLOW_RESETLOGS_CORRUPTION in case of corruption: How to recover & open the database ?

Posted by FatDBA on March 2, 2015

Recently while opening a TEST RAC database we found the database down and was not available, we tried to start the database but received communication errors and left us clueless. Due to some urgent POC activity scheduled on the database we started investigating the root cause right away for error.
ORA-03113: end-of-file on communication channel

[oracle@testdbdixit ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Thu Feb 26 01:04:23 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.
SQL> startup
ORA-03113: end-of-file on communication channel

We tried to open the database in MOUNT mode and it reached to phase easily without any error.

SQL> STARTUP MOUNT;
ORACLE instance started.

Total System Global Area 1.6034E+10 bytes
Fixed Size 2269072 bytes
Variable Size 2449473648 bytes
Database Buffers 1.3556E+10 bytes
Redo Buffers 26480640 bytes
Database mounted.

But the recovery of any kind got failed because of the database running in No Archivelog Mode & being a test instance there is also not any RMAN backups configured as well … #TotalDisaster 😦 😦

SQL> ALTER DATABASE RECOVER DATABASE UNTIL CANCEL;
ALTER DATABASE RECOVER DATABASE UNTIL CANCEL
*
ERROR at line 1:
ORA-00279: change 7311130 generated at 02/25/2015 22:00:18 needed for thread 2
ORA-00289: suggestion : +FRA
ORA-15173: entry ‘ARCHIVELOG’ does not exist in directory ‘DIXITDB’
ORA-00280: change 7311130 for thread 2 is in sequence #207

SQL> archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 233
Current log sequence 234

When tried to start the cancel based incomplete recovery, it went successful but received few inconsistency errors for system datafile if we try to open the database in RESETLOGS mode.

SQL> ALTER DATABASE RECOVER CANCEL;
ALTER DATABASE RECOVER CANCEL
*
ERROR at line 1:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘+DATA/DIXITDB/datafile/system.256.871197447’

As expected RESETLOGS mode failed too.

SQL> ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE OPEN RESETLOGS
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘+DATA/DIXITDB/datafile/system.256.871197447’

Resolution:
====================

*Note: Underscore (hidden/undocumented) parameters should only be used after consent with Oracle Support and should always tried and tested in sandbox environments before applying them in prod environment.

There is a hidden parameter _ALLOW_RESETLOGS_CORRUPTION=TRUE which will allow us to open database even though it’s not properly recovered.
ALTER SYSTEM SET “_allow_resetlogs_corruption”= TRUE SCOPE = SPFILE;
Tip: Also change the undo_management to “Manual”

After the two changes in the spfile you can open the database with:

sqlplus “/ as sysdba”
startup force

Note: Well there is no 100% guarantee that setting _ALLOW_RESETLOGS_CORRUPTION=TRUE will open the database. However, once the database is opened, then we must immediately rebuild the database. Database rebuild means doing the following, namely: (1) perform a full-database export, (2) create a brand new and separate database, and finally (3) import the recent export dump. This option can be tedious and time consuming, but once we successfully open the new database, then we expect minimal or perhaps no data loss at all. Before you try this option, ensure that you have a good and valid backup of the current database.

Previous Settings:
SQL> show parameter undo

NAME TYPE VALUE
———————————— ———– ——————————
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS1

SQL> alter system set undo_management=manual scope=spfile;
System altered.

SQL> ALTER SYSTEM SET “_allow_resetlogs_corruption”= TRUE SCOPE = SPFILE;
System altered.

SQL> shut immediate;

And after setting all the requisite parameters to defined values, we finally saw that ‘Database Opened’ message on the SQL prompt … 🙂 🙂

SQL> startup force;
ORACLE instance started.

Total System Global Area 1.6034E+10 bytes
Fixed Size 2269072 bytes
Variable Size 2449473648 bytes
Database Buffers 1.3556E+10 bytes
Redo Buffers 26480640 bytes
Database mounted.
Database opened.

SQL> ALTER DATABASE OPEN RESETLOGS;
Database altered.

SQL> archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 1
Current log sequence 2

SQL> alter system switch logfile;
System altered.

And a new incarnation of the database happened after opening database in RESETLOGS mode.

SQL> select INCARNATION#,RESETLOGS_TIME,STATUS, RESETLOGS_ID from v$database_incarnation;

INCARNATION# RESETLOGS STATUS RESETLOGS_ID
———— ——— ——- ————
1 24-AUG-13 PARENT 824297850
2 09-FEB-15 PARENT 871197521
3 26-FEB-15 CURRENT 872646322

In Short:
=====================
1) Set _ALLOW_RESETLOGS_CORRUPTION=TRUE in init.ora file.
2) Startup Mount
3) Recover database
4) Alter database open resetlogs.
5) reset undo_management to “manual” in init.ora file.
6) startup database
7) Create new undo tablespace (There is a strict need of changing the UNDO_MANAGEMENT to AUTO in Prod databases)
change undo_management to “AUTO” and undo_tablespace to “NewTablespace”
9) Bounce database.

Hope That Helps!
Prashant Dixit

Posted in Uncategorized | Tagged: , | Leave a Comment »

How to monitor/troubleshoot RAC Stack … Answer is “ORAchk”

Posted by FatDBA on March 2, 2015

Below provided are steps to call and a sample health check report generated for a 2 node RAC system via ORAchk Tool.
link to download: https://support.oracle.com/epmos/faces/SearchDocDisplay?_adf.ctrl-state=kfoe5ynno_4&_afrLoop=178033262862018
Note ID: 1268927.2

=======================================================================

[oracle@dixitdb12v dixit]$ ./orachk
This version of orachk was released on 09-Oct-2014 and its older than 120 days. No new version of orachk is available in RAT_UPGRADE_LOC. It is highly recommended that you download the latest version of orachk from my oracle support to ensure the highest level of accuracy of the data contained within the report.

Do you want to continue running this version? [y/n][y]y

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /opt/app/grid/11.2.0/grid?[y/n][y]y

Checking ssh user equivalency settings on all nodes in cluster

Node dixitdb13v is configured for ssh user equivalency for oracle user

Searching for running databases . . . . .

. .
List of running databases registered in OCR
1. TESTRAC
2. None of above

Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].1
. .

Checking Status of Oracle Software Stack – Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
——————————————————————————————————-
Oracle Stack Status
——————————————————————————————————-
Host Name CRS Installed RDBMS Installed CRS UP ASM UP RDBMS UP DB Instance Name
——————————————————————————————————-
dixitdb12v Yes Yes Yes Yes Yes TESTRAC1
dixitdb13v Yes Yes Yes Yes Yes TESTRAC2
——————————————————————————————————-

Copying plug-ins

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . .

18 of the included audit checks require root privileged data collection . If sudo is not configured or the root password is not available, audit checks which require root privileged data collection can be skipped.

1. Enter 1 if you will enter root password for each host when prompted

2. Enter 2 if you have sudo configured for oracle user to execute root_orachk.sh script

3. Enter 3 to skip the root privileged collections

4. Enter 4 to exit and work with the SA to configure sudo or to arrange for root access and run the tool later.

Please indicate your selection from one of the above options for root access[1-4][1]:- 1

*** Checking Best Practice Recommendations (PASS/WARNING/FAIL) ***

Collections and audit checks log file is
/home/oracle/dixit/orachk_dixitdb12v_TESTRAC_022715_032812/log/orachk.log

Running orachk in serial mode because expect(/usr/bin/expect) is not available to supply root passwords on remote nodes

NOTICE: Installing the expect utility (/usr/bin/expect) will allow orachk to gather root passwords at the beginning of the process and execute orachk on all nodes in parallel speeding up the entire process. For more info – http://www.nist.gov/el/msid/expect.cfm. Expect is available for all major platforms. See User Guide for more details.

Checking for prompts in /home/oracle/.bash_profile on dixitdb12v for oracle user…

Checking for prompts in /home/oracle/.bash_profile on dixitdb13v for oracle user…

=============================================================
Node name – dixitdb12v
=============================================================

Collecting – ASM Disk Groups
Collecting – ASM Disk I/O stats
Collecting – ASM Diskgroup Attributes
Collecting – ASM disk partnership imbalance
Collecting – ASM diskgroup attributes
Collecting – ASM diskgroup usable free space
Collecting – ASM initialization parameters
Collecting – Active sessions load balance for TESTRAC database
Collecting – Archived Destination Status for TESTRAC database
Collecting – Cluster Interconnect Config for TESTRAC database
Collecting – Database Archive Destinations for TESTRAC database
Collecting – Database Files for TESTRAC database
Collecting – Database Instance Settings for TESTRAC database
Collecting – Database Parameters for TESTRAC database
Collecting – Database Parameters for TESTRAC database
Collecting – Database Properties for TESTRAC database
Collecting – Database Registry for TESTRAC database
Collecting – Database Sequences for TESTRAC database
Collecting – Database Undocumented Parameters for TESTRAC database
Collecting – Database Undocumented Parameters for TESTRAC database
Collecting – Database Workload Services for TESTRAC database
Collecting – Dataguard Status for TESTRAC database
Collecting – Files not opened by ASM
Collecting – Log Sequence Numbers for TESTRAC database
Collecting – Percentage of asm disk Imbalance
Collecting – Process for shipping Redo to standby for TESTRAC database
Collecting – RDBMS Feature Usage for TESTRAC database
Collecting – Redo Log information for TESTRAC database
Collecting – Standby redo log creation status before switchover for TESTRAC database
Collecting – /proc/cmdline
Collecting – /proc/modules
Collecting – CPU Information
Collecting – CRS active version
Collecting – CRS oifcfg
Collecting – CRS software version
Collecting – CSS Reboot time
Collecting – CSS disktimout
Collecting – Cluster interconnect (clusterware)
Collecting – Clusterware OCR healthcheck
Collecting – Clusterware Resource Status
Collecting – DiskFree Information
Collecting – DiskMount Information
Collecting – Huge pages configuration
Collecting – Interconnect network card speed
Collecting – Kernel parameters
Collecting – Maximum number of semaphore sets on system
Collecting – Maximum number of semaphores on system
Collecting – Maximum number of semaphores per semaphore set
Collecting – Memory Information
Collecting – NUMA Configuration
Collecting – Network Interface Configuration
Collecting – Network Performance
Collecting – Network Service Switch
Collecting – OS Packages
Collecting – OS version
Collecting – Operating system release information and kernel version
Collecting – Oracle Executable Attributes
Collecting – Patches for Grid Infrastructure
Collecting – Patches for RDBMS Home
Collecting – Shared memory segments
Collecting – Table of file system defaults
Collecting – Voting disks (clusterware)
Collecting – number of semaphore operations per semop system call
Preparing to run root privileged commands dixitdb12v. Please enter root password when prompted.
root@dixitdb12v’s password:
Collecting – ACFS Volumes status
Collecting – Broadcast Requirements for Networks
Collecting – CRS user time zone check
Collecting – Custom rc init scripts (rc.local)
Collecting – Disk Information
Collecting – Grid Infastructure user shell limits configuration
Collecting – Interconnect interface config
Collecting – Network interface stats
Collecting – Number of RDBMS LMS running in real time
Collecting – OLR Integrity
Collecting – Root user limits
Collecting – Verify no database server kernel out of memory errors
Collecting – root time zone check
Collecting – slabinfo

Data collections completed. Checking best practices on dixitdb12v.
————————————————————————————–

WARNING => Cluster Health Monitor (CHM) repository does not provide recommended level of retention
INFO => Important Automatic Storage Management (ASM) Notes and Technical White Papers
FAIL => Bash is vulnerable to code injection (CVE-2014-6271)
WARNING => ARCHIVELOG mode is disabled for TESTRAC
INFO => $CRS_HOME/log/hostname/client directory has too many older log files.
INFO => ORA-00600 errors found in alert log for TESTRAC
INFO => At some times checkpoints are not being completed for TESTRAC
INFO => Some data or temp files are not autoextensible for TESTRAC
INFO => oracleasm (asmlib) module is NOT loaded
WARNING => Shell limit soft nproc for DB is NOT configured according to recommendation
WARNING => kernel.shmmax parameter is NOT configured according to recommendation
WARNING => Database Parameter memory_target is not set to the recommended value on TESTRAC1 instance
FAIL => Operating system hugepages count does not satisfy total SGA requirements
WARNING => NIC bonding is not configured for interconnect
WARNING => NIC bonding is NOT configured for public network (VIP)
WARNING => OSWatcher is not running as is recommended.
INFO => Jumbo frames (MTU >= 8192) are not configured for interconnect
WARNING => NTP is not running with correct setting
WARNING => Database parameter DB_BLOCK_CHECKING on PRIMARY is NOT set to the recommended value. for TESTRAC
FAIL => Flashback on PRIMARY is not configured for TESTRAC
INFO => Operational Best Practices
INFO => Database Consolidation Best Practices
INFO => Computer failure prevention best practices
INFO => Data corruption prevention best practices
INFO => Logical corruption prevention best practices
INFO => Database/Cluster/Site failure prevention best practices
INFO => Client failover operational best practices
WARNING => fast_start_mttr_target should be greater than or equal to 300. on TESTRAC1 instance

INFO => Information about hanganalyze and systemstate dump
WARNING => Package unixODBC-2.2.14-11.el6-i686 is recommended but NOT installed
WARNING => Package unixODBC-devel-2.2.14-11.el6-i686 is recommended but NOT installed
FAIL => Table AUD$[FGA_LOG$] should use Automatic Segment Space Management for TESTRAC
INFO => Database failure prevention best practices
WARNING => Database Archivelog Mode should be set to ARCHIVELOG for TESTRAC
FAIL => Primary database is NOT protected with Data Guard (standby database) for real-time data protection and availability for TESTRAC
INFO => Parallel Execution Health-Checks and Diagnostics Reports for TESTRAC
WARNING => Package unixODBC-devel-2.2.14-11.el6-x86_64 is recommended but NOT installed
WARNING => Linux transparent huge pages are enabled
WARNING => vm.min_free_kbytes should be set as recommended.
INFO => Oracle recovery manager(rman) best practices
WARNING => RMAN controlfile autobackup should be set to ON for TESTRAC
INFO => Consider increasing the COREDUMPSIZE size
INFO => Consider investigating changes to the schema objects such as DDLs or new object creation for TESTRAC
INFO => Consider investigating the frequency of SGA resize operations and take corrective action for TESTRAC

Best Practice checking completed.Checking recommended patches on dixitdb12v.
———————————————————————————

Collecting patch inventory on CRS HOME /opt/app/grid/11.2.0/grid
Collecting patch inventory on ORACLE_HOME /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————
1 Recommended CRS patches for 112040 from /opt/app/grid/11.2.0/grid on dixitdb12v
———————————————————————————
Patch# CRS ASM RDBMS RDBMS_HOME Patch-Description
———————————————————————————
19769489 no yes /opt/app/oracle/product/11.2.0/dbhome_1Patch description: “Database Patch Set Update : 11.2.0.4.5 (19769489)”
———————————————————————————

———————————————————————————
1 Recommended RDBMS patches for 112040 from /opt/app/oracle/product/11.2.0/dbhome_1 on dixitdb12v
———————————————————————————
Patch# RDBMS ASM type Patch-Description
———————————————————————————
19769489 yes merge Patch description: “Database Patch Set Update : 11.2.0.4.5 (19769489)”
———————————————————————————
———————————————————————————

———————————————————————————
Clusterware patches summary report
———————————————————————————
Total patches Applied on CRS Applied on RDBMS Applied on ASM
———————————————————————————
1 0 1 0
———————————————————————————

———————————————————————————
RDBMS homes patches summary report
———————————————————————————
Total patches Applied on RDBMS Applied on ASM ORACLE_HOME
———————————————————————————
1 1 0 /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————

=============================================================
Node name – dixitdb13v
=============================================================

Collecting – /proc/cmdline
Collecting – /proc/modules
Collecting – CPU Information
Collecting – CRS active version
Collecting – CRS oifcfg
Collecting – CRS software version
Collecting – Cluster interconnect (clusterware)
Collecting – DiskFree Information
Collecting – DiskMount Information
Collecting – Huge pages configuration
Collecting – Interconnect network card speed
Collecting – Kernel parameters
Collecting – Maximum number of semaphore sets on system
Collecting – Maximum number of semaphores on system
Collecting – Maximum number of semaphores per semaphore set
Collecting – Memory Information
Collecting – NUMA Configuration
Collecting – Network Interface Configuration
Collecting – Network Performance
Collecting – Network Service Switch
Collecting – OS Packages
Collecting – OS version
Collecting – Operating system release information and kernel version
Collecting – Oracle Executable Attributes
Collecting – Patches for Grid Infrastructure
Collecting – Patches for RDBMS Home
Collecting – Shared memory segments
Collecting – Table of file system defaults
Collecting – number of semaphore operations per semop system call
Preparing to run root privileged commands dixitdb13v. Please enter root password when prompted.
root@dixitdb13v’s password:

Data collections completed. Checking best practices on dixitdb13v.
————————————————————————————–

FAIL => Bash is vulnerable to code injection (CVE-2014-6271)
INFO => $CRS_HOME/log/hostname/client directory has too many older log files.
INFO => ORA-00600 errors found in alert log for TESTRAC
INFO => At some times checkpoints are not being completed for TESTRAC
INFO => oracleasm (asmlib) module is NOT loaded
WARNING => Shell limit soft nproc for DB is NOT configured according to recommendation
WARNING => kernel.shmmax parameter is NOT configured according to recommendation
WARNING => Database Parameter memory_target is not set to the recommended value on TESTRAC2 instance
FAIL => Operating system hugepages count does not satisfy total SGA requirements
WARNING => NIC bonding is not configured for interconnect
WARNING => NIC bonding is NOT configured for public network (VIP)
WARNING => OSWatcher is not running as is recommended.
INFO => Jumbo frames (MTU >= 8192) are not configured for interconnect
WARNING => NTP is not running with correct setting
WARNING => Database parameter DB_BLOCK_CHECKING on PRIMARY is NOT set to the recommended value. for TESTRAC
WARNING => fast_start_mttr_target should be greater than or equal to 300. on TESTRAC2 instance

INFO => IMPORTANT: Oracle Database Patch 17478514 PSU is NOT applied to RDBMS Home /opt/app/oracle/product/11.2.0/dbhome_1
WARNING => Package unixODBC-2.2.14-11.el6-i686 is recommended but NOT installed
WARNING => Package unixODBC-devel-2.2.14-11.el6-i686 is recommended but NOT installed
WARNING => Package unixODBC-devel-2.2.14-11.el6-x86_64 is recommended but NOT installed
WARNING => Linux transparent huge pages are enabled
WARNING => vm.min_free_kbytes should be set as recommended.
INFO => Consider increasing the COREDUMPSIZE size

Best Practice checking completed.Checking recommended patches on dixitdb13v.
———————————————————————————

Collecting patch inventory on CRS HOME /opt/app/grid/11.2.0/grid
Collecting patch inventory on ORACLE_HOME /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————
1 Recommended CRS patches for 112040 from /opt/app/grid/11.2.0/grid on dixitdb13v
———————————————————————————
Patch# CRS ASM RDBMS RDBMS_HOME Patch-Description
———————————————————————————
18706472 no no /opt/app/oracle/product/11.2.0/dbhome_1GRID INFRASTRUCTURE SYSTEM PATCH 11.2.0.4.3
———————————————————————————

———————————————————————————
1 Recommended RDBMS patches for 112040 from /opt/app/oracle/product/11.2.0/dbhome_1 on dixitdb13v
———————————————————————————
Patch# RDBMS ASM type Patch-Description
———————————————————————————
18706472 no merge GRID INFRASTRUCTURE SYSTEM PATCH 11.2.0.4.3
———————————————————————————
———————————————————————————

———————————————————————————
Clusterware patches summary report
———————————————————————————
Total patches Applied on CRS Applied on RDBMS Applied on ASM
———————————————————————————
1 0 0 0
———————————————————————————

———————————————————————————
RDBMS homes patches summary report
———————————————————————————
Total patches Applied on RDBMS Applied on ASM ORACLE_HOME
———————————————————————————
1 0 0 /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————

———————————————————————————
CLUSTERWIDE CHECKS
———————————————————————————
———————————————————————————

Detailed report (html) – /home/oracle/dixit/orachk_dixitdb12v_TESTRAC_022715_032812/orachk_dixitdb12v_TESTRAC_022715_032812.html

UPLOAD(if required) – /home/oracle/dixit/orachk_dixitdb12v_TESTRAC_022715_032812.zip

Thanks
Prashant Dixit

Posted in Uncategorized | Tagged: , | Leave a Comment »

Online Patching & Bug Fixing — > : Bug 16342845 : EXCESSIVE CPU IN DBW PROCESSES FOR FAST OBJECT CHECKPOINTSDB

Posted by FatDBA on March 2, 2015

During further investigation we found that all the Database Writers started consuming high resources and waits on ‘DB FILE ASYNC I/O SUBMIT’ event and all rest of the database sessions started waiting on ‘enq: KO – Fast Object Checkpoint’

We have one most critical production system running on oracle 11.2.0.3 on the top of RHEL.
Recently we started experiencing a very bizarre and weird issue where applications timeouts happens intermittently and during that specific period extremely slow IO operations happens at the database level.

While investigation found that during the issue DB started throwing huge and frequent waits on ‘enq: KO – fast object checkpoint’ and ‘db file async I/O submit’ events and all of the queries in the database during the period starts snailing.
At the same time all the 8 available DBWR processes started waiting on “db file async I/O Submit” wait event and seems takes time to write blocks to the disk and this leads to late check-pointing in the system which in turn slow down database.

DB_Waits_Stats-duringSlowness-onetmproduction___mmma

From AWR
—————
foreground wait event
enq: KO – fast object checkpoint — 70% of the DB time.

background wait event
db file async I/O submit — 90% of bg time

Wait Event Histogram
———————-
db file async I/O submit — 100% >1s (% of waits)
enq: CR – block range reuse ckpt — 100% >1s (% of waits)
enq: KO – fast object checkpoint — 100% >1s (% of waits)
enq: RO – fast object reuse — 100% >1s (% of waits)

After weeks of probing and hours of analyzing system state dumps, Hang analyze and writer process traces we reached to a conclusion that we have strike by a Bug 16342845 : EXCESSIVE CPU IN DBW PROCESSES FOR FAST OBJECT CHECKPOINTSDB

Below mentioned are steps performed to apply the recommended patch.
PRIMARY: disable log shipping
STANDBY: stop MRP

1. Disable Log Shipping on Primary Database.
2. Stop MRP process on Physical Standby Database.
3. Apply the patch first in to Physical Standby (online)
4. Monitor the performance on the Phy STDBY db.
5. Will apply the patch on Logical Standby Database.
6. Monitor the logical database performance.
7. Will apply the patch on Primary Database.
8. Monitor the system performance.
9. Start Log shipping on Primary DB end.
10.Start MRP process on Standby database.

Step 1.
Disable log shipping on Primary DB.

SQL> show parameter LOG_ARCHIVE_DEST_

NAME TYPE VALUE
———————————— ——————————– ——————————
log_archive_dest_1 string location=”/vol5/oracle/dixit_db_
arc/dixitdb/arch”, valid_for=(A
LL_LOGFILES,ALL_ROLES)

log_archive_dest_2 string service=”dixitdb_al”, LGWR ASYNC
NOAFFIRM delay=0 optional com
pression=disable max_failure=0
max_connections=1 reopen=300
db_unique_name=”dixitdb_al” net_
timeout=30, valid_for=(all_log
files,primary_role)

log_archive_dest_3 string service=”dixitdb_r”, NOAFFIRM AS
YNC VALID_FOR=(ALL_LOGFILES,PR
IMARY_ROLE) DB_UNIQUE_NAME=one
tm_r

alter system set log_archive_dest_state_2=defer scope=both;

Step 2:
Stop MRP process on Physical Standby Database.

Before
PROCESS STATUS DELAY_MINS
——— ———— ———-
MRP0 WAIT_FOR_LOG 0

SQL> alter database recover managed standby database cancel;

After cancelling recover process the MRP automatically stops.

PROCESS STATUS DELAY_MINS
——— ———— ———-

Step 3:
Opatch online apply – common to all environments.

Recommendations
1. Take binaries backup (ORACLE_HOME).
2. Take Full DB Backup.
3. Check Opatch Prerequisites in order to identify and resolve any patch conflicts.

-bash-3.2$ export PATH=$PATH:$ORACLE_HOME/OPatch

-bash-3.2$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 11.2.0.3.0
Copyright (c) 2012, Oracle Corporation. All rights reserved.

PREREQ session

Oracle Home : /opt/app/oracle/product/11gR2
Central Inventory : /opt/app/oracle/oraInventory
from : /opt/app/oracle/product/11gR2/oraInst.loc
OPatch version : 11.2.0.3.0
OUI version : 11.2.0.3.0
Log file location : /opt/app/oracle/product/11gR2/cfgtoollogs/opatch/opatch2015-03-01_05-34-42AM_1.log

Invoking prereq “checkconflictagainstohwithdetail”

Prereq “checkConflictAgainstOHWithDetail” passed.

-bash-3.2$ opatch apply online -connectString dixitdb_al:sys:xxxxxxx
Oracle Interim Patch Installer version 11.2.0.3.0
Copyright (c) 2012, Oracle Corporation. All rights reserved.

Oracle Home : /opt/app/oracle/product/11gR2
Central Inventory : /opt/app/oracle/oraInventory
from : /opt/app/oracle/product/11gR2/oraInst.loc
OPatch version : 11.2.0.3.0
OUI version : 11.2.0.3.0
Log file location : /opt/app/oracle/product/11gR2/cfgtoollogs/opatch/20505138_Mar_01_2015_05_35_29/apply2015-03-01_05-35-29AM_1.log

The patch should be applied/rolled back in ‘-all_nodes’ mode only.
Converting the RAC mode to ‘-all_nodes’ mode.
Applying interim patch ‘20505138’ to OH ‘/opt/app/oracle/product/11gR2’
Verifying environment and performing prerequisite checks…
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y
Backing up files…

Patching component oracle.rdbms, 11.2.0.3.0…
Installing and enabling the online patch ‘bug20505138.pch’, on database ‘dixitdb_al’.

Verifying the update…
Patch 20505138 successfully applied
Log file location: /opt/app/oracle/product/11gR2/cfgtoollogs/opatch/20505138_Mar_01_2015_05_35_29/apply2015-03-01_05-35-29AM_1.log

OPatch succeeded.


From Alert log during the patch apply:

Sun Mar 01 06:17:13 2015
Patch file bug20505138.pch is out of sync with oracle binary; performing fixup
Patch file bug20505138.pch has been synced with oracle binary
Patch bug20505138.pch Installed – Update #1
Patch bug20505138.pch Enabled – Update #2
Sun Mar 01 06:17:15 2015
Online patch bug20505138.pch has been installed
Online patch bug20505138.pch has been enabled

-bash-3.2$ opatch lsinventory
Oracle Interim Patch Installer version 11.2.0.3.0
Copyright (c) 2012, Oracle Corporation. All rights reserved.

Oracle Home : /opt/app/oracle/product/11gR2
Central Inventory : /opt/app/oracle/oraInventory
from : /opt/app/oracle/product/11gR2/oraInst.loc
OPatch version : 11.2.0.3.0
OUI version : 11.2.0.3.0
Log file location : /opt/app/oracle/product/11gR2/cfgtoollogs/opatch/opatch2015-03-01_05-35-53AM_1.log

Lsinventory Output file location : /opt/app/oracle/product/11gR2/cfgtoollogs/opatch/lsinv/lsinventory2015-03-01_05-35-53AM.txt

——————————————————————————–
Installed Top-level Products (1):

Oracle Database 11g 11.2.0.3.0
There are 1 products installed in this Oracle Home.

Interim patches (5) :

Patch (online) 20505138: applied on Sun Mar 01 05:35:42 CET 2015
Unique Patch ID: 18568953
Created on 9 Feb 2015, 19:34:29 hrs PST8PDT
Bugs fixed:
16367081, 16463153, 16342845


‘Hope That Helps’
Thanks
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

 
%d bloggers like this: