Tales From A Lazy Fat DBA

It’s all about databases & their performance, troubleshooting & much more … ¯\_(ツ)_/¯

Posts Tagged ‘RAC’

What is Cluster Health Advisor GUI or CHAG?

Posted by FatDBA on January 4, 2023

CHA GUI (CHAG) is a graphical user interface for the Cluster Health Advisor (CHA). It was earlier internal to Oracle teams, but it’s now available to customers. It is a standalone, interactive, real-time capable front-end/GUI to the classic CHA utility. Oracle 12.2 is the first version supported by CHAG, and you only require the RAC license; no additional license is needed to use the CHAG tool.

CHAG communicates directly with the Grid Infrastructure Management Repository (GIMR) using a JDBC connection. GIMR is mandatory for CHAG to work, as it fetches its data out of the GIMR repository. In case you don’t have the GIMR repo installed, for example on 19c databases where GIMR is optional, you can use CHAG in local mode, but in the absence of the GIMR management repo you will not get the historical ability to go back in time.

Installation is quite simple: you just download and unzip the software on one of your cluster machines, and I recommend not dumping it inside your ORACLE_HOME but keeping it in a separate place. CHAG requires X11/XHost and Java, as it uses Java Swing to open the GUI. CHAG can operate in several modes:

  • With a default connection to the GIMR database. This option initiates a live session and provides real-time monitoring.
  • With a user-specified location of a GIMR database (option -P). This option also initiates a live session and provides real-time monitoring.
  • Reading in a text file with monitoring data (option -f). This initiates a passive session, which allows you to analyze textual data extracted from a GIMR, or data collected during a live session. This data is held in a “*.mdb” file.
  • Parsing a text file with data and generating a JSON object with information similar to the “diagnosis” query (option -C). This mode of operation (‘chag -C -f’) directs CHAG to parse a *.mdb file and generate a summary of its content in JSON format.

For the offline mode you can get the “mdb” file to analyze using the command below. Depending on the time range requested, you will get one or more mdb files for the period.

chactl export repository -format mdb -start <timestamp> -end <timestamp>
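For example, a one-day window could look roughly like the following (a sketch only; the timestamp format shown is an assumption, so verify it with ‘chactl export repository -help’ on your version):

chactl export repository -format mdb -start "2023-01-01 00:00:00" -end "2023-01-02 00:00:00"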

About the usage: CHAG is invoked using the ‘chag’ script available in the bin directory of the CHA home. CHAG is designed primarily for cluster or database experts. Usage is quite simple and straightforward; you can move the pointer/slider to choose any particular timeframe and see the problems caught in it, their causes and the corrective actions. You can use it both in real-time and offline mode; the only difference is that for real time you have to be on one of the cluster nodes, whereas for offline analysis you can generate the MDB file (CHA data file) and open it anywhere on a client machine with no need of an Oracle home, only Java.
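As a rough illustration (a sketch only; the exact switches may differ between CHAG versions, so check the MOS note referenced at the end of this post), the invocations map to the modes listed above, with <CHAG_HOME> and the mdb path being placeholders:

cd <CHAG_HOME>/bin
./chag                              # live session against the default GIMR of the local cluster
./chag -f /tmp/cluster_data.mdb     # passive session, analyze a previously exported mdb file
./chag -C -f /tmp/cluster_data.mdb  # parse the mdb file and print a JSON diagnosis-style summary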

Below is the CHAG look and feel. It’s running on a 2 node RAC cluster where we have two databases installed. Here you see a few color codes, where red means there were problems during that interval.

Next is how it explains more about any particular problem caught for the timeslot. It gives you the cause and the corrective action; for example, in the screenshot below it has detected that the ASM I/O service time is higher than expected, which points to the underlying I/O subsystem used by the ASM disks.

You can use SHIFT key combinations to get wait event specific details for the selected time period.

You can use a few other SHIFT key combinations to present the same data in the form of line graphs.

A few more examples of problems detected by CHAG. This time it was reporting redo log writes that were slower than expected, which makes sense because the ASM I/O is slow too, meaning the entire I/O subsystem is impacted.

I highly recommend all readers go through Doc ID 2340062.1 on metalink for more details on the Cluster Health Advisor Graphical User Interface – CHAG.

Hope It Helped!
Prashant Dixit



root.sh failed on RHEL > 7.3 — CLSRSC-400: A system reboot is required to continue installing

Posted by FatDBA on November 14, 2019

Hi Everyone,

I was a little occupied with a few database migrations happening at my end, so I wasn’t able to post on a regular basis. But the good thing is that I have a good list of issues that we faced during the course of this end-to-end migration, and starting from today I will try to share them all.

Alright, the one I am going to discuss next is an issue that we encountered while running the root.sh script on an ‘Oracle Restart’ setup, where the script failed with the below set of errors.



Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.2.0.1/grid_home/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/12.2.0.1/crsdata/testserver-monkey/crsconfig/roothas_2019-11-12_10-56-56PM.log
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node testserver-monkey successfully pinned.
2019/11/12 22:57:02 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
 2019/11/12 22:59:06 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/12.2.0.1/grid_home/perl/bin/perl -I/u01/app/12.2.0.1/grid_home/perl/lib -I/u01/app/12.2.0.1/grid_home/crs/install /u01/app/12.2.0.1/grid_home/crs/install/roothas.pl ' execution failed 


To understand what caused that failure, we checked the log file that the above error points to.
Below are a few of the key lines from when it failed. They show that it failed to load the ADVM/ACFS drivers on the system while running the root.sh script.



>  ACFS-9504: Copying file '/u01/app/12.2.0.1/grid_home/lib/libacfs12.so' to the path '/opt/oracle/extapi/64/acfs/orcl/1/'
>  ACFS-9308: Loading installed ADVM/ACFS drivers.
>  ACFS-9321: Creating udev for ADVM/ACFS.
>  ACFS-9323: Creating module dependencies - this may take some time.
>  ACFS-9176: Entering 'ld usm drvs'
>  ACFS-9154: Loading 'oracleoks.ko' driver.
>  modprobe: FATAL: Module oracleoks not found.
>  ACFS-9109: oracleoks.ko driver failed to load.
>  ACFS-9178: Return code = USM_FAIL
>  ACFS-9177: Return from 'ld usm drvs'
 >  ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended.
>  ACFS-9310: ADVM/ACFS installation failed.
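As a side note, before jumping to a patch you can check whether the ACFS/ADVM drivers are even supported and installed for the running kernel using the acfsdriverstate utility shipped with the Grid home (run as root; the Grid home path below is the same one used in this installation):

/u01/app/12.2.0.1/grid_home/bin/acfsdriverstate supported
/u01/app/12.2.0.1/grid_home/bin/acfsdriverstate installed
/u01/app/12.2.0.1/grid_home/bin/acfsdriverstate version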


The solution to this problem is to apply the one-off patch (25078431) that fixes this issue with the ACFS/ADVM drivers on RHEL > 7.3. Yes, there is a metalink note available for the same as well.
But in our setup even the patch failed to fix the issue: the gridSetup.sh -applyOneOffs run came back within 1-2 seconds of us running the command. In short, it did nothing and only pretended to apply the patch, and ‘opatch lspatches‘ was not showing anything.
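For reference, the patch application we attempted looked roughly like the following (a sketch only; the patch staging path is a placeholder and the exact -applyOneOffs behaviour can vary by GI release, so follow the patch README):

cd /u01/app/12.2.0.1/grid_home
./gridSetup.sh -applyOneOffs /stage/patches/25078431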

Well, we raised this issue with Oracle and they passed it to their development team, as there were lots of other things running on this DB.
And as you know, their DEV team doesn’t have any fixed SLA. Well, there is a reason for them working like that: the development team does lots of testing and regressions, hence the wait is something acceptable.

Well, we had to fix this problem anyhow, as we had an important test planned on this system.
So came the time to apply a temporary fix, of course a crude/raw one 🙂

Now, as we don’t need ACFS on this system, we can disable the feature right at the code/binary level.
Below are the two main files that, when renamed, disable this feature and let you bypass this root.sh check.



acfsdriverstate
acfsroot



You simply have to rename them and re-run the root.sh script; it will pass this time and you are done with your GI installation.
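For illustration, the renames were as simple as the following sketch (both files normally live under the Grid home bin directory; the Grid home path is the same one assumed above):

cd /u01/app/12.2.0.1/grid_home/bin
mv acfsdriverstate acfsdriverstate.bak
mv acfsroot acfsroot.bak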


Hope It Helps
Prashant Dixit


Convert Standard ASM to Flex ASM.

Posted by FatDBA on March 15, 2018

Hi Mates,

Okay, so you have an ASM instance crash and, at the same time, the database instance on that node fails too… expected behavior, and many of us have faced this scenario in production RAC setups.
The answer to this problem is ‘Flex ASM’, which provides us with something that was previously unattainable: the ability to run multiple, independent (in cardinality) ASM instances. You can think of it as being to ASM what SCAN is to the database since 11gR2.

Flex ASM has been available for 12c users for a while now, so the question is: how do you convert a non-Flex ASM setup to a Flex-enabled one?
Below is the method; I performed this POC for one of my customers some time back and here are the steps.

Configuration:
RHEL 6, 64 Bit
2 Node 12cR1 RAC setup.
Hostname: rac1, rac2.
DB Instances: dixitdb1, dixitdb2

Let’s first check the network information of the cluster: the network interfaces and their IPv4 addresses. You can collect this info using the oifcfg tool.

[oracle@rac1 ~]$ oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.10.0  global  cluster_interconnect
[oracle@rac1 ~]$

Next, let’s check the ASM information and its current mode.

[oracle@rac1 ~]$ srvctl status asm
ASM is running on rac1,rac2

[oracle@rac1 ~]$  srvctl config asm
ASM home: 
Password file: +DATA/orapwASM
ASM listener: LISTENER

[oracle@rac1 ~]$ asmcmd showclustermode
ASM cluster : Flex mode disabled
[oracle@rac1 ~]$

Okay, now let’s do the conversion; we will be doing a silent conversion. You can use the ASMCA GUI as well to do the same.
Here we used 192.168.10.0 as the ASM network and a free port for the ASM listener; we will use port 1526 here for listening to all requests.

[oracle@rac1 ~]$ asmca -silent -convertToFlexASM -asmNetworks eth1/192.168.10.0 -asmListenerPort 1526

To complete ASM conversion, run the following script as privileged user in local node.
/u01/app/oracle/cfgtoollogs/asmca/scripts/converttoFlexASM.sh

Okay, so the last step generated a script which needs to be executed as root to do the real work. This will bounce all RAC components one by one on each node. By the end of this step we will have a new listener created exclusively for the ASM instances, and both ASM instances (+ASM1, +ASM2) will be registered with it.

[oracle@rac1 ~]$ su - root
Password:
[root@rac1 ~]# /u01/app/oracle/cfgtoollogs/asmca/scripts/converttoFlexASM.sh
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.cvu' on 'rac1'
CRS-2673: Attempting to stop 'ora.mgmtdb' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.dixitdb.db' on 'rac1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'rac1'
CRS-2677: Stop of 'ora.cvu' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cvu' on 'rac2'
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'rac1' succeeded
CRS-2676: Start of 'ora.cvu' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan2.vip' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan3.vip' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan2.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.scan2.vip' on 'rac2'
CRS-2677: Stop of 'ora.dixitdb.db' on 'rac1' succeeded
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac2'
CRS-2677: Stop of 'ora.mgmtdb' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'rac1'
CRS-2677: Stop of 'ora.scan3.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.scan3.vip' on 'rac2'
CRS-2676: Start of 'ora.scan2.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.MGMTLSNR' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'rac2'
CRS-2676: Start of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2676: Start of 'ora.scan3.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'rac2'
CRS-2676: Start of 'ora.MGMTLSNR' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.mgmtdb' on 'rac2'
CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'rac2' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.oc4j' on 'rac2'
CRS-2676: Start of 'ora.oc4j' on 'rac2' succeeded
CRS-2676: Start of 'ora.mgmtdb' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
Oracle Grid Infrastructure restarted in node rac1
PRCC-1014 : ASMNET1LSNR_ASM was already running
PRCR-1004 : Resource ora.ASMNET1LSNR_ASM.lsnr is already running
PRCR-1079 : Failed to start resource ora.ASMNET1LSNR_ASM.lsnr
CRS-5702: Resource 'ora.ASMNET1LSNR_ASM.lsnr' is already running on 'rac1'
CRS-5702: Resource 'ora.ASMNET1LSNR_ASM.lsnr' is already running on 'rac2'
ASM listener ASMNET1LSNR_ASM running already
CRS-2673: Attempting to stop 'ora.crsd' on 'rac2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.cvu' on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.mgmtdb' on 'rac2'
CRS-2673: Attempting to stop 'ora.dixitdb.db' on 'rac2'
CRS-2673: Attempting to stop 'ora.oc4j' on 'rac2'
CRS-2677: Stop of 'ora.cvu' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cvu' on 'rac1'
CRS-2676: Start of 'ora.cvu' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan3.vip' on 'rac2'
CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan2.vip' on 'rac2'
CRS-2677: Stop of 'ora.scan3.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.scan3.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan2.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.scan2.vip' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.dixitdb.db' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.rac2.vip' on 'rac2'
CRS-2677: Stop of 'ora.mgmtdb' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'rac2'
CRS-2676: Start of 'ora.scan3.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.rac2.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.rac2.vip' on 'rac1'
CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'rac1'
CRS-2677: Stop of 'ora.MGMTLSNR' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'rac1'
CRS-2676: Start of 'ora.scan2.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'rac1'
CRS-2676: Start of 'ora.rac2.vip' on 'rac1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.MGMTLSNR' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.mgmtdb' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'rac1' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.oc4j' on 'rac1'
CRS-2676: Start of 'ora.mgmtdb' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac2'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac2'
CRS-2677: Stop of 'ora.FRA.dg' on 'rac2' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'rac2' succeeded
CRS-2676: Start of 'ora.oc4j' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac2'
CRS-2677: Stop of 'ora.ons' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac2'
CRS-2677: Stop of 'ora.net1.network' on 'rac2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac2' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.storage' on 'rac2'
CRS-2677: Stop of 'ora.storage' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2677: Stop of 'ora.ctssd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2672: Attempting to start 'ora.evmd' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
CRS-2676: Start of 'ora.evmd' on 'rac2' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac2'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2676: Start of 'ora.ctssd' on 'rac2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac2'
CRS-2676: Start of 'ora.asm' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac2'
CRS-2676: Start of 'ora.storage' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac2'
CRS-2676: Start of 'ora.crsd' on 'rac2' succeeded
Oracle Grid Infrastructure restarted in node rac2
[root@rac1 ~]#

Okay, so it is done with the restart of the clusterware components and we are back at the prompt.
Let’s verify whether the conversion has been done or not…

[root@rac1 ~]# srvctl status asm
ASM is running on rac1,rac2

[root@rac1 ~]# asmcmd showclustermode
ASM cluster : Flex mode enabled           >>>> Flex Mode is ON now.

[root@rac1 ~]#  srvctl config asm
ASM home: 
Password file: +DATA/orapwASM
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET1LSNR_ASM

And we have a new LISTENER named ‘ASMNET1LSNR_ASM’ created for ASM.

[root@rac2 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE


[root@rac1 ~]# ps -ef|grep tns
root        10     2  0 15:59 ?        00:00:00 [netns]
oracle   22167     1  0 18:46 ?        00:00:00 /u01/app/12.1.0/grid_1/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
oracle   22291     1  0 18:46 ?        00:00:00 /u01/app/12.1.0/grid_1/bin/tnslsnr LISTENER -no_crs_notify -inherit
oracle   22532     1  0 18:46 ?        00:00:00 /u01/app/12.1.0/grid_1/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
oracle   22535     1  0 18:46 ?        00:00:00 /u01/app/12.1.0/grid_1/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
oracle   22544     1  0 18:46 ?        00:00:00 /u01/app/12.1.0/grid_1/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit
root     30044 19089  0 18:52 pts/1    00:00:00 grep tns

[root@rac1 ~]# ps -ef|grep pmon
oracle   21494     1  0 18:46 ?        00:00:00 asm_pmon_+ASM1
oracle   22439     1  0 18:46 ?        00:00:00 ora_pmon_dixitdb1
oracle   22839     1  0 18:46 ?        00:00:00 mdb_pmon_-MGMTDB
root     30089 19089  0 18:52 pts/1    00:00:00 grep pmon
[root@rac1 ~]#


[root@rac1 ~]# lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 13-MAR-2018 18:54:09

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                13-MAR-2018 18:46:25
Uptime                    0 days 0 hr. 7 min. 43 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0/grid_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/rac1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.1)(PORT=1526)))
Services Summary...
Service "+ASM" has 2 instance(s).
  Instance "+ASM1", status READY, has 2 handler(s) for this service...
  Instance "+ASM2", status READY, has 1 handler(s) for this service...
The command completed successfully
[root@rac1 ~]#


[root@rac1 ~]# srvctl config listener -l ASMNET1LSNR_ASM
Name: ASMNET1LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: 
End points: TCP:1526
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:

[root@rac1 ~]# srvctl status listener -l ASMNET1LSNR_ASM
Listener ASMNET1LSNR_ASM is enabled
Listener ASMNET1LSNR_ASM is running on node(s): rac1,rac2

Let’s do some testing. I will try to stop one of the ASM instances (+ASM1) on node 1 and see whether the DB instance stays alive and keeps listening to requests.

[root@rac1 ~]# srvctl status database -db dixitdb -f -v
Instance dixitdb1 is running on node rac1. Instance status: Open.
Instance dixitdb2 is running on node rac2. Instance status: Open.

[root@rac1 ~]# srvctl modify asm -count 1
PRCA-1123 : The specified ASM cardinality 1 is less than the minimum cardinality of 2.

Well, this is an expected error, because we are running a 2-node RAC and Flex ASM (same as the SCAN listeners) needs at least 2 instances up and running, which makes reducing the cardinality impossible in my case. So I will now kill the ASM instance manually instead (by killing its PMON).
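As a side note, a slightly less crude way to simulate the same failure (a sketch, assuming 12.1 srvctl syntax; verify the exact options with ‘srvctl stop asm -help’) would be to force-stop ASM on one node and let the Flex ASM clients reconnect to the surviving instance:

srvctl stop asm -node rac1 -force
srvctl status asm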

[root@rac1 ~]# ps -ef|grep pmon
root      4167  4142  0 19:41 pts/1    00:00:00 grep pmon
oracle   21494     1  0 18:46 ?        00:00:00 asm_pmon_+ASM1
oracle   22439     1  0 18:46 ?        00:00:00 ora_pmon_dixitdb1
oracle   22839     1  0 18:46 ?        00:00:00 mdb_pmon_-MGMTDB

[root@rac1 ~]# kill -9 21494
[root@rac1 ~]#  ps -ef|grep pmon
root      4200  4142  0 19:42 pts/1    00:00:00 grep pmon
oracle   22439     1  0 18:46 ?        00:00:00 ora_pmon_dixitdb1
oracle   22839     1  0 18:46 ?        00:00:00 mdb_pmon_-MGMTDB

Next, let’s see the ASM client connection info on the available instance (+ASM2).

SQL> select GROUP_NUMBER, DB_NAME, STATUS, INSTANCE_NAME from  v$asm_client;

GROUP_NUMBER DB_NAME  STATUS       INSTANCE_NAME
------------ -------- ------------ ----------------------------------------------------------------
           1 +ASM     CONNECTED    +ASM2
           2 +ASM     CONNECTED    +ASM2
           1 dixitdb  CONNECTED    dixitdb1
           2 dixitdb  CONNECTED    dixitdb1
           1 dixitdb  CONNECTED    dixitdb2
           2 dixitdb  CONNECTED    dixitdb2
           1 _mgmtdb  CONNECTED    -MGMTDB

7 rows selected.

And we have instance 1 (dixitdb1) connected to the +ASM2 instance, as +ASM1 is crashed/dead.
The connection is served via the ASM listener; the same can be verified in the ASM listener logs.

13-MAR-2018 19:47:10 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=rac2.localdomain)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=ASMNET1LSNR_ASM)(VERSION=202375680)) * status * 0
13-MAR-2018 19:47:14 * (CONNECT_DATA=(SERVICE_NAME=+ASM)(INSTANCE_NAME=+ASM2)(CID=(PROGRAM=oracle)(HOST=rac1.localdomain)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.1)(PORT=39062)) * establish * +ASM * 0
 

Hope It Helps
Prashant Dixit


How to monitor/troubleshoot RAC Stack … Answer is “ORAchk”

Posted by FatDBA on March 2, 2015

Below are the steps to invoke ORAchk and a sample health check report generated for a 2 node RAC system via the ORAchk tool.
Link to download: https://support.oracle.com/epmos/faces/SearchDocDisplay?_adf.ctrl-state=kfoe5ynno_4&_afrLoop=178033262862018
Note ID: 1268927.2
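If you want something other than the default interactive run shown below, ORAchk also accepts a few common switches (a sketch; available flags vary by ORAchk release, so check ‘./orachk -help’ for the version you downloaded):

./orachk -a              # run all best practice and recommended patch checks
./orachk -profile dba    # limit the run to a given profile of checks
./orachk -localonly      # restrict the run to the local node only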

=======================================================================

[oracle@dixitdb12v dixit]$ ./orachk
This version of orachk was released on 09-Oct-2014 and its older than 120 days. No new version of orachk is available in RAT_UPGRADE_LOC. It is highly recommended that you download the latest version of orachk from my oracle support to ensure the highest level of accuracy of the data contained within the report.

Do you want to continue running this version? [y/n][y]y

CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to /opt/app/grid/11.2.0/grid?[y/n][y]y

Checking ssh user equivalency settings on all nodes in cluster

Node dixitdb13v is configured for ssh user equivalency for oracle user

Searching for running databases . . . . .

. .
List of running databases registered in OCR
1. TESTRAC
2. None of above

Select databases from list for checking best practices. For multiple databases, select 1 for All or comma separated number like 1,2 etc [1-2][1].1
. .

Checking Status of Oracle Software Stack – Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-------------------------------------------------------------------------------------------------------
Oracle Stack Status
-------------------------------------------------------------------------------------------------------
Host Name    CRS Installed   RDBMS Installed   CRS UP   ASM UP   RDBMS UP   DB Instance Name
-------------------------------------------------------------------------------------------------------
dixitdb12v   Yes             Yes               Yes      Yes      Yes        TESTRAC1
dixitdb13v   Yes             Yes               Yes      Yes      Yes        TESTRAC2
-------------------------------------------------------------------------------------------------------

Copying plug-ins

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . .

18 of the included audit checks require root privileged data collection . If sudo is not configured or the root password is not available, audit checks which require root privileged data collection can be skipped.

1. Enter 1 if you will enter root password for each host when prompted

2. Enter 2 if you have sudo configured for oracle user to execute root_orachk.sh script

3. Enter 3 to skip the root privileged collections

4. Enter 4 to exit and work with the SA to configure sudo or to arrange for root access and run the tool later.

Please indicate your selection from one of the above options for root access[1-4][1]:- 1

*** Checking Best Practice Recommendations (PASS/WARNING/FAIL) ***

Collections and audit checks log file is
/home/oracle/dixit/orachk_dixitdb12v_TESTRAC_022715_032812/log/orachk.log

Running orachk in serial mode because expect(/usr/bin/expect) is not available to supply root passwords on remote nodes

NOTICE: Installing the expect utility (/usr/bin/expect) will allow orachk to gather root passwords at the beginning of the process and execute orachk on all nodes in parallel speeding up the entire process. For more info – http://www.nist.gov/el/msid/expect.cfm. Expect is available for all major platforms. See User Guide for more details.

Checking for prompts in /home/oracle/.bash_profile on dixitdb12v for oracle user…

Checking for prompts in /home/oracle/.bash_profile on dixitdb13v for oracle user…

=============================================================
Node name – dixitdb12v
=============================================================

Collecting – ASM Disk Groups
Collecting – ASM Disk I/O stats
Collecting – ASM Diskgroup Attributes
Collecting – ASM disk partnership imbalance
Collecting – ASM diskgroup attributes
Collecting – ASM diskgroup usable free space
Collecting – ASM initialization parameters
Collecting – Active sessions load balance for TESTRAC database
Collecting – Archived Destination Status for TESTRAC database
Collecting – Cluster Interconnect Config for TESTRAC database
Collecting – Database Archive Destinations for TESTRAC database
Collecting – Database Files for TESTRAC database
Collecting – Database Instance Settings for TESTRAC database
Collecting – Database Parameters for TESTRAC database
Collecting – Database Parameters for TESTRAC database
Collecting – Database Properties for TESTRAC database
Collecting – Database Registry for TESTRAC database
Collecting – Database Sequences for TESTRAC database
Collecting – Database Undocumented Parameters for TESTRAC database
Collecting – Database Undocumented Parameters for TESTRAC database
Collecting – Database Workload Services for TESTRAC database
Collecting – Dataguard Status for TESTRAC database
Collecting – Files not opened by ASM
Collecting – Log Sequence Numbers for TESTRAC database
Collecting – Percentage of asm disk Imbalance
Collecting – Process for shipping Redo to standby for TESTRAC database
Collecting – RDBMS Feature Usage for TESTRAC database
Collecting – Redo Log information for TESTRAC database
Collecting – Standby redo log creation status before switchover for TESTRAC database
Collecting – /proc/cmdline
Collecting – /proc/modules
Collecting – CPU Information
Collecting – CRS active version
Collecting – CRS oifcfg
Collecting – CRS software version
Collecting – CSS Reboot time
Collecting – CSS disktimout
Collecting – Cluster interconnect (clusterware)
Collecting – Clusterware OCR healthcheck
Collecting – Clusterware Resource Status
Collecting – DiskFree Information
Collecting – DiskMount Information
Collecting – Huge pages configuration
Collecting – Interconnect network card speed
Collecting – Kernel parameters
Collecting – Maximum number of semaphore sets on system
Collecting – Maximum number of semaphores on system
Collecting – Maximum number of semaphores per semaphore set
Collecting – Memory Information
Collecting – NUMA Configuration
Collecting – Network Interface Configuration
Collecting – Network Performance
Collecting – Network Service Switch
Collecting – OS Packages
Collecting – OS version
Collecting – Operating system release information and kernel version
Collecting – Oracle Executable Attributes
Collecting – Patches for Grid Infrastructure
Collecting – Patches for RDBMS Home
Collecting – Shared memory segments
Collecting – Table of file system defaults
Collecting – Voting disks (clusterware)
Collecting – number of semaphore operations per semop system call
Preparing to run root privileged commands dixitdb12v. Please enter root password when prompted.
root@dixitdb12v's password:
Collecting – ACFS Volumes status
Collecting – Broadcast Requirements for Networks
Collecting – CRS user time zone check
Collecting – Custom rc init scripts (rc.local)
Collecting – Disk Information
Collecting – Grid Infastructure user shell limits configuration
Collecting – Interconnect interface config
Collecting – Network interface stats
Collecting – Number of RDBMS LMS running in real time
Collecting – OLR Integrity
Collecting – Root user limits
Collecting – Verify no database server kernel out of memory errors
Collecting – root time zone check
Collecting – slabinfo

Data collections completed. Checking best practices on dixitdb12v.
————————————————————————————–

WARNING => Cluster Health Monitor (CHM) repository does not provide recommended level of retention
INFO => Important Automatic Storage Management (ASM) Notes and Technical White Papers
FAIL => Bash is vulnerable to code injection (CVE-2014-6271)
WARNING => ARCHIVELOG mode is disabled for TESTRAC
INFO => $CRS_HOME/log/hostname/client directory has too many older log files.
INFO => ORA-00600 errors found in alert log for TESTRAC
INFO => At some times checkpoints are not being completed for TESTRAC
INFO => Some data or temp files are not autoextensible for TESTRAC
INFO => oracleasm (asmlib) module is NOT loaded
WARNING => Shell limit soft nproc for DB is NOT configured according to recommendation
WARNING => kernel.shmmax parameter is NOT configured according to recommendation
WARNING => Database Parameter memory_target is not set to the recommended value on TESTRAC1 instance
FAIL => Operating system hugepages count does not satisfy total SGA requirements
WARNING => NIC bonding is not configured for interconnect
WARNING => NIC bonding is NOT configured for public network (VIP)
WARNING => OSWatcher is not running as is recommended.
INFO => Jumbo frames (MTU >= 8192) are not configured for interconnect
WARNING => NTP is not running with correct setting
WARNING => Database parameter DB_BLOCK_CHECKING on PRIMARY is NOT set to the recommended value. for TESTRAC
FAIL => Flashback on PRIMARY is not configured for TESTRAC
INFO => Operational Best Practices
INFO => Database Consolidation Best Practices
INFO => Computer failure prevention best practices
INFO => Data corruption prevention best practices
INFO => Logical corruption prevention best practices
INFO => Database/Cluster/Site failure prevention best practices
INFO => Client failover operational best practices
WARNING => fast_start_mttr_target should be greater than or equal to 300. on TESTRAC1 instance

INFO => Information about hanganalyze and systemstate dump
WARNING => Package unixODBC-2.2.14-11.el6-i686 is recommended but NOT installed
WARNING => Package unixODBC-devel-2.2.14-11.el6-i686 is recommended but NOT installed
FAIL => Table AUD$[FGA_LOG$] should use Automatic Segment Space Management for TESTRAC
INFO => Database failure prevention best practices
WARNING => Database Archivelog Mode should be set to ARCHIVELOG for TESTRAC
FAIL => Primary database is NOT protected with Data Guard (standby database) for real-time data protection and availability for TESTRAC
INFO => Parallel Execution Health-Checks and Diagnostics Reports for TESTRAC
WARNING => Package unixODBC-devel-2.2.14-11.el6-x86_64 is recommended but NOT installed
WARNING => Linux transparent huge pages are enabled
WARNING => vm.min_free_kbytes should be set as recommended.
INFO => Oracle recovery manager(rman) best practices
WARNING => RMAN controlfile autobackup should be set to ON for TESTRAC
INFO => Consider increasing the COREDUMPSIZE size
INFO => Consider investigating changes to the schema objects such as DDLs or new object creation for TESTRAC
INFO => Consider investigating the frequency of SGA resize operations and take corrective action for TESTRAC

Best Practice checking completed.Checking recommended patches on dixitdb12v.
———————————————————————————

Collecting patch inventory on CRS HOME /opt/app/grid/11.2.0/grid
Collecting patch inventory on ORACLE_HOME /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————
1 Recommended CRS patches for 112040 from /opt/app/grid/11.2.0/grid on dixitdb12v
———————————————————————————
Patch# CRS ASM RDBMS RDBMS_HOME Patch-Description
———————————————————————————
19769489  no  yes  /opt/app/oracle/product/11.2.0/dbhome_1  Patch description: "Database Patch Set Update : 11.2.0.4.5 (19769489)"
———————————————————————————

———————————————————————————
1 Recommended RDBMS patches for 112040 from /opt/app/oracle/product/11.2.0/dbhome_1 on dixitdb12v
———————————————————————————
Patch# RDBMS ASM type Patch-Description
———————————————————————————
19769489 yes merge Patch description: “Database Patch Set Update : 11.2.0.4.5 (19769489)”
———————————————————————————
———————————————————————————

———————————————————————————
Clusterware patches summary report
———————————————————————————
Total patches Applied on CRS Applied on RDBMS Applied on ASM
———————————————————————————
1 0 1 0
———————————————————————————

———————————————————————————
RDBMS homes patches summary report
———————————————————————————
Total patches Applied on RDBMS Applied on ASM ORACLE_HOME
———————————————————————————
1 1 0 /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————

=============================================================
Node name – dixitdb13v
=============================================================

Collecting – /proc/cmdline
Collecting – /proc/modules
Collecting – CPU Information
Collecting – CRS active version
Collecting – CRS oifcfg
Collecting – CRS software version
Collecting – Cluster interconnect (clusterware)
Collecting – DiskFree Information
Collecting – DiskMount Information
Collecting – Huge pages configuration
Collecting – Interconnect network card speed
Collecting – Kernel parameters
Collecting – Maximum number of semaphore sets on system
Collecting – Maximum number of semaphores on system
Collecting – Maximum number of semaphores per semaphore set
Collecting – Memory Information
Collecting – NUMA Configuration
Collecting – Network Interface Configuration
Collecting – Network Performance
Collecting – Network Service Switch
Collecting – OS Packages
Collecting – OS version
Collecting – Operating system release information and kernel version
Collecting – Oracle Executable Attributes
Collecting – Patches for Grid Infrastructure
Collecting – Patches for RDBMS Home
Collecting – Shared memory segments
Collecting – Table of file system defaults
Collecting – number of semaphore operations per semop system call
Preparing to run root privileged commands dixitdb13v. Please enter root password when prompted.
root@dixitdb13v's password:

Data collections completed. Checking best practices on dixitdb13v.
————————————————————————————–

FAIL => Bash is vulnerable to code injection (CVE-2014-6271)
INFO => $CRS_HOME/log/hostname/client directory has too many older log files.
INFO => ORA-00600 errors found in alert log for TESTRAC
INFO => At some times checkpoints are not being completed for TESTRAC
INFO => oracleasm (asmlib) module is NOT loaded
WARNING => Shell limit soft nproc for DB is NOT configured according to recommendation
WARNING => kernel.shmmax parameter is NOT configured according to recommendation
WARNING => Database Parameter memory_target is not set to the recommended value on TESTRAC2 instance
FAIL => Operating system hugepages count does not satisfy total SGA requirements
WARNING => NIC bonding is not configured for interconnect
WARNING => NIC bonding is NOT configured for public network (VIP)
WARNING => OSWatcher is not running as is recommended.
INFO => Jumbo frames (MTU >= 8192) are not configured for interconnect
WARNING => NTP is not running with correct setting
WARNING => Database parameter DB_BLOCK_CHECKING on PRIMARY is NOT set to the recommended value. for TESTRAC
WARNING => fast_start_mttr_target should be greater than or equal to 300. on TESTRAC2 instance

INFO => IMPORTANT: Oracle Database Patch 17478514 PSU is NOT applied to RDBMS Home /opt/app/oracle/product/11.2.0/dbhome_1
WARNING => Package unixODBC-2.2.14-11.el6-i686 is recommended but NOT installed
WARNING => Package unixODBC-devel-2.2.14-11.el6-i686 is recommended but NOT installed
WARNING => Package unixODBC-devel-2.2.14-11.el6-x86_64 is recommended but NOT installed
WARNING => Linux transparent huge pages are enabled
WARNING => vm.min_free_kbytes should be set as recommended.
INFO => Consider increasing the COREDUMPSIZE size

Best Practice checking completed.Checking recommended patches on dixitdb13v.
———————————————————————————

Collecting patch inventory on CRS HOME /opt/app/grid/11.2.0/grid
Collecting patch inventory on ORACLE_HOME /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————
1 Recommended CRS patches for 112040 from /opt/app/grid/11.2.0/grid on dixitdb13v
———————————————————————————
Patch# CRS ASM RDBMS RDBMS_HOME Patch-Description
———————————————————————————
18706472  no  no  /opt/app/oracle/product/11.2.0/dbhome_1  GRID INFRASTRUCTURE SYSTEM PATCH 11.2.0.4.3
———————————————————————————

———————————————————————————
1 Recommended RDBMS patches for 112040 from /opt/app/oracle/product/11.2.0/dbhome_1 on dixitdb13v
———————————————————————————
Patch# RDBMS ASM type Patch-Description
———————————————————————————
18706472 no merge GRID INFRASTRUCTURE SYSTEM PATCH 11.2.0.4.3
———————————————————————————
———————————————————————————

———————————————————————————
Clusterware patches summary report
———————————————————————————
Total patches Applied on CRS Applied on RDBMS Applied on ASM
———————————————————————————
1 0 0 0
———————————————————————————

———————————————————————————
RDBMS homes patches summary report
———————————————————————————
Total patches Applied on RDBMS Applied on ASM ORACLE_HOME
———————————————————————————
1 0 0 /opt/app/oracle/product/11.2.0/dbhome_1
———————————————————————————

———————————————————————————
CLUSTERWIDE CHECKS
———————————————————————————
———————————————————————————

Detailed report (html) – /home/oracle/dixit/orachk_dixitdb12v_TESTRAC_022715_032812/orachk_dixitdb12v_TESTRAC_022715_032812.html

UPLOAD(if required) – /home/oracle/dixit/orachk_dixitdb12v_TESTRAC_022715_032812.zip

Thanks
Prashant Dixit


Snapshot ControlFile in RAC Systems and related errors (ORA-00245)

Posted by FatDBA on June 17, 2013

Even with a good database upgrade plan one can face issues related to performance, functionality, backups etc. Today I’m going to discuss one of the cases that happened during a production upgrade from 10gR2 to 11gR2.

We had our database full backup scheduled every night, with a configuration like: autobackup ON, data transferred directly to SBT mediums (tapes), optimization ON and other basic settings/configurations.
While performing the status check the next morning, we discovered that the backup had failed with the error message:

ORA-00245: control file backup failed; target is likely on a local file system

Error
———
Starting Control File and SPFILE Autobackup at 2013-09-06:21:25:34
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of Control File and SPFILE Autobackup command on ORA_SBT_TAPE_1 channel at 09/06/2013 21:25:36
ORA-00245: control file backup failed; target is likely on a local file system

We have the AUTOBACKUP functionality ON, which takes a backup of critical files like the controlfile whenever the database structure metadata in the control file changes and whenever a backup record is added.

Reason:
From 11gR2 onwards, in a RAC database, due to the changes made to the controlfile backup mechanism, any instance in the cluster may write to the snapshot controlfile. Because of this, the snapshot controlfile needs to be visible to all instances.
In other words, a snapshot controlfile in 11gR2 must be reachable by all nodes in a RAC environment. If the snapshot controlfile is not on a shared location, RMAN will throw such errors during backup operations.
Documentation link: http://docs.oracle.com/cd/E11882_01/rac.112/e16795/rman.htm#i455026

Solution:
To avoid such situations, always keep your snapshot controlfile on a shared location so that it is accessible by all nodes when needed.

RMAN> configure snapshot controlfile name to '/disk02/rmaninfo/snapcontrol/snapcf_cesc1.f';

using target database control file instead of recovery catalog
new RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/disk02/rmaninfo/snapcontrol/snapcf_cesc1.f';
new RMAN configuration parameters are successfully stored.
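You can confirm the new setting with a quick check; run it from each instance to make sure every node resolves the same shared path:

RMAN> show snapshot controlfile name;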

Okay, now that we have the solution to the problem, let’s discuss a bit more about the snapshot controlfile and about this new change in 11gR2.
– When RMAN performs any operation that requires a consistent view of the control file (such as a backup), it first creates a copy of the control file. This copy is called the snapshot control file. The snapshot control file is used for the duration of that operation and is overwritten by any subsequent operation. Even related operations (say, during a backup database plus archivelog operation that does an archive log backup, a database backup, and then another archive log backup) will use newly created snapshot control files, one for each operation.

Why, and what’s the need for this new change?
From 11gR2 onwards, the controlfile backup happens without holding the controlfile enqueue.

Now, what’s this ‘controlfile enqueue‘?
When RMAN needs to back up or resynchronize from the control file, it first creates a snapshot, i.e. a consistent image, of the control file.
If one RMAN job is already backing up the control file while another needs a new snapshot control file, then you may see the error:

RMAN-08512: waiting for snapshot controlfile enqueue for 1900 seconds

A job that must wait for the control file enqueue waits for a brief interval and then retries the enqueue. RMAN makes up to five attempts to get the enqueue and then fails the job. So, finally, with Oracle 11gR2 we have the solution to this situation (in a RAC environment): keep the snapshot controlfile on a shared location.
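If you do hit RMAN-08512, it helps to see which session is holding the snapshot controlfile enqueue. A sketch of the usual query (the CF enqueue with ID1=0 and ID2=2 is the one commonly associated with the snapshot controlfile; treat the exact ID values as an assumption and cross-check with MOS for your version):

SQL> SELECT s.sid, s.username, s.program
       FROM v$session s, v$enqueue_lock l
      WHERE l.sid = s.sid
        AND l.type = 'CF'
        AND l.id1 = 0
        AND l.id2 = 2;   -- assumed: CF enqueue held for the snapshot controlfile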


 