Tales From A Lazy Fat DBA

Fan of Oracle DB & Performance, PostgreSQL & Cassandra … \,,/


Archive for the ‘troubleshooting’ Category

oracle.net.ns.NetException: Got minus one from a read call

Posted by FatDBA on September 3, 2020

Hi Guys,

I’d like to discuss a problem I faced today on one of our Oracle 12c Release 1 standalone databases. The application team reported an error they were getting during application restart, specifically the Oracle Net exception ‘Got minus one from a read call’.


Caused by: java.lang.RuntimeException: java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/jdbc/ProdMonkeyPD
    Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/jdbc/ProdMonkeyPD
    Caused by: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/jdbc/ProdMonkeyPD
    Caused by: javax.resource.ResourceException: IJ031084: Unable to create connection
    Caused by: java.sql.SQLRecoverableException: IO Error: Got minus one from a read call
    Caused by: oracle.net.ns.NetException: Got minus one from a read call"}}
 

They suspected a high number of open files on the OS. The count was very high when listed with the lsof command on this RHEL7 system, and it went down once the services were stopped. I was able to explain the difference between RHEL6 and RHEL7 when counting with lsof: on RHEL7/EL7, lsof output includes threads (the TID column) by default, unlike RHEL6/OL6, so the open-file count looks higher on RHEL7 than on RHEL6. So, it had nothing to do with the error they reported.
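To compare like with like across the two releases, one option is to count file descriptors per process straight from /proc instead of counting lsof lines. This is only a minimal sketch, assuming a Linux /proc filesystem; the function name count_fds is made up for illustration.

```shell
#!/bin/sh
# count_fds PID: print the number of file descriptors currently open in
# process PID, read from /proc/PID/fd (one entry per descriptor, so the
# threads of a process are not counted again).
count_fds() {
    pid="$1"
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "no such process or no access: $pid" >&2
        return 1
    fi
    ls "/proc/$pid/fd" | wc -l | tr -d ' '
}

# Example: descriptors held by the current shell.
count_fds "$$"
```

Run it against the application server PID before and after the restart to get a number that is comparable on both RHEL6 and RHEL7.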

Next, we checked the database alert log and it was all good, all clean, no errors, nothing. I then checked the value of the “OS_AUTHENT_PREFIX” parameter, as it specifies a prefix that Oracle uses to authenticate users attempting to connect to the system. Oracle concatenates this value to the beginning of the user’s operating system account name and password, and then compares the prefixed username with the Oracle usernames in the database. It was set to its default value, OPS$, which exists for backward compatibility with previous versions.

So, I had two solutions for the problem:

– Set “OS_AUTHENT_PREFIX” to “” (a null string), thereby eliminating the addition of any prefix to operating system account names.
– Set “tcp.validnode_checking = no” in the sqlnet.ora file.
This parameter enables or disables valid node checking for incoming connections. If it is set to yes, incoming connections are allowed only if they originate from a node that conforms to the list specified by the TCP.INVITED_NODES or TCP.EXCLUDED_NODES parameters.
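Before changing anything it helps to know what the server-side sqlnet.ora currently says. Here is a small sketch that prints the valid node checking setting from a given file; validnode_setting is a hypothetical helper name, and the usual location ($ORACLE_HOME/network/admin, or wherever TNS_ADMIN points) is an assumption about your setup.

```shell
#!/bin/sh
# validnode_setting FILE: print the value of TCP.VALIDNODE_CHECKING found in
# the given sqlnet.ora (lowercased), or "unset" when the parameter is absent
# or commented out. If it appears more than once, the last line is reported.
validnode_setting() {
    value=$(grep -i '^[[:space:]]*tcp\.validnode_checking[[:space:]]*=' "$1" 2>/dev/null \
        | tail -n 1 | cut -d= -f2 | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    echo "${value:-unset}"
}

# Usage:
#   validnode_setting "$ORACLE_HOME/network/admin/sqlnet.ora"
```

If it prints yes, also look at the TCP.INVITED_NODES and TCP.EXCLUDED_NODES lists in the same file to see whether the application host is allowed.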

So, I tried the first option, restarted the database for the change to take effect (this parameter is static), and asked the application team to give it another try. As expected, it worked. The Oracle Net exception ‘Got minus one from a read call’ was resolved by the first fix itself.

The second option is valid too, as it has the same effect, but one fix at a time.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting

Cassandra node refused to start – MismatchedInputException: No content to map due to end-of-input

Posted by FatDBA on August 5, 2020

Hi All,

This will be a quick one, in the error-and-solution format.
This time it’s about a Cassandra startup error that I faced while doing some testing on a POC box, and how I resolved it.

In this case I was using DataStax Cassandra 6.8.1 on RHEL6. I tried starting the instance, but it failed with an error saying “MismatchedInputException: No content to map due to end-of-input”.
I attempted to start it a few more times but ended up with the same error message. Below are the full error details.


WARN  [main] 2020-08-05 13:43:46,277  DatabaseDescriptor.java:1517 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
INFO  [main] 2020-08-05 13:43:46,281  DseDelegateSnitch.java:39 - Setting my workloads to [Cassandra]
INFO  [main] 2020-08-05 13:43:46,358  YamlConfigurationLoader.java:77 - Configuration location: file:/home/cassandra/dse-6.8.1_node1/resources/cassandra/conf/cassandra.yaml
INFO  [main] 2020-08-05 13:43:46,360  DseDelegateSnitch.java:41 - Initialized DseDelegateSnitch with workloads [Cassandra], delegating to com.datastax.bdp.snitch.DseSimpleSnitch
INFO  [main] 2020-08-05 13:43:46,424  YamlConfigurationLoader.java:77 - Configuration location: file:/home/cassandra/dse-6.8.1_node1/resources/cassandra/conf/cassandra.yaml
ERROR [main] 2020-08-05 13:43:47,904  DseModule.java:126 - Unable to start server. Exiting...
org.apache.cassandra.io.FSReadError: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
 at [Source: (BufferedInputStream); line: -1, column: 0]
        at com.datastax.bdp.db.nodes.Nodes.transactionalRead(Nodes.java:195)
        at com.datastax.bdp.db.nodes.Nodes.access$100(Nodes.java:76)
        at com.datastax.bdp.db.nodes.Nodes$Local.(Nodes.java:477)
        at com.datastax.bdp.db.nodes.Nodes.(Nodes.java:97)
        at com.datastax.bdp.db.nodes.Nodes$Instance.setup(Nodes.java:911)
        at org.apache.cassandra.config.DatabaseDescriptor.applyMetadata(DatabaseDescriptor.java:587)
        at org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:234)
        at org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:249)
        at com.datastax.bdp.DseCoreModule.(DseCoreModule.java:84)
        at com.datastax.bdp.DseModule.getRequiredModules(DseModule.java:163)
        at com.datastax.bdp.server.AbstractDseModule.configure(AbstractDseModule.java:27)
        at com.datastax.bdp.DseModule.configure(DseModule.java:89)
        at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
        at com.google.inject.spi.Elements.getElements(Elements.java:110)
        at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
        at com.google.inject.Guice.createInjector(Guice.java:96)
        at com.google.inject.Guice.createInjector(Guice.java:73)
        at com.google.inject.Guice.createInjector(Guice.java:62)
        at com.datastax.bdp.ioc.DseInjector.get(DseInjector.java:36)
        at com.datastax.bdp.DseModule.main(DseModule.java:103)
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
 at [Source: (BufferedInputStream); line: -1, column: 0]
 


Now, let’s talk about the fix I applied to get rid of the problem. Before that, I recalled that the database had been shut down abruptly last time because the server/host crashed. The fix was to remove the ‘local’ file under the nodes metadata directory, which had presumably been left incomplete by the crash. This file contains local startup information like data center, release info, schema version, native transport address, port info (JMX, storage etc.), broadcast address etc.


[cassandra@fatdba-doccass bin]$ cd /var/lib/cassandra/metadata/nodes/
[cassandra@fatdba-doccass nodes]$ ls
local  peers  snapshots
[cassandra@fatdba-doccass nodes]$ rm -rf local
[cassandra@fatdba-doccass nodes]$
[cassandra@fatdba-doccass nodes]$
[cassandra@fatdba-doccass nodes]$ ls -ltrh
total 4.0K
drwxrwxr-x. 2 cassandra cassandra 6 Jul 27 16:07 snapshots
-rw-rw-r--. 1 cassandra cassandra 1 Aug  4 18:29 peers
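If you would rather not delete the file outright, a more cautious variant is to rename it so it can be inspected or restored later. A minimal sketch; set_aside is a made-up helper name, and I am assuming the node is stopped while you do this.

```shell
#!/bin/sh
# set_aside FILE: rename FILE to FILE.bak.<timestamp> instead of deleting it,
# and print the new name so it can be restored or inspected later.
set_aside() {
    target="$1"
    [ -e "$target" ] || { echo "nothing to move: $target" >&2; return 1; }
    backup="$target.bak.$(date +%Y%m%d%H%M%S)"
    mv "$target" "$backup" && echo "$backup"
}

# Usage, with the path from the session above:
#   set_aside /var/lib/cassandra/metadata/nodes/local
```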
 

Now let’s try to start the node once again.


[cassandra@fatdba-doccass bin]$
[cassandra@fatdba-doccass bin]$ ./dse cassandra
 


And it started after clearing the local details/information that was left over from when the node crashed.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting

Golden Gate 19.1.0 – VIEW REPORT & VIEW PARAMS command failed

Posted by FatDBA on March 24, 2020

Hi Folks,

I’d like to share one of the latest problems I’ve faced with Golden Gate release 19.1.0, where two of the most frequently used commands, VIEW REPORT and VIEW PARAMS, simply fail in the administration client console. This is a Windows server environment where I was doing some testing with this new release.
The commands produce no additional logs or information; the viewer simply tries to start and crashes within a few seconds.

With no idea about the problem, and with this being a new release of Oracle GG (19.1) for which there wasn’t any Metalink note or public documentation available, I reached out to Oracle Support. They gave a very simple but logical solution: set the default text viewer program for viewing parameter and report files once again by running the following command, and it worked.


SET PAGER notepad 


They later classified this issue as a bug with number 30427030, but the bug details are yet to be published on the Metalink website.


Stay inside, learn something new during this self isolation & stop the spread of Covid-19.
Stay Healthy and Safe Everyone!

Prashant Dixit
 

Posted in troubleshooting

Oracle Real Application Testing (RAT) – Part 1: What it is ?

Posted by FatDBA on January 31, 2020

Hi Guys,

As promised, I am back with the first post on Oracle RAT (Real Application Testing), and there will be a couple more follow-up chapters on the same in the next few days or weeks.

Alright, recently during one of our mission-critical production database migrations we reached a point where we had to perform a load test before pushing the real-time workload on to the new system. I was asked to prepare the strategy and to pick the best possible tool to assess the performance of this new system and how it would respond to the current traffic.

I received lots of suggestions from the rest of the team, i.e. Swingbench, LoadRunner, Orion etc., but most of them come with a predefined set of supplied benchmarks. A few can be customized, but those are more related to server performance and benchmarking than to the database or SQL level. Considering the notorious behavior of much of the custom code and many of the legacy application modules, I leaned towards a tool which covers both database and SQL, and we finally agreed on Oracle RAT.

Oracle Real Application Testing is an option that comes with Oracle Enterprise Edition and was introduced in Oracle 11g Release 1. It helps you test real-life workload after changes on the database such as database upgrades, OS upgrades, parameter changes, hardware replacement, etc. So, in short, Oracle RAT is a system stress test tool that simulates production load. But yes, it’s not free and comes with additional cost and licenses.

It has two features, “Database Replay” and “SQL Performance Analyzer”, which help with fine-tuning the database before going to production.
I will cover more about the ‘Database Replay’ feature here and might cover the ‘SQL Performance Analyzer’ feature later.

When can you use RAT – “Database Replay” feature?
System Changes
– Hardware replacement such as CPU, RAM, etc.
– Database and OS upgrades
– Storage changes (OCFS2 – ASM)
– OS changes (Windows – Linux)
Configuration Changes
– Single Instance – RAC
– Patch installation
– Database parameter change

Which database versions are supported?
The workload capture process is supported on Oracle Database 10g R2 (10.2.0.4) and above. The workload replay process is supported on Oracle Database 11g R1 and above.

How to do it, and where to start?
Well, there are two different ways you can perform RAT (DB Replay) testing:
– Using Oracle Enterprise Manager (OEM): This option is entirely GUI based. You select your source and target systems and, with a few clicks, run the stress/load test on the system.
– Using the command line (my preferred way of doing this, yes I am ‘old school’) with the DBMS_WORKLOAD_CAPTURE & DBMS_WORKLOAD_REPLAY packages.

Some High Level Steps:
– Capture the workload into capture files (in the form of .rec files, which are flat files)
– Copy the files to the test system and preprocess them (to make them machine understandable)
– Replay the files on the test system (play the recorded files)
– Perform detailed analysis of the workload capture and replay using reports generated by Database Replay (reporting for benchmarking)

ON SOURCE System:
dbms_workload_capture.start_capture 
dbms_workload_capture.finish_capture; 
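To make the source-side calls a little more concrete, here is a rough sketch of a shell helper that prints a SQL*Plus script for a time-boxed capture. The helper name capture_sql, the capture name RAT_TEST1 and the directory object RAT_DIR are all made up for illustration; the directory object is assumed to exist already and to point at an empty directory.

```shell
#!/bin/sh
# capture_sql NAME DIR_OBJECT SECONDS: print a SQL*Plus script that starts a
# workload capture into an existing directory object and lets it stop on its
# own after SECONDS.
capture_sql() {
    cat <<EOF
BEGIN
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
    name     => '$1',
    dir      => '$2',
    duration => $3);
END;
/
-- To stop before the duration elapses:
--   EXEC DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;
EOF
}

# Hypothetical names; review the output, then feed it to SQL*Plus on the source.
capture_sql RAT_TEST1 RAT_DIR 3600
```

Printing the script instead of running it directly keeps the helper harmless and lets you review the exact calls first.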

Copy the workload files to the client system. For example: – /home/oracle/rat/test1

On TARGET System:
1. dbms_workload_replay.process_capture 
2. dbms_workload_replay.initialize_replay 
3. dbms_workload_replay.prepare_replay 
4. Run the workload client to calibrate the replay. The calibration process (mode=CALIBRATE) recommends the number of client processes required to perform the replay
5. Replay the workload using below command.
dbms_workload_replay.start_replay; 
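The target-side steps can be sketched in the same way. This helper only prints steps 1 to 3; replay_sql, RAT_REPLAY1 and RAT_DIR are hypothetical names, and the directory object is assumed to point at the copied capture files. Steps 4 and 5 appear as comments because the wrc clients have to be started in replay mode before START_REPLAY is called.

```shell
#!/bin/sh
# replay_sql REPLAY_NAME DIR_OBJECT: print the SQL*Plus commands for
# preprocessing, initializing and preparing a replay on the target.
replay_sql() {
    cat <<EOF
EXEC DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(capture_dir => '$2');
EXEC DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(replay_name => '$1', replay_dir => '$2');
EXEC DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization => TRUE);
EOF
}

# Hypothetical names for illustration.
replay_sql RAT_REPLAY1 RAT_DIR

# Afterwards, from the OS prompt and then SQL*Plus:
#   wrc system/XXXX mode=calibrate replaydir=<capture directory>
#   wrc system/XXXX mode=replay    replaydir=<capture directory>
#   EXEC DBMS_WORKLOAD_REPLAY.START_REPLAY;
```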

Components: The ARCHITECTURE (Simplified)

DB REPLAY “The Big Picture”

What is a Workload Client?
The replay uses wrc clients, which are multi-threaded client programs and can be started on the same machine/host or on separate hosts.
I will cover them in more depth in future posts.


[oracle@PDIXIT:RAT]$ wrc system/XXXX mode=calibrate replaydir=/DBCapture/RAT/RAT_13DEC15_19_17
 Workload Replay Client: Release 11.2.0.4.0 - Production on Sat Dec 16 05:50:39 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
 
 
Report for Workload in: /DBCapture/RAT/RAT_13DEC16_19_17
-----------------------
Recommendation:
Consider using at least 13 clients divided among 4 CPU(s)
You will need at least 168 MB of memory per client process.
If your machine(s) cannot match that number, consider using more clients.
 
Workload Characteristics:
- max concurrency: 575 sessions
- total number of sessions: 1729
 
Assumptions:
- 1 client process per 50 concurrent sessions
- 4 client process per CPU
- 256 KB of memory cache per concurrent session
- think time scale = 100
- connect time scale = 100
- synchronization = TRUE
 

Now how to compare/benchmark?
At the end of both the CAPTURE & REPLAY phases you need to generate a few process-specific reports.
A few of the important reports that help in benchmarking are:
AWR Reports: Generate the AWR reports for the time interval during which each of the two processes was in progress. The begin and end snaps can be collected from DBA_WORKLOAD_CAPTURES & DBA_WORKLOAD_REPLAYS.
CAPTURE/REPLAY Reports: These reports are specific to the workload capture and to its replay on the target.
Capture vs Replay reports.
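For the AWR part, the snapshot boundaries can be pulled from the two views in one go. A minimal sketch: snap_range_sql is a made-up helper that just prints the query, which you then run in SQL*Plus on the system holding the capture and replay history.

```shell
#!/bin/sh
# snap_range_sql: print a query listing the AWR begin/end snapshot IDs
# recorded for every workload capture and replay.
snap_range_sql() {
    cat <<'EOF'
SELECT 'CAPTURE' AS phase, id, name, awr_begin_snap, awr_end_snap
  FROM dba_workload_captures
UNION ALL
SELECT 'REPLAY', id, name, awr_begin_snap, awr_end_snap
  FROM dba_workload_replays
 ORDER BY 1, 2;
EOF
}

snap_range_sql
```

The two snapshot IDs per row are what you pass as the begin and end snaps when generating the AWR report for that run.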

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting

Disk Goes Offline after rebalance! – Is this due to a BUG ?

Posted by FatDBA on January 30, 2020

HI Everyone,

Today, during an activity where we migrated the ASM storage for one of our 2-node RAC clusters (running on 11gR2), we had to rebalance the disks to copy/mirror the contents from the existing storage to the new storage before dropping the older storage partitions, and we faced some weirdness. The disks went offline in this multi-node ASM and we were left stranded, initially with no idea what was behind this behavior, but we were finally able to locate a Metalink page for the same issue.

Yes, this was due to a known Bug with number 13476583
Oracle Server (Rdbms) Version
This problem is introduced in the
11.2.0.2.3 Patch Set Update
11.2.0.2.4 Patch Set Update
11.2.0.2.5 Patch Set Update
and in 11.2.0.3, by the fix for bug 10040921.

Problem:
When disks are dropped, a forcible diskgroup dismount is performed on other ASM instance/s.

Workaround or Fix:
1. The problem does not cause diskgroup corruption. So mostly diskgroup can be mounted again.
2. Apply fix
Interim patches here: Patch:13476583
11.2.0.2.6 Patch Set Update
11.2.0.2 Patch 17 on Windows Platforms

Oracle Notes: 245840.1

Hope That Helps
Prashant Dixit

Posted in Advanced, troubleshooting

RAT Reporting Error: ORA-06502: numeric or value error: character string buffer too small

Posted by FatDBA on December 16, 2019

Hi All,

Today’s topic is how to handle/fix one of the issues I faced while generating RAT (Real Application Testing) reports on a 10gR2 database. I know many of us are not yet aware of the tool, its purpose and functionality. Very soon I will be writing about this great product from Oracle, which load tests a database using real/genuine workload and is quite helpful for forecasting your DB performance before you migrate.

Alright, coming back to the point – I was trying to generate the RAT Capture report (on the source, of course) to see what was in the capture, its observations, highlights and the rest, and that’s when we encountered the error pasted below.



DECLARE
l_report CLOB;
BEGIN
l_report := DBMS_WORKLOAD_CAPTURE.report(capture_id => 81,
format => DBMS_WORKLOAD_CAPTURE.TYPE_HTML);
END;
/
DECLARE
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at "SYS.DBMS_SWRF_REPORT_INTERNAL", line 7446
ORA-06512: at "SYS.DBMS_SWRF_REPORT_INTERNAL", line 8591
ORA-06512: at "SYS.DBMS_SWRF_REPORT_INTERNAL", line 8521
ORA-06512: at "SYS.DBMS_WORKLOAD_CAPTURE", line 486
ORA-06512: at "SYS.DBMS_WORKLOAD_CAPTURE", line 1214
ORA-06512: at line 4


There are two solutions to this problem:

1. The first is to drop and rebuild the common (shared by capture and replay) schemas and their infrastructure tables using the two scripts below.
The first script, catnowrr.sql, drops the schema tables shared by capture and replay along with the capture infrastructure tables.
The second, catwrr.sql (the catalog script for Workload Capture and Replay), then rebuilds all the capture and replay related tables.


@@?/rdbms/admin/catnowrr.sql
@@?/rdbms/admin/catwrr.sql
exec prvt_report_registry.register_clients(TRUE); --- This one registers clients 

Note: In that case you might lose all of your previous capture ID details from the system, as it simply wipes everything related to the RAT tables. Hence this is a crude and raw method to fix this issue, and I recommend always connecting with Oracle Support before running these scripts on your database!

2. I tried another approach to avoid this error: generating the RAT capture report from the target instead of the source, where we were getting the error.
Is that possible? Yes, it is. After further analysis I found that the issue is with the 10gR2 capture reporting code, which sometimes throws this error.

So, the second way turned out to be the better approach here, as all of our previous stats and data stayed untouched and nothing was wiped out. We simply ran the reporting procedure from the target (12c R2 in our case) and that’s how we avoided the issue.
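For reference, generating the capture report on the target could look roughly like the sketch below. This is not the exact block I ran: target_report_sql and the directory object RAT_DIR are hypothetical, and the directory object is assumed to point at the capture files copied from the source. The extra step compared to the source is GET_CAPTURE_INFO, which loads the capture metadata from the directory and returns a capture ID that is valid on the target.

```shell
#!/bin/sh
# target_report_sql DIR_OBJECT: print a PL/SQL block that looks up the capture
# stored in DIR_OBJECT on the target and builds the HTML capture report.
target_report_sql() {
    cat <<EOF
SET SERVEROUTPUT ON
DECLARE
  l_id     NUMBER;
  l_report CLOB;
BEGIN
  -- Read the capture metadata from the directory object and get its ID.
  l_id := DBMS_WORKLOAD_CAPTURE.GET_CAPTURE_INFO(dir => '$1');
  l_report := DBMS_WORKLOAD_CAPTURE.REPORT(
                capture_id => l_id,
                format     => DBMS_WORKLOAD_CAPTURE.TYPE_HTML);
  DBMS_OUTPUT.PUT_LINE('capture id ' || l_id || ', report length ' ||
                       DBMS_LOB.GETLENGTH(l_report));
END;
/
EOF
}

# Hypothetical directory object name.
target_report_sql RAT_DIR
```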


Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting

root.sh failing while installing 12cR2 on RHEL7 “Failed to create keys in the OLR” – Did your hostname starts with a number ?

Posted by FatDBA on July 29, 2019

Hi Guys,

I know it’s been too long since I last posted; it all happened due to some site authentication issues and some personal priorities. Here I am, back with new issues, all related to performance, administration, troubleshooting, optimization and other subjects.

This time I would like to share an issue I faced while installing Oracle 12c Release 2 (yes, I still do installations, sometimes 🙂 ) on a brand new RHEL7 box, where everything was good until I ran root.sh, which failed with a weird error that initially gave no hint about the problem.
Initially I wondered whether this qualifies to be a post and deserves a place here, but I spent a few days identifying the cause and hours with support, so I just want to save that time for all of you who might be facing the same issue and searching for something on Google 🙂

So let’s get started!
This is exactly what I got when I ran the root.sh script.



[root@8811913-monkey-db1:/u011/app1/12.2.0.1/grid]# ./root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u011/app1/12.2.0.1/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u011/app1/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u011/app1/12.2.0.1/crsdata/8811913-monkey-db1/crsconfig/roothas_2019-02-18_00-59-22AM.log
Site name (8811913-monkey-db1) is invalid.clscfg -localadd -z  [-avlookup]
                 -p property1:value1,property2:value2...

  -avlookup       - Specify if the operation is during clusterware upgrade
  -z   - Specify the site GUID for this node
  -p propertylist - list of cluster properties and its value pairs

 Adds keys in OLR for the HASD.
WARNING: Using this tool may corrupt your cluster configuration. Do not
         use unless you positively know what you are doing.

 Failed to create keys in the OLR, rc = 100, Message:


2019/02/18 00:59:28 CLSRSC-188: Failed to create keys in Oracle Local Registry
Died at /u011/app1/12.2.0.1/grid/crs/install/oraolr.pm line 552.
The command '/u011/app1/12.2.0.1/grid/perl/bin/perl -I/u011/app1/12.2.0.1/grid/perl/lib -I/u011/app1/12.2.0.1/grid/crs/install /u011/app1/12.2.0.1/grid/crs/install/roothas.pl ' execution failed


The error simply said that the script failed to ‘create the keys in OLR’. These were the keys for HASD that it was attempting to add. I checked all the run-time logs created at that time, but they too gave no idea about the problem. That is when I had to engage Oracle customer support and came to know that this all happened due to a new bug (BUG 26581118 – ALLOW HOSTNAME WITH NUMERIC VALUE) that comes into the picture when the hostname starts with a number on RHEL7, and that is specific to Oracle 12c Release 2.

Oracle suggested a bug fix (Patch Number: 26751067) for this issue. This is a MERGE patch and fixes both Bug 25499276 & 26581118. One more thing: you have to apply this patch before running the root.sh script.
So let me quickly show how to do that (removing all redundant and other sections).



[oracle@8811913-monkey-db1:/u011/app1/12.2.0.1/grid/OPatch]$ ./opatch napply -oh /u011/app1/12.2.0.1/grid -local 26751067/26751067/
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2019, Oracle Corporation.  All rights reserved.

...
......

Patch 26751067 successfully applied.
Log file location: /u011/app1/12.2.0.1/grid/cfgtoollogs/opatch/opatch2019-02-18_01-05-41AM_1.log

OPatch succeeded.
[oracle@8811913-monkey-db1:/u011/app1/12.2.0.1/grid/OPatch]$
[oracle@8811913-monkey-db1:/u011/app1/12.2.0.1/grid/OPatch]$


I ran root.sh after patching and it went smoothly.
BTW, in case you don’t want to do all this, simply change the hostname and put any letter in front of it, i.e. 8811913 –> A8811913. That’s it!
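A quick pre-check before running root.sh can tell you whether a host falls into this case. A minimal sketch; starts_with_digit is a made-up helper name, and the check only looks at the first character of the node name.

```shell
#!/bin/sh
# starts_with_digit NAME: succeed (exit 0) when NAME begins with 0-9.
starts_with_digit() {
    case "$1" in
        [0-9]*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Check the node name of the current host.
if starts_with_digit "$(uname -n)"; then
    echo "hostname starts with a digit: patch first or rename the host"
else
    echo "hostname does not start with a digit"
fi
```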

Hope It Helps!

Thanks
Prashant Dixit

Posted in troubleshooting, Uncategorized

OPatch – Error occurred during initialization of VM, Could not reserve enough space for XXXXXXKB object heap

Posted by FatDBA on February 19, 2019

Hi Guys,

Discussing a random issue that I encountered a few hours back: a problem with the new version of OPatch which, once unzipped, generated the weird error shown below.



[oracle@gunna:~/app/oracle/product/12.2.0/dbhome_1/OPatch/28822515]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./

Error occurred during initialization of VM
Could not reserve enough space for 39957221KB object heap


On the OCS download page for OPatch, the platform is automatically set to 32-bit (Linux x86).
Check whether the name of the downloaded file is something like ‘p6880880_122010_LINUX.zip’. If yes, then you have downloaded the 32-bit version. Choose ‘Linux x86-64’ as the right version and try again.
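The file name check is easy to script if you stage OPatch zips regularly. This is only a sketch based on the naming pattern seen here; opatch_zip_arch is a hypothetical helper, and the ‘_Linux-x86-64.zip’ suffix for the 64-bit download is my assumption about how that file is named.

```shell
#!/bin/sh
# opatch_zip_arch FILE: guess the platform of an OPatch zip from its name.
opatch_zip_arch() {
    case "$(basename "$1")" in
        *_Linux-x86-64.zip) echo "64-bit" ;;
        *_LINUX.zip)        echo "32-bit" ;;
        *)                  echo "unknown" ;;
    esac
}

# Compare the guess with the machine architecture reported by the OS.
echo "zip: $(opatch_zip_arch p6880880_122010_LINUX.zip), host: $(uname -m)"
```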

Let’s try again.



[oracle@gunna:~/app/oracle/product/12.2.0/dbhome_1/OPatch/28822515]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 12.2.0.1.16
Copyright (c) 2018, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /home/oracle/app/oracle/product/12.2.0/dbhome_1
Central Inventory : /home/oracle/app/oraInventory
   from           : /home/oracle/app/oracle/product/12.2.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.16
OUI version       : 12.2.0.1.4
Log file location : /home/oracle/app/oracle/product/12.2.0/dbhome_1/cfgtoollogs/opatch/opatch2018-12-24_00-46-02AM_1.log

Invoking prereq "checkconflictagainstohwithdetail"

Prereq "checkConflictAgainstOHWithDetail" passed.

OPatch succeeded.


All good now!

Hope It Helps
Prashant ‘Fatdba’ Dixit

Posted in troubleshooting

Postgres CREATE TABLESPACE failed with Permission denied

Posted by FatDBA on September 2, 2018

Hi Everyone,

This week I was busy preparing a Postgres database for migration and faced a few basic problems while doing some rudimentary operations. Just yesterday I was trying to create a tablespace and was getting ‘Permission denied’ for the defined path/location.
I tried to create this tablespace in different locations but it failed every time with the same permissions error, and interestingly the permissions were okay (directory owned by the postgres user with RWX permissions). This left me confused about where exactly the problem was.

This is what I was trying, and the error I was getting.

elephant=# CREATE TABLESPACE aleph location '/var/lib/pgsql/tbs';
ERROR:  could not set permissions on directory "/var/lib/pgsql/tbs": Permission denied
elephant=# CREATE TABLESPACE aleph location '/var/lib/pgsql';
ERROR:  could not set permissions on directory "/var/lib/pgsql": Permission denied
elephant=# CREATE TABLESPACE aleph location '/var';
ERROR:  could not set permissions on directory "/var": Permission denied

Then all of a sudden I remembered something similar I had encountered in the past while working on Oracle databases, where an SELinux policy prevented SQL*Plus from connecting to the Oracle database. So, I decided to turn off SELinux enforcement, and to do that I ran setenforce 0 (or you can use setenforce Permissive).
* The above commands switch off SELinux enforcement only temporarily, until the machine is rebooted. If you would like to make it permanent, edit /etc/sysconfig/selinux:
# vi /etc/sysconfig/selinux

And set / update it as follows:
SELINUX=disabled

[root@fatdba ~]# setenforce 0
[root@fatdba ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          enforcing
Policy version:                 26
Policy from config file:        targeted
[root@fatdba ~]#
[root@fatdba ~]#
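Note how the sestatus output above shows ‘permissive’ as the current mode while the config file still says ‘enforcing’, which is exactly why the change is lost on reboot. A small sketch that pulls just those two values out of sestatus output; selinux_modes is a made-up helper name and it expects the output format shown above.

```shell
#!/bin/sh
# selinux_modes: read sestatus output on stdin and print the running mode
# next to the mode that will apply after the next reboot.
selinux_modes() {
    awk -F': *' '
        /^Current mode/          { cur = $2 }
        /^Mode from config file/ { cfg = $2 }
        END { print "current=" cur " config=" cfg }
    '
}

# Usage:
#   sestatus | selinux_modes
```

If the two values differ, the running mode is only temporary.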

Now, after changing the SELinux setting, I tried the same step once again and BOOM, it worked!

elephant=# CREATE TABLESPACE aleph location '/var/lib/pgsql/tbs';
CREATE TABLESPACE
elephant=#
elephant=#

Hope It Helps
Prashant Dixit

Posted in Basics, troubleshooting

12c OEM Error: LongOpManager$ZombieDetection:1017

Posted by FatDBA on August 31, 2018

Hey Pals,

I am sure a few of us who have 12c EM configured on their systems must have received one of those annoying alerts/incidents where the EM Agent on the server repeatedly reports something about ‘Zombie’ processes.

I recently encountered one such issue on a production system where the 12c agent was frequently sending zombie-related alerts like the one below. So, this post is all about handling such issues and how to fix or avoid them.

Host=dixitlab.asi.dixson.corp 
Target type=Agent 
Target name=dixitlab.asi.dixson.corp:3873 
Message=Problem: java.lang.Throwable:oracle.sysman.gcagent.tmmain.execution.LongOpManager$ZombieDetection:1017 
Severity=Critical 
Problem creation time=May 5, 2018 10:22:48 AM ADST 
Last updated time=Aug 29, 2018 1:00:47 AM ADST 
Problem ID=113 
Problem key=java.lang.Throwable:oracle.sysman.gcagent.tmmain.execution.LongOpManager$ZombieDetection:1017 
Incident count=5 
Problem Status=New 
Priority=None 
Escalated=No 
Problem acknowledged By Owner=No 
Rule Name=OEM12c-Problems,Enterprise Problems 
Rule Owner=SYSMAN 

The next quick thing in such cases is always to check the agent (gcagent) logs to understand the exact error or issue with the EM or the agent.
And as expected the logs had something interesting in them: I saw DEBUG messages captured where zombie detection was initiated by the agent after a delay.
This happens when an EM Agent task, such as collecting metrics, runs longer than expected: the task is marked as a zombie, and this is one of the leading causes of agent crashes or halts.

X-AGENT_PERSISTENCE_WAIT_TIME: 60
X-AGENT_PERSISTENCE_ID: https://dixitlab1.asi.dixson.corp:1830/emd/main/
2018-08-29 17:10:26,050 [31:858161EB] DEBUG - Submitting task ZombieDetector for execution
2018-08-29 17:10:26,050 [216:1AE716D8] DEBUG - Begin task ZombieDetector on Thread: GC.SysExecutor.8
2018-08-29 17:10:26,050 [216:69BEAC9D:GC.SysExecutor.8 (ZombieDetector)] DEBUG - Scheduling next ZombieDetector.Task after delay 60000 including periodShift of 0 milliseconds

There are a few ways to avoid such issues:
1. Set _zombieSuspensions=TRUE
2. Set _canceledThreadWait=900
3. Set _zombieThreadPercentThreshold=0

The other way, which only avoids the alerts/incidents, is to set the ‘hidden’ parameter “_zombieCreateIncident” to FALSE in the agent configuration file, followed by an agent restart.

[oracle@dixitlab config]$
[oracle@dixitlab config]$ pwd
/u01/app/oracle/new_agent12c/agent_inst/sysman/config


[oracle@dixitlab config]$ more emd.properties|grep _zombieCreateIncident
_zombieCreateIncident=false
[oracle@dixitlab config]$
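If you need to push the same setting to several agents, the edit to emd.properties can be scripted. A minimal sketch, not an official procedure: set_property is a hypothetical helper that replaces an existing key or appends it, keeps a .bak copy of the file, and assumes the key contains no regular expression special characters.

```shell
#!/bin/sh
# set_property FILE KEY VALUE: set KEY=VALUE in a properties file, replacing
# an existing KEY line or appending a new one; a .bak copy of FILE is kept.
set_property() {
    file="$1"; key="$2"; value="$3"
    cp "$file" "$file.bak" || return 1
    if grep -q "^$key=" "$file"; then
        sed "s|^$key=.*|$key=$value|" "$file.bak" > "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# Usage, from the agent config directory shown above:
#   set_property emd.properties _zombieCreateIncident false
```

The agent still has to be restarted afterwards for the change to be picked up.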




[oracle@dixitlab bin]$
[oracle@dixitlab bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 12.1.0.5.0
OMS Version            : 12.1.0.5.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/oracle/new_agent12c/agent_inst
Agent Log Directory    : /u01/app/oracle/new_agent12c/agent_inst/sysman/log
Agent Binaries         : /u01/app/oracle/new_agent12c/core/12.1.0.5.0
Agent Process ID       : 2978
Parent Process ID      : 12860
Agent URL              : https://dixitlab.asi.dixson.corp:3873/emd/main/
Local Agent URL in NAT : https://dixitlab.asi.dixson.corp:3873/emd/main/
Repository URL         : https://dixitlab1.asi.dixson.corp:4900/empbs/upload
Started at             : 2018-08-29 01:01:08
Started by user        : oracle
Operating System       : Linux version 2.6.32-696.20.1.el6.x86_64 (amd64)
Last Reload            : (none)
Last successful upload                       : 2018-08-29 17:28:20
Last attempted upload                        : 2018-08-29 17:28:20
Total Megabytes of XML files uploaded so far : 1.03
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 52.74%
Collection Status                            : Collections enabled
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2018-08-29 17:29:16
Last successful heartbeat to OMS             : 2018-08-29 17:29:16
Next scheduled heartbeat to OMS              : 2018-08-29 17:30:16

---------------------------------------------------------------



[oracle@dixitlab bin]$ ./emctl stop agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
Stopping agent ...
 stopped.
[oracle@dixitlab bin]$
[oracle@dixitlab bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent is Not Running
[oracle@dixitlab bin]$
[oracle@dixitlab bin]$
[oracle@dixitlab bin]$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
Starting agent .............................. started.
[oracle@dixitlab bin]$
[oracle@dixitlab bin]$
[oracle@dixitlab bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 12.1.0.5.0
OMS Version            : 12.1.0.5.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/oracle/new_agent12c/agent_inst
Agent Log Directory    : /u01/app/oracle/new_agent12c/agent_inst/sysman/log
Agent Binaries         : /u01/app/oracle/new_agent12c/core/12.1.0.5.0
Agent Process ID       : 2664
Parent Process ID      : 2455
Agent URL              : https://dixitlab.asi.dixson.corp:3873/emd/main/
Local Agent URL in NAT : https://dixitlab.asi.dixson.corp:3873/emd/main/
Repository URL         : https://dixitlab1.asi.dixson.corp:4900/empbs/upload
Started at             : 2018-08-29 17:33:05
Started by user        : oracle
Operating System       : Linux version 2.6.32-696.20.1.el6.x86_64 (amd64)
Last Reload            : (none)
Last successful upload                       : 2018-08-29 17:34:34
Last attempted upload                        : 2018-08-29 17:34:34
Total Megabytes of XML files uploaded so far : 0.03
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 52.71%
Collection Status                            : Collections enabled
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2018-08-29 17:34:24
Last successful heartbeat to OMS             : 2018-08-29 17:34:24
Next scheduled heartbeat to OMS              : 2018-08-29 17:35:24

---------------------------------------------------------------
Agent is Running and Ready
[oracle@dixitlab bin]$


Hope It Helps
Prashant Dixit

Posted in troubleshooting

 