Tales From A Lazy Fat DBA

Den of an Oracle DB Performance & Troubleshooting freak … \,,/

  • Likes

    • 203,900
  • Archives

  • Categories

  • Subscribe

  • Prashant Dixit is the FatDBA

  • Follow Tales From A Lazy Fat DBA on WordPress.com
  • My Twitter Feeds

  • Disclaimer!

    FatDBA or Oracle ‘Ant’ is an independent web-blog/site.The experiences, Test cases, views, and opinions expressed in this website are my own and does not reflect the views or opinions of my employer.

    This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content.
    Product and company names mentioned in this website may be the trademarks of their respective owners.

Archive for the ‘Advanced’ Category

Advance/Troubleshooting/Error-Bug Fixing

How to enable RAT feature and some Troubleshooting

Posted by FatDBA on February 28, 2020

Hi Everyone,

This post of all about preparing your database before you start using RAT on it, as it’s little tricky. You have to enable the feature at the binary level before you start using the RAT replay plus you have to patch your database with few of the RAT related mandatory patches.

This post is all about enabling the feature at the binary level and one of the issue that I have faced immediately after that.

My database which is 11.2.0.4.0 (2 Node RAC) has the RAT feature set to FALSE, which means that the option is not yet enabled.


SQL> select * from v$option where parameter like 'Real Application Testing';

PARAMETER                                                        VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
Real Application Testing                                         FALSE
 

Alright, so next step is to enable the feature at the binary level, and to do that you have to run ‘make’ command and bind rat_on to your Oracle Home. Here I have a 2 Node RAC and I am starting with Node 1 and once done will repeat steps on Node 2.


[oracle@Testdb lib]$ make -f ins_rdbms.mk rat_on ioracle
/usr/bin/ar d /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/libknlopt.a kecnr.o
/usr/bin/ar cr /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/libknlopt.a /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/kecwr.o
chmod 755 /u01/app/oracle/product/11.2.0/dbhome_1/bin

 - Linking Oracle
rm -f /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/oracle
gcc  -o /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/oracle -m64 -z noexecstack -L/u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/ -L/u01/app/oracle/product/11.2.0/dbhome_1/lib/ 
-L/u01/app/oracle/product/11.2.0/dbhome_1/lib/stubs/   -Wl,-E /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/opimai.o /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/ssoraed.o /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/ttcsoi.o  -Wl,--whole-archive -lperfsrv11 -Wl,--no-whole-archive /u01/app/oracle/product/11.2.0/dbhome_1/lib/nautab.o /u01/app/oracle/product/11.2.0/dbhome_1/lib/naeet.o /u01/app/oracle/product/11.2.0/dbhome_1/lib/naect.o /u01/app/oracle/product/11.2.0/dbhome_1/lib/naedhs.o /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/config.o  -lserver11 -lodm11 -lcell11 -lnnet11 -lskgxp11 -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11  -lvsn11 -lcommon11 -lgeneric11 -lknlopt `if /usr/bin/ar tv /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap11" ; fi` -lslax11 -lpls11  -lrt -lplp11 -lserver11 -lclient11  -lvsn11 -lcommon11 -lgeneric11 `if [ -f /u01/app/oracle/product/11.2.0/dbhome_1/lib/libavserver11.a ] ; then echo "-lavserver11" ; else echo "-lavstub11"; fi` `if [ -f /u01/app/oracle/product/11.2.0/dbhome_1/lib/libavclient11.a ] ; then echo "-lavclient11" ; fi` -lknlopt -lslax11 -lpls11  -lrt -lplp11 -ljavavm11 -lserver11  -lwwg  `cat /u01/app/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /u01/app/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnnz11 -lzt11 -lmm -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lztkg11 `cat /u01/app/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /u01/app/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnnz11 -lzt11   -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `if /usr/bin/ar tv /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo11"; fi` -L/u01/app/oracle/product/11.2.0/dbhome_1/ctx/lib/ -lctxc11 -lctx11 -lzx11 -lgx11 -lctx11 -lzx11 -lgx11 -lordimt11 -lclsra11 -ldbcfg11 -lhasgen11 -lskgxn2 -lnnz11 -lzt11 -lxml11 -locr11 -locrb11 -locrutl11 -lhasgen11 -lskgxn2 -lnnz11 -lzt11 -lxml11  -loraz -llzopro -lorabz2 -lipp_z -lipp_bz2 -lippdcemerged -lippsemerged -lippdcmerged  -lippsmerged -lippcore  -lippcpemerged -lippcpmerged  -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lsnls11 -lunls11  -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lasmclnt11 -lcommon11 -lcore11 -laio    `cat /u01/app/oracle/product/11.2.0/dbhome_1/lib/sysliblist` -Wl,-rpath,/u01/app/oracle/product/11.2.0/dbhome_1/lib -lm    `cat /u01/app/oracle/product/11.2.0/dbhome_1/lib/sysliblist` -ldl -lm   -L/u01/app/oracle/product/11.2.0/dbhome_1/lib

test ! -f /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle ||\
           mv -f /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracleO
mv /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
[oracle@Testdb lib]$
 

Alright, so it’s done. No errors during the enablement process. Let’s connect with the database and see if the status of the feature.


[oracle@Testdb ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Mon Sep 16 22:14:59 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected.
SQL> alter system switch logfile;

ERROR at line 1:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
 

Oops, why is that. I have set the environmental variables and everything is in place then why it’s not allowing me to connect with the database and say not logged on. This is strange!

Let’s check if there is anything captured withing alert logs and other traces.


Mon Jan 16 22:19:48 2019
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/Testdb/Testdb1/trace/Testdb1_cjq0_176885.trc:
Mon Jan 16 22:19:49 2019
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Mon Jan 16 22:19:49 2019
Errors in file /u01/app/oracle/diag/rdbms/Testdb/Testdb1/trace/Testdb1_j000_1389.trc:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54322 (dba), current egid = 54321 (oinstall)
Mon Jan 16 22:19:49 2019
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
 

But it’s all good, no error except a general warning of “_disable_image_check” which happens as we have made a live change and it’s reporting that a mismatch is there at the binary level which were used to start the database instance. This can be easily suppressed by setting the parameter “_disable_image_check” to true.

Alright, so no idea or any hint from alerts and other logs.
Next I have checked the permissions of executable ‘oracle’ under ORACLE_HOME/bin folder.


[oracle@Testdb bin]$ ls -ll oracle
-rwsr-s--x 1 oracle oinstall 228395182 Sep 16 23:02 oracle
 


Why the group has changed from ‘dba’ to ‘oinstall’. This is strange!
Let me revert back the group name and re-assign it back to the older group.


[oracle@Testdb bin]$ chown oracle:dba oracle
 


Second I have noted the change in file permissions. It’s missing the ‘setuid’ access right flag from it, which is present in Node 2 where we didn’t made any change. I have revert back the permissions as well!


[oracle@Testdb bin]$ ls -ll oracle
-rwxr-x--x 1 oracle dba 228395182 Sep 16 23:02 oracle

[oracle@Testdb bin]$ chmod 6751 oracle

[oracle@Testdb bin]$ ls -ll oracle
-rwsr-s--x 1 oracle dba 228395182 Sep 16 23:02 oracle
 

And it worked, no errors this time and I was able to connect with the database using SQL Plus.


SQL> select * from v$option where parameter like 'Real Application Testing';

PARAMETER                                                        VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
Real Application Testing                                         TRUE

SQL> select name, open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
Testdb  READ WRITE

SQL> select name, open_mode from gv$database;

NAME      OPEN_MODE
--------- --------------------
Testdb  READ WRITE
Testdb  READ WRITE
 

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , | Leave a Comment »

RAT (Real Application Testing) Replay using Filters

Posted by FatDBA on February 18, 2020

Hi Everyone,

My fourth post in a row about one of Oracle’s most underutilized tool named Real Application Testing (RAT) which is no doubt an extremely cost-effective and easy-to-use proactive performance management solution that enables businesses to fully assess the outcome of a system changes in test or production.

Alright, today’s post is all about RAT Reply in circumstances when you want to filter something from your captured workload. I mean for example you have captured 1 hour of production workload which you want to play on your target database but not everything, I mean want to exclude few of the schemas.

Well, this can be easily done during the load capture time as well, but in case if you missed that then this can also be done during replay time. So, today’s post is all about that. Once again I would like to explain the method using step-wise approach.

Step 1:
Suppose you want to exclude user with name DIXIT
Create exclusion filter for username DIXIT.


BEGIN
DBMS_WORKLOAD_REPLAY.ADD_FILTER (fname => 'ignore_dixit30', fattribute => 'USER', fvalue => 'DIXIT');
END;
/
 

Step 2:
Next you need to create the filter set which will be used by the replay during execution. This is where you have to pass the replay directory details, Name of the filter set you want and the action to include or bind it with the replay.


exec DBMS_WORKLOAD_REPLAY.CREATE_FILTER_SET (replay_dir => 'RATOCT10', filter_set => 'MyReplayFilterdixit30', default_action => 'INCLUDE');
 

Step 3:
Next you need to initialize the replay.


execute DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY('REPLAY_30OCT19', 'RATOCT10');
 

Step 4:
The next step is to use the filter which we have created at step 2 above.
Note: The replay MUST be initialized first before calling USER_FILTER_SET


exec DBMS_WORKLOAD_REPLAY.USE_FILTER_SET (filter_set => 'MyReplayFilterdixit30');
 

Step 5:
Next you can verify if the filter set and filters are created or not. You can do that using below provided dynamic views relevant to REPLAY or RAT.


select CAPTURE_ID, SET_NAME, FILTER_NAME, VALUE from DBA_WORKLOAD_REPLAY_FILTER_SET;
select * from DBA_WORKLOAD_FILTERS where value='DIXIT';
 

Step 6:
Next proceed with the prepare and start the Replay

execute DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization => 'SCN')
 

Step 7:
Start recommended number of workload clients on host/hosts.
Note: It’s not necessary to use the DEBUG & workdir parameters as we only used them to generate extra diagnostic logs or traces that we will use in case if replay fails. If not needed only mode, replaydir parameters are required.


RAC Node 1:
nohup wrc system/xxxxxx mode=replay replaydir=/exp_pd/ratreplaydiroct/ratoctober10th_main workdir=/exp/ratreplaydiroct/traces_forreplay_date10012019 debug=ON > wrc301.log 2>&1&

nohup wrc system/xxxxxx mode=replay replaydir=/exp_pd/ratreplaydiroct/ratoctober10th_main workdir=/exp/ratreplaydiroct/traces_forreplay_date10012019 debug=ON > wrc302.log 2>&1&


RAC Node 2:
nohup wrc system/xxxxxx mode=replay replaydir=/exp_pd/ratreplaydiroct/ratoctober10th_main workdir=/exp/ratreplaydiroct/traces_forreplay_date10012019 debug=ON > wrc303.log 2>&1&

nohup wrc system/xxxxxx mode=replay replaydir=/exp_pd/ratreplaydiroct/ratoctober10th_main workdir=/exp/ratreplaydiroct/traces_forreplay_date10012019 debug=ON > wrc304.log 2>&1&
 

Step 8:
With all above steps running fine and no errors reported, we can start the replay next which will use the created exclusion filters and play rest of the load on the target database.


execute DBMS_WORKLOAD_REPLAY.START_REPLAY();
 

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: | Leave a Comment »

Oracle Real Application Testing (RAT) – Part 3 : What is Replay and how to do it ?

Posted by FatDBA on February 14, 2020

Hi Guys,

In the same continuation from where I left last time …
This is the Part 3 of the final part of this three post series on Oracle Real Application Testing or RAT. This part mainly deals with the replay side or the target end where you finally replays of triggers the captured workload.

Alright, so let’s get started!
I am starting immediately after step 6 (which we covered in Part 2).

Step 7:
Next move all files from capture directory to target system (12c POC VM box in this case). I have created the directory for REPLAY purposes there at the target to receive all these files and to play them later on.

Next transfer the files from source.

Match the count on target directory, should be same.

RAT Replay (General) Steps:

Step 1:
The first step in preparation is to process the CAPTURE in the replay directory.
Note: This will be a one-time activity and only do if you are doing it for the first time. If the load is already processed and you have replayed it earlier then there is not any use to follow this step.


BEGIN
DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(capture_dir=>'RAT_TRIALRUN');
END;
/
 

Step 2:
RESTORE DB: At This point we need to restore the database prior to the point when the Capture was started Note: No need to perform this step if this was already performed using Import or GRPs.

Step 3:
Next is to put the DB in readiness state. We have to initialize the replay.

execute DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY('REPLAY_TRIALRUN1', 'RAT_TRIALRUN'); 

Step 4:
Next we will PREPARE the replay. There are many arguments that you can pass with the ‘synchronization’ parameter & many other parameters to tailor made your replay. We have finalized and opted PREPARE_REPLAY step i.e. synchronization specially.


execute DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization => 'SCN'); 

Step 5:
Next to get the required numbers of workload clients to run and replay that load on target database.
For that we have to run wrc utility in calibrate mode.


wrc system/xxxx mode=calibrate replaydir=/nfs_exp/prodrattest/ratdir_02092019
 

Step 6:
Next is to fire the recommended numbers of wrc (workload) clients on the target database to replay the captured workload.
For example, if 8 is the recommended number of replay clients. Then we can
Ex:

nohup wrc system/oracle@ mode=replay replaydir=/home/replay/rat & 

On RAC Node 1:
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019

On RAC Node 2:
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
wrc system/XXXX mode=replay replaydir=/nfs_exp/prodrattest/ratdir_02092019
 

Step 7:
Next when all above steps are completed, next we will be replaying the workload.

execute DBMS_WORKLOAD_REPLAY.START_REPLAY(); 

Step 8:
Monitoring the Replay
There are many ways to perform the monitoring of the replay process, I will be sharing few scripts and discuss few of the methods that can be used to check/verify the replay progress.

Way 1:
Monitor the progress of replay using one of the dynamic view named DBA_WORKLOAD_REPLAYS


alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
SELECT ID, NAME,DBNAME,
PARALLEL,STATUS, to_char(START_TIME,'dd/mm/yyyy hh24:mi'),
to_char(END_TIME,'dd/mm/yyyy hh24:mi'),DURATION_SECS/60, NUM_CLIENTS, USER_CALLS, DBTIME, RAC_MODE
AWR_BEGIN_SNAP,AWR_END_SNAP,AWR_EXPORTED
FROM DBA_WORKLOAD_REPLAYS;
where id=2;
 


Way 2:
To check what’s all is happening in the database including the load pushed by WRC clients.
This is a join of gv$sqlarea, gv$session to get session and SQL details.


select x.inst_id,x.sid ,x.serial#,x.username,sqlarea.parsing_schema_name,x.sql_id
 ,sqlarea.plan_hash_value
 ,sqlarea.LAST_ACTIVE_TIME
 ,sqlarea.DISK_READS
 ,sqlarea.BUFFER_GETS
 ,sqlarea.ROWS_PROCESSED
 ,sqlarea.CPU_TIME
 ,sqlarea.LAST_LOAD_TIME
 ,x.event,x.osuser,x.status
 ,x.process,x.machine,x.program,x.module,x.action
 ,TO_CHAR(x.LOGON_TIME, 'MM-DD-YYYY HH24:MI:SS') logontime
 ,x.LAST_CALL_ET
 ,x.SECONDS_IN_WAIT,x.state,sql_text,
 ltrim(to_char(floor(x.LAST_CALL_ET/3600), '09')) || ':'
  || ltrim(to_char(floor(mod(x.LAST_CALL_ET, 3600)/60), '09')) || ':'
  || ltrim(to_char(mod(x.LAST_CALL_ET, 60), '09'))    RUNNING_SINCE
 from   gv$sqlarea sqlarea,gv$session x
 where  x.sql_hash_value = sqlarea.hash_value
 and    x.sql_address    = sqlarea.address
 and    sql_text not like '%select x.inst_id,x.sid ,x.serial# ,x.username ,x.sql_id ,x.event%'
 and    sql_text not like '%select x.inst_id,x.sid ,x.serial#,x.username,sqlarea.parsing_schema_name,x.sql_id ,sqlarea.plan_hash_value%'
 and    x.status='ACTIVE'
 and x.USERNAME is not null
 and x.SQL_ADDRESS    = sqlarea.ADDRESS
 and x.SQL_HASH_VALUE = sqlarea.HASH_VALUE
 order by RUNNING_SINCE desc;
 

Way 3:
Below is the method or a way to identify the replay progress in terms of percentage completed.
Code and text is pasted below.

replay_progress_percentage

Step 9:
Reporting
Next step is to generate some RAT specific reporting which will be helpful to replay performance and other statistics.


DECLARE
cap_id         NUMBER;
rep_id         NUMBER;
rep_rpt        CLOB;
rpt_len                 NUMBER;
BEGIN
rep_rpt :=DBMS_WORKLOAD_REPLAY.REPORT(replay_id => 1,format => DBMS_WORKLOAD_REPLAY.TYPE_HTML);
dbms_output.put_line(DBMS_LOB.SUBSTR(rep_rpt,32767,1));
END;
/
 

Below is the code to generate Capture Vs Replay reports.


var report_bind clob;
begin
DBMS_WORKLOAD_REPLAY.COMPARE_PERIOD_REPORT (replay_id1 => 1, replay_id2=> null, format => 'HTML', result => :report_bind);
end;
/
 

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: | Leave a Comment »

Oracle Real Application Testing (RAT) – Part 2: What is Capture & how to do it ?

Posted by FatDBA on February 10, 2020

Hi Folks,

Continuing the same subject/topic what I have started in my last post – Real Application Testing (RAT). This post is all about the ‘Capture‘ part what happens on the source database where we captures the workload which will later on replayed on the target database.

I have break it in to few easy steps to understand.

Step 1:
First we need to verify if the RAT option is installed and working fine. This you only need to verify upto 10Gr2 as all later versions comes with all features enabled by himself during the installation process, until you didn’t de-selected any specific feature during customized installation.
In case of 10g you need to check using v$option dynamic view for RAT parameter and there is a need to enable the parameter ‘pre_11g_enable_capture’.

Please see below the steps of verification!

Step 2:
Creating exclusion FILTERS (If Required) for capture.
This is the step where we create filters to exclude few of the system usernames and few of other schemas like SYSMAN, SYS and all of such schemas where you don’t want to capture load.
Please see the screenshot use to do the same.

Step 3:
Creation of RAT specific OS based directory.
This is the place where all CAPTURE files will be saved and should be created like below.

Step 4:
Next, we can now start the CAPTURE process, this should be done using the main RAT specific procedure DBMS_WORKLOAD_CAPTURE and its function START_CAPTURE. Here the main parameter to pass is name (Name of the capture you want to name), dir (directory which will hold all workload files, the same what I have created above).
There are few parameters which I have intentionally not used i.e. DURATION as there is a BUG in 10Gr2 database which causes the capture not to stop even after specific time and had to manually stop the process. So in below example I will be capturing XXXXX minutes of load from this database and will stop it explicitly.

This being a staging setup I am taking around 30 minutes of workload but in real time this could be anything between 10 or 15 minutes of peak hours.

You can monitor the progress using DBA_WORKLOAD_CAPTURES view. See below.

Next, you can get more details about this ongoing capture activity. See below.

Step 5:
Next when we are done with the capturing of load for the specified time, we can go and stop it now. In my case on RS staging I left that running and capturing workload for ~ 32 Minutes.

Next, you can verify the contents by going to the RAT directory. There you will see .rec (recording) files, .wmd and a special report (TEXT/HTML format) will generate and is specific to CAPTURE process only.

Now we have BEGIN SNAP Id and END SNAP (time duration for the capture runtime) we can generate the AWR report as well.
The same report can be fetched via text method as well.

Step 6:
Next we will export the AWR data. This will be later on used to generate the comparison report from REPLAY side. This will create two more files under the capture directory wcr_ca.log and wcr_ca.dmp

Next steps starts at the target end or the host and deserves a separate post.
I will soon be writing about Replay process in my next post. Till that time keep learning!

Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , | Leave a Comment »

Oracle Real Application Testing (RAT) – Part 1: What it is ?

Posted by FatDBA on January 31, 2020

Hi Guys,

As committed I am back with the first edition or the post on Oracle RAT (Real Application Testing) and there be couple more follow up chapters on the same in next few days or weeks.

Alright, recently during one of our mission-critical production database migration we reached a point where we had to perform the Load Test before pushing the real-time workload on to this new system. I was asked to prepare the strategy and to pick the best possible tool to access the performance of the performance of this new system and how it will respond to the current traffic.

Received lot’s of suggestions from rest of the team, i.e. Swingbench, Loadrunner, Orion etc. but most of them are with a predefined set of Supplied Benchmarks though few are customized but are more related to the server performance and bechmarking but not at the Database or SQL level. And considering the notorious behavior of many of the custom code and legacy application modules I was more leaned towards picking a tool which covers both Database and SQL, and we finally agreed on Oracle RAT.

Oracle Real Application Testing, an option that comes with Oracle Enterprise Edition. Oracle Real Application Testing helps you to test the real-life workload after changes on the database such as database upgrades, OS upgrades, parameter changes, hardware replacement, etc. So, in short the Oracle RAT will be system stress test tool to simulate production load. Introduced in Oracle 11g Release 1. But yes, it’s not free and comes with additional cost and licenses.

There are two features “Database Replay” and “SQL Performance Analyzer” will help fine-tuning on the database before passing production.
I will cover more about the ‘Database Replay’ feature here and might cover the ‘SQL Performance Analyzer’ feature later.

When can you use RAT – “Database Replay” feature?
System Changes
– Hardware replacement such as CPU, RAM, etc.
– Database and OS upgrades
– Storage changes (OCFS2 – ASM)
– OS changes (Windows – Linux)
Configuration Changes
– Single Instance – RAC– Patch installation– Database parameter change

Which database versions are supported?
The workload capture process is supported on the Oracle Database 10g R2 (10.2.0.4) and above versions. The worload replay process is supported on the Oracle Database 11g R1 and above versions.

How to do it, where to start and all ?
Well there are two different ways you can perform the RAT (DB Replay) testing
– Using Oracle Enterprise Manager (OEM) : This option is entirely GUI based where you select your source and target systems and by doing all those clicks performs this stress/load testing on the system.
– Using command line way (My preferred way of doing this, yes I am ‘old school’) using DBMS_WORKLOAD_CAPTURE & DBMS_WORKLOAD_REPLAY procedures.

Some High Level Steps:
– Capture workload into capture files (In the form of .rec files, are flat files)
– Copy files to test system and preprocess them (to make them machine understandable)
– Replay files on test system (play the recorded files)
– Perform detailed analysis of workload capture and replay using reports generated by Database Replay. (Reporting for bench markings)

ON SOURCE System:
dbms_workload_capture.start_capture 
dbms_workload_capture.finish_capture; 

Copy the workload files to the client system. For example: – /home/oracle/rat/test1

On TARGET System:
1. dbms_workload_replay.process_capture 
2. dbms_workload_replay.initialize_replay 
3. dbms_workload_replay.prepare_replay 
4. Run the workload client to calibrate the replay. The calibration process (mode=CALIBRATE) recommends the number of client processes required to perform the replay
5. Replay the workload using below command.
dbms_workload_replay.start_replay; 

Components: The ARCHITECTURE (Simplified)

DB REPLAY “The Big Picture”

What is a Workload Client ?
The REPLAY uses wrc clients – Which are multi-threaded JAVA clients and can be started on the same machine/host or on separate hosts.
Will cover about them more in depth in future posts.


[oracle@PDIXIT:RAT]$ wrc system/XXXX mode=calibrate replaydir=/DBCapture/RAT/RAT_13DEC15_19_17
 Workload Replay Client: Release 11.2.0.4.0 - Production on Sat Dec 16 05:50:39 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
 
 
Report for Workload in: /DBCapture/RAT/RAT_13DEC16_19_17
-----------------------
Recommendation:
Consider using at least 13 clients divided among 4 CPU(s)
You will need at least 168 MB of memory per client process.
If your machine(s) cannot match that number, consider using more clients.
 
Workload Characteristics:
- max concurrency: 575 sessions
- total number of sessions: 1729
 
Assumptions:
- 1 client process per 50 concurrent sessions
- 4 client process per CPU
- 256 KB of memory cache per concurrent session
- think time scale = 100
- connect time scale = 100
- synchronization = TRUE
 

Now how to compare/benchmark ?
At the end of both CAPTURE & REPLAY methods you need to generate few process specific report.
Few of the important files that help in benchmarking are:
AWR Reports: Generate the AWR reports for the same time interval when we have any of the two process were in progress. The BEGIN AND END Snaps can be collected from DBA_WORKLOAD_CAPTURES & DBA_WORKLOAD_REPLAYS
CAPTURE/REPLAY Reports: These reports are specific to workload capture and playing on target.
Capture Vs Replay reports.

Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

Disk Goes Offline after rebalance! – Is this due to a BUG ?

Posted by FatDBA on January 30, 2020

HI Everyone,

Today during one of the activity where we migrated the ASM Storage for one of our 2 Node RAC cluster (running on 11gR2), where we had to perform the disk rebalancing to copy/mirror the contents from older/existing storage to the new storage before we go and drop the older storage partitions, we faced some some weirdness. The disks goes offline in this multi-node ASM and we left stranded with initially no idea behind this behavior, but finally we were able to locate a metalink page for the same issue.

Yes, this was due to a known Bug with number 13476583
Oracle Server (Rdbms) Version
This problem is introduced in the
11.2.0.2.3 Patch Set Update
11.2.0.2.5 Patch Set Update
11.2.0.2.4 Patch Set Update
11.2.0.2.3 Patch Set Update
and in 11.2.0.3, by the fix for bug 10040921.

Problem:
When disks are dropped, a forcible diskgroup dismount is performed on other ASM instance/s.

Workaround or Fix:
1. The problem does not cause diskgroup corruption. So mostly diskgroup can be mounted again.
2. Apply fix
Interim patches here: Patch:13476583
11.2.0.2.6 Patch Set Update
11.2.0.2 Patch 17 on Windows Platforms

Oracle Notes: 245840.1

Hope That Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

CLSRSC-188: Failed to create keys in Oracle Local Registry

Posted by FatDBA on January 3, 2020

Hi Everyone,

Happy New Year!

So here goes my first post for Year 2020. This time I will be discussing an error that we encountered some time back while executing the important ‘root.sh’ script for a new 12cR2 Oracle Restart setup on RHEL7. The script was going smooth till the point where it tries to add keys in OLR for HASD and died with error “CLSRSC-188: Failed to create keys in Oracle Local Registry”.

Below is the exact error what we get during the root.sh run.
Here you will that it was throwing an error which says “Site name (1819181-monkeydb) is invalid.clscfg”.


[root@1819181-monkeydb gridhome]# ./root.sh

Performing root user operation.
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.2.0.1/gridhome/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/12.2.0.1/crsdata/1819181-monkeydb/crsconfig/roothas_2019-12-31_11-03-49AM.log
Site name (1819181-monkeydb) is invalid.clscfg -localadd -z  [-avlookup]
                 -p property1:value1,property2:value2...

  -avlookup       - Specify if the operation is during clusterware upgrade
  -z   - Specify the site GUID for this node
  -p propertylist - list of cluster properties and its value pairs

 Adds keys in OLR for the HASD.
WARNING: Using this tool may corrupt your cluster configuration. Do not
         use unless you positively know what you are doing.

Failed to create keys in the OLR, rc = 100, Message:

2019/12/31 11:03:56 CLSRSC-188: Failed to create keys in Oracle Local Registry
Died at /u01/app/12.2.0.1/gridhome/crs/install/oraolr.pm line 552.
The command '/u01/app/12.2.0.1/gridhome/perl/bin/perl -I/u01/app/12.2.0.1/gridhome/perl/lib -I/u01/app/12.2.0.1/gridhome/crs/install /u01/app/12.2.0.1/gridhome/crs/install/roothas.pl ' execution failed
 

It all happened because our hostname started with a number (1819181-monkeydb) and it’s a known bug that makes the hostname as invalid for root.sh and therefore the above error comes up.
There is a another condition as well, suppose your hostname starts with a alphabet (AHOST-TEXTIBOX-09) but as there is a limit of 15 characters which oracle considers for the hostname, and here in our example the 15th character is a hyphen (-).
So, even in such a case the root.sh will fail even when the hostname starts with a non-numeric character but it’s 15th character is a special character.

Now let’s discuss the solutions.
First, you can apply a merge patch 26751067 (which is merge of Bugs: Bug 25499276 Bug 26581118) and re-run the root.sh script.
Second, change the hostname right after the failure and re-run the script, this time it will go through with no error. Below is an example.

Let’s first change the hostname quickly before we and re-run root.sh


[root@1819181-monkeydb gridhome]# cat /etc/hostname
1819181-monkeydb
[root@1819181-monkeydb gridhome]# echo A1819181-monkeydb > /etc/hostname
[root@1819181-monkeydb gridhome]# cat /etc/hostname
A1819181-monkeydb
 

To update your command prompt simply re-login and to apply this change system wide execute below.


[root@1819181-monkeydb gridhome]# systemctl restart systemd-hostnamed
[root@A1819181-monkeydb gridhome]# 

[root@1819181-monkeydb gridhome]# ./root.sh
Performing root user operation.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.2.0.1/gridhome/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/12.2.0.1/crsdata/a1819181-monkeydb/crsconfig/roothas_2019-12-31_11-17-33AM.log
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
PROT-53: The file name [/u01/app/12.2.0.1/gridhome/cdata/localhost/local.ocr] specified for the 'ocrconfig -repair', 'ocrconfig -add' or 'ocrconfig -replace' command designates an invalid storage type for the Oracle Cluster Registry.
2019/12/31 11:17:43 CLSRSC-155: Replace of older local-only OCR failed
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node a1819181-monkeydb successfully pinned.
2019/12/31 11:17:47 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'a1819181-monkeydb'
CRS-2673: Attempting to stop 'ora.evmd' on 'a1819181-monkeydb'
CRS-2677: Stop of 'ora.evmd' on 'a1819181-monkeydb' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'a1819181-monkeydb' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.

a1819181-monkeydb     2019/12/31 11:18:41     /u01/app/12.2.0.1/gridhome/cdata/a1819181-monkeydb/backup_20191231_111841.olr     0
2019/12/31 11:18:42 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
 


Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: | Leave a Comment »

FGA Error ORA-28138: Error in Policy Predicate

Posted by FatDBA on December 26, 2019

Hi Folks,

Today’s I am going to discuss one of the eerie issue that we faced recently while doing a Database Switch-over activity (From 10gR2 to 12cR2) where application team changed their application string or connection ways and started pointing to this new 12c database.
Before I proceed, let me give you a quick background about this activity, this was a test (Staging) database which was migrated on a new infrastructure and with version 12c, we’ve used data pump to move data from source to this new target and everything went well during all those steps.

Everything was successfully moved till the time the first test customer login to the application and reported that he failed to connect using his credentials. One error message that was captured in application server logs (this was a three tiered platform) which reads

"java.sql.SQLException: ORA-28138: Error in Policy Predicate". 

This error prevented all of the users to connect with the application after this switch-over. Well, apart from regular login procedures, rest all of was working fine.
The error immediately gave us an idea that the error was pointing to the FGA that we have tested on few of the tables some time back, including one of the base table which is used to insert login details before it authenticates access. So, we verified the FGA settings that migrated to this new database and found they are configured with some strange and complex AUDIT conditions
using a custom function where someone tried to define a subquery in the audit_condition, and didn’t tested the result.


i.e. sys.check_audited_user > 0  & sys.check_audited_user = 'XYS'. 

This being an invalid policy preicate and ultimately all operations got failed on said table which in turn stopping users to login.
So, this all happened due to complex precidates used in audit policies, this should be avoided. I mean it will allow you to create the policy but will fail with such errors related with FGA predicates. You cannot define a subquery in the audit_condition; it must be a simple predicate

So, now we have two solutions to avoid this situation.
One, you can simply go and drop the policy created on the said object to resume operations.
Else you can write a function that will evaluate the complex criteria and return a value that can be used in a simple predicate.


Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: | Leave a Comment »

RAT Reporting Error: ORA-06502: numeric or value error: character string buffer too small

Posted by FatDBA on December 16, 2019

Hi All,

Today’s topic of discussion is to handle/fix one of the issues that I’d faced while generating RAT (real application testing) reports on 10gR2 database. I know many of us are not yet aware about the tool, it’s purpose and functionality. Very soon I will be writing about this great product from Oracle for database load testing using real/genuine workload and is quite helpful to forecast your DB performance before you migrate.

Alright, coming back to the point – I was trying to generate the RAT Capture report (on target of course) to see what all was there in the capture, its observations, highlights and rest and that’s when we’ve encountered an error (pasted below)



DECLARE
l_report CLOB;
BEGIN
l_report := DBMS_WORKLOAD_CAPTURE.report(capture_id => 81,
format => DBMS_WORKLOAD_CAPTURE.TYPE_HTML);
END;
/ 2 3 4 5 6 7
DECLARE
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at "SYS.DBMS_SWRF_REPORT_INTERNAL", line 7446
ORA-06512: at "SYS.DBMS_SWRF_REPORT_INTERNAL", line 8591
ORA-06512: at "SYS.DBMS_SWRF_REPORT_INTERNAL", line 8521
ORA-06512: at "SYS.DBMS_WORKLOAD_CAPTURE", line 486
ORA-06512: at "SYS.DBMS_WORKLOAD_CAPTURE", line 1214
ORA-06512: at line 4


There are two solutions to this problem:

1. First to drop the common (shared by capture and replay) schemas and their infrastructure tables using below two scripts.
That firstscript below drop schema tables shared by capture and replay and second drops the Capture infrastructure tables.
catwrr.sql – Catalog script for Workload Capture and Replay — this script then rebuilds all the capture and replay related tables.


@@?/rdbms/admin/catnowrr.sql
@@?/rdbms/admin/catwrr.sql
exec prvt_report_registry.register_clients(TRUE); --- This one registers clients 

Note: In that case you might loss all of your previous capture ID details from the system as it simply washes or wipes everything there related with RAT tables. Hence this is kind of a crude and a raw method to fix this issue. And I recommend to always connect with Oracle Support before going to run these scripts on your database!

2. I tried of another approach to avoid this error and generate the RAT capture report from the target instead of Source where we were getting the error.
Is that possible ?? — Yes, you can. After further analysis I found the issue is with the 10gR2 capture reporting code which sometimes throws this error.

So, the second way turned ut to be a better approach here as we have all of our previous stats and data untouched and nothing has been wiped out in this case, as we simply ran the reporting procedure from the target (12c R2 in our case) and that’s how avoided the issue.


Hope It Helps
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: | Leave a Comment »

root.sh failed on RHEL >7.3 — CLSRSC-400: A system reboot is required to continue installing,

Posted by FatDBA on November 14, 2019

Hi Everyone,

Was little occupied in few of the database migrations happened here at my end, so wasn’t able to post on regular basis. But the good thing is that I have a good list of issues that we faced during the course of this end of end migration and starting from today will try to share all them.

Alright, the one I am going to discuss next is the issue that we encountered while running root.sh script on this ‘Oracle Restart’ setup where the root.sh script failed with below set of errors



Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.2.0.1/grid_home/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/12.2.0.1/crsdata/testserver-monkey/crsconfig/roothas_2019-11-12_10-56-56PM.log
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node testserver-monkey successfully pinned.
2019/11/12 22:57:02 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
 2019/11/12 22:59:06 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/12.2.0.1/grid_home/perl/bin/perl -I/u01/app/12.2.0.1/grid_home/perl/lib -I/u01/app/12.2.0.1/grid_home/crs/install /u01/app/12.2.0.1/grid_home/crs/install/roothas.pl ' execution failed 


To further understand what caused that failure we have checked the log file the above error pointing.
Below are few of the core/main lines when it got failed. So it shows that it failed to load the ADVM/ACFS drivers on the system while running the root.sh script.



>  ACFS-9504: Copying file '/u01/app/12.2.0.1/grid_home/lib/libacfs12.so' to the path '/opt/oracle/extapi/64/acfs/orcl/1/'
>  ACFS-9308: Loading installed ADVM/ACFS drivers.
>  ACFS-9321: Creating udev for ADVM/ACFS.
>  ACFS-9323: Creating module dependencies - this may take some time.
>  ACFS-9176: Entering 'ld usm drvs'
>  ACFS-9154: Loading 'oracleoks.ko' driver.
>  modprobe: FATAL: Module oracleoks not found.
>  ACFS-9109: oracleoks.ko driver failed to load.
>  ACFS-9178: Return code = USM_FAIL
>  ACFS-9177: Return from 'ld usm drvs'
 >  ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended.
>  ACFS-9310: ADVM/ACFS installation failed.


The solution to this problem is to apply the one off patch (25078431) to fix this issue with ACFS/ADVM drivers in RHEL > 7.3. Yes, there is a metalink not available for the same file too.
But in our setup even the patch failed to fix the issue as the the .gridSetup -applyoneoffs comes out within 1-2 seconds we ran this command, I mean in short it did nothing and pretends that it applied the patch but the ‘opatch lspatches‘ not showing anything.

Well, we raised this issue with Oracle and they passed it to their development team as there were lot’s of other things running on this DB.
And as you know their DEV team, they don’t have any fixed SLA. Well there is a reason too for them doing like that, as development team does lot’s of testing and regressions hence that is something acceptable.

Well, this problem we anyhow to fix as we had an important test that we planned to perform on this system.
So, comes the time to apply the temporary fix, of course a crude/raw one 🙂

Now, as on this system we don’t need the ACFS, so we can disable the feature right at the code/binary level.
Below are the two main files that when renamed disabled this feature and you are all good to bypass this root.sh check.



acfsdriverstate
acfsroot



You simply have to rename them and re-run the root.sh script it will pass this time and you are done with your GI installation.


Hope It Helps
Prashant Dixit

Posted in Advanced | Tagged: , | Leave a Comment »

 
%d bloggers like this: