Tales From A Lazy Fat DBA

It's all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Writing custom messages to Alert Log and remembering DBMS_SYSTEM and that sweet little ksdwrt

Posted by FatDBA on August 1, 2022

Hi Guys,

Today's is a quick one, about an old package (I guess it has been there since the 8i days) that remains a lesser-known and underutilized feature.
Recently I was working on an OEM task where I had to edit the metric collection regular expressions to match a requirement. The next question was: 'Let's wait until that specific incident happens in the database to see if it triggers the alert or not ….' Really? Do we have to wait that long?

No, in order to test that change, you don't have to adopt any crude method like manually editing the alert log file and writing the error message there, or using UTL_FILE. There is a built-in package called dbms_system that you can use to handle such cases and write your custom messages to the alert log file. There are a few other routines available in the package, but this post is about a special subprogram called 'ksdwrt'.

dbms_system.ksdwrt(dest IN BINARY_INTEGER,tst IN VARCHAR2);

Here is the syntax:
execute sys.dbms_system.ksdwrt(<dest>, to_char(sysdate) || ' -- ' || '<your message>');
where the dest argument value can be
1 to write to the trace file
2 to write to the alert log file
3 to write to both the trace file and the alert log file.

Example:
exec dbms_system.ksdwrt(3,'ORA-04031: This is a test error message, please ignore');
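
To verify the message actually landed, you can pull the alert log location from V$DIAG_INFO and inspect the tail of the file. A small sketch (the directory and file name shown are from my sandbox and will differ on your system):

SQL> select value from v$diag_info where name = 'Diag Trace';
SQL> host tail -3 /u01/app/oracle/diag/rdbms/localdb/localdb/trace/alert_localdb.log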

Hope It Helped!
Prashant Dixit

Posted in Basics

What is a _FIX_CONTROL & DBMS_OPTIM_BUNDLE in Oracle ?

Posted by FatDBA on July 17, 2022

Lately I was in discussion with one of my friends who was facing an issue with an Oracle 19c database. The vendor asked him to apply a patch to fix the problem, but he did not want to apply that single patch because their Oracle homes were shared and he didn't want to increase the complexity of their patching cycles. Oracle later suggested a workaround that requires a setting via fix controls.

Oracle often recommends setting a fix control as part of a bug fix, but what exactly are they? What is their purpose? And what tools and methods are there to control these bug fixes? This post explains all of that in detail.

So, what are they? – Fix controls are bug fix control parameters, introduced in 10.2, that are typically used to enable or disable specific bug fixes in the Oracle database. You cannot pull back just any patch this way: the fix you are toggling must expose a _FIX_CONTROL option and must be visible in the V$SYSTEM_FIX_CONTROL view.

Let's understand this using a case where materialized view predicate pushing was not happening due to a wrong cardinality estimate on a production system running 12.1.0.2. The optimizer was rejecting the join predicate pushdown (JPPD) transformation, which prevented the view from being joined using an index-based nested-loop join and caused issues. This was all due to bug 21802552. Let's check whether the bug number is present in the fix control view and what its status is.

SQL> select bugno, value, description from v$system_fix_control where bugno=21802552;

     BUGNO      VALUE DESCRIPTION
---------- ---------- ----------------------------------------------------------------------
  21802552          1 correct cardinality adjusted by DS

-- You can get similar information using DBMS_SQLDIAG.GET_FIX_CONTROL(BUG NUMBER) proc as well.

So, it's there in the view's output and it's enabled (value 1), and we can turn it off. Let's do it. The proper syntax for using fix controls is given below.

-- To enable:
"_fix_control"='Bugno:ON'    (OR)   "_fix_control"="Bugno:1"

-- To disable:
"_fix_control"='Bugno:OFF'  (OR)   "_fix_control"="Bugno:0"


SQL> ALTER SYSTEM SET "_fix_control" = '21802552:OFF';

System altered.

SQL>

SQL> select bugno, value, description from v$system_fix_control where bugno=21802552;

     BUGNO      VALUE DESCRIPTION
---------- ---------- ----------------------------------------------------------------------
  21802552          0 correct cardinality adjusted by DS



-- same was recorded in alert log file as well

2022-07-16T09:04:02.371313-04:00
ALTER SYSTEM SET _fix_control='21802552:OFF' SCOPE=BOTH;
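
Fix controls can also be set at session level, which is handy for verifying the effect on a single query before touching the whole system. A minimal sketch:

SQL> ALTER SESSION SET "_fix_control" = '21802552:OFF';

Session altered.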

You can do the same using the newer dbms_optim_bundle.set_fix_controls procedure, introduced in 12.1.0.2 to implement Oracle's 'Automatic Fix Control Persistence' framework. Let's try the same using the said package.

-- This will set given _fix_controls in scope=BOTH on all instances
-- Lets enable it again before we disable it back again
SQL> exec dbms_optim_bundle.set_fix_controls('21802552:1','*', 'BOTH', 'NO');

PL/SQL procedure successfully completed.

SQL> select bugno, value, description from v$system_fix_control where bugno=21802552;

     BUGNO      VALUE DESCRIPTION
---------- ---------- ----------------------------------------------------------------------
  21802552          1 correct cardinality adjusted by DS


-- Lets roll it back
SQL> exec dbms_optim_bundle.set_fix_controls('21802552:0','*', 'BOTH', 'NO');

PL/SQL procedure successfully completed.

SQL> select bugno, value, description from v$system_fix_control where bugno=21802552;

     BUGNO      VALUE DESCRIPTION
---------- ---------- ----------------------------------------------------------------------
  21802552          0 correct cardinality adjusted by DS

--
-- Entry in parameter file made by the dbms_optim_bundle package for fix control
*._fix_control='21802552:0'#added through dbms_optim_bundle package
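
The package can also report on the bug fixes it manages. On versions where the routine is available (availability varies by release and patch level, so treat this as a sketch), you can print them with:

-- Lists fixes managed through dbms_optim_bundle, printed via dbms_output
SQL> set serveroutput on
SQL> exec dbms_optim_bundle.getBugsforBundle;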


Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting

How to trace DBMS_STATS and see what is under the hood for a slow sluggish stats collection ..

Posted by FatDBA on July 10, 2022

Recently, while working on a slow stats gathering case, someone asked if there is a way to know what's happening beneath the surface – which flags are set and which calculations are done internally while DBMS_STATS is still running on the front end? Yes, there is a way! As with most Oracle utilities, DBMS_STATS comes with its own tracing facility, which you can enable using dbms_stats.set_global_prefs. Today's post is all about enabling tracing on the DBMS_STATS package.

A few of the commonly used DBMS_STATS trace flags are given below; you can always combine them by adding the values, as the example further down does.

4 = This is to trace table stats
8 = This is to trace index stats
16 = This is to trace column stats
512 = This is to trace the auto stats job
1024 = This is to trace parallel executions
4096 = This is to trace partition prunes
16384 = This is to trace extended column stats
32768 = This is to trace approximate NDV (number of distinct values) gathering

Let's quickly enable tracing for both index and table stats, followed by stats collection on a table and its dependent indexes.

SQL>
SQL> exec dbms_stats.set_global_prefs('TRACE',4+8);

PL/SQL procedure successfully completed.

SQL>

SQL> set time on
13:28:55 SQL>
13:28:56 SQL>
13:28:56 SQL>
13:28:56 SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYS' , tabname => 'BIGTAB',cascade => true, estimate_percent => 10,method_opt=>'for all indexed columns size 1', granularity => 'ALL', degree => 4);

PL/SQL procedure successfully completed.

13:28:59 SQL>
13:29:00 SQL>


SQL> select name, value from v$diag_info where name='Diag Trace';

NAME                                     VALUE
---------------------------------------- ------------------------------------------------------------------------------------
Diag Trace                               /u01/app/oracle/diag/rdbms/localdb/localdb/trace
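
If you want to jump straight to the trace file written by your own session instead of scanning the trace directory, V$DIAG_INFO can return it directly. A small sketch (the file name format varies by version and platform):

SQL> select value from v$diag_info where name = 'Default Trace File';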

Alright, we have our trace ready. Let's dig deep into what's inside the trace we have collected for both the table and its index.
It starts with a standard header, followed by all defaulted and explicitly specified DBMS_STATS parameters in XML format.

-- Header
DBMS_STATS: Record gather table stats operation on table : BIGTAB
DBMS_STATS:   job name: 
DBMS_STATS:    |--> Operation id: 1341
DBMS_STATS: gather_table_stats: <params><param name="block_sample" val="FALSE"/><param name="cascade" val="TRUE"/>
<param name="concurrent" val="FALSE"/><param name="degree" val="4"/><param name="estimate_percent" val="10"/>
<param name="force" val="FALSE"/><param name="granularity" val="ALL"/><param name="method_opt" val="for all indexed columns size 1"/><param name="no_invalidate" val="NULL"/>
<param name="ownname" val="SYS"/><param name="partname" val=""/><param name="reporting_mode" val="FALSE"/><param name="statid" val=""/><param name="statown" val=""/>
<param name="stattab" val=""/><param name="stattype" val="DATA"/><param name="tabname" val="BIGTAB"/></params>
DBMS_STATS: Start gather table stats -- tabname: BIGTAB

Next come the preferences set for the table in question, 'BIGTAB' in our case. This gives you a better idea of which options were set overall and whether there is any scope to tune or tweak any of them.

DBMS_STATS: Preferences for table SYS.BIGTAB
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         - 
DBMS_STATS: STATS_RETENTION                                   - 
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             - 
DBMS_STATS: SNAPSHOT_UPD_TIME                                 - 
DBMS_STATS: TRACE                                             - 12
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 1
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: STATS_MODEL_INTERNAL_MINRSQ                       - 0.9
DBMS_STATS: STATS_MODEL_INTERNAL_CONTROL                      - 0
DBMS_STATS: STATS_MODEL                                       - ON
DBMS_STATS: AUTO_STATS_ADVISOR_TASK                           - TRUE

The next sections are where it monitors the stats gathering for the table and its index; their start and end times are captured. For example, "APPROX_NDV_ALGORITHM => Non-Incremental" was chosen, and table-level stats were captured, i.e. row count, blocks, average row length, sample size, number of indexes, etc.

Next it jumps to the index, where it would calculate global statistics for a partitioned index by aggregating the partition-level statistics; as this is a non-partitioned index, it simply moves to the next stage. It then enables the index options to collect, i.e. nlb (number of leaf blocks), ndk (number of distinct keys), clf (clustering factor), nblks (number of blocks), sample percentage, degree (parallelism), nrw (key counts), etc., by assigning them the value 1, which means they will be considered.
Finally it starts to analyze the index structure (in this case it is not using the approximate NDV algorithm) and calculates the final values of nrw, nlb, ndk, clf, etc.

A few internal options were also recorded, i.e. ctx.conc_ctx, ctx.batching_coeff, etc.

-- Table Stats
DBMS_STATS: Started table SYS.BIGTAB. at 28-JUN-22 01.28.58.499744000 PM -04:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting
DBMS_STATS: reporting_man_log_task: target: "SYS"."BIGTAB" objn: 108254 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: ALL global_requested: NULL pfix: 
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: HLL in incremental (no stats exist)
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: ALL execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: SYS tabname: BIGTAB pname:  spname:  execution phase: 1
DBMS_STATS: Number of rows in the table = 492190, blocks = , average row length = 19, chain count = , scan rate = 0, sample size = 49219, cstats.count = 3, cind = 2

-- Index part 
DBMS_STATS: Started index SYS.IDX_DIXIT at 28-JUN-22 01.28.58.816819000 PM -04:00 granularity: ALL gIdxGran: 
DBMS_STATS: Specified granularity = , New granularity = ALL, Fixed granularity = ALL
DBMS_STATS: granularity ALL
DBMS_STATS: reporting_man_log_task: target: "SYS"."IDX_DIXIT" objn: 108255 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS:  Gather index subpartition stats...
DBMS_STATS:  Gather index partition stats...
DBMS_STATS:  Gather global index stats...
DBMS_STATS: Start analyze_index_using_sql
DBMS_STATS: tab_stats_stale: BIGTAB not analyzed using full compute
DBMS_STATS:   Number of blocks in the index: 1030
DBMS_STATS: Start execute_analyze_index(owner= SYS, indname= IDX_DIXIT, fobjn= , sample_pct= 10, seed= 0, degree= 4, collect_nrw= 1, collect_nlb= 1, collect_ndk= 1, collect_clf= 1, bmi= 0, iot_ov= 0, iot_sec= 0, ppredtxt= , nblks= 1030)
DBMS_STATS: execute_analyze_index(): Not using approximate_ndv, pct=111.529126213592233009708737864077669903,sample_pct=10
DBMS_STATS: End execute_analyze_index(ssize= 494100, nrw= 494100, nlb= 1027, ndk= 1000, clf= 494100)
DBMS_STATS: target_size: 1027
DBMS_STATS: Finished index SYS.IDX_DIXIT at 28-JUN-22 01.28.59.464468000 PM -04:00

DBMS_STATS: Finished table SYS.BIGTAB. at 28-JUN-22 01.28.59.519495000 PM -04:00

Once you are done with the stats tracing, switch it off immediately!

SQL> exec dbms_stats.set_global_prefs('TRACE',0);

PL/SQL procedure successfully completed.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized

High stats collection time for partitioned tables after upgrade to 19c

Posted by FatDBA on July 2, 2022

Recently, while working on a database upgrade from 12c to 19c (19.15), one of my friends encountered a strange issue on the newly upgraded 19c database: stats gathering on the full database started taking a huge amount of time. Full database statistics used to take ~3 hours to complete, but the same stats collection job started taking close to 10 hours after the upgrade to 19c. The stats collection script they were using was quite simple, with minimal parameters.

EXEC DBMS_STATS.GATHER_DATABASE_STATS(ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE,degree => 8);

During the analysis he observed that the top 3-4 in-flight statements during stats collection were related to index statistics and were doing 'Index Fast Full Scan', all of them against a few of the large partitioned tables in the database. He discussed the case with me, and together we tried a few things, i.e. recollected dictionary and fixed object statistics and did a comparative study of parameters between 12c and 19c, but none of them worked. At last we set debugging levels on DBMS_STATS to see what was happening under the hood, and that gave us a hint when set with level/flag 8 (trace index stats) together with level 32768 (trace approximate NDV gathering). The traces made it clear that the time was surely going into index stats and the NDV/number-of-distinct-keys computation.

But even after that we were both totally clueless, as these tables and their dependent objects had been in the system for a very long time. So the big question was – what's new in 19c that has slowed down stats collection?

Finally we decided to contact OCS! And they quickly responded, as it's a known problem with 19c. As per them, there was an enhancement in 19c related to index stats gathering, and that has led to longer stats times. It was all due to unpublished bug 33427856, which is an enhancement to improve the calculation of index NDK (Number of Distinct Keys). The new behavior uses the approx_count_distinct function and fully scans indexes to calculate NDK. This has a significant benefit because NDK is now accurate, but it also means that gathering statistics can take longer (for example, when updating global index statistics if incremental stats are used). So, in general, this is expected behavior, since DBMS_STATS is doing more work in 19c than it did in previous, unenhanced versions.

And the solution is to switch this new 19c index-stats feature (the problem, in our case) off by setting a fix control to disable 'Enhance Index NDK Statistics' – fix 27268249:

alter system set "_fix_control"='27268249:0';
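
To confirm the change has taken effect, you can query the fix control view again; after the change the VALUE column for this fix should show 0:

SQL> select bugno, value, description from v$system_fix_control where bugno=27268249;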

And as soon as we deleted the existing statistics and regathered them, the time dropped drastically and the job completed in under 3 hours.

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting

Exceptionally high stats collection time on FIXED OBJECTS during an upgrade …

Posted by FatDBA on June 26, 2022

Someone recently asked about a situation where they were trying to upgrade their database to 19c, and as part of their upgrade plan they were gathering fixed object statistics, but it was running forever, and they were totally clueless as to why and where it was taking time. This being a mandatory step, they tried several times, with the same result.

About fixed object stats: it is recommended that you re-gather them when you do a major database or application upgrade, implement a new module, or make changes to the database configuration. For example, if you increase the SGA size, then all of the X$ tables that contain information about the buffer cache and shared pool may change significantly, such as the X$ tables behind v$buffer_pool or v$shared_pool_advice.

As for fixed object stats collection time, anything between 1 and 10 minutes is, I would say, normal and average, but anything beyond 20 minutes – let alone hours – is abnormally high and points to a problem.

So, I was asked to take a look on an ad-hoc basis, and during the analysis I found a SQL statement doing a count of all rows in unified_audit_trail, running since the moment they had called DBMS_STATS for FIXED OBJECTS on the database. When asked, they told me that they had enabled auditing on the database some 6 months back and hadn't purged anything since; the audit trail had grown enormous and had ~880 million records. I immediately offered them two approaches to handle the situation: either lock the audit table statistics (using dbms_stats.lock_table_stats) or take a backup of the table and purge the audit records before calling the stats gathering job again. They agreed with the second approach: they took a backup of the audit table and purged the audit trail. As soon as they purged the audit table, the stats collection on fixed objects finished in ~3 minutes.
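
For reference, the first option would have been a one-liner. A sketch, assuming the standard audit trail table SYS.AUD$ as in this case (use the owner/table that matches your audit configuration):

-- Statistics on a locked table are skipped by gather jobs until unlocked
SQL> exec dbms_stats.lock_table_stats('SYS', 'AUD$');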

This was the situation and what we did …

SQL> select * from dba_audit_mgmt_last_arch_ts;

AUDIT_TRAIL          RAC_INSTANCE LAST_ARCHIVE_TS
-------------------- ------------ ------------------------------
STANDARD AUDIT TRAIL            0 22-MAY-22 06.00.00.000000 AM +00:00


SQL> select count(*) from aud$;

COUNT(*)
----------
885632817

BEGIN
  DBMS_AUDIT_MGMT.clean_audit_trail(
    audit_trail_type        => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => FALSE);
END;
/

SQL> select count(*) from aud$;

COUNT(*)
----------
0


SQL> SET TIMING ON
SQL> BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/

Elapsed: 00:03:10.81

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting

What is that strange looking wait event ‘TCP Socket (KGAS)’ in AWR report ?

Posted by FatDBA on June 13, 2022

Hi Guys,

Recently someone shared an AWR report with me from a production 19c system, quite worried about a strange-looking wait event called 'TCP Socket (KGAS)'. He was concerned because the event came with a very high average wait time of 7863.68 ms (7.86 seconds) and was consuming around 98.0% of the total DB time.

Luckily I'd encountered something similar in the past for one of my customers, where the application team was unable to send mail through the DBMS scheduler; the job was stuck for a long time on the wait event 'TCP Socket (KGAS)', and the problem was not with the scheduler but with an underlying network or third-party application issue.

So, today’s post is all about the wait event, what it is, how to resolve it etc.

KGAS is a component in the server that handles TCP/IP sockets, typically used by dedicated connections, e.g. by PL/SQL built-in packages such as UTL_HTTP and UTL_TCP.
A session is waiting for an external host to provide requested data over a network socket. The time that this wait event tracks does not indicate a problem, and even a long wait time is not a reason to contact Oracle Support. It naturally takes time for data to flow between hosts over a network, and for the remote aspect of an application to process any request made to it. An application that communicates with a remote host must wait until the data it will read has arrived.

From an application/network point of view, delays in establishing a network connection may produce unwanted delays for users. We should make sure that the application makes network calls efficiently and that the network is working well such that these delays are minimized.

From the database point of view, these waits can safely be ignored; the wait event does not represent a database issue. It merely reports the total elapsed time for a network connection to be established or for data to arrive over the network. The database waits for the connection to be established and reports the time taken. It's always good to check with the network team or the third-party application vendors to investigate the underlying socket.

But in the case of system-wide waits – if the time spent waiting on this event is significant, then it is best to determine which sessions are showing the wait and drill into what those sessions are doing, as the wait is usually related to whatever application code is executing, e.g. whichever part of the application is using UTL_HTTP or similar and experiencing waits. This statement can be used to see which sessions may be worth tracing:

SELECT sid, total_waits, time_waited
FROM v$session_event WHERE event='TCP Socket (KGAS)' and total_waits>0 ORDER BY 3,2;

In order to reduce these waits or to help find the origin of the socket operations try:

  • Check the current SQL/module/action of V$SESSION for sessions that are waiting on the event at the time that they are waiting, to try to identify any common area of application code waiting on the event.
  • Get an ERRORSTACK level 3 dump of some sessions waiting on the event. This should help show the exact PL/SQL and C call stacks invoking the socket operation if the dump is taken while the session is waiting. Customers may need assistance from Oracle Support to get and interpret such a dump, but it can help pinpoint the relevant application code.
  • Trace sessions incurring the waits, including wait tracing, to try to place the waits in the context of the code executing around them, e.g. use event 10046 level 8 or DBMS_MONITOR.SESSION_TRACE_ENABLE.
  • Use DBA_DEPENDENCIES to find any application packages which may ultimately be using UTL_HTTP or UTL_TCP underneath for some operation (see the query sketch after this list).
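
A small sketch of that dependency lookup (adjust the referenced owner/packages to your environment):

SELECT owner, name, type
FROM dba_dependencies
WHERE referenced_owner = 'SYS'
AND referenced_name IN ('UTL_HTTP', 'UTL_TCP');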

Example:
Execute the following SQL from a session on a dedicated connection and then check the resulting trace file to see “TCP Socket (KGAS)” waits:

alter session set events '10046 trace name context forever, level 8';
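
On its own the trace event produces nothing; you need a socket operation to see the waits. A minimal sketch, assuming UTL_HTTP is callable from your session, a network ACL permits the host, and the URL (a placeholder) is reachable:

SQL> declare
  2    l_resp varchar2(4000);
  3  begin
  4    -- the HTTP round trip is what shows up as 'TCP Socket (KGAS)' in the trace
  5    l_resp := utl_http.request('http://www.example.com/');
  6  end;
  7  /
SQL> alter session set events '10046 trace name context off';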

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting

FatDBA ranked 27th among top 100 Oracle database blogs on the planet … :)

Posted by FatDBA on June 4, 2022

Hello all, my blog FatDBA has been awarded a place in the 'Top 100 Oracle Blogs on the planet' (rank 27) by Feedspot. The list is curated from thousands of blogs on the web and ranked by traffic, social media followers, domain authority & freshness.

Thank you so much guys for your support \,,/

https://blog.feedspot.com/oracle_blogs/

Posted in Uncategorized

Oracle 21c welcomes the ‘Attention Log’ …

Posted by FatDBA on June 1, 2022

Hi Guys,

The alert log has always been a very important log file containing error messages and exceptions that occur during database operations. It's crucial for any analysis or for troubleshooting any critical event. Especially over the last few years, with all the new database releases, it has slowly become messy and loud, with a whole lot of new content recorded for both regular and critical database events. Finally, with the inception of Oracle 21c, we have the 'Attention Log', which segregates the critical and vital events that otherwise get mixed up with the other regular incidents in the alert log file.

Each database has its own attention log, a regular JSON-format file that is very easy to parse. A few of its important attributes are:
URGENCY : Class with possible values INFO, IMMEDIATE, etc.
CAUSE : A quick note about the possible cause or reason.
NOTIFICATION : A regular message for an event, e.g. "PMON (ospid: 1901): terminating the instance due to ORA error 12752".
ACTION : What you can possibly do about it.
TIME : A timestamp of the event.

Let's see what it looks like!

[oracle@witnessalberta ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 21.0.0.0.0 - Production on Fri Apr 8 22:38:25 2022
Version 21.3.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0

SQL> set linesize 400 pagesize 400


SQL> col NAME for a30
SQL> col value for a70
SQL>
SQL> select name, value from v$diag_info where value like '%attention%';

NAME                           VALUE
------------------------------ ----------------------------------------------------------------------
Attention Log                  /opt/oracle/diag/rdbms/orclcdb/ORCLCDB/trace/attention_ORCLCDB.log




[oracle@witnessalberta trace]$ pwd
/opt/oracle/diag/rdbms/orclcdb/ORCLCDB/trace
[oracle@witnessalberta trace]$

[oracle@witnessalberta trace]$
[oracle@witnessalberta trace]$ ls -ltrh *.log*
-rw-r-----. 1 oracle oinstall 6.0K Apr  8 22:32 attention_ORCLCDB.log
-rw-r-----. 1 oracle oinstall 244K Apr  8 22:34 alert_ORCLCDB.log
[oracle@witnessalberta trace]$




{
  "NOTIFICATION" : "Starting ORACLE instance (normal) (OS id: 3309)",
  "URGENCY"      : "INFO",
  "INFO"         : "Additional Information Not Available",
  "CAUSE"        : "A command to startup the instance was executed",
  "ACTION"       : "Check alert log for progress and completion of command",
  "CLASS"        : "CDB Instance / CDB ADMINISTRATOR / AL-1000",
  "TIME"         : "2022-04-08T22:32:47.914-04:00"
}

....
.....
.........
{
  "NOTIFICATION" : "Shutting down ORACLE instance (immediate) (OS id: 9724)",
  "URGENCY"      : "INFO",
  "INFO"         : "Shutdown is initiated by sqlplus@localhost.ontadomain (TNS V1-V3). ",
  "CAUSE"        : "A command to shutdown the instance was executed",
  "ACTION"       : "Check alert log for progress and completion of command",
  "CLASS"        : "CDB Instance / CDB ADMINISTRATOR / AL-1001",
  "TIME"         : "2021-09-16T23:11:56.812-04:00"
}

.....
......
........

{
  "ERROR"        : "PMON (ospid: 1901): terminating the instance due to ORA error 12752",
  "URGENCY"      : "IMMEDIATE",
  "INFO"         : "Additional Information Not Available",
  "CAUSE"        : "The instance termination routine was called",
  "ACTION"       : "Check alert log for more information relating to instance termination rectify the error and restart the instance",
  "CLASS"        : "CDB Instance / CDB ADMINISTRATOR / AL-1003",
  "TIME"         : "2021-09-16T23:34:26.117-02:00"
}
...
.....
......
{
  "ERROR"        : "PMON (ospid: 3408): terminating the instance due to ORA error 474",
  "URGENCY"      : "IMMEDIATE",
  "INFO"         : "Additional Information Not Available",
  "CAUSE"        : "The instance termination routine was called",
  "ACTION"       : "Check alert log for more information relating to instance termination rectify the error and restart the instance",
  "CLASS"        : "CDB Instance / CDB ADMINISTRATOR / AL-1003",
  "TIME"         : "2022-04-08T23:38:11.258-04:00"
}


Hope It Helped!
Prashant Dixit

Posted in Basics, troubleshooting

What is a short stack dump ?

Posted by FatDBA on May 23, 2022

Hi Everyone,

Short stacks are a rich source of information which can help you analyze and deconstruct a process that is part of a problem in an Oracle database, and the good part is that you don't have to rely on any OS utilities like gdb, pstack, etc. to extract the stacks. It's pretty simple and quick: you attach to the process via its OSPID and issue the SHORT_STACK option of ORADEBUG, which generates the sequence of all functions involved and their current position in the call stack. You can also use errorstack for the same purpose, but short stacks are fast and to the point!

Now one can ask – What is a ‘call stack’ ?
A call stack is a list of the methods called at run time, from the beginning of the program up to the execution of the current statement. Its main purpose is to keep track of the point to which each active subroutine should return control when it finishes executing. The call stack also acts as a debugging tool when the method to be traced can be called from more than one context; it is a better alternative to adding tracing code to all methods that call the given method.

Let me run it on a real system where we were facing issues: adaptive switching between log write methods by LGWR was causing excessive 'log file sync' waits.

[oracle@oracleontario ~]$ ps -ef|grep lg
oracle     4402      1  0 00:25 ?        00:00:00 ora_lgwr_dixitdb
oracle     4719   3201  0 00:25 pts/1    00:00:00 grep --color=auto lg

-- Here in the example I am generating short stacks for LGWR process
SQL>
SQL> oradebug setospid 4402
Oracle pid: 19, Unix process pid: 4402, image: oracle@oracleontario.ontadomain (LGWR)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-semtimedop()+10<-skgpwwait()+187
<-ksliwat()+2233<-kslwaitctx()+200<-ksarcv()+320<-ksbabs()+670<-ksbrdp()+1167<-opirip()+541<-opidrv()+581<-sou2o()+165
<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
SQL>
SQL>



-- After a gap of ~ 10-15 Mins
-- Let me generate the short stack once again just to see if there is any difference.
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+992 <-sigacthandler()+104
<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212 <-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680
<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380


So, as you've seen in the code above, there is a clear change in the LGWR short stacks; in between, we did a few log switches and modified a big table in the database, followed by multiple commits. If you look carefully at the first stack, execution was in function ksedsts() at the time we generated it, called by ksdxfstk(), which in turn was called by ksdxcb(), sspuser() and so on; the second stack shows the change in the functions called within LGWR's internal call stack. They matched a known bug, which gave us the hint, and we resolved the issue after modifying an undocumented parameter.

So, short stacks can help us locate the buggy functions called by any user session or background process, and they are very useful when you are battling a strange-looking problem in your Oracle database.

A few of the functions that I am able to recall are given below.
semtimedop() is one of the functions for semaphore operations (also known as semop); it atomically performs an array of semaphore operations on the set of semaphores associated with the semaphore identifier specified by semid.
ksdxcb() is a function that is usually called when the oradebug command is being used.
sighandler() is the programming signal handler and is used to locate an exception. When a signal is sent to a process, various register and stack operations occur that make it look as though the leaf PC at the time of the signal is the return address for a call to a system function, sigacthandler(). sigacthandler() calls the user-specified signal handler just as any function would call another.
The sigaltstack() function allows a thread to define and examine the state of an alternate stack area on which signals are processed.

opidrv() is the ORACLE Program Interface DRiVer
opiodr() is the ORACLE code request driver – it routes the current request
main() is the standard executable entry point
ksedst() is the KSE routine that dumps the call stack
skgmstack() calls a specified function with extra STACK space
rpidrv() – RPI, the Recursive Program Interface DRiVer
pfrrun() or PSDEVN is the PL/SQL Interpreter Main Instruction Loop
kkxexe() – KKX, executes PL/SQL

Hope It Helped!
Prashant Dixit

Posted in Uncategorized

Bad clustering factor on >12.1 ? Have you tried Attribute Clustering ?

Posted by FatDBA on May 16, 2022

Recently we were facing a performance issue with a SQL statement referencing one of the most volatile tables in the entire database, and one of the largest too. The table goes through massive DML every minute, which has led to huge randomness in the data distribution; expectedly, a few of its critical indexes have a very high (bad) clustering factor (CF).

A quick explanation of the clustering factor: it is a measure of the ordered-ness of an index in comparison to the table it is based on. That is a simple explanation of Oracle's clustering factor (CF); if you want to read more about it, I recommend the blog posts by Richard Foote, Jonathan Lewis and a few other great articles by Randolf Geist.

Okay, coming back to the case – we tried a few methods (fixed BLEVEL, LEAF_BLOCKS, degree, stats recollection, etc.), but none of them helped much to stabilize the performance of the query. Finally we tried something introduced in Oracle 12.1, 'Attribute Clustering', and it helped us resolve the problem. Attribute clustering improves physical I/O for tables and their partitions. An attribute-clustered table stores data in close proximity on disk, in an ordered way, based on the values of a certain set of columns in the table or a set of columns in other tables.

Attribute clustering is a user-defined table directive that provides data clustering on one or more columns in a table. The directives can be specified when the table is created or modified. There are two types of attribute clustering:

  • With Linear Ordering : Linear ordering stores the data according to the order of specified columns. This is the default type of clustering.
  • With Interleaved Ordering : Interleaved ordering accurately determines where data is located on disk, which enables I/O pruning. It uses a special multidimensional clustering technique based on Z-order curve fitting.

Note: Zone mapping is a separately licensed feature.
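
For reference, the interleaved variant uses the same directive syntax. A sketch with a hypothetical table and columns (illustrative only):

-- Hypothetical fact table; ordering columns are illustrative
SQL> alter table sales_fact add clustering by interleaved order (region_id, time_id) without materialized zonemap;
SQL> alter table sales_fact move online;  -- reorganizes existing rows per the directive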

Let's do some tests to understand the feature better. In this demo I will be using linear order clustering without any zone maps; you can try pairing the linear ordering clustering with zone maps too. All tests were done on a 19.15.0.0.0 sandbox environment.

For testing purposes, I have created a new table with some 2,000,000 random rows inserted into it, to mimic a bad clustering factor (CF) scenario. I have also created an index on top of a date column and collected statistics.

[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Apr 27 00:13:08 2022
Version 19.15.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0

SQL>
SQL> create table new_test (id number, DOB date, text varchar2(40));

Table created.

SQL> insert into new_test select rownum, sysdate-trunc(dbms_random.value(0, 20000)), 'PRASHANT DIXIT'
  2  from dual connect by level <= 2000000;

2000000 rows created.

SQL>
SQL> commit;
Commit complete.

SQL> create index idx_newtest on new_test(dob);

Index created.

SQL> EXEC DBMS_STATS.gather_table_stats('DIXDROID', 'NEW_TEST', estimate_percent => null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.



-- Here you can see the extremely bad CLUSTERING FACTOR of the Index. 
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  2  WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST';

TABLE_NAME INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST   IDX_NEWTEST               10097    2000000           1989246

SQL>

With such a bad clustering factor on the index, let me run a query against the same column, passing a date range, and look at the execution plan.

SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';

Explained.

SQL> 


SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 1847055510

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |  3100 | 86800 |  2750   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| NEW_TEST |  3100 | 86800 |  2750   (1)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

With a bad clustering factor, the query goes for a full table scan on the NEW_TEST table, and the overall cost of the access path is very high too. Let's alter the table to add linear attribute clustering – I won't be using materialized zone maps here.

SQL> alter table NEW_TEST add clustering by linear order(DOB) without materialized zonemap;

Table altered.

SQL> alter table NEW_TEST move online;

Table altered.

-- Now if you check, you will see the improved CF of the index.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  2  WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST';

TABLE_NAME INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST   IDX_NEWTEST               10097    2000000              9277

The clustering factor of the index improved greatly after altering the table with linear order clustering on the DOB column. Let's run the same SQL and see if there is any improvement.

SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';

Explained.


SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------
Plan hash value: 1446839462

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |  3100 | 86800 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NEW_TEST    |  3100 | 86800 |    26   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_NEWTEST |  3100 |       |    11   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

15 rows selected.

SQL>
SQL>

And we can see the FTS was replaced by a quick INDEX RANGE SCAN on the IDX_NEWTEST index (created on the DOB column). Look at the great reduction in the cost of each step and in the final cost of the access path and plan tree.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized