Posts Tagged ‘performance’

Cleaning Up MySQL Replication Checks With a Bit of Bash

Posted by FatDBA on January 29, 2026

Checking MySQL replication is one of those things DBAs do on autopilot. Log in, open the MySQL client, run SHOW SLAVE STATUS\G or SHOW REPLICA STATUS\G, scroll, scan for Yes, check lag, repeat. It works, but it is noisy, raw, and easy to misread when you are tired or troubleshooting multiple servers.

In some free time, I played around with shell scripting to clean this up. The goal was not to replace MySQL, but to wrap the same information into something that answers the real question faster: is replication healthy or not, and how bad is it if it is not.

The script runs with strict shell settings so failures are never hidden. It connects using a MySQL login path first and falls back to a secure defaults file, which keeps credentials out of the script. If MySQL cannot be reached or authentication fails, the script stops immediately and shows the real error instead of pretending replication is broken.

Because environments are rarely consistent, the script automatically detects whether the server supports SHOW REPLICA STATUS or still uses SHOW SLAVE STATUS. It figures this out by checking for known fields and then sticks to the correct command, which makes it usable across old and new MySQL versions without edits.

Once the raw replication output is captured, the script parses individual fields directly from the \G output using awk. It handles both old and new field names, so Source_Host and Master_Host are treated the same. The same approach is used for thread states, binlog positions, relay logs, delay, and error fields. If replication is not configured and MySQL returns an empty result, the script fails clearly instead of silently succeeding.

From there, it starts behaving more like a DBA than a SQL dump. IO and SQL threads are evaluated and clearly marked as OK, PROBLEM, or UNKNOWN. Replication lag is converted from seconds into minutes and evaluated against simple thresholds so you immediately know whether it is harmless or serious. If any SQL or IO errors exist, replication is considered critical even if threads appear to be running.

One thing I was very deliberate about is that the script always prints the full report, even when replication is stopped or broken. When things go wrong, you still see topology details, binlog and relay positions, thread states, and error messages in one place. Nothing is hidden when you need it the most.

The output is structured like a quick operational report, with hostname, timestamp, replication mode, overall health, lag, and error presence shown at the top. Color highlighting is used only in interactive sessions, so the script remains safe for logging and automation.

Finally, the script exits with meaningful return codes. A clean replication state exits successfully, warning conditions return a different code, and critical failures return a hard error. This makes it easy to plug into cron jobs or monitoring without parsing text output.

This started as a small free-time experiment, but it turned into something I actually use. Shell scripting may not be glamorous, but for DBAs it is one of the fastest ways to remove friction from daily work. If you find yourself running the same replication commands again and again, that is usually a sign that a small script can make life easier.

#!/usr/bin/env bash
set -euo pipefail

# -------------------------------------------------------------------
# Author : Prashant Dixit
# Version: 1.5 (always print full report + cleaner header formatting)
# Notes  : Shows full sections even when replication is stopped / broken.
# -------------------------------------------------------------------

MYSQL_BIN="${MYSQL_BIN:-/usr/bin/mysql}"

MYSQL_LOGIN_PATH="${MYSQL_LOGIN_PATH:-testadmin_local}"

MYSQL_CNF="${MYSQL_CNF:-/root/.my-shutdown.cnf}"

if [[ -t 1 ]]; then
  RED=$'\033[0;31m'
  GREEN=$'\033[0;32m'
  YELLOW=$'\033[0;33m'
  BLUE=$'\033[0;34m'
  CYAN=$'\033[0;36m'
  BOLD=$'\033[1m'
  DIM=$'\033[2m'
  RESET=$'\033[0m'
else
  RED=""; GREEN=""; YELLOW=""; BLUE=""; CYAN=""; BOLD=""; DIM=""; RESET=""
fi

hr()  { printf "%s\n" "${DIM}----------------------------------------------------------------------${RESET}"; }
hdr() { printf "%s\n" "${BOLD}${CYAN}$*${RESET}"; }
die() { echo "${RED}ERROR:${RESET} $*"; exit 2; }

mysql_run() {
  # Try login-path first, fallback to defaults-file
  local q="$1" out rc
  out="$("$MYSQL_BIN" --login-path="$MYSQL_LOGIN_PATH" -e "$q" 2>&1)" && { echo "$out"; return 0; }
  rc=$?

  if [[ -r "$MYSQL_CNF" ]]; then
    out="$("$MYSQL_BIN" --defaults-file="$MYSQL_CNF" -e "$q" 2>&1)" && { echo "$out"; return 0; }
    rc=$?
    echo "$out"
    return $rc
  fi

  echo "$out"
  return $rc
}

detect_status_cmd() {
  # If SHOW REPLICA STATUS works, use it; else fallback to SLAVE
  if mysql_run "SHOW REPLICA STATUS\\G" | grep -qE '^[[:space:]]*(Replica_IO_State|Source_Host):'; then
    echo "SHOW REPLICA STATUS\\G"
  else
    echo "SHOW SLAVE STATUS\\G"
  fi
}

STATUS_CMD="$(detect_status_cmd)"


STATUS_RAW="$(mysql_run "$STATUS_CMD" 2>&1 || true)"

if echo "$STATUS_RAW" | grep -qiE "^(ERROR|mysql:)|Access denied|unknown option|Can't connect|Can't connect to local MySQL server|ERROR [0-9]+"; then
  echo "$STATUS_RAW"
  exit 2
fi


if [[ -z "${STATUS_RAW//[[:space:]]/}" ]]; then
  die "No replica/slave status returned. Is this server configured as a replica?"
fi

get_field() {
  local key="$1"
  echo "$STATUS_RAW" |
    awk -F': ' -v k="$key" '
      {
        gsub(/^[ \t]+|[ \t]+$/, "", $1)
        if ($1 == k) {print $2; found=1; exit}
      }
      END {if (!found) print ""}'
}

Slave_IO_State="$(get_field "Replica_IO_State")"; [[ -n "$Slave_IO_State" ]] || Slave_IO_State="$(get_field "Slave_IO_State")"

Master_Host="$(get_field "Source_Host")"; [[ -n "$Master_Host" ]] || Master_Host="$(get_field "Master_Host")"
Master_User="$(get_field "Source_User")"; [[ -n "$Master_User" ]] || Master_User="$(get_field "Master_User")"
Master_Port="$(get_field "Source_Port")"; [[ -n "$Master_Port" ]] || Master_Port="$(get_field "Master_Port")"

Master_Log_File="$(get_field "Source_Log_File")"; [[ -n "$Master_Log_File" ]] || Master_Log_File="$(get_field "Master_Log_File")"
Read_Master_Log_Pos="$(get_field "Read_Source_Log_Pos")"; [[ -n "$Read_Master_Log_Pos" ]] || Read_Master_Log_Pos="$(get_field "Read_Master_Log_Pos")"

Relay_Log_File="$(get_field "Relay_Log_File")"
Relay_Log_Pos="$(get_field "Relay_Log_Pos")"

Relay_Master_Log_File="$(get_field "Relay_Source_Log_File")"
[[ -n "$Relay_Master_Log_File" ]] || Relay_Master_Log_File="$(get_field "Relay_Master_Log_File")"

SQL_Delay="$(get_field "SQL_Delay")"

Slave_IO_Running="$(get_field "Replica_IO_Running")"; [[ -n "$Slave_IO_Running" ]] || Slave_IO_Running="$(get_field "Slave_IO_Running")"
Slave_SQL_Running="$(get_field "Replica_SQL_Running")"; [[ -n "$Slave_SQL_Running" ]] || Slave_SQL_Running="$(get_field "Slave_SQL_Running")"

Slave_SQL_Running_State="$(get_field "Replica_SQL_Running_State")"
[[ -n "$Slave_SQL_Running_State" ]] || Slave_SQL_Running_State="$(get_field "Slave_SQL_Running_State")"

Seconds_Behind_Master="$(get_field "Seconds_Behind_Source")"
[[ -n "$Seconds_Behind_Master" ]] || Seconds_Behind_Master="$(get_field "Seconds_Behind_Master")"

Last_Errno="$(get_field "Last_Errno")"
Last_Error="$(get_field "Last_Error")"

Last_IO_Error="$(get_field "Last_IO_Error")"
Last_IO_Error_Timestamp="$(get_field "Last_IO_Error_Timestamp")"

Last_SQL_Error="$(get_field "Last_SQL_Error")"

fmt_kv() {
  local k="$1" v="${2:-}"
  [[ -n "${v// /}" ]] || v="<blank>"
  printf "%-24s %s\n" "$k" "$v"
}

emph_status() {
  local label="$1"
  local value="${2:-}"
  local norm="${value,,}"

  [[ -n "${value// /}" ]] || value="<blank>"

  if [[ "$norm" == "yes" ]]; then
    printf "%-24s %s\n" "$label" "${GREEN}******* ${value} ******* ---->>> OK${RESET}"
    return 0
  elif [[ "$norm" == "no" ]]; then
    printf "%-24s %s\n" "$label" "${RED}******* ${value} ******* ---->>> PROBLEM${RESET}"
    return 2
  else
    printf "%-24s %s\n" "$label" "${YELLOW}******* ${value} ******* ---->>> UNKNOWN${RESET}"
    return 1
  fi
}

sec_to_min() {
  local s="${1:-}"
  [[ "$s" =~ ^[0-9]+$ ]] || { echo "<blank>"; return; }

  if (( s < 600 )); then
    awk -v sec="$s" 'BEGIN { printf "%.1fm", sec/60 }'
  else
    awk -v sec="$s" 'BEGIN { printf "%dm", int(sec/60) }'
  fi
}


overall="OK"
overall_color="$GREEN"

io_rc=0; sql_rc=0
emph_status "Slave_IO_Running" "${Slave_IO_Running:-}"; io_rc=$?
emph_status "Slave_SQL_Running" "${Slave_SQL_Running:-}"; sql_rc=$?

if [[ $io_rc -eq 2 || $sql_rc -eq 2 ]]; then
  overall="CRITICAL"; overall_color="$RED"
elif [[ $io_rc -eq 1 || $sql_rc -eq 1 ]]; then
  overall="WARNING"; overall_color="$YELLOW"
fi

lag_hint="<blank>"
if [[ -n "${Seconds_Behind_Master// /}" ]] && [[ "${Seconds_Behind_Master}" =~ ^[0-9]+$ ]]; then
  lag_min="$(sec_to_min "$Seconds_Behind_Master")"
  if (( Seconds_Behind_Master == 0 )); then
    lag_hint="${GREEN}${lag_min}${RESET}"
  elif (( Seconds_Behind_Master <= 30 )); then
    lag_hint="${YELLOW}${lag_min}${RESET}"
    [[ "$overall" == "OK" ]] && overall="WARNING" && overall_color="$YELLOW"
  else
    lag_hint="${RED}${lag_min}${RESET}"
    overall="CRITICAL"; overall_color="$RED"
  fi
fi

err_hint="<blank>"
if [[ -n "${Last_SQL_Error// /}${Last_IO_Error// /}${Last_Error// /}" ]]; then
  err_hint="${RED}Errors present${RESET}"
  overall="CRITICAL"; overall_color="$RED"
fi

clear 2>/dev/null || true

hdr "MySQL Replication Status Check Utility  ${DIM}(v1.5)${RESET}"
echo "${DIM}Author: Prashant Dixit${RESET}"
hr
printf "%s %s\n" "${BOLD}Host:${RESET}" "$(hostname -f)"
printf "%s %s\n" "${BOLD}Time:${RESET}" "$(date)"
printf "%s %s\n" "${BOLD}Mode:${RESET}" "${STATUS_CMD%%\\G}"
printf "%s %s  %s  %s\n" "${BOLD}Overall:${RESET}" "${overall_color}${BOLD}${overall}${RESET}" "${BOLD}Lag:${RESET} ${lag_hint}" "${err_hint}"
hr
echo

hdr "Replication Topology"
hr
fmt_kv "Slave_IO_State" "${Slave_IO_State:-}"
fmt_kv "Master_Host" "${Master_Host:-}"
fmt_kv "Master_User" "${Master_User:-}"
fmt_kv "Master_Port" "${Master_Port:-}"
echo

hdr "Positions / Relay"
hr
fmt_kv "Master_Log_File" "${Master_Log_File:-}"
fmt_kv "Read_Master_Log_Pos" "${Read_Master_Log_Pos:-}"
fmt_kv "Relay_Log_File" "${Relay_Log_File:-}"
fmt_kv "Relay_Log_Pos" "${Relay_Log_Pos:-}"
fmt_kv "Relay_Master_Log_File" "${Relay_Master_Log_File:-}"
fmt_kv "SQL_Delay" "${SQL_Delay:-}"
echo

hdr "Thread Status"
hr
emph_status "Slave_IO_Running" "${Slave_IO_Running:-}"
emph_status "Slave_SQL_Running" "${Slave_SQL_Running:-}"
fmt_kv "Slave_SQL_Running_State" "${Slave_SQL_Running_State:-}"
fmt_kv "Seconds_Behind_Master" "$(sec_to_min "${Seconds_Behind_Master:-}")"
echo

hdr "Errors"
hr
fmt_kv "Last_Errno" "${Last_Errno:-}"
fmt_kv "Last_Error" "${Last_Error:-}"
fmt_kv "Last_IO_Error" "${Last_IO_Error:-}"
fmt_kv "Last_IO_Error_Timestamp" "${Last_IO_Error_Timestamp:-}"
fmt_kv "Last_SQL_Error" "${Last_SQL_Error:-}"
hr
echo

if [[ "$overall" == "OK" ]]; then
  exit 0
elif [[ "$overall" == "WARNING" ]]; then
  exit 1
else
  exit 2
fi

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: bash, HA, highavailability, mysql, oracle, performance, replication, scripting | Leave a Comment »

When Linux Swaps Away My Sleep – MySQL, RHEL8, and the Curious Case of High Swap Usage

Posted by FatDBA on December 12, 2025

I remember an old instance where I’d got an alert that one of production MySQL servers had suddenly gone sluggish after moved to RHEL 8 from RHEL7. On checking, I found something odd … the system was consuming swap heavily, even though there was plenty of physical memory free.

Someone who did the first time deployment years before, left THP as enabled and with default swapiness … but this setting that had worked perfectly for years on RHEL 7, but now, after the upgrade to RHEL 8.10, the behavior was completely different.

This post is about how that small OS level change turned into a real performance headache, and what we found after some deep digging.

The server in question was a MySQL 8.0.43 instance running on a VMware VM with 16 CPUs and 64 GB RAM. When the issue began, users complained that the database was freezing randomly, and monitoring tools were throwing high load average and slow query alerts.

Let’s take a quick look at the environment … It was a pretty decent VM, nothing under sized.

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.10 (Ootpa)

$ uname -r
4.18.0-553.82.1.el8_10.x86_64

$ uptime
11:20:24 up 3 days, 10:57,  2 users,  load average: 4.34, 3.15, 3.63

$ grep ^CPU\(s\) sos_commands/processor/lscpu
CPU(s): 16

When I pulled the SAR data for that morning, the pattern was clear ..There were long stretches on CPU where %iowait spiked above 20-25%, and load averages crossed 400+ during peak time! The 09:50 slot looked particularly suspicious .. load average jumped to 464 and remained high for several minutes.

09:00:01 %usr=26.08  %iowait=22.78  %idle=46.67
09:40:01 %usr=29.04  %iowait=24.43  %idle=40.11
09:50:01 %usr=7.55   %iowait=10.07  %idle=80.26
10:00:01 %usr=38.53  %iowait=19.54  %idle=35.32

Here’s what the memory and swap stats looked like:

# Memory Utilization
%memused ≈ 99.3%
Free memory ≈ 400 MB (on a 64 GB box)
Swap usage ≈ 85% average, hit 100% at 09:50 AM

That was confusing.. MySQL was not leaking memory, and there was still >10 GB available for cache and buffers. The system was clearly pushing pages to swap even though it didn’t need to. That was the turning point in the investigation.

At the same time, the reporting agent started reporting MySQL timeouts:

 09:44:09 [mysql] read tcp xxx.xx.xx.xx:xxx->xxx.xxx.xx.xx:xxxx: i/o timeout
 09:44:14 [mysql] read tcp xx.xx.xx.xxxx:xxx->xx.xx.xx.xx.xx:xxx: i/o timeout

And the system kernel logs showed the familiar horror lines for every DBA .. MySQL threads were being stalled by the OS. This aligned perfectly with the time when swap usage peaked.

 09:45:34 kernel: INFO: task mysqld:5352 blocked for more than 120 seconds.
 09:45:34 kernel: INFO: task ib_pg_flush_co:9435 blocked for more than 120 seconds.
 09:45:34 kernel: INFO: task connection:10137 blocked for more than 120 seconds.

I double-checked the swappiness configuration:

$ cat /proc/sys/vm/swappiness
1

So theoretically, swap usage should have been minimal. But the system was still paging aggressively. Then I checked the cgroup configuration (a trick I learned from a Red Hat note) .. And there it was more than 115 cgroups still using the default value of 60! … In RHEL 8, memory management moved more toward cgroup v2, which isolates memory parameters by control group.

So even if /proc/sys/vm/swappiness is set to 1, processes inside those cgroups can still follow their own default value (60) and this explained why the system was behaving like swappiness=60 even though the global value was 1.

$ find /sys/fs/cgroup/memory/ -name *swappiness -exec cat {} \; | uniq -c
      1 1
    115 60

In RHEL 8, memory management moved more toward cgroup v2, which isolates memory parameters by control group. So even if /proc/sys/vm/swappiness is set to 1, processes inside those cgroups can still follow their own default value (60). This explained why the system was behaving like swappiness=60 even though the global value was 1.

Once the root cause was identified, the fix was straightforward — Enforced global swapiness across CGroups

Add this to /etc/sysctl.conf:

vm.force_cgroup_v2_swappiness = 1

Then reload:
sysctl -p

This forces the kernel to apply the global swappiness value to all cgroups, ensuring consistent behavior. Next, we handled THP that is always expected to cause intermittent fragmentation and stalls in memory intensive workloads like MySQL, Oracle, PostgreSQL and even in non RDBMSs like Cassandra etc., we disabled the transparent huge pages and rebooted the host.

In short what happened and was the root cause.

RHEL8 introduced a change in how swappiness interacts with cgroups.
The old /proc/sys/vm/swappiness setting no longer applies universally.
Unless explicitly forced, MySQL’s cgroup keeps the default swappiness (60).
Combined with THP and background I/O, this created severe page cache churn.

So the OS upgrade, not MySQL, was the real root cause.

Note: https://access.redhat.com/solutions/6785021

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Tuning, performance, troubleshooting, mysql, optimization, OS, rhel, opertingsystem | Leave a Comment »

From Painful Manual LOB Shrink to Automatic SecureFiles Shrink

Posted by FatDBA on November 15, 2025

I’ve been working with LOBs for years now, and trust me, shrinking them has always been a headache. Anyone who has ever tried ALTER TABLE … SHRINK SPACE on a big SecureFiles LOB knows the pain … blocking sessions, unexpected waits, and sometimes that lovely ORA-1555 popping up at the worst time. Every DBA eventually gets into that situation where a LOB segment is 200 GB on disk but only 10 GB of real data remains. You delete rows… but the space never comes back unless you manually shrink it, which itself can cause more issues.

With the introduction of Automatic SecureFiles Shrink, Oracle really made a DBA’s life easier. This feature, which first came out in 23ai, quietly frees up unused LOB space in the background without disrupting your workload. I wanted to see how it behaves in a real scenario, so I set up a small lab and tested it out. Here’s the whole experiment, raw and simple.

Lets do a demo and understand how this new feature works … I spun up a fresh PDB and made a small tablespace just for this test. Nothing fancy.

CREATE TABLESPACE lobts
  DATAFILE '/u02/oradata/LOBTS01.dbf'
  SIZE 1G AUTOEXTEND ON NEXT 256M;

Tablespace created.



-- Table with a securefile LOB based column.
CREATE TABLE tst_securefile_lob
(
    id NUMBER,
    lob_data CLOB
)
LOB (lob_data) STORE AS SECUREFILE (
    TABLESPACE lobts
    CACHE
);

Table created.




SELECT table_name, column_name, securefile
FROM   user_lobs
WHERE  table_name='TST_SECUREFILE_LOB';

TABLE_NAME           COLUMN_NAME   SECUREFILE
-------------------  ------------  ----------
TST_SECUREFILE_LOB   LOB_DATA      YES

Next, I inserted a good amount of junk data around 10,000 rows of random CLOB strings. I wanted the LOB segment to be big enough to see clear differences after shrink.

BEGIN
  FOR r IN 1 .. 10 LOOP
    INSERT INTO tst_securefile_lob (id, lob_data)
    SELECT r*100000 + level,
           TO_CLOB(DBMS_RANDOM.STRING('x', 32767))
    FROM dual
    CONNECT BY level <= 1000;
    COMMIT;
  END LOOP;
END;
/

PL/SQL procedure successfully completed.



SELECT COUNT(*) FROM tst_securefile_lob;

  COUNT(*)
----------
     10000




SELECT ul.segment_name,
       us.blocks,
       ROUND(us.bytes/1024/1024,2) AS mb
FROM   user_lobs ul
JOIN   user_segments us
  ON us.segment_name = ul.segment_name
WHERE  ul.table_name = 'TST_SECUREFILE_LOB';

SEGMENT_NAME               BLOCKS     MB
------------------------   --------   --------
SYS_LOB0001234567C00002$     131072     1024.00




-- After stats and a quick check in USER_SEGMENTS, the LOB segment was showing a nice chunky size. 
-- Then I deleted almost everything
-- Now the table will have very few rows left, but the LOB segment was still the same size. As usual.
DELETE FROM tst_securefile_lob
WHERE id < 900000;
COMMIT;

9990 rows deleted. 
Commit complete.


-- Checking LOB Internal Usage (Before Auto Shrink)
EXEC show_securefile_space(USER, 'SYS_LOB0001234567C00002$');
Segment blocks      = 131072  bytes=1073741824
Used blocks         =  10240  bytes=83886080
Expired blocks      = 110592  bytes=905969664
Unexpired blocks    =  10240  bytes=83886080
-- This clearly shows almost the entire segment is expired/free but not reclaimed.


-- Checking Auto Shrink Statistics (Before Enabling)
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE '%Auto SF SHK%'
ORDER  BY name;

NAME                                     VALUE
--------------------------------------   ------
Auto SF SHK failures                         0
Auto SF SHK segments processed               0
Auto SF SHK successful                       0
Auto SF SHK total number of tasks            0

Turning on Automatic Shrink — By default this feature is OFF, so I enabled it:

EXEC DBMS_SPACE.SECUREFILE_SHRINK_ENABLE;

PL/SQL procedure successfully completed.

That’s literally it. No parameters, no tuning, nothing else. Just enable.

Automatic SecureFiles Shrink does not run immediately after you delete data. Oracle requires that a SecureFiles LOB segment be idle for a specific amount of time before it becomes eligible for shrinking, and the default idle-time limit is 1,440 minutes (24 hours). “Idle” means that no DML or preallocation activity has occurred on that LOB during that period. Once the segment meets this condition, Oracle considers it during its automatic background shrink task, which is part of AutoTask and runs every 30 minutes with a defined processing window.

When the task executes, it attempts to shrink eligible segments, but it does so gradually and in small increments using a trickle-based approach, rather than reclaiming all possible space in a single operation. This incremental behavior is deliberate: it reduces impact on running workloads and avoids heavy reorganization all at once. Only segments that meet all selection criteria .. such as having sufficient free space above the defined thresholds and not using RETENTION MAX … are processed. Because of this incremental design and the eligibility rules, the full space reclamation process can span multiple background cycles.

Some of the internal hidden/underscore params that can be used to control these limits (not unless support asked you to do or you are dealing a lab system)

Parameter                                    Default_Value    Session_Value    Instance_Value   IS_SESSION_MODIFIABLE   IS_SYSTEM_MODIFIABLE
------------------------------------------- ---------------  ---------------- ---------------- ----------------------- ---------------------
_ktsls_autoshrink_seg_idle_seconds            86400            86400            86400            FALSE                   IMMEDIATE
_ktsls_autoshrink_seg_pen_seconds             86400            86400            86400            FALSE                   IMMEDIATE
_ktsls_autoshrink_trickle_mb                  5                5                5                FALSE                   IMMEDIATE

Okay, lets check out post chnaghe outputs, what automatic LOB shrink does to our test object.

SELECT ul.segment_name,
       us.blocks,
       ROUND(us.bytes/1024/1024,2) AS mb
FROM   user_lobs ul
JOIN   user_segments us
  ON us.segment_name = ul.segment_name
WHERE ul.table_name='TST_SECUREFILE_LOB';

SEGMENT_NAME               BLOCKS     MB
------------------------   --------   --------
SYS_LOB0001234567C00002$      40960      320.00

Note: From ~1024 MB down to ~320 MB. Auto shrink worked     🙂



-- DBMS_SPACE Usage (After Auto Shrink)
EXEC show_securefile_space(USER, 'SYS_LOB0001234567C00002$');
Segment blocks      = 40960  bytes=335544320
Used blocks         =  9216  bytes=75497472
Expired blocks      =  6144  bytes=50331648
Unexpired blocks    =  9216  bytes=75497472

Notee:  Expired blocks dropped from 110k -->  6k. This confirms auto shrink freed most of the fragmented space.






-- After the task is run.
SELECT name, value
FROM v$sysstat
WHERE name LIKE '%Auto SF SHK%'
ORDER BY name;

NAME                                     VALUE
--------------------------------------   ------
Auto SF SHK failures                         0
Auto SF SHK segments processed               1
Auto SF SHK successful                       1
Auto SF SHK total number of tasks            1




SELECT owner,
       segment_name,
       shrunk_bytes,
       attempts,
       last_shrink_time
FROM v$securefile_shrink;

OWNER      SEGMENT_NAME               SHRUNK_BYTES    ATTEMPTS   LAST_SHRINK_TIME
---------  ------------------------   -------------   ---------  ---------------------------
PRASHANT   SYS_LOB0001234567C00002$      744947712          1     14-NOV-2025 06:32:15

Note:  Oracle automatically reclaimed ~710 MB of wasted LOB space.

This feature, It’s simple, it’s safe, and it saves DBAs from doing manual shrink maintenance again and again. It’s not a fast feature it’s slow and polite on purpose but it works exactly as expected.

If your system has LOBs (EBS attachments, documents, JSON, logs, media files, etc.), you should absolutely enable this. Let Oracle handle the boring part.

Hope It Helped!
Prashant Dixit

Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: 23ai, ai, databases, ML, oracle, performance, relational, renaps | Leave a Comment »

MySQL OPTIMIZE TABLE – Disk Space Reclaim Defragmentation and Common Myths

Posted by FatDBA on August 8, 2025

When working with MySQL databases, one common task is reclaiming disk space and defragmenting tables. The typical solution that most of us have turned to is OPTIMIZE TABLE. While this sounds like a simple, quick fix, there are a few myths and things we often overlook that can lead to confusion. Let’s break it down.

The Basics: OPTIMIZE TABLE and ALTER TABLE

To reclaim space or defragment a table in MySQL, the go-to commands are usually:

OPTIMIZE TABLE <table_name>; or OPTIMIZE TABLE [table_name_1], [table_name_2] or via sudo mysqlcheck -o [schema] [table] -u [username] -p [password]

But before we dive into the myths, let’s clarify what happens when you run these commands.

OPTIMIZE TABLE Overview

OPTIMIZE TABLE is essentially a shorthand for ALTER TABLE <table_name> ENGINE=InnoDB for InnoDB tables. It works by rebuilding the table, compacting data, and reclaiming unused space.
In MySQL 5.6.17 and later, the command works online, meaning it allows concurrent reads and writes during the rebuild, with some exceptions (brief locking during initial and final stages). Prior to 5.6.17, the table was locked for the entire duration of the operation, causing application downtime.

Myth #1: OPTIMIZE TABLE Is Always Quick

No: OPTIMIZE TABLE can indeed take a long time for large tables, especially if there are a lot of inserts, deletes, or updates. This is true when rebuilding the table. For larger datasets, the I/O load can be significant.

mysql> OPTIMIZE TABLE my_large_table;
+----------------------------+--------+----------+----------+----------+
| Table                      | Op     | Msg_type | Msg_text |
+----------------------------+--------+----------+----------+----------+
| mydb.my_large_table         | optimize | ok       | Table optimized |
+----------------------------+--------+----------+----------+----------+

In the output, the values under each column heading would show:

Table: The table that was optimized (e.g., yourdb.customers).
Op: The operation performed (optimize).
Msg_type: Type of message, usually status.
Msg_text: The result of the operation, such as OK or a specific message (e.g., “Table is already up to date”).

If the table is already optimized or doesn’t require optimization, the output might look like this:

+------------------+----------+----------+-----------------------------+
| Table            | Op       | Msg_type | Msg_text                    |
+------------------+----------+----------+-----------------------------+
| yourdb.customers | optimize | note     | Table is already up to date |
+------------------+----------+----------+-----------------------------+

Below screenshot explains possible values of msg_text etc.

Real-Time Example Validation:

MySQL logs can show something like this: [Note] InnoDB: Starting online optimize table my_large_table [Note] InnoDB: Table optimized successfully

However, for larger tables, it is critical to consider the additional I/O load during the rebuild. For example:
bash [Note] InnoDB: Rebuilding index my_large_table_idx [Note] InnoDB: Table rebuild completed in 300 seconds

Note: In order to get more detailed information its good to verify PROCESSLIST or SLOW QUERY LOG (if enabled).

Myth #2: OPTIMIZE TABLE Doesn’t Block Other Operations

Yes/No: This myth is partly true and partly false depending on the MySQL version.
For MySQL 5.5 and earlier: The table is locked for writes, but concurrent reads are allowed.
For MySQL 5.6.16 and earlier: Same as above .. concurrent reads are allowed, but writes are blocked.
For MySQL 5.6.17 and later: Concurrent reads and writes are allowed during the rebuild process, but the table still needs to be briefly locked during the initial and final phases. There is a brief lock required to start the process, which is often overlooked.

Real-Time Example for MySQL 5.6.17+:

[Note] InnoDB: Starting online optimize table my_large_table
[Note] InnoDB: Table optimized successfully

Although reads and writes are allowed during this process, you might still experience short bursts of lock at the start and end of the operation.

Myth #3: You Don’t Need to Worry About Disk Space

No: You need sufficient disk space before running OPTIMIZE TABLE. If you’re running low on space, you could encounter errors or performance issues during the rebuild process.
There are few bugs as well which might could occur if disk space is insufficient. Additionally, there’s also temporary disk space required during the rebuild process. Running OPTIMIZE TABLE with insufficient space could fail silently, leading to issues down the line.

Best Practice:
Ensure that your disk has at least as much free space as the table you’re optimizing, as a copy of the table is created temporarily during the rebuild.

Myth #4: ALTER TABLE with Row Format Is Always the Solution

No: ALTER TABLE ... ROW_FORMAT=COMPRESSED or other formats can help optimize space, but it may not always result in savings, especially for certain data types (like BLOBs or large text fields). It can also introduce overhead on the CPU if you’re using compression.

In some cases, switching to a compressed format can actually increase the size of the table, depending on the type of data stored.

Real-Time Example:

For a table like customer_data: ALTER TABLE customer_data ROW_FORMAT=COMPRESSED; Depending on the types of columns and data (e.g., BLOBs or TEXT), compression might not always yield the expected results.

Myth #5: You Only Need to Optimize Tables When They Get Slow

No: This is another common misconception. Regular optimization is crucial to ensure long-term performance, especially for heavily modified tables. Tables that undergo a lot of updates or deletions can become fragmented over time, even without obvious performance degradation.

Optimizing periodically can help prevent gradual performance loss.

Real-Time Example:

If you have an orders table: mysql> OPTIMIZE TABLE orders; Over time, especially with frequent UPDATE or DELETE operations, fragmentation can slow down access, even if it’s not immediately noticeable.

Main Pointerss ..

OPTIMIZE TABLE is a helpful tool but not a one-size-fits-all solution.
It requires sufficient disk space and careful consideration of your MySQL version and storage engine (InnoDB vs. MyISAM).
In MySQL 5.6.17 and later, online optimizations are possible, but brief locking still occurs during the process.
For MyISAM tables, there’s no escaping the full lock during optimization.
Always assess the potential overhead (I/O and CPU usage) before running the operation, especially on larger datasets.

By breaking these myths, you can make better decisions when using OPTIMIZE TABLE to keep your database healthy without causing unnecessary downtime or performance hits.

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: ai, Database, mysql, oracle, performance, technology | Leave a Comment »

Oracle AWR Scripts Decoded .. No More Guessing!

Posted by FatDBA on July 29, 2025

Recently, someone asked me why there are so many AWR-like files in the directory and whether they are as useful as the well-known awrrpt.sql. I took the opportunity to explain what I knew about them and their purpose. Since I thought it could be helpful, I decided to share this insight with my readers as well.

If you’re into performance tuning in Oracle, very likey you’ve already used AWR reports. But then you open this directory: $ORACLE_HOME/rdbms/admin …. …and boom – you’re hit with a list of cryptic scripts: awrrpt.sql, awrgdrpi.sql, awrsqrpt.sql, awrextr.sql…

What do they all do?
When should you use which one?
Why are they named like 90s DOS files 🙂 ?

Let’s keep it short and sharp. Here’s your point-to-point breakdown of the most important AWR scripts.

Before you go running any of these scripts – make sure you have the Oracle Diagnostic Pack license.
AWR stuff is not free.

I grouped them into logical chunks – reports, comparisons, SQLs, data movement, etc.

Performance Reports

These are the most common AWR reports you run to analyze performance between 2 snapshots.

Script	What it does
`awrrpt.sql`	Generates AWR report for the current instance (for single instance DBs)
`awrrpti.sql`	Same as above, but lets you select another DBID or instance (useful for RAC)
`awrgrpt.sql`	AWR Global Report – gives a full RAC-wide view
`awrgrpti.sql`	Same as above, but lets you pick another DBID/instance

Example:
You’re troubleshooting high CPU on node 2 of your RAC? Use awrrpti.sql.

Comparison Reports

These help you compare two different time ranges – maybe before and after a patch, or different load periods.

Script	What it does
`awrddrpt.sql`	Compares two AWR snapshots (date diff) – for a single instance
`awrddrpi.sql`	Same as above, for another dbid/instance
`awrgdrpt.sql`	Global RAC diff report (current RAC)
`awrgdrpi.sql`	Global RAC diff report (another dbid/instance)

Use these when you wanna say, “Hey, this new code made the DB slower… prove it!”

Want to see what a particular SQL is doing? These are your tools.

Script	What it does
`awrsqrpt.sql`	SQL report for a specific SQL_ID in the current instance
`awrsqrpi.sql`	Same thing but lets you pick another dbid/instance

You’ll be surprised how useful this is when hunting bad queries.

Sometimes, you need to take AWR data from one system and analyze it somewhere else (like test or dev).

Script	What it does
`awrextr.sql`	Export AWR data using datapump
`awrload.sql`	Import AWR data using datapump

This is actually gold when working on performance issues across environments.

Helper / Utility Scripts

These are mostly helper scripts to define input or make reports more automated or interactive.

Script	What it does
`perfhubrpt.sql`	Generates a fancy interactive Performance Hub report
`awrinpnm.sql`	Input name helper for AWR
`awrinput.sql`	Get inputs before running AWR reports
`awrddinp.sql`	Input helper for diff reports
`awrgdinp.sql`	Input helper for RAC diff reports

What’s with these weird script names?

Yeah, all these awrsqrpi.sql, awrgdrpt.sql, etc. – they look like random garbage at first.
But there’s actually some logic.

Here’s how to decode them:

Abbreviation	Means
`awr`	Automatic Workload Repository
`rpt` or `rp`	Report
`i`	Lets you select specific instance or DBID
`g`	Global report for RAC
`d` or `dd`	Diff reports (comparing two snapshots)
`sq`	SQL
`inp`	Input helper

So awrsqrpi.sql = AWR SQL Report for a different instance
And awrgdrpi.sql = AWR Global Diff Report for another DBID/instance

So Which Script Should I Use?

Here’s a quick cheat sheet:

Task	Script
Normal AWR report (single instance)	`awrrpt.sql`
AWR report for RAC (global view)	`awrgrpt.sql`
SQL performance report	`awrsqrpt.sql`
Compare two AWR reports	`awrddrpt.sql`
Export/import AWR data	`awrextr.sql` and `awrload.sql`

If you’re doing anything with RAC – prefer the ones with g in them.
If you’re automating – use the *inp*.sql files.

Final Thoughts

Yes, the names are ugly.
Yes, the syntax is old-school.
But honestly? These AWR scripts are still some of the best tools you have for DB performance analysis.

Just remember:

Don’t use them without a valid license
Learn the naming pattern once – and it gets way easier
Practice running different ones on test systems

And next time someone complains, “The database is slow” … you know exactly which script to run.

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: awr, databases, fatdba, optimization, oracle, performance, renaps, Tuning | Leave a Comment »

fatdba explores Vector Search in Oracle 23ai

Posted by FatDBA on July 23, 2025

So Oracle rolled out 23ai a while back and like every major release, it came packed with some really cool and interesting features. One that definitely caught my eye was Vector Search. I couldn’t resist diving in… and recently I explored it in depth and would like to share a though on this subject.

You see, we’ve been doing LIKE '%tax policy%' since forever. But now, Oracle’s SQL has become more powerful. Not only does it match words … it matches meaning.

So here’s me trying to explain what vector search is, how Oracle does it, why you’d care, and some examples that’ll hopefully make it click.

What’s Vector Search, Anyway?

Alright, imagine this:

You have a table of products. You search for “lightweight laptop for travel”.
Some entries say “ultrabook”, others say “portable notebook”, and none mention “lightweight” or “travel”. Old-school SQL would’ve said: “No Matches Found”

But with vector search, it gets it. Oracle turns all that text into math .. basically, a long list of numbers called a vector … and compares meanings instead of words.

So What’s a Vector?

When we say “vector” in vector search, we’re not talking about geometry class. In the world of AI and databases, a vector is just a long list of numbers … each number representing some aspect or feature of the original input (like a sentence, product description, image, etc.).

Here’s a basic example:
[0.12, -0.45, 0.88, …, 0.03]
This is a vector … maybe a 512 or 1536-dimension one .. depending on the embedding model used (like OpenAI, Oracle’s built-in model, Cohere, etc.).

Each number in this list is abstract, but together they represent the essence or meaning of your data.

Let’s say you have these two phrases:
“Apple is a tech company”
“iPhone maker based in California”

Now, even though they don’t share many words, they mean nearly the same thing. When passed through an embedding model, both phrases are converted into vectors:

Vector A: [0.21, -0.32, 0.76, …, 0.02]
Vector B: [0.22, -0.30, 0.74, …, 0.01]
They look very close … and that’s exactly the point.

What Oracle 23ai Gives You

A new VECTOR datatype (yeah!)
AI_VECTOR() function to convert text into vectors
VECTOR_INDEX to make search blazing fast
VECTOR_DISTANCE() to measure similarity
It’s all native in SQL ..no need for another vector DB bolted on

Let’s Build Something Step-by-Step

We’ll build a simple product table and do a vector search on it.

Step 1: Create the table

CREATE TABLE products (
  product_id     NUMBER PRIMARY KEY,
  product_name   VARCHAR2(100),
  description    VARCHAR2(1000),
  embedding      VECTOR(1536)
);

1536? Yeah, that’s the number of dimensions from Oracle’s built-in embedding model. Depends on which one you use.

Step 2: Generate vector embeddings

UPDATE products
SET embedding = ai_vector('text_embedding', description);

This’ll take the description, pass it through Oracle’s AI model, and give you a vector. Magic.

Step 3: Create the vector index

CREATE VECTOR INDEX product_vec_idx
ON products (embedding)
WITH (DISTANCE METRIC COSINE);

This speeds up the similarity comparisons … much like an index does for normal WHERE clauses.

Step 4: Semantic Search in SQL

SELECT product_id, product_name, 
       VECTOR_DISTANCE(embedding, ai_vector('text_embedding', 'light laptop for designers')) AS score
FROM products
ORDER BY score
FETCH FIRST 5 ROWS ONLY;

Now we’re searching for meaning, not words.

VECTOR_DISTANCE Breakdown

You can use different math behind the scenes:

VECTOR_DISTANCE(v1, v2 USING COSINE)
VECTOR_DISTANCE(v1, v2 USING EUCLIDEAN)
VECTOR_DISTANCE(v1, v2 USING DOT_PRODUCT)

Cosine is the usual go-to for text. Oracle handles the rest for you.

Use Cases You’ll Actually Care About

1. Semantic Product Search — “Fast shoes for runners” => shows “Nike Vaporfly”, even if it doesn’t say “fast”.

2. Similar Document Retrieval — Find all NDAs that look like this one (even with totally different words).

3. Customer Ticket Suggestion — Auto-suggest resolutions from past tickets. Saves your support team hours.

4. Content Recommendation — “People who read this also read…” kind of stuff. Easy to build now.

5. Risk or Fraud Pattern Matching — Find transactions that feel like fraud ..even if the details don’t match 1:1.

I know it might sound little confusing .. lets do a Onwe more example : Legal Document Matching

CREATE TABLE legal_docs (
  doc_id       NUMBER PRIMARY KEY,
  title        VARCHAR2(255),
  content      CLOB,
  content_vec  VECTOR(1536)
);

Update vectors:

UPDATE legal_docs
SET content_vec = ai_vector('text_embedding', content);

Now find similar docs:

SELECT doc_id, title
FROM legal_docs
ORDER BY VECTOR_DISTANCE(content_vec, ai_vector('text_embedding', 'confidentiality in government contracts'))
FETCH FIRST 10 ROWS ONLY;

That’s it. You’re officially building an AI-powered legal search engine.

Things to Know

Creating vectors can be heavy .. batch it.
Indexing speeds up similarity search a lot.
Combine with normal filters for best results:

SELECT * FROM products
WHERE category = 'laptop'
ORDER BY VECTOR_DISTANCE(embedding, ai_vector('gaming laptop under 1kg'))
FETCH FIRST 5 ROWS ONLY;

Final Thoughts from fatdba

I’m honestly impressed. Oracle took something that felt like ML black magic and put it right in SQL. No external service. No complicated setups. Just regular SQL, but smater.

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: 23ai, ai, artificial-intelligence, databases, DBA, llm, ML, oracle, performance, rag, renaps, SQL, technology, Tuning, vector | Leave a Comment »

Diagnosing a MySQL database performance Issue Using MySQLTuner.

Posted by FatDBA on July 20, 2025

A few weeks ago, we ran into a pretty nasty performance issue on one of our MySQL production-like grade databases. It started with slow application response times and ended with my phone blowing up with alerts. Something was clearly wrong, and while I suspected some bad queries or config mismatches, I needed a fast way to get visibility into what was really happening under the hood.

This is where MySQLTuner came to the rescue, again 🙂 I’ve used this tool in the past, and honestly, it’s one of those underrated gems for DBAs and sysadmins. It’s a Perl script that inspects your MySQL configuration and runtime status and then gives you a human-readable report with recommendations.

Let me walk you through how I used it to identify and fix the problem ..step by step .. including actual command output, what I changed, and the final outcome.

Step 1: Getting MySQLTuner

First things first, if you don’t already have MySQLTuner installed, just download it:

bashCopyEditwget https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
chmod +x mysqltuner.pl

You don’t need to install anything. Just run it like this:

bashCopyEdit./mysqltuner.pl --user=root --pass='YourStrongPassword'

(Note: Avoid running this in peak traffic hours on prod unless you’re sure about your load and risk.)

Step 2: Sample Output Snapshot

Here’s a portion of what I got when I ran it:

 >>  MySQLTuner 2.6.20 
 >>  Run with '--help' for additional options and output filtering

[OK] Currently running supported MySQL version 5.7.43
[!!] Switch to 64-bit OS - MySQL cannot use more than 2GB of RAM on 32-bit systems
[OK] Operating on 64-bit Linux

-------- Performance Metrics -------------------------------------------------
[--] Up for: 3d 22h 41m  (12M q [35.641 qps], 123K conn, TX: 92G, RX: 8G)
[--] Reads / Writes: 80% / 20%
[--] Binary logging is enabled (GTID MODE: ON)
[--] Total buffers: 3.2G global + 2.8M per thread (200 max threads)
[OK] Maximum reached memory usage: 4.2G (27.12% of installed RAM)
[!!] Slow queries: 15% (1M/12M)
[!!] Highest connection usage: 98% (197/200)
[!!] Aborted connections: 2.8K
[!!] Temporary tables created on disk: 37% (1M on disk / 2.7M total)

-------- MyISAM Metrics ------------------------------------------------------
[!!] Key buffer used: 17.2% (89M used / 512M cache)
[!!] Key buffer size / total MyISAM indexes: 512.0M/800.0M

-------- InnoDB Metrics ------------------------------------------------------
[OK] InnoDB buffer pool / data size: 2.0G/1.5G
[OK] InnoDB buffer pool instances: 1
[--] InnoDB Read buffer efficiency: 99.92% (925M hits / 926M total)
[!!] InnoDB Write log efficiency: 85.10% (232417 hits / 273000 total)
[!!] InnoDB log waits: 28

-------- Recommendations -----------------------------------------------------
General recommendations:
    Control warning line(s) size by reducing joins or increasing packet size
    Increase max_connections slowly if needed
    Reduce or eliminate persistent connections
    Enable the slow query log to troubleshoot bad queries
    Consider increasing the InnoDB log file size
    Query cache is deprecated and should be disabled

Variables to adjust:
    max_connections (> 200)
    key_buffer_size (> 512M)
    innodb_log_file_size (>= 512M)
    tmp_table_size (> 64M)
    max_heap_table_size (> 64M)

Step 3: What I Observed

Here’s what stood out for me:

1. Too many slow queries — 15% of all queries were slow. That’s a huge red flag. This wasn’t being logged properly either — the slow query log was off.

2. Disk-based temporary tables — 37% of temporary tables were being written to disk. This kills performance during joins and sorts.

3. Connections hitting limit — 197 out of 200 max connections used at peak. Close to saturation ..possibly causing application timeouts.

4. MyISAM key buffer inefficient — Key buffer was too small for the amount of MyISAM index data (yes, we still have a couple legacy MyISAM tables..

5. InnoDB log file too small — Frequent log flushing and waits were indicated, meaning `innodb_log_file_size` wasn’t enough for our write load.

Step 4: Actions I Took

Here’s what I changed based on the output and a quick double-check of our workload patterns:

– Enabled Slow Query Log

sqlCopyEditSET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

And updated /etc/my.cnf:

iniCopyEditslow_query_log = 1
slow_query_log_file = /var/log/mysql-slow.log
long_query_time = 1

– Increased `tmp_table_size` and `max_heap_table_size`:

iniCopyEdittmp_table_size = 128M
max_heap_table_size = 128M

(This reduced the % of temp tables going to disk.)

– Raised `innodb_log_file_size`:

iniCopyEditinnodb_log_file_size = 512M
innodb_log_files_in_group = 2

Caution: You need to shut down MySQL cleanly and delete old redo logs before applying this change.

– Raised `key_buffer_size`:

iniCopyEditkey_buffer_size = 1G

We still had some legacy MyISAM usage and this definitely helped reduce read latency.

– Upped the `max_connections` a bit (but also discussed with devs about app-level connection pooling):

iniCopyEditmax_connections = 300

Step 5: Post-Change Observations

After making these changes and restarting MySQL (for some of the changes to take effect), here’s what I observed:

CPU dropped by ~15% at peak hours.
Threads_running dropped significantly, meaning less contention.
Temp table usage on disk dropped to 12%.
Slow query log started capturing some really bad queries, which were fixed in the app code within a few days.
No more aborted connections or connection errors from the app layer.

Final Thoughts

MySQLTuner is not a magic bullet, but it’s one of those tools that gives you quick, actionable insights without the need to install big observability stacks or pay for enterprise APM tools. I’d strongly suggest any MySQL admin or engineer dealing with production performance issues keep this tool handy.

It’s also good for periodic health checks, even if you’re not in a crisis. Run it once a month or so, and you’ll catch slow config drifts or usage pattern changes.

Resources

If you’ve had a similar experience or used MySQLTuner in your infra, would love to hear what kind of findings you had. Drop them in the comments or message me directly .. Want to know more 🙂 Happy tuning!

Hope It Helped!
Prashant Dixit
Database Architect @RENAPS
Reach us at : https://renaps.com/

Posted in Uncategorized | Tagged: ai, cloud, Database, mysql, mysqld, performance, renaps, SQL, technology, Tuning | Leave a Comment »

Leveraging SQLT to transfer execution plans between SQL IDs using coe_load_sql_profile.sql

Posted by FatDBA on March 11, 2025

Hi All,

Have you used coe_load_sql_profile.sql before? I mean a lot of people uses coe_xfr_sql_profile.sql from SQLT and these two scripts deals with SQL profiles in Oracle, but their purposes and use cases differ. coe_xfr_sql_profile.sql is used to export and migrate an existing SQL Profile from one system to another, ensuring performance stability across environments. coe_load_sql_profile.sql is used to create a new SQL Profile by capturing the execution plan from a modified SQL query and applying it to the original query, forcing it to use the optimized plan.

Let me first explain a little bit more of the toolkit – Oracle SQLT (SQLTXPLAIN) which is a powerful tool designed to help DBAs analyze and troubleshoot SQL performance issues and all above mentioned scripts are part of the kit provided by Oracle and written by none other than Carlos Sierra.

A common question DBAs encounter is: Can we plug the execution plan of one SQL ID into another SQL ID? …. The answer is YES! This can be accomplished using the SQLT script coe_load_sql_profile.sql. In this blog, we will explore how to use this script to achieve plan stability by enforcing a preferred execution plan across different SQL IDs. It examines the memory and AWR both to look text of the SQL IDs you passed and then it queries GV$SQL_PLAN and DBA_HIST_SQL_PLAN to extract the execution plan hash value from the modified SQL. Once it’s done collecting that information, it performs a loop to extract optimizer hints of the modified SQL’s execution plan. Finally it creates a SQL Profile using DBMS_SQLTUNE.IMPORT_SQL_PROFILE.

Let’s give a quick demo … assume we have two SQL statements:

SQL ID 1: 78a1nbdabcba (Original SQL) …. SQL ID 2: 9na182nn2bnn (Modified SQL)
Both queries are logically similar but produce different execution plans.
Our goal is to take the execution plan from SQL ID 1 and apply it to SQL ID 2.

connect system/monkey123
SQL> @coe_load_sql_profile.sql 
or 
SQL> START coe_load_sql_profile.sql <ORIGINAL_SQL_ID> <MODIFIED_SQL_ID>


Parameter 1:
ORIGINAL_SQL_ID (required)

Enter value for 1: 78a1nbdabcba

Parameter 2:
MODIFIED_SQL_ID (required)

Enter value for 2: 9na182nn2bnn


     PLAN_HASH_VALUE          AVG_ET_SECS
-------------------- --------------------
          1181381381                 .003

Parameter 3:
PLAN_HASH_VALUE (required)

Enter value for 3: 1181381381

Values passed to coe_load_sql_profile:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ORIGINAL_SQL_ID: "78a1nbdabcba"
MODIFIED_SQL_ID: "9na182nn2bnn"
PLAN_HASH_VALUE: "1181381381"

.
.
.

ORIGINAL:78a1nbdabcba MODIFIED:9na182nn2bnn PHV:1181381381 SIGNATURE:16731003137917309319 CREATED BY COE_LOAD_SQL_PROFILE.SQL
SQL>SET ECHO OFF;

****************************************************************************
* Enter password to export staging table STGTAB_SQLPROF_78a1nbdabcba
****************************************************************************

Export: Release 19.0.0- Production on Sun Mar 08 14:45:47 2012

Copyright (c) 1982, 2024, Oracle and/or its affiliates.  All rights reserved.

Password:
.
.
.

coe_load_sql_profile completed.



Run original query
SQL> select ename from DIXIT where ename='Name';

Plan hash value: 1181381381

---------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| 
---------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |     6 |     3   (0)|
|*  1 |  TABLE ACCESS FULL| DIXIT  |     1 |     6 |     3   (0)|
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ENAME"='Name')

Note
-----
   - SQL profile "78a1nbdabcba_1181381381" used for this statement

What are your experiences with enforcing execution plans in Oracle?
Let me know in the comments!

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Database, databases, oracle, performance, SQL, technology | 4 Comments »

Resolving an interesting timezone mismatch in database sessions after migration

Posted by FatDBA on March 8, 2025

This happened a while ago, but it was an interesting scenario during a database migration. We encountered a unique issue where the system timezone varied depending on how the database connection was established. Specifically, when connecting to the database using a service name, the session timezone was set to EST, whereas a direct connection (without using a service name) correctly showed the timezone as UTC. This discrepancy had the potential to cause severe inconsistencies in time-sensitive transactions and logging mechanisms, prompting an urgent need for resolution.

[oracle@xxxxxx ~]$ sqlplus monkey/xxxxxxx@MONKEYD
SQL> SELECT TO_CHAR(SYSDATE, 'MM-DD-YYYY HH24:MI:SS') AS "NOW" FROM DUAL;

NOW
-------------------
02-08-2025 13:20:07  -- Displaying EST timezone

SQL> !date
Sat Feb 8 18:20:10 UTC 2025  -- Server's system timezone is UTC



When connecting without specifying a service name:
[oracle@xxxxx 11.2.0.4]$ sqlplus monkey/xxxxxxx
SQL> SELECT TO_CHAR(SYSDATE, 'MM-DD-YYYY HH24:MI:SS') AS "NOW" FROM DUAL;

NOW
-------------------
02-08-2025 18:15:40  -- Correctly showing UTC timezone

SQL> !date
Sat Feb 8 18:15:42 UTC 2025  -- Server's system timezone remains UTC

Upon detailed investigation, we determined that the issue stemmed from the Oracle Clusterware (HAS) environment settings. Specifically:

Listener Start Method: When the listener was started via srvctl, the database sessions picked up the incorrect timezone (EST). When the listener was started manually, the correct timezone (UTC) was applied.

We reviewed the file /app/oracle/testdb/mygrid/19.0/crs/install/s_crsconfig__env.txt, which is responsible for defining environmental variables for Oracle Clusterware. The file contained the incorrect timezone setting: TZ=America/New_York # Incorrect setting enforcing EST … This setting was overriding the expected system timezone when the listener was started via srvctl.

The previous production environment running Oracle 11g with Oracle Restart had the setting —> TZ=GMT-00:00 # Which correctly maintained UTC behavior
The new setup on Oracle 19c had TZ=America/New_York, leading to the observed inconsistency.

To resolve the issue, we first took backup of existing file before updting the TZ parameter.

Before making any changes, we took a backup of the affected configuration file:
cp /app/oracle/testdb/mygrid/19.0/crs/install/s_crsconfig_env.txt \ /app/oracle/testdb/mygrid/19.0/crs/install/s_crsconfig_env.txt.

Modified TZ=America/New_York in environment file /app/oracle/testdb/mygrid/19.0/crs/install/s_crsconfig__env.txt and updated it to TZ=UTC

Since Oracle Restart enforces the timezone setting at the environment level, a restart of services was required. This step was planned within a scheduled outage window to avoid disruption.

srvctl stop listener
srvctl stop database -d MONKEYDB
crsctl stop has
crsctl start has
srvctl start database -d MONKEYDB
srvctl start listener

After applying the changes and restarting the necessary services, we verified that the database sessions were now consistently reporting the correct timezone (UTC):

sqlplus monkey/xxxxxxx@MONKEYD SQL> SELECT TO_CHAR(SYSDATE, 'MM-DD-YYYY HH24:MI:SS') AS "NOW" FROM DUAL;

The output now correctly showed UTC time, confirming that the issue was successfully resolved.

This issue highlighted a subtle but critical misconfiguration that affected database session timezones in a clustered environment. The root cause was traced back to the Oracle Restart configuration file, which was enforcing the wrong timezone (America/New_York). By updating this setting to UTC and restarting the HAS services, we successfully restored consistency across all database sessions.

Moving forward, these insights will help ensure that similar issues are proactively identified and mitigated during future database migrations. Proper validation of environment settings, particularly in Oracle Clusterware and HA setups, is crucial to maintaining operational integrity and preventing timezone-related anomalies in mission-critical applications.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: Database, oracle, performance, science, social-media, technology | Leave a Comment »

Resolving Performance Issues in Oracle EBS 12.1 Journal Imports Due to GL Code Combinations Exclusive Table Lock

Posted by FatDBA on February 18, 2025

A long-standing challenge in Oracle EBS 12.1 involves journal imports taking an excessive amount of time due to exclusive locking on the GL.GL_CODE_COMBINATIONS table. This issue significantly slows down the entire journal import process by causing serialization, ultimately impacting system performance.

The problem arises during the journal import process when records are inserted into the GL_BALANCES table. Because of this, only one import process can proceed at a time, forcing subsequent imports to wait due to contention on the GL_CODE_COMBINATIONS table. The contention manifests as waits on the event ‘enq: TM – contention’, resulting from an exclusive table-level lock.

Understanding Profile Options: `Flexfields:Validate on Server` and `Flexfields:Shared Table Lock`

Flexfields: Validate on Server
- This profile decides if flexfield validations take place on the server or client.
- Turning this profile on (Yes) or off (No) only changes where validations run, not their validity.
- If set to Yes, validations occur on the server, decreasing client-side workload.
Flexfields:Shared Table Lock
- This setting determines the locking behavior during client-side validation.
- When set to Yes, the flexfield table (GL_CODE_COMBINATIONS) is locked in shared mode.
- When set to No, the table is locked in exclusive mode, leading to serialization and blocking other processes.
- Note: If Flexfields:Validate on Server is set to No, the Flexfields:Shared Table Lock profile setting takes effect.

How These Profiles Impact Journal Import Performance : The Journal Import program is indirectly affected by these profile settings. The issue arises when the import process attempts to create new code combinations using the flexfield validation routine.

If Flexfields:Validate on Server is set to No, the system relies on client-side validation, which then enforces the Flexfields:Shared Table Lock setting.
If Flexfields:Shared Table Lock is set to No, the system applies exclusive locks on GL_CODE_COMBINATIONS, leading to serialization of imports.

To improve journal import performance and minimize locking issues, consider the following:

Set Flexfields:Validate on Server to Yes
- This shifts validation to the server, reducing contention on GL_CODE_COMBINATIONS.
Ensure Flexfields:Shared Table Lock is set to Yes (if server-side validation is disabled)
- This prevents exclusive table-level locks and allows multiple imports to run concurrently.
Optimize Code Combination Creation
- Regularly clean up and archive unused or obsolete code combinations to reduce contention.
- Implement proper indexing on frequently accessed flexfield tables to enhance performance.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: ebs, erp, oracle, performance | Leave a Comment »

« Previous Entries

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Likes

Posts Tagged ‘performance’

Share this:

Share this:

Turning on Automatic Shrink — By default this feature is OFF, so I enabled it:

Share this:

The Basics: OPTIMIZE TABLE and ALTER TABLE

OPTIMIZE TABLE Overview

Myth #1: OPTIMIZE TABLE Is Always Quick

Myth #2: OPTIMIZE TABLE Doesn’t Block Other Operations

Myth #3: You Don’t Need to Worry About Disk Space

Myth #4: ALTER TABLE with Row Format Is Always the Solution

Myth #5: You Only Need to Optimize Tables When They Get Slow

Main Pointerss ..

Share this:

Performance Reports

Comparison Reports

Helper / Utility Scripts

What’s with these weird script names?

So Which Script Should I Use?

Final Thoughts

Share this:

What’s Vector Search, Anyway?

So What’s a Vector?

What Oracle 23ai Gives You

Let’s Build Something Step-by-Step

Step 1: Create the table

Step 2: Generate vector embeddings

Step 3: Create the vector index

Step 4: Semantic Search in SQL

VECTOR_DISTANCE Breakdown

Use Cases You’ll Actually Care About

1. Semantic Product Search — “Fast shoes for runners” => shows “Nike Vaporfly”, even if it doesn’t say “fast”.

2. Similar Document Retrieval — Find all NDAs that look like this one (even with totally different words).

3. Customer Ticket Suggestion — Auto-suggest resolutions from past tickets. Saves your support team hours.

4. Content Recommendation — “People who read this also read…” kind of stuff. Easy to build now.

5. Risk or Fraud Pattern Matching — Find transactions that feel like fraud ..even if the details don’t match 1:1.

I know it might sound little confusing .. lets do a Onwe more example : Legal Document Matching

Things to Know

Final Thoughts from fatdba

Share this:

Step 1: Getting MySQLTuner

Step 2: Sample Output Snapshot

Step 3: What I Observed

1. Too many slow queries — 15% of all queries were slow. That’s a huge red flag. This wasn’t being logged properly either — the slow query log was off.

2. Disk-based temporary tables — 37% of temporary tables were being written to disk. This kills performance during joins and sorts.

3. Connections hitting limit — 197 out of 200 max connections used at peak. Close to saturation ..possibly causing application timeouts.

4. MyISAM key buffer inefficient — Key buffer was too small for the amount of MyISAM index data (yes, we still have a couple legacy MyISAM tables..

5. InnoDB log file too small — Frequent log flushing and waits were indicated, meaning innodb_log_file_size wasn’t enough for our write load.

Step 4: Actions I Took

– Enabled Slow Query Log

– Increased tmp_table_size and max_heap_table_size:

– Raised innodb_log_file_size:

– Raised key_buffer_size:

– Upped the max_connections a bit (but also discussed with devs about app-level connection pooling):

Step 5: Post-Change Observations

Final Thoughts

Resources

Share this:

Share this:

Share this:

Understanding Profile Options: Flexfields:Validate on Server and Flexfields:Shared Table Lock

How These Profiles Impact Journal Import Performance : The Journal Import program is indirectly affected by these profile settings. The issue arises when the import process attempts to create new code combinations using the flexfield validation routine.

To improve journal import performance and minimize locking issues, consider the following:

Share this:

5. InnoDB log file too small — Frequent log flushing and waits were indicated, meaning `innodb_log_file_size` wasn’t enough for our write load.

– Increased `tmp_table_size` and `max_heap_table_size`:

– Raised `innodb_log_file_size`:

– Raised `key_buffer_size`:

– Upped the `max_connections` a bit (but also discussed with devs about app-level connection pooling):

Understanding Profile Options: `Flexfields:Validate on Server` and `Flexfields:Shared Table Lock`