Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

When and why the optimizer switched to RBO from CBO mode … Why I cannot see that in 10053 trace ??

Posted by FatDBA on August 28, 2021

Hi Guys,

Recently I was working on one performance issue where one critical SQL statement started consuming more time. After a quick check I saw a switch was happened to RBO mode from CBO, but wasn’t sure on when and why the optimizer mode was switched. And the expected answer to my quest is to generate the debug 10053 trace file to get some insight about the cost-based optimizer internal calculations and to check the cardinality, selectivity and draw a good parallel with the way cost of table, index or sort etc. may be calculated.

Usually the best way to work out what’s going on in this situation is to look at the optimizer debug trace event, 10053 trace file. I always prefer to generate optimizer traces in such situations when the mighty optimizer messed up things. Being a performance consultant, it had saved me so many times in the past, always a best bet for me.

But this time it was looking little different, I couldn’t see details about why optimizer switched the mode in the ‘Query‘ section of the trace. I was totally perplexed, I mean this was not the first time I was looking for that information in the trace file. Why it’s not there, what happened .. 😦

This was Oracle 19.3.0.0.0 database running on RHEL, I tried metalink and found one document specific to this issue and luckily this was happening all due to a known bug 31130156. The problem was later on solved after we applied the bug-fix patch and interpreted the reason of the mode switch (I will write another post about the core problem) …

Note: It can be very difficult to interpret the 10053 optimizer trace if you don’t have any prior experience on it. I recommend readers to check one great document written by Wolfgang Breitling with title ‘A Look Under The Hood Of CBO’.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: | Leave a Comment »

OGG-01201 Error reported by MGR Access denied

Posted by FatDBA on August 22, 2021

Hi Guys,

Last week encountered a problem with one old GG setup running on v12.2 where the extract was failing with errors OGG-01201/OGG-01668 when doing Initial load.

ERROR   OGG-01201  Oracle GoldenGate Capture for Oracle, exld1.prm:  Error reported by MGR : Access denied
ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, exld1.prm:  PROCESS ABENDING

This ‘access denied’ error was there even when the login information was correct for both source and target systems. I was overly confused and wasn’t sure what was causing the issue!

What I come to know after reading a particular piece of documentation, in version 12.2 of GG, the default behavior is the MANAGER and related EXTRACT/REPLICAT cannot be started or stopped remotely as by default there is only deny rule. And while I was trying to do the initial load on the source server and attempts to starts the replicat on target server, I hit the error. This is a security feature and is to prevent unauthorized access to Oracle GoldenGate manager processes and the processes under its control.

Solution to the problem is add “ACCESSRULE, PROG *, IPADDR *, ALLOW” to your manager parameter file on the target system, something like below. The ACCESSRULE parameter restricts the remote system access.

-- GoldenGate Manager Parameter File (mgr.prm) on Target system
--
userid xxxxxxx, password xxxxxxx
PORT 7810
ACCESSRULE, PROG REPLICAT, IPADDR 10.11.01.15, ALLOW
PURGEOLDEXTRACTS ./dirdat/rp*, USECHECKPOINTS, MINKEEPHOURS 4

Here you can also set priority using PRI (0-99) which specifies the priority. The PROG parameter could be anything like GGSCI, GUI, MGR/MANAGER, REPLICAT, COLLECTOR|SERVER and * for all options (default). IPADDR specifies from which IP can access the specified program. Login_ID specifies with RMTHOST configuration and ALLOW | DENY specifies allow or deny the access.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

New JSON features – The old lovely database is new again with 21c

Posted by FatDBA on August 21, 2021

Oracle Database supports relational, graph, spatial, text, OLAP, XML, and JSON data – yes, all at once in one database.
The Oracle Database 21c provides a native JSON data type in binary format. This data type can be used in tables, uses less space and is faster. It’s uniquely designed Oracle Binary JSON format (OSON) is able to speed up both OLAP and OLTP workloads over JSON documents.

I’d recently did some tests and found JSON datatype is now fully integrated into all components of the 21c database and have few new things added to improve its performance. This post is all about JSON datatype in the Oracle 21c Database eco-system, new features, improvements etc.

So, before I move ahead, would like to first build the foundation for the readers, lets create a table with with JSON Data and do some examples.

-- Create a table with JSON datatype.
CREATE TABLE testOrder
 (did NUMBER PRIMARY KEY, jdoc JSON)


-- Let's insert some data to the table
INSERT INTO testOrder
VALUES (1, ' {"testOrder": {
"podate": "2015-06-03",
"shippingAddress": {"street": "3467 35th Ave",
 "city" : "Clara", “state”: “CA”, "zip":
94612},
"comments" : "Discounted sales Foundation Day",
"sparse_id" :"PFHA35",
"items": [
 {"name" : "TV", "price": 341.55, "quantity": 2,
 "parts": [
 {"partName": "remoteCon", "partQuantity": 1},
 {"partName": "antenna”, "partQuantity": 2}]},
 {"name": “PC”, “price”: 441.78, "quantity": 10,
 "parts": [
 {"partName": "mousepad", "partQuantity": 2},
 {"partName": "keyboard", "partQuantity": 1}]}
]}}');



-- Do some SELECT ops
SELECT did,
 po.jdoc.testOrder.podate.date(),
 po.jdoc.testOrder.shippingAddress,
 po.jdoc.testOrder.items[*].count(),
 po.jdoc.testOrder.item[1]
FROM testOrder po
WHERE po.jdoc.testOrder.podate.date() =
TO_DATE(‘2015-06-03’,'YYYY-MM-DD') AND
po.jdoc.testOrder.shippingAddress.zip.number()
BETWEEN 84610 AND 84620;


SELECT JSON {
 ‘name’ : li.itemName,
 ‘sales’ : li.price * li.quantity
}
FROM lineItems_rel li 




-- That's how to UPDATE 
UPDATE testOrder po
SET jdoc = JSON_TRANSFORM(jdoc,
 REPLACE
‘$.testOrder.shippingAddress.city’
 = ‘Oakland’,
 REPLACE ‘$.testOrder.shippingAddress.zip’
 = 94607,
 SET '$.testOrder.contactPhone' =
 JSON('["(415)-867-8560","(500)312-8198"]'),
 REMOVE ‘$.testOrder.sparse_id’,
 APPEND ‘$.testOrder.items’ =
 JSON(‘{“items” :[{“name”:”iphone”,
 “price” : 635.54, “quantity” :2}]}’))
WHERE po.jdoc,testOrder.podate.date() =
 TO_DATE(‘2019-07-01’); 

So, that’s how you can create, query, update your JSON data in any table, pretty cool right 🙂

Okay, coming back to the purpose of the post – What is new in Oracle 21c in terms of JSON support ?

  • Though JSON data type was added in Oracle 20c to provide native support, but is generally available in version 21c.
  • Earlier to 21c, users can only use a single-value functional index to accelerate JSON_VALUE() predicate evaluation. Antecedently, a functional index was bounded to index at most one value per row for JSON that meant a field value having at most one occurrence. In 21c, a user can create a multi-value functional index on a JSON datatype column to index elements within a JSON array. This speeds up the rating of JSON_EXISTS() – an operator allowing the use array of equivalence predicates of the SQL/JSON path language.
  • Oracle 21c includes several other enhancements to the JSON functionality in the database JSON_SCALAR function, that creates an instance of a JSON type from a SQL scalar value.
  • JSON_TRANSFORM function was introduced in Oracle Database 21c to make JSON data alterations simpler or easier in complexity.

Hope It Helped!
Prashant Dixit

Posted in Uncategorized | Tagged: | Leave a Comment »

What’s new in Golden Gate version 21c ?

Posted by FatDBA on August 20, 2021

Hi Guys,

Oracle has recently released Golden Gate version 21.1, this happened immediately after they released database version 21c (21.3) for on-prem. Today’s post is all about new features and changes happened with this new GG version.

  • Oracle GoldenGate is available with Microservices Architecture : This release of Oracle GoldenGate is available with Microservices Architecture only.
  • This release of Oracle GoldenGate is available with Microservices Architecture only.
  • Automatic Extract of tables with supplemental logging is supported : Oracle GoldenGate provides a new auto_capture mode to capture changes for all the tables that are enabled for logical replication. You can list the tables enabled for auto-capture using the LIST TABLES AUTO_CAPTURE command option. Use the TRANLOGOPTIONS INTEGRATEDPARAMS auto_capture option to set up automatic capture.
  • Oracle native JSON datatype is supported : Oracle GoldenGate capture and apply processes now support the new native JSON datatype, which is supported by Oracle Database 21c and higher.
  • Enhanced Automatic Conflict Detection and Resolution for Oracle Database 21c
  • Autonomous Database Extract is supported : Oracle GoldenGate can now capture from the Autonomous Databases in OCI.
  • Large DDL (greater than 4 MB) replication is supported : DDLs that are greater than 4 MB in size will be provided replication support.
  • DB_UNIQUE_NAME with heartbeat table : DB_UNIQUE_NAME is available with the heartbeat table to allow users to uniquely identify the source of the heartbeat.
  • Oracle GoldenGate binaries are no longer installed on a shared drive : Oracle always recommended installing the Oracle GoldenGate binaries (OGG_HOME) on a local file system as a best practice. From this release onward, it is a requirement. The binaries must be installed on local drives.
  • Partition Filtering
  • A new Extract needs to be created when the DB timezone is changed : You need to create new Extract if DB timezone is changed, especially in case of Oracle Cloud deployment.
  • DB_UNIQUE_NAME with trail file header : DB_UNIQUE_NAME is added in the trail file header along with DB_NAME, which helps in troubleshooting replication in active-active environments, where mostly all replicas have the same DB_NAME but identify each replica site uniquely using the DB_UNIQUE_NAME.
  • Per PDB Capture
  • Parallel Replicat Core Infrastructure Support for Heterogeneous Databases : Parallel Replicat is supported with SQL Server, DB2 z/OS, and MySQL.

Release announcement link : https://blogs.oracle.com/dataintegration/oracle-goldengate-21c-release-announcement

Hope It Helped
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

Kafka Producer: Error while fetching metadata with correlation id INVALID_REPLICATION_FACTOR

Posted by FatDBA on August 14, 2021

Hi Guys,

Recently I was working on a replication project where we used Kafka to move data from source to target. I tried to create a test topic using Kafka producer console and immediately kicked out with error which says “INVALID_REPLICATION_FACTOR”. This we were doing on a test VM with single CPU and with limited system resources.

[root@cantowintert bin]#
[root@cantowintert bin]# kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic first_topic
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
>hello prashant
[2021-07-26 08:18:30,051] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 40 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,154] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 41 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,260] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 42 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,367] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 43 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,471] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 44 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,576] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 45 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,681] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 46 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,788] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 47 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:30,896] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 48 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:31,000] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 49 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:31,103] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 50 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:18:31,221] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 51 : {first_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
^Corg.apache.kafka.common.KafkaException: Producer closed while send in progress
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:909)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:885)
        at kafka.tools.ConsoleProducer$.send(ConsoleProducer.scala:71)
        at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:53)
        at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
Caused by: org.apache.kafka.common.KafkaException: Requested metadata update after close
        at org.apache.kafka.clients.producer.internals.ProducerMetadata.awaitUpdate(ProducerMetadata.java:126)
        at org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata(KafkaProducer.java:1047)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:906)
        ... 4 more
[root@cantowintert bin]#

Lets check what is captured in Kafka server startup logs, and we found the hint that the RF is > than the available brokers.

org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 1 larger than available brokers: 0.
[2021-07-26 08:24:45,723] WARN [Controller id=0, targetBrokerId=0] Connection to node 0 (cantowintert.bcdomain/192.168.20.129:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:25:06,830] WARN [Controller id=0, targetBrokerId=0] Connection to node 0 (cantowintert.bcdomain/192.168.20.129:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2021-07-26 08:25:27,950] WARN [Controller id=0, targetBrokerId=0] Connection to node 0 (cantowintert.bcdomain/192.168.20.129:9092) could not be established. Broker may not be available. 

Solution to the problem is to uncomment this line and restart and try to edit the topic all over again.

listeners=PLAINTEXT://:9092

changed this to

listeners=PLAINTEXT://127.0.0.1:9092

Hope It helped
Prashant Dixit

Posted in Uncategorized | Tagged: , | 1 Comment »

What is that strange looking TRANLOGOPTIONS EXCLUDETAG in parameter file ?

Posted by FatDBA on August 11, 2021

Hi Guys,

Recently someone asked why we have this strange looking entry in Golden Gate extract parameter file which reads ‘TRANLOGOPTIONS EXCLUDETAG 00‘. Why is that, what are those numbers ? I was able to explain him the purpose, would like to write a short post about it.

OGG v12.1.2 has a new EXTRACT parameter TRANLOGOPTIONS EXCLUDETAG. This is typically used to exclude the REPLICAT user in bi-directional configurations. When Extract is in classic or integrated capture mode, use the TRANLOGOPTIONS parameter with the EXCLUDETAG tag option. This parameter directs the Extract process to ignore transactions that are tagged with the specified redo tag. For example:

extract exttestpd
useridalias clouduser
EXTTRAIL ./dirdat/rp, format release 12.1
ddl include all
ddloptions addtrandata, report
tranlogoption excludetag 00 
TABLE dixituser.*;

Changes made by Integrated REP are tagged by default in redo as 00. So adding the EXTRACT parameter TRANLOGOPTIONS EXCLUDETAG 00 Would exclude those operations. The tag can also be explicitly set in REPLICAT using:

DBOPTIONS SETTAG 0885

Then in EXTRACT param:

TRANLOGOPTIONS EXCLUDETAG 0885

The TRANLOGOPTION EXCLUDETAG 00 prevents Golden Gate extract from capturing transactions from the replication which are by default tagged with “00”. The excludetag will ensure that the we don’t run into problems with ping-pong updates.

Some other possible examples of using this parameter are …

TRANLOGOPTIONS EXCLUDETAG 00
TRANLOGOPTIONS EXCLUDETAG +
TRANLOGOPTIONS EXCLUDETAG 0991
TRANLOGOPTIONS EXCLUDETAG 2222 4444

Hope It Helped
Prashant Dixit

Posted in Uncategorized | Tagged: , , | Leave a Comment »

pg_cron : Probably the best way to schedule jobs within PostgreSQL database.

Posted by FatDBA on July 30, 2021

Hi Guys,

Many of databases like Oracle (DBMS_SCHEDULER), MySQL (event_scheduler), Micrsoft SQL Server (SQL Server Agent) have their own build-in job/command scheduling mechanisms, PostgreSQL still does’t have any native or in-build scheduling methods like those databases. There are few popular extensions like pg_cron, pgAgent etc. which can be used to get the scheduling functionality in your PostgreSQL database.

This post will be all about installing, configuring and doing some tests using pg_cron extension and see what all it has to offer. pg_cron which is a simple cron-based job scheduler for PostgreSQL ( >= 9.5) that runs inside the database as an extension. It uses the same grammatical arrangement as regular cron, but it allows you to schedule PostgreSQL commands directly. You can schedule your SQL Statements, stored procedures and tasks like VACUUM, VACUUM ANALYZE

Though there are few limitations with the approach

  • You can’t use pg_cron jobs to schedule a job in seconds interval. For example, you can’t schedule a cron job to run every 10 seconds. For such scenarios, it’s better to write a shell script that uses the sleep command. Still a good option to try!
  • pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes.

but still a best bet …

Okay, let’s start with the installation. I am using YUM to install the package. I am doing this test on Open Source PostgreSQL 12.6.7

[root@canttowin ~]# yum install pg_cron_12.x86_64
Loaded plugins: langpacks, ulninfo

Resolving Dependencies
--> Running transaction check
---> Package pg_cron_12.x86_64 0:1.3.1-1.rhel7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================================
 Package                               Arch                              Version                                     Repository                         Size
=============================================================================================================================================================
Installing:
 pg_cron_12                            x86_64                            1.3.1-1.rhel7                               pgdg12                             90 k

Transaction Summary
=============================================================================================================================================================
Install  1 Package

Total download size: 90 k
Installed size: 178 k
Is this ok [y/d/N]: y
Downloading packages:
pg_cron_12-1.3.1-1.rhel7.x86_64.rpm                                                                                                   |  90 kB  00:00:02
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : pg_cron_12-1.3.1-1.rhel7.x86_64                                                                                                           1/1
  Verifying  : pg_cron_12-1.3.1-1.rhel7.x86_64                                                                                                           1/1

Installed:
  pg_cron_12.x86_64 0:1.3.1-1.rhel7

Complete!
You have new mail in /var/spool/mail/root
[root@canttowin ~]#

Next, I will be modifying the parameter ‘shared_preload_libraries‘ within postgresql.conf parameter file which is to specifies one or more shared libraries to be preloaded at server start. Will do a database reboot after that to make changes persistent.

[postgres@canttowin data]$
[postgres@canttowin data]$ more postgresql.conf |grep shared
#shared_preload_libraries = ''  # (change requires restart)


[postgres@canttowin data]$
[postgres@canttowin data]$ more postgresql.conf |grep shared
shared_preload_libraries = 'pg_cron'    # (change requires restart)


[postgres@canttowin bin]$
[postgres@canttowin bin]$ ./pg_ctl -D /var/lib/pgsql/12/data/ stop
waiting for server to shut down.... done
server stopped
[postgres@canttowin bin]$ ./pg_ctl -D /var/lib/pgsql/12/data/ start
waiting for server to start....2021-07-24 11:14:00.948 EDT [14074] LOG:  starting PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2021-07-24 11:14:00.948 EDT [14074] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-07-24 11:14:00.948 EDT [14074] LOG:  listening on IPv6 address "::", port 5432
2021-07-24 11:14:00.966 EDT [14074] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-07-24 11:14:00.982 EDT [14074] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-07-24 11:14:01.020 EDT [14074] LOG:  redirecting log output to logging collector process
2021-07-24 11:14:01.020 EDT [14074] HINT:  Future log output will appear in directory "log".
 done
server started
[postgres@canttowin bin]$

Next, if you want to use any specific database to create the extension, you have to an additional entry to configuration file with cron.database_name= ‘name of the database’

Now, you will see one more background process ‘pg_cron launcher‘ comes in to existence and can be seen running on the system.

Alright, all set. Now will create the extension on the database and will do some tests.

[postgres@canttowin bin]$
[postgres@canttowin bin]$ psql -p 5432
psql (12.6.7)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in "/tmp" at port "5432".
postgres=#
postgres=#
postgres=#
postgres=# CREATE EXTENSION pg_cron;
CREATE EXTENSION
postgres=#
postgres=#
postgres=# GRANT USAGE ON SCHEMA cron TO postgres;
GRANT
postgres=#



-- How to check job lists.
postgres=# SELECT * FROM cron.job;
 jobid | schedule | command | nodename | nodeport | database | username | active | jobname
-------+----------+---------+----------+----------+----------+----------+--------+---------
(0 rows)

-- To schedule a VACUUM ANALYZE on each day at 11:00 AM
postgres=# SELECT cron.schedule('0 11 * * *', 'VACUUM ANALYZE');
 schedule
----------
        1
(1 row)

postgres=# SELECT * FROM cron.job;
 jobid |  schedule  |    command     | nodename  | nodeport | database | username | active | jobname
-------+------------+----------------+-----------+----------+----------+----------+--------+---------
     1 | 0 11 * * * | VACUUM ANALYZE | localhost |     5432 | postgres | postgres | t      |
(1 row)

postgres=#

postgres=#
postgres=#
postgres=# table cron.job;

 jobid |  schedule  |    command     | nodename  | nodeport | database | username | active | jobname
-------+------------+----------------+-----------+----------+----------+----------+--------+---------
     1 | 0 11 * * * | VACUUM ANALYZE | localhost |     5432 | postgres | postgres | t      |
(1 row)


Lets do some more tests, will now create a table with some test records and will try to delete them using pg_cron.

postgres=#
postgres=# CREATE TABLE article (
postgres(#     article_id bigserial primary key,
postgres(#     article_name varchar(20) NOT NULL,
postgres(#     article_desc text NOT NULL,
postgres(#     date_added timestamp default NULL
postgres(# );
CREATE TABLE
postgres=#


postgres=# insert into article (article_id, article_name, article_desc, date_added) values (1010,'vinyl','Beatles 1980 Vinyl',current_timestamp);
postgres=# insert into article (article_id, article_name, article_desc, date_added) values (1011,'Tape','Deftones 2015 Vinyl',current_timestamp);
postgres=# insert into article (article_id, article_name, article_desc, date_added) values (1019,'Tape','Deftones 2015 Vinyl',current_timestamp);



postgres=#
postgres=# select * from article;
 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------
       1010 | vinyl        | Beatles 1980 Vinyl  | 2021-07-24 12:03:30.514886
       1019 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:03:40.870081
       1011 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:03:52.046054
(3 rows)

postgres=#


postgres=# \! date
Sat Jul 24 12:20:40 EDT 2021
postgres=#



-- To delete entries those are older than 1 minute
postgres=# SELECT cron.schedule('23 12 * * *', $$DELETE FROM article WHERE date_added<now()- interval '1 Mins'$$);
 schedule
----------
       2
(1 row)

postgres=#
postgres=# SELECT * FROM cron.job;
 jobid |  schedule   |                            command                            | nodename  | nodeport | database | username | active | jobname
-------+-------------+---------------------------------------------------------------+-----------+----------+----------+----------+--------+---------
     1 | */5 * * * * | VACUUM                                                        | localhost |     5432 | postgres | postgres | t      |
     2 | 23 12 * * * | DELETE FROM article WHERE date_added<now()- interval '1 Mins' | localhost |     5432 | postgres | postgres | t      |
(2 rows)


-- This is what it is before pg_cron executing the schedule.
postgres=#
postgres=#
postgres=# select * from article;
 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------
       1010 | vinyl        | Beatles 1980 Vinyl  | 2021-07-24 12:16:00.507391
       1011 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:05.762869
       1019 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:09.960464
(3 rows)


-- Will do a watch on table records to see if pg_cron comes and delete records on specified time.
postgres=#
postgres=# \watch 5
                  Sat 24 Jul 2021 12:22:43 PM EDT (every 5s)

 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------
       1010 | vinyl        | Beatles 1980 Vinyl  | 2021-07-24 12:16:00.507391
       1011 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:05.762869
       1019 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:09.960464
(3 rows)

                  Sat 24 Jul 2021 12:22:48 PM EDT (every 5s)

 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------
       1010 | vinyl        | Beatles 1980 Vinyl  | 2021-07-24 12:16:00.507391
       1011 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:05.762869
       1019 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:09.960464
(3 rows)

                  Sat 24 Jul 2021 12:22:53 PM EDT (every 5s)

 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------
       1010 | vinyl        | Beatles 1980 Vinyl  | 2021-07-24 12:16:00.507391
       1011 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:05.762869
       1019 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:09.960464
(3 rows)

                  Sat 24 Jul 2021 12:22:58 PM EDT (every 5s)

 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------
       1010 | vinyl        | Beatles 1980 Vinyl  | 2021-07-24 12:16:00.507391
       1011 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:05.762869
       1019 | Tape         | Deftones 2015 Vinyl | 2021-07-24 12:16:09.960464
(3 rows)

                  Sat 24 Jul 2021 12:23:03 PM EDT (every 5s)

 article_id | article_name |    article_desc     |         date_added
------------+--------------+---------------------+----------------------------

Yes, it did pretty smoothly on specified time! Let’s do some more fun.

postgres=#
postgres=# CREATE OR REPLACE PROCEDURE display_message (INOUT msg TEXT)
postgres-# AS $$ BEGIN
postgres$# RAISE NOTICE 'Procedure Parameter: %', msg ;
postgres$# END ;
postgres$# $$
postgres-# LANGUAGE plpgsql ;
CREATE PROCEDURE
postgres=#
postgres=# call display_message('This is my test case');
NOTICE:  Procedure Parameter: This is my test case
         msg
----------------------
 This is my test case
(1 row)



postgres=# SELECT cron.schedule('display_message', '* * * * *', 'CALL display_message()');
 schedule
----------
       14
(1 row)

postgres=# SELECT * FROM cron.job;
 jobid |  schedule   |                            command                            | nodename  | nodeport | database | username | active |     jobname
-------+-------------+---------------------------------------------------------------+-----------+----------+----------+----------+--------+-----------------
     9 | */5 * * * * | VACUUM                                                        | localhost |     5432 | postgres | postgres | t      |
    12 | 20 12 * * * | DELETE FROM article WHERE date_added<now()- interval '1 Mins' | localhost |     5432 | postgres | postgres | t      |
    13 | 0 3 * * *   | VACUUM article                                                | localhost |     5432 | postgres | postgres | t      |
    14 | * * * * *   | CALL display_message()                                        | localhost |     5432 | postgres | postgres | t      | display_message
(4 rows)

Now, how to delete a single schedule or all/multiple.

postgres=#  SELECT cron.unschedule(2);
 unschedule
------------
 t
(1 row)


postgres=#
postgres=# SELECT cron.unschedule(jobid) FROM cron.job;
 unschedule
------------
 t
 t

(2 rows)

postgres=#
postgres=# SELECT * FROM cron.job;
 jobid | schedule | command | nodename | nodeport | database | username | active | jobname
-------+----------+---------+----------+----------+----------+----------+--------+---------
(0 rows)

postgres=#

You can check database log files (postgresql-<day of the week>.log) to see if all okay with the jobs. If any of the scheduled job failed to execute, it will log more details to the log file, example, see below screenshot where one of the job with ID 14 trying to start but failed due to ‘connection refused‘ error. So, here we have got a hint to check pg_hba file and accordingly take the action to mitigate the problem.

After we applied the change …

Hope It Helped!
Prashant Dixit

Posted in Advanced | Tagged: , | 2 Comments »

My favorite 5 linux TOP features/flags ….

Posted by FatDBA on July 23, 2021

Hi Guys,

Recently I started the ‘Top 5’ series where I share my top 5 features in any particular tool or product. Last time I did for SQL Developer command line (SQLcl), this time it will be about one of Linux’s favorite tool ‘TOP‘.

So, without any particular order, below are my top 5 picks of options/flags available with the tool.

c : To get absolute path of the commands currently executing. This will be quite helpful if you want to know from where the command/executable is running. I have used this option lot of the times in multiple performance battles that I have won 🙂 Look at the screen below and in to column COMMAND.

z: will display running process in RED color which may help you to identified running process easily.

1: To get usage details per CPU core.

SHIFT+i : To disable the IRIX (default mode is ON) mode, help you to understand – Why Process CPU % Usage larger than Total CPU Time ?

E: To get memory details in KB, MB & GB, press ‘E’ each time. So, no need to separately call ‘free’ command.

I know this is all about top 5 picks, but here goes my 6th top (kinda reserve option) 🙂

V: Tree Mode, to display the processes in a parent child hierarchy.

Hope It Helped
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , | Leave a Comment »

How to fix/handle Wraparound problem in PostgreSQL ?

Posted by FatDBA on July 21, 2021

Hi Guys,

In continuation from where I left in my last post on simulating wraparounds in PostgreSQL, this post is all about managing the mess that we spilled 🙂 Though this was a synthetic/artificial test and as soon you close or exit SESSION 1 where you’ve opened a BEGIN block/transaction and ran select txid_current() to block autovacuum, it (autovacuum) will automatically kick-in and perform a quick full vacuum on all of the databases in the cluster. You have to remember that even if you have autovacuum turned off, it will still start in emergency mode.

So, this is what we got earlier at the time we hit by the wraparound problem when we tried to execute a DML and it says something about stopping postmaster and vacuum databases in single-user mode; What is that ? – We will see that next ….

enterprisedb=#
enterprisedb=# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
 ORDER BY 2 DESC;
 
   datname    |    age     | current_setting
--------------+------------+-----------------
 postgres     | 2146483647 | 200000000
 edb          | 2146483647 | 200000000
 template1    | 2146483647 | 200000000
 template0    | 2146483647 | 200000000
 immortal     | 2146483647 | 200000000
 darkthrone   | 2146483647 | 200000000
 immortal1    | 2146483647 | 200000000
 mayhem       | 2146483647 | 200000000
 murduk       | 2146483647 | 200000000
 burzum       | 2146483647 | 200000000
 gorgoroth    | 2146483647 | 200000000
 repmgr       | 2146483647 | 200000000
 enterprisedb | 2146483647 | 200000000
(13 rows)



enterprisedb=# update pgbench_history set aid=65555 where aid=64444;
ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
enterprisedb=#
enterprisedb=#

Okay, coming back to the point, how to handle the situation now when you’ve already hit the wraparound. So, first thing you have to manually shut down the database cluster, if it’s still up.

-bash-4.2$ pwd
/usr/edb/as11/bin

-bash-4.2$ /usr/edb/as11/bin/pg_ctl stop -D /var/lib/edb/as11/data
waiting for server to shut down.... done
server stopped
-bash-4.2$

Next, you need to run VACUUM FULL for each of the database (one-by-one using single mode) through stand-alone backend mode. For that you have to call postgres (if you’re on open source) or edb-postgres utility. I am on EDB PostgreSQL 11 so will be calling edb-postgres and execute VACUUM FULL.

-bash-4.2$ pwd
/usr/edb/as11/bin

-bash-4.2$
-bash-4.2$ ls *postgre*
edb-postgres
-bash-4.2$

-bash-4.2$ ./edb-postgres --single -D /var/lib/edb/as11/data enterprisedb

PostgreSQL stand-alone backend 11.11.20
backend> vacuum full;
2021-07-20 11:07:49 IST LOG:  statement: vacuum full;

backend>

It’s done for ‘enterprisedb’ database. Next will do it for ‘postgresql’ database, but this time I will use some stand-alone backend related command to assure we are connected with the right database and to enable runtime stats or verbose.

-bash-4.2$ ./edb-postgres --single -D /var/lib/edb/as11/data postgres

PostgreSQL stand-alone backend 11.11.20
backend>
backend> select current_database();
2021-07-20 13:45:25 IST LOG:  statement: select current_database();

         1: current_database    (typeid = 19, len = 64, typmod = -1, byval = f)
        ----
         1: current_database = "postgres"       (typeid = 19, len = 64, typmod = -1, byval = f)
        ----
backend> 


backend> SELECT set_config('log_statement_stats', 'on', false);
2021-07-20 13:48:55 IST LOG:  statement: SELECT set_config('log_statement_stats', 'on', false);

         1: set_config  (typeid = 25, len = -1, typmod = -1, byval = f)
        ----
         1: set_config = "on"   (typeid = 25, len = -1, typmod = -1, byval = f)
        ----
backend>

-- Now when we have enabled logging of stats, let's execute the VACUUM to see runtime stats
backend> vacuum full;
2021-07-20 13:49:29 IST LOG:  statement: vacuum full;

2021-07-20 13:49:30 IST LOG:  QUERY STATISTICS
2021-07-20 13:49:30 IST DETAIL:  ! system usage stats:
        !       0.557302 s user, 0.426659 s system, 120.357278 s elapsed
        !       [0.583998 s user, 0.455579 s system total]
        !       21756 kB max resident size
        !       224/402352 [240/402824] filesystem blocks in/out
        !       0/2558 [0/6201151] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       982/2 [1005/3] voluntary/involuntary context switches
2021-07-20 13:58:39 IST STATEMENT:  vacuum full;

backend>

Okay, same way we have to run VACUUM FULL on all of the databases and restart PostgreSQL cluster.
Once all of the databases are done and after restart, connect with the database and you will see the TXID or datfrozenid has been reset and you are all good.

enterprisedb=# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
 ORDER BY 2 DESC;

    datname    | age | current_setting
--------------+-----+-----------------
postgres     | 1369 | 200000000
edb          | 1272 | 200000000
template1    | 1170 | 200000000
template0    | 1071 | 200000000
immortal     |  974 | 200000000
darkthrone   |  877 | 200000000
immortal1    |  780 | 200000000
mayhem       |  585 | 200000000
murduk       |  487 | 200000000
burzum       |  388 | 200000000
gorgoroth    |  291 | 200000000
repmgr       |  194 | 200000000
enterprisedb |   97 | 200000000

You’re all set!

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , , | Leave a Comment »

How to simulate the deadly, infamous, misunderstood & complex ‘Transaction Wraparound Problem’ in PostgreSQL …

Posted by FatDBA on July 20, 2021

Hi Guys,

Last week one of my good friend asked me an interesting question – ‘How to replicate the transaction wraparound problem’ in PostgreSQL ?

He has a demo/POC that he needs to do for one of his customer who is interested to see how it looks like in reality and how ‘deadly’ is the problem and how to fix this. He scanned almost everything on the Google, but unfortunately found nothing available or pre-cooked …

So, this might be the only post available on the internet explaining how to exactly simulate the transaction wraparound in the PostgreSQL Database. This one was really interesting and I have never tried this before as I thought I will never see transaction ID wraparound error 🙂 . So, before I jump in to the demo, we need to take care of few things. Use a system with good resources, I have performed this demo on an EDB PostgreSQL 11.11.20 database with 8 CPUs, 31 GB RAM, and it took me around 9 hours to exhaust all transactionIDs and reach the limit of 2,000,000,000.

Would like to thank Nikolay Samokhvalov for his help! 🙂

Before I start, let’s do this one more time – What is a transaction ID and why it wrapsaround ?
The PostgreSQL transaction control mechanism assigns a transaction ID to every row that is modified in the database; these IDs control the visibility of that row to other concurrent transactions. Txid wraparound is a problem due to MVCC. MVCC relies on being able to take the txids of two transactions and determine which of the transactions came first.
A PostgreSQL database can have two billion ‘in-flight’ unvacuumed transactions before PostgreSQL takes dramatic action to avoid data loss. If the number of unvacuumed transactions reaches (2^31 – 10,000,000), the log starts warning that vacuuming is needed. If the number of unvacuumed transactions reaches 2,000,000,000, PostgreSQL sets the database to read-only mode and requires an offline, single-user, standalone vacuum.

Note: You have to carefully monitor your database pg_log, archiving locations as this activity will generate huge WALs. At the same time, keep an eye on the size of your log directory, specially if you are capturing all statements.

We will open multiple parallel sessions to simulate the workload and monitor the progress.

SESSION 1:

In the first session, we will first connect with the database and set idle_in_transaction_session_timeout to value 0 and start a BEGIN block and execute select txid_current(); this is to block autovacuum, as simply turning it off wouldn’t help because emegency mode kick in even with `autovacuum = off.

The idle_in_transaction_session_timeout parameter terminate any session with an open transaction that has been idle for longer than the specified duration in milliseconds. This allows any locks held by that session to be released and the connection slot to be reused; it also allows tuples visible only to this transaction to be vacuumed. So, we will have to set it to value 0 that will disable this feature.

-bash-4.2$ psql -p 5444
psql.bin (11.11.20)
Type "help" for help.

enterprisedb=# SET idle_in_transaction_session_timeout = 0;
SET
enterprisedb=#  BEGIN;
BEGIN
enterprisedb=# select txid_current();
 txid_current
--------------
         1266

enterprisedb=#
enterprisedb=#

SESSION 2:

Now, I will create a file with current transaction ID written in to it and will call the pgbench. I am doing something like below to call pgbench, below are the explanations on flags used and why I used them ..

pgbench -U enterprisedb enterprisedb -f wl.sql -c8 -j8 -T100000 -rn -P60

-f: to read from a file
-c8: 8 clients
-j8: 8 threads/jobs (adjust it as per your CPU count)
-T100000: seconds to run the test for this many seconds, rather than a fixed number of transactions per client
-rn: where r is to report the average per-statement latency (execution time from the perspective of the client) of each command after the benchmark finishes. Whereas n is to perform no vacuuming before running the test. This option is necessary if you are running a custom test scenario that does not include the standard tables pgbench_accounts, pgbench_branches, pgbench_history, and pgbench_tellers.
-P60: is to show progress report every sec seconds

-bash-4.2$ 
-bash-4.2$ echo 'select txid_current();' > wl.sql
-bash-4.2$ 
-bash-4.2$
-bash-4.2$ /usr/edb/as11/bin/pgbench -U enterprisedb enterprisedb -f wl.sql -c8 -j8 -T100000 -rn -P60
progress: 60.0 s, 51097.6 tps, lat 0.157 ms stddev 0.262
progress: 120.0 s, 51171.1 tps, lat 0.156 ms stddev 0.252
progress: 180.0 s, 51622.8 tps, lat 0.155 ms stddev 0.250
progress: 240.0 s, 51328.9 tps, lat 0.156 ms stddev 0.260
progress: 300.0 s, 50976.6 tps, lat 0.157 ms stddev 0.256
progress: 360.0 s, 51130.3 tps, lat 0.156 ms stddev 0.265
progress: 420.0 s, 50792.6 tps, lat 0.158 ms stddev 0.254
progress: 480.0 s, 51207.2 tps, lat 0.156 ms stddev 0.250
progress: 540.0 s, 51162.6 tps, lat 0.156 ms stddev 0.251
progress: 600.0 s, 51246.0 tps, lat 0.156 ms stddev 0.256
progress: 660.0 s, 51047.3 tps, lat 0.157 ms stddev 0.253
progress: 720.0 s, 51197.3 tps, lat 0.156 ms stddev 0.252
progress: 780.0 s, 51119.2 tps, lat 0.156 ms stddev 0.281
progress: 840.0 s, 50938.8 tps, lat 0.157 ms stddev 0.252
progress: 900.0 s, 51252.8 tps, lat 0.156 ms stddev 0.251
progress: 960.0 s, 51054.0 tps, lat 0.157 ms stddev 0.253
progress: 1020.0 s, 51159.5 tps, lat 0.156 ms stddev 0.254
progress: 1080.0 s, 51116.0 tps, lat 0.157 ms stddev 0.253
progress: 1140.0 s, 50947.6 tps, lat 0.157 ms stddev 0.251
progress: 1200.0 s, 51289.7 tps, lat 0.156 ms stddev 0.254
progress: 1260.0 s, 50338.7 tps, lat 0.159 ms stddev 0.254
progress: 1320.0 s, 50121.1 tps, lat 0.160 ms stddev 0.260
progress: 1380.0 s, 49936.6 tps, lat 0.160 ms stddev 0.291
progress: 1440.0 s, 50451.1 tps, lat 0.159 ms stddev 0.251
progress: 1500.0 s, 51107.4 tps, lat 0.157 ms stddev 0.248
progress: 1560.0 s, 50907.8 tps, lat 0.157 ms stddev 0.321
progress: 1620.0 s, 51033.7 tps, lat 0.157 ms stddev 0.251
progress: 1680.0 s, 51013.6 tps, lat 0.157 ms stddev 0.254
progress: 1740.0 s, 51119.4 tps, lat 0.156 ms stddev 0.252
progress: 1800.0 s, 51411.9 tps, lat 0.156 ms stddev 0.249
progress: 1860.0 s, 51099.1 tps, lat 0.157 ms stddev 0.249
progress: 1920.0 s, 51527.5 tps, lat 0.155 ms stddev 0.256
progress: 1980.0 s, 51204.3 tps, lat 0.156 ms stddev 0.267
progress: 2040.0 s, 51363.8 tps, lat 0.156 ms stddev 0.249
progress: 2100.0 s, 51563.1 tps, lat 0.155 ms stddev 0.250
progress: 2160.0 s, 51329.3 tps, lat 0.156 ms stddev 0.246
progress: 2220.0 s, 51073.4 tps, lat 0.157 ms stddev 0.276
progress: 2280.0 s, 51402.0 tps, lat 0.156 ms stddev 0.248
progress: 2340.0 s, 51364.4 tps, lat 0.156 ms stddev 0.253
progress: 2400.0 s, 50890.0 tps, lat 0.157 ms stddev 0.251
progress: 2460.0 s, 50239.6 tps, lat 0.159 ms stddev 0.296
progress: 2520.0 s, 50684.8 tps, lat 0.158 ms stddev 0.283
progress: 2580.0 s, 49826.2 tps, lat 0.161 ms stddev 0.271
progress: 2640.0 s, 51076.7 tps, lat 0.157 ms stddev 0.250
progress: 2700.0 s, 51349.6 tps, lat 0.156 ms stddev 0.252
progress: 2760.0 s, 51076.6 tps, lat 0.157 ms stddev 0.253
progress: 2820.0 s, 50854.5 tps, lat 0.157 ms stddev 0.256
progress: 2880.0 s, 49864.8 tps, lat 0.160 ms stddev 0.260
progress: 2940.0 s, 50450.1 tps, lat 0.159 ms stddev 0.264
progress: 3000.0 s, 50334.7 tps, lat 0.159 ms stddev 0.264
progress: 3060.0 s, 50514.3 tps, lat 0.158 ms stddev 0.299
progress: 3120.0 s, 50508.7 tps, lat 0.158 ms stddev 0.284
progress: 3180.0 s, 50691.3 tps, lat 0.158 ms stddev 0.288
progress: 3240.0 s, 50601.6 tps, lat 0.158 ms stddev 0.256
progress: 3300.0 s, 50280.2 tps, lat 0.159 ms stddev 0.264
progress: 3360.0 s, 50457.6 tps, lat 0.159 ms stddev 0.265
progress: 3420.0 s, 50178.3 tps, lat 0.159 ms stddev 0.390
progress: 3480.0 s, 50353.3 tps, lat 0.159 ms stddev 0.264
progress: 3540.0 s, 50133.0 tps, lat 0.160 ms stddev 0.259
progress: 3600.0 s, 50217.6 tps, lat 0.159 ms stddev 0.282
progress: 3660.0 s, 50147.8 tps, lat 0.160 ms stddev 0.279
progress: 3720.0 s, 50578.0 tps, lat 0.158 ms stddev 0.261
progress: 3780.0 s, 50523.9 tps, lat 0.158 ms stddev 0.261
progress: 3840.0 s, 50349.7 tps, lat 0.159 ms stddev 0.269
progress: 3900.0 s, 50437.5 tps, lat 0.159 ms stddev 0.287
progress: 3960.0 s, 49569.1 tps, lat 0.161 ms stddev 0.311
progress: 4020.0 s, 50337.3 tps, lat 0.159 ms stddev 0.286
progress: 4080.0 s, 50065.2 tps, lat 0.160 ms stddev 0.260
progress: 4140.0 s, 49791.4 tps, lat 0.161 ms stddev 0.264
progress: 4200.0 s, 49681.3 tps, lat 0.161 ms stddev 0.326
progress: 4260.0 s, 48034.2 tps, lat 0.167 ms stddev 0.340
progress: 4320.0 s, 50460.9 tps, lat 0.159 ms stddev 0.290
progress: 4380.0 s, 49731.0 tps, lat 0.161 ms stddev 0.300
progress: 4440.0 s, 50441.1 tps, lat 0.159 ms stddev 0.266
progress: 4500.0 s, 50259.1 tps, lat 0.159 ms stddev 0.267
progress: 4560.0 s, 50197.0 tps, lat 0.159 ms stddev 0.259
progress: 4620.0 s, 49832.1 tps, lat 0.161 ms stddev 0.271
progress: 4680.0 s, 50160.3 tps, lat 0.159 ms stddev 0.268
progress: 4740.0 s, 50578.5 tps, lat 0.158 ms stddev 0.262
progress: 4800.0 s, 50260.5 tps, lat 0.159 ms stddev 0.270
progress: 4860.0 s, 50545.1 tps, lat 0.158 ms stddev 0.256
progress: 4920.0 s, 49961.8 tps, lat 0.160 ms stddev 0.285
progress: 4980.0 s, 50450.7 tps, lat 0.159 ms stddev 0.267
progress: 5040.0 s, 50008.7 tps, lat 0.160 ms stddev 0.257
progress: 5100.0 s, 49723.2 tps, lat 0.161 ms stddev 0.428
progress: 5160.0 s, 50211.1 tps, lat 0.159 ms stddev 0.378
progress: 5220.0 s, 49791.0 tps, lat 0.161 ms stddev 0.262
progress: 5280.0 s, 50091.5 tps, lat 0.160 ms stddev 0.265
progress: 5340.0 s, 50280.3 tps, lat 0.159 ms stddev 0.266
progress: 5400.0 s, 49957.1 tps, lat 0.160 ms stddev 0.402
progress: 5460.0 s, 50101.7 tps, lat 0.160 ms stddev 0.293
progress: 5520.0 s, 50387.3 tps, lat 0.159 ms stddev 0.261
progress: 5580.0 s, 50173.2 tps, lat 0.159 ms stddev 0.278
progress: 5640.0 s, 50225.1 tps, lat 0.159 ms stddev 0.266
progress: 5700.0 s, 50186.7 tps, lat 0.159 ms stddev 0.264
progress: 5760.0 s, 49120.1 tps, lat 0.163 ms stddev 0.264
progress: 5820.0 s, 49445.3 tps, lat 0.162 ms stddev 0.262
progress: 5880.0 s, 49547.2 tps, lat 0.161 ms stddev 0.260
progress: 5940.0 s, 49730.2 tps, lat 0.161 ms stddev 0.267
progress: 6000.0 s, 50543.3 tps, lat 0.158 ms stddev 0.258
progress: 6060.0 s, 50569.1 tps, lat 0.158 ms stddev 0.258
progress: 6120.0 s, 50560.4 tps, lat 0.158 ms stddev 0.250
progress: 6180.0 s, 50572.1 tps, lat 0.158 ms stddev 0.255
progress: 6240.0 s, 50551.6 tps, lat 0.158 ms stddev 0.253
progress: 6300.0 s, 50993.8 tps, lat 0.157 ms stddev 0.254
progress: 6360.0 s, 51114.6 tps, lat 0.157 ms stddev 0.257
progress: 6420.0 s, 50762.6 tps, lat 0.158 ms stddev 0.261
progress: 6480.0 s, 50814.2 tps, lat 0.157 ms stddev 0.250
progress: 6540.0 s, 50735.7 tps, lat 0.158 ms stddev 0.257
progress: 6600.0 s, 50521.9 tps, lat 0.158 ms stddev 0.263
progress: 6660.0 s, 50829.8 tps, lat 0.157 ms stddev 0.252
progress: 6720.0 s, 50933.7 tps, lat 0.157 ms stddev 0.255
progress: 6780.0 s, 51152.9 tps, lat 0.156 ms stddev 0.252
progress: 6840.0 s, 50457.1 tps, lat 0.159 ms stddev 0.255
progress: 6900.0 s, 50542.5 tps, lat 0.158 ms stddev 0.251
progress: 6960.0 s, 50802.5 tps, lat 0.157 ms stddev 0.282
progress: 7020.0 s, 51066.1 tps, lat 0.157 ms stddev 0.270
progress: 7080.0 s, 50729.0 tps, lat 0.158 ms stddev 0.258
progress: 7140.0 s, 51242.7 tps, lat 0.156 ms stddev 0.255
progress: 7200.0 s, 50714.2 tps, lat 0.158 ms stddev 0.253
progress: 7260.0 s, 50704.7 tps, lat 0.158 ms stddev 0.323
progress: 7320.0 s, 50951.4 tps, lat 0.157 ms stddev 0.250
progress: 7380.0 s, 50604.8 tps, lat 0.158 ms stddev 0.340
progress: 7440.0 s, 51217.0 tps, lat 0.156 ms stddev 0.251
progress: 7500.0 s, 50832.4 tps, lat 0.157 ms stddev 0.251
progress: 7560.0 s, 50815.7 tps, lat 0.157 ms stddev 0.256
progress: 7620.0 s, 49936.3 tps, lat 0.160 ms stddev 0.463
progress: 7680.0 s, 50556.4 tps, lat 0.158 ms stddev 0.251
progress: 7740.0 s, 50699.8 tps, lat 0.158 ms stddev 0.255
progress: 7800.0 s, 50474.0 tps, lat 0.158 ms stddev 0.256
progress: 7860.0 s, 50401.9 tps, lat 0.159 ms stddev 0.321
progress: 7920.0 s, 51385.5 tps, lat 0.156 ms stddev 0.251
progress: 7980.0 s, 50671.9 tps, lat 0.158 ms stddev 0.248
progress: 8040.0 s, 50631.3 tps, lat 0.158 ms stddev 0.252
progress: 8100.0 s, 50741.3 tps, lat 0.158 ms stddev 0.255
progress: 8160.0 s, 50465.5 tps, lat 0.159 ms stddev 0.257
progress: 8220.0 s, 51384.9 tps, lat 0.156 ms stddev 0.212
progress: 8280.0 s, 51676.8 tps, lat 0.155 ms stddev 0.233
progress: 8340.0 s, 52049.3 tps, lat 0.154 ms stddev 0.179
progress: 8400.0 s, 52394.6 tps, lat 0.153 ms stddev 0.178
progress: 8460.0 s, 51872.3 tps, lat 0.154 ms stddev 0.192
progress: 8520.0 s, 51896.9 tps, lat 0.154 ms stddev 0.178
progress: 8580.0 s, 52187.5 tps, lat 0.153 ms stddev 0.178
progress: 8640.0 s, 51773.8 tps, lat 0.155 ms stddev 0.185
progress: 8700.0 s, 51862.7 tps, lat 0.154 ms stddev 0.184
progress: 8760.0 s, 51906.7 tps, lat 0.154 ms stddev 0.189
progress: 8820.0 s, 52038.6 tps, lat 0.154 ms stddev 0.189
progress: 8880.0 s, 52370.9 tps, lat 0.153 ms stddev 0.180
progress: 8940.0 s, 52155.2 tps, lat 0.153 ms stddev 0.185
progress: 9000.0 s, 52162.2 tps, lat 0.153 ms stddev 0.219
progress: 9060.0 s, 51900.7 tps, lat 0.154 ms stddev 0.189
progress: 9120.0 s, 51794.5 tps, lat 0.154 ms stddev 0.178
progress: 9180.0 s, 51885.1 tps, lat 0.154 ms stddev 0.180
progress: 9240.0 s, 52007.4 tps, lat 0.154 ms stddev 0.177
progress: 9300.0 s, 52168.3 tps, lat 0.153 ms stddev 0.189
progress: 9360.0 s, 51801.3 tps, lat 0.154 ms stddev 0.182
progress: 9420.0 s, 51899.8 tps, lat 0.154 ms stddev 0.173
progress: 9480.0 s, 52215.3 tps, lat 0.153 ms stddev 0.177
progress: 9540.0 s, 52129.9 tps, lat 0.153 ms stddev 0.227
progress: 9600.0 s, 51929.9 tps, lat 0.154 ms stddev 0.176
progress: 9660.0 s, 51776.6 tps, lat 0.155 ms stddev 0.181
progress: 9720.0 s, 51542.8 tps, lat 0.155 ms stddev 0.268
progress: 9780.0 s, 51352.5 tps, lat 0.156 ms stddev 0.310
progress: 9840.0 s, 51661.0 tps, lat 0.155 ms stddev 0.177
progress: 9900.0 s, 51219.8 tps, lat 0.156 ms stddev 0.260
progress: 9960.0 s, 52235.8 tps, lat 0.153 ms stddev 0.184
progress: 10020.0 s, 51904.5 tps, lat 0.154 ms stddev 0.186
progress: 10080.0 s, 52066.8 tps, lat 0.154 ms stddev 0.183
progress: 10140.0 s, 52212.5 tps, lat 0.153 ms stddev 0.209
progress: 10200.0 s, 52402.8 tps, lat 0.153 ms stddev 0.186
progress: 10260.0 s, 52194.8 tps, lat 0.153 ms stddev 0.178
progress: 10320.0 s, 52117.3 tps, lat 0.153 ms stddev 0.179
progress: 10380.0 s, 52060.5 tps, lat 0.154 ms stddev 0.181
progress: 10440.0 s, 52043.1 tps, lat 0.154 ms stddev 0.183
progress: 10500.0 s, 52358.5 tps, lat 0.153 ms stddev 0.178
progress: 10560.0 s, 51361.2 tps, lat 0.156 ms stddev 0.179
progress: 10620.0 s, 51655.7 tps, lat 0.155 ms stddev 0.184
progress: 10680.0 s, 51501.1 tps, lat 0.155 ms stddev 0.176
progress: 10740.0 s, 51315.4 tps, lat 0.156 ms stddev 0.182
progress: 10800.0 s, 52344.5 tps, lat 0.153 ms stddev 0.181
progress: 10860.0 s, 51687.0 tps, lat 0.155 ms stddev 0.182
progress: 10920.0 s, 51150.2 tps, lat 0.156 ms stddev 0.179
progress: 10980.0 s, 51341.9 tps, lat 0.156 ms stddev 0.181
progress: 11040.0 s, 51074.0 tps, lat 0.157 ms stddev 0.184
progress: 11100.0 s, 51645.4 tps, lat 0.155 ms stddev 0.277
progress: 11160.0 s, 51984.6 tps, lat 0.154 ms stddev 0.182
progress: 11220.0 s, 51993.4 tps, lat 0.154 ms stddev 0.181
progress: 11280.0 s, 51894.9 tps, lat 0.154 ms stddev 0.183
progress: 11340.0 s, 52007.2 tps, lat 0.154 ms stddev 0.186
progress: 11400.0 s, 52255.9 tps, lat 0.153 ms stddev 0.188
progress: 11460.0 s, 52221.2 tps, lat 0.153 ms stddev 0.183
progress: 11520.0 s, 51990.5 tps, lat 0.154 ms stddev 0.181
progress: 11580.0 s, 52138.3 tps, lat 0.153 ms stddev 0.178
progress: 11640.0 s, 51258.2 tps, lat 0.156 ms stddev 0.178
progress: 11700.0 s, 52268.0 tps, lat 0.153 ms stddev 0.179
progress: 11760.0 s, 51838.4 tps, lat 0.154 ms stddev 0.180
progress: 11820.0 s, 52079.2 tps, lat 0.154 ms stddev 0.185
progress: 11880.0 s, 52531.4 tps, lat 0.152 ms stddev 0.175
progress: 11940.0 s, 51711.7 tps, lat 0.155 ms stddev 0.260
progress: 12000.0 s, 52020.7 tps, lat 0.154 ms stddev 0.180
progress: 12060.0 s, 51990.1 tps, lat 0.154 ms stddev 0.241
progress: 12120.0 s, 51367.6 tps, lat 0.156 ms stddev 0.193
progress: 12180.0 s, 51316.0 tps, lat 0.156 ms stddev 0.187
progress: 12240.0 s, 50819.0 tps, lat 0.157 ms stddev 0.190
progress: 12300.0 s, 51614.6 tps, lat 0.155 ms stddev 0.182
progress: 12360.0 s, 51500.6 tps, lat 0.155 ms stddev 0.187
progress: 12420.0 s, 51715.8 tps, lat 0.155 ms stddev 0.259
progress: 12480.0 s, 51805.0 tps, lat 0.154 ms stddev 0.183
progress: 12540.0 s, 51846.2 tps, lat 0.154 ms stddev 0.183
progress: 12600.0 s, 51829.6 tps, lat 0.154 ms stddev 0.179
progress: 12660.0 s, 51411.1 tps, lat 0.156 ms stddev 0.221
progress: 12720.0 s, 51728.5 tps, lat 0.155 ms stddev 0.187
progress: 12780.0 s, 52170.5 tps, lat 0.153 ms stddev 0.181
progress: 12840.0 s, 51985.8 tps, lat 0.154 ms stddev 0.182
progress: 12900.0 s, 51723.3 tps, lat 0.155 ms stddev 0.181
progress: 12960.0 s, 52324.2 tps, lat 0.153 ms stddev 0.184
progress: 13020.0 s, 51795.5 tps, lat 0.154 ms stddev 0.189
progress: 13080.0 s, 51547.3 tps, lat 0.155 ms stddev 0.373
progress: 13140.0 s, 51592.4 tps, lat 0.155 ms stddev 0.231
progress: 13200.0 s, 52004.8 tps, lat 0.154 ms stddev 0.223
progress: 13260.0 s, 52118.3 tps, lat 0.153 ms stddev 0.184
progress: 13320.0 s, 51781.1 tps, lat 0.154 ms stddev 0.183
progress: 13380.0 s, 50621.8 tps, lat 0.158 ms stddev 0.196
progress: 13440.0 s, 50671.8 tps, lat 0.158 ms stddev 0.321
progress: 13500.0 s, 51389.4 tps, lat 0.156 ms stddev 0.181
progress: 13560.0 s, 51653.2 tps, lat 0.155 ms stddev 0.186
progress: 13620.0 s, 51491.7 tps, lat 0.155 ms stddev 0.181
progress: 13680.0 s, 51639.4 tps, lat 0.155 ms stddev 0.183
progress: 13740.0 s, 51385.3 tps, lat 0.156 ms stddev 0.188
progress: 13800.0 s, 51640.2 tps, lat 0.155 ms stddev 0.177
progress: 13860.0 s, 51820.7 tps, lat 0.154 ms stddev 0.176
progress: 13920.0 s, 52041.9 tps, lat 0.154 ms stddev 0.179
progress: 13980.0 s, 52001.0 tps, lat 0.154 ms stddev 0.178
progress: 14040.0 s, 52189.3 tps, lat 0.153 ms stddev 0.176
progress: 14100.0 s, 51897.9 tps, lat 0.154 ms stddev 0.179
progress: 14160.0 s, 52066.4 tps, lat 0.154 ms stddev 0.180
progress: 14220.0 s, 52257.9 tps, lat 0.153 ms stddev 0.178
progress: 14280.0 s, 51850.1 tps, lat 0.154 ms stddev 0.210
......
.........
...........

SESSION 3:

Now in this third session we will monitor the database to see what’s going on. You can do something like below to check where we are going.

Below SQL will show you the ‘datfrozenxid‘ which is a value for all transaction IDs before this one have been replaced with a permanent (“frozen”) transaction ID in this database. This is used to track whether the database needs to be vacuumed in order to prevent transaction ID wraparound. And to see ‘autovacuum_freeze_max_age‘ that is age at which to autovacuum a table to prevent transaction ID wraparound.

Do a watch (\watch 10) on the command to see the progress and the play live 🙂

SELECT datname
    , age(datfrozenxid)
    , current_setting('autovacuum_freeze_max_age') 
FROM pg_database 
ORDER BY 2 DESC;


   datname    |   age   | current_setting
--------------+---------+-----------------
 postgres     | 3058531 | 200000000
 edb          | 3058531 | 200000000
 template1    | 3058531 | 200000000
 template0    | 3058531 | 200000000
 immortal     | 3058531 | 200000000
 darkthrone   | 3058531 | 200000000
 immortal1    | 3058531 | 200000000
 mayhem       | 3058531 | 200000000
 marduk       | 3058531 | 200000000
 burzum       | 3058531 | 200000000
 gorgoroth    | 3058531 | 200000000
 repmgr       | 3058531 | 200000000
 enterprisedb | 3058531 | 200000000
(13 rows)

I have used below query to check where we are going, current XIDs, how close we are to our target of wraparound ..

enterprisedb=# WITH max_age AS ( 
    SELECT 2000000000 as max_old_xid
        , setting AS autovacuum_freeze_max_age 
        FROM pg_catalog.pg_settings 
        WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS ( 
    SELECT datname
        , m.max_old_xid::int
        , m.autovacuum_freeze_max_age::int
        , age(d.datfrozenxid) AS oldest_current_xid 
    FROM pg_catalog.pg_database d 
    JOIN max_age m ON (true) 
    WHERE d.datallowconn ) 
SELECT max(oldest_current_xid) AS oldest_current_xid
    , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
    , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac 
FROM per_database_stats


\watch 10

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          289007513 |                         14 |                               145
(1 row)



 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          290077770 |                         15 |                               145
(1 row)


 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          292467983 |                         15 |                               146
(1 row)


                     Mon 19 Jul 2021 08:38:01 PM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          342714029 |                         17 |                               171
(1 row)

                     Mon 19 Jul 2021 08:38:11 PM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          343223530 |                         17 |                               172
(1 row)


                     Mon 19 Jul 2021 09:17:45 PM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          406185619 |                         20 |                               203
(1 row)


 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          421779698 |                         21 |                               211
(1 row)

                     Mon 19 Jul 2021 09:28:10 PM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          438446236 |                         22 |                               219
(1 row)



                     Mon 19 Jul 2021 11:03:03 PM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          733479765 |                         37 |                               367
(1 row)



 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          763852636 |                         38 |                               382
(1 row)


                     Mon 19 Jul 2021 11:33:03 PM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          826485794 |                         41 |                               413
(1 row)


                     Tue 20 Jul 2021 12:18:07 AM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          966692697 |                         48 |                               483
(1 row)


 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
         1107823196 |                         55 |                               554
 

                     Tue 20 Jul 2021 01:09:18 AM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
         1125393734 |                         56 |                               563
(1 row)


                     Tue 20 Jul 2021 01:43:11 AM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
         1230984982 |                         62 |                               615
(1 row)





-- LETS SEE WHAT'S HAPPENING ON THE DATABASE 
enterprisedb=# \x
Expanded display is on.
enterprisedb=#
enterprisedb=# select * from pg_stat_activity;


-[ RECORD 4 ]----+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 24056
usesysid         | 10
usename          | enterprisedb
application_name | psql.bin
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:45:44.833016 +05:30
xact_start       | 20-JUL-21 09:45:57.749892 +05:30
query_start      | 20-JUL-21 09:46:01.042977 +05:30
state_change     | 20-JUL-21 09:46:01.047161 +05:30
wait_event_type  | Client
wait_event       | ClientRead
state            | idle in transaction
backend_xid      | 1195
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 5 ]----+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 25884
usesysid         | 10
usename          | enterprisedb
application_name | psql.bin
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:47:36.766729 +05:30
xact_start       | 20-JUL-21 09:52:18.941776 +05:30
query_start      | 20-JUL-21 09:52:18.941776 +05:30
state_change     | 20-JUL-21 09:52:18.941778 +05:30
wait_event_type  |
wait_event       |
state            | active
backend_xid      |
backend_xmin     | 1195
query            | select * from pg_stat_activity;
backend_type     | client backend
-[ RECORD 6 ]----+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27150
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.70284 +05:30
xact_start       | 20-JUL-21 09:52:18.941429 +05:30
query_start      | 20-JUL-21 09:52:18.941429 +05:30
state_change     | 20-JUL-21 09:52:18.94143 +05:30
wait_event_type  | IPC
wait_event       | ClogGroupUpdate
state            | active
backend_xid      | 4526420
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 7 ]----+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27151
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.704062 +05:30
xact_start       |
query_start      | 20-JUL-21 09:52:18.943293 +05:30
state_change     | 20-JUL-21 09:52:18.943411 +05:30
wait_event_type  | Client
wait_event       | ClientRead
state            | idle
backend_xid      |
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 8 ]----+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27153
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.7067 +05:30
xact_start       |
query_start      | 20-JUL-21 09:52:18.943218 +05:30
state_change     | 20-JUL-21 09:52:18.943334 +05:30
wait_event_type  |
wait_event       |
state            | idle
backend_xid      |
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 9 ]----+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27154
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.707948 +05:30
xact_start       | 20-JUL-21 09:52:18.941116 +05:30
query_start      | 20-JUL-21 09:52:18.941116 +05:30
state_change     | 20-JUL-21 09:52:18.941117 +05:30
wait_event_type  | IPC
wait_event       | ClogGroupUpdate
state            | active
backend_xid      | 4526414
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 10 ]---+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27152
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.710155 +05:30
xact_start       | 20-JUL-21 09:52:18.943403 +05:30
query_start      | 20-JUL-21 09:52:18.943403 +05:30
state_change     | 20-JUL-21 09:52:18.943405 +05:30
wait_event_type  |
wait_event       |
state            | active
backend_xid      |
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 11 ]---+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27155
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.710785 +05:30
xact_start       | 20-JUL-21 09:52:18.941493 +05:30
query_start      | 20-JUL-21 09:52:18.941493 +05:30
state_change     | 20-JUL-21 09:52:18.941494 +05:30
wait_event_type  | IPC
wait_event       | ClogGroupUpdate
state            | active
backend_xid      | 4526421
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 12 ]---+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27156
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.711762 +05:30
xact_start       |
query_start      | 20-JUL-21 09:52:18.942582 +05:30
state_change     | 20-JUL-21 09:52:18.942704 +05:30
wait_event_type  | Client
wait_event       | ClientRead
state            | idle
backend_xid      |
backend_xmin     |
query            | select txid_current();
backend_type     | client backend
-[ RECORD 13 ]---+---------------------------------
datid            | 16384
datname          | enterprisedb
pid              | 27157
usesysid         | 10
usename          | enterprisedb
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 20-JUL-21 09:49:56.711944 +05:30
xact_start       | 20-JUL-21 09:52:18.943345 +05:30
query_start      | 20-JUL-21 09:52:18.943345 +05:30
state_change     | 20-JUL-21 09:52:18.943346 +05:30
wait_event_type  | Client
wait_event       | ClientRead
state            | active
backend_xid      | 4526447
backend_xmin     | 1195
query            | select txid_current();
backend_type     | client backend
-[ RECORD 14 ]---+---------------------------------



SESSION 4:

Open another session to check log files, if they are reporting any hints, warnings or errors
After ~1 hour of test run, I have started seeing below warnings in the log files.

2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST WARNING:  oldest xmin is far in the past
2021-07-19 20:15:26 IST HINT:  Close open transactions soon to avoid wraparound problems.
        You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST WARNING:  oldest xmin is far in the past
2021-07-19 20:15:26 IST HINT:  Close open transactions soon to avoid wraparound problems.
        You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();
2021-07-19 20:15:26 IST LOG:  statement: select txid_current();

After ~ 3 hours of test run, the warnings becoming more prominent.

-bash-4.2$ grep -Rinw /var/lib/edb/as11/data/log/ -e 'wraparound'
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:17711:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:20705:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21160:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21650:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22179:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22694:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23229:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23776:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24316:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24840:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25391:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25927:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:27533:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:28135:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:32986:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:33581:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:17711:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:20705:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21160:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21650:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22179:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22694:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23229:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23776:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24316:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24840:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25391:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25927:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:27533:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:28135:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:32986:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:33581:2021-07-19 19:53:01 IST HINT:  Close open transactions soon to avoid wraparound problems.

Now, let’s monitor system stats too to see overall load, processes their commands etc.

top - 00:19:39 up 159 days,  7:11,  6 users,  load average: 10.31, 9.97, 9.95
Tasks: 399 total,   9 running, 390 sleeping,   0 stopped,   0 zombie
%Cpu(s): 60.0 us, 38.4 sy,  0.0 ni,  1.6 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32780168 total, 14568928 free,  1259976 used, 16951264 buff/cache
KiB Swap: 16777212 total, 16685044 free,    92168 used. 23083820 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
21831 enterpr+  20   0  591764   5228   2152 S 16.8  0.0 417:57.53 /usr/edb/as11/bin/pgbench -U enterprisedb enterprisedb -f wl.sql -c8 -j8 -T100000 -rn -P+
21845 enterpr+  20   0 8667644  24604  22256 R  9.7  0.1 209:53.07 postgres: enterprisedb enterprisedb [local] SELECT
21840 enterpr+  20   0 8667644  24584  22244 R  9.4  0.1 210:36.23 postgres: enterprisedb enterprisedb [local] idle
21844 enterpr+  20   0 8667644  24876  22528 R  9.4  0.1 209:38.54 postgres: enterprisedb enterprisedb [local] idle
21841 enterpr+  20   0 8667644  24836  22496 R  8.8  0.1 209:15.88 postgres: enterprisedb enterprisedb [local] SELECT
21847 enterpr+  20   0 8667644  24600  22252 S  8.8  0.1 206:16.87 postgres: enterprisedb enterprisedb [local] idle
21846 enterpr+  20   0 8667644  24604  22256 S  8.5  0.1 210:41.59 postgres: enterprisedb enterprisedb [local] idle
20421 enterpr+  20   0  259688   2124    632 R  7.1  0.0 146:27.66 postgres: logger
21843 enterpr+  20   0 8667644  24588  22248 R  6.6  0.1 210:05.38 postgres: enterprisedb enterprisedb [local] idle
21842 enterpr+  20   0 8667644  24600  22260 R  6.6  0.1 209:48.66 postgres: enterprisedb enterprisedb [local] idle
20426 enterpr+  20   0 8666684  18736  17208 R  4.0  0.1 102:43.53 postgres: walwriter
 6858 polkitd   20   0  620236  18044   4428 S  0.9  0.1  15940:23 /usr/lib/polkit-1/polkitd --no-debug

SESSION 2:

To my surprise, after ~8 hours of test run, the session screen where I have pgbench running, started throwing some WARNINGs to perform vacuum within specified number of transactions to avoid database shutdown. It kept on throwing those direct warnings for almost an hour before it stopped processing pgbench requests and started aborting workers with a message ‘ERROR: database is not accepting commands to avoid wraparound data loss in database “postgres”.

In the end it will present you the overall pgbench stats where it fired 2146483646 statements in total.

WARNING:  database "postgres" must be vacuumed within 5092920 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 5092920 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 5092917 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
....
......
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 3178805 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 3178808 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 3178804 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 3178803 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 3178801 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 3178799 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
......
........
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 1000002 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING:  database "postgres" must be vacuumed within 1000002 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 0 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

client 1 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

client 7 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

client 6 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

client 2 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

client 4 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

WARNING:  database "postgres" must be vacuumed within 1000005 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 3 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

client 5 aborted in command 0 (SQL) of script 0; ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

transaction type: wl.sql
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 8
duration: 100000 s
number of transactions actually processed: 2146483646
latency average = 0.159 ms
latency stddev = 1.102 ms
tps = 50437.366776 (including connections establishing)
tps = 50437.386645 (excluding connections establishing)
statement latencies in milliseconds:
         0.171  select txid_current();
You have new mail in /var/spool/mail/enterprisedb
-bash-4.2$

SESSION 3:

Lets check our screen 3 to see where we are in terms on oldest current XID and percent towards wraparound, and I see XID numbers are not increasing and we have reached 107% towards wraparound (strange, why it goes beyond 100%).

enterprisedb=# WITH max_age AS ( 
    SELECT 2000000000 as max_old_xid
        , setting AS autovacuum_freeze_max_age 
        FROM pg_catalog.pg_settings 
        WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS ( 
    SELECT datname
        , m.max_old_xid::int
        , m.autovacuum_freeze_max_age::int
        , age(d.datfrozenxid) AS oldest_current_xid 
    FROM pg_catalog.pg_database d 
    JOIN max_age m ON (true) 
    WHERE d.datallowconn ) 
SELECT max(oldest_current_xid) AS oldest_current_xid
    , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
    , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac 
FROM per_database_stats;

(1 row)
                     Tue 20 Jul 2021 06:57:51 AM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
         2146483647 |                        107 |                              1073
(1 row)


                     Tue 20 Jul 2021 06:58:01 AM IST (every 10s)

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
         2146483647 |                        107 |                              1073
(1 row)




enterprisedb=# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
 ORDER BY 2 DESC;

   datname    |    age     | current_setting
--------------+------------+-----------------
 postgres     | 2146483647 | 200000000
 edb          | 2146483647 | 200000000
 template1    | 2146483647 | 200000000
 template0    | 2146483647 | 200000000
 immortal     | 2146483647 | 200000000
 darkthrone   | 2146483647 | 200000000
 immortal1    | 2146483647 | 200000000
 mayhem       | 2146483647 | 200000000
 murduk       | 2146483647 | 200000000
 burzum       | 2146483647 | 200000000
 gorgoroth    | 2146483647 | 200000000
 repmgr       | 2146483647 | 200000000
 enterprisedb | 2146483647 | 200000000
(13 rows)

Let’s connect with the database and see what happens when we try to do both SELECT & UPDATE statements …

enterprisedb=# \dt+
                            List of relations
 Schema |       Name       | Type  |    Owner     |  Size   | Description
--------+------------------+-------+--------------+---------+-------------
 public | pgbench_accounts | table | enterprisedb | 13 MB   |
 public | pgbench_branches | table | enterprisedb | 40 kB   |
 public | pgbench_history  | table | enterprisedb | 40 kB   |
 public | pgbench_tellers  | table | enterprisedb | 40 kB   |
 public | scale_data       | table | enterprisedb | 6720 MB |
(5 rows)

enterprisedb=# select * from pgbench_history;
 tid | bid |  aid  | delta |           mtime           | filler
-----+-----+-------+-------+---------------------------+--------
   2 |   1 | 64444 | -1096 | 19-JUL-21 18:41:15.655738 |
   3 |   1 | 18460 | -4688 | 19-JUL-21 18:41:15.668699 |
   2 |   1 | 92762 | -4669 | 19-JUL-21 18:41:15.670741 |
   2 |   1 | 55081 |  1817 | 19-JUL-21 18:41:15.672719 |
   5 |   1 | 47743 |   803 | 19-JUL-21 18:41:15.674666 |
   5 |   1 | 23899 | -3910 | 19-JUL-21 18:41:15.67664  |
  10 |   1 | 33384 |  4005 | 19-JUL-21 18:41:15.678631 |
   6 |   1 | 88383 | -4257 | 19-JUL-21 18:41:15.680629 |
   8 |   1 | 87562 |   200 | 19-JUL-21 18:41:15.682612 |
   5 |   1 |  3957 | -2407 | 19-JUL-21 18:41:15.684595 |
(10 rows)

enterprisedb=#
enterprisedb=# update pgbench_history set aid=65555 where aid=64444;
ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
enterprisedb=#
enterprisedb=#
enterprisedb=#

Alright, so with transaction wraparound is already there, we only be able to run SELECT queries, but DMLS will fail with a message printed to stop the database and vacuum it in single-user mode.

So, what we have learned

  • Database will keep on running and accepting READ ONLY (SELECT) requests, error will be thrown in case if DMLs tried.
  • 2,000,000,000 seems a huge number, but in case of busy transactional database, it’s just a matter of few hours.
  • AUTOVACUUMING is good thing, at least for this scenario, let’s not discuss about its bugs and other problems.
  • Add VACUUMING a routine activity else you will soon find yourself in a big mess.

The easiest way to get the highest TXID age value back down is to force a vacuum on the entire database cluster. And the best way to do this cluster-wide vacuum is the vacuumdb binary utility that comes with PostgreSQL. I will cover it in my next post as this post is all about simulating wraparounds.

Hope It Helped!
Prashant Dixit

Posted in Advanced, troubleshooting | Tagged: , , | 7 Comments »