How to fix/handle Wraparound problem in PostgreSQL ?
Posted by FatDBA on July 21, 2021
Hi Guys,
In continuation from where I left in my last post on simulating wraparounds in PostgreSQL, this post is all about managing the mess that we spilled 🙂 Though this was a synthetic/artificial test and as soon you close or exit SESSION 1 where you’ve opened a BEGIN block/transaction and ran select txid_current() to block autovacuum, it (autovacuum) will automatically kick-in and perform a quick full vacuum on all of the databases in the cluster. You have to remember that even if you have autovacuum turned off, it will still start in emergency mode.
So, this is what we got earlier at the time we hit by the wraparound problem when we tried to execute a DML and it says something about stopping postmaster and vacuum databases in single-user mode; What is that ? – We will see that next ….
enterprisedb=#
enterprisedb=# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
datname | age | current_setting
--------------+------------+-----------------
postgres | 2146483647 | 200000000
edb | 2146483647 | 200000000
template1 | 2146483647 | 200000000
template0 | 2146483647 | 200000000
immortal | 2146483647 | 200000000
darkthrone | 2146483647 | 200000000
immortal1 | 2146483647 | 200000000
mayhem | 2146483647 | 200000000
murduk | 2146483647 | 200000000
burzum | 2146483647 | 200000000
gorgoroth | 2146483647 | 200000000
repmgr | 2146483647 | 200000000
enterprisedb | 2146483647 | 200000000
(13 rows)
enterprisedb=# update pgbench_history set aid=65555 where aid=64444;
ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
enterprisedb=#
enterprisedb=#
Okay, coming back to the point, how to handle the situation now when you’ve already hit the wraparound. So, first thing you have to manually shut down the database cluster, if it’s still up.
-bash-4.2$ pwd
/usr/edb/as11/bin
-bash-4.2$ /usr/edb/as11/bin/pg_ctl stop -D /var/lib/edb/as11/data
waiting for server to shut down.... done
server stopped
-bash-4.2$
Next, you need to run VACUUM FULL for each of the database (one-by-one using single mode) through stand-alone backend mode. For that you have to call postgres (if you’re on open source) or edb-postgres utility. I am on EDB PostgreSQL 11 so will be calling edb-postgres and execute VACUUM FULL.
-bash-4.2$ pwd
/usr/edb/as11/bin
-bash-4.2$
-bash-4.2$ ls *postgre*
edb-postgres
-bash-4.2$
-bash-4.2$ ./edb-postgres --single -D /var/lib/edb/as11/data enterprisedb
PostgreSQL stand-alone backend 11.11.20
backend> vacuum full;
2021-07-20 11:07:49 IST LOG: statement: vacuum full;
backend>
It’s done for ‘enterprisedb’ database. Next will do it for ‘postgresql’ database, but this time I will use some stand-alone backend related command to assure we are connected with the right database and to enable runtime stats or verbose.
-bash-4.2$ ./edb-postgres --single -D /var/lib/edb/as11/data postgres
PostgreSQL stand-alone backend 11.11.20
backend>
backend> select current_database();
2021-07-20 13:45:25 IST LOG: statement: select current_database();
1: current_database (typeid = 19, len = 64, typmod = -1, byval = f)
----
1: current_database = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
----
backend>
backend> SELECT set_config('log_statement_stats', 'on', false);
2021-07-20 13:48:55 IST LOG: statement: SELECT set_config('log_statement_stats', 'on', false);
1: set_config (typeid = 25, len = -1, typmod = -1, byval = f)
----
1: set_config = "on" (typeid = 25, len = -1, typmod = -1, byval = f)
----
backend>
-- Now when we have enabled logging of stats, let's execute the VACUUM to see runtime stats
backend> vacuum full;
2021-07-20 13:49:29 IST LOG: statement: vacuum full;
2021-07-20 13:49:30 IST LOG: QUERY STATISTICS
2021-07-20 13:49:30 IST DETAIL: ! system usage stats:
! 0.557302 s user, 0.426659 s system, 120.357278 s elapsed
! [0.583998 s user, 0.455579 s system total]
! 21756 kB max resident size
! 224/402352 [240/402824] filesystem blocks in/out
! 0/2558 [0/6201151] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 982/2 [1005/3] voluntary/involuntary context switches
2021-07-20 13:58:39 IST STATEMENT: vacuum full;
backend>
Okay, same way we have to run VACUUM FULL on all of the databases and restart PostgreSQL cluster.
Once all of the databases are done and after restart, connect with the database and you will see the TXID or datfrozenid has been reset and you are all good.
enterprisedb=# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
datname | age | current_setting
--------------+-----+-----------------
postgres | 1369 | 200000000
edb | 1272 | 200000000
template1 | 1170 | 200000000
template0 | 1071 | 200000000
immortal | 974 | 200000000
darkthrone | 877 | 200000000
immortal1 | 780 | 200000000
mayhem | 585 | 200000000
murduk | 487 | 200000000
burzum | 388 | 200000000
gorgoroth | 291 | 200000000
repmgr | 194 | 200000000
enterprisedb | 97 | 200000000
You’re all set!
Hope It Helped!
Prashant Dixit
Leave a Reply