How to simulate the deadly, infamous, misunderstood & complex ‘Transaction Wraparound Problem’ in PostgreSQL …
Posted by FatDBA on July 20, 2021
Hi Guys,
Last week one of my good friend asked me an interesting question – ‘How to replicate the transaction wraparound problem’ in PostgreSQL ?
He has a demo/POC that he needs to do for one of his customer who is interested to see how it looks like in reality and how ‘deadly’ is the problem and how to fix this. He scanned almost everything on the Google, but unfortunately found nothing available or pre-cooked …
So, this might be the only post available on the internet explaining how to exactly simulate the transaction wraparound in the PostgreSQL Database. This one was really interesting and I have never tried this before as I thought I will never see transaction ID wraparound error 🙂 . So, before I jump in to the demo, we need to take care of few things. Use a system with good resources, I have performed this demo on an EDB PostgreSQL 11.11.20 database with 8 CPUs, 31 GB RAM, and it took me around 9 hours to exhaust all transactionIDs and reach the limit of 2,000,000,000.
Would like to thank Nikolay Samokhvalov for his help! 🙂
Before I start, let’s do this one more time – What is a transaction ID and why it wrapsaround ?
The PostgreSQL transaction control mechanism assigns a transaction ID to every row that is modified in the database; these IDs control the visibility of that row to other concurrent transactions. Txid wraparound is a problem due to MVCC. MVCC relies on being able to take the txids of two transactions and determine which of the transactions came first.
A PostgreSQL database can have two billion ‘in-flight’ unvacuumed transactions before PostgreSQL takes dramatic action to avoid data loss. If the number of unvacuumed transactions reaches (2^31 – 10,000,000), the log starts warning that vacuuming is needed. If the number of unvacuumed transactions reaches 2,000,000,000, PostgreSQL sets the database to read-only mode and requires an offline, single-user, standalone vacuum.
Note: You have to carefully monitor your database pg_log, archiving locations as this activity will generate huge WALs. At the same time, keep an eye on the size of your log directory, specially if you are capturing all statements.
We will open multiple parallel sessions to simulate the workload and monitor the progress.
SESSION 1:
In the first session, we will first connect with the database and set idle_in_transaction_session_timeout to value 0 and start a BEGIN block and execute select txid_current(); this is to block autovacuum, as simply turning it off wouldn’t help because emegency mode kick in even with `autovacuum = off.
The idle_in_transaction_session_timeout parameter terminate any session with an open transaction that has been idle for longer than the specified duration in milliseconds. This allows any locks held by that session to be released and the connection slot to be reused; it also allows tuples visible only to this transaction to be vacuumed. So, we will have to set it to value 0 that will disable this feature.
-bash-4.2$ psql -p 5444
psql.bin (11.11.20)
Type "help" for help.
enterprisedb=# SET idle_in_transaction_session_timeout = 0;
SET
enterprisedb=# BEGIN;
BEGIN
enterprisedb=# select txid_current();
txid_current
--------------
1266
enterprisedb=#
enterprisedb=#
SESSION 2:
Now, I will create a file with current transaction ID written in to it and will call the pgbench. I am doing something like below to call pgbench, below are the explanations on flags used and why I used them ..
pgbench -U enterprisedb enterprisedb -f wl.sql -c8 -j8 -T100000 -rn -P60
-f: to read from a file
-c8: 8 clients
-j8: 8 threads/jobs (adjust it as per your CPU count)
-T100000: seconds to run the test for this many seconds, rather than a fixed number of transactions per client
-rn: where r is to report the average per-statement latency (execution time from the perspective of the client) of each command after the benchmark finishes. Whereas n is to perform no vacuuming before running the test. This option is necessary if you are running a custom test scenario that does not include the standard tables pgbench_accounts, pgbench_branches, pgbench_history, and pgbench_tellers.
-P60: is to show progress report every sec seconds
-bash-4.2$
-bash-4.2$ echo 'select txid_current();' > wl.sql
-bash-4.2$
-bash-4.2$
-bash-4.2$ /usr/edb/as11/bin/pgbench -U enterprisedb enterprisedb -f wl.sql -c8 -j8 -T100000 -rn -P60
progress: 60.0 s, 51097.6 tps, lat 0.157 ms stddev 0.262
progress: 120.0 s, 51171.1 tps, lat 0.156 ms stddev 0.252
progress: 180.0 s, 51622.8 tps, lat 0.155 ms stddev 0.250
progress: 240.0 s, 51328.9 tps, lat 0.156 ms stddev 0.260
progress: 300.0 s, 50976.6 tps, lat 0.157 ms stddev 0.256
progress: 360.0 s, 51130.3 tps, lat 0.156 ms stddev 0.265
progress: 420.0 s, 50792.6 tps, lat 0.158 ms stddev 0.254
progress: 480.0 s, 51207.2 tps, lat 0.156 ms stddev 0.250
progress: 540.0 s, 51162.6 tps, lat 0.156 ms stddev 0.251
progress: 600.0 s, 51246.0 tps, lat 0.156 ms stddev 0.256
progress: 660.0 s, 51047.3 tps, lat 0.157 ms stddev 0.253
progress: 720.0 s, 51197.3 tps, lat 0.156 ms stddev 0.252
progress: 780.0 s, 51119.2 tps, lat 0.156 ms stddev 0.281
progress: 840.0 s, 50938.8 tps, lat 0.157 ms stddev 0.252
progress: 900.0 s, 51252.8 tps, lat 0.156 ms stddev 0.251
progress: 960.0 s, 51054.0 tps, lat 0.157 ms stddev 0.253
progress: 1020.0 s, 51159.5 tps, lat 0.156 ms stddev 0.254
progress: 1080.0 s, 51116.0 tps, lat 0.157 ms stddev 0.253
progress: 1140.0 s, 50947.6 tps, lat 0.157 ms stddev 0.251
progress: 1200.0 s, 51289.7 tps, lat 0.156 ms stddev 0.254
progress: 1260.0 s, 50338.7 tps, lat 0.159 ms stddev 0.254
progress: 1320.0 s, 50121.1 tps, lat 0.160 ms stddev 0.260
progress: 1380.0 s, 49936.6 tps, lat 0.160 ms stddev 0.291
progress: 1440.0 s, 50451.1 tps, lat 0.159 ms stddev 0.251
progress: 1500.0 s, 51107.4 tps, lat 0.157 ms stddev 0.248
progress: 1560.0 s, 50907.8 tps, lat 0.157 ms stddev 0.321
progress: 1620.0 s, 51033.7 tps, lat 0.157 ms stddev 0.251
progress: 1680.0 s, 51013.6 tps, lat 0.157 ms stddev 0.254
progress: 1740.0 s, 51119.4 tps, lat 0.156 ms stddev 0.252
progress: 1800.0 s, 51411.9 tps, lat 0.156 ms stddev 0.249
progress: 1860.0 s, 51099.1 tps, lat 0.157 ms stddev 0.249
progress: 1920.0 s, 51527.5 tps, lat 0.155 ms stddev 0.256
progress: 1980.0 s, 51204.3 tps, lat 0.156 ms stddev 0.267
progress: 2040.0 s, 51363.8 tps, lat 0.156 ms stddev 0.249
progress: 2100.0 s, 51563.1 tps, lat 0.155 ms stddev 0.250
progress: 2160.0 s, 51329.3 tps, lat 0.156 ms stddev 0.246
progress: 2220.0 s, 51073.4 tps, lat 0.157 ms stddev 0.276
progress: 2280.0 s, 51402.0 tps, lat 0.156 ms stddev 0.248
progress: 2340.0 s, 51364.4 tps, lat 0.156 ms stddev 0.253
progress: 2400.0 s, 50890.0 tps, lat 0.157 ms stddev 0.251
progress: 2460.0 s, 50239.6 tps, lat 0.159 ms stddev 0.296
progress: 2520.0 s, 50684.8 tps, lat 0.158 ms stddev 0.283
progress: 2580.0 s, 49826.2 tps, lat 0.161 ms stddev 0.271
progress: 2640.0 s, 51076.7 tps, lat 0.157 ms stddev 0.250
progress: 2700.0 s, 51349.6 tps, lat 0.156 ms stddev 0.252
progress: 2760.0 s, 51076.6 tps, lat 0.157 ms stddev 0.253
progress: 2820.0 s, 50854.5 tps, lat 0.157 ms stddev 0.256
progress: 2880.0 s, 49864.8 tps, lat 0.160 ms stddev 0.260
progress: 2940.0 s, 50450.1 tps, lat 0.159 ms stddev 0.264
progress: 3000.0 s, 50334.7 tps, lat 0.159 ms stddev 0.264
progress: 3060.0 s, 50514.3 tps, lat 0.158 ms stddev 0.299
progress: 3120.0 s, 50508.7 tps, lat 0.158 ms stddev 0.284
progress: 3180.0 s, 50691.3 tps, lat 0.158 ms stddev 0.288
progress: 3240.0 s, 50601.6 tps, lat 0.158 ms stddev 0.256
progress: 3300.0 s, 50280.2 tps, lat 0.159 ms stddev 0.264
progress: 3360.0 s, 50457.6 tps, lat 0.159 ms stddev 0.265
progress: 3420.0 s, 50178.3 tps, lat 0.159 ms stddev 0.390
progress: 3480.0 s, 50353.3 tps, lat 0.159 ms stddev 0.264
progress: 3540.0 s, 50133.0 tps, lat 0.160 ms stddev 0.259
progress: 3600.0 s, 50217.6 tps, lat 0.159 ms stddev 0.282
progress: 3660.0 s, 50147.8 tps, lat 0.160 ms stddev 0.279
progress: 3720.0 s, 50578.0 tps, lat 0.158 ms stddev 0.261
progress: 3780.0 s, 50523.9 tps, lat 0.158 ms stddev 0.261
progress: 3840.0 s, 50349.7 tps, lat 0.159 ms stddev 0.269
progress: 3900.0 s, 50437.5 tps, lat 0.159 ms stddev 0.287
progress: 3960.0 s, 49569.1 tps, lat 0.161 ms stddev 0.311
progress: 4020.0 s, 50337.3 tps, lat 0.159 ms stddev 0.286
progress: 4080.0 s, 50065.2 tps, lat 0.160 ms stddev 0.260
progress: 4140.0 s, 49791.4 tps, lat 0.161 ms stddev 0.264
progress: 4200.0 s, 49681.3 tps, lat 0.161 ms stddev 0.326
progress: 4260.0 s, 48034.2 tps, lat 0.167 ms stddev 0.340
progress: 4320.0 s, 50460.9 tps, lat 0.159 ms stddev 0.290
progress: 4380.0 s, 49731.0 tps, lat 0.161 ms stddev 0.300
progress: 4440.0 s, 50441.1 tps, lat 0.159 ms stddev 0.266
progress: 4500.0 s, 50259.1 tps, lat 0.159 ms stddev 0.267
progress: 4560.0 s, 50197.0 tps, lat 0.159 ms stddev 0.259
progress: 4620.0 s, 49832.1 tps, lat 0.161 ms stddev 0.271
progress: 4680.0 s, 50160.3 tps, lat 0.159 ms stddev 0.268
progress: 4740.0 s, 50578.5 tps, lat 0.158 ms stddev 0.262
progress: 4800.0 s, 50260.5 tps, lat 0.159 ms stddev 0.270
progress: 4860.0 s, 50545.1 tps, lat 0.158 ms stddev 0.256
progress: 4920.0 s, 49961.8 tps, lat 0.160 ms stddev 0.285
progress: 4980.0 s, 50450.7 tps, lat 0.159 ms stddev 0.267
progress: 5040.0 s, 50008.7 tps, lat 0.160 ms stddev 0.257
progress: 5100.0 s, 49723.2 tps, lat 0.161 ms stddev 0.428
progress: 5160.0 s, 50211.1 tps, lat 0.159 ms stddev 0.378
progress: 5220.0 s, 49791.0 tps, lat 0.161 ms stddev 0.262
progress: 5280.0 s, 50091.5 tps, lat 0.160 ms stddev 0.265
progress: 5340.0 s, 50280.3 tps, lat 0.159 ms stddev 0.266
progress: 5400.0 s, 49957.1 tps, lat 0.160 ms stddev 0.402
progress: 5460.0 s, 50101.7 tps, lat 0.160 ms stddev 0.293
progress: 5520.0 s, 50387.3 tps, lat 0.159 ms stddev 0.261
progress: 5580.0 s, 50173.2 tps, lat 0.159 ms stddev 0.278
progress: 5640.0 s, 50225.1 tps, lat 0.159 ms stddev 0.266
progress: 5700.0 s, 50186.7 tps, lat 0.159 ms stddev 0.264
progress: 5760.0 s, 49120.1 tps, lat 0.163 ms stddev 0.264
progress: 5820.0 s, 49445.3 tps, lat 0.162 ms stddev 0.262
progress: 5880.0 s, 49547.2 tps, lat 0.161 ms stddev 0.260
progress: 5940.0 s, 49730.2 tps, lat 0.161 ms stddev 0.267
progress: 6000.0 s, 50543.3 tps, lat 0.158 ms stddev 0.258
progress: 6060.0 s, 50569.1 tps, lat 0.158 ms stddev 0.258
progress: 6120.0 s, 50560.4 tps, lat 0.158 ms stddev 0.250
progress: 6180.0 s, 50572.1 tps, lat 0.158 ms stddev 0.255
progress: 6240.0 s, 50551.6 tps, lat 0.158 ms stddev 0.253
progress: 6300.0 s, 50993.8 tps, lat 0.157 ms stddev 0.254
progress: 6360.0 s, 51114.6 tps, lat 0.157 ms stddev 0.257
progress: 6420.0 s, 50762.6 tps, lat 0.158 ms stddev 0.261
progress: 6480.0 s, 50814.2 tps, lat 0.157 ms stddev 0.250
progress: 6540.0 s, 50735.7 tps, lat 0.158 ms stddev 0.257
progress: 6600.0 s, 50521.9 tps, lat 0.158 ms stddev 0.263
progress: 6660.0 s, 50829.8 tps, lat 0.157 ms stddev 0.252
progress: 6720.0 s, 50933.7 tps, lat 0.157 ms stddev 0.255
progress: 6780.0 s, 51152.9 tps, lat 0.156 ms stddev 0.252
progress: 6840.0 s, 50457.1 tps, lat 0.159 ms stddev 0.255
progress: 6900.0 s, 50542.5 tps, lat 0.158 ms stddev 0.251
progress: 6960.0 s, 50802.5 tps, lat 0.157 ms stddev 0.282
progress: 7020.0 s, 51066.1 tps, lat 0.157 ms stddev 0.270
progress: 7080.0 s, 50729.0 tps, lat 0.158 ms stddev 0.258
progress: 7140.0 s, 51242.7 tps, lat 0.156 ms stddev 0.255
progress: 7200.0 s, 50714.2 tps, lat 0.158 ms stddev 0.253
progress: 7260.0 s, 50704.7 tps, lat 0.158 ms stddev 0.323
progress: 7320.0 s, 50951.4 tps, lat 0.157 ms stddev 0.250
progress: 7380.0 s, 50604.8 tps, lat 0.158 ms stddev 0.340
progress: 7440.0 s, 51217.0 tps, lat 0.156 ms stddev 0.251
progress: 7500.0 s, 50832.4 tps, lat 0.157 ms stddev 0.251
progress: 7560.0 s, 50815.7 tps, lat 0.157 ms stddev 0.256
progress: 7620.0 s, 49936.3 tps, lat 0.160 ms stddev 0.463
progress: 7680.0 s, 50556.4 tps, lat 0.158 ms stddev 0.251
progress: 7740.0 s, 50699.8 tps, lat 0.158 ms stddev 0.255
progress: 7800.0 s, 50474.0 tps, lat 0.158 ms stddev 0.256
progress: 7860.0 s, 50401.9 tps, lat 0.159 ms stddev 0.321
progress: 7920.0 s, 51385.5 tps, lat 0.156 ms stddev 0.251
progress: 7980.0 s, 50671.9 tps, lat 0.158 ms stddev 0.248
progress: 8040.0 s, 50631.3 tps, lat 0.158 ms stddev 0.252
progress: 8100.0 s, 50741.3 tps, lat 0.158 ms stddev 0.255
progress: 8160.0 s, 50465.5 tps, lat 0.159 ms stddev 0.257
progress: 8220.0 s, 51384.9 tps, lat 0.156 ms stddev 0.212
progress: 8280.0 s, 51676.8 tps, lat 0.155 ms stddev 0.233
progress: 8340.0 s, 52049.3 tps, lat 0.154 ms stddev 0.179
progress: 8400.0 s, 52394.6 tps, lat 0.153 ms stddev 0.178
progress: 8460.0 s, 51872.3 tps, lat 0.154 ms stddev 0.192
progress: 8520.0 s, 51896.9 tps, lat 0.154 ms stddev 0.178
progress: 8580.0 s, 52187.5 tps, lat 0.153 ms stddev 0.178
progress: 8640.0 s, 51773.8 tps, lat 0.155 ms stddev 0.185
progress: 8700.0 s, 51862.7 tps, lat 0.154 ms stddev 0.184
progress: 8760.0 s, 51906.7 tps, lat 0.154 ms stddev 0.189
progress: 8820.0 s, 52038.6 tps, lat 0.154 ms stddev 0.189
progress: 8880.0 s, 52370.9 tps, lat 0.153 ms stddev 0.180
progress: 8940.0 s, 52155.2 tps, lat 0.153 ms stddev 0.185
progress: 9000.0 s, 52162.2 tps, lat 0.153 ms stddev 0.219
progress: 9060.0 s, 51900.7 tps, lat 0.154 ms stddev 0.189
progress: 9120.0 s, 51794.5 tps, lat 0.154 ms stddev 0.178
progress: 9180.0 s, 51885.1 tps, lat 0.154 ms stddev 0.180
progress: 9240.0 s, 52007.4 tps, lat 0.154 ms stddev 0.177
progress: 9300.0 s, 52168.3 tps, lat 0.153 ms stddev 0.189
progress: 9360.0 s, 51801.3 tps, lat 0.154 ms stddev 0.182
progress: 9420.0 s, 51899.8 tps, lat 0.154 ms stddev 0.173
progress: 9480.0 s, 52215.3 tps, lat 0.153 ms stddev 0.177
progress: 9540.0 s, 52129.9 tps, lat 0.153 ms stddev 0.227
progress: 9600.0 s, 51929.9 tps, lat 0.154 ms stddev 0.176
progress: 9660.0 s, 51776.6 tps, lat 0.155 ms stddev 0.181
progress: 9720.0 s, 51542.8 tps, lat 0.155 ms stddev 0.268
progress: 9780.0 s, 51352.5 tps, lat 0.156 ms stddev 0.310
progress: 9840.0 s, 51661.0 tps, lat 0.155 ms stddev 0.177
progress: 9900.0 s, 51219.8 tps, lat 0.156 ms stddev 0.260
progress: 9960.0 s, 52235.8 tps, lat 0.153 ms stddev 0.184
progress: 10020.0 s, 51904.5 tps, lat 0.154 ms stddev 0.186
progress: 10080.0 s, 52066.8 tps, lat 0.154 ms stddev 0.183
progress: 10140.0 s, 52212.5 tps, lat 0.153 ms stddev 0.209
progress: 10200.0 s, 52402.8 tps, lat 0.153 ms stddev 0.186
progress: 10260.0 s, 52194.8 tps, lat 0.153 ms stddev 0.178
progress: 10320.0 s, 52117.3 tps, lat 0.153 ms stddev 0.179
progress: 10380.0 s, 52060.5 tps, lat 0.154 ms stddev 0.181
progress: 10440.0 s, 52043.1 tps, lat 0.154 ms stddev 0.183
progress: 10500.0 s, 52358.5 tps, lat 0.153 ms stddev 0.178
progress: 10560.0 s, 51361.2 tps, lat 0.156 ms stddev 0.179
progress: 10620.0 s, 51655.7 tps, lat 0.155 ms stddev 0.184
progress: 10680.0 s, 51501.1 tps, lat 0.155 ms stddev 0.176
progress: 10740.0 s, 51315.4 tps, lat 0.156 ms stddev 0.182
progress: 10800.0 s, 52344.5 tps, lat 0.153 ms stddev 0.181
progress: 10860.0 s, 51687.0 tps, lat 0.155 ms stddev 0.182
progress: 10920.0 s, 51150.2 tps, lat 0.156 ms stddev 0.179
progress: 10980.0 s, 51341.9 tps, lat 0.156 ms stddev 0.181
progress: 11040.0 s, 51074.0 tps, lat 0.157 ms stddev 0.184
progress: 11100.0 s, 51645.4 tps, lat 0.155 ms stddev 0.277
progress: 11160.0 s, 51984.6 tps, lat 0.154 ms stddev 0.182
progress: 11220.0 s, 51993.4 tps, lat 0.154 ms stddev 0.181
progress: 11280.0 s, 51894.9 tps, lat 0.154 ms stddev 0.183
progress: 11340.0 s, 52007.2 tps, lat 0.154 ms stddev 0.186
progress: 11400.0 s, 52255.9 tps, lat 0.153 ms stddev 0.188
progress: 11460.0 s, 52221.2 tps, lat 0.153 ms stddev 0.183
progress: 11520.0 s, 51990.5 tps, lat 0.154 ms stddev 0.181
progress: 11580.0 s, 52138.3 tps, lat 0.153 ms stddev 0.178
progress: 11640.0 s, 51258.2 tps, lat 0.156 ms stddev 0.178
progress: 11700.0 s, 52268.0 tps, lat 0.153 ms stddev 0.179
progress: 11760.0 s, 51838.4 tps, lat 0.154 ms stddev 0.180
progress: 11820.0 s, 52079.2 tps, lat 0.154 ms stddev 0.185
progress: 11880.0 s, 52531.4 tps, lat 0.152 ms stddev 0.175
progress: 11940.0 s, 51711.7 tps, lat 0.155 ms stddev 0.260
progress: 12000.0 s, 52020.7 tps, lat 0.154 ms stddev 0.180
progress: 12060.0 s, 51990.1 tps, lat 0.154 ms stddev 0.241
progress: 12120.0 s, 51367.6 tps, lat 0.156 ms stddev 0.193
progress: 12180.0 s, 51316.0 tps, lat 0.156 ms stddev 0.187
progress: 12240.0 s, 50819.0 tps, lat 0.157 ms stddev 0.190
progress: 12300.0 s, 51614.6 tps, lat 0.155 ms stddev 0.182
progress: 12360.0 s, 51500.6 tps, lat 0.155 ms stddev 0.187
progress: 12420.0 s, 51715.8 tps, lat 0.155 ms stddev 0.259
progress: 12480.0 s, 51805.0 tps, lat 0.154 ms stddev 0.183
progress: 12540.0 s, 51846.2 tps, lat 0.154 ms stddev 0.183
progress: 12600.0 s, 51829.6 tps, lat 0.154 ms stddev 0.179
progress: 12660.0 s, 51411.1 tps, lat 0.156 ms stddev 0.221
progress: 12720.0 s, 51728.5 tps, lat 0.155 ms stddev 0.187
progress: 12780.0 s, 52170.5 tps, lat 0.153 ms stddev 0.181
progress: 12840.0 s, 51985.8 tps, lat 0.154 ms stddev 0.182
progress: 12900.0 s, 51723.3 tps, lat 0.155 ms stddev 0.181
progress: 12960.0 s, 52324.2 tps, lat 0.153 ms stddev 0.184
progress: 13020.0 s, 51795.5 tps, lat 0.154 ms stddev 0.189
progress: 13080.0 s, 51547.3 tps, lat 0.155 ms stddev 0.373
progress: 13140.0 s, 51592.4 tps, lat 0.155 ms stddev 0.231
progress: 13200.0 s, 52004.8 tps, lat 0.154 ms stddev 0.223
progress: 13260.0 s, 52118.3 tps, lat 0.153 ms stddev 0.184
progress: 13320.0 s, 51781.1 tps, lat 0.154 ms stddev 0.183
progress: 13380.0 s, 50621.8 tps, lat 0.158 ms stddev 0.196
progress: 13440.0 s, 50671.8 tps, lat 0.158 ms stddev 0.321
progress: 13500.0 s, 51389.4 tps, lat 0.156 ms stddev 0.181
progress: 13560.0 s, 51653.2 tps, lat 0.155 ms stddev 0.186
progress: 13620.0 s, 51491.7 tps, lat 0.155 ms stddev 0.181
progress: 13680.0 s, 51639.4 tps, lat 0.155 ms stddev 0.183
progress: 13740.0 s, 51385.3 tps, lat 0.156 ms stddev 0.188
progress: 13800.0 s, 51640.2 tps, lat 0.155 ms stddev 0.177
progress: 13860.0 s, 51820.7 tps, lat 0.154 ms stddev 0.176
progress: 13920.0 s, 52041.9 tps, lat 0.154 ms stddev 0.179
progress: 13980.0 s, 52001.0 tps, lat 0.154 ms stddev 0.178
progress: 14040.0 s, 52189.3 tps, lat 0.153 ms stddev 0.176
progress: 14100.0 s, 51897.9 tps, lat 0.154 ms stddev 0.179
progress: 14160.0 s, 52066.4 tps, lat 0.154 ms stddev 0.180
progress: 14220.0 s, 52257.9 tps, lat 0.153 ms stddev 0.178
progress: 14280.0 s, 51850.1 tps, lat 0.154 ms stddev 0.210
......
.........
...........
SESSION 3:
Now in this third session we will monitor the database to see what’s going on. You can do something like below to check where we are going.
Below SQL will show you the ‘datfrozenxid‘ which is a value for all transaction IDs before this one have been replaced with a permanent (“frozen”) transaction ID in this database. This is used to track whether the database needs to be vacuumed in order to prevent transaction ID wraparound. And to see ‘autovacuum_freeze_max_age‘ that is age at which to autovacuum a table to prevent transaction ID wraparound.
Do a watch (\watch 10) on the command to see the progress and the play live 🙂
SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
datname | age | current_setting
--------------+---------+-----------------
postgres | 3058531 | 200000000
edb | 3058531 | 200000000
template1 | 3058531 | 200000000
template0 | 3058531 | 200000000
immortal | 3058531 | 200000000
darkthrone | 3058531 | 200000000
immortal1 | 3058531 | 200000000
mayhem | 3058531 | 200000000
marduk | 3058531 | 200000000
burzum | 3058531 | 200000000
gorgoroth | 3058531 | 200000000
repmgr | 3058531 | 200000000
enterprisedb | 3058531 | 200000000
(13 rows)
I have used below query to check where we are going, current XIDs, how close we are to our target of wraparound ..
enterprisedb=# WITH max_age AS (
SELECT 2000000000 as max_old_xid
, setting AS autovacuum_freeze_max_age
FROM pg_catalog.pg_settings
WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
SELECT datname
, m.max_old_xid::int
, m.autovacuum_freeze_max_age::int
, age(d.datfrozenxid) AS oldest_current_xid
FROM pg_catalog.pg_database d
JOIN max_age m ON (true)
WHERE d.datallowconn )
SELECT max(oldest_current_xid) AS oldest_current_xid
, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
FROM per_database_stats
\watch 10
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
289007513 | 14 | 145
(1 row)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
290077770 | 15 | 145
(1 row)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
292467983 | 15 | 146
(1 row)
Mon 19 Jul 2021 08:38:01 PM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
342714029 | 17 | 171
(1 row)
Mon 19 Jul 2021 08:38:11 PM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
343223530 | 17 | 172
(1 row)
Mon 19 Jul 2021 09:17:45 PM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
406185619 | 20 | 203
(1 row)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
421779698 | 21 | 211
(1 row)
Mon 19 Jul 2021 09:28:10 PM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
438446236 | 22 | 219
(1 row)
Mon 19 Jul 2021 11:03:03 PM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
733479765 | 37 | 367
(1 row)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
763852636 | 38 | 382
(1 row)
Mon 19 Jul 2021 11:33:03 PM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
826485794 | 41 | 413
(1 row)
Tue 20 Jul 2021 12:18:07 AM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
966692697 | 48 | 483
(1 row)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
1107823196 | 55 | 554
Tue 20 Jul 2021 01:09:18 AM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
1125393734 | 56 | 563
(1 row)
Tue 20 Jul 2021 01:43:11 AM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
1230984982 | 62 | 615
(1 row)
-- LETS SEE WHAT'S HAPPENING ON THE DATABASE
enterprisedb=# \x
Expanded display is on.
enterprisedb=#
enterprisedb=# select * from pg_stat_activity;
-[ RECORD 4 ]----+---------------------------------
datid | 16384
datname | enterprisedb
pid | 24056
usesysid | 10
usename | enterprisedb
application_name | psql.bin
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:45:44.833016 +05:30
xact_start | 20-JUL-21 09:45:57.749892 +05:30
query_start | 20-JUL-21 09:46:01.042977 +05:30
state_change | 20-JUL-21 09:46:01.047161 +05:30
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid | 1195
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 5 ]----+---------------------------------
datid | 16384
datname | enterprisedb
pid | 25884
usesysid | 10
usename | enterprisedb
application_name | psql.bin
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:47:36.766729 +05:30
xact_start | 20-JUL-21 09:52:18.941776 +05:30
query_start | 20-JUL-21 09:52:18.941776 +05:30
state_change | 20-JUL-21 09:52:18.941778 +05:30
wait_event_type |
wait_event |
state | active
backend_xid |
backend_xmin | 1195
query | select * from pg_stat_activity;
backend_type | client backend
-[ RECORD 6 ]----+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27150
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.70284 +05:30
xact_start | 20-JUL-21 09:52:18.941429 +05:30
query_start | 20-JUL-21 09:52:18.941429 +05:30
state_change | 20-JUL-21 09:52:18.94143 +05:30
wait_event_type | IPC
wait_event | ClogGroupUpdate
state | active
backend_xid | 4526420
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 7 ]----+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27151
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.704062 +05:30
xact_start |
query_start | 20-JUL-21 09:52:18.943293 +05:30
state_change | 20-JUL-21 09:52:18.943411 +05:30
wait_event_type | Client
wait_event | ClientRead
state | idle
backend_xid |
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 8 ]----+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27153
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.7067 +05:30
xact_start |
query_start | 20-JUL-21 09:52:18.943218 +05:30
state_change | 20-JUL-21 09:52:18.943334 +05:30
wait_event_type |
wait_event |
state | idle
backend_xid |
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 9 ]----+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27154
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.707948 +05:30
xact_start | 20-JUL-21 09:52:18.941116 +05:30
query_start | 20-JUL-21 09:52:18.941116 +05:30
state_change | 20-JUL-21 09:52:18.941117 +05:30
wait_event_type | IPC
wait_event | ClogGroupUpdate
state | active
backend_xid | 4526414
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 10 ]---+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27152
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.710155 +05:30
xact_start | 20-JUL-21 09:52:18.943403 +05:30
query_start | 20-JUL-21 09:52:18.943403 +05:30
state_change | 20-JUL-21 09:52:18.943405 +05:30
wait_event_type |
wait_event |
state | active
backend_xid |
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 11 ]---+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27155
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.710785 +05:30
xact_start | 20-JUL-21 09:52:18.941493 +05:30
query_start | 20-JUL-21 09:52:18.941493 +05:30
state_change | 20-JUL-21 09:52:18.941494 +05:30
wait_event_type | IPC
wait_event | ClogGroupUpdate
state | active
backend_xid | 4526421
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 12 ]---+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27156
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.711762 +05:30
xact_start |
query_start | 20-JUL-21 09:52:18.942582 +05:30
state_change | 20-JUL-21 09:52:18.942704 +05:30
wait_event_type | Client
wait_event | ClientRead
state | idle
backend_xid |
backend_xmin |
query | select txid_current();
backend_type | client backend
-[ RECORD 13 ]---+---------------------------------
datid | 16384
datname | enterprisedb
pid | 27157
usesysid | 10
usename | enterprisedb
application_name | pgbench
client_addr |
client_hostname |
client_port | -1
backend_start | 20-JUL-21 09:49:56.711944 +05:30
xact_start | 20-JUL-21 09:52:18.943345 +05:30
query_start | 20-JUL-21 09:52:18.943345 +05:30
state_change | 20-JUL-21 09:52:18.943346 +05:30
wait_event_type | Client
wait_event | ClientRead
state | active
backend_xid | 4526447
backend_xmin | 1195
query | select txid_current();
backend_type | client backend
-[ RECORD 14 ]---+---------------------------------
SESSION 4:
Open another session to check log files, if they are reporting any hints, warnings or errors
After ~1 hour of test run, I have started seeing below warnings in the log files.
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST WARNING: oldest xmin is far in the past
2021-07-19 20:15:26 IST HINT: Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST WARNING: oldest xmin is far in the past
2021-07-19 20:15:26 IST HINT: Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
2021-07-19 20:15:26 IST LOG: statement: select txid_current();
After ~ 3 hours of test run, the warnings becoming more prominent.
-bash-4.2$ grep -Rinw /var/lib/edb/as11/data/log/ -e 'wraparound'
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:17711:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:20705:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21160:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21650:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22179:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22694:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23229:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23776:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24316:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24840:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25391:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25927:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:27533:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:28135:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:32986:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:33581:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:17711:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:20705:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21160:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:21650:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22179:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:22694:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23229:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:23776:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24316:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:24840:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25391:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:25927:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:27533:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:28135:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:32986:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
/var/lib/edb/as11/data/log/edb-2021-07-19_195300.log:33581:2021-07-19 19:53:01 IST HINT: Close open transactions soon to avoid wraparound problems.
Now, let’s monitor system stats too to see overall load, processes their commands etc.
top - 00:19:39 up 159 days, 7:11, 6 users, load average: 10.31, 9.97, 9.95
Tasks: 399 total, 9 running, 390 sleeping, 0 stopped, 0 zombie
%Cpu(s): 60.0 us, 38.4 sy, 0.0 ni, 1.6 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32780168 total, 14568928 free, 1259976 used, 16951264 buff/cache
KiB Swap: 16777212 total, 16685044 free, 92168 used. 23083820 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21831 enterpr+ 20 0 591764 5228 2152 S 16.8 0.0 417:57.53 /usr/edb/as11/bin/pgbench -U enterprisedb enterprisedb -f wl.sql -c8 -j8 -T100000 -rn -P+
21845 enterpr+ 20 0 8667644 24604 22256 R 9.7 0.1 209:53.07 postgres: enterprisedb enterprisedb [local] SELECT
21840 enterpr+ 20 0 8667644 24584 22244 R 9.4 0.1 210:36.23 postgres: enterprisedb enterprisedb [local] idle
21844 enterpr+ 20 0 8667644 24876 22528 R 9.4 0.1 209:38.54 postgres: enterprisedb enterprisedb [local] idle
21841 enterpr+ 20 0 8667644 24836 22496 R 8.8 0.1 209:15.88 postgres: enterprisedb enterprisedb [local] SELECT
21847 enterpr+ 20 0 8667644 24600 22252 S 8.8 0.1 206:16.87 postgres: enterprisedb enterprisedb [local] idle
21846 enterpr+ 20 0 8667644 24604 22256 S 8.5 0.1 210:41.59 postgres: enterprisedb enterprisedb [local] idle
20421 enterpr+ 20 0 259688 2124 632 R 7.1 0.0 146:27.66 postgres: logger
21843 enterpr+ 20 0 8667644 24588 22248 R 6.6 0.1 210:05.38 postgres: enterprisedb enterprisedb [local] idle
21842 enterpr+ 20 0 8667644 24600 22260 R 6.6 0.1 209:48.66 postgres: enterprisedb enterprisedb [local] idle
20426 enterpr+ 20 0 8666684 18736 17208 R 4.0 0.1 102:43.53 postgres: walwriter
6858 polkitd 20 0 620236 18044 4428 S 0.9 0.1 15940:23 /usr/lib/polkit-1/polkitd --no-debug
SESSION 2:
To my surprise, after ~8 hours of test run, the session screen where I have pgbench running, started throwing some WARNINGs to perform vacuum within specified number of transactions to avoid database shutdown. It kept on throwing those direct warnings for almost an hour before it stopped processing pgbench requests and started aborting workers with a message ‘ERROR: database is not accepting commands to avoid wraparound data loss in database “postgres”.
In the end it will present you the overall pgbench stats where it fired 2146483646 statements in total.
WARNING: database "postgres" must be vacuumed within 5092920 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 5092920 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 5092917 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
....
......
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 3178805 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 3178808 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 3178804 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 3178803 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 3178801 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 3178799 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
......
........
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 1000002 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 1000002 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 0 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 1 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 7 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 6 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 2 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 4 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
WARNING: database "postgres" must be vacuumed within 1000005 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 3 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
client 5 aborted in command 0 (SQL) of script 0; ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
transaction type: wl.sql
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 8
duration: 100000 s
number of transactions actually processed: 2146483646
latency average = 0.159 ms
latency stddev = 1.102 ms
tps = 50437.366776 (including connections establishing)
tps = 50437.386645 (excluding connections establishing)
statement latencies in milliseconds:
0.171 select txid_current();
You have new mail in /var/spool/mail/enterprisedb
-bash-4.2$
SESSION 3:
Lets check our screen 3 to see where we are in terms on oldest current XID and percent towards wraparound, and I see XID numbers are not increasing and we have reached 107% towards wraparound (strange, why it goes beyond 100%).
enterprisedb=# WITH max_age AS (
SELECT 2000000000 as max_old_xid
, setting AS autovacuum_freeze_max_age
FROM pg_catalog.pg_settings
WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
SELECT datname
, m.max_old_xid::int
, m.autovacuum_freeze_max_age::int
, age(d.datfrozenxid) AS oldest_current_xid
FROM pg_catalog.pg_database d
JOIN max_age m ON (true)
WHERE d.datallowconn )
SELECT max(oldest_current_xid) AS oldest_current_xid
, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
FROM per_database_stats;
(1 row)
Tue 20 Jul 2021 06:57:51 AM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
2146483647 | 107 | 1073
(1 row)
Tue 20 Jul 2021 06:58:01 AM IST (every 10s)
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
2146483647 | 107 | 1073
(1 row)
enterprisedb=# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
datname | age | current_setting
--------------+------------+-----------------
postgres | 2146483647 | 200000000
edb | 2146483647 | 200000000
template1 | 2146483647 | 200000000
template0 | 2146483647 | 200000000
immortal | 2146483647 | 200000000
darkthrone | 2146483647 | 200000000
immortal1 | 2146483647 | 200000000
mayhem | 2146483647 | 200000000
murduk | 2146483647 | 200000000
burzum | 2146483647 | 200000000
gorgoroth | 2146483647 | 200000000
repmgr | 2146483647 | 200000000
enterprisedb | 2146483647 | 200000000
(13 rows)
Let’s connect with the database and see what happens when we try to do both SELECT & UPDATE statements …
enterprisedb=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+--------------+---------+-------------
public | pgbench_accounts | table | enterprisedb | 13 MB |
public | pgbench_branches | table | enterprisedb | 40 kB |
public | pgbench_history | table | enterprisedb | 40 kB |
public | pgbench_tellers | table | enterprisedb | 40 kB |
public | scale_data | table | enterprisedb | 6720 MB |
(5 rows)
enterprisedb=# select * from pgbench_history;
tid | bid | aid | delta | mtime | filler
-----+-----+-------+-------+---------------------------+--------
2 | 1 | 64444 | -1096 | 19-JUL-21 18:41:15.655738 |
3 | 1 | 18460 | -4688 | 19-JUL-21 18:41:15.668699 |
2 | 1 | 92762 | -4669 | 19-JUL-21 18:41:15.670741 |
2 | 1 | 55081 | 1817 | 19-JUL-21 18:41:15.672719 |
5 | 1 | 47743 | 803 | 19-JUL-21 18:41:15.674666 |
5 | 1 | 23899 | -3910 | 19-JUL-21 18:41:15.67664 |
10 | 1 | 33384 | 4005 | 19-JUL-21 18:41:15.678631 |
6 | 1 | 88383 | -4257 | 19-JUL-21 18:41:15.680629 |
8 | 1 | 87562 | 200 | 19-JUL-21 18:41:15.682612 |
5 | 1 | 3957 | -2407 | 19-JUL-21 18:41:15.684595 |
(10 rows)
enterprisedb=#
enterprisedb=# update pgbench_history set aid=65555 where aid=64444;
ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
enterprisedb=#
enterprisedb=#
enterprisedb=#
Alright, so with transaction wraparound is already there, we only be able to run SELECT queries, but DMLS will fail with a message printed to stop the database and vacuum it in single-user mode.
So, what we have learned
- Database will keep on running and accepting READ ONLY (SELECT) requests, error will be thrown in case if DMLs tried.
- 2,000,000,000 seems a huge number, but in case of busy transactional database, it’s just a matter of few hours.
- AUTOVACUUMING is good thing, at least for this scenario, let’s not discuss about its bugs and other problems.
- Add VACUUMING a routine activity else you will soon find yourself in a big mess.
The easiest way to get the highest TXID age value back down is to force a vacuum on the entire database cluster. And the best way to do this cluster-wide vacuum is the vacuumdb binary utility that comes with PostgreSQL. I will cover it in my next post as this post is all about simulating wraparounds.
Hope It Helped!
Prashant Dixit
Oraclouddba-Nataraj said
Excellent post.
How to fix/handle Wraparound problem in PostgreSQL ? « Tales From A Lazy Fat DBA said
[…] « How to simulate the deadly, infamous, misunderstood & complex ‘Transaction Wraparound Prob… […]
Nik said
Good post. Worth mentioning where the idea came from – it was discussed in this Telegram group: https://t.me/postgreschat/10114
Nik said
+ Twitter discussion here (alternatives mentioned): https://twitter.com/samokhvalov/status/1415575072081809409
Hanu said
Excellent post. Can you plz do post on minor upgrade using tar ball installation and edb installation with RPM’s without local repos and how to solve dependencies using RPM installation
Amitabh said
Very Nice, yes, its the only post for simulating TX id wraparound.