Fragmentation Or Bloating in PostgreSQL – How to identify and fix it using DB Vacuuming

Posted by FatDBA on September 4, 2018

Hey Folks,

Back with another post on PostgreSQL. This time related with table fragmentation (Bloating in PG) on how to identify it and fix it using Vacuuming.

Okay, so we have this table of size 995 MBs with close to 20000000 rows and the DB (postgres default db) size is of 2855 MBs.

Now lets do some DMLs to create the scenario.

postgres=# delete from large_test where num3 >=90; DELETE 7324451 postgres=# postgres=# INSERT INTO large_test (num1, num2, num3) postgres-# SELECT round(random()*10), random(), random()*142 postgres-# FROM generate_series(1, 2000) s(i); INSERT 0 2000 postgres=# delete from large_test where num3 >=90; DELETE 729 postgres=# INSERT INTO large_test (num1, num2, num3) postgres-# SELECT round(random()*10), random(), random()*142 postgres-# FROM generate_series(1, 9000) s(i); INSERT 0 9000 postgres=# postgres=# delete from large_test where num1 < 8; DELETE 9514961 postgres=# delete from large_test where num1 < 10; DELETE 2536447

Okay now with all those DMLs i am sure we will have enough of fragmentation in the database, lets check.



postgres=#
postgres=# \dt+ large_test
                       List of relations
 Schema |    Name    | Type  |  Owner   |  Size  | Description
--------+------------+-------+----------+--------+-------------
 public | large_test | table | postgres | 996 MB  |
(1 row)

Okay, the size of the table is almost the same what it was before all the DMLs. Lets see if there is any fragmentation in the database, for this i have used below custimized statement, you can also use pg_class table to get basic details on fragmentation.



SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
  ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
  SELECT
    schemaname, tablename, cc.reltuples, cc.relpages, bs,
    CEIL((cc.reltuples*((datahdr+ma-
      (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
    COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
    COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
  FROM (
    SELECT
      ma,bs,schemaname,tablename,
      (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
      (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    FROM (
      SELECT
        schemaname, tablename, hdr, ma, bs,
        SUM((1-null_frac)*avg_width) AS datawidth,
        MAX(null_frac) AS maxfracsum,
        hdr+(
          SELECT 1+COUNT(*)/8
          FROM pg_stats s2
          WHERE null_frac0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
        ) AS nullhdr
      FROM pg_stats s, (
        SELECT
          (SELECT current_setting('block_size')::NUMERIC) AS bs,
          CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
          CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
        FROM (SELECT version() AS v) AS foo
      ) AS constants
      GROUP BY 1,2,3,4,5
    ) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname  'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC;

Okay so below results which we have captured clearly shows that there is a fragmentation (Look at wastedbytes column)

Now when the fragmentation is clear, lets do the shrink or space reclaim using VACUUM. I will first try with ‘BASIC VACUUM’ and ANALYZE the table at the same time to make optimizer statistics up-to-date.

To remove dead tuples, vacuum processing provides two modes, i.e. Concurrent VACUUM and Full VACUUM. Concurrent VACUUM, often simply called VACUUM, removes dead tuples for each page of the table file, and other transactions can read the table while this process is running. In contrast, Full VACUUM removes dead tuples and defragments live tuples the whole file, and other transactions cannot access tables while Full VACUUM is running.



postgres=#
postgres=# VACUUM (VERBOSE, ANALYZE) large_test; 
INFO:  vacuuming "public.large_test"
INFO:  index "idxlargetest" now contains 634412 row versions in 54840 pages
DETAIL:  0 index row versions were removed.
19811 index pages have been deleted, 13985 are currently reusable.
CPU 0.49s/0.03u sec elapsed 2.93 sec.
INFO:  index "idxlargetest1" now contains 634412 row versions in 54883 pages
DETAIL:  0 index row versions were removed.
52137 index pages have been deleted, 0 are currently reusable.
CPU 0.23s/0.03u sec elapsed 0.26 sec.
INFO:  "large_test": found 0 removable, 7 nonremovable row versions in 1 out of 127459 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 87 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.72s/0.07u sec elapsed 3.21 sec.
INFO:  analyzing "public.large_test"
INFO:  "large_test": scanned 30000 of 127459 pages, containing 149103 live rows and 0 dead rows; 30000 rows in sample, 633484 estimated total rows
VACUUM
postgres=#

As discussed and expected we see no change in wasted space and fragmentation still exists, see below result which matches the same what we have collected before the BASIC VACUUM.



 current_database | schemaname |    tablename     | tbloat | wastedbytes |               iname               | ibloat | wastedibytes
------------------+------------+------------------+--------+-------------+-----------------------------------+--------+--------------
 postgres         | public     | large_test       |   31.6 |  1011122176 | idxlargetest                      |   19.6 |    426352640
 postgres         | public     | large_test       |   31.6 |  1011122176 | idxlargetest1                     |   19.6 |    426704896

Okay, so time to test the FULL VACUUM which reclaims more space and does the real action of freeing up the space than plain or basic VACUUM but the only issue with it it locks the database table.



postgres=# VACUUM (FULL, VERBOSE, ANALYZE) large_test;
INFO:  vacuuming "public.large_test"
INFO:  "large_test": found 0 removable, 634412 nonremovable row versions in 127459 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 1.48s/0.50u sec elapsed 12.28 sec.
INFO:  analyzing "public.large_test"
INFO:  "large_test": scanned 4041 of 4041 pages, containing 634412 live rows and 0 dead rows; 30000 rows in sample, 634412 estimated total rows
VACUUM

Now lets see if there is any change in fragmentation levels.

Yup, this time it worked after FULL VACUUMing of the database and now there isn’t any wasted or fragmented space exists in the table.

Hope It Helps!
Prashant Dixit

This entry was posted on September 4, 2018 at 2:45 PM and is filed under Advanced. Tagged: postgres. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

3 Responses to “Fragmentation Or Bloating in PostgreSQL – How to identify and fix it using DB Vacuuming”

Hanu said

July 11, 2022 at 8:46 AM
The query who mentioned to check bloating is tested regarding performance or not.I mean we can use this in prod environment

Reply
- FatDBA said
  
  August 2, 2022 at 2:14 PM
  Yes, its well tested. Go ahead and try that.
  
  Reply
Hanu said

September 10, 2022 at 3:42 AM
Thx for your response.Hope your are doing well.
I have doubt on below queries.Can you please do clear on this.

1.How to fix the rebuild index fails.
2.How to decide which rpms required/download for entire postgres/edb postgres software installation from postgres community/edb site.

Awaiting for your response!!!!

Reply

Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Likes