Logo ← PostgreSQL Blog

Forget Everything You Know About PostgreSQL Maintenance

A practical PostgreSQL maintenance guide for production environments, covering instance, database, and table-level operations with…

Forget Everything You Know About PostgreSQL Maintenance

A practical PostgreSQL maintenance guide for production environments, covering instance, database, and table-level operations with real-world commands.

Table of Contents

1. Why PostgreSQL Maintenance Must Be Thought in Layers

2. Instance-Level Maintenance

  • vacuumdb
  • reindexdb
  • clusterdb
  • CHECKPOINT

3. Database-Level Physical Operations

4. Table-Level Maintenance (Now It Actually Makes Sense)

  • VACUUM vs VACUUM FULL
  • Advanced VACUUM options
  • ANALYZE

5. pg_repack: The Minimal-Lock Power Tool

6. CLUSTER: Physical Reordering with Intent

7. Index Maintenance Deep Dive

8. Observability Before Action

  • pg_stat_statements
  • EXPLAIN / EXPLAIN ANALYZE

9. The Peak: Measuring Bloat Like a Professional

  • Table Bloat Analysis
  • Index Bloat Analysis

10. Final Thoughts: Thinking Like a PostgreSQL DBA

1. Why PostgreSQL Maintenance Must Be Thought in Layers

PostgreSQL maintenance operates on three distinct layers:

  • Instance (cluster) level
  • Database level
  • Table / index level

Mistaking one for another leads to:

  • unnecessary locks
  • unexpected I/O storms
  • why did everything slow down? moments

Great DBAs don’t run commands.
 They choose the correct scope.

2. Instance-Level Maintenance

These commands operate at the PostgreSQL instance (cluster) level.

That means:

They affect ALL databases inside the same PostgreSQL instance.

This is where real power and real danger lives.

vacuumdb (Cluster-Aware by Nature)

vacuumdb -U postgres --all --analyze-only --verbose
  • Connects to every database
  • Runs ANALYZE only
  • Zero table rewrite
  • Zero blocking
  • Planner statistics refreshed instance-wide
vacuumdb -U postgres --all --full --analyze --verbose

This is VACUUM FULL across ALL databases.

  • Table rewrites
  • Exclusive locks
  • Massive I/O
  • Never run casually on production

Key insight:
 vacuumdb is not just a wrapper.
 It is an instance-level orchestrator.

reindexdb (Instance-Wide Index Surgery)

reindexdb --all --concurrently --jobs 4 --echo
  • Rebuilds indexes in all databases
  • CONCURRENTLY minimizes blocking
  • Parallel jobs distribute CPU load

Without CONCURRENTLY:

  • writes block
  • sessions wait
  • alerts start screaming

clusterdb (The Most Misunderstood Tool)

clusterdb -U postgres -d sales_db \
  --table orders \
  --index idx_orders_timestamp \
  --echo

Even though it targets a table,
 clusterdb itself is an instance-level utility.

Why this matters:

  • It connects from outside
  • It respects instance-level resource limits
  • It behaves differently than in-session CLUSTER

CHECKPOINT (Silent but Brutal)

CHECKPOINT;
  • Forces dirty buffers to disk
  • Flushes WAL
  • Can create sudden I/O spikes

Used carefully:

  • before backups
  • during controlled maintenance windows

Used blindly:

  • latency explosions

3. Database-Level Physical Operations

Now we zoom in.

Database-level commands still affect many objects, but stay inside one database.

Examples:

  • REINDEX DATABASE
  • CLUSTER DATABASE
  • VACUUM DATABASE

This is the sweet spot for controlled maintenance.

4. Table-Level Maintenance (Now It Actually Makes Sense)

Only now does it make sense to talk about VACUUM.

VACUUM

VACUUM; -- non-blocking, removes dead tuples
  • Marks space reusable
  • No OS-level space return
  • No exclusive locks
  • Everyday hygiene

VACUUM FULL

VACUUM FULL;
VACUUM FULL VERBOSE emp;
  • Full table rewrite
  • Disk space returned
  • ACCESS EXCLUSIVE LOCK
  • Equivalent to controlled demolition

Advanced VACUUM Usage

VACUUM (index_cleanup true, verbose true, analyze true) customers;

VACUUM (full true, index_cleanup true, verbose true, analyze true) customers;

VACUUM (index_cleanup true, verbose true, analyze true, parallel 4) customers;
-- FULL does NOT support parallelism

Parallel VACUUM:

  • reduces wall-clock time
  • increases CPU usage
  • perfect for large tables on modern hardware

ANALYZE (Planner’s Oxygen)

ANALYZE;
ANALYZE VERBOSE;

No ANALYZE = blind planner.
 Blind planner = bad execution plans.

5. pg_repack: The Minimal-Lock Power Tool

This is where PostgreSQL becomes elegant.

pg_repack -a
  • Reorganizes all tables
  • Uses triggers + shadow tables
  • Near-zero blocking
  • Ideal for production
pg_repack -t tablename -d employee

Targeted precision.

pg_repack -s newtablespace -d employee

This does three things at once:

  1. Removes bloat
  2. Rewrites data
  3. Moves objects to a new tablespace

All with minimal lock time.

Fillfactor is respected.
 Indexes are rebuilt cleanly.
 Downtime is measured in seconds.

6. CLUSTER: Physical Reordering with Intent

CLUSTER testcluster USING idx_id;
  • Physically reorders table
  • Optimizes range scans
  • One index defines the layout

But:

  • Table rewrite
  • Blocking
  • Needs re-run after data churn

CLUSTER is powerful but situational.

7. Index Maintenance Deep Dive

REINDEX INDEX index_name;

REINDEX TABLE table_name;

REINDEX SCHEMA schema_name;

REINDEX DATABASE db_name;

REINDEX TABLE CONCURRENTLY cs_agreement;

REINDEX (verbose) SYSTEM;

REINDEX SYSTEM:

  • touches system catalogs
  • rarely needed
  • always dangerous without planning

8. Observability Before Action

pg_stat_statements

Shows you:

  • what hurts
  • how often
  • how expensive

EXPLAIN / EXPLAIN ANALYZE

Theory vs reality.

Always trust ANALYZE.

9. The Peak: Measuring Bloat Like a Professional

Feelings end here.
 Numbers begin.

Table Bloat Analysis

-- This query inspects bloat in tables.
SELECT current_database(), schemaname, tblname, bs*tblpages AS real_size,
  (tblpages-est_tblpages)*bs AS extra_size,
  CASE WHEN tblpages > 0 AND tblpages - est_tblpages > 0
    THEN 100 * (tblpages - est_tblpages)/tblpages::float
    ELSE 0
  END AS extra_pct, fillfactor,
  CASE WHEN tblpages - est_tblpages_ff > 0
    THEN (tblpages-est_tblpages_ff)*bs
    ELSE 0
  END AS bloat_size,
  CASE WHEN tblpages > 0 AND tblpages - est_tblpages_ff > 0
    THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float
    ELSE 0
  END AS bloat_pct, is_na
FROM (
  SELECT ceil(reltuples / ((bs-page_hdr)/tpl_size)) + ceil(toasttuples / 4) AS est_tblpages,
         ceil(reltuples / ((bs-page_hdr)*fillfactor/(tpl_size*100))) + ceil(toasttuples / 4) AS est_tblpages_ff,
         tblpages, fillfactor, bs, tblid, schemaname, tblname, heappages, toastpages, is_na
  FROM (
    SELECT (4 + tpl_hdr_size + tpl_data_size + (2*ma)
            - CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END
            - CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END
           ) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + toastpages) AS tblpages, heappages,
           toastpages, reltuples, toasttuples, bs, page_hdr, tblid, schemaname, tblname, fillfactor, is_na
    FROM (
      SELECT tbl.oid AS tblid, ns.nspname AS schemaname, tbl.relname AS tblname, tbl.reltuples,
             tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
             coalesce(toast.reltuples, 0) AS toasttuples,
             coalesce(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
             current_setting('block_size')::numeric AS bs,
             CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
             24 AS page_hdr,
             23 + CASE WHEN MAX(coalesce(s.null_frac,0)) > 0 THEN (7 + count(s.attname)) / 8 ELSE 0::int END
               + CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
             sum((1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0)) AS tpl_data_size,
             bool_or(att.atttypid = 'pg_catalog.name'::regtype) OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
      FROM pg_attribute AS att
      JOIN pg_class AS tbl ON att.attrelid = tbl.oid
      JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
      LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname AND s.tablename=tbl.relname AND s.inherited=false AND s.attname=att.attname
      LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
      WHERE NOT att.attisdropped
      AND tbl.relkind in ('r','m')
      GROUP BY 1,2,3,4,5,6,7,8,9,10
      ORDER BY 2,3
    ) AS s
  ) AS s2
) AS s3
ORDER BY schemaname, tblname;

This query answers:

  • Is the table actually bloated?
  • Or just large?
  • Does VACUUM help, or is rewrite needed?

Index Bloat Analysis

-- This query inspects bloat in indexes.
SELECT current_database(), nspname AS schemaname, tblname, idxname, 
  bs*(relpages)::bigint AS real_size,
  bs*(relpages-est_pages)::bigint AS extra_size,
  100 * (relpages-est_pages)::float / relpages AS extra_pct,
  fillfactor,
  CASE WHEN relpages > est_pages_ff THEN bs*(relpages-est_pages_ff) ELSE 0 END AS bloat_size,
  100 * (relpages-est_pages_ff)::float / relpages AS bloat_pct,
  is_na
FROM (
  SELECT coalesce(1 + ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0) AS est_pages,
         coalesce(1 + ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0) AS est_pages_ff,
         bs, nspname, tblname, idxname, relpages, fillfactor, is_na
  FROM (
    SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, idxoid, fillfactor,
           (index_tuple_hdr_bm + maxalign - CASE WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign ELSE index_tuple_hdr_bm%maxalign END
            + nulldatawidth + maxalign - CASE WHEN nulldatawidth = 0 THEN 0 WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign ELSE nulldatawidth::integer%maxalign END
           )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
    FROM (
      SELECT n.nspname, i.tblname, i.idxname, i.reltuples, i.relpages, i.idxoid, i.fillfactor,
             current_setting('block_size')::numeric AS bs,
             CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS maxalign,
             24 AS pagehdr,
             16 AS pageopqdata,
             CASE WHEN max(coalesce(s.null_frac,0)) = 0 THEN 8 ELSE 8 + ((32 + 8 - 1) / 8) END AS index_tuple_hdr_bm,
             sum((1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
             max(CASE WHEN i.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END) > 0 AS is_na
      FROM (
        SELECT ct.relname AS tblname, ct.relnamespace, ic.idxname, ic.attpos, ic.indkey, ic.indkey[ic.attpos], ic.reltuples, ic.relpages, ic.tbloid, ic.idxoid, ic.fillfactor,
               coalesce(a1.attnum, a2.attnum) AS attnum, coalesce(a1.attname, a2.attname) AS attname, coalesce(a1.atttypid, a2.atttypid) AS atttypid,
               CASE WHEN a1.attnum IS NULL THEN ic.idxname ELSE ct.relname END AS attrelname
        FROM (
          SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor, indkey, pg_catalog.generate_series(1,indnatts) AS attpos
          FROM (
            SELECT ci.relname AS idxname, ci.reltuples, ci.relpages, i.indrelid AS tbloid, i.indexrelid AS idxoid,
                   coalesce(substring(array_to_string(ci.reloptions, ' ') from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
                   i.indnatts, pg_catalog.string_to_array(pg_catalog.textin(pg_catalog.int2vectorout(i.indkey)),' ')::int[] AS indkey
            FROM pg_catalog.pg_index i
            JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
            WHERE ci.relam=(SELECT oid FROM pg_am WHERE amname = 'btree') AND ci.relpages > 0
          ) AS idx_data
        ) AS ic
        JOIN pg_catalog.pg_class ct ON ct.oid = ic.tbloid
        LEFT JOIN pg_catalog.pg_attribute a1 ON ic.indkey[ic.attpos] <> 0 AND a1.attrelid = ic.tbloid AND a1.attnum = ic.indkey[ic.attpos]
        LEFT JOIN pg_catalog.pg_attribute a2 ON ic.indkey[ic.attpos] = 0 AND a2.attrelid = ic.idxoid AND a2.attnum = ic.attpos
      ) i
      JOIN pg_catalog.pg_namespace n ON n.oid = i.relnamespace
      JOIN pg_catalog.pg_stats s ON s.schemaname = n.nspname AND s.tablename = i.attrelname AND s.attname = i.attname
      GROUP BY 1,2,3,4,5,6,7,8,9,10,11
    ) AS rows_data_stats
  ) AS rows_hdr_pdg_stats
) AS relation_stats
ORDER BY nspname, tblname, idxname;

This is how you decide:

  • REINDEX?
  • pg_repack?
  • Ignore and move on?

Final Thoughts: Thinking Like a PostgreSQL DBA

Maintenance is not a checklist.
 It’s scope awareness.

Great DBAs:

  • think in layers
  • measure before acting
  • avoid locks by design
  • respect the instance

Wisdom is knowing where and how wide to use them.