@sebastianwebber
Last active March 10, 2026 03:39
PostgreSQL Bug: `recovery_target_action=pause` silently ignored when `RUNNING_XACTS` has `subxid_overflow=true`

Summary

When a PostgreSQL standby replays a RUNNING_XACTS WAL record that has subxid_overflow=true, recovery_target_action=pause is silently ignored and the standby promotes directly to primary without pausing.

This means any tool or operator that relies on the pause to inspect or coordinate recovery state (e.g. backup utilities) will unexpectedly find the instance fully promoted, resulting in errors such as:

ERROR:  recovery is not in progress
DETAIL:  pg_get_wal_replay_pause_state() cannot be called during normal operation
SQLSTATE: 55000

How to reproduce

Requirements: Go 1.22+, PostgreSQL 14+ binaries in PATH

cd /path/to/this/directory
go mod tidy
go run .

The program will:

  1. Initialize a throw-away primary on port 15432
  2. Open a transaction with 100 SAVEPOINTs — each creates a subtransaction XID; after the 64th (PGPROC_MAX_CACHED_SUBXIDS), the subxid cache overflows and the overflowed flag is set in the proc array
  3. Run pg_basebackup --checkpoint=fast with the big transaction still open — the forced checkpoint emits a RUNNING_XACTS WAL record with subxid_overflow=true
  4. Set up a standby from that backup with:
    recovery_target        = 'immediate'
    recovery_target_action = 'pause'   ← this is silently ignored
    
  5. Start the standby and query pg_get_wal_replay_pause_state()
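Step 2 is the heart of the trigger. As a minimal sketch (the `overflowScript` helper and its statement layout are illustrative, not part of the reproducer; `subxid_test` matches the table the reproducer creates), the SQL sequence that overflows the per-backend subxid cache looks like this:

```go
package main

import "fmt"

// pgprocMaxCachedSubxids mirrors PGPROC_MAX_CACHED_SUBXIDS in the
// PostgreSQL source: the per-backend subxid cache overflows once a
// transaction holds more than this many subtransaction XIDs.
const pgprocMaxCachedSubxids = 64

// overflowScript returns the statements that hold a transaction open with
// n subtransaction XIDs. Any n > pgprocMaxCachedSubxids sets the
// `overflowed` flag that the next checkpoint records in RUNNING_XACTS.
func overflowScript(n int) []string {
	stmts := []string{"BEGIN"}
	for i := 1; i <= n; i++ {
		// Each SAVEPOINT plus a write under it assigns one subtransaction XID.
		stmts = append(stmts,
			fmt.Sprintf("SAVEPOINT s%d", i),
			fmt.Sprintf("INSERT INTO subxid_test VALUES (%d)", i))
	}
	return stmts
}

func main() {
	script := overflowScript(100)
	fmt.Printf("%d statements; cache overflows after savepoint %d\n",
		len(script), pgprocMaxCachedSubxids)
}
```

The transaction must stay open (no COMMIT/ROLLBACK) until after the checkpoint fires, or the overflowed state is cleared before RUNNING_XACTS is logged.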

Expected output (correct behavior):

SELECT pg_get_wal_replay_pause_state()
  Expected : 'paused'
  Actual   : 'paused'  ← correct

Actual output (bug):

SELECT pg_get_wal_replay_pause_state()
  Expected : 'paused'  (recovery_target_action = pause)
  Actual   : ERROR: recovery is not in progress (SQLSTATE 55000)

Root cause

A three-step chain in the PostgreSQL source:

Step 1 — procarray.c L1261

When a RUNNING_XACTS record with subxid_overflow=true is replayed, the hot-standby snapshot state is set to PENDING instead of READY:

if (running->subxid_status == SUBXIDS_MISSING)  /* subxid_overflow=true */
{
    standbyState = STANDBY_SNAPSHOT_PENDING;  /* never becomes READY */
    standbySnapshotPendingXmin = latestObservedXid;
}

Step 2 — xlogrecovery.c L2266

STANDBY_SNAPSHOT_PENDING prevents LocalHotStandbyActive from ever becoming true:

if (standbyState == STANDBY_SNAPSHOT_READY &&   /* condition fails: PENDING */
    !LocalHotStandbyActive &&
    reachedConsistency &&
    IsUnderPostmaster)
{
    LocalHotStandbyActive = true;  /* never reached */
    SendPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY);
}

Step 3 — xlogrecovery.c L2935

recoveryPausesHere() returns early when LocalHotStandbyActive is false, silently skipping the pause:

static void
recoveryPausesHere(bool endOfRecovery)
{
    /* Don't pause unless users can connect! */
    if (!LocalHotStandbyActive)
        return;   /* ← returns here; pause never happens */

    /* ... the actual pause loop is never executed ... */
}

Execution then falls through to the RECOVERY_TARGET_ACTION_PROMOTE case, and PostgreSQL promotes to primary.


Conditions required to trigger the bug

  • Long-running transaction with many subtransactions: the transaction must have more than PGPROC_MAX_CACHED_SUBXIDS (64) active subtransaction XIDs at the time of a checkpoint
  • Checkpoint while the transaction is open: the checkpoint emits a RUNNING_XACTS WAL record with subxid_overflow=true
  • Standby/restore replays that WAL record: the record falls inside the minimal WAL window required to reach consistency
  • recovery_target_action = pause: the default value; any configuration that expects recovery to pause

The overflow record only needs to appear once in the WAL stream. Once written, every restore from a backup that includes it will trigger the bug deterministically, with no manual retry ever succeeding.
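You can check a backup's WAL for the poisoned record by scanning pg_waldump output. A sketch of such a filter follows; note the marker text is an assumption based on standbydesc.c in recent releases ("subxid overflowed"), so verify it against your version's actual pg_waldump output:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// isOverflowedRunningXacts reports whether one line of pg_waldump output
// describes a RUNNING_XACTS record whose subxid cache overflowed.
// ASSUMPTION: the "subxid overflowed" marker comes from standbydesc.c in
// recent PostgreSQL releases; the exact wording may differ per version.
func isOverflowedRunningXacts(line string) bool {
	return strings.Contains(line, "RUNNING_XACTS") &&
		strings.Contains(line, "subxid overflowed")
}

func main() {
	// Typical use: pg_waldump pg_wal/000000010000000000000002 | go run scan.go
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if isOverflowedRunningXacts(sc.Text()) {
			fmt.Println("found:", sc.Text())
		}
	}
}
```

If the filter matches any record inside the backup's required WAL range, every restore of that backup will hit the bug.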


Impact

  • Backup utilities (e.g. wal-g, pgbackup) that restore a base backup, replay WAL in recovery, and then query pg_get_wal_replay_pause_state() will fail with ERROR: recovery is not in progress.
  • The failure is deterministic and permanent: as long as the same base backup is used, the same WAL record is replayed, causing the same promotion on every attempt.
  • Any tool relying on pg_get_wal_replay_pause_state() or pg_wal_replay_pause() during recovery is affected.
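Until a server-side fix lands, affected tools can at least recognize this state and report it clearly instead of retrying forever. A minimal sketch (the helper name is hypothetical; the SQLSTATE and message text are taken from the error shown above):

```go
package main

import (
	"fmt"
	"strings"
)

// promotedInsteadOfPausing reports whether an error returned from
// pg_get_wal_replay_pause_state() indicates the server already left
// recovery, i.e. the pause was skipped and the instance promoted.
func promotedInsteadOfPausing(sqlstate, message string) bool {
	// 55000 = object_not_in_prerequisite_state; the server raises it with
	// "recovery is not in progress" once normal operation has begun.
	return sqlstate == "55000" &&
		strings.Contains(message, "recovery is not in progress")
}

func main() {
	if promotedInsteadOfPausing("55000", "recovery is not in progress") {
		fmt.Println("standby promoted instead of pausing; retrying will not help")
	}
}
```

Because the promotion is deterministic for a given backup, a tool that detects this condition should fail permanently rather than retry.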

Suggested fix

Option A — in recoveryPausesHere(): when LocalHotStandbyActive is false but a pause is requested and the reason is subxid overflow, log a warning and wait for the condition to resolve (or handle the subxid_overflow case so LocalHotStandbyActive can still be set).

Option B — allow recovery_target_action=pause to work via a file- or signal-based mechanism that does not require a database connection, so it does not depend on LocalHotStandbyActive.

Option C — in procarray.c / xlogrecovery.c: after processing a RUNNING_XACTS record with subxid_overflow=true, schedule a follow-up check that transitions standbyState to STANDBY_SNAPSHOT_READY once all the overflowed transactions are known to have completed, so that LocalHotStandbyActive can eventually be set.

At minimum, the current silent return in recoveryPausesHere() should emit a WARNING or LOG so operators are aware the pause was skipped.


Affected versions

Confirmed in PostgreSQL 17.8 and 18.1 (latest stable as of March 2026). The code path (procarray.c, xlogrecovery.c) has been present since at least PostgreSQL 14. Earlier versions may also be affected.

module subxid-overflow-repro

go 1.22

require (
	github.com/charmbracelet/log v0.4.0
	github.com/jackc/pgx/v5 v5.7.2
)

require (
	github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
	github.com/charmbracelet/lipgloss v0.10.0 // indirect
	github.com/go-logfmt/logfmt v0.6.0 // indirect
	github.com/jackc/pgpassfile v1.0.0 // indirect
	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
	github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
	github.com/mattn/go-isatty v0.0.18 // indirect
	github.com/mattn/go-runewidth v0.0.15 // indirect
	github.com/muesli/reflow v0.3.0 // indirect
	github.com/muesli/termenv v0.15.2 // indirect
	github.com/rivo/uniseg v0.4.7 // indirect
	golang.org/x/crypto v0.31.0 // indirect
	golang.org/x/exp v0.0.0-20231006140011-7918f672742d // indirect
	golang.org/x/sys v0.28.0 // indirect
	golang.org/x/text v0.21.0 // indirect
)
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/charmbracelet/lipgloss v0.10.0 h1:KWeXFSexGcfahHX+54URiZGkBFazf70JNMtwg/AFW3s=
github.com/charmbracelet/lipgloss v0.10.0/go.mod h1:Wig9DSfvANsxqkRsqj6x87irdy123SR4dOXlKa91ciE=
github.com/charmbracelet/log v0.4.0 h1:G9bQAcx8rWA2T3pWvx7YtPTPwgqpk7D68BX21IRW8ZM=
github.com/charmbracelet/log v0.4.0/go.mod h1:63bXt/djrizTec0l11H20t8FDSvA4CRZJ1KH22MdptM=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-logfmt/logfmt v0.6.0 h1:wGYYu3uicYdqXVgoYbvnkrPVXkuLM1p1ifugDMEdRi4=
github.com/go-logfmt/logfmt v0.6.0/go.mod h1:WYhtIu8zTZfxdn5+rREduYbwxfcBr/Vr6KEVveWlfTs=
github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM=
github.com/jackc/pgpassfile v1.0.0/go.mod h1:CEx0iS5ambNFdcRtxPj5JhEz+xB6uRky5eyVu/W2HEg=
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 h1:iCEnooe7UlwOQYpKFhBabPMi4aNAfoODPEFNiAnClxo=
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761/go.mod h1:5TJZWKEWniPve33vlWYSoGYefn3gLQRzjfDlhSJ9ZKM=
github.com/jackc/pgx/v5 v5.7.2 h1:mLoDLV6sonKlvjIEsV56SkWNCnuNv531l94GaIzO+XI=
github.com/jackc/pgx/v5 v5.7.2/go.mod h1:ncY89UGWxg82EykZUwSpUKEfccBGGYq1xjrOpsbsfGQ=
github.com/jackc/puddle/v2 v2.2.2 h1:PR8nw+E/1w0GLuRFSmiioY6UooMp6KJv0/61nB7icHo=
github.com/jackc/puddle/v2 v2.2.2/go.mod h1:vriiEXHvEE654aYKXXjOvZM39qJ0q+azkZFrfEOc3H4=
github.com/lucasb-eyer/go-colorful v1.2.0 h1:1nnpGOrhyZZuNyfu1QjKiUICQ74+3FNCN69Aj6K7nkY=
github.com/lucasb-eyer/go-colorful v1.2.0/go.mod h1:R4dSotOR9KMtayYi1e77YzuveK+i7ruzyGqttikkLy0=
github.com/mattn/go-isatty v0.0.18 h1:DOKFKCQ7FNG2L1rbrmstDN4QVRdS89Nkh85u68Uwp98=
github.com/mattn/go-isatty v0.0.18/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mattn/go-runewidth v0.0.12/go.mod h1:RAqKPSqVFrSLVXbA8x7dzmKdmGzieGRCM46jaSJTDAk=
github.com/mattn/go-runewidth v0.0.15 h1:UNAjwbU9l54TA3KzvqLGxwWjHmMgBUVhBiTjelZgg3U=
github.com/mattn/go-runewidth v0.0.15/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/muesli/reflow v0.3.0 h1:IFsN6K9NfGtjeggFP+68I4chLZV2yIKsXJFNZ+eWh6s=
github.com/muesli/reflow v0.3.0/go.mod h1:pbwTDkVPibjO2kyvBQRBxTWEEGDGq0FlB1BIKtnHY/8=
github.com/muesli/termenv v0.15.2 h1:GohcuySI0QmI3wN8Ok9PtKGkgkFIk7y6Vpb5PvrY+Wo=
github.com/muesli/termenv v0.15.2/go.mod h1:Epx+iuz8sNs7mNKhxzH4fWXGNpZwUaJKRS1noLXviQ8=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rivo/uniseg v0.1.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
golang.org/x/crypto v0.31.0 h1:ihbySMvVjLAeSH1IbfcRTkD/iNscyz8rGzjF/E5hV6U=
golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo=
golang.org/x/sync v0.10.0 h1:3NQrjDixjgGwUOCaF8w2+VYHv0Ve/vGYSbdkTa98gmQ=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.28.0 h1:Fksou7UEQUWlKvIdsqzJmUmCX3cZuD2+P3XyyzwMhlA=
golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
// Reproducer for PostgreSQL bug:
// recovery_target_action=pause is silently ignored when a RUNNING_XACTS WAL
// record with subxid_overflow=true is replayed during standby recovery.
//
// Root cause (three-step chain):
//  1. RUNNING_XACTS with subxid_overflow=true sets standbyState = STANDBY_SNAPSHOT_PENDING
//     instead of STANDBY_SNAPSHOT_READY (procarray.c).
//  2. STANDBY_SNAPSHOT_PENDING prevents LocalHotStandbyActive from ever becoming true
//     (xlogrecovery.c).
//  3. recoveryPausesHere() returns early when LocalHotStandbyActive=false, silently
//     skipping the pause and falling through to PROMOTE (xlogrecovery.c).
//
// Effect: standby promotes directly to primary instead of pausing.
// Symptom: pg_get_wal_replay_pause_state() throws ERROR: recovery is not in progress
// (SQLSTATE 55000).
//
// Affected versions confirmed: PostgreSQL 17.x
// See: https://github.com/postgres/postgres/blob/REL_17_8/src/backend/storage/ipc/procarray.c#L1261
//      https://github.com/postgres/postgres/blob/REL_17_8/src/backend/access/transam/xlogrecovery.c#L2935
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"

	"github.com/charmbracelet/log"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgconn"
)

const (
	primaryPort = 15432
	standbyPort = 15433

	// PGPROC_MAX_CACHED_SUBXIDS = 64 in PostgreSQL source.
	// We need strictly more than 64 active subtransactions to trigger the overflow.
	numSavepoints = 100
)
// ────────────────────────────────────────────────────────────────────────────
// Entrypoint
// ────────────────────────────────────────────────────────────────────────────
func main() {
	log.SetReportTimestamp(false)

	binDir, err := findPgBinDir()
	if err != nil {
		log.Fatal("PostgreSQL not found", "err", err, "hint", "install PostgreSQL 14+ and ensure pg_ctl is in your PATH")
	}
	ver, _ := pgServerVersion(binDir)
	log.Info("PostgreSQL", "binaries", binDir, "version", ver)

	tmpDir, err := os.MkdirTemp("", "pg-subxid-overflow-*")
	if err != nil {
		log.Fatal("MkdirTemp", "err", err)
	}
	log.Info("working directory", "path", tmpDir)

	r := &repro{
		binDir:     binDir,
		tmpDir:     tmpDir,
		primaryDir: filepath.Join(tmpDir, "primary"),
		standbyDir: filepath.Join(tmpDir, "standby"),
		archiveDir: filepath.Join(tmpDir, "wal-archive"),
	}
	defer r.cleanup()

	if err := r.run(); err != nil {
		log.Fatal("reproducer failed", "err", err)
	}
}
// ────────────────────────────────────────────────────────────────────────────
// Reproducer
// ────────────────────────────────────────────────────────────────────────────
type repro struct {
	binDir     string
	tmpDir     string
	primaryDir string
	standbyDir string
	archiveDir string
}

func (r *repro) run() error {
	step("1/6", "Initialize and start primary PostgreSQL")
	if err := r.initPrimary(); err != nil {
		return fmt.Errorf("initPrimary: %w", err)
	}
	if err := r.startPrimary(); err != nil {
		return fmt.Errorf("startPrimary: %w", err)
	}

	ctx := context.Background()
	conn, err := r.connectWithRetry(ctx, primaryPort)
	if err != nil {
		return fmt.Errorf("connect to primary: %w", err)
	}
	defer conn.Close(ctx)

	step("2/6", fmt.Sprintf("Open transaction with %d SAVEPOINTs to overflow subxid cache (limit=64)", numSavepoints))
	if err := r.createSubxidOverflow(ctx, conn); err != nil {
		return fmt.Errorf("createSubxidOverflow: %w", err)
	}

	// The big transaction is still OPEN here. When pg_basebackup forces a
	// checkpoint (--checkpoint=fast), PostgreSQL emits a RUNNING_XACTS WAL
	// record that will have subxid_overflow=true because our transaction has
	// 100 active subtransaction XIDs — more than PGPROC_MAX_CACHED_SUBXIDS=64.
	step("3/6", "Take base backup (--checkpoint=fast emits RUNNING_XACTS with subxid_overflow=true)")
	if err := r.takeBaseBackup(); err != nil {
		return fmt.Errorf("takeBaseBackup: %w", err)
	}

	// We can now commit/rollback the big transaction — backup is done.
	if _, err := conn.Exec(ctx, "ROLLBACK"); err != nil {
		log.Warn("ROLLBACK failed", "err", err)
	}

	step("4/6", "Configure standby: recovery_target=immediate, recovery_target_action=pause")
	if err := r.configureStandby(); err != nil {
		return fmt.Errorf("configureStandby: %w", err)
	}

	step("5/6", "Start standby and replay WAL (should pause; will it?)")
	log.Info("expected behavior", "result", "standby pauses at recovery target")
	log.Warn("buggy behavior", "result", "standby silently promotes (recoveryPausesHere returns early)")
	fmt.Println()
	if err := r.startStandby(); err != nil {
		return fmt.Errorf("startStandby: %w", err)
	}

	step("6/6", "Query pg_get_wal_replay_pause_state() on standby")
	return r.checkBug(ctx)
}
// ────────────────────────────────────────────────────────────────────────────
// Primary setup
// ────────────────────────────────────────────────────────────────────────────
func (r *repro) initPrimary() error {
	if err := os.MkdirAll(r.archiveDir, 0o755); err != nil {
		return err
	}
	if err := r.pgCmd("initdb",
		"-D", r.primaryDir,
		"--no-locale",
		"--encoding=UTF8",
		"-U", "postgres",
	); err != nil {
		return fmt.Errorf("initdb: %w", err)
	}

	// Append to postgresql.conf
	pgconf := filepath.Join(r.primaryDir, "postgresql.conf")
	f, err := os.OpenFile(pgconf, os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	archiveCmd := fmt.Sprintf("cp %%p %s/%%f", r.archiveDir)
	_, err = fmt.Fprintf(f, `
# ── subxid-overflow-repro ──────────────────────────
listen_addresses = 'localhost'
port = %d
wal_level = replica
archive_mode = on
archive_command = '%s'
max_wal_senders = 5
wal_keep_size = 64
`, primaryPort, archiveCmd)
	if err != nil {
		return err
	}

	// Allow trust auth for localhost (test cluster only)
	hba := filepath.Join(r.primaryDir, "pg_hba.conf")
	return os.WriteFile(hba, []byte(
		"local all all trust\n"+
			"host all all 127.0.0.1/32 trust\n"+
			"host all all ::1/128 trust\n"+
			"host replication all 127.0.0.1/32 trust\n"+
			"host replication all ::1/128 trust\n",
	), 0o644)
}

func (r *repro) startPrimary() error {
	return r.pgCmd("pg_ctl", "start", "-D", r.primaryDir, "-w")
}
// ────────────────────────────────────────────────────────────────────────────
// Subxid overflow trigger
// ────────────────────────────────────────────────────────────────────────────
func (r *repro) createSubxidOverflow(ctx context.Context, conn *pgx.Conn) error {
	// Create the helper table first (outside the big transaction so it persists)
	if _, err := conn.Exec(ctx,
		"CREATE TABLE IF NOT EXISTS subxid_test (id int)",
	); err != nil {
		return fmt.Errorf("CREATE TABLE: %w", err)
	}

	if _, err := conn.Exec(ctx, "BEGIN"); err != nil {
		return fmt.Errorf("BEGIN: %w", err)
	}

	// Create numSavepoints SAVEPOINTs, each with a write. Each SAVEPOINT
	// produces a subtransaction XID. After PGPROC_MAX_CACHED_SUBXIDS=64
	// subtransaction XIDs, the per-backend subxid cache overflows and the
	// `overflowed` flag is set in the proc array.
	for i := 1; i <= numSavepoints; i++ {
		if _, err := conn.Exec(ctx, fmt.Sprintf("SAVEPOINT s%d", i)); err != nil {
			return fmt.Errorf("SAVEPOINT s%d: %w", i, err)
		}
		if _, err := conn.Exec(ctx, fmt.Sprintf("INSERT INTO subxid_test VALUES (%d)", i)); err != nil {
			return fmt.Errorf("INSERT at s%d: %w", i, err)
		}
	}

	log.Info("subxid cache overflowed", "active_subtransactions", numSavepoints, "overflow_at", 64)
	log.Info("transaction held open — visible during the upcoming checkpoint")
	return nil
}
// ────────────────────────────────────────────────────────────────────────────
// Base backup
// ────────────────────────────────────────────────────────────────────────────
func (r *repro) takeBaseBackup() error {
	// --checkpoint=fast forces an immediate checkpoint on the primary.
	// Because our big transaction is still open, the checkpoint's RUNNING_XACTS
	// WAL record will have subxid_overflow=true.
	// --wal-method=stream captures all WAL generated during the backup so the
	// standby's pg_wal directory contains everything needed for recovery.
	if err := r.pgCmd("pg_basebackup",
		"-h", "localhost",
		"-p", fmt.Sprintf("%d", primaryPort),
		"-U", "postgres",
		"-D", r.standbyDir,
		"--wal-method=stream",
		"--checkpoint=fast",
		"-P",
	); err != nil {
		return fmt.Errorf("pg_basebackup: %w", err)
	}
	log.Info("base backup complete", "dir", r.standbyDir)
	return nil
}
// ────────────────────────────────────────────────────────────────────────────
// Standby setup
// ────────────────────────────────────────────────────────────────────────────
func (r *repro) configureStandby() error {
	// Create standby.signal so PostgreSQL enters recovery mode.
	if err := os.WriteFile(
		filepath.Join(r.standbyDir, "standby.signal"), []byte{}, 0o644,
	); err != nil {
		return err
	}

	// pg_basebackup may have created postgresql.auto.conf with a primary_conninfo
	// for streaming replication. Remove it: we want pure archive recovery so that
	// recovery is controlled solely by the WAL on disk — matching the pgbackup
	// restore-from-S3 scenario exactly.
	os.Remove(filepath.Join(r.standbyDir, "postgresql.auto.conf"))

	pgconf := filepath.Join(r.standbyDir, "postgresql.conf")
	f, err := os.OpenFile(pgconf, os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	restoreCmd := fmt.Sprintf("cp %s/%%f %%p", r.archiveDir)
	_, err = fmt.Fprintf(f, `
# ── subxid-overflow-repro: standby recovery ───────
port = %d
restore_command = '%s'
recovery_target = 'immediate'
recovery_target_action = 'pause' # BUG: this is silently ignored when
                                 # RUNNING_XACTS has subxid_overflow=true
hot_standby = on
`, standbyPort, restoreCmd)
	return err
}

func (r *repro) startStandby() error {
	logFile := filepath.Join(r.standbyDir, "postgresql.log")
	pgctl := filepath.Join(r.binDir, "pg_ctl")
	cmd := exec.Command(pgctl,
		"start", "-D", r.standbyDir,
		"-w", "-t", "60",
		"-l", logFile,
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		// pg_ctl -w can fail when recovery ends in an unexpected state (e.g. it
		// promoted instead of pausing). Log it and continue — checkBug() will
		// determine the actual outcome.
		log.Warn("pg_ctl start returned error (may be expected)", "err", err)
	}

	// Show relevant lines from the standby log.
	if data, _ := os.ReadFile(logFile); len(data) > 0 {
		fmt.Println()
		fmt.Println(" ┌─ Standby log (recovery-related lines) ──────────────────────")
		for _, line := range strings.Split(string(data), "\n") {
			lower := strings.ToLower(line)
			if strings.Contains(lower, "running_xacts") ||
				strings.Contains(lower, "subxid") ||
				strings.Contains(lower, "promot") ||
				strings.Contains(lower, "archive recovery complete") ||
				strings.Contains(lower, "recovery is") ||
				strings.Contains(lower, "paused") ||
				strings.Contains(lower, "consistent") ||
				strings.Contains(lower, "recovery target") {
				fmt.Printf(" │ %s\n", line)
			}
		}
		fmt.Println(" └─────────────────────────────────────────────────────────────")
		fmt.Println()
	}
	return nil
}
// ────────────────────────────────────────────────────────────────────────────
// Bug check
// ────────────────────────────────────────────────────────────────────────────
func (r *repro) checkBug(ctx context.Context) error {
	conn, err := r.connectWithRetry(ctx, standbyPort)
	if err != nil {
		return fmt.Errorf("connect to standby on port %d: %w", standbyPort, err)
	}
	defer conn.Close(ctx)

	// Also check pg_is_in_recovery() for extra clarity.
	var inRecovery bool
	_ = conn.QueryRow(ctx, "SELECT pg_is_in_recovery()").Scan(&inRecovery)

	var pauseState string
	queryErr := conn.QueryRow(ctx, "SELECT pg_get_wal_replay_pause_state()").Scan(&pauseState)

	printResult(inRecovery, pauseState, queryErr)
	return nil
}

func printResult(inRecovery bool, pauseState string, queryErr error) {
	fmt.Println()
	fmt.Println("══════════════════════════════════════════════════════════════════")
	if queryErr != nil {
		var pgErr *pgconn.PgError
		if errors.As(queryErr, &pgErr) &&
			pgErr.Code == "55000" &&
			strings.Contains(pgErr.Message, "recovery is not in progress") {
			log.Error("BUG REPRODUCED")
			fmt.Println()
			log.Info("pg_is_in_recovery()", "value", inRecovery)
			fmt.Println()
			log.Info("SELECT pg_get_wal_replay_pause_state()")
			log.Info(" expected", "value", "paused (recovery_target_action = pause)")
			log.Error(" actual", "value", fmt.Sprintf("ERROR: %s (SQLSTATE %s)", pgErr.Message, pgErr.Code))
			fmt.Println()
			log.Info("root cause — step 1", "file", "procarray.c:1261",
				"detail", "RUNNING_XACTS subxid_overflow=true sets standbyState=STANDBY_SNAPSHOT_PENDING instead of READY")
			log.Info("root cause — step 2", "file", "xlogrecovery.c:2266",
				"detail", "standbyState != STANDBY_SNAPSHOT_READY so LocalHotStandbyActive is never set to true")
			log.Info("root cause — step 3", "file", "xlogrecovery.c:2935",
				"detail", "recoveryPausesHere() returns early: `if (!LocalHotStandbyActive) return;` → promotes")
			fmt.Println()
			log.Warn("affected deployments",
				"condition", "recovery_target_action=pause + RUNNING_XACTS with subxid_overflow=true in replayed WAL",
				"trigger", "live transaction with >PGPROC_MAX_CACHED_SUBXIDS (64) active subtransaction XIDs at checkpoint time")
		} else {
			log.Error("unexpected error", "err", queryErr)
		}
	} else {
		if pauseState == "paused" {
			log.Info("NOT REPRODUCED — behavior is correct (or bug is already fixed)")
			fmt.Println()
			log.Info("pg_is_in_recovery()", "value", inRecovery)
			log.Info("SELECT pg_get_wal_replay_pause_state()", "expected", "paused", "actual", "paused ✓")
			fmt.Println()
			log.Info("possible reasons",
				"1", "this PostgreSQL version has the fix applied",
				"2", "RUNNING_XACTS overflow record was not in the replayed WAL range",
				"hint", "try with PostgreSQL 17.x to confirm")
		} else {
			log.Warn("unexpected pause state",
				"pg_get_wal_replay_pause_state", pauseState,
				"pg_is_in_recovery", inRecovery)
		}
	}
	fmt.Println("══════════════════════════════════════════════════════════════════")
	fmt.Println()
}
// ────────────────────────────────────────────────────────────────────────────
// Helpers
// ────────────────────────────────────────────────────────────────────────────
func (r *repro) connectWithRetry(ctx context.Context, port int) (*pgx.Conn, error) {
	dsn := fmt.Sprintf(
		"host=localhost port=%d user=postgres dbname=postgres sslmode=disable",
		port,
	)
	var last error
	for i := 0; i < 40; i++ {
		conn, err := pgx.Connect(ctx, dsn)
		if err == nil {
			return conn, nil
		}
		last = err
		time.Sleep(500 * time.Millisecond)
	}
	return nil, fmt.Errorf("could not connect after 20s: %w", last)
}

func (r *repro) pgCmd(name string, args ...string) error {
	cmd := exec.Command(filepath.Join(r.binDir, name), args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func (r *repro) cleanup() {
	fmt.Println()
	log.Info("cleanup")
	pgctl := filepath.Join(r.binDir, "pg_ctl")
	for _, dir := range []string{r.standbyDir, r.primaryDir} {
		if _, err := os.Stat(dir); err == nil {
			exec.Command(pgctl, "stop", "-D", dir, "-m", "immediate", "-w").Run()
		}
	}
	if err := os.RemoveAll(r.tmpDir); err != nil {
		log.Warn("cleanup failed", "path", r.tmpDir, "err", err)
		return
	}
	log.Info("removed temp dir", "path", r.tmpDir)
}

func step(n, msg string) {
	fmt.Println()
	log.Infof("step %s: %s", n, msg)
	fmt.Println()
}

func findPgBinDir() (string, error) {
	if path, err := exec.LookPath("pg_ctl"); err == nil {
		return filepath.Dir(path), nil
	}
	for _, dir := range []string{
		"/opt/homebrew/bin",
		"/usr/local/bin",
		"/opt/homebrew/opt/postgresql@17/bin",
		"/opt/homebrew/opt/postgresql@16/bin",
		"/opt/homebrew/opt/postgresql@15/bin",
		"/usr/local/opt/postgresql@17/bin",
		"/usr/local/opt/postgresql@16/bin",
		"/usr/lib/postgresql/17/bin",
		"/usr/lib/postgresql/16/bin",
		"/usr/lib/postgresql/15/bin",
	} {
		if _, err := os.Stat(filepath.Join(dir, "pg_ctl")); err == nil {
			return dir, nil
		}
	}
	return "", fmt.Errorf("pg_ctl not found in PATH or common locations")
}

func pgServerVersion(binDir string) (string, error) {
	out, err := exec.Command(filepath.Join(binDir, "postgres"), "--version").Output()
	if err != nil {
		return "unknown", err
	}
	return strings.TrimSpace(string(out)), nil
}
bug-subxid-overflow ▞▚  main ▞▚ go run main.go
INFO PostgreSQL binaries=/Users/seba/.local/postgres/18.1/install/bin version="postgres (PostgreSQL) 18.1"
INFO working directory path=/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843
INFO step 1/6: Initialize and start primary PostgreSQL
The files belonging to this database system will be owned by user "seba".
This user must also own the server process.
The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".
Data page checksums are enabled.
creating directory /var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/primary ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... America/Sao_Paulo
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
/Users/seba/.local/postgres/18.1/install/bin/pg_ctl -D /var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/primary -l logfile start
waiting for server to start....2026-03-10 00:35:56.321 -03 [43624] LOG: starting PostgreSQL 18.1 on aarch64-apple-darwin25.2.0, compiled by clang version 21.1.8, 64-bit
2026-03-10 00:35:56.322 -03 [43624] LOG: listening on IPv6 address "::1", port 15432
2026-03-10 00:35:56.322 -03 [43624] LOG: listening on IPv4 address "127.0.0.1", port 15432
2026-03-10 00:35:56.322 -03 [43624] LOG: listening on Unix socket "/tmp/.s.PGSQL.15432"
2026-03-10 00:35:56.325 -03 [43630] LOG: database system was shut down at 2026-03-10 00:35:56 -03
2026-03-10 00:35:56.326 -03 [43624] LOG: database system is ready to accept connections
done
server started
INFO step 2/6: Open transaction with 100 SAVEPOINTs to overflow subxid cache (limit=64)
INFO subxid cache overflowed active_subtransactions=100 overflow_at=64
INFO transaction held open — visible during the upcoming checkpoint
INFO step 3/6: Take base backup (--checkpoint=fast emits RUNNING_XACTS with subxid_overflow=true)
2026-03-10 00:35:56.452 -03 [43628] LOG: checkpoint starting: immediate force wait
2026-03-10 00:35:56.456 -03 [43628] LOG: checkpoint complete: wrote 23 buffers (0.1%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.005 s; sync files=21, longest=0.001 s, average=0.001 s; distance=8682 kB, estimate=8682 kB; lsn=0/2000080, redo lsn=0/2000028
23893/23893 kB (100%), 1/1 tablespace
INFO base backup complete dir=/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/standby
INFO step 4/6: Configure standby: recovery_target=immediate, recovery_target_action=pause
INFO step 5/6: Start standby and replay WAL (should pause; will it?)
INFO expected behavior result="standby pauses at recovery target"
WARN buggy behavior result="standby silently promotes (recoveryPausesHere returns early)"
waiting for server to start.... done
server started
┌─ Standby log (recovery-related lines) ──────────────────────
│ 2026-03-10 03:35:56.747 GMT [43649] LOG: skipping missing configuration file "/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/standby/postgresql.auto.conf"
│ cp: cannot stat '/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/wal-archive/00000002.history': No such file or directory
│ cp: cannot stat '/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/wal-archive/000000010000000000000003': No such file or directory
│ 2026-03-10 00:35:56.852 -03 [43655] LOG: consistent recovery state reached at 0/2000120
│ cp: cannot stat '/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/wal-archive/00000002.history': No such file or directory
│ cp: cannot stat '/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843/wal-archive/00000001.history': No such file or directory
│ 2026-03-10 00:35:56.910 -03 [43655] LOG: archive recovery complete
└─────────────────────────────────────────────────────────────
INFO step 6/6: Query pg_get_wal_replay_pause_state() on standby
══════════════════════════════════════════════════════════════════
ERRO BUG REPRODUCED
INFO pg_is_in_recovery() value=false
INFO SELECT pg_get_wal_replay_pause_state()
INFO expected value="paused (recovery_target_action = pause)"
ERRO actual value="ERROR: recovery is not in progress (SQLSTATE 55000)"
INFO root cause — step 1 file=procarray.c:1261 detail="RUNNING_XACTS subxid_overflow=true sets standbyState=STANDBY_SNAPSHOT_PENDING instead of READY"
INFO root cause — step 2 file=xlogrecovery.c:2266 detail="standbyState != STANDBY_SNAPSHOT_READY so LocalHotStandbyActive is never set to true"
INFO root cause — step 3 file=xlogrecovery.c:2935 detail="recoveryPausesHere() returns early: `if (!LocalHotStandbyActive) return;` → promotes"
WARN affected deployments condition="recovery_target_action=pause + RUNNING_XACTS with subxid_overflow=true in replayed WAL" trigger="live transaction with >PGPROC_MAX_CACHED_SUBXIDS (64) active subtransaction XIDs at checkpoint time"
══════════════════════════════════════════════════════════════════
INFO cleanup
2026-03-10 00:35:57.061 -03 [43624] LOG: received immediate shutdown request
2026-03-10 00:35:57.064 -03 [43624] LOG: database system is shut down
INFO removed temp dir path=/var/folders/hw/trbncgqn33qg3jsh7jf_3nw80000gn/T/pg-subxid-overflow-1941130843