|
// Reproducer for PostgreSQL bug: |
|
// recovery_target_action=pause is silently ignored when a RUNNING_XACTS WAL |
|
// record with subxid_overflow=true is replayed during standby recovery. |
|
// |
|
// Root cause (three-step chain): |
|
// 1. RUNNING_XACTS with subxid_overflow=true sets standbyState = STANDBY_SNAPSHOT_PENDING |
|
// instead of STANDBY_SNAPSHOT_READY (procarray.c). |
|
// 2. STANDBY_SNAPSHOT_PENDING prevents LocalHotStandbyActive from ever becoming true |
|
// (xlogrecovery.c). |
|
// 3. recoveryPausesHere() returns early when LocalHotStandbyActive=false, silently |
|
// skipping the pause and falling through to PROMOTE (xlogrecovery.c). |
|
// |
|
// Effect: standby promotes directly to primary instead of pausing. |
|
// Symptom: pg_get_wal_replay_pause_state() throws ERROR: recovery is not in progress |
|
// (SQLSTATE 55000). |
|
// |
|
// Affected versions confirmed: PostgreSQL 17.x |
|
// See: https://github.com/postgres/postgres/blob/REL_17_8/src/backend/storage/ipc/procarray.c#L1261 |
|
// https://github.com/postgres/postgres/blob/REL_17_8/src/backend/access/transam/xlogrecovery.c#L2935 |
|
package main |
|
|
|
import ( |
|
"context" |
|
"errors" |
|
"fmt" |
|
"os" |
|
"os/exec" |
|
"path/filepath" |
|
"strings" |
|
"time" |
|
|
|
"github.com/charmbracelet/log" |
|
"github.com/jackc/pgx/v5" |
|
"github.com/jackc/pgx/v5/pgconn" |
|
) |
|
|
|
const (
	// Non-default localhost ports so the test clusters cannot collide with a
	// locally installed PostgreSQL listening on 5432.
	primaryPort = 15432
	standbyPort = 15433

	// PGPROC_MAX_CACHED_SUBXIDS = 64 in PostgreSQL source.
	// We need strictly more than 64 active subtransactions to trigger the overflow.
	numSavepoints = 100
)
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Entrypoint |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func main() { |
|
log.SetReportTimestamp(false) |
|
|
|
binDir, err := findPgBinDir() |
|
if err != nil { |
|
log.Fatal("PostgreSQL not found", "err", err, "hint", "install PostgreSQL 14+ and ensure pg_ctl is in your PATH") |
|
} |
|
|
|
ver, _ := pgServerVersion(binDir) |
|
log.Info("PostgreSQL", "binaries", binDir, "version", ver) |
|
|
|
tmpDir, err := os.MkdirTemp("", "pg-subxid-overflow-*") |
|
if err != nil { |
|
log.Fatal("MkdirTemp", "err", err) |
|
} |
|
log.Info("working directory", "path", tmpDir) |
|
|
|
r := &repro{ |
|
binDir: binDir, |
|
tmpDir: tmpDir, |
|
primaryDir: filepath.Join(tmpDir, "primary"), |
|
standbyDir: filepath.Join(tmpDir, "standby"), |
|
archiveDir: filepath.Join(tmpDir, "wal-archive"), |
|
} |
|
defer r.cleanup() |
|
|
|
if err := r.run(); err != nil { |
|
log.Fatal("reproducer failed", "err", err) |
|
} |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Reproducer |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
// repro holds the binary location and on-disk layout for one reproduction
// run. All data directories live under tmpDir and are removed by cleanup().
type repro struct {
	binDir     string // directory containing pg_ctl, initdb, pg_basebackup, postgres
	tmpDir     string // scratch root created by os.MkdirTemp in main
	primaryDir string // primary cluster data directory
	standbyDir string // standby cluster data directory (populated by pg_basebackup)
	archiveDir string // WAL archive shared by archive_command (primary) and restore_command (standby)
}
|
|
|
func (r *repro) run() error { |
|
step("1/6", "Initialize and start primary PostgreSQL") |
|
if err := r.initPrimary(); err != nil { |
|
return fmt.Errorf("initPrimary: %w", err) |
|
} |
|
if err := r.startPrimary(); err != nil { |
|
return fmt.Errorf("startPrimary: %w", err) |
|
} |
|
|
|
ctx := context.Background() |
|
|
|
conn, err := r.connectWithRetry(ctx, primaryPort) |
|
if err != nil { |
|
return fmt.Errorf("connect to primary: %w", err) |
|
} |
|
defer conn.Close(ctx) |
|
|
|
step("2/6", fmt.Sprintf("Open transaction with %d SAVEPOINTs to overflow subxid cache (limit=64)", numSavepoints)) |
|
if err := r.createSubxidOverflow(ctx, conn); err != nil { |
|
return fmt.Errorf("createSubxidOverflow: %w", err) |
|
} |
|
|
|
// The big transaction is still OPEN here. When pg_basebackup forces a |
|
// checkpoint (--checkpoint=fast), PostgreSQL emits a RUNNING_XACTS WAL |
|
// record that will have subxid_overflow=true because our transaction has |
|
// 100 active subtransaction XIDs — more than PGPROC_MAX_CACHED_SUBXIDS=64. |
|
step("3/6", "Take base backup (--checkpoint=fast emits RUNNING_XACTS with subxid_overflow=true)") |
|
if err := r.takeBaseBackup(); err != nil { |
|
return fmt.Errorf("takeBaseBackup: %w", err) |
|
} |
|
|
|
// We can now commit/rollback the big transaction — backup is done. |
|
if _, err := conn.Exec(ctx, "ROLLBACK"); err != nil { |
|
log.Warn("ROLLBACK failed", "err", err) |
|
} |
|
|
|
step("4/6", "Configure standby: recovery_target=immediate, recovery_target_action=pause") |
|
if err := r.configureStandby(); err != nil { |
|
return fmt.Errorf("configureStandby: %w", err) |
|
} |
|
|
|
step("5/6", "Start standby and replay WAL (should pause; will it?)") |
|
log.Info("expected behavior", "result", "standby pauses at recovery target") |
|
log.Warn("buggy behavior", "result", "standby silently promotes (recoveryPausesHere returns early)") |
|
fmt.Println() |
|
if err := r.startStandby(); err != nil { |
|
return fmt.Errorf("startStandby: %w", err) |
|
} |
|
|
|
step("6/6", "Query pg_get_wal_replay_pause_state() on standby") |
|
return r.checkBug(ctx) |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Primary setup |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func (r *repro) initPrimary() error { |
|
if err := os.MkdirAll(r.archiveDir, 0o755); err != nil { |
|
return err |
|
} |
|
|
|
if err := r.pgCmd("initdb", |
|
"-D", r.primaryDir, |
|
"--no-locale", |
|
"--encoding=UTF8", |
|
"-U", "postgres", |
|
); err != nil { |
|
return fmt.Errorf("initdb: %w", err) |
|
} |
|
|
|
// Append to postgresql.conf |
|
pgconf := filepath.Join(r.primaryDir, "postgresql.conf") |
|
f, err := os.OpenFile(pgconf, os.O_APPEND|os.O_WRONLY, 0o644) |
|
if err != nil { |
|
return err |
|
} |
|
defer f.Close() |
|
|
|
archiveCmd := fmt.Sprintf("cp %%p %s/%%f", r.archiveDir) |
|
_, err = fmt.Fprintf(f, ` |
|
|
|
# ── subxid-overflow-repro ────────────────────────── |
|
listen_addresses = 'localhost' |
|
port = %d |
|
wal_level = replica |
|
archive_mode = on |
|
archive_command = '%s' |
|
max_wal_senders = 5 |
|
wal_keep_size = 64 |
|
`, primaryPort, archiveCmd) |
|
if err != nil { |
|
return err |
|
} |
|
|
|
// Allow trust auth for localhost (test cluster only) |
|
hba := filepath.Join(r.primaryDir, "pg_hba.conf") |
|
return os.WriteFile(hba, []byte( |
|
"local all all trust\n"+ |
|
"host all all 127.0.0.1/32 trust\n"+ |
|
"host all all ::1/128 trust\n"+ |
|
"host replication all 127.0.0.1/32 trust\n"+ |
|
"host replication all ::1/128 trust\n", |
|
), 0o644) |
|
} |
|
|
|
func (r *repro) startPrimary() error { |
|
return r.pgCmd("pg_ctl", "start", "-D", r.primaryDir, "-w") |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Subxid overflow trigger |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func (r *repro) createSubxidOverflow(ctx context.Context, conn *pgx.Conn) error { |
|
// Create the helper table first (outside the big transaction so it persists) |
|
if _, err := conn.Exec(ctx, |
|
"CREATE TABLE IF NOT EXISTS subxid_test (id int)", |
|
); err != nil { |
|
return fmt.Errorf("CREATE TABLE: %w", err) |
|
} |
|
|
|
if _, err := conn.Exec(ctx, "BEGIN"); err != nil { |
|
return fmt.Errorf("BEGIN: %w", err) |
|
} |
|
|
|
// Create numSavepoints SAVEPOINTs, each with a write. Each SAVEPOINT |
|
// produces a subtransaction XID. After PGPROC_MAX_CACHED_SUBXIDS=64 |
|
// subtransaction XIDs, the per-backend subxid cache overflows and the |
|
// `overflowed` flag is set in the proc array. |
|
for i := 1; i <= numSavepoints; i++ { |
|
if _, err := conn.Exec(ctx, fmt.Sprintf("SAVEPOINT s%d", i)); err != nil { |
|
return fmt.Errorf("SAVEPOINT s%d: %w", i, err) |
|
} |
|
if _, err := conn.Exec(ctx, fmt.Sprintf("INSERT INTO subxid_test VALUES (%d)", i)); err != nil { |
|
return fmt.Errorf("INSERT at s%d: %w", i, err) |
|
} |
|
} |
|
|
|
log.Info("subxid cache overflowed", "active_subtransactions", numSavepoints, "overflow_at", 64) |
|
log.Info("transaction held open — visible during the upcoming checkpoint") |
|
return nil |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Base backup |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func (r *repro) takeBaseBackup() error { |
|
// --checkpoint=fast forces an immediate checkpoint on the primary. |
|
// Because our big transaction is still open, the checkpoint's RUNNING_XACTS |
|
// WAL record will have subxid_overflow=true. |
|
// --wal-method=stream captures all WAL generated during the backup so the |
|
// standby's pg_wal directory contains everything needed for recovery. |
|
if err := r.pgCmd("pg_basebackup", |
|
"-h", "localhost", |
|
"-p", fmt.Sprintf("%d", primaryPort), |
|
"-U", "postgres", |
|
"-D", r.standbyDir, |
|
"--wal-method=stream", |
|
"--checkpoint=fast", |
|
"-P", |
|
); err != nil { |
|
return fmt.Errorf("pg_basebackup: %w", err) |
|
} |
|
log.Info("base backup complete", "dir", r.standbyDir) |
|
return nil |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Standby setup |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func (r *repro) configureStandby() error { |
|
// Create standby.signal so PostgreSQL enters recovery mode. |
|
if err := os.WriteFile( |
|
filepath.Join(r.standbyDir, "standby.signal"), []byte{}, 0o644, |
|
); err != nil { |
|
return err |
|
} |
|
|
|
// pg_basebackup may have created postgresql.auto.conf with a primary_conninfo |
|
// for streaming replication. Remove it: we want pure archive recovery so that |
|
// recovery is controlled solely by the WAL on disk — matching the pgbackup |
|
// restore-from-S3 scenario exactly. |
|
os.Remove(filepath.Join(r.standbyDir, "postgresql.auto.conf")) |
|
|
|
pgconf := filepath.Join(r.standbyDir, "postgresql.conf") |
|
f, err := os.OpenFile(pgconf, os.O_APPEND|os.O_WRONLY, 0o644) |
|
if err != nil { |
|
return err |
|
} |
|
defer f.Close() |
|
|
|
restoreCmd := fmt.Sprintf("cp %s/%%f %%p", r.archiveDir) |
|
_, err = fmt.Fprintf(f, ` |
|
|
|
# ── subxid-overflow-repro: standby recovery ─────── |
|
port = %d |
|
restore_command = '%s' |
|
recovery_target = 'immediate' |
|
recovery_target_action = 'pause' # BUG: this is silently ignored when |
|
# RUNNING_XACTS has subxid_overflow=true |
|
hot_standby = on |
|
`, standbyPort, restoreCmd) |
|
return err |
|
} |
|
|
|
func (r *repro) startStandby() error { |
|
logFile := filepath.Join(r.standbyDir, "postgresql.log") |
|
pgctl := filepath.Join(r.binDir, "pg_ctl") |
|
|
|
cmd := exec.Command(pgctl, |
|
"start", "-D", r.standbyDir, |
|
"-w", "-t", "60", |
|
"-l", logFile, |
|
) |
|
cmd.Stdout = os.Stdout |
|
cmd.Stderr = os.Stderr |
|
|
|
if err := cmd.Run(); err != nil { |
|
// pg_ctl -w can fail when recovery ends in an unexpected state (e.g. it |
|
// promoted instead of pausing). Log it and continue — checkBug() will |
|
// determine the actual outcome. |
|
log.Warn("pg_ctl start returned error (may be expected)", "err", err) |
|
} |
|
|
|
// Show relevant lines from the standby log. |
|
if data, _ := os.ReadFile(logFile); len(data) > 0 { |
|
fmt.Println() |
|
fmt.Println(" ┌─ Standby log (recovery-related lines) ──────────────────────") |
|
for _, line := range strings.Split(string(data), "\n") { |
|
lower := strings.ToLower(line) |
|
if strings.Contains(lower, "running_xacts") || |
|
strings.Contains(lower, "subxid") || |
|
strings.Contains(lower, "promot") || |
|
strings.Contains(lower, "archive recovery complete") || |
|
strings.Contains(lower, "recovery is") || |
|
strings.Contains(lower, "paused") || |
|
strings.Contains(lower, "consistent") || |
|
strings.Contains(lower, "recovery target") { |
|
fmt.Printf(" │ %s\n", line) |
|
} |
|
} |
|
fmt.Println(" └─────────────────────────────────────────────────────────────") |
|
fmt.Println() |
|
} |
|
|
|
return nil |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Bug check |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func (r *repro) checkBug(ctx context.Context) error { |
|
conn, err := r.connectWithRetry(ctx, standbyPort) |
|
if err != nil { |
|
return fmt.Errorf("connect to standby on port %d: %w", standbyPort, err) |
|
} |
|
defer conn.Close(ctx) |
|
|
|
// Also check pg_is_in_recovery() for extra clarity. |
|
var inRecovery bool |
|
_ = conn.QueryRow(ctx, "SELECT pg_is_in_recovery()").Scan(&inRecovery) |
|
|
|
var pauseState string |
|
queryErr := conn.QueryRow(ctx, "SELECT pg_get_wal_replay_pause_state()").Scan(&pauseState) |
|
|
|
printResult(inRecovery, pauseState, queryErr) |
|
return nil |
|
} |
|
|
|
func printResult(inRecovery bool, pauseState string, queryErr error) { |
|
fmt.Println() |
|
fmt.Println("══════════════════════════════════════════════════════════════════") |
|
|
|
if queryErr != nil { |
|
var pgErr *pgconn.PgError |
|
if errors.As(queryErr, &pgErr) && |
|
pgErr.Code == "55000" && |
|
strings.Contains(pgErr.Message, "recovery is not in progress") { |
|
|
|
log.Error("BUG REPRODUCED") |
|
fmt.Println() |
|
log.Info("pg_is_in_recovery()", "value", inRecovery) |
|
fmt.Println() |
|
log.Info("SELECT pg_get_wal_replay_pause_state()") |
|
log.Info(" expected", "value", "paused (recovery_target_action = pause)") |
|
log.Error(" actual", "value", fmt.Sprintf("ERROR: %s (SQLSTATE %s)", pgErr.Message, pgErr.Code)) |
|
fmt.Println() |
|
log.Info("root cause — step 1", "file", "procarray.c:1261", |
|
"detail", "RUNNING_XACTS subxid_overflow=true sets standbyState=STANDBY_SNAPSHOT_PENDING instead of READY") |
|
log.Info("root cause — step 2", "file", "xlogrecovery.c:2266", |
|
"detail", "standbyState != STANDBY_SNAPSHOT_READY so LocalHotStandbyActive is never set to true") |
|
log.Info("root cause — step 3", "file", "xlogrecovery.c:2935", |
|
"detail", "recoveryPausesHere() returns early: `if (!LocalHotStandbyActive) return;` → promotes") |
|
fmt.Println() |
|
log.Warn("affected deployments", |
|
"condition", "recovery_target_action=pause + RUNNING_XACTS with subxid_overflow=true in replayed WAL", |
|
"trigger", "live transaction with >PGPROC_MAX_CACHED_SUBXIDS (64) active subtransaction XIDs at checkpoint time") |
|
} else { |
|
log.Error("unexpected error", "err", queryErr) |
|
} |
|
} else { |
|
if pauseState == "paused" { |
|
log.Info("NOT REPRODUCED — behavior is correct (or bug is already fixed)") |
|
fmt.Println() |
|
log.Info("pg_is_in_recovery()", "value", inRecovery) |
|
log.Info("SELECT pg_get_wal_replay_pause_state()", "expected", "paused", "actual", "paused ✓") |
|
fmt.Println() |
|
log.Info("possible reasons", |
|
"1", "this PostgreSQL version has the fix applied", |
|
"2", "RUNNING_XACTS overflow record was not in the replayed WAL range", |
|
"hint", "try with PostgreSQL 17.x to confirm") |
|
} else { |
|
log.Warn("unexpected pause state", |
|
"pg_get_wal_replay_pause_state", pauseState, |
|
"pg_is_in_recovery", inRecovery) |
|
} |
|
} |
|
|
|
fmt.Println("══════════════════════════════════════════════════════════════════") |
|
fmt.Println() |
|
} |
|
|
|
// ──────────────────────────────────────────────────────────────────────────── |
|
// Helpers |
|
// ──────────────────────────────────────────────────────────────────────────── |
|
|
|
func (r *repro) connectWithRetry(ctx context.Context, port int) (*pgx.Conn, error) { |
|
dsn := fmt.Sprintf( |
|
"host=localhost port=%d user=postgres dbname=postgres sslmode=disable", |
|
port, |
|
) |
|
var last error |
|
for i := 0; i < 40; i++ { |
|
conn, err := pgx.Connect(ctx, dsn) |
|
if err == nil { |
|
return conn, nil |
|
} |
|
last = err |
|
time.Sleep(500 * time.Millisecond) |
|
} |
|
return nil, fmt.Errorf("could not connect after 20s: %w", last) |
|
} |
|
|
|
func (r *repro) pgCmd(name string, args ...string) error { |
|
cmd := exec.Command(filepath.Join(r.binDir, name), args...) |
|
cmd.Stdout = os.Stdout |
|
cmd.Stderr = os.Stderr |
|
return cmd.Run() |
|
} |
|
|
|
func (r *repro) cleanup() { |
|
fmt.Println() |
|
log.Info("cleanup") |
|
pgctl := filepath.Join(r.binDir, "pg_ctl") |
|
for _, dir := range []string{r.standbyDir, r.primaryDir} { |
|
if _, err := os.Stat(dir); err == nil { |
|
exec.Command(pgctl, "stop", "-D", dir, "-m", "immediate", "-w").Run() |
|
} |
|
} |
|
if err := os.RemoveAll(r.tmpDir); err != nil { |
|
log.Warn("cleanup failed", "path", r.tmpDir, "err", err) |
|
return |
|
} |
|
log.Info("removed temp dir", "path", r.tmpDir) |
|
} |
|
|
|
func step(n, msg string) { |
|
fmt.Println() |
|
log.Infof("step %s: %s", n, msg) |
|
fmt.Println() |
|
} |
|
|
|
// findPgBinDir locates the directory containing the PostgreSQL binaries.
// It prefers pg_ctl from PATH, then probes common Homebrew and Debian/Ubuntu
// install locations (newest major version first within each family).
//
// Fix over the previous version: the terminal error used fmt.Errorf with no
// format verbs (staticcheck S1039); errors.New is the idiomatic form.
func findPgBinDir() (string, error) {
	if path, err := exec.LookPath("pg_ctl"); err == nil {
		return filepath.Dir(path), nil
	}

	// Fallback: well-known install prefixes when pg_ctl is not on PATH.
	for _, dir := range []string{
		"/opt/homebrew/bin",
		"/usr/local/bin",
		"/opt/homebrew/opt/postgresql@17/bin",
		"/opt/homebrew/opt/postgresql@16/bin",
		"/opt/homebrew/opt/postgresql@15/bin",
		"/usr/local/opt/postgresql@17/bin",
		"/usr/local/opt/postgresql@16/bin",
		"/usr/lib/postgresql/17/bin",
		"/usr/lib/postgresql/16/bin",
		"/usr/lib/postgresql/15/bin",
	} {
		if _, err := os.Stat(filepath.Join(dir, "pg_ctl")); err == nil {
			return dir, nil
		}
	}
	return "", errors.New("pg_ctl not found in PATH or common locations")
}
|
|
|
// pgServerVersion reports the trimmed `postgres --version` output for the
// binaries in binDir. On failure it returns "unknown" together with the
// execution error, so callers can log something readable either way.
func pgServerVersion(binDir string) (string, error) {
	postgres := filepath.Join(binDir, "postgres")
	out, err := exec.Command(postgres, "--version").Output()
	if err != nil {
		return "unknown", err
	}
	return strings.TrimSpace(string(out)), nil
}