Reth Stages

What Does "Pipeline Sync" Mean in Reth?

Pipeline sync refers to Reth's staged, sequential processing architecture for synchronizing blockchain data. It's not a single operation, but rather a multi-stage pipeline that processes blocks through a series of specialized stages in a specific order.

The Core Concept

Instead of downloading and processing blocks all at once, Reth breaks the sync process into 15 distinct stages that run serially:

Headers → Bodies → SenderRecovery → Execution → Hashing → Indexing → Pruning → Finish

Each stage performs a specific transformation on the block data and passes it to the next stage.
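
To make this concrete, here is a minimal sketch of the stage abstraction in Rust. These are simplified stand-in types, not Reth's actual Stage trait (the real one carries a database transaction and richer error types), but the shape is the same: every stage can execute forward and unwind backward.

    // Simplified sketch of the stage abstraction (hypothetical types).
    type BlockNumber = u64;

    struct ExecInput {
        // Where this stage last stopped (its checkpoint), if anywhere.
        checkpoint: Option<BlockNumber>,
        // The block every stage is being asked to reach.
        target: BlockNumber,
    }

    struct ExecOutput {
        // The new checkpoint after this run.
        checkpoint: BlockNumber,
        // Whether the stage reached the target or wants another run.
        done: bool,
    }

    trait Stage {
        fn id(&self) -> &'static str;
        // Process blocks from the checkpoint toward the target.
        fn execute(&mut self, input: ExecInput) -> Result<ExecOutput, String>;
        // Roll this stage's work back to `to` (reorg or failed validation).
        fn unwind(&mut self, to: BlockNumber) -> Result<(), String>;
    }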

How It Works

  1. External Component Sets Target: An external sync controller (in Reth, the consensus engine acting on forkchoice updates) determines the target tip (the highest block to sync to)
  2. Pipeline Executes Serially: Each stage runs from its own checkpoint up to the target tip
  3. Stages Process in Order:
    • Headers stage downloads block headers
    • Bodies stage downloads transactions
    • Execution stage runs all transactions
    • Hashing stages prepare state roots
    • Indexing stages optimize database lookups
  4. Loop Until Complete: The pipeline repeats until every stage reaches the target (see the sketch below)
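
Putting those steps together, the control flow is essentially a loop over the stages. A rough sketch, reusing the simplified Stage trait from above (load_checkpoint and save_checkpoint are hypothetical stand-ins for reads and writes of the stage-checkpoint table):

    // Hypothetical stand-ins for the stage-checkpoint table.
    fn load_checkpoint(_stage: &'static str) -> Option<BlockNumber> { None }
    fn save_checkpoint(_stage: &'static str, _block: BlockNumber) {}

    // Run every stage in order, repeatedly, until all reach the target.
    fn run_pipeline(stages: &mut [Box<dyn Stage>], target: BlockNumber) -> Result<(), String> {
        loop {
            let mut all_done = true;
            for stage in stages.iter_mut() {
                // Each stage resumes from its own checkpoint.
                let input = ExecInput { checkpoint: load_checkpoint(stage.id()), target };
                let out = stage.execute(input)?;
                // Progress is committed between stages for crash recovery.
                save_checkpoint(stage.id(), out.checkpoint);
                all_done &= out.done;
            }
            if all_done {
                return Ok(()); // every stage has reached the target tip
            }
        }
    }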

Key Characteristics

  • Crash Recovery: Each stage maintains checkpoints, so if Reth crashes, it can resume from the last successful stage
  • Memory Efficient: Processes blocks in batches (typically 50k blocks) with database commits between stages
  • Validation: Each stage validates its input before processing
  • Unwind Capability: If validation fails, the pipeline can unwind (roll back) stages in reverse order (see the sketch after this list)
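
The unwind path is the forward pass mirrored: stages roll back in reverse order, so derived data is removed before the data it was derived from. A sketch using the same simplified types and helpers as above:

    // Roll stages back to `to`, last stage first.
    fn unwind_pipeline(stages: &mut [Box<dyn Stage>], to: BlockNumber) -> Result<(), String> {
        for stage in stages.iter_mut().rev() {
            stage.unwind(to)?;
            // The checkpoint now points at the unwind target.
            save_checkpoint(stage.id(), to);
        }
        Ok(())
    }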

Performance Benefits

  • Parallel Processing: Some stages (like sender recovery) can be massively parallelized (see the sketch after this list)
  • I/O Optimization: Stages are designed to minimize database reads/writes
  • Bottleneck Isolation: Performance issues can be isolated to specific stages
  • Incremental Progress: Node can serve some RPC calls even while mid-sync
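
Sender recovery illustrates the parallelism point: each sender is a pure function of a single transaction's signature, so transactions can be processed independently. A sketch using rayon, where Transaction, Address, and recover_sender are simplified stand-ins for the real types and the secp256k1 public-key recovery Reth actually performs:

    use rayon::prelude::*;

    struct Transaction; // stand-in: would hold the signature and payload

    #[derive(Clone, Copy)]
    struct Address([u8; 20]);

    // Stand-in for ECDSA public-key recovery from the transaction signature.
    fn recover_sender(_tx: &Transaction) -> Address {
        Address([0; 20])
    }

    // Embarrassingly parallel: no transaction depends on another's result.
    fn recover_all(txs: &[Transaction]) -> Vec<Address> {
        txs.par_iter().map(recover_sender).collect()
    }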

Real-World Example

When you see logs like:

ts=2025-09-09T05:34:36.858592244Z level=debug target=pruner message="Segment pruning finished" segment=Transactions

This is the Prune stage of the pipeline cleaning up old transaction data after the main sync stages have completed.
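
In sketch form (hypothetical types, not Reth's pruner API), segment pruning walks each configured segment and deletes that segment's rows up to the prune target:

    trait Segment {
        fn name(&self) -> &'static str;
        // Delete this segment's rows up to and including `target`.
        fn prune_to(&mut self, target: u64);
    }

    fn prune_all(segments: &mut [Box<dyn Segment>], target: u64) {
        for segment in segments.iter_mut() {
            segment.prune_to(target);
            // Mirrors the "Segment pruning finished" debug log above.
            println!("Segment pruning finished: segment={}", segment.name());
        }
    }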

The pipeline sync approach is what makes Reth efficient for both initial sync (catching up from genesis) and ongoing sync (staying current with the network).

Reth Sync Pipeline Stages (15 Total)

Online Stages (Require Network Access)

  1. Era - Downloads and processes ERA1 archive files (optional)
  2. Headers - Downloads block headers from the network, walking backward from the target tip to the local head (see the sketch below)
  3. Bodies - Downloads block bodies (transactions) from network peers
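
Downloading headers backward makes validation cheap: each header only has to reference the hash of the one before it. A simplified sketch of that link check (the hash field here stands in for the header's actual computed hash):

    #[derive(Clone)]
    struct Header {
        number: u64,
        parent_hash: [u8; 32],
        hash: [u8; 32], // in reality derived by hashing the header itself
    }

    // Headers sorted by ascending block number; verify the parent-hash links.
    fn validate_chain(headers: &[Header]) -> Result<(), String> {
        for pair in headers.windows(2) {
            let (parent, child) = (&pair[0], &pair[1]);
            if child.parent_hash != parent.hash || child.number != parent.number + 1 {
                return Err(format!("broken chain at block {}", child.number));
            }
        }
        Ok(())
    }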

Execution Stages (Offline Processing)

  4. SenderRecovery - Validates and recovers transaction senders from signatures
  5. Execution - Executes all transactions for blocks and generates state diffs
  6. PruneSenderRecovery - Prunes sender recovery data (optional, based on config)

Hashing Stages (State Preparation)

  7. MerkleUnwind - Prepares for state root calculation (unwind phase)
  8. AccountHashing - Hashes account state for Merkle tree construction (see the sketch after this list)
  9. StorageHashing - Hashes storage slots for Merkle tree construction
  10. MerkleExecute - Computes the final state root (Merkle tree) for blocks
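
The hashing here is literal: the hashed-state tables that the Merkle stage later walks are keyed by the Keccak-256 hash of each address (and of each storage slot), matching how the Ethereum state trie keys its leaves. A small example using the sha3 crate:

    use sha3::{Digest, Keccak256};

    // Hashed-state key for an account: keccak256(address).
    fn hashed_key(address: &[u8; 20]) -> [u8; 32] {
        let mut hasher = Keccak256::new();
        hasher.update(address);
        hasher.finalize().into()
    }

    fn main() {
        let address = [0u8; 20];
        // This hashed key, not the raw address, is what the trie walk uses.
        for byte in hashed_key(&address) {
            print!("{byte:02x}");
        }
        println!();
    }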

Indexing Stages (Database Optimization)

  11. TransactionLookup - Creates mappings from transaction hashes to sequential numbers
  12. IndexStorageHistory - Indexes storage slot changes for historical lookups
  13. IndexAccountHistory - Indexes account changes for historical state lookups (see the sketch after this list)
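
Conceptually, a history index maps an account (or storage slot) to the block numbers at which it changed, so a historical query can binary-search for the nearest change instead of replaying blocks. A simplified sketch (Reth actually shards these lists across multiple database keys):

    use std::collections::HashMap;

    type Address = [u8; 20];

    // Find the last block at or before `block` where `who` changed.
    fn last_change_at(
        index: &HashMap<Address, Vec<u64>>,
        who: &Address,
        block: u64,
    ) -> Option<u64> {
        let changes = index.get(who)?; // sorted, ascending
        // partition_point returns how many recorded changes are <= `block`.
        let count = changes.partition_point(|&b| b <= block);
        count.checked_sub(1).map(|i| changes[i])
    }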

Maintenance Stages

  14. Prune - Removes old data based on pruning configuration
  15. Finish - Final cleanup and pipeline completion marker

Stage Organization

The stages are organized into logical StageSets:

  • DefaultStages: Complete pipeline (Online + Offline + Finish; composition sketched after this list)
  • OnlineStages: Network-dependent stages (Headers, Bodies, Era)
  • OfflineStages: Local processing stages (Execution + Hashing + Indexing + Pruning)
  • ExecutionStages: Core transaction processing (SenderRecovery + Execution)
  • HashingStages: State hashing preparation (Merkle + Account + Storage)
  • HistoryIndexingStages: Database indexing for performance (Transaction + Storage + Account history)
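
In sketch form (a hypothetical builder reusing the simplified Stage trait from earlier; Reth's real StageSet API differs in its details), a set is just an ordered list of stages, and larger sets are built by concatenating smaller ones:

    struct StageSet {
        stages: Vec<Box<dyn Stage>>,
    }

    impl StageSet {
        fn new() -> Self {
            Self { stages: Vec::new() }
        }
        fn add_stage(mut self, stage: Box<dyn Stage>) -> Self {
            self.stages.push(stage);
            self
        }
        fn add_set(mut self, other: StageSet) -> Self {
            self.stages.extend(other.stages);
            self
        }
    }

    // DefaultStages = OnlineStages + OfflineStages + Finish, in that order.
    fn default_stages(online: StageSet, offline: StageSet, finish: Box<dyn Stage>) -> StageSet {
        StageSet::new().add_set(online).add_set(offline).add_stage(finish)
    }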

Performance Context

In practice, the main bottlenecks are typically:

  • Execution Stage: Transaction processing
  • Merkle Stages: State root calculations (heavy I/O)
  • Hashing Stages: CPU-intensive hashing operations

The pipeline processes blocks in batches (typically 50k blocks) with DB commits between stages for crash recovery and memory management.
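
A sketch of that batching contract, using the simplified types from earlier: a stage caps each run at a batch size, the pipeline commits, and the stage reports done only once it has reached the target:

    const BATCH_SIZE: u64 = 50_000;

    fn execute_batched(from: BlockNumber, target: BlockNumber) -> ExecOutput {
        let end = (from + BATCH_SIZE).min(target);
        for _block in from + 1..=end {
            // ... execute the block and buffer its state changes ...
        }
        // The pipeline commits here, then re-runs the stage if !done.
        ExecOutput { checkpoint: end, done: end == target }
    }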
