Reth Stages

What Does "Pipeline Sync" Mean in Reth?

Pipeline sync refers to Reth's staged, sequential processing architecture for synchronizing blockchain data. It's not a single operation, but rather a multi-stage pipeline that processes blocks through a series of specialized stages in a specific order.

The Core Concept

Instead of downloading and processing blocks all at once, Reth breaks the sync process into 15 distinct stages that run serially:

Headers → Bodies → SenderRecovery → Execution → Hashing → Indexing → Pruning → Finish

Each stage performs a specific transformation on the block data and passes it to the next stage.
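
To make this concrete, here is a minimal sketch of the stage abstraction in Rust. These are simplified stand-in types, not Reth's actual Stage trait (the real one carries a database transaction and richer error types), but the shape is the same: every stage can execute forward and unwind backward.

    // Simplified sketch of the stage abstraction (hypothetical types).
    type BlockNumber = u64;

    struct ExecInput {
        // Where this stage last stopped (its checkpoint), if anywhere.
        checkpoint: Option<BlockNumber>,
        // The block every stage is being asked to reach.
        target: BlockNumber,
    }

    struct ExecOutput {
        // The new checkpoint after this run.
        checkpoint: BlockNumber,
        // Whether the stage reached the target or wants another run.
        done: bool,
    }

    trait Stage {
        fn id(&self) -> &'static str;
        // Process blocks from the checkpoint toward the target.
        fn execute(&mut self, input: ExecInput) -> Result<ExecOutput, String>;
        // Roll this stage's work back to `to` (reorg or failed validation).
        fn unwind(&mut self, to: BlockNumber) -> Result<(), String>;
    }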

How It Works

  1. External Component Sets Target: An external sync controller (in Reth, the consensus engine acting on forkchoice updates) determines the target tip (the highest block to sync to)
  2. Pipeline Executes Serially: Each stage runs from its own checkpoint up to the target tip
  3. Stages Process in Order:
    • Headers stage downloads block headers
    • Bodies stage downloads transactions
    • Execution stage runs all transactions
    • Hashing stages prepare state roots
    • Indexing stages optimize database lookups
  4. Loop Until Complete: The pipeline repeats until every stage reaches the target (see the sketch below)
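
Putting those steps together, the control flow is essentially a loop over the stages. A rough sketch, reusing the simplified Stage trait from above (load_checkpoint and save_checkpoint are hypothetical stand-ins for reads and writes of the stage-checkpoint table):

    // Hypothetical stand-ins for the stage-checkpoint table.
    fn load_checkpoint(_stage: &'static str) -> Option<BlockNumber> { None }
    fn save_checkpoint(_stage: &'static str, _block: BlockNumber) {}

    // Run every stage in order, repeatedly, until all reach the target.
    fn run_pipeline(stages: &mut [Box<dyn Stage>], target: BlockNumber) -> Result<(), String> {
        loop {
            let mut all_done = true;
            for stage in stages.iter_mut() {
                // Each stage resumes from its own checkpoint.
                let input = ExecInput { checkpoint: load_checkpoint(stage.id()), target };
                let out = stage.execute(input)?;
                // Progress is committed between stages for crash recovery.
                save_checkpoint(stage.id(), out.checkpoint);
                all_done &= out.done;
            }
            if all_done {
                return Ok(()); // every stage has reached the target tip
            }
        }
    }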

Key Characteristics

  • Crash Recovery: Each stage maintains checkpoints, so if Reth crashes, it can resume from the last successful stage
  • Memory Efficient: Processes blocks in batches (typically 50k blocks) with database commits between stages
  • Validation: Each stage validates its input before processing
  • Unwind Capability: If validation fails, the pipeline can unwind (roll back) stages in reverse order (see the sketch after this list)
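
The unwind path is the forward pass mirrored: stages roll back in reverse order, so derived data is removed before the data it was derived from. A sketch using the same simplified types and helpers as above:

    // Roll stages back to `to`, last stage first.
    fn unwind_pipeline(stages: &mut [Box<dyn Stage>], to: BlockNumber) -> Result<(), String> {
        for stage in stages.iter_mut().rev() {
            stage.unwind(to)?;
            // The checkpoint now points at the unwind target.
            save_checkpoint(stage.id(), to);
        }
        Ok(())
    }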

Performance Benefits

  • Parallel Processing: Some stages (like sender recovery) can be massively parallelized (see the sketch after this list)
  • I/O Optimization: Stages are designed to minimize database reads/writes
  • Bottleneck Isolation: Performance issues can be isolated to specific stages
  • Incremental Progress: Node can serve some RPC calls even while mid-sync
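
Sender recovery illustrates the parallelism point: each sender is a pure function of a single transaction's signature, so transactions can be processed independently. A sketch using rayon, where Transaction, Address, and recover_sender are simplified stand-ins for the real types and the secp256k1 public-key recovery Reth actually performs:

    use rayon::prelude::*;

    struct Transaction; // stand-in: would hold the signature and payload

    #[derive(Clone, Copy)]
    struct Address([u8; 20]);

    // Stand-in for ECDSA public-key recovery from the transaction signature.
    fn recover_sender(_tx: &Transaction) -> Address {
        Address([0; 20])
    }

    // Embarrassingly parallel: no transaction depends on another's result.
    fn recover_all(txs: &[Transaction]) -> Vec<Address> {
        txs.par_iter().map(recover_sender).collect()
    }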

Real-World Example

When you see logs like:

ts=2025-09-09T05:34:36.858592244Z level=debug target=pruner message="Segment pruning finished" segment=Transactions

This is the Prune stage of the pipeline cleaning up old transaction data after the main sync stages have completed.
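
In sketch form (hypothetical types, not Reth's pruner API), segment pruning walks each configured segment and deletes that segment's rows up to the prune target:

    trait Segment {
        fn name(&self) -> &'static str;
        // Delete this segment's rows up to and including `target`.
        fn prune_to(&mut self, target: u64);
    }

    fn prune_all(segments: &mut [Box<dyn Segment>], target: u64) {
        for segment in segments.iter_mut() {
            segment.prune_to(target);
            // Mirrors the "Segment pruning finished" debug log above.
            println!("Segment pruning finished: segment={}", segment.name());
        }
    }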

The pipeline sync approach is what makes Reth efficient for both initial sync (catching up from genesis) and ongoing sync (staying current with the network).

Reth Sync Pipeline Stages (15 Total)

Online Stages (Require Network Access)

  1. Era - Downloads and processes ERA1 archive files (optional)
  2. Headers - Downloads block headers from the network, walking backward from the target tip to the local head (see the sketch below)
  3. Bodies - Downloads block bodies (transactions) from network peers
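
Downloading headers backward makes validation cheap: each header only has to reference the hash of the one before it. A simplified sketch of that link check (the hash field here stands in for the header's actual computed hash):

    #[derive(Clone)]
    struct Header {
        number: u64,
        parent_hash: [u8; 32],
        hash: [u8; 32], // in reality derived by hashing the header itself
    }

    // Headers sorted by ascending block number; verify the parent-hash links.
    fn validate_chain(headers: &[Header]) -> Result<(), String> {
        for pair in headers.windows(2) {
            let (parent, child) = (&pair[0], &pair[1]);
            if child.parent_hash != parent.hash || child.number != parent.number + 1 {
                return Err(format!("broken chain at block {}", child.number));
            }
        }
        Ok(())
    }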

Execution Stages (Offline Processing)

  4. SenderRecovery - Validates and recovers transaction senders from signatures
  5. Execution - Executes all transactions for blocks and generates state diffs
  6. PruneSenderRecovery - Prunes sender recovery data (optional, based on config)

Hashing Stages (State Preparation)

  7. MerkleUnwind - Prepares for state root calculation (unwind phase)
  8. AccountHashing - Hashes account state for Merkle tree construction (see the sketch after this list)
  9. StorageHashing - Hashes storage slots for Merkle tree construction
  10. MerkleExecute - Computes the final state root (Merkle tree) for blocks
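
The hashing here is literal: the hashed-state tables that the Merkle stage later walks are keyed by the Keccak-256 hash of each address (and of each storage slot), matching how the Ethereum state trie keys its leaves. A small example using the sha3 crate:

    use sha3::{Digest, Keccak256};

    // Hashed-state key for an account: keccak256(address).
    fn hashed_key(address: &[u8; 20]) -> [u8; 32] {
        let mut hasher = Keccak256::new();
        hasher.update(address);
        hasher.finalize().into()
    }

    fn main() {
        let address = [0u8; 20];
        // This hashed key, not the raw address, is what the trie walk uses.
        for byte in hashed_key(&address) {
            print!("{byte:02x}");
        }
        println!();
    }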

Indexing Stages (Database Optimization)

  11. TransactionLookup - Creates mappings from transaction hashes to sequential numbers
  12. IndexStorageHistory - Indexes storage slot changes for historical lookups
  13. IndexAccountHistory - Indexes account changes for historical state lookups (see the sketch after this list)
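
Conceptually, a history index maps an account (or storage slot) to the block numbers at which it changed, so a historical query can binary-search for the nearest change instead of replaying blocks. A simplified sketch (Reth actually shards these lists across multiple database keys):

    use std::collections::HashMap;

    type Address = [u8; 20];

    // Find the last block at or before `block` where `who` changed.
    fn last_change_at(
        index: &HashMap<Address, Vec<u64>>,
        who: &Address,
        block: u64,
    ) -> Option<u64> {
        let changes = index.get(who)?; // sorted, ascending
        // partition_point returns how many recorded changes are <= `block`.
        let count = changes.partition_point(|&b| b <= block);
        count.checked_sub(1).map(|i| changes[i])
    }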

Maintenance Stages

  14. Prune - Removes old data based on pruning configuration
  15. Finish - Final cleanup and pipeline completion marker

Stage Organization

The stages are organized into logical StageSets:

  • DefaultStages: Complete pipeline (Online + Offline + Finish; composition sketched after this list)
  • OnlineStages: Network-dependent stages (Headers, Bodies, Era)
  • OfflineStages: Local processing stages (Execution + Hashing + Indexing + Pruning)
  • ExecutionStages: Core transaction processing (SenderRecovery + Execution)
  • HashingStages: State hashing preparation (Merkle + Account + Storage)
  • HistoryIndexingStages: Database indexing for performance (Transaction + Storage + Account history)
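
In sketch form (a hypothetical builder reusing the simplified Stage trait from earlier; Reth's real StageSet API differs in its details), a set is just an ordered list of stages, and larger sets are built by concatenating smaller ones:

    struct StageSet {
        stages: Vec<Box<dyn Stage>>,
    }

    impl StageSet {
        fn new() -> Self {
            Self { stages: Vec::new() }
        }
        fn add_stage(mut self, stage: Box<dyn Stage>) -> Self {
            self.stages.push(stage);
            self
        }
        fn add_set(mut self, other: StageSet) -> Self {
            self.stages.extend(other.stages);
            self
        }
    }

    // DefaultStages = OnlineStages + OfflineStages + Finish, in that order.
    fn default_stages(online: StageSet, offline: StageSet, finish: Box<dyn Stage>) -> StageSet {
        StageSet::new().add_set(online).add_set(offline).add_stage(finish)
    }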

Performance Context

In practice, the main bottlenecks are typically:

  • Execution Stage: Transaction processing
  • Merkle Stages: State root calculations (heavy I/O)
  • Hashing Stages: CPU-intensive hashing operations

The pipeline processes blocks in batches (typically 50k blocks) with DB commits between stages for crash recovery and memory management.
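
A sketch of that batching contract, using the simplified types from earlier: a stage caps each run at a batch size, the pipeline commits, and the stage reports done only once it has reached the target:

    const BATCH_SIZE: u64 = 50_000;

    fn execute_batched(from: BlockNumber, target: BlockNumber) -> ExecOutput {
        let end = (from + BATCH_SIZE).min(target);
        for _block in from + 1..=end {
            // ... execute the block and buffer its state changes ...
        }
        // The pipeline commits here, then re-runs the stage if !done.
        ExecOutput { checkpoint: end, done: end == target }
    }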
