Pipeline sync refers to Reth's staged, sequential processing architecture for synchronizing blockchain data. It's not a single operation, but rather a multi-stage pipeline that processes blocks through a series of specialized stages in a specific order.
Instead of downloading and processing blocks all at once, Reth breaks the sync process into 15 distinct stages that run serially. Simplified, the flow looks like this (the full 15-stage list appears further down):
Headers → Bodies → SenderRecovery → Execution → Hashing → Indexing → Pruning → Finish
Each stage performs a specific transformation on the block data and passes it to the next stage.
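Conceptually, every stage implements a small execute/unwind interface. Below is a minimal Rust sketch of that idea; the names (`Stage`, `ExecInput`, `ExecOutput`) are simplified stand-ins, not Reth's actual trait, which is generic over the database and carries richer checkpoint and error types.

```rust
/// Input to a stage run: where this stage left off, and where the
/// pipeline is headed. Simplified stand-in for illustration only.
struct ExecInput {
    checkpoint: u64, // last block this stage has already processed
    target: u64,     // block the pipeline is currently syncing toward
}

struct ExecOutput {
    checkpoint: u64, // new checkpoint after this run
    done: bool,      // true once the stage has caught up to the target
}

trait Stage {
    fn id(&self) -> &'static str;

    /// Process blocks from `checkpoint + 1` up to at most `target`,
    /// transforming data that earlier stages have already written.
    fn execute(&mut self, input: ExecInput) -> Result<ExecOutput, String>;

    /// Roll this stage's writes back down to `unwind_to` (used for
    /// reorgs or when validation fails downstream).
    fn unwind(&mut self, unwind_to: u64) -> Result<(), String>;
}
```

With that interface in mind, a sync cycle proceeds roughly as follows: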
- External Component Sets Target: An external sync controller determines the chain tip (highest block)
- Pipeline Executes Serially: Each stage runs from its last checkpoint up to the target tip before the next stage starts
- Stages Process in Order:
- Headers stage downloads block headers
- Bodies stage downloads transactions
- Execution stage runs all transactions
- Hashing stages hash account and storage state so the Merkle stage can compute the state root
- Indexing stages optimize database lookups
- Loop Until Complete: Pipeline repeats until it reaches the target (see the driver-loop sketch after this list)
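Here is the driver loop referenced above, again as a toy sketch built on the `Stage` trait from the previous snippet: serial execution, per-stage checkpoints, and looping until every stage reaches the target. The in-memory `checkpoints` map is a stand-in for the checkpoints Reth persists in its database.

```rust
use std::collections::HashMap;

fn run_pipeline(
    stages: &mut [Box<dyn Stage>],
    checkpoints: &mut HashMap<&'static str, u64>,
    target: u64,
) -> Result<(), String> {
    loop {
        let mut all_done = true;
        for stage in stages.iter_mut() {
            let checkpoint = *checkpoints.get(stage.id()).unwrap_or(&0);
            if checkpoint >= target {
                continue; // this stage already reached the target
            }
            let out = stage.execute(ExecInput { checkpoint, target })?;
            // Commit point: after a crash, the stage resumes from here
            // instead of starting over.
            checkpoints.insert(stage.id(), out.checkpoint);
            if !out.done {
                all_done = false; // stage stopped at a batch boundary
            }
        }
        if all_done {
            return Ok(()); // every stage has reached the target tip
        }
    }
}
```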
- Crash Recovery: Each stage maintains checkpoints, so if Reth crashes, it can resume from the last successful stage
- Memory Efficient: Processes blocks in batches (typically 50k blocks) with database commits between stages
- Validation: Each stage validates its input before processing
- Unwind Capability: If validation fails, the pipeline can unwind (roll back) stages in reverse order, as sketched after this list
- Parallel Processing: Some stages (like sender recovery) can be massively parallelized
- I/O Optimization: Stages are designed to minimize database reads/writes
- Bottleneck Isolation: Performance issues can be isolated to specific stages
- Incremental Progress: Node can serve some RPC calls even while mid-sync
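The unwind path from the list above, in the same toy terms: stages roll back in reverse pipeline order, so derived data (indexes, trie nodes) is removed before the base data it was built from. Again illustrative, not Reth's actual API.

```rust
fn unwind_pipeline(
    stages: &mut [Box<dyn Stage>],
    checkpoints: &mut HashMap<&'static str, u64>,
    unwind_to: u64,
) -> Result<(), String> {
    // Reverse order: the last stage to execute is the first to unwind.
    for stage in stages.iter_mut().rev() {
        let checkpoint = *checkpoints.get(stage.id()).unwrap_or(&0);
        if checkpoint > unwind_to {
            stage.unwind(unwind_to)?; // reverse this stage's writes
            checkpoints.insert(stage.id(), unwind_to);
        }
    }
    Ok(())
}
```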
When you see logs like:
```
ts=2025-09-09T05:34:36.858592244Z level=debug target=pruner message="Segment pruning finished" segment=Transactions
```
This is the Prune stage of the pipeline cleaning up old transaction data after the main sync stages have completed.
The pipeline sync approach is what makes Reth efficient for both initial sync (catching up from genesis) and ongoing sync (staying current with the network).
- Era - Downloads and processes ERA1 archive files (optional)
- Headers - Downloads block headers from network to local storage
- Bodies - Downloads block bodies (transactions) from network peers
- SenderRecovery - Validates and recovers transaction senders from signatures
- Execution - Executes all transactions for blocks and generates state diffs
- PruneSenderRecovery - Prunes sender recovery data (optional, based on config)
- MerkleUnwind - The Merkle stage in unwind-only mode: a no-op during forward sync, it reverts trie data during rollbacks
- AccountHashing - Hashes account state for Merkle tree construction
- StorageHashing - Hashes storage slots for Merkle tree construction
- MerkleExecute - Computes final state root (Merkle tree) for blocks
- TransactionLookup - Creates mappings from transaction hashes to sequential numbers
- IndexStorageHistory - Indexes storage slot changes for historical lookups
- IndexAccountHistory - Indexes account changes for historical state lookups
- Prune - Removes old data based on pruning configuration
- Finish - Final cleanup and pipeline completion marker
The stages are organized into logical StageSets, illustrated by the sketch after this list:
- DefaultStages: Complete pipeline (Online + Offline + Finish)
- OnlineStages: Network-dependent stages (Era, Headers, Bodies)
- OfflineStages: Local processing stages (Execution + Hashing + Indexing + Pruning)
- ExecutionStages: Core transaction processing (SenderRecovery + Execution)
- HashingStages: State hashing preparation (Merkle + Account + Storage)
- HistoryIndexingStages: Database indexing for performance (Transaction + Storage + Account history)
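In code terms, a stage set is just an ordered collection of stages that can be composed. Here is a hypothetical sketch in terms of the toy `Stage` trait above; Reth's real `StageSet` machinery is builder-based and lets individual stages be added, replaced, or disabled.

```rust
fn online_stages() -> Vec<Box<dyn Stage>> {
    vec![
        // Era, Headers, and Bodies stages would be constructed here
    ]
}

fn offline_stages() -> Vec<Box<dyn Stage>> {
    vec![
        // SenderRecovery, Execution, the hashing/Merkle stages,
        // the history-indexing stages, and Prune would go here
    ]
}

fn default_stages() -> Vec<Box<dyn Stage>> {
    // DefaultStages = OnlineStages + OfflineStages + Finish
    let mut stages = online_stages();
    stages.extend(offline_stages());
    // stages.push(Box::new(FinishStage)); // completion marker
    stages
}
```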
As mentioned in your notes, the main bottlenecks are typically in:
- Execution Stage: EVM re-execution of every transaction
- Merkle Stages: State root calculations (heavy I/O)
- Hashing Stages: CPU-intensive hashing operations
Because each stage commits its batch (typically 50k blocks) to the database before handing off, crash recovery stays cheap and memory stays bounded. The batching pattern looks like this inside a single stage, again sketched against the toy trait from earlier (the structure and names are illustrative; real thresholds in Reth are configurable per stage):
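```rust
struct BatchedStage {
    commit_threshold: u64, // e.g. 50_000 blocks per batch
}

impl Stage for BatchedStage {
    fn id(&self) -> &'static str {
        "Batched"
    }

    fn execute(&mut self, input: ExecInput) -> Result<ExecOutput, String> {
        // Clamp this run to the commit threshold so the pipeline can
        // commit to the database before calling execute again.
        let end = input.target.min(input.checkpoint + self.commit_threshold);
        for _block in (input.checkpoint + 1)..=end {
            // per-block work (download, execute, hash, ...) elided
        }
        Ok(ExecOutput {
            checkpoint: end,
            done: end == input.target, // false if we stopped at the batch limit
        })
    }

    fn unwind(&mut self, _unwind_to: u64) -> Result<(), String> {
        Ok(()) // nothing to undo in this sketch
    }
}
```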