User Stories: Raft Consensus Algorithm Implementation in Rust

Problem Summary

This document contains user stories for implementing the Raft consensus algorithm in Rust. Raft is a distributed consensus protocol that enables a cluster of nodes to agree on a sequence of state transitions, maintaining a replicated state machine across multiple servers even in the presence of failures.

User Personas

Primary Stakeholders

  1. Distributed Systems Engineer (DSE): Needs a correct, performant Raft implementation to build distributed systems
  2. Application Developer (AD): Uses Raft as a library to add consensus features to their application
  3. Operations Engineer (OE): Deploys, monitors, and maintains Raft clusters in production
  4. System Architect (SA): Designs distributed systems requiring consensus and fault tolerance

User Needs Summary

  • DSE: Requires safety guarantees, performance, and comprehensive observability
  • AD: Needs simple APIs, clear documentation, and reliable behavior
  • OE: Requires monitoring capabilities, operational tools, and debuggability
  • SA: Needs proven correctness, scalability characteristics, and deployment flexibility

Story Organization and Sequencing Rationale

The stories are organized into 8 major phases:

  1. Foundation/Infrastructure (F-series): Establishes project structure, core types, and abstractions
  2. Core Raft Algorithm (C-series): Implements leader election, log replication, and commit rules
  3. Persistence (P-series): Adds durable storage for critical state
  4. Network/RPC (N-series): Implements communication between nodes
  5. State Machine Integration (S-series): Connects Raft to application state machines
  6. Cluster Membership (M-series): Enables dynamic cluster configuration changes
  7. Observability/Monitoring (O-series): Adds metrics, logging, and health checks
  8. Testing and Validation (T-series): Comprehensive testing strategies

This sequencing enables incremental development where each story builds upon previous ones while delivering value independently.


Phase 1: Foundation and Infrastructure Stories

F-1: Project Setup and Basic Structure

As a Distributed Systems Engineer I want a properly initialized Rust project with standard tooling So that I have a solid foundation for implementing Raft with good development practices

Acceptance Criteria:

  • Cargo project created with appropriate metadata (name, version, authors)
  • Project includes standard directories (src, tests, examples, benches)
  • Basic .gitignore file excludes build artifacts and IDE files
  • README file describes the project purpose
  • License file is present (MIT or Apache 2.0)
  • Cargo.toml includes workspace configuration if needed
  • Project compiles successfully with cargo build
  • Basic CI configuration file is present (GitHub Actions or similar)

Definition of Done:

  • Developer can clone the repository and build successfully
  • All standard Rust tooling works (cargo build, cargo test, cargo clippy)
  • Project structure follows Rust community conventions

F-2: Core Type Definitions for Raft State

As a Distributed Systems Engineer I want well-defined types for basic Raft concepts So that I can represent Raft state with type safety and clarity

Acceptance Criteria:

  • NodeId type defined (wraps unique node identifier)
  • Term type defined (monotonically increasing term number)
  • LogIndex type defined (position in the log, starting from 1)
  • ServerState enum defined (Follower, Candidate, Leader)
  • Types implement required traits (Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord as appropriate)
  • Types have clear documentation comments explaining their purpose
  • Types use Rust's type system to prevent invalid states (e.g., Term cannot be negative)
  • Module structure organizes types logically

Definition of Done:

  • All types compile without warnings
  • Documentation is clear and accurate
  • Types can be used in subsequent stories
  • Unit tests verify type behavior (ordering, equality)
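
A minimal sketch of how these types might look, assuming newtype wrappers over u64 (the unsigned representation makes negative terms unrepresentable by construction):

```rust
/// Unique identifier for a node in the cluster.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct NodeId(pub u64);

/// Monotonically increasing election term; unsigned, so it cannot be negative.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord)]
pub struct Term(pub u64);

/// Position in the log, starting from 1; 0 denotes "before the first entry".
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord)]
pub struct LogIndex(pub u64);

/// The three roles a Raft server can occupy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ServerState {
    Follower,
    Candidate,
    Leader,
}
```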

F-3: Log Entry Type Definition

As a Distributed Systems Engineer I want a type representing a single log entry So that I can store commands with their associated metadata

Acceptance Criteria:

  • LogEntry struct contains command data (generic over command type)
  • LogEntry includes term number when entry was created
  • LogEntry includes index position in the log
  • LogEntry supports serialization and deserialization
  • LogEntry implements Clone, Debug traits
  • Clear documentation explains each field's purpose
  • Type supports generic command payloads (using generics or trait objects)
  • LogEntry is immutable once created

Definition of Done:

  • LogEntry can be created and its fields accessed
  • LogEntry can be serialized to bytes and deserialized back
  • Type supports various command types
  • Documentation explains usage patterns
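
One plausible shape for the entry type, assuming serde for serialization and reusing the Term and LogIndex newtypes from the F-2 sketch (with serde derives added to them):

```rust
use serde::{Deserialize, Serialize};

/// A single replicated log entry; immutable once created (no mutating methods).
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LogEntry<C> {
    /// Term in which the leader created this entry.
    pub term: Term,
    /// 1-based position of this entry in the log.
    pub index: LogIndex,
    /// Application-defined command payload, generic over the command type.
    pub command: C,
}
```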

F-4: Persistent State Type Definition

As a Distributed Systems Engineer I want a type representing state that must survive crashes So that I can track what needs to be persisted for safety

Acceptance Criteria:

  • PersistentState struct includes current term
  • PersistentState includes votedFor (Option<NodeId>)
  • PersistentState includes log entries (Vec<LogEntry>)
  • Type enforces invariants (e.g., term never decreases)
  • Type supports serialization for storage
  • Clear documentation explains why each field must be persistent
  • Type provides builder pattern or constructor for initialization
  • Type implements Clone, Debug traits

Definition of Done:

  • PersistentState can be created and modified
  • All fields are accessible via appropriate methods
  • Type can be serialized and deserialized
  • Documentation is complete and accurate

F-5: Volatile State Type Definition

As a Distributed Systems Engineer I want types for volatile state (not persisted) So that I can track runtime state that can be reconstructed after restart

Acceptance Criteria:

  • VolatileState struct includes commitIndex (highest known committed entry)
  • VolatileState includes lastApplied (highest entry applied to state machine)
  • LeaderState struct includes nextIndex map (per-follower tracking)
  • LeaderState includes matchIndex map (per-follower tracking)
  • Types implement Default for initialization
  • Clear documentation explains each field's purpose and when it's used
  • Types support efficient updates and queries
  • Types implement Clone, Debug traits

Definition of Done:

  • VolatileState and LeaderState can be created and modified
  • Fields can be updated atomically where needed
  • Types integrate cleanly with persistent state
  • Documentation explains relationship to persistent state

F-6: RPC Message Type Definitions

As a Distributed Systems Engineer I want types for all Raft RPC messages So that I can communicate between nodes with type safety

Acceptance Criteria:

  • RequestVoteRequest struct includes term, candidateId, lastLogIndex, lastLogTerm
  • RequestVoteResponse struct includes term, voteGranted
  • AppendEntriesRequest struct includes term, leaderId, prevLogIndex, prevLogTerm, entries, leaderCommit
  • AppendEntriesResponse struct includes term, success, matchIndex
  • All messages support serialization/deserialization
  • Messages implement Clone, Debug traits
  • Clear documentation explains each field per Raft specification
  • Types use previously defined types (Term, NodeId, LogIndex, LogEntry)

Definition of Done:

  • All message types compile and can be instantiated
  • Messages can be serialized to bytes and deserialized
  • Documentation matches Raft paper specification
  • Message types are used in RPC layer stories
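
An illustrative set of message definitions following the paper's field names, again assuming serde and the types sketched in F-2 and F-3:

```rust
use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct RequestVoteRequest {
    pub term: Term,
    pub candidate_id: NodeId,
    pub last_log_index: LogIndex,
    pub last_log_term: Term,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct RequestVoteResponse {
    pub term: Term,
    pub vote_granted: bool,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct AppendEntriesRequest<C> {
    pub term: Term,
    pub leader_id: NodeId,
    pub prev_log_index: LogIndex,
    pub prev_log_term: Term,
    pub entries: Vec<LogEntry<C>>, // empty for heartbeats
    pub leader_commit: LogIndex,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct AppendEntriesResponse {
    pub term: Term,
    pub success: bool,
    pub match_index: LogIndex,
}
```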

F-7: Storage Abstraction Trait

As a Distributed Systems Engineer I want a trait defining the storage interface So that I can swap storage implementations without changing core Raft logic

Acceptance Criteria:

  • Storage trait defines methods for reading/writing term
  • Storage trait defines methods for reading/writing votedFor
  • Storage trait defines methods for appending log entries
  • Storage trait defines methods for reading log entries by index
  • Storage trait defines method for reading last log entry
  • All methods return Result types for error handling
  • Trait is async-compatible (returns Future or uses async_trait)
  • Clear documentation explains storage contract and durability requirements
  • Trait includes method for atomic multi-field updates

Definition of Done:

  • Storage trait compiles and is well-documented
  • Trait can be implemented by different backends
  • Error types are defined for storage failures
  • Trait supports all operations needed by Raft
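
A sketch of what the trait could look like using the async_trait crate; the method names, byte-payload entries, and error variants are illustrative, not a fixed API:

```rust
use async_trait::async_trait;

#[derive(Debug)]
pub enum StorageError {
    Io(std::io::Error),
    Corruption(String),
}

#[async_trait]
pub trait Storage: Send + Sync {
    async fn current_term(&self) -> Result<Term, StorageError>;
    async fn set_current_term(&self, term: Term) -> Result<(), StorageError>;
    async fn voted_for(&self) -> Result<Option<NodeId>, StorageError>;
    async fn set_voted_for(&self, node: Option<NodeId>) -> Result<(), StorageError>;
    async fn append_entries(&self, entries: &[LogEntry<Vec<u8>>]) -> Result<(), StorageError>;
    async fn entry(&self, index: LogIndex) -> Result<Option<LogEntry<Vec<u8>>>, StorageError>;
    async fn last_entry(&self) -> Result<Option<LogEntry<Vec<u8>>>, StorageError>;
    /// Atomic multi-field update: persist term and vote together before
    /// any RPC response that depends on them.
    async fn save_hard_state(&self, term: Term, voted_for: Option<NodeId>)
        -> Result<(), StorageError>;
}
```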

F-8: In-Memory Storage Implementation

As a Distributed Systems Engineer I want an in-memory storage implementation So that I can test Raft logic without disk I/O complexity

Acceptance Criteria:

  • InMemoryStorage implements Storage trait
  • Storage holds term, votedFor, and log entries in memory
  • All Storage trait methods are implemented correctly
  • Implementation uses appropriate synchronization (Mutex/RwLock) for thread safety
  • Implementation returns errors for invalid operations (e.g., reading non-existent entry)
  • Clear documentation explains this is for testing, not production
  • Storage can be created empty or with initial state
  • Implementation is efficient for test scenarios

Definition of Done:

  • InMemoryStorage compiles and all trait methods work
  • Basic tests verify reading and writing state
  • Storage can be used in unit tests for other components
  • Documentation clarifies testing-only usage

F-9: Error Type Definitions

As a Distributed Systems Engineer I want comprehensive error types for all failure modes So that I can handle errors appropriately and provide clear diagnostics

Acceptance Criteria:

  • RaftError enum covers all error categories (Storage, Network, Protocol, Configuration)
  • Each error variant includes context (message, source error if applicable)
  • Errors implement std::error::Error trait
  • Errors implement Display for user-friendly messages
  • Errors distinguish recoverable from fatal errors
  • Clear documentation explains when each error occurs
  • Errors support error chaining (from other error types)
  • Errors include structured data for programmatic handling

Definition of Done:

  • RaftError compiles and implements required traits
  • Errors can be created and converted from underlying errors
  • Error messages are clear and actionable
  • Documentation explains error handling strategies
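
One way to structure the error type, assuming the thiserror crate for the Error and Display derives; the variants and recoverability split are illustrative:

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum RaftError {
    #[error("storage failure: {0}")]
    Storage(#[from] std::io::Error),
    #[error("network failure contacting node {node}: {message}")]
    Network { node: u64, message: String },
    #[error("protocol violation: {0}")]
    Protocol(String),
    #[error("invalid configuration: {0}")]
    Configuration(String),
}

impl RaftError {
    /// Illustrative split between recoverable and fatal errors: network
    /// problems are retryable, the rest generally are not.
    pub fn is_recoverable(&self) -> bool {
        matches!(self, RaftError::Network { .. })
    }
}
```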

F-10: Configuration Type Definition

As an Application Developer I want a configuration structure with sensible defaults So that I can configure Raft behavior without understanding all details

Acceptance Criteria:

  • RaftConfig struct includes election timeout range (min/max)
  • RaftConfig includes heartbeat interval
  • RaftConfig includes node ID and cluster peer list
  • RaftConfig includes storage configuration
  • RaftConfig provides builder pattern for construction
  • Default values match Raft paper recommendations (150-300ms election timeout)
  • Configuration validates values (e.g., heartbeat < election timeout)
  • Clear documentation explains each configuration option
  • Configuration supports loading from file or environment variables

Definition of Done:

  • RaftConfig can be created with defaults
  • Configuration can be customized via builder
  • Invalid configurations are rejected with clear error messages
  • Documentation explains tuning recommendations
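
A sketch of the timing-related portion of the configuration with validation; node ID, peer list, and storage settings are omitted for brevity:

```rust
use std::time::Duration;

#[derive(Clone, Debug)]
pub struct RaftConfig {
    pub election_timeout_min: Duration,
    pub election_timeout_max: Duration,
    pub heartbeat_interval: Duration,
}

impl Default for RaftConfig {
    fn default() -> Self {
        // Defaults follow the paper's 150-300 ms election timeout guidance.
        Self {
            election_timeout_min: Duration::from_millis(150),
            election_timeout_max: Duration::from_millis(300),
            heartbeat_interval: Duration::from_millis(50),
        }
    }
}

impl RaftConfig {
    pub fn validate(&self) -> Result<(), String> {
        if self.election_timeout_min >= self.election_timeout_max {
            return Err("election timeout min must be less than max".into());
        }
        if self.heartbeat_interval >= self.election_timeout_min {
            return Err("heartbeat interval must be less than election timeout".into());
        }
        Ok(())
    }
}
```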

Phase 2: Core Raft Algorithm Stories

C-1: Follower State - Basic Initialization

As a Distributed Systems Engineer I want nodes to initialize in follower state So that new nodes start in a safe, passive state

Acceptance Criteria:

  • Node initializes with ServerState::Follower
  • Node initializes persistent state (term=0, votedFor=None, empty log)
  • Node initializes volatile state (commitIndex=0, lastApplied=0)
  • Node loads existing state from storage if available
  • Clear logging indicates node started as follower
  • Node ID is set from configuration
  • Initialization completes successfully or returns clear error

Definition of Done:

  • Node can be created and starts as follower
  • State is properly initialized from storage or defaults
  • Logs provide visibility into initialization process
  • Node is ready to receive RPCs

C-2: Follower State - Election Timer Initialization

As a Distributed Systems Engineer I want followers to start an election timer So that followers can detect leader failure

Acceptance Criteria:

  • Follower creates timer with randomized timeout (within configured range)
  • Timeout is chosen uniformly from election timeout range
  • Timer starts when node becomes follower
  • Timer can be reset without restarting the node
  • Clear logging shows timer value and expiration time
  • Timer uses monotonic clock (not wall clock)
  • Timer is cancelled when node changes state
  • Different followers choose different random timeouts

Definition of Done:

  • Election timer runs and can be observed
  • Timer timeout is randomized appropriately
  • Timer can be reset on receiving valid RPCs
  • Logging provides visibility into timer behavior
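
Picking the randomized timeout is a one-liner; this sketch assumes the rand crate:

```rust
use rand::Rng;
use std::time::Duration;

/// Uniformly sample a timeout from the configured [min, max] range so that
/// different followers are unlikely to time out simultaneously.
fn random_election_timeout(min: Duration, max: Duration) -> Duration {
    let ms = rand::thread_rng().gen_range(min.as_millis()..=max.as_millis());
    Duration::from_millis(ms as u64)
}
```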

C-3: Follower State - Handle Election Timeout

As a Distributed Systems Engineer I want followers to transition to candidate when election timer expires So that the cluster can elect a new leader when current leader fails

Acceptance Criteria:

  • When election timer expires, follower transitions to candidate state
  • Node increments term before transitioning
  • New term is persisted before state transition
  • Election timeout event is logged clearly
  • Transition only occurs if node is still a follower
  • Timer expiration doesn't occur if timer was reset
  • State transition is atomic

Definition of Done:

  • Follower becomes candidate after timeout expires
  • Term is incremented and persisted
  • State transition is logged and observable
  • No data races in state transition

C-4: Candidate State - Start Election

As a Distributed Systems Engineer I want candidates to start an election when entering candidate state So that a new leader can be elected

Acceptance Criteria:

  • Candidate votes for itself (updates votedFor)
  • Vote for self is persisted before requesting votes from others
  • Candidate resets election timer with new random timeout
  • Candidate tracks votes received (starts with 1 for self)
  • Current term is already incremented (from follower transition)
  • Clear logging indicates election started with term number
  • Candidate prepares to send RequestVote RPCs (actual sending in later story)
  • Election state is initialized (vote count, nodes contacted)

Definition of Done:

  • Candidate votes for itself and persists vote
  • Election timer is reset with new random value
  • Vote tracking is initialized
  • Election start is logged and observable

C-5: Candidate State - Determine Election Result (Quorum Check)

As a Distributed Systems Engineer I want candidates to detect when they receive majority votes So that successful candidates can become leader

Acceptance Criteria:

  • Candidate calculates quorum size (cluster_size / 2 + 1)
  • Candidate counts votes including self-vote
  • When vote count reaches quorum, candidate recognizes election success
  • Quorum calculation works for odd and even cluster sizes
  • Vote counting is thread-safe (if concurrent vote responses)
  • Clear logging when quorum is reached
  • Candidate tracks which nodes granted votes (for debugging)
  • Duplicate votes from same node don't count twice

Definition of Done:

  • Candidate correctly identifies when quorum is reached
  • Quorum calculation is correct for various cluster sizes
  • Vote counting is safe under concurrency
  • Election success is logged and observable
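
The quorum arithmetic is worth pinning down with a test, since integer division behaves differently for odd and even cluster sizes:

```rust
/// Smallest number of nodes forming a strict majority.
fn quorum_size(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn quorum_for_odd_and_even_clusters() {
        assert_eq!(quorum_size(3), 2); // tolerates 1 failure
        assert_eq!(quorum_size(4), 3); // tolerates 1 failure
        assert_eq!(quorum_size(5), 3); // tolerates 2 failures
    }
}
```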

C-6: Candidate State - Transition to Leader

As a Distributed Systems Engineer I want candidates to become leader upon winning election So that the cluster has a leader to coordinate operations

Acceptance Criteria:

  • Candidate transitions to Leader state after receiving quorum votes
  • Leader initializes nextIndex for each follower (last log index + 1)
  • Leader initializes matchIndex for each follower (0)
  • Election timer is stopped (leaders don't need it)
  • State transition is atomic
  • Clear logging indicates node became leader with term number
  • Leader state is initialized before accepting client requests
  • Transition only occurs if still in candidate state and term unchanged

Definition of Done:

  • Candidate becomes leader after winning election
  • Leader state (nextIndex, matchIndex) is properly initialized
  • State transition is logged and observable
  • Leader is ready to send heartbeats

C-7: Candidate State - Handle Split Vote (Retry Election)

As a Distributed Systems Engineer I want candidates to retry elections when timeouts occur without winning So that elections eventually succeed even with split votes

Acceptance Criteria:

  • When election timer expires in candidate state, start new election
  • Increment term before starting new election
  • Reset election timer with new random timeout
  • Reset vote count and vote for self again
  • New term is persisted before proceeding
  • Clear logging indicates new election attempt with new term
  • No limit is imposed on election retries (candidates retry indefinitely)
  • Previous election state is cleared

Definition of Done:

  • Candidate retries election on timeout
  • Each retry uses a new term and random timeout
  • Split votes eventually resolve through retries
  • Retry behavior is logged and observable

C-8: Leader State - Initialize Heartbeat Timer

As a Distributed Systems Engineer I want leaders to start a heartbeat timer So that leaders can maintain leadership by sending periodic heartbeats

Acceptance Criteria:

  • Leader creates heartbeat timer with configured interval
  • Heartbeat interval is less than election timeout (typically electionTimeout / 2)
  • Timer starts when node becomes leader
  • Timer fires periodically until node is no longer leader
  • Clear logging shows heartbeat interval
  • Timer is cancelled when leader loses leadership
  • Timer uses monotonic clock
  • Heartbeat timer is separate from election timer

Definition of Done:

  • Heartbeat timer runs while node is leader
  • Timer fires at configured interval
  • Timer is stopped when leadership ends
  • Timer behavior is logged and observable

C-9: Leader State - Prepare Heartbeat Messages

As a Distributed Systems Engineer I want leaders to create AppendEntries messages for heartbeats So that followers know the leader is alive

Acceptance Criteria:

  • Leader creates AppendEntriesRequest for each follower
  • Heartbeat messages have empty entries array
  • Messages include current term
  • Messages include leader's ID
  • Messages include prevLogIndex and prevLogTerm (based on follower's nextIndex)
  • Messages include current commitIndex
  • Each follower gets a message tailored to their state (based on nextIndex)
  • Messages are prepared when heartbeat timer fires
  • Message creation doesn't modify any state

Definition of Done:

  • Heartbeat messages are created correctly
  • Each follower receives appropriate message
  • Messages contain all required fields per Raft spec
  • Message preparation is efficient

C-10: Follower State - Accept Heartbeat (Update Timer)

As a Distributed Systems Engineer I want followers to reset election timer on receiving valid heartbeat So that followers don't start unnecessary elections while leader is active

Acceptance Criteria:

  • Follower receives AppendEntriesRequest with empty entries (heartbeat)
  • Follower verifies message term is >= current term
  • If message term is higher, update current term and persist
  • Reset election timer with new random timeout
  • Return success response if log consistency check passes
  • Clear logging indicates heartbeat received from leader
  • Don't start election while receiving regular heartbeats
  • Handle heartbeat even if log is inconsistent (return failure but reset timer)

Definition of Done:

  • Followers reset timer on valid heartbeats
  • Followers remain followers while heartbeats arrive
  • Timer reset prevents unnecessary elections
  • Heartbeat handling is logged

C-11: Term Update - Higher Term Detection

As a Distributed Systems Engineer I want nodes to detect when they see a higher term So that nodes can update to current term and maintain safety

Acceptance Criteria:

  • Node compares incoming RPC term with current term
  • If incoming term > current term, update current term
  • New term is persisted before continuing
  • Reset votedFor to None when updating term
  • If node is leader or candidate, step down to follower
  • Clear logging indicates term update with old and new values
  • Term update is atomic with state change
  • Respond to RPC using updated term

Definition of Done:

  • Nodes correctly detect higher terms
  • Term and votedFor are updated and persisted
  • Leaders/candidates step down when seeing higher term
  • Term updates are logged and observable
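
The rule shared by every RPC path can be isolated in a small helper; this sketch uses bare u64 terms and the ServerState enum from the F-2 sketch, and tells the caller when persistence is required:

```rust
/// Returns true when the incoming term was higher, in which case the caller
/// must persist the new term and cleared vote before sending any response.
fn observe_term(
    current_term: &mut u64,
    voted_for: &mut Option<u64>,
    state: &mut ServerState,
    incoming_term: u64,
) -> bool {
    if incoming_term > *current_term {
        *current_term = incoming_term;  // adopt the newer term
        *voted_for = None;              // no vote cast yet in the new term
        *state = ServerState::Follower; // leaders and candidates step down
        true
    } else {
        false
    }
}
```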

C-12: RequestVote RPC - Basic Request Validation

As a Distributed Systems Engineer I want nodes to validate RequestVote requests So that only valid vote requests are granted

Acceptance Criteria:

  • Node receives RequestVoteRequest
  • Check if request term >= current term
  • If request term < current term, reject with current term in response
  • If request term > current term, update term first (per C-11)
  • Validate request contains required fields (candidateId, lastLogIndex, lastLogTerm)
  • Log validation result (granted or denied with reason)
  • Response includes current term
  • Validation doesn't grant vote yet (just checks basic validity)

Definition of Done:

  • RequestVote requests are validated correctly
  • Invalid requests are rejected with appropriate response
  • Term updates occur before vote granting
  • Validation is logged

C-13: RequestVote RPC - Vote Granting (Already Voted Check)

As a Distributed Systems Engineer I want nodes to grant votes only if not already voted this term So that each node votes at most once per term (election safety)

Acceptance Criteria:

  • Node checks votedFor field
  • If votedFor is None, vote can be granted (subject to other checks)
  • If votedFor is Some(candidateId) for this candidate, grant vote (idempotent)
  • If votedFor is Some(differentId), reject vote
  • Vote granting updates votedFor and persists before responding
  • Clear logging shows vote decision and reason
  • Vote persistence is atomic
  • Response includes voteGranted boolean and current term

Definition of Done:

  • Nodes vote at most once per term
  • Vote granting respects existing votes
  • Votes are persisted before responding
  • Vote decisions are logged with reasons

C-14: RequestVote RPC - Log Up-to-Date Check

As a Distributed Systems Engineer I want nodes to grant votes only to candidates with up-to-date logs So that new leaders have all committed entries (leader completeness)

Acceptance Criteria:

  • Compare candidate's last log term with voter's last log term
  • If candidate's last term > voter's last term, candidate log is more up-to-date
  • If terms equal, compare last log index (higher index is more up-to-date)
  • If candidate's log is not at least as up-to-date, reject vote
  • If candidate's log is up-to-date and other checks pass, grant vote
  • Clear logging shows log comparison result
  • Comparison uses lastLogIndex and lastLogTerm from request
  • Log comparison is correct for empty logs (term=0, index=0)

Definition of Done:

  • Vote granting includes log up-to-date check
  • Only candidates with up-to-date logs receive votes
  • Log comparison logic is correct per Raft spec
  • Comparison results are logged
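
Rust's lexicographic tuple ordering captures the "at least as up-to-date" rule from section 5.4.1 of the paper directly; empty logs fall out naturally as (0, 0):

```rust
/// True when the candidate's log is at least as up-to-date as the voter's:
/// compare last terms first, and last indices only on a tie.
fn candidate_log_is_up_to_date(
    candidate_last_term: u64,
    candidate_last_index: u64,
    voter_last_term: u64,
    voter_last_index: u64,
) -> bool {
    (candidate_last_term, candidate_last_index) >= (voter_last_term, voter_last_index)
}
```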

C-15: AppendEntries RPC - Consistency Check (Matching Entry)

As a Distributed Systems Engineer I want followers to verify log consistency on AppendEntries So that followers maintain identical log prefixes (log matching property)

Acceptance Criteria:

  • Follower checks if log contains entry at prevLogIndex with term = prevLogTerm
  • If prevLogIndex is 0, consistency check passes (for first entry)
  • If log doesn't have entry at prevLogIndex, return failure
  • If log has entry at prevLogIndex but different term, return failure
  • If consistency check passes, proceed with appending entries
  • If consistency check fails, return success=false with current term
  • Clear logging shows consistency check result with index/term values
  • Don't modify log on consistency failure

Definition of Done:

  • Followers correctly verify log consistency
  • Inconsistent logs are detected and rejected
  • Consistency check follows Raft specification
  • Check results are logged with diagnostic information
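
A sketch of the check over an in-memory log, representing each entry by its term and treating logical index 1 as slot 0:

```rust
/// Does the local log contain an entry at prev_log_index with prev_log_term?
fn consistency_check(log_terms: &[u64], prev_log_index: usize, prev_log_term: u64) -> bool {
    if prev_log_index == 0 {
        return true; // appending at the very start of the log always passes
    }
    match log_terms.get(prev_log_index - 1) {
        Some(&term) => term == prev_log_term, // entry exists; terms must match
        None => false,                        // log is too short
    }
}
```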

C-16: AppendEntries RPC - Append New Entries

As a Distributed Systems Engineer I want followers to append new entries from leader So that follower logs replicate leader's log

Acceptance Criteria:

  • After successful consistency check, append entries from request
  • If log already has entries at those indices, they must match (or be deleted)
  • Delete conflicting entries and all that follow (log must match leader)
  • Append all new entries to log
  • Persist new log entries before responding
  • Clear logging shows which entries were appended (indices/terms)
  • Entry appending is atomic (all or nothing)
  • Response indicates success=true after appending

Definition of Done:

  • Followers append entries correctly
  • Conflicting entries are handled properly
  • Log entries are persisted before response
  • Append operations are logged

C-17: AppendEntries RPC - Update Commit Index

As a Distributed Systems Engineer I want followers to update commit index from leader So that followers know which entries are committed

Acceptance Criteria:

  • Follower receives leaderCommit field from AppendEntriesRequest
  • If leaderCommit > follower's commitIndex, update commitIndex
  • New commitIndex is min(leaderCommit, index of last new entry)
  • CommitIndex never decreases
  • Clear logging when commitIndex advances
  • CommitIndex update doesn't require persistence (volatile state)
  • Trigger state machine application after commit index update
  • CommitIndex is bounded by follower's log size

Definition of Done:

  • Followers update commitIndex from leader
  • CommitIndex advances monotonically
  • CommitIndex never exceeds log size
  • Updates are logged and observable
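
The update rule is a clamped, monotone assignment:

```rust
/// New commit index per the AppendEntries rules: advance toward the leader's
/// commit point, but never past the last new entry and never backward.
fn update_commit_index(commit_index: u64, leader_commit: u64, last_new_entry: u64) -> u64 {
    if leader_commit > commit_index {
        leader_commit.min(last_new_entry)
    } else {
        commit_index // commitIndex never decreases
    }
}
```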

C-18: Leader State - Track Follower Replication Progress

As a Distributed Systems Engineer I want leaders to update matchIndex based on AppendEntries responses So that leaders know how much of the log each follower has replicated

Acceptance Criteria:

  • Leader receives AppendEntriesResponse from follower
  • If response success=true, update matchIndex[follower] to last appended entry index
  • Update nextIndex[follower] to matchIndex[follower] + 1
  • If response success=false, decrement nextIndex[follower] and retry
  • MatchIndex values are used to determine commit point
  • Clear logging shows matchIndex and nextIndex updates
  • Updates are thread-safe if responses arrive concurrently
  • Leader doesn't update matchIndex for itself

Definition of Done:

  • Leaders track replication progress per follower
  • matchIndex and nextIndex are updated correctly
  • Failed AppendEntries trigger retry with lower nextIndex
  • Progress tracking is logged

C-19: Leader State - Determine Commit Point (Majority Replication)

As a Distributed Systems Engineer I want leaders to advance commit index when majority has replicated So that entries become committed and can be applied to state machine

Acceptance Criteria:

  • Leader calculates highest index N where majority have matchIndex >= N
  • N must be > current commitIndex
  • Entry at index N must have term = current term (safety check)
  • If such N exists, update commitIndex to N
  • CommitIndex never decreases
  • Clear logging when commitIndex advances with new value
  • Commit determination runs after each AppendEntries response
  • Leader's own log counts toward majority (leader has all entries)

Definition of Done:

  • Leaders correctly determine commit point
  • Commit requires majority replication
  • Only entries from current term are committed directly
  • Commit index advancement is logged
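
One compact way to find the commit candidate: sort the matchIndex values (with the leader contributing its own last log index) and take the median; the caller must still confirm the entry at that index carries the current term before committing:

```rust
/// Highest index replicated on a strict majority of the cluster.
fn majority_match_index(mut match_indices: Vec<u64>) -> u64 {
    match_indices.sort_unstable();
    // After an ascending sort of n values, at least n/2 + 1 nodes have a
    // matchIndex >= the value at position (n - 1) / 2.
    match_indices[(match_indices.len() - 1) / 2]
}
```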

C-20: Leader State - Accept Client Command

As an Application Developer I want to submit commands to the leader So that my commands are replicated and applied to the state machine

Acceptance Criteria:

  • Leader provides interface to accept client commands
  • Leader appends command to its local log with current term
  • New log entry is persisted before responding to client
  • Leader returns index and term to client (for tracking)
  • Leader begins replicating entry via AppendEntries RPCs
  • If node is not leader, reject command with error (redirect to leader)
  • Clear logging shows command acceptance with assigned index
  • Command acceptance is thread-safe

Definition of Done:

  • Clients can submit commands to leader
  • Commands are appended to log and persisted
  • Replication begins immediately
  • Non-leaders reject commands appropriately

C-21: Client Interface - Wait for Commit Notification

As an Application Developer I want to receive notification when my command is committed So that I know the operation is durable and will be applied

Acceptance Criteria:

  • Leader provides mechanism to wait for specific log index to be committed
  • When commitIndex advances past command's index, notify waiting client
  • Notification includes success indication
  • If leadership is lost before commit, notify client of failure
  • Waiting is async (doesn't block leader operation)
  • Clear timeout mechanism for client waits
  • Multiple clients can wait for different indices simultaneously
  • Notification occurs promptly after commit

Definition of Done:

  • Clients can wait for commit notification
  • Notifications are sent when commands commit
  • Leadership loss is communicated to waiting clients
  • Waiting doesn't block the Raft node

C-22: State Machine Application - Apply Committed Entries

As a Distributed Systems Engineer I want committed entries to be applied to the state machine So that the replicated state machine processes commands

Acceptance Criteria:

  • When commitIndex > lastApplied, apply intervening entries
  • Apply entries in strict log order (from lastApplied+1 to commitIndex)
  • Each entry is applied exactly once to the state machine
  • Update lastApplied after successful application
  • Clear logging shows which entries are being applied
  • Application happens on all nodes (leaders and followers)
  • State machine application is decoupled from log replication
  • Handle state machine errors gracefully

Definition of Done:

  • Committed entries are applied to state machine
  • Entries are applied in order, exactly once
  • lastApplied tracks application progress
  • Application is logged and observable
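
The heart of the story is a small loop invariant; this sketch applies raw byte payloads from an in-memory slice, with entry i stored at slot i - 1:

```rust
/// Apply every committed-but-unapplied entry, in log order, exactly once.
fn apply_committed(
    entries: &[Vec<u8>],
    last_applied: &mut u64,
    commit_index: u64,
    mut apply: impl FnMut(&[u8]),
) {
    while *last_applied < commit_index {
        *last_applied += 1;
        apply(&entries[(*last_applied - 1) as usize]);
    }
}
```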

C-23: Log Compaction - Snapshot Trigger

As an Operations Engineer I want nodes to detect when snapshots are needed So that log size is bounded and doesn't grow indefinitely

Acceptance Criteria:

  • Monitor log size (number of entries or bytes)
  • When log exceeds configured threshold, trigger snapshot
  • Snapshot only includes committed entries (up to lastApplied)
  • Snapshot triggering doesn't block normal operations
  • Clear logging indicates snapshot is being created
  • Configurable threshold for snapshot trigger (entries or size)
  • Snapshot creation can be triggered manually for testing
  • Only create snapshot if there are entries to compact

Definition of Done:

  • Nodes detect when snapshot is needed
  • Snapshot creation is triggered appropriately
  • Triggering doesn't disrupt normal operation
  • Snapshot triggers are logged

C-24: Log Compaction - Create Snapshot

As a Distributed Systems Engineer I want nodes to create snapshots of state machine state So that old log entries can be discarded

Acceptance Criteria:

  • Capture current state machine state as snapshot
  • Snapshot includes lastIncludedIndex (last entry applied to state)
  • Snapshot includes lastIncludedTerm (term of lastIncludedIndex entry)
  • Snapshot creation is atomic (consistent point-in-time)
  • Snapshot is stored durably (via storage layer)
  • Clear logging indicates snapshot creation progress
  • Snapshot creation doesn't block state machine queries
  • Old snapshots can be replaced with newer ones

Definition of Done:

  • Nodes create snapshots of state machine state
  • Snapshots include required metadata
  • Snapshots are stored durably
  • Snapshot creation is logged

C-25: Log Compaction - Discard Old Log Entries

As a Distributed Systems Engineer I want nodes to discard log entries included in snapshot So that log storage is reclaimed

Acceptance Criteria:

  • After snapshot creation, discard entries up to lastIncludedIndex
  • Keep entries after lastIncludedIndex (not yet in snapshot)
  • Update storage to remove old entries
  • Keep snapshot metadata (lastIncludedIndex, lastIncludedTerm)
  • Clear logging shows how many entries were discarded
  • Discarding is atomic (snapshot and log update)
  • Log compaction doesn't lose any committed entries
  • Log indices remain consistent (don't renumber)

Definition of Done:

  • Old log entries are discarded after snapshot
  • Storage space is reclaimed
  • Log remains consistent and usable
  • Compaction is logged with statistics

Phase 3: Persistence Stories

P-1: File-Based Storage - Basic File Operations

As a Distributed Systems Engineer I want a file-based storage implementation So that Raft state persists across process restarts

Acceptance Criteria:

  • FileStorage implements Storage trait
  • Storage uses separate files for term, votedFor, and log
  • Files are created in configured directory
  • File paths are configurable
  • Basic file I/O works (create, read, write)
  • Storage directory is created if it doesn't exist
  • Clear error handling for file operations (permission denied, disk full, etc.)
  • Storage can be initialized from existing files

Definition of Done:

  • FileStorage implements all Storage trait methods
  • Files are created and accessed correctly
  • Basic persistence works (write then read returns same data)
  • File errors are handled gracefully

P-2: File-Based Storage - Atomic Writes

As a Distributed Systems Engineer I want storage writes to be atomic So that partial writes don't corrupt state on crash

Acceptance Criteria:

  • Use write-rename pattern for atomic file updates
  • Write to temporary file first
  • Call fsync on temporary file to ensure durability
  • Rename temporary file to actual file (atomic on POSIX)
  • If crash occurs, either old or new state is intact (never partial)
  • Clear logging of write operations
  • Handle errors during any step of atomic write
  • Temporary files are cleaned up on error

Definition of Done:

  • Storage writes are atomic and durable
  • Crash during write leaves system in consistent state
  • Atomic write pattern is correctly implemented
  • Write operations are logged
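
The pattern itself is short; this sketch assumes a POSIX filesystem, where rename within a directory is atomic, and also fsyncs the directory so the rename survives a crash:

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

fn atomic_write(path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    file.sync_all()?;        // fsync: force the bytes to stable storage
    fs::rename(&tmp, path)?; // atomically replace the old file
    if let Some(dir) = path.parent() {
        File::open(dir)?.sync_all()?; // persist the directory entry itself
    }
    Ok(())
}
```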

P-3: File-Based Storage - Log Entry Append Optimization

As a Distributed Systems Engineer I want efficient log appending to file So that log writes have acceptable performance

Acceptance Criteria:

  • Append-only log file for better performance
  • Use buffered writes with explicit flush for durability
  • Batch multiple entries into single fsync when possible
  • Index file or header for quick lookups by index
  • Handle log truncation (for conflicting entries)
  • Clear logging of log operations
  • Configurable fsync policy (every write vs batched)
  • Log file format supports forward and backward scanning

Definition of Done:

  • Log appending is efficient (measured via benchmarks)
  • Durability guarantees are maintained
  • Log file can be read and appended correctly
  • Performance is acceptable for target workload

P-4: File-Based Storage - Corruption Detection

As an Operations Engineer I want storage to detect corrupted files So that I'm alerted to data corruption issues

Acceptance Criteria:

  • Use checksums (CRC32 or better) for data integrity
  • Verify checksums on read operations
  • Detect truncated files (incomplete writes)
  • Return clear error when corruption is detected
  • Log corruption events with details
  • Don't automatically "fix" corruption (preserve evidence)
  • Provide tools to manually recover from corruption
  • Consider using length-prefixed records for structure validation

Definition of Done:

  • Corruption is reliably detected on read
  • Corruption errors are clear and actionable
  • Checksums protect against silent corruption
  • Corruption detection is tested
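
A sketch of length-prefixed, checksummed record framing, assuming the crc32fast crate; a header carries the payload length and CRC so both truncation and bit-rot are detectable:

```rust
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32fast::hash(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

fn decode_record(buf: &[u8]) -> Result<&[u8], &'static str> {
    if buf.len() < 8 {
        return Err("truncated header");
    }
    let len = u32::from_le_bytes(buf[0..4].try_into().unwrap()) as usize;
    let crc = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    let payload = buf.get(8..8 + len).ok_or("truncated payload")?;
    if crc32fast::hash(payload) != crc {
        return Err("checksum mismatch: corruption detected");
    }
    Ok(payload)
}
```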

P-5: File-Based Storage - Recovery from Existing State

As a Distributed Systems Engineer I want nodes to recover state from files on restart So that restarts don't lose critical state

Acceptance Criteria:

  • On initialization, load term from term file
  • Load votedFor from votedFor file
  • Load all log entries from log file
  • Verify data integrity during load (checksums)
  • Handle missing files (initialize to defaults)
  • Clear logging shows what state was recovered
  • Recovery fails cleanly if corruption detected
  • Recovery performance is reasonable (acceptable startup time)

Definition of Done:

  • Nodes successfully recover state on restart
  • Recovered state matches state before shutdown
  • Missing or corrupt files are handled appropriately
  • Recovery process is logged

P-6: File-Based Storage - Snapshot Persistence

As a Distributed Systems Engineer I want snapshots to be stored durably So that snapshots survive restarts and enable log compaction

Acceptance Criteria:

  • Snapshot stored as separate file
  • Snapshot file includes lastIncludedIndex and lastIncludedTerm metadata
  • Snapshot file includes serialized state machine state
  • Snapshot writes are atomic (write-rename pattern)
  • Snapshot files use checksums for integrity
  • Old snapshots are replaced by newer ones
  • Snapshot loading on recovery
  • Clear logging of snapshot persistence operations

Definition of Done:

  • Snapshots are persistently stored
  • Snapshot files include all necessary data
  • Snapshots can be loaded on restart
  • Snapshot persistence is atomic and durable

P-7: File-Based Storage - Compaction with Snapshot

As a Distributed Systems Engineer I want storage to support log compaction via snapshot So that old log entries can be removed after snapshot

Acceptance Criteria:

  • Provide method to install snapshot and truncate log
  • Atomically replace log entries with snapshot
  • Remove log entries up to snapshot's lastIncludedIndex
  • Keep snapshot file and remaining log entries
  • Ensure atomicity (snapshot + log truncation together)
  • Recovery works with snapshot + partial log
  • Clear logging of compaction operations
  • Storage space is reclaimed after compaction

Definition of Done:

  • Log compaction via snapshot works correctly
  • Old entries are removed and storage reclaimed
  • Recovery works with compacted logs
  • Compaction is atomic and safe

Phase 4: Network and RPC Stories

N-1: RPC Transport Abstraction

As a Distributed Systems Engineer I want an abstract RPC transport interface So that I can swap network implementations without changing Raft logic

Acceptance Criteria:

  • RpcTransport trait defines methods for sending RequestVote, AppendEntries
  • Trait methods are async (return Futures)
  • Methods accept target node ID and request message
  • Methods return Result with response or network error
  • Trait supports timeout configuration per request
  • Clear documentation of transport contract
  • Trait is generic over message types
  • Transport handles connection management internally

Definition of Done:

  • RpcTransport trait is well-defined and documented
  • Trait can be implemented by different backends (TCP, gRPC, in-memory)
  • Trait methods cover all Raft RPC needs
  • Trait is async-compatible

N-2: In-Memory RPC Transport for Testing

As a Distributed Systems Engineer I want an in-memory RPC transport for tests So that I can test Raft logic without real network I/O

Acceptance Criteria:

  • InMemoryTransport implements RpcTransport trait
  • Uses channels or similar for message passing between nodes
  • Simulates RPC calls synchronously or with minimal delay
  • Supports multiple nodes in same process
  • Can simulate message delays for testing
  • Can simulate message loss (drop messages)
  • Can simulate network partitions (block messages between groups)
  • Clear API for controlling test scenarios

Definition of Done:

  • InMemoryTransport works for multi-node tests
  • Message passing is reliable in normal mode
  • Network failures can be injected for testing
  • Transport is used in Raft integration tests

N-3: TCP-Based RPC Transport - Connection Management

As an Operations Engineer I want TCP connections between Raft nodes So that nodes can communicate over real networks

Acceptance Criteria:

  • TcpTransport implements RpcTransport trait
  • Establishes TCP connections to other nodes
  • Connection pool manages connections per target node
  • Reconnects automatically on connection failure
  • Configurable connection timeout
  • Configurable request timeout
  • Clear logging of connection events
  • Handles node unavailability gracefully

Definition of Done:

  • TCP connections are established and managed
  • Connection failures trigger reconnection
  • Transport works over real networks
  • Connection management is logged

N-4: TCP-Based RPC Transport - Message Serialization

As a Distributed Systems Engineer I want RPC messages serialized for network transmission So that messages can be sent over TCP

Acceptance Criteria:

  • Messages serialized using chosen format (bincode, MessagePack, protobuf)
  • Serialization includes message length prefix
  • Deserialization validates message format
  • Handles serialization errors gracefully
  • Supports large messages (log entries, snapshots)
  • Clear error messages for serialization failures
  • Versioning support for protocol evolution
  • Efficient serialization (low overhead)

Definition of Done:

  • Messages are correctly serialized and deserialized
  • Large messages are handled efficiently
  • Serialization errors are caught and reported
  • Wire format is documented

N-5: TCP-Based RPC Transport - Send RequestVote RPC

As a Distributed Systems Engineer I want to send RequestVote RPCs over TCP So that candidates can request votes during elections

Acceptance Criteria:

  • Serialize RequestVoteRequest message
  • Send over TCP connection to target node
  • Wait for response with timeout
  • Deserialize RequestVoteResponse
  • Return response or timeout error
  • Handle connection failures during request
  • Clear logging of RPC calls (target, term, success/failure)
  • Retry logic for transient failures (optional)

Definition of Done:

  • RequestVote RPCs are sent successfully
  • Responses are received and deserialized
  • Timeouts and failures are handled
  • RPC calls are logged

N-6: TCP-Based RPC Transport - Send AppendEntries RPC

As a Distributed Systems Engineer I want to send AppendEntries RPCs over TCP So that leaders can replicate log entries to followers

Acceptance Criteria:

  • Serialize AppendEntriesRequest message
  • Send over TCP connection to target node
  • Wait for response with timeout
  • Deserialize AppendEntriesResponse
  • Return response or timeout error
  • Handle large messages (many log entries)
  • Clear logging of RPC calls
  • Performance is acceptable (low latency)

Definition of Done:

  • AppendEntries RPCs are sent successfully
  • Large messages are handled correctly
  • Performance meets requirements
  • RPC calls are logged

N-7: RPC Server - Listen for Incoming Connections

As a Distributed Systems Engineer I want nodes to listen for incoming RPC connections So that nodes can receive RPC requests from other nodes

Acceptance Criteria:

  • RPC server listens on configured address and port
  • Accepts incoming TCP connections
  • Spawns handler for each connection
  • Handles multiple concurrent connections
  • Configurable listen address and port
  • Clear logging when server starts listening
  • Graceful shutdown of server
  • Handles connection errors (port already in use, etc.)

Definition of Done:

  • RPC server accepts incoming connections
  • Multiple connections are handled concurrently
  • Server can be started and stopped cleanly
  • Server startup/shutdown is logged

N-8: RPC Server - Handle RequestVote Requests

As a Distributed Systems Engineer I want nodes to receive and process RequestVote RPCs So that candidates can collect votes

Acceptance Criteria:

  • Receive and deserialize RequestVoteRequest
  • Invoke Raft node's vote handling logic (C-12, C-13, C-14)
  • Serialize and send RequestVoteResponse
  • Handle multiple simultaneous vote requests
  • Clear logging of received requests
  • Return errors for malformed requests
  • Handle handler errors gracefully
  • Respond promptly (low latency)

Definition of Done:

  • RequestVote requests are received and processed
  • Responses are sent correctly
  • Vote granting logic is invoked properly
  • Request handling is logged

N-9: RPC Server - Handle AppendEntries Requests

As a Distributed Systems Engineer I want nodes to receive and process AppendEntries RPCs So that followers can replicate log entries from leader

Acceptance Criteria:

  • Receive and deserialize AppendEntriesRequest
  • Invoke Raft node's append handling logic (C-15, C-16, C-17)
  • Serialize and send AppendEntriesResponse
  • Handle heartbeats (empty entries) correctly
  • Clear logging of received requests
  • Handle large requests (many entries) efficiently
  • Return errors for malformed requests
  • Respond promptly

Definition of Done:

  • AppendEntries requests are received and processed
  • Responses are sent correctly
  • Log replication logic is invoked properly
  • Request handling is efficient and logged

N-10: RPC Parallelism - Concurrent Vote Requests

As a Distributed Systems Engineer I want candidates to send RequestVote RPCs to all nodes in parallel So that elections complete quickly

Acceptance Criteria:

  • Candidate sends RequestVote to all other nodes concurrently
  • Uses async I/O to avoid blocking
  • Collects responses as they arrive
  • Doesn't wait for all responses (proceed when quorum reached)
  • Times out requests that take too long
  • Clear logging shows when requests are sent
  • Handles partial responses (some nodes unreachable)
  • Early termination when quorum reached or election lost

Definition of Done:

  • Vote requests are sent in parallel
  • Elections complete in minimal time
  • Quorum detection works with partial responses
  • Parallel sending is logged
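
A sketch of the fan-out with early exit at quorum, assuming tokio; request_vote stands in for the transport call (including its timeout) and simply reports whether the vote was granted:

```rust
use tokio::sync::mpsc;

async fn collect_votes<F, Fut>(peers: Vec<u64>, quorum: usize, request_vote: F) -> bool
where
    F: Fn(u64) -> Fut,
    Fut: std::future::Future<Output = bool> + Send + 'static,
{
    let (tx, mut rx) = mpsc::unbounded_channel();
    for peer in peers {
        let tx = tx.clone();
        let fut = request_vote(peer);
        tokio::spawn(async move {
            let _ = tx.send(fut.await); // ignore send errors after early exit
        });
    }
    drop(tx); // tasks hold their own clones; channel closes when they finish

    let mut votes = 1; // the candidate's vote for itself
    while let Some(granted) = rx.recv().await {
        if granted {
            votes += 1;
            if votes >= quorum {
                return true; // quorum reached; remaining replies are irrelevant
            }
        }
    }
    false // every reachable peer answered without reaching quorum
}
```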

N-11: RPC Parallelism - Concurrent AppendEntries

As a Distributed Systems Engineer I want leaders to send AppendEntries to all followers in parallel So that replication is fast

Acceptance Criteria:

  • Leader sends AppendEntries to all followers concurrently
  • Uses async I/O for parallelism
  • Tracks responses per follower independently
  • Updates matchIndex as responses arrive
  • Retries failed requests to specific followers
  • Clear logging shows replication progress
  • Handles slow or unresponsive followers without blocking others
  • Heartbeats are sent in parallel as well

Definition of Done:

  • AppendEntries are sent in parallel to all followers
  • Replication is efficient and non-blocking
  • Per-follower progress is tracked independently
  • Parallel replication is logged

N-12: Network Error Handling - Timeout Behavior

As an Operations Engineer I want clear timeout behavior for RPC requests So that network delays don't cause indefinite blocking

Acceptance Criteria:

  • All RPC requests have configurable timeout
  • Timeout is enforced for connection and response
  • Timed-out requests return timeout error
  • Timeouts don't crash the node
  • Clear logging when timeouts occur (which node, RPC type)
  • Timeout values are reasonable defaults but configurable
  • Distinguish between connection timeout and read timeout
  • Canceled requests clean up resources

Definition of Done:

  • All RPCs enforce timeout
  • Timeouts are handled gracefully
  • Timeout configuration works
  • Timeouts are logged

N-13: Network Error Handling - Retry Logic

As a Distributed Systems Engineer I want appropriate retry logic for transient failures So that temporary network issues don't cause permanent failures

Acceptance Criteria:

  • Retry logic for connection failures (node temporarily unreachable)
  • Exponential backoff for retries
  • Maximum retry limit to avoid infinite loops
  • Don't retry on non-transient errors (invalid request, etc.)
  • Clear logging of retry attempts
  • Retry logic is configurable (enable/disable, max retries)
  • Retries don't violate Raft safety properties
  • Distinguish between retryable and non-retryable errors

Definition of Done:

  • Transient failures are retried appropriately
  • Retries use exponential backoff
  • Retry behavior is logged
  • Retries don't compromise safety
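
The backoff schedule can be as simple as a doubling delay with a cap; jitter is omitted here but worthwhile in practice:

```rust
use std::time::Duration;

/// Delay before the nth retry: 50 ms doubling each attempt, capped at 3.2 s.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 50;
    Duration::from_millis(base_ms << attempt.min(6))
}
```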

Phase 5: State Machine Integration Stories

S-1: State Machine Abstraction Trait

As an Application Developer I want a trait defining the state machine interface So that I can implement custom state machines for my application

Acceptance Criteria:

  • StateMachine trait defines apply method (apply command to state)
  • Apply method receives command and returns result
  • Trait defines snapshot method (capture current state)
  • Trait defines restore method (restore from snapshot)
  • Methods use generic or trait object for command type
  • Clear documentation explains state machine contract
  • Apply must be deterministic (same command = same result)
  • Trait is simple and easy to implement

Definition of Done:

  • StateMachine trait is well-defined and documented
  • Trait can be implemented for various applications
  • Trait supports all required operations
  • Examples show how to implement the trait
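
A sketch of the trait with byte-encoded snapshots; the associated types are an assumption about how commands and results might be surfaced:

```rust
pub trait StateMachine: Send {
    type Command;
    type Output;

    /// Apply a committed command; must be deterministic, since every
    /// replica applies the same sequence and must reach the same state.
    fn apply(&mut self, command: Self::Command) -> Self::Output;

    /// Serialize the current state for log compaction.
    fn snapshot(&self) -> Vec<u8>;

    /// Replace the current state with a previously captured snapshot.
    fn restore(&mut self, snapshot: &[u8]);
}
```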

S-2: Example State Machine - Key-Value Store

As an Application Developer I want an example state machine implementation So that I understand how to use Raft for my application

Acceptance Criteria:

  • Implement simple key-value store as StateMachine
  • Support commands: Put(key, value), Get(key), Delete(key)
  • Apply method updates internal HashMap
  • Snapshot method serializes entire HashMap
  • Restore method deserializes HashMap from snapshot
  • Clear documentation explains implementation
  • Example can be used in tests and demos
  • State transitions are deterministic

Definition of Done:

  • KV store implements StateMachine trait correctly
  • All operations work as expected
  • Snapshot/restore preserves state exactly
  • Example is documented and usable
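
A minimal store implementing the trait sketched in S-1, assuming serde_json for the snapshot encoding; Get is served through the read path (S-4/S-5) rather than the log, so only mutating commands appear here:

```rust
use std::collections::HashMap;

pub enum KvCommand {
    Put(String, String),
    Delete(String),
}

#[derive(Default)]
pub struct KvStore {
    data: HashMap<String, String>,
}

impl StateMachine for KvStore {
    type Command = KvCommand;
    type Output = Option<String>; // previous value, if any

    fn apply(&mut self, command: KvCommand) -> Option<String> {
        match command {
            KvCommand::Put(k, v) => self.data.insert(k, v),
            KvCommand::Delete(k) => self.data.remove(&k),
        }
    }

    fn snapshot(&self) -> Vec<u8> {
        serde_json::to_vec(&self.data).expect("string map always serializes")
    }

    fn restore(&mut self, snapshot: &[u8]) {
        self.data = serde_json::from_slice(snapshot).expect("valid snapshot bytes");
    }
}
```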

S-3: State Machine Application Queue

As a Distributed Systems Engineer I want committed entries applied to state machine in order So that state machine sees consistent command sequence

Acceptance Criteria:

  • Maintain queue of entries to be applied (lastApplied to commitIndex)
  • Apply entries sequentially in log order
  • Each entry applied exactly once
  • Application happens asynchronously (doesn't block Raft)
  • Handle state machine errors (log but don't crash)
  • Clear logging shows which entries are applied
  • Backpressure if state machine is slow
  • Application catches up efficiently after being behind

Definition of Done:

  • Committed entries are applied in order
  • Application is decoupled from Raft main loop
  • State machine errors are handled gracefully
  • Application progress is logged

S-4: Linearizable Reads - Read Index

As an Application Developer I want to read committed state without going through the log So that read-only queries are fast

Acceptance Criteria:

  • Leader tracks read requests with current commitIndex
  • Leader confirms leadership by sending heartbeat to majority
  • After majority confirms leadership, serve read at tracked commitIndex
  • Ensure read sees all committed writes
  • Reads don't require log entries
  • Clear documentation explains linearizability guarantee
  • Read requests return error if leader loses leadership
  • Performance is better than writing read to log

Definition of Done:

  • Linearizable reads work on leader
  • Reads don't create log entries
  • Read linearizability is guaranteed
  • Read performance is acceptable

S-5: Query Interface for State Machine

As an Application Developer I want to query state machine state without modifying it So that I can read data from the replicated state

Acceptance Criteria:

  • Provide read-only query interface to state machine
  • Queries don't modify state machine
  • Queries use read index mechanism for linearizability (S-4)
  • Support queries on followers (with weaker consistency)
  • Clear documentation of consistency guarantees per mode
  • Query errors are handled gracefully
  • Queries are efficient (low latency)
  • Non-leader nodes can redirect queries to leader

Definition of Done:

  • State machine can be queried safely
  • Linearizable and non-linearizable reads both supported
  • Query interface is documented with consistency semantics
  • Query performance is acceptable

S-6: Snapshot Integration with State Machine

As a Distributed Systems Engineer I want snapshots to capture state machine state So that log compaction preserves application state

Acceptance Criteria:

  • Snapshot calls state machine's snapshot method
  • Snapshot includes state machine state bytes
  • Snapshot metadata includes lastAppliedIndex and term
  • Restore calls state machine's restore method on startup
  • Snapshot/restore handles large state (streaming if needed)
  • Clear logging of snapshot operations
  • Snapshot is consistent (point-in-time)
  • State machine state matches log application up to snapshot point

Definition of Done:

  • Snapshots capture complete state machine state
  • Restore recreates exact state
  • Large states are handled efficiently
  • Snapshot integration is logged

S-7: InstallSnapshot RPC Message Definition

As a Distributed Systems Engineer I want InstallSnapshot RPC message types defined So that leaders can send snapshots to lagging followers

Acceptance Criteria:

  • InstallSnapshotRequest includes term, leaderId
  • Request includes lastIncludedIndex and lastIncludedTerm
  • Request includes snapshot data (chunk for large snapshots)
  • Request includes offset and done flag (for chunked transfer)
  • InstallSnapshotResponse includes term
  • Messages support serialization
  • Clear documentation explains fields per Raft spec
  • Messages support large snapshot transfer

Definition of Done:

  • InstallSnapshot message types are defined
  • Messages support chunked snapshot transfer
  • Messages can be serialized and sent over network
  • Message structure matches Raft specification

S-8: InstallSnapshot RPC - Sender (Leader)

As a Distributed Systems Engineer I want leaders to send snapshots to followers missing log entries So that lagging followers can catch up without full log replay

Acceptance Criteria:

  • Leader detects when follower needs snapshot (nextIndex <= snapshot's lastIncludedIndex)
  • Leader sends InstallSnapshotRequest instead of AppendEntries
  • Large snapshots are sent in chunks
  • Leader tracks transfer progress per follower
  • Leader resends chunks on failure
  • Clear logging of snapshot transfers
  • Transfer doesn't block normal replication to other followers
  • Leader continues heartbeats during snapshot transfer

Definition of Done:

  • Leaders send snapshots to lagging followers
  • Large snapshots are transferred successfully
  • Transfer is reliable (handles failures)
  • Snapshot transfers are logged

S-9: InstallSnapshot RPC - Receiver (Follower)

As a Distributed Systems Engineer I want followers to receive and install snapshots from leader So that followers can catch up when missing too many log entries

Acceptance Criteria:

  • Follower receives InstallSnapshotRequest
  • Verify request term and step down if needed
  • Receive snapshot chunks and assemble
  • Discard log entries covered by snapshot
  • Restore state machine from snapshot
  • Update commitIndex and lastApplied to snapshot point
  • Clear logging of snapshot installation
  • Reset election timer on receiving snapshot (leader is active)
  • Respond with success after installation

Definition of Done:

  • Followers receive and install snapshots
  • State machine is restored from snapshot
  • Old log entries are discarded
  • Installation is logged and observable

Phase 6: Cluster Membership Stories

M-1: Configuration Entry Type

As a Distributed Systems Engineer I want configuration changes represented as log entries So that membership changes are replicated and committed like data

Acceptance Criteria:

  • Define ConfigurationEntry type with list of node IDs
  • Configuration entries have special type marker
  • Configurations are versioned (track which is current)
  • Support serialization for log storage
  • Clear documentation explains configuration log entries
  • Configuration entry includes old and new configuration (for joint consensus)
  • Configuration is immutable once created
  • Type distinguishes configuration from regular data entries

Definition of Done:

  • Configuration entry type is defined
  • Type can be stored in log like other entries
  • Configuration changes are represented clearly
  • Type is documented
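
A possible Rust shape for the entry type (names assumed); the optional old member set represents a joint-consensus configuration:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical log entry payload; a dedicated variant distinguishes
/// configuration changes from ordinary application data.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum EntryPayload {
    /// Opaque application command bytes.
    Data(Vec<u8>),
    /// Cluster membership; immutable once appended to the log.
    Configuration {
        /// Previous voting members, present only during joint consensus.
        old: Option<Vec<u64>>,
        /// Target voting members.
        new: Vec<u64>,
    },
}
```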

M-2: Single-Server Membership Change - Add Server

As an Operations Engineer I want to add a new node to the cluster So that the cluster can grow or replace failed nodes

Acceptance Criteria:

  • Leader accepts request to add new server
  • Leader verifies no concurrent configuration change in progress
  • Leader appends configuration entry to log with new server added
  • New configuration is replicated and committed
  • New configuration becomes effective after commit
  • Clear logging of membership change process
  • New server must catch up before voting (non-voting learner initially)
  • Operation fails if concurrent change exists

Definition of Done:

  • New servers can be added to cluster
  • Addition is safe (no split-brain)
  • Configuration change is committed before taking effect
  • Addition process is logged

M-3: Single-Server Membership Change - Remove Server

As an Operations Engineer I want to remove a node from the cluster So that the cluster can shrink or permanently remove failed nodes

Acceptance Criteria:

  • Leader accepts request to remove server
  • Leader verifies no concurrent configuration change in progress
  • Leader appends configuration entry to log with server removed
  • New configuration is replicated and committed
  • New configuration becomes effective after commit
  • Clear logging of membership change process
  • Removed server shuts down after change is committed
  • Handle the case where the leader removes itself (it steps down once the new configuration is committed)

Definition of Done:

  • Servers can be removed from cluster
  • Removal is safe and clean
  • Removed servers stop participating
  • Removal process is logged

M-4: Configuration Change - Prevent Concurrent Changes

As a Distributed Systems Engineer I want only one configuration change at a time So that the cluster never has multiple pending configurations

Acceptance Criteria:

  • Track if configuration change is in progress
  • Reject new configuration change requests while one is pending
  • Consider change complete when configuration entry is committed
  • Clear error message when concurrent change is rejected
  • Log indicates when change completes
  • System resumes accepting changes after completion
  • Leader tracks pending change status
  • Followers can't initiate configuration changes

Definition of Done:

  • Only one configuration change happens at a time
  • Concurrent changes are rejected cleanly
  • System correctly tracks change status
  • Concurrency control is logged
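
A sketch of the single-pending-change guard: the leader records the log index of the in-flight configuration entry and clears it once that index commits (names are illustrative):

```rust
/// Tracks at most one in-flight configuration change (leader-side).
#[derive(Default)]
struct MembershipGuard {
    pending_change_index: Option<u64>,
}

impl MembershipGuard {
    /// Called when a configuration entry is appended to the log.
    fn begin(&mut self, entry_index: u64) -> Result<(), &'static str> {
        if self.pending_change_index.is_some() {
            return Err("a configuration change is already in progress");
        }
        self.pending_change_index = Some(entry_index);
        Ok(())
    }

    /// Called whenever commitIndex advances.
    fn on_commit(&mut self, commit_index: u64) {
        if self.pending_change_index.is_some_and(|i| i <= commit_index) {
            self.pending_change_index = None;
        }
    }
}
```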

M-5: New Server Catch-Up - Non-Voting Learner

As a Distributed Systems Engineer I want new servers to replicate log without voting So that new servers catch up before affecting quorum

Acceptance Criteria:

  • New server receives log entries like follower
  • New server doesn't participate in elections (doesn't vote or become candidate)
  • Leader tracks catch-up progress for new server
  • Leader transitions new server to voting member when caught up
  • "Caught up" means the follower's matchIndex is within a configured threshold of the leader's last log index
  • Clear logging shows catch-up progress
  • Catch-up happens before new server joins voting configuration
  • Timeout if new server doesn't catch up in reasonable time

Definition of Done:

  • New servers catch up without voting
  • Catch-up progress is tracked and logged
  • Transition to voting happens when ready
  • Catch-up mechanism is reliable
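
The caught-up test itself can be very small, as in the sketch below; `lag_threshold` is an assumed tunable, not something these stories prescribe:

```rust
/// A learner is promotable once its replicated log is within
/// `lag_threshold` entries of the leader's last log index.
/// e.g. learner_caught_up(98, 100, 5) == true
fn learner_caught_up(match_index: u64, leader_last_index: u64, lag_threshold: u64) -> bool {
    leader_last_index.saturating_sub(match_index) <= lag_threshold
}
```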

M-6: Configuration Query - Current Membership

As an Application Developer I want to query current cluster configuration So that I know which nodes are in the cluster

Acceptance Criteria:

  • Provide API to retrieve current committed configuration
  • Return list of node IDs in current configuration
  • Indicate which node is current leader (if known)
  • Query works on any node (leader or follower)
  • Configuration returned is the committed configuration (not pending)
  • Clear documentation of query API
  • Query is fast (doesn't require consensus)
  • Result distinguishes voting and non-voting members

Definition of Done:

  • Configuration can be queried easily
  • Query returns accurate current membership
  • API is documented and simple
  • Query is efficient

M-7: Bootstrap Initial Cluster Configuration

As an Operations Engineer I want to bootstrap the initial cluster configuration So that the cluster starts with a known membership

Acceptance Criteria:

  • Initial configuration specified in config file or API
  • All nodes start with same initial configuration
  • Initial configuration is logged as the first log entry (index 1 in the paper's numbering; some implementations start at 0)
  • Bootstrap only happens on first start (not on restart)
  • Clear logging shows bootstrap process
  • Bootstrap includes all initial cluster members
  • Configuration is persisted before accepting operations
  • Nodes in initial configuration can elect leader

Definition of Done:

  • Cluster can be bootstrapped with initial configuration
  • All nodes agree on initial membership
  • Bootstrap process is reliable
  • Bootstrap is logged clearly

Phase 7: Observability and Monitoring Stories

O-1: Structured Logging - Core Events

As an Operations Engineer I want structured logs for core Raft events So that I can monitor cluster behavior and debug issues

Acceptance Criteria:

  • Log state transitions (Follower/Candidate/Leader)
  • Log term changes with old and new term
  • Log election starts and results
  • Log vote granting/denying with reason
  • Log heartbeat sending and receiving
  • Logs include timestamp, node ID, term
  • Use structured logging format (JSON or key-value)
  • Configurable log level (debug, info, warn, error)

Definition of Done:

  • Core Raft events are logged comprehensively
  • Logs are structured and parseable
  • Log level is configurable
  • Logs provide good visibility into cluster state
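
With the `tracing` crate (one plausible choice; the stories only require structured, leveled output), a state-transition log might look like:

```rust
use tracing::info;

/// Emit a structured record for a state transition; the fields become
/// key-value pairs in JSON or logfmt output depending on the subscriber.
fn log_state_transition(node_id: u64, term: u64, from: &str, to: &str) {
    info!(node_id, term, from, to, "raft state transition");
}
```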

O-2: Structured Logging - Replication Events

As an Operations Engineer I want logs for log replication activity So that I can monitor replication progress and lag

Acceptance Criteria:

  • Log when entries are appended (index, term, count)
  • Log replication progress per follower (matchIndex updates)
  • Log commit index advancement
  • Log state machine application (lastApplied updates)
  • Log replication failures and retries
  • Logs include relevant indices and node IDs
  • Log frequency is reasonable (not too verbose)
  • Logs help diagnose replication issues

Definition of Done:

  • Replication activity is well-logged
  • Logs help identify slow followers
  • Commit progress is visible
  • Logs are not overwhelming in volume

O-3: Metrics - Basic Counters and Gauges

As an Operations Engineer I want metrics exposed for monitoring systems So that I can track cluster health over time

Acceptance Criteria:

  • Expose current term as gauge
  • Expose current state (Follower/Candidate/Leader) as gauge
  • Expose commit index and last applied as gauges
  • Expose election count as counter
  • Expose RPC request/response counts as counters
  • Metrics follow standard format (Prometheus, StatsD)
  • Metrics include node ID label
  • Metrics are updated in real-time

Definition of Done:

  • Basic metrics are exposed
  • Metrics can be scraped by monitoring systems
  • Metrics are accurate and up-to-date
  • Metrics follow standard conventions
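
A sketch using the `prometheus` crate (an assumed dependency; any Prometheus or StatsD client would do), with illustrative metric names and the node ID attached as a constant label:

```rust
use prometheus::{IntCounter, IntGauge, Opts, Registry};

/// Register a subset of the gauges/counters named above.
fn register_metrics(
    registry: &Registry,
    node_id: &str,
) -> prometheus::Result<(IntGauge, IntCounter)> {
    let current_term = IntGauge::with_opts(
        Opts::new("raft_current_term", "Current Raft term").const_label("node_id", node_id),
    )?;
    let elections = IntCounter::with_opts(
        Opts::new("raft_elections_total", "Elections started").const_label("node_id", node_id),
    )?;

    registry.register(Box::new(current_term.clone()))?;
    registry.register(Box::new(elections.clone()))?;
    Ok((current_term, elections))
}
```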

O-4: Metrics - Performance Metrics

As an Operations Engineer I want performance metrics So that I can monitor latency and throughput

Acceptance Criteria:

  • Expose commit latency histogram (time from append to commit)
  • Expose RPC latency histograms (per RPC type)
  • Expose throughput (commands per second)
  • Expose log size (entries and bytes)
  • Expose snapshot size and frequency
  • Metrics include percentiles (p50, p95, p99)
  • Metrics help identify performance issues
  • Metrics are updated continuously

Definition of Done:

  • Performance metrics are exposed
  • Latency and throughput are trackable
  • Metrics help identify bottlenecks
  • Metrics are useful for capacity planning

O-5: Health Check Endpoint

As an Operations Engineer I want a health check endpoint So that load balancers and orchestrators can detect node health

Acceptance Criteria:

  • HTTP endpoint returns health status (healthy/unhealthy)
  • Health check considers: node is running, storage is accessible, in valid state
  • Returns 200 OK when healthy, 503 when unhealthy
  • Response includes basic info (node ID, state, term)
  • Health check is fast (< 100ms)
  • Configurable health check criteria
  • Clear documentation of health check semantics
  • Endpoint doesn't require authentication (or has separate auth)

Definition of Done:

  • Health check endpoint works reliably
  • Orchestrators can use endpoint for health detection
  • Unhealthy nodes are detected appropriately
  • Endpoint is documented
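
The health decision can be kept independent of whichever web framework serves it; a sketch with assumed names:

```rust
/// Inputs for the health decision, matching the criteria above.
struct HealthSnapshot {
    running: bool,
    storage_ok: bool,
    state: &'static str, // "Follower" | "Candidate" | "Leader"
}

/// Map health to the HTTP status code the endpoint should return.
fn health_status_code(h: &HealthSnapshot) -> u16 {
    let valid_state = matches!(h.state, "Follower" | "Candidate" | "Leader");
    if h.running && h.storage_ok && valid_state {
        200 // healthy
    } else {
        503 // unhealthy
    }
}
```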

O-6: Metrics - Cluster-Wide Metrics

As an Operations Engineer I want cluster-wide aggregate metrics So that I can monitor overall cluster health

Acceptance Criteria:

  • Track quorum status (have majority or not)
  • Track leader election frequency
  • Track follower lag (max and average)
  • Track configuration change events
  • Track snapshot frequency and size
  • Metrics aggregate across cluster (via external monitoring)
  • Alerts can be configured on metrics
  • Metrics help identify cluster-wide issues

Definition of Done:

  • Cluster-wide metrics are exposed
  • Metrics provide cluster health visibility
  • Metrics support alerting strategies
  • Metrics are documented

O-7: Debug Endpoint - Current State

As a Distributed Systems Engineer I want an endpoint to inspect node state So that I can debug issues in development and production

Acceptance Criteria:

  • Endpoint returns current state (Follower/Candidate/Leader)
  • Returns current term, votedFor, commitIndex, lastApplied
  • Returns log summary (size, last index, last term)
  • Returns current configuration (cluster members)
  • Returns leader ID (if known)
  • Response is JSON formatted
  • Endpoint requires authentication in production
  • Clear documentation of endpoint

Definition of Done:

  • Debug endpoint provides comprehensive state view
  • Endpoint is useful for troubleshooting
  • State information is accurate
  • Endpoint is documented

O-8: Tracing - Distributed Tracing Integration

As a Distributed Systems Engineer I want distributed tracing for requests So that I can trace request flow across cluster nodes

Acceptance Criteria:

  • Integrate with tracing framework (OpenTelemetry, Jaeger)
  • Trace client requests through consensus process
  • Trace RPC calls between nodes
  • Spans include relevant context (term, index, node IDs)
  • Tracing is configurable (enable/disable, sampling rate)
  • Tracing overhead is minimal
  • Clear documentation of tracing setup
  • Traces help diagnose latency issues

Definition of Done:

  • Tracing integration works correctly
  • Requests can be traced across nodes
  • Tracing provides useful performance insights
  • Tracing is documented

Phase 8: Testing and Validation Stories

T-1: Unit Tests - Core Type Behavior

As a Distributed Systems Engineer I want unit tests for core types So that basic type behavior is verified

Acceptance Criteria:

  • Tests for Term, LogIndex, NodeId ordering and equality
  • Tests for LogEntry creation and serialization
  • Tests for PersistentState and VolatileState
  • Tests for RPC message types serialization
  • Tests cover normal cases and edge cases
  • Tests use standard Rust test framework
  • Clear test names describe what is tested
  • All tests pass consistently

Definition of Done:

  • Core types have comprehensive unit tests
  • Tests verify expected behavior
  • Edge cases are covered
  • Tests run fast and reliably
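
A representative T-1 test, assuming `Term` is a newtype over `u64` with derived ordering, as the F-series suggests:

```rust
/// Assumed shape of the core type under test.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Term(pub u64);

#[cfg(test)]
mod term_tests {
    use super::*;

    #[test]
    fn higher_term_compares_greater() {
        assert!(Term(3) > Term(2));
    }

    #[test]
    fn equal_terms_are_equal() {
        assert_eq!(Term(7), Term(7));
    }
}
```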

T-2: Unit Tests - Storage Implementations

As a Distributed Systems Engineer I want tests for storage implementations So that persistence behavior is correct

Acceptance Criteria:

  • Tests for InMemoryStorage read/write operations
  • Tests for FileStorage read/write operations
  • Tests for atomic writes and crash scenarios
  • Tests for corruption detection
  • Tests for recovery from persisted state
  • Tests for snapshot storage
  • Tests verify storage trait contract
  • Tests cover error conditions

Definition of Done:

  • Storage implementations have thorough tests
  • Persistence correctness is verified
  • Error handling is tested
  • Tests cover crash recovery scenarios

T-3: Unit Tests - Election Logic

As a Distributed Systems Engineer I want tests for election logic So that leader election works correctly

Acceptance Criteria:

  • Tests for follower timeout and candidate transition
  • Tests for vote granting rules (term, log up-to-date, already voted)
  • Tests for quorum detection
  • Tests for candidate to leader transition
  • Tests for split vote and retry
  • Tests for higher term detection and step down
  • Tests use mocked dependencies (storage, network)
  • Tests verify election safety invariants

Definition of Done:

  • Election logic is thoroughly tested
  • All vote granting rules are verified
  • Election transitions work correctly
  • Safety properties are tested

T-4: Unit Tests - Log Replication Logic

As a Distributed Systems Engineer I want tests for log replication So that log consistency is maintained

Acceptance Criteria:

  • Tests for log consistency check (prevIndex/prevTerm)
  • Tests for appending new entries
  • Tests for handling conflicting entries
  • Tests for commitIndex advancement
  • Tests for matchIndex/nextIndex tracking
  • Tests for commit point calculation (majority)
  • Tests verify log matching property
  • Tests cover edge cases (empty log, first entry)

Definition of Done:

  • Log replication logic is well-tested
  • Consistency checks are verified
  • Commit logic is correct
  • Edge cases are covered

T-5: Integration Tests - Single Leader Election

As a Distributed Systems Engineer I want integration tests for leader election So that election works across multiple nodes

Acceptance Criteria:

  • Test 3-node cluster elects single leader
  • Test 5-node cluster elects single leader
  • Test election completes within timeout
  • Test only one leader per term
  • Test followers recognize leader
  • Tests use InMemoryTransport
  • Tests verify election safety
  • Tests are deterministic and repeatable

Definition of Done:

  • Multi-node election tests pass
  • Leader election is reliable
  • Election safety is verified
  • Tests run consistently

T-6: Integration Tests - Log Replication Across Cluster

As a Distributed Systems Engineer I want integration tests for log replication So that logs are consistent across nodes

Acceptance Criteria:

  • Test leader replicates entries to followers
  • Test followers have identical logs after replication
  • Test commit happens when majority replicates
  • Test entries are applied to state machine
  • Test multiple commands in sequence
  • Tests verify log matching property across nodes
  • Tests use InMemoryTransport and InMemoryStorage
  • Tests check final state machine state matches

Definition of Done:

  • Multi-node replication tests pass
  • Logs are consistent across cluster
  • State machines converge
  • Tests are reliable

T-7: Integration Tests - Leader Failure and Re-Election

As a Distributed Systems Engineer I want tests for leader failure scenarios So that the cluster recovers from leader failures

Acceptance Criteria:

  • Test leader crash triggers new election
  • Test new leader is elected with all committed entries
  • Test cluster continues accepting commands after re-election
  • Test state machine state is preserved
  • Tests verify leader completeness property
  • Tests simulate leader crash (stop responding)
  • Tests verify election timeout works
  • Tests check committed entries are not lost

Definition of Done:

  • Leader failure tests pass
  • Cluster recovers automatically
  • Committed data is preserved
  • Re-election works reliably

T-8: Integration Tests - Network Partition

As a Distributed Systems Engineer I want tests for network partition scenarios So that the cluster handles partitions safely

Acceptance Criteria:

  • Test majority partition can continue
  • Test minority partition cannot commit entries
  • Test cluster recovers when partition heals
  • Test no split-brain occurs
  • Tests verify safety during partition
  • Tests use InMemoryTransport with partition simulation
  • Tests verify entries committed only in majority partition
  • Tests check partition healing reconciles logs

Definition of Done:

  • Partition tests pass
  • Cluster is safe during partitions
  • Split-brain is prevented
  • Partition recovery works

T-9: Integration Tests - Crash Recovery

As a Distributed Systems Engineer I want tests for crash and recovery So that nodes recover correctly after restart

Acceptance Criteria:

  • Test node crashes and restarts
  • Test node recovers persisted state (term, vote, log)
  • Test recovered node rejoins cluster
  • Test recovered node replicates any missed entries
  • Tests use FileStorage for persistence
  • Tests verify recovered state matches pre-crash state
  • Tests check node catches up after restart
  • Tests verify no data loss

Definition of Done:

  • Crash recovery tests pass
  • Nodes recover correctly
  • Persisted state is accurate
  • Recovery is reliable

T-10: Integration Tests - Log Compaction and Snapshots

As a Distributed Systems Engineer I want tests for log compaction So that snapshots work correctly

Acceptance Criteria:

  • Test snapshot creation after threshold
  • Test log entries are discarded after snapshot
  • Test snapshot includes correct state machine state
  • Test node can recover from snapshot
  • Test InstallSnapshot RPC for lagging followers
  • Tests verify snapshot metadata (lastIncludedIndex/Term)
  • Tests check storage is reclaimed
  • Tests verify state machine state after snapshot recovery

Definition of Done:

  • Snapshot tests pass
  • Compaction works correctly
  • Snapshots preserve state accurately
  • Snapshot installation works

T-11: Property-Based Tests - Safety Invariants

As a Distributed Systems Engineer I want property-based tests for safety properties So that Raft invariants are verified across many scenarios

Acceptance Criteria:

  • Test election safety (at most one leader per term)
  • Test log matching (same index/term implies identical prefix)
  • Test leader completeness (leader has all committed entries)
  • Test state machine safety (nodes apply same commands at same index)
  • Tests generate random inputs (commands, failures, partitions)
  • Tests use property testing framework (quickcheck, proptest)
  • Tests run many iterations to find edge cases
  • Tests verify invariants never violated

Definition of Done:

  • Property-based tests verify safety invariants
  • Tests run across many random scenarios
  • No invariant violations found
  • Tests provide high confidence in correctness
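
A sketch of the harness shape using `proptest`; the property shown is a deliberately tiny stand-in (terms never decrease along a log), since the real safety invariants need a cluster simulator driving them:

```rust
use proptest::prelude::*;

/// Toy invariant: terms never decrease along a log's entries.
fn terms_monotonic(terms: &[u64]) -> bool {
    terms.windows(2).all(|w| w[0] <= w[1])
}

proptest! {
    #[test]
    fn appended_terms_never_decrease(terms in prop::collection::vec(0u64..100, 0..50)) {
        // Stand-in for "drive a simulated cluster, then check invariants":
        // construct a trivially valid log and assert the property holds.
        let mut log = terms;
        log.sort_unstable();
        prop_assert!(terms_monotonic(&log));
    }
}
```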

T-12: Chaos Testing - Fault Injection

As a Distributed Systems Engineer I want chaos tests with fault injection So that the system handles arbitrary failure combinations

Acceptance Criteria:

  • Randomly crash and restart nodes
  • Randomly delay or drop messages
  • Randomly partition network
  • Randomly slow down nodes
  • Run client workload during chaos
  • Verify no data loss or inconsistency
  • Tests run for extended duration (minutes/hours)
  • Tests verify safety properties maintained during chaos

Definition of Done:

  • Chaos tests run successfully
  • System remains safe under arbitrary failures
  • No safety violations detected
  • Tests provide confidence in production readiness

T-13: Performance Tests - Latency Benchmarks

As a Distributed Systems Engineer I want performance benchmarks So that latency and throughput meet requirements

Acceptance Criteria:

  • Benchmark commit latency for single command
  • Benchmark throughput (commands per second)
  • Benchmark election time (leader failure to new leader)
  • Benchmark snapshot creation time
  • Benchmarks use realistic cluster sizes (3, 5, 7 nodes)
  • Benchmarks run on realistic hardware/network
  • Results compared against requirements
  • Benchmarks are repeatable

Definition of Done:

  • Performance benchmarks run successfully
  • Latency meets requirements (< 50ms commit)
  • Throughput meets requirements (> 10k ops/sec)
  • Performance is documented
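
A `criterion` harness skeleton for the commit-latency benchmark; `submit_and_wait` is a hypothetical call that proposes a command and blocks until it commits:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn commit_latency(c: &mut Criterion) {
    c.bench_function("commit_single_command", |b| {
        b.iter(|| {
            // Hypothetical: propose one command on a running 3-node
            // cluster and block until the leader reports it committed.
            // cluster.submit_and_wait(b"set key value");
        });
    });
}

criterion_group!(benches, commit_latency);
criterion_main!(benches);
```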

T-14: Linearizability Tests - Jepsen-Style Verification

As a Distributed Systems Engineer I want linearizability verification tests So that consistency guarantees are proven

Acceptance Criteria:

  • Record all client operations (commands and reads)
  • Record timestamps and responses
  • Use linearizability checker (Knossos or similar)
  • Verify history is linearizable
  • Tests run with concurrent clients
  • Tests include failures and partitions
  • Tests verify both writes and reads
  • Tests fail if non-linearizable behavior detected

Definition of Done:

  • Linearizability tests pass
  • Consistency is proven formally
  • Tests cover concurrent and failure scenarios
  • System provides correct consistency guarantees

T-15: End-to-End Tests - Full Cluster Operations

As an Application Developer I want end-to-end tests demonstrating complete usage So that I understand how to use Raft in my application

Acceptance Criteria:

  • Test complete cluster lifecycle (bootstrap, operations, shutdown)
  • Test client submitting commands and reading results
  • Test adding and removing nodes
  • Test rolling upgrades (simulated)
  • Test monitoring and metrics collection
  • Tests use realistic configuration
  • Tests demonstrate best practices
  • Tests serve as examples

Definition of Done:

  • End-to-end tests cover full usage scenarios
  • Tests demonstrate complete Raft functionality
  • Tests serve as documentation and examples
  • Tests pass reliably

Story Prioritization Summary

Highest Priority (Foundational)

Stories F-1 through F-10, C-1 through C-3 establish the foundation and should be completed first to enable all other work.

High Priority (Core Algorithm)

Stories C-4 through C-22 implement the core Raft algorithm and must be completed for a minimally functional system.

Medium Priority (Production Readiness)

Stories P-1 through P-7 (Persistence), N-1 through N-13 (Network), S-1 through S-9 (State Machine) make the system production-ready.

Lower Priority (Advanced Features)

Stories M-1 through M-7 (Membership), O-1 through O-8 (Observability) add operational capabilities.

Continuous (Testing)

Stories T-1 through T-15 should be developed in parallel with implementation stories, not deferred to the end.

Dependencies and Sequencing

  • F-series has minimal dependencies, can start immediately
  • C-series depends on F-series foundation types
  • P-series depends on F-7 (Storage trait)
  • N-series depends on F-6 (RPC types) and C-series (handlers)
  • S-series depends on C-series (state machine application logic)
  • M-series depends on C-series and N-series (requires working cluster)
  • O-series can be developed in parallel with implementation
  • T-series should accompany each phase

How These Stories Solve the Original Problem

These 110+ user stories collectively implement the Raft consensus algorithm by:

  1. Establishing Foundation (F-series): Creating type-safe representations of Raft concepts and storage abstractions
  2. Implementing Core Algorithm (C-series): Building leader election, log replication, and commit mechanisms that ensure safety
  3. Adding Persistence (P-series): Making state durable across crashes, preventing data loss
  4. Enabling Communication (N-series): Allowing nodes to send RPCs and coordinate across a network
  5. Integrating Applications (S-series): Connecting Raft to state machines, enabling real applications
  6. Supporting Operations (M-series): Allowing dynamic cluster membership changes
  7. Providing Visibility (O-series): Making the system observable and debuggable
  8. Ensuring Correctness (T-series): Verifying safety properties and validating the implementation

Each story is independently valuable and testable, and builds incrementally toward a complete, production-ready Raft implementation in Rust that satisfies all functional requirements, non-functional requirements, and business rules from the problem analysis.

The stories follow INVEST principles:

  • Independent: Most stories can be implemented without blocking on others in the same phase
  • Negotiable: Details can be refined during implementation
  • Valuable: Each delivers observable functionality or verification
  • Estimable: Scope is clear enough to estimate complexity
  • Small: Each can be completed in days, not weeks
  • Testable: Clear acceptance criteria enable verification

Document Version: 1.0
Generated: 2026-01-15
Based on Problem Analysis: /Users/andrealaforgia/dev/personal/claude-code-agents/problem-analysis.md
