Analyze trading snapshots from CSV to calculate P&L, win rate, and expected value metrics (Standard EV, Kelly Criterion, Sharpe Ratio) aggregated by ISO week. Pure functional core for calculations with console output.
Load trade snapshots from CSV with type validation
Calculate realized P&L by pairing ENTRY/EXIT events
Aggregate trades by ISO week (Monday start)
Calculate win rate, average win/loss, and risk metrics
Output weekly analysis to console in readable format
Total Tasks : 14 (10 implementation + 4 reviews)
Estimated Time : ~155 min (~2.6 hours)
Progress : 0/14 complete (0%)
Status : NOT STARTED
Next Task : US-001
graph TD
US001[US-001: CSV loader] --> US002[US-002: P&L calc]
US001 --> US003[US-003: Weekly agg]
US001 --> US004[US-004: Win rate]
US002 --> REVIEW1[🚧 US-REVIEW-S1]
US003 --> REVIEW1
US004 --> REVIEW1
REVIEW1 --> US005[US-005: Standard EV]
REVIEW1 --> US006[US-006: Kelly]
REVIEW1 --> US007[US-007: Sharpe]
US005 --> REVIEW2[🚧 US-REVIEW-S2]
US006 --> REVIEW2
US007 --> REVIEW2
REVIEW2 --> US008[US-008: Main analysis]
REVIEW2 --> US009[US-009: Console formatter]
REVIEW2 --> US010[US-010: Integration tests]
US008 --> REVIEW3[🚧 US-REVIEW-S3]
US009 --> REVIEW3
US010 --> REVIEW3
REVIEW3 --> REVIEWF[🚧 US-REVIEW-FINAL]
Sprint 1: Data Loading & P&L Calculation (~60 min)
Priority: HIGH
Purpose: Load CSV data and calculate realized P&L per trade
Status: NOT STARTED
US-001: CSV loader with tests (~15 min, ~80 lines)
Implementation:
File: src/trade_analyzer.py (create new)
Function: load_trades(filepath: str) -> list[dict[str, Any]]
Test File: test/test_trade_analyzer.py (create new)
Reference: trade_snapshots.csv (place CSV in project root)
Target: ~20 lines code + ~60 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~60 lines): Write tests first
Create test/test_trade_analyzer.py
Import from src.trade_analyzer (will fail - module doesn't exist yet)
Write 3 test cases:
test_load_trades_valid_csv - Verify loads sample CSV
test_load_trades_converts_types - Check timestamp→datetime, prices→float
test_load_trades_row_count - Verify row count matches file
Run: pytest test/test_trade_analyzer.py -v
VERIFY: Tests fail with ImportError (function doesn't exist) - this is RED phase
GREEN Phase (~50% - 7 min, ~20 lines): Implement function
Create src/trade_analyzer.py
Use csv.DictReader to read CSV
Parse timestamp with datetime.fromisoformat()
Convert numeric fields: fill_price, fill_size, btc_price, deviation, book_bid, book_ask, book_depth
Convert asks/bids fields: asks_10, asks_20, asks_30, bids_60, bids_70, bids_80
Return list of dicts with converted types
Run: pytest test/test_trade_analyzer.py -v
VERIFY: All 3 tests pass - this is GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break code: Comment out timestamp parsing
Run: pytest test/test_trade_analyzer.py -v
VERIFY: Test fails on datetime conversion
Revert change
VERIFY: Tests pass again
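A minimal sketch of the GREEN-phase implementation, keeping parsing in a pure helper and file I/O at the edge. The column names follow the field list above and may need adjusting to the real CSV header:

```python
import csv
from datetime import datetime
from typing import Any

# Field names assumed from the plan above; adjust to the actual CSV header.
_NUMERIC_FIELDS = ("fill_price", "fill_size", "btc_price", "deviation",
                   "book_bid", "book_ask", "book_depth",
                   "asks_10", "asks_20", "asks_30",
                   "bids_60", "bids_70", "bids_80")

def _parse_row(row: dict[str, str]) -> dict[str, Any]:
    """PURE FUNCTION: convert the string fields of one CSV row to typed values."""
    parsed: dict[str, Any] = dict(row)  # copy; never mutate the input row
    parsed["timestamp"] = datetime.fromisoformat(row["timestamp"])
    for field in _NUMERIC_FIELDS:
        parsed[field] = float(row[field])
    return parsed

def load_trades(filepath: str) -> list[dict[str, Any]]:
    """I/O layer: read the CSV and delegate conversion to the pure parser."""
    with open(filepath, newline="") as f:
        return [_parse_row(row) for row in csv.DictReader(f)]
```

Malformed rows are not handled here by design: a missing column raises `KeyError`/`ValueError` naturally, consistent with "let csv.DictReader raise" above.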
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects in parsing logic
File I/O happens in I/O layer (load_trades does file read)
Parsing logic extracted to pure functions
✅ Deterministic: Same CSV content → same output always
✅ Immutable: Don't modify input data
✅ Referential Transparency: Can cache results
Do NOT log inside parsing functions
Do NOT validate market names (out of scope)
Do NOT handle malformed CSV (let csv.DictReader raise)
Acceptance Criteria:
RED: Tests written and verified failing (ImportError)
GREEN: Tests pass after implementation
VERIFY: Break/fix cycle confirms tests catch bugs
Create src/trade_analyzer.py
Function signature: def load_trades(filepath: str) -> list[dict[str, Any]]:
Run: pytest test/test_trade_analyzer.py::test_load_trades_valid_csv -v
Expected: Pass, returns list of dicts
Run: python -c "from src.trade_analyzer import load_trades; trades = load_trades('trade_snapshots.csv'); print(f'Loaded {len(trades)} trades')"
Expected: "Loaded N trades"
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-002: P&L calculator with tests (~15 min, ~85 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_pnl(trades: list[dict]) -> list[dict]
Test additions: test/test_trade_analyzer.py (modify)
Reference: Pair ENTRY/EXIT events by market, calculate P&L = (exit_price - entry_price) * size
Target: ~25 lines code + ~60 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~60 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_pnl (will fail initially)
Write 4 test cases:
test_calculate_pnl_winning_trade - YES ENTRY@21 EXIT@56 = +$35 profit
test_calculate_pnl_losing_trade - YES ENTRY@56 EXIT@21 = -$35 loss
test_calculate_pnl_multiple_markets - Verify independent P&L per market
test_calculate_pnl_unpaired_entry - ENTRY without EXIT = None (ignore)
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
VERIFY: Tests fail (function doesn't exist or not imported) - RED phase
GREEN Phase (~50% - 8 min, ~25 lines): Implement function
Add calculate_pnl() to src/trade_analyzer.py
Group trades by market field
For each market, find ENTRY/EXIT pairs (chronological order)
Calculate P&L: (exit_fill_price - entry_fill_price) * fill_size / 100 (cents to dollars)
Side adjustment: YES = normal, NO = inverted (100 - price)
Return list with: {market, entry_time, exit_time, side, entry_price, exit_price, size, pnl}
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
VERIFY: All 4 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Remove side adjustment logic
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
VERIFY: Tests fail on NO side calculations
Revert change
VERIFY: Tests pass again
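The pairing and side-adjustment logic above can be sketched as follows. The event labels (`ENTRY`/`EXIT`), side labels (`YES`/`NO`), and field names are assumptions taken from the plan, not confirmed against the real data:

```python
from typing import Any

def calculate_pnl(trades: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """PURE FUNCTION: pair ENTRY/EXIT events per market and compute realized P&L."""
    open_entries: dict[str, dict[str, Any]] = {}
    results: list[dict[str, Any]] = []
    for trade in sorted(trades, key=lambda t: t["timestamp"]):  # chronological order
        market = trade["market"]
        if trade["event"] == "ENTRY":
            open_entries[market] = trade
        elif trade["event"] == "EXIT" and market in open_entries:
            entry = open_entries.pop(market)
            # NO side: invert prices (100 - price) so the profit direction matches YES
            adjust = (lambda p: p) if entry["side"] == "YES" else (lambda p: 100 - p)
            pnl = (adjust(trade["fill_price"]) - adjust(entry["fill_price"])) \
                  * entry["fill_size"] / 100  # cents to dollars
            results.append({
                "market": market,
                "entry_time": entry["timestamp"],
                "exit_time": trade["timestamp"],
                "side": entry["side"],
                "entry_price": entry["fill_price"],
                "exit_price": trade["fill_price"],
                "size": entry["fill_size"],
                "pnl": pnl,
            })
    return results  # unpaired ENTRYs are silently dropped, per the spec above
```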
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input trades list not mutated
Returns new list of P&L dicts
✅ Deterministic: Same trades → same P&L
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT modify trades list
Do NOT handle partial fills (assume all fills complete)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 4 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_pnl(trades: list[dict]) -> list[dict]:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); print(f'{len(pnls)} completed trades')"
Expected: "N completed trades"
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
Expected: 4 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
grep -n "logger\." src/trade_analyzer.py | grep calculate_pnl → empty
grep -n "print" src/trade_analyzer.py | grep calculate_pnl → empty
US-003: Weekly aggregator with tests (~15 min, ~90 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: aggregate_by_week(pnl_list: list[dict]) -> dict[str, list[dict]]
Test additions: test/test_trade_analyzer.py (modify)
Reference: ISO week format (Monday start)
Target: ~20 lines code + ~70 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~70 lines): Write tests first
Add to test/test_trade_analyzer.py
Import aggregate_by_week (will fail initially)
Write 4 test cases:
test_aggregate_by_week_single_week - All trades in one week
test_aggregate_by_week_multiple_weeks - Trades span 3 weeks
test_aggregate_by_week_iso_format - Verify "2025-W01" format
test_aggregate_by_week_monday_boundary - Monday trade starts new week
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 8 min, ~20 lines): Implement function
Add aggregate_by_week() to src/trade_analyzer.py
Use entry_time.isocalendar() to get ISO week
Format as "YYYY-WNN" (e.g., "2025-W01")
Group P&L dicts by week string
Return dict: {"2025-W01": [pnl1, pnl2], "2025-W02": [pnl3]}
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
VERIFY: All 4 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Use Sunday week start instead of Monday
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
VERIFY: test_aggregate_by_week_monday_boundary fails
Revert change
VERIFY: Tests pass again
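A possible shape for the aggregator, using `isocalendar()` exactly as described (ISO weeks start on Monday; the ISO year can differ from the calendar year near January 1st):

```python
from typing import Any

def aggregate_by_week(pnl_list: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """PURE FUNCTION: group P&L dicts by ISO week of entry_time ("YYYY-WNN")."""
    weeks: dict[str, list[dict[str, Any]]] = {}
    for pnl in pnl_list:
        iso_year, iso_week, _ = pnl["entry_time"].isocalendar()
        key = f"{iso_year}-W{iso_week:02d}"
        weeks.setdefault(key, []).append(pnl)
    return weeks  # insertion order preserved (Python 3.7+), no sorting by design
```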
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input pnl_list not mutated
Returns new dict structure
✅ Deterministic: Same P&L list → same weekly grouping
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can cache results
Do NOT log inside aggregation
Do NOT sort weeks (dict preserves insertion order in Python 3.7+)
Do NOT filter weeks (include all)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 4 tests pass
VERIFY: Break/fix cycle works
Function signature: def aggregate_by_week(pnl_list: list[dict]) -> dict[str, list[dict]]:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, aggregate_by_week; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); weeks = aggregate_by_week(pnls); print(f'{len(weeks)} weeks')"
Expected: "N weeks"
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
Expected: 4 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
grep -n "pnl_list\.append\|pnl_list\.sort" src/trade_analyzer.py → empty (no mutation)
US-004: Win rate calculator with tests (~10 min, ~45 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_metrics(pnl_list: list[dict]) -> dict[str, float]
Test additions: test/test_trade_analyzer.py (modify)
Reference: Win rate = wins / total, avg_win = mean(wins), avg_loss = mean(losses)
Target: ~15 lines code + ~30 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~30 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_metrics (will fail initially)
Write 2 test cases:
test_calculate_metrics_mixed_trades - 3 wins (+10, +20, +5), 2 losses (-15, -5) → win_rate=0.6
test_calculate_metrics_all_wins - 5 wins → win_rate=1.0
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 5 min, ~15 lines): Implement function
Add calculate_metrics() to src/trade_analyzer.py
Separate P&L list into wins (pnl > 0) and losses (pnl <= 0)
Calculate: win_rate = len(wins) / len(pnl_list)
Calculate: avg_win = sum(wins) / len(wins) if wins else 0
Calculate: avg_loss = sum(losses) / len(losses) if losses else 0
Return: {"win_rate": float, "avg_win": float, "avg_loss": float, "total_pnl": float}
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
VERIFY: All 2 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Count breakeven (pnl=0) as win
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
VERIFY: Tests fail on win rate calculation
Revert change
VERIFY: Tests pass again
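The metric formulas above fit in a few lines. A sketch, with breakeven trades counted as losses and the deliberate `ZeroDivisionError` on an empty list:

```python
def calculate_metrics(pnl_list: list[dict[str, float]]) -> dict[str, float]:
    """PURE FUNCTION: win rate, average win/loss, and total P&L from a P&L list."""
    values = [t["pnl"] for t in pnl_list]
    wins = [v for v in values if v > 0]
    losses = [v for v in values if v <= 0]  # breakeven counts as a loss
    return {
        "win_rate": len(wins) / len(values),  # empty list raises, by design
        "avg_win": sum(wins) / len(wins) if wins else 0.0,
        "avg_loss": sum(losses) / len(losses) if losses else 0.0,
        "total_pnl": sum(values),
    }
```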
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input pnl_list not mutated
Returns new dict with metrics
✅ Deterministic: Same P&L list → same metrics
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT round values (caller handles formatting)
Do NOT handle empty list (let division raise ZeroDivisionError)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 2 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_metrics(pnl_list: list[dict]) -> dict[str, float]:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, calculate_metrics; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); metrics = calculate_metrics(pnls); print(f'Win rate: {metrics[\"win_rate\"]:.1%}')"
Expected: "Win rate: N.N%"
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
Expected: 2 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-REVIEW-S1: Sprint 1 Review 🚧 GATE (~5 min)
Scope: US-001, US-002, US-003, US-004
Review Steps:
Run: pytest test/test_trade_analyzer.py -v --tb=short
Expected: 13+ tests passed (3+4+4+2)
Run: mypy src/trade_analyzer.py --strict
Expected: Success
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, aggregate_by_week, calculate_metrics; print('imports ok')"
Expected: "imports ok"
Run: pytest test/test_trade_analyzer.py --cov=src.trade_analyzer --cov-report=term
Expected: Coverage > 90%
Linus 5-Layer Analysis (from linus-prompt-code-review.md):
Data Structure Analysis : "Bad programmers worry about code. Good programmers worry about data structures."
What is core data? CSV rows → dicts → P&L dicts → weekly aggregation
Where does data flow? File → list[dict] → list[pnl_dict] → dict[week, list[pnl_dict]]
Any unnecessary copying or conversion? Single-pass where possible, transformations create new structures (immutable)
Special Case Identification : "Good code has no special cases"
Find all if/else branches: Empty list checks, win/loss separation, side adjustment (YES/NO)
Which are real business logic? Side adjustment (YES/NO inversion), win/loss classification
Can we redesign data structures to eliminate branches? Side adjustment is domain logic (can't eliminate)
Complexity Review : "If > 3 levels of indentation, redesign it"
What is the essence? Transform CSV rows → P&L calculations → weekly metrics
How many concepts does solution use? 4 transformations (load → calculate → aggregate → metrics)
Can we reduce it to half? Already minimal - each function has single responsibility
Destructive Analysis : "Never break userspace" - backward compatibility
List all existing functionality that might be affected: None (new module)
Which dependencies will be broken? None
How to improve without breaking anything? N/A - greenfield
Practicality Verification : "Theory and practice sometimes clash. Theory loses."
Does this problem really exist in production? Yes - need to analyze trade performance
How many users encounter this? Single user (manual trading analysis)
Does solution complexity match problem severity? Yes - straightforward CSV analysis with pure functions
Taste Score: Good taste / Acceptable / Garbage
Test File Checks (from linus-prompt-code-review.md):
Tests import from src.trade_analyzer (not inline)
No production logic in test file
Only test_* function names allowed
Functional Programming Verification:
Pure functions? All calculation functions (calculate_pnl, aggregate_by_week, calculate_metrics) have no side effects?
Deterministic? Same inputs → same outputs verified?
Immutable? No input mutation verified?
Docstrings? All include "PURE FUNCTION" marker?
Separation? Pure functions (calculations) vs I/O (load_trades)?
Cross-Task Checks:
Verify patterns consistent across all functions (type hints, docstrings)
Check no orphaned imports or dead code
Validate error handling is uniform (let exceptions propagate)
Gate:
If issues found: Create fix tasks (US-001a, US-002a, etc.), output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-S1 complete", output <review-passed/>
Sprint 2: Expected Value Calculations (~50 min)
Priority: HIGH
Purpose: Calculate EV metrics (Standard, Kelly, Sharpe)
Status: NOT STARTED
US-005: Standard EV with tests (~15 min, ~60 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_standard_ev(metrics: dict[str, float]) -> float
Test additions: test/test_trade_analyzer.py (modify)
Reference: EV = (win_rate × avg_win) + ((1 - win_rate) × avg_loss)
Target: ~10 lines code + ~50 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~50 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_standard_ev (will fail initially)
Write 3 test cases:
test_standard_ev_positive - win_rate=0.6, avg_win=20, avg_loss=-10 → EV=8.0
test_standard_ev_negative - win_rate=0.4, avg_win=10, avg_loss=-15 → EV=-5.0
test_standard_ev_breakeven - win_rate=0.5, avg_win=10, avg_loss=-10 → EV=0.0
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 7 min, ~10 lines): Implement function
Add calculate_standard_ev() to src/trade_analyzer.py
Extract from metrics: win_rate, avg_win, avg_loss
Calculate: ev = (win_rate * avg_win) + ((1 - win_rate) * avg_loss)
Return float
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
VERIFY: All 3 tests pass - GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break: Forget to multiply by (1 - win_rate)
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
VERIFY: Tests fail on EV calculation
Revert change
VERIFY: Tests pass again
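This one is a direct transcription of the EV formula above:

```python
def calculate_standard_ev(metrics: dict[str, float]) -> float:
    """PURE FUNCTION: EV = win_rate * avg_win + (1 - win_rate) * avg_loss."""
    # avg_loss is expected to be negative, so the second term subtracts
    return (metrics["win_rate"] * metrics["avg_win"]
            + (1 - metrics["win_rate"]) * metrics["avg_loss"])
```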
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input metrics dict not mutated
Returns single float value
✅ Deterministic: Same metrics → same EV
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT validate metrics keys (assume caller provides correct format)
Do NOT handle divide-by-zero (assume metrics valid)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 3 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_standard_ev(metrics: dict[str, float]) -> float:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import calculate_standard_ev; ev = calculate_standard_ev({'win_rate': 0.6, 'avg_win': 20, 'avg_loss': -10}); print(f'EV: ${ev:.2f}')"
Expected: "EV: $8.00"
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
Expected: 3 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
grep -n "metrics\[" src/trade_analyzer.py | grep "=" → only reads, no writes
US-006: Kelly Criterion with tests (~15 min, ~60 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_kelly_criterion(metrics: dict[str, float]) -> float
Test additions: test/test_trade_analyzer.py (modify)
Reference: Kelly = win_rate - (1 - win_rate) / payoff_ratio, where payoff_ratio = avg_win / |avg_loss| (equivalently: (win_rate × avg_win - (1 - win_rate) × |avg_loss|) / avg_win)
Target: ~10 lines code + ~50 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~50 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_kelly_criterion (will fail initially)
Write 3 test cases:
test_kelly_positive - win_rate=0.6, avg_win=20, avg_loss=-10 → kelly=0.4
test_kelly_negative - win_rate=0.3, avg_win=10, avg_loss=-15 → kelly<0
test_kelly_zero_loss - avg_loss=0 → raise ValueError
Run: pytest test/test_trade_analyzer.py::test_kelly -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 7 min, ~10 lines): Implement function
Add calculate_kelly_criterion() to src/trade_analyzer.py
Extract from metrics: win_rate, avg_win, avg_loss
Guard: if avg_loss == 0, raise ValueError("avg_loss cannot be zero")
Calculate: payoff_ratio = avg_win / abs(avg_loss), then kelly = win_rate - (1 - win_rate) / payoff_ratio
Return float
Run: pytest test/test_trade_analyzer.py::test_kelly -v
VERIFY: All 3 tests pass - GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break: Remove abs() around avg_loss
Run: pytest test/test_trade_analyzer.py::test_kelly -v
VERIFY: Tests fail on kelly calculation
Revert change
VERIFY: Tests pass again
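A sketch of the Kelly fraction f = p - q/b (p = win rate, q = 1 - p, b = payoff ratio), chosen to match the test expectations above (win_rate=0.6, avg_win=20, avg_loss=-10 → 0.4):

```python
def calculate_kelly_criterion(metrics: dict[str, float]) -> float:
    """PURE FUNCTION: Kelly fraction f = p - q/b, with payoff ratio b = avg_win/|avg_loss|."""
    if metrics["avg_loss"] == 0:
        raise ValueError("avg_loss cannot be zero")  # payoff ratio undefined
    b = metrics["avg_win"] / abs(metrics["avg_loss"])
    return metrics["win_rate"] - (1 - metrics["win_rate"]) / b
```

The raw value is returned unclamped, per the spec above; a negative Kelly signals an edge-less strategy rather than an error.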
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input metrics dict not mutated
Returns single float value or raises exception
✅ Deterministic: Same metrics → same Kelly
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT clamp Kelly to [0, 1] (return raw value)
Do NOT handle missing keys (let KeyError propagate)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 3 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_kelly_criterion(metrics: dict[str, float]) -> float:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import calculate_kelly_criterion; kelly = calculate_kelly_criterion({'win_rate': 0.6, 'avg_win': 20, 'avg_loss': -10}); print(f'Kelly: {kelly:.2%}')"
Expected: "Kelly: 40.00%"
Run: pytest test/test_trade_analyzer.py::test_kelly -v
Expected: 3 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-007: Sharpe ratio with tests (~15 min, ~60 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_sharpe_ratio(pnl_list: list[dict]) -> float
Test additions: test/test_trade_analyzer.py (modify)
Reference: Sharpe = mean(returns) / stdev(returns), assume risk-free rate = 0
Target: ~10 lines code + ~50 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~50 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_sharpe_ratio (will fail initially)
Write 3 test cases:
test_sharpe_positive - Consistent positive returns → sharpe > 0
test_sharpe_volatile - High volatility → low sharpe
test_sharpe_single_trade - stdev=0 → raise ValueError
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 7 min, ~10 lines): Implement function
Add calculate_sharpe_ratio() to src/trade_analyzer.py
Import statistics.mean, statistics.stdev
Extract P&L values: returns = [trade['pnl'] for trade in pnl_list]
Guard: if len(returns) < 2, raise ValueError("Need at least 2 trades")
Calculate: mean_return = mean(returns), std_return = stdev(returns)
Guard: if std_return == 0, raise ValueError("stdev cannot be zero")
Calculate: sharpe = mean_return / std_return
Return float
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
VERIFY: All 3 tests pass - GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break: Use population stdev (pstdev) instead of sample stdev
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
VERIFY: Tests fail on sharpe calculation
Revert change
VERIFY: Tests pass again
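The guard-then-divide structure above can be sketched as (note `statistics.stdev` is the sample standard deviation, which is what the break/fix step distinguishes from `pstdev`):

```python
from statistics import mean, stdev
from typing import Any

def calculate_sharpe_ratio(pnl_list: list[dict[str, Any]]) -> float:
    """PURE FUNCTION: per-trade Sharpe = mean(returns) / stdev(returns), risk-free rate 0."""
    returns = [t["pnl"] for t in pnl_list]
    if len(returns) < 2:
        raise ValueError("Need at least 2 trades")
    std_return = stdev(returns)  # sample stdev, not pstdev
    if std_return == 0:
        raise ValueError("stdev cannot be zero")
    return mean(returns) / std_return
```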
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input pnl_list not mutated
Returns single float value or raises exception
✅ Deterministic: Same P&L list → same Sharpe
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT annualize Sharpe (return raw value)
Do NOT use risk-free rate (assume 0)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 3 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_sharpe_ratio(pnl_list: list[dict]) -> float:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import calculate_sharpe_ratio; sharpe = calculate_sharpe_ratio([{'pnl': 10}, {'pnl': 20}, {'pnl': -5}]); print(f'Sharpe: {sharpe:.2f}')"
Expected: "Sharpe: N.NN"
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
Expected: 3 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-REVIEW-S2: Sprint 2 Review 🚧 GATE (~5 min)
Scope: US-005, US-006, US-007
Review Steps:
Run: pytest test/test_trade_analyzer.py -v --tb=short
Expected: 22+ tests passed (13 from S1 + 3+3+3 from S2)
Run: mypy src/trade_analyzer.py --strict
Expected: Success
Run: python -c "from src.trade_analyzer import calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio; print('imports ok')"
Expected: "imports ok"
Run: pytest test/test_trade_analyzer.py --cov=src.trade_analyzer --cov-report=term
Expected: Coverage > 90%
Linus 5-Layer Analysis (from linus-prompt-code-review.md):
Data Structure Analysis : "Bad programmers worry about code. Good programmers worry about data structures."
What is core data? Metrics dict → EV floats, P&L list → Sharpe float
Where does data flow? Metrics (from S1) → EV calculations → results
Any unnecessary copying or conversion? No - direct calculations on input
Special Case Identification : "Good code has no special cases"
Find all if/else branches: Zero checks (avg_loss, stdev), length checks (< 2 trades)
Which are real business logic? All are mathematical constraints (division by zero, sample size)
Can we redesign data structures to eliminate branches? No - mathematical requirements
Complexity Review : "If > 3 levels of indentation, redesign it"
What is the essence? Apply EV formulas to trade statistics
How many concepts does solution use? 3 statistical formulas (Standard EV, Kelly, Sharpe)
Can we reduce it to half? Already minimal - one function per formula
Destructive Analysis : "Never break userspace" - backward compatibility
List all existing functionality that might be affected: S1 functions unchanged
Which dependencies will be broken? None - additive changes only
How to improve without breaking anything? All new functions, S1 remains stable
Practicality Verification : "Theory and practice sometimes clash. Theory loses."
Does this problem really exist in production? Yes - need risk metrics for position sizing
How many users encounter this? Single user (manual trading decisions)
Does solution complexity match problem severity? Yes - standard financial metrics
Taste Score: Good taste / Acceptable / Garbage
Test File Checks (from linus-prompt-code-review.md):
Tests import from src.trade_analyzer (not inline)
No production logic in test file
Only test_* function names allowed
Functional Programming Verification:
Pure functions? All EV functions (calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio) have no side effects?
Deterministic? Same inputs → same outputs verified?
Immutable? No input mutation verified?
Docstrings? All include "PURE FUNCTION" marker?
Separation? Pure functions (calculations) maintained from S1?
Cross-Task Checks:
Verify patterns consistent across all EV functions (error handling, type hints)
Check no orphaned imports or dead code
Validate error handling is uniform (ValueError with descriptive messages)
Gate:
If issues found: Create fix tasks (US-005a, US-006a, etc.), output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-S2 complete", output <review-passed/>
Sprint 3: Output & Integration (~35 min)
Priority: HIGH
Purpose: Create main analysis function and console output
Status: NOT STARTED
US-008: Main analysis function with tests (~10 min, ~50 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: analyze_trades(filepath: str) -> dict[str, Any]
Test additions: test/test_trade_analyzer.py (modify)
Reference: Orchestrate all functions from S1+S2
Target: ~15 lines code + ~35 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~35 lines): Write tests first
Add to test/test_trade_analyzer.py
Import analyze_trades (will fail initially)
Write 2 test cases:
test_analyze_trades_full_pipeline - Use sample CSV, verify output keys
test_analyze_trades_structure - Verify weekly_data, overall_metrics, ev_metrics keys
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 5 min, ~15 lines): Implement function
Add analyze_trades() to src/trade_analyzer.py
Call: trades = load_trades(filepath)
Call: pnls = calculate_pnl(trades)
Call: weeks = aggregate_by_week(pnls)
Call: overall_metrics = calculate_metrics(pnls)
Call: standard_ev = calculate_standard_ev(overall_metrics)
Call: kelly = calculate_kelly_criterion(overall_metrics)
Call: sharpe = calculate_sharpe_ratio(pnls)
Build result dict: {"weekly_data": weeks, "overall_metrics": overall_metrics, "ev_metrics": {"standard_ev": standard_ev, "kelly": kelly, "sharpe": sharpe}}
Return dict
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
VERIFY: All 2 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Skip calculate_pnl call
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
VERIFY: Tests fail (missing P&L data)
Revert change
VERIFY: Tests pass again
Functional Programming Requirements (CRITICAL):
✅ Pure Function: NO - this is I/O layer (reads file)
File I/O is side effect (acceptable in shell layer)
All calculations delegate to pure functions
✅ Deterministic: Same file content → same output
✅ Immutable: Don't modify intermediate results
✅ Separation: I/O (file read) + orchestration, pure logic in S1+S2 functions
Do NOT add logging (caller handles output)
Do NOT filter or transform data (use raw results)
Do NOT catch exceptions (let them propagate)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 2 tests pass
VERIFY: Break/fix cycle works
Function signature: def analyze_trades(filepath: str) -> dict[str, Any]:
Run: python -c "from src.trade_analyzer import analyze_trades; result = analyze_trades('trade_snapshots.csv'); print(f'Analyzed {len(result[\"weekly_data\"])} weeks')"
Expected: "Analyzed N weeks"
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
Expected: 2 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-009: Console formatter with tests (~10 min, ~45 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: format_output(analysis: dict[str, Any]) -> str
Test additions: test/test_trade_analyzer.py (modify)
Reference: Format weekly data + metrics as readable console output
Target: ~15 lines code + ~30 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~30 lines): Write tests first
Add to test/test_trade_analyzer.py
Import format_output (will fail initially)
Write 2 test cases:
test_format_output_contains_headers - Verify "Week", "Trades", "P&L", "Win Rate", "EV"
test_format_output_contains_values - Verify actual values appear in output
Run: pytest test/test_trade_analyzer.py::test_format_output -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 5 min, ~15 lines): Implement function
Add format_output() to src/trade_analyzer.py
Build string with sections:
Header: "Trade Analysis Report"
Weekly section: Loop through weekly_data, format as table
Metrics section: Display overall_metrics (win_rate, avg_win, avg_loss, total_pnl)
EV section: Display ev_metrics (standard_ev, kelly, sharpe)
Use f-strings for formatting: {value:.2f}, {percent:.1%}
Return formatted string
Run: pytest test/test_trade_analyzer.py::test_format_output -v
VERIFY: All 2 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Remove weekly section
Run: pytest test/test_trade_analyzer.py::test_format_output -v
VERIFY: Tests fail (missing headers)
Revert change
VERIFY: Tests pass again
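One possible layout for the report string, with the exact column widths and separators left as implementation choices:

```python
from typing import Any

def format_output(analysis: dict[str, Any]) -> str:
    """PURE FUNCTION: render the analysis dict as a plain-text report."""
    m = analysis["overall_metrics"]
    ev = analysis["ev_metrics"]
    lines = ["Trade Analysis Report", "",
             f"{'Week':<10}{'Trades':>8}{'P&L':>10}{'Win Rate':>10}"]
    for week, pnls in analysis["weekly_data"].items():  # insertion order, no sorting
        total = sum(t["pnl"] for t in pnls)
        wins = sum(1 for t in pnls if t["pnl"] > 0)
        lines.append(f"{week:<10}{len(pnls):>8}{total:>10.2f}{wins / len(pnls):>10.1%}")
    lines.append("")
    lines.append(f"Win Rate: {m['win_rate']:.1%} | Avg Win: {m['avg_win']:.2f} | "
                 f"Avg Loss: {m['avg_loss']:.2f} | Total P&L: {m['total_pnl']:.2f}")
    lines.append(f"EV: {ev['standard_ev']:.2f} | Kelly: {ev['kelly']:.2%} | "
                 f"Sharpe: {ev['sharpe']:.2f}")
    return "\n".join(lines)  # caller prints; no side effects here
```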
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input analysis dict not mutated
Returns formatted string
✅ Deterministic: Same analysis → same output string
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT print (return string, caller prints)
Do NOT sort weeks (preserve order from aggregate_by_week)
Do NOT add colors/formatting codes (plain text only)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 2 tests pass
VERIFY: Break/fix cycle works
Function signature: def format_output(analysis: dict[str, Any]) -> str:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import analyze_trades, format_output; analysis = analyze_trades('trade_snapshots.csv'); output = format_output(analysis); print(output[:100])"
Expected: First 100 chars of report
Run: pytest test/test_trade_analyzer.py::test_format_output -v
Expected: 2 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
Check format_output's body contains no print call: sed -n '/def format_output/,/^def /p' src/trade_analyzer.py | grep "print(" → empty
US-010: Integration tests (~10 min, ~40 lines)
Implementation:
File: test/test_trade_analyzer_integration.py (create new)
Tests: End-to-end workflow with real CSV
Reference: trade_snapshots.csv
Target: ~40 lines test code
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~40 lines): Write tests first
Create test/test_trade_analyzer_integration.py
Import from src.trade_analyzer
Write 2 test cases:
test_full_workflow_real_csv - Load → analyze → format → verify output
test_all_ev_methods_computed - Verify standard_ev, kelly, sharpe all present
Run: pytest test/test_trade_analyzer_integration.py -v
VERIFY: Tests should PASS (all code exists from US-001 through US-009) - GREEN already
GREEN Phase (~50% - 5 min): Already implemented
All functions exist from previous user stories
Integration test verifies they work together
Run: pytest test/test_trade_analyzer_integration.py -v
VERIFY: Both tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Comment out one function (e.g., calculate_pnl)
Run: pytest test/test_trade_analyzer_integration.py -v
VERIFY: Tests fail (pipeline broken)
Revert change
VERIFY: Tests pass again
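The "no mocks" claim above can be illustrated with a toy pure pipeline. These are inline stand-ins only, not the real src.trade_analyzer functions; the actual integration test imports those:

```python
# Toy stand-ins for pipeline stages, purely to show why pure functions
# compose end-to-end without mocks: every stage is deterministic, so the
# whole chain can be asserted directly on real data.
def parse(rows: list[str]) -> list[float]:
    # stand-in for the CSV parsing step
    return [float(r) for r in rows]

def price_changes(values: list[float]) -> list[float]:
    # stand-in for calculate_pnl: difference between consecutive prices
    return [b - a for a, b in zip(values, values[1:])]

def total_pnl(changes: list[float]) -> float:
    # stand-in for the metrics step
    return sum(changes)

def test_full_workflow() -> None:
    rows = ["100.0", "105.0", "103.0"]
    # Compose the stages directly -- no patching, no fixtures with side effects.
    assert total_pnl(price_changes(parse(rows))) == 3.0

test_full_workflow()
```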
Functional Programming Requirements (CRITICAL):
✅ Integration tests verify pure function composition
✅ No mocks needed (pure functions compose naturally)
✅ Deterministic output verified end-to-end
Do NOT use fixtures with side effects
Do NOT mock pure functions (test real implementations)
Do NOT test implementation details (only inputs/outputs)
Acceptance Criteria:
Create test/test_trade_analyzer_integration.py
Import from src.trade_analyzer
2 test functions: test_full_workflow_real_csv, test_all_ev_methods_computed
Run: pytest test/test_trade_analyzer_integration.py -v
Expected: 2 passed
Run: pytest test/ -v
Expected: 28 tests total (26 unit + 2 integration), all pass
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-REVIEW-S3: Sprint 3 Review 🚧 GATE (~5 min)
Scope: US-008, US-009, US-010
Review Steps:
Run: pytest test/ -v --tb=short
Expected: 28 tests passed (22 from S1+S2 + 2+2+2 from S3)
Run: mypy src/trade_analyzer.py --strict
Expected: Success
Run: python -c "from src.trade_analyzer import analyze_trades, format_output; print('imports ok')"
Expected: "imports ok"
Run: pytest test/ --cov=src.trade_analyzer --cov-report=term-missing
Expected: Coverage > 90%
Linus 5-Layer Analysis (from linus-prompt-code-review.md):
Data Structure Analysis : "Bad programmers worry about code. Good programmers worry about data structures."
What is core data? Analysis result dict with weekly_data, overall_metrics, ev_metrics
Where does data flow? File → trades → P&L → aggregation → metrics → formatted output
Any unnecessary copying or conversion? No - pipeline flows naturally
Special Case Identification : "Good code has no special cases"
Find all if/else branches: None in orchestration layer (delegates to pure functions)
Which are real business logic? All logic in S1+S2 pure functions
Can we redesign data structures to eliminate branches? Already clean separation
Complexity Review : "If > 3 levels of indentation, redesign it"
What is the essence? Orchestrate pure functions into analysis pipeline
How many concepts does solution use? 2 (orchestration + formatting)
Can we reduce it to half? Already minimal - straight-line calls
Destructive Analysis : "Never break userspace" - backward compatibility
List all existing functionality that might be affected: S1+S2 functions unchanged
Which dependencies will be broken? None - additive changes only
How to improve without breaking anything? All new functions, previous sprints stable
Practicality Verification : "Theory and practice sometimes clash. Theory loses."
Does this problem really exist in production? Yes - need complete analysis tool
How many users encounter this? Single user (ad-hoc analysis)
Does solution complexity match problem severity? Yes - simple orchestration of pure functions
Taste Score: Good taste / Acceptable / Garbage
Test File Checks (from linus-prompt-code-review.md):
Tests import from src.trade_analyzer (not inline)
No production logic in test files
Only test_* function names allowed
Integration tests use real data (no mocks)
Functional Programming Verification:
Pure functions: is format_output free of side effects?
Deterministic: same inputs → same outputs, verified end-to-end?
Immutable: no input mutation in the orchestration layer?
Docstrings: do all pure functions include the "PURE FUNCTION" marker?
Separation: clear boundary between I/O (analyze_trades) and pure code (all calculations + format)?
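The determinism and immutability checks can be spot-checked mechanically. A toy example of the pattern (fmt is a stand-in; the real target is format_output):

```python
import copy

def fmt(metrics: dict[str, float]) -> str:
    """Stand-in pure formatter used only to demonstrate the checks."""
    return ", ".join(f"{k}={v:.2f}" for k, v in metrics.items())

metrics = {"win_rate": 0.60, "total_pnl": 12.50}
snapshot = copy.deepcopy(metrics)

out1 = fmt(metrics)
out2 = fmt(metrics)
assert out1 == out2         # deterministic: same input, same output
assert metrics == snapshot  # immutable: input not mutated
```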
Cross-Task Checks:
Verify patterns consistent across all sprint 3 functions
Check no orphaned imports or dead code
Validate error handling is uniform (propagate exceptions)
Integration tests verify full pipeline works
Gate:
If issues found: Create fix tasks (US-008a, US-009a, etc.), output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-S3 complete", output <review-passed/>
US-REVIEW-FINAL: Final Cross-Sprint Review 🚧 GATE (~10 min)
Scope: All sprints (US-001 through US-010)
Purpose: Verify cross-sprint consistency and overall quality after all individual sprint reviews have passed.
Review Steps:
Run: git log --oneline | head -20
Verify all sprint review commits exist (US-REVIEW-S1, US-REVIEW-S2, US-REVIEW-S3)
Run: pytest test/ -v
Expected: 26+ tests pass, 0 failures
Run: mypy src/ --strict
Expected: No errors
Run: pytest test/ --cov=src --cov-report=term-missing
Expected: Coverage >= 90%
Run: python -c "from src.trade_analyzer import analyze_trades, format_output; result = analyze_trades('trade_snapshots.csv'); print(format_output(result)[:200])"
Expected: Formatted report output (first 200 chars)
Cross-Sprint Consistency Checks:
Naming conventions consistent across all sprints?
All functions use snake_case
All test functions start with test_
Error handling patterns uniform?
Pure functions raise ValueError with descriptive messages
I/O functions propagate exceptions
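As an illustration of that error pattern (formula simplified, no annualization; not necessarily the US-007 implementation):

```python
import statistics

def calculate_sharpe_ratio(pnl_values: list[float]) -> float:
    """PURE FUNCTION: mean/stdev of per-trade P&L (illustrative sketch)."""
    if len(pnl_values) < 2:
        raise ValueError(f"need at least 2 P&L values, got {len(pnl_values)}")
    stdev = statistics.stdev(pnl_values)
    if stdev == 0:
        raise ValueError("zero variance: Sharpe ratio is undefined")
    return statistics.mean(pnl_values) / stdev
```

Callers let these exceptions propagate rather than catching them, matching the rule above that I/O functions do not swallow errors.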
No duplicate code between sprint modules?
All functions in single module (src/trade_analyzer.py)
No code duplication
Import structure clean (no circular dependencies)?
Single module, no internal imports
All TODOs resolved?
Run: grep -r "TODO" src/
Expected: Empty
Linus 5-Layer Analysis (Whole Feature):
Data Structure Analysis : Does data flow cleanly across all sprints?
CSV → list[dict] (S1) → list[pnl_dict] (S1) → dict[week, list] (S1) → metrics dict (S1) → EV floats (S2) → analysis dict (S3) → formatted string (S3)
Flow is linear, no backtracking, each transformation builds on previous
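A condensed sketch of that flow on in-memory data. These are stand-in implementations of two of the stages; key names, side handling, and signatures are assumptions from this plan:

```python
from collections import defaultdict
from datetime import datetime

def calculate_pnl(trades: list[dict]) -> list[dict]:
    """PURE FUNCTION (sketch): pair ENTRY/EXIT events into realized P&L."""
    open_entries: dict[int, dict] = {}
    realized = []
    for t in trades:
        if t["event"] == "ENTRY":
            open_entries[t["trade_id"]] = t
        else:  # EXIT: close against the matching entry
            entry = open_entries.pop(t["trade_id"])
            direction = 1 if entry["side"] == "LONG" else -1
            realized.append({
                "timestamp": t["timestamp"],
                "pnl": (t["price"] - entry["price"]) * direction,
            })
    return realized

def aggregate_by_week(records: list[dict]) -> dict[str, list[dict]]:
    """PURE FUNCTION (sketch): group records by ISO week (Monday start)."""
    weekly: dict[str, list[dict]] = defaultdict(list)
    for r in records:
        year, week, _ = r["timestamp"].isocalendar()
        weekly[f"{year}-W{week:02d}"].append(r)
    return dict(weekly)

trades = [
    {"trade_id": 1, "event": "ENTRY", "side": "LONG", "price": 100.0,
     "timestamp": datetime(2024, 1, 2)},
    {"trade_id": 1, "event": "EXIT", "side": "LONG", "price": 105.0,
     "timestamp": datetime(2024, 1, 3)},
]
weekly = aggregate_by_week(calculate_pnl(trades))
```

Each stage consumes the previous stage's output unchanged, which is what makes the pipeline linear with no backtracking.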
Special Case Identification : Any special cases that could be eliminated?
Zero checks in S2 (mathematical constraints, can't eliminate)
Side adjustment in S1 (domain logic, can't eliminate)
All special cases are legitimate business/math requirements
Complexity Review : Is overall architecture simple and elegant?
Yes - pure functional pipeline with clear separation
Functional Core (S1 calculations + S2 EV + S3 format) vs Imperative Shell (S3 I/O)
No deep nesting, straight-line logic
Destructive Analysis : Does complete feature break existing functionality?
No - new module, no existing dependencies
All functions composable, no breaking changes possible
Practicality Verification : Does complete solution match problem scope?
Yes - lightweight analysis tool for CSV data
No over-engineering (no database, no web UI, no async)
Appropriate for ad-hoc trading analysis
Taste Score: Good taste / Acceptable / Garbage
Functional Programming Verification (If Applicable):
Pure functions consistently used across sprints?
S1: calculate_pnl, aggregate_by_week, calculate_metrics (pure)
S2: calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio (pure)
S3: format_output (pure)
Clear separation: Functional Core vs Imperative Shell?
Core: All calculation functions (pure, no I/O)
Shell: load_trades (I/O), analyze_trades (orchestration + I/O)
No side effects leaked into calculation functions?
Verified via grep: no print, no logger, no mutations
All pure functions marked in docstrings?
Checklist: verify "PURE FUNCTION" in all S1/S2/S3 calculation function docstrings
Gate:
If issues found: Create fix tasks, output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-FINAL complete", output <review-passed/>
No database storage (CSV input only, in-memory processing)
No web UI or API (console output only)
No real-time data streaming (batch CSV analysis)
No email validation or market name validation (trust input data)
No multi-threaded processing (single-threaded analysis sufficient)
No position sizing recommendations (metrics only, not advice)
No charting or visualization (text output only)
No persistence of analysis results (ephemeral output)
Python 3.10+ required for dict[str, str | int] type hints
Dependencies: csv, datetime, statistics (stdlib only, no external packages)
File I/O limited to single CSV read (no writes)
Functional programming: Pure functions for all calculations (S1+S2+S3 core), I/O isolated to shell (S3 orchestration)
Test coverage target: 90%+
All pure functions documented with "PURE FUNCTION" marker in docstrings
All tests pass? pytest test/ -v exits 0
Typecheck clean? mypy src/ --strict exits 0
Coverage adequate? pytest test/ --cov=src --cov-fail-under=90 exits 0
No TODO comments left? grep -r "TODO" src/ returns empty
Integration test passes? python -c "from src.trade_analyzer import analyze_trades, format_output; print(format_output(analyze_trades('trade_snapshots.csv')))" runs without error
STOP if any check fails. Fix before proceeding.