
@williamp44
Last active February 15, 2026 05:01
Ralph Loop: Example PRD

Linus Torvalds Code Review: Trade Analyzer

"Bad programmers worry about the code. Good programmers worry about data structures."


Taste Score: Acceptable (but with one piece of garbage hiding in plain sight)

The overall architecture is clean - pure functional pipeline, good I/O separation, sensible decomposition. But I found issues that range from "annoying" to "this will bite you."


Layer 1: Data Structure Analysis

The Core Problem: dict[str, Any] Everywhere

You pass dict[str, Any] through the entire pipeline. No TypedDict, no dataclass, no NamedTuple. This means:

  • Zero compile-time key validation. Misspell 'entry_price' as 'enrty_price' and you get a runtime KeyError
  • No IDE autocompletion on any of these dicts
  • mypy --strict passes but catches nothing about your actual data shape

This is the kind of thing where mypy gives you a false sense of security. You have type safety on the container (dict) but zero safety on the content.

Verdict: Not a showstopper for a single-user analysis script, but it's a design smell. TypedDict would cost you 10 lines and buy you real compile-time guarantees.
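
For scale, here is a sketch of what those 10 lines could look like. The field names are assumptions drawn from the PRD's CSV columns, not the actual code:

```python
from datetime import datetime
from typing import TypedDict


class Trade(TypedDict):
    """One parsed CSV row. Field names assumed from the PRD's column list."""
    timestamp: datetime
    market: str
    event: str    # 'ENTRY' or 'EXIT'
    side: str     # 'YES' or 'NO'
    fill_price: float
    fill_size: float


def notional(trade: Trade) -> float:
    # Misspelling 'fill_price' here becomes a mypy error, not a runtime KeyError.
    return trade['fill_price'] * trade['fill_size'] / 100
```

TypedDict is purely a static construct: at runtime these are still plain dicts, so nothing in the pipeline has to change.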


Layer 2: Special Case Identification - FATAL ISSUE

calculate_pnl() at src/trade_analyzer.py:82-113 - This is the worst function in the codebase.

for market, market_trades in groupby(
    sorted(trades, key=lambda t: t['market']), key=lambda t: t['market']
):
    entries = [t for t in market_trades if t['event'] == 'ENTRY']

    # Re-fetch market trades for exits (groupby iterator is consumed)
    all_market_trades = [t for t in trades if t['market'] == market]
    exits = [t for t in all_market_trades if t['event'] == 'EXIT']

That comment on line 88 - "Re-fetch market trades for exits (groupby iterator is consumed)" - is a confession, not documentation. The developer knew the groupby iterator was exhausted after building entries, and instead of fixing the data structure, they patched around it with an O(n) full rescan of all trades for every market group.

This is O(n*m) where m = number of markets. A defaultdict(list) gives you O(n) in a single pass with zero special cases:

from collections import defaultdict

by_market: defaultdict[str, list] = defaultdict(list)
for t in trades:
    by_market[t['market']].append(t)
for market, market_trades in by_market.items():
    entries = [t for t in market_trades if t['event'] == 'ENTRY']
    exits = [t for t in market_trades if t['event'] == 'EXIT']
    # ... pair them

No iterator exhaustion. No re-scan. No comment needed to explain why you're doing something stupid.

This is a textbook example of patching bad design instead of fixing it.


Layer 3: Complexity Review

The good news: Most functions are short, single-purpose, max 2 levels of indentation. format_output, calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio - these are all clean, minimal, good taste.

load_trades, lines 42-47: The numeric_fields list is recreated inside the loop on every row. Hoist it to module scope:

NUMERIC_FIELDS = [
    'fill_price', 'fill_size', 'btc_price', ...
]
# then in the loop:
for field in NUMERIC_FIELDS:

Minor, but it shows carelessness about what runs per-iteration vs. once.


Layer 4: Destructive Analysis - Missing Side Adjustment

The PRD at line 150 says:

"Side adjustment: YES = normal, NO = inverted (100 - price)"

The implementation completely ignores this. calculate_pnl does (exit_price - entry_price) * size / 100 regardless of whether side is YES or NO. The tests only test YES-side trades.

If this is a binary options platform (which the YES/NO nomenclature strongly suggests), a NO-side trade pays out (100 - price). Calculating P&L without this adjustment means your P&L numbers for NO-side trades are wrong.

Either:

  1. The PRD requirement is real and the code is buggy, or
  2. The PRD requirement is wrong and should be removed

Either way, there's a discrepancy between spec and implementation that nobody caught because the tests don't exercise it.
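
If the requirement is real, the fix is small. On a binary market a NO contract at price p is economically a YES contract at 100 - p, so normalizing both legs up front keeps the P&L arithmetic branch-free. A sketch under that assumption (function names here are illustrative, not from the codebase):

```python
def effective_price(price: float, side: str) -> float:
    """Normalize a quoted price to the YES frame.

    Assumes the PRD convention: YES = price as quoted, NO = 100 - price.
    """
    return price if side == 'YES' else 100.0 - price


def trade_pnl(entry_price: float, exit_price: float,
              size: float, side: str) -> float:
    # After normalization, the P&L formula has no YES/NO special case.
    entry = effective_price(entry_price, side)
    exit_ = effective_price(exit_price, side)
    return (exit_ - entry) * size / 100  # cents to dollars
```

With this, a YES trade ENTRY@21 EXIT@56 on size 100 yields +35, while the identical NO trade yields -35, which is exactly the case the current tests never exercise.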


Layer 5: Practicality Verification

Test Quality Issues:

  1. test_load_trades_row_count (line 45-51): Loads the CSV twice and checks len(trades) == len(trades2). A parser that always returns [] passes this test. This test verifies determinism, not correctness. It should assert against a known count.

  2. Uncovered lines 96, 160, 251 - All three are error/edge-case paths:

    • Line 96: break when more exits than entries (untested)
    • Line 160: ZeroDivisionError for empty pnl_list (untested)
    • Line 251: ValueError for zero stdev (untested)

    97% coverage sounds great, but the 3% you missed is all edge-case handling - the exact paths where bugs hide.

  3. Kelly test docstring lies (line 376): test_kelly_positive says -> kelly=0.4 but the actual assertion is assert kelly == pytest.approx(-0.2). The comment in the test body explains the confusion, but the docstring still says 0.4. This is sloppy.
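
The uncovered error paths are cheap to exercise, and test_load_trades_row_count should pin the fixture's known row count instead of comparing two loads. A sketch of the missing edge-case tests — the inline stand-ins exist only so the snippet runs standalone; the real tests would import from src.trade_analyzer and use pytest.raises:

```python
import statistics


# Stand-ins (assumptions, not the real module) mirroring the PRD's error paths:
def calculate_metrics(pnl_list):
    wins = [t['pnl'] for t in pnl_list if t['pnl'] > 0]
    return {'win_rate': len(wins) / len(pnl_list)}  # [] raises ZeroDivisionError


def calculate_sharpe_ratio(pnl_list):
    returns = [t['pnl'] for t in pnl_list]
    std = statistics.stdev(returns)
    if std == 0:
        raise ValueError("stdev cannot be zero")
    return statistics.mean(returns) / std


def assert_raises(exc, fn, *args):
    """Tiny helper standing in for pytest.raises."""
    try:
        fn(*args)
    except exc:
        return
    raise AssertionError(f"{fn.__name__} did not raise {exc.__name__}")


# The two untested edge paths flagged by the coverage report:
assert_raises(ZeroDivisionError, calculate_metrics, [])
assert_raises(ValueError, calculate_sharpe_ratio, [{'pnl': 5.0}, {'pnl': 5.0}])
```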


Key Insights

  • Data Structure: dict[str, Any] provides zero type safety on data shape. TypedDict is the obvious fix.
  • Complexity: calculate_pnl uses groupby + re-scan instead of a simple dict. Remove groupby entirely.
  • Risk Points: Side adjustment (YES/NO) from the PRD is unimplemented. P&L for NO trades may be wrong.
  • Test Gaps: Edge cases (empty list, excess exits, zero stdev) are untested. One test docstring is misleading.

Linus-style Verdict

Worth doing? Yes. The architecture is fundamentally sound - pure functional core, I/O at the boundary, linear pipeline. The person who wrote this understands separation of concerns.

But: The calculate_pnl function needs to be rewritten. Not refactored - rewritten. Drop groupby, use a simple dict, eliminate the re-scan hack. And figure out whether the side-adjustment requirement from the PRD is real. If it is, this is a correctness bug, not a style issue.

The rest - TypedDicts, constant extraction, test hardening - those are "do it when you touch the file" improvements. The groupby mess and the side-adjustment gap are "fix it before you trust the output" issues.

Just for grins, I ran the steps to build the example PRD, then did a code review of what was built using this prompt:

use @linus-prompt-code-review.md to review the code build per
progress_trade_analyzer.txt
PRD_TRADE_ANALYZER.md

PRD: Trade Analyzer

Introduction

Analyze trading snapshots from CSV to calculate P&L, win rate, and expected value metrics (Standard EV, Kelly Criterion, Sharpe Ratio) aggregated by ISO week. Pure functional core for calculations with console output.

Goals

  • Load trade snapshots from CSV with type validation
  • Calculate realized P&L by pairing ENTRY/EXIT events
  • Aggregate trades by ISO week (Monday start)
  • Calculate win rate, average win/loss, and risk metrics
  • Output weekly analysis to console in readable format

Task Summary

Total Tasks: 14 (10 implementation + 4 reviews)
Estimated Time: ~155 min (~2.6 hours)
Progress: 0/14 complete (0%)
Status: NOT STARTED
Next Task: US-001

Task Dependencies

graph TD
    US001[US-001: CSV loader] --> US002[US-002: P&L calc]
    US001 --> US003[US-003: Weekly agg]
    US001 --> US004[US-004: Win rate]
    US002 --> REVIEW1[🚧 US-REVIEW-S1]
    US003 --> REVIEW1
    US004 --> REVIEW1
    REVIEW1 --> US005[US-005: Standard EV]
    REVIEW1 --> US006[US-006: Kelly]
    REVIEW1 --> US007[US-007: Sharpe]
    US005 --> REVIEW2[🚧 US-REVIEW-S2]
    US006 --> REVIEW2
    US007 --> REVIEW2
    REVIEW2 --> US008[US-008: Main analysis]
    REVIEW2 --> US009[US-009: Console formatter]
    REVIEW2 --> US010[US-010: Integration tests]
    US008 --> REVIEW3[🚧 US-REVIEW-S3]
    US009 --> REVIEW3
    US010 --> REVIEW3
    REVIEW3 --> REVIEWF[🚧 US-REVIEW-FINAL]

Sprint 1: Data Loading & P&L Calculation (~60 min)

Priority: HIGH
Purpose: Load CSV data and calculate realized P&L per trade
Status: NOT STARTED

  • US-001 CSV loader with tests (~15 min, ~80 lines)
  • US-002 P&L calculator with tests (~15 min, ~85 lines) [depends: US-001]
  • US-003 Weekly aggregator with tests (~15 min, ~90 lines) [depends: US-001]
  • US-004 Win rate calculator with tests (~10 min, ~45 lines) [depends: US-002]
  • US-REVIEW-S1 Sprint 1 Review 🚧 GATE (~5 min) [depends: US-002, US-003, US-004]

US-001: CSV loader with tests (~15 min, ~80 lines)

Implementation:

  • File: src/trade_analyzer.py (create new)
  • Function: load_trades(filepath: str) -> list[dict[str, Any]]
  • Test File: test/test_trade_analyzer.py (create new)
  • Reference: trade_snapshots.csv (place CSV in project root)
  • Target: ~20 lines code + ~60 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 5 min, ~60 lines): Write tests first

    • Create test/test_trade_analyzer.py
    • Import from src.trade_analyzer (will fail - module doesn't exist yet)
    • Write 3 test cases:
      • test_load_trades_valid_csv - Verify loads sample CSV
      • test_load_trades_converts_types - Check timestamp→datetime, prices→float
      • test_load_trades_row_count - Verify row count matches file
    • Run: pytest test/test_trade_analyzer.py -v
    • VERIFY: Tests fail with ImportError (function doesn't exist) - this is RED phase
  2. GREEN Phase (~50% - 7 min, ~20 lines): Implement function

    • Create src/trade_analyzer.py
    • Use csv.DictReader to read CSV
    • Parse timestamp with datetime.fromisoformat()
    • Convert numeric fields: fill_price, fill_size, btc_price, deviation, book_bid, book_ask, book_depth
    • Convert asks/bids fields: asks_10, asks_20, asks_30, bids_60, bids_70, bids_80
    • Return list of dicts with converted types
    • Run: pytest test/test_trade_analyzer.py -v
    • VERIFY: All 3 tests pass - this is GREEN phase
  3. VERIFY Phase (~20% - 3 min): Break/fix verification

    • Temporarily break code: Comment out timestamp parsing
    • Run: pytest test/test_trade_analyzer.py -v
    • VERIFY: Test fails on datetime conversion
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects in parsing logic
    • File I/O happens in I/O layer (load_trades does file read)
    • Parsing logic extracted to pure functions
  • Deterministic: Same CSV content → same output always
  • Immutable: Don't modify input data
  • Referential Transparency: Can cache results
  • Do NOT log inside parsing functions
  • Do NOT validate market names (out of scope)
  • Do NOT handle malformed CSV (let csv.DictReader raise)

Acceptance Criteria:

  • RED: Tests written and verified failing (ImportError)
  • GREEN: Tests pass after implementation
  • VERIFY: Break/fix cycle confirms tests catch bugs
  • Create src/trade_analyzer.py
  • Function signature: def load_trades(filepath: str) -> list[dict[str, Any]]:
  • Run: pytest test/test_trade_analyzer.py::test_load_trades_valid_csv -v
  • Expected: Pass, returns list of dicts
  • Run: python -c "from src.trade_analyzer import load_trades; trades = load_trades('trade_snapshots.csv'); print(f'Loaded {len(trades)} trades')"
  • Expected: "Loaded N trades"
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0

US-002: P&L calculator with tests (~15 min, ~85 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: calculate_pnl(trades: list[dict]) -> list[dict]
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: Pair ENTRY/EXIT events by market, calculate P&L = (exit_price - entry_price) * size
  • Target: ~25 lines code + ~60 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 5 min, ~60 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import calculate_pnl (will fail initially)
    • Write 4 test cases:
      • test_calculate_pnl_winning_trade - YES ENTRY@21 EXIT@56 = +$35 profit
      • test_calculate_pnl_losing_trade - YES ENTRY@56 EXIT@21 = -$35 loss
      • test_calculate_pnl_multiple_markets - Verify independent P&L per market
      • test_calculate_pnl_unpaired_entry - ENTRY without EXIT = None (ignore)
    • Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
    • VERIFY: Tests fail (function doesn't exist or not imported) - RED phase
  2. GREEN Phase (~50% - 8 min, ~25 lines): Implement function

    • Add calculate_pnl() to src/trade_analyzer.py
    • Group trades by market field
    • For each market, find ENTRY/EXIT pairs (chronological order)
    • Calculate P&L: (exit_fill_price - entry_fill_price) * fill_size / 100 (cents to dollars)
    • Side adjustment: YES = normal, NO = inverted (100 - price)
    • Return list with: {market, entry_time, exit_time, side, entry_price, exit_price, size, pnl}
    • Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
    • VERIFY: All 4 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 2 min): Break/fix verification

    • Temporarily break: Remove side adjustment logic
    • Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
    • VERIFY: Tests fail on NO side calculations
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input trades list not mutated
    • Returns new list of P&L dicts
  • Deterministic: Same trades → same P&L
  • Immutable: Treat input as read-only
  • Referential Transparency: Can replace call with result
  • Do NOT log inside calculation
  • Do NOT modify trades list
  • Do NOT handle partial fills (assume all fills complete)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 4 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def calculate_pnl(trades: list[dict]) -> list[dict]:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); print(f'{len(pnls)} completed trades')"
  • Expected: "N completed trades"
  • Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
  • Expected: 4 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0
  • Functional Programming: Verify no side effects:
    • grep -n "logger\." src/trade_analyzer.py | grep calculate_pnl → empty
    • grep -n "print" src/trade_analyzer.py | grep calculate_pnl → empty
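
The GREEN-phase steps above can be sketched like this, including the side adjustment. It is an illustration under stated assumptions: trades arrive in chronological order, and the entry_time/exit_time fields are omitted from the result dicts for brevity:

```python
from collections import defaultdict
from typing import Any


def calculate_pnl(trades: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """PURE FUNCTION: pair ENTRY/EXIT events per market and compute realized P&L."""
    by_market: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for t in trades:  # single pass, no re-scan
        by_market[t['market']].append(t)

    results: list[dict[str, Any]] = []
    for market, market_trades in by_market.items():
        entries = [t for t in market_trades if t['event'] == 'ENTRY']
        exits = [t for t in market_trades if t['event'] == 'EXIT']
        for entry, exit_ in zip(entries, exits):  # unpaired entries are ignored
            delta = exit_['fill_price'] - entry['fill_price']
            if entry['side'] == 'NO':
                delta = -delta  # NO pays out at 100 - price: a sign flip on delta
            results.append({
                'market': market,
                'side': entry['side'],
                'entry_price': entry['fill_price'],
                'exit_price': exit_['fill_price'],
                'size': entry['fill_size'],
                'pnl': delta * entry['fill_size'] / 100,  # cents to dollars
            })
    return results
```

Note that since (100 - exit) - (100 - entry) = -(exit - entry), the NO-side inversion reduces to negating the price delta.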

US-003: Weekly aggregator with tests (~15 min, ~90 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: aggregate_by_week(pnl_list: list[dict]) -> dict[str, list[dict]]
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: ISO week format (Monday start)
  • Target: ~20 lines code + ~70 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 5 min, ~70 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import aggregate_by_week (will fail initially)
    • Write 4 test cases:
      • test_aggregate_by_week_single_week - All trades in one week
      • test_aggregate_by_week_multiple_weeks - Trades span 3 weeks
      • test_aggregate_by_week_iso_format - Verify "2025-W01" format
      • test_aggregate_by_week_monday_boundary - Monday trade starts new week
    • Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 8 min, ~20 lines): Implement function

    • Add aggregate_by_week() to src/trade_analyzer.py
    • Use entry_time.isocalendar() to get ISO week
    • Format as "YYYY-WNN" (e.g., "2025-W01")
    • Group P&L dicts by week string
    • Return dict: {"2025-W01": [pnl1, pnl2], "2025-W02": [pnl3]}
    • Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
    • VERIFY: All 4 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 2 min): Break/fix verification

    • Temporarily break: Use Sunday week start instead of Monday
    • Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
    • VERIFY: test_aggregate_by_week_monday_boundary fails
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input pnl_list not mutated
    • Returns new dict structure
  • Deterministic: Same P&L list → same weekly grouping
  • Immutable: Treat input as read-only
  • Referential Transparency: Can cache results
  • Do NOT log inside aggregation
  • Do NOT sort weeks (dict preserves insertion order in Python 3.7+)
  • Do NOT filter weeks (include all)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 4 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def aggregate_by_week(pnl_list: list[dict]) -> dict[str, list[dict]]:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, aggregate_by_week; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); weeks = aggregate_by_week(pnls); print(f'{len(weeks)} weeks')"
  • Expected: "N weeks"
  • Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
  • Expected: 4 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0
  • Functional Programming: Verify no side effects:
    • grep -n "pnl_list\.append\|pnl_list\.sort" src/trade_analyzer.py → empty (no mutation)
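
A minimal sketch of the GREEN-phase steps above. isocalendar() already defines weeks as Monday-start, which is exactly the PRD's boundary requirement:

```python
from datetime import datetime
from typing import Any


def aggregate_by_week(pnl_list: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """PURE FUNCTION: group P&L dicts by ISO week of entry_time (Monday start)."""
    weeks: dict[str, list[dict[str, Any]]] = {}
    for pnl in pnl_list:
        iso = pnl['entry_time'].isocalendar()
        key = f"{iso.year}-W{iso.week:02d}"  # e.g. "2025-W01"
        weeks.setdefault(key, []).append(pnl)
    return weeks
```

Using the ISO year (iso.year), not the calendar year, matters at year boundaries: a trade on a late-December Monday can belong to week 1 of the following ISO year.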

US-004: Win rate calculator with tests (~10 min, ~45 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: calculate_metrics(pnl_list: list[dict]) -> dict[str, float]
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: Win rate = wins / total, avg_win = mean(wins), avg_loss = mean(losses)
  • Target: ~15 lines code + ~30 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 3 min, ~30 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import calculate_metrics (will fail initially)
    • Write 2 test cases:
      • test_calculate_metrics_mixed_trades - 3 wins (+10, +20, +5), 2 losses (-15, -5) → win_rate=0.6
      • test_calculate_metrics_all_wins - 5 wins → win_rate=1.0
    • Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 5 min, ~15 lines): Implement function

    • Add calculate_metrics() to src/trade_analyzer.py
    • Separate P&L list into wins (pnl > 0) and losses (pnl <= 0)
    • Calculate: win_rate = len(wins) / len(pnl_list)
    • Calculate: avg_win = sum(wins) / len(wins) if wins else 0
    • Calculate: avg_loss = sum(losses) / len(losses) if losses else 0
    • Return: {"win_rate": float, "avg_win": float, "avg_loss": float, "total_pnl": float}
    • Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
    • VERIFY: All 2 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 2 min): Break/fix verification

    • Temporarily break: Count breakeven (pnl=0) as win
    • Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
    • VERIFY: Tests fail on win rate calculation
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input pnl_list not mutated
    • Returns new dict with metrics
  • Deterministic: Same P&L list → same metrics
  • Immutable: Treat input as read-only
  • Referential Transparency: Can replace call with result
  • Do NOT log inside calculation
  • Do NOT round values (caller handles formatting)
  • Do NOT handle empty list (let division raise ZeroDivisionError)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 2 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def calculate_metrics(pnl_list: list[dict]) -> dict[str, float]:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, calculate_metrics; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); metrics = calculate_metrics(pnls); print(f'Win rate: {metrics[\"win_rate\"]:.1%}')"
  • Expected: "Win rate: N.N%"
  • Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
  • Expected: 2 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0
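
The GREEN-phase steps above amount to a few lines. A sketch, following the PRD's conventions (breakeven counts as a loss; an empty list is allowed to raise ZeroDivisionError):

```python
from typing import Any


def calculate_metrics(pnl_list: list[dict[str, Any]]) -> dict[str, float]:
    """PURE FUNCTION: win rate, average win/loss, and total P&L."""
    pnls = [t['pnl'] for t in pnl_list]
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p <= 0]  # breakeven (0) counts as a loss
    return {
        'win_rate': len(wins) / len(pnls),  # [] raises ZeroDivisionError, per spec
        'avg_win': sum(wins) / len(wins) if wins else 0.0,
        'avg_loss': sum(losses) / len(losses) if losses else 0.0,
        'total_pnl': sum(pnls),
    }
```

For the test_calculate_metrics_mixed_trades example (+10, +20, +5, -15, -5) this yields win_rate 0.6 and avg_loss -10.0.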

US-REVIEW-S1: Sprint 1 Review 🚧 GATE (~5 min)

Scope: US-001, US-002, US-003, US-004

Review Steps:

  • Run: pytest test/test_trade_analyzer.py -v --tb=short
  • Expected: 13+ tests passed (3+4+4+2)
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: Success
  • Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, aggregate_by_week, calculate_metrics; print('imports ok')"
  • Expected: "imports ok"
  • Run: pytest test/test_trade_analyzer.py --cov=src.trade_analyzer --cov-report=term
  • Expected: Coverage > 90%

Linus 5-Layer Analysis (from linus-prompt-code-review.md):

  1. Data Structure Analysis: "Bad programmers worry about code. Good programmers worry about data structures."

    • What is core data? CSV rows → dicts → P&L dicts → weekly aggregation
    • Where does data flow? File → list[dict] → list[pnl_dict] → dict[week, list[pnl_dict]]
    • Any unnecessary copying or conversion? Single-pass where possible, transformations create new structures (immutable)
  2. Special Case Identification: "Good code has no special cases"

    • Find all if/else branches: Empty list checks, win/loss separation, side adjustment (YES/NO)
    • Which are real business logic? Side adjustment (YES/NO inversion), win/loss classification
    • Can we redesign data structures to eliminate branches? Side adjustment is domain logic (can't eliminate)
  3. Complexity Review: "If > 3 levels of indentation, redesign it"

    • What is the essence? Transform CSV rows → P&L calculations → weekly metrics
    • How many concepts does solution use? 4 transformations (load → calculate → aggregate → metrics)
    • Can we reduce it to half? Already minimal - each function has single responsibility
  4. Destructive Analysis: "Never break userspace" - backward compatibility

    • List all existing functionality that might be affected: None (new module)
    • Which dependencies will be broken? None
    • How to improve without breaking anything? N/A - greenfield
  5. Practicality Verification: "Theory and practice sometimes clash. Theory loses."

    • Does this problem really exist in production? Yes - need to analyze trade performance
    • How many users encounter this? Single user (manual trading analysis)
    • Does solution complexity match problem severity? Yes - straightforward CSV analysis with pure functions

Taste Score: Good taste / Acceptable / Garbage

Test File Checks (from linus-prompt-code-review.md):

  • Tests import from src.trade_analyzer (not inline)
  • No production logic in test file
  • Only test_* function names allowed

Functional Programming Verification:

  • Pure functions? All calculation functions (calculate_pnl, aggregate_by_week, calculate_metrics) have no side effects?
  • Deterministic? Same inputs → same outputs verified?
  • Immutable? No input mutation verified?
  • Docstrings? All include "PURE FUNCTION" marker?
  • Separation? Pure functions (calculations) vs I/O (load_trades)?

Cross-Task Checks:

  • Verify patterns consistent across all functions (type hints, docstrings)
  • Check no orphaned imports or dead code
  • Validate error handling is uniform (let exceptions propagate)

Gate:

  • If issues found: Create fix tasks (US-001a, US-002a, etc.), output <review-issues-found/>
  • If clean: Mark [x], commit "docs: US-REVIEW-S1 complete", output <review-passed/>

Sprint 2: Expected Value Calculations (~50 min)

Priority: HIGH
Purpose: Calculate EV metrics (Standard, Kelly, Sharpe)
Status: NOT STARTED

  • US-005 Standard EV with tests (~15 min, ~60 lines) [depends: US-004]
  • US-006 Kelly Criterion with tests (~15 min, ~60 lines) [depends: US-004]
  • US-007 Sharpe ratio with tests (~15 min, ~60 lines) [depends: US-002]
  • US-REVIEW-S2 Sprint 2 Review 🚧 GATE (~5 min) [depends: US-005, US-006, US-007]

US-005: Standard EV with tests (~15 min, ~60 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: calculate_standard_ev(metrics: dict[str, float]) -> float
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: EV = (win_rate × avg_win) + ((1 - win_rate) × avg_loss)
  • Target: ~10 lines code + ~50 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 5 min, ~50 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import calculate_standard_ev (will fail initially)
    • Write 3 test cases:
      • test_standard_ev_positive - win_rate=0.6, avg_win=20, avg_loss=-10 → EV=8.0
      • test_standard_ev_negative - win_rate=0.4, avg_win=10, avg_loss=-15 → EV=-5.0
      • test_standard_ev_breakeven - win_rate=0.5, avg_win=10, avg_loss=-10 → EV=0.0
    • Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 7 min, ~10 lines): Implement function

    • Add calculate_standard_ev() to src/trade_analyzer.py
    • Extract from metrics: win_rate, avg_win, avg_loss
    • Calculate: ev = (win_rate * avg_win) + ((1 - win_rate) * avg_loss)
    • Return float
    • Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
    • VERIFY: All 3 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 3 min): Break/fix verification

    • Temporarily break: Forget to multiply by (1 - win_rate)
    • Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
    • VERIFY: Tests fail on EV calculation
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input metrics dict not mutated
    • Returns single float value
  • Deterministic: Same metrics → same EV
  • Immutable: Treat input as read-only
  • Referential Transparency: Can replace call with result
  • Do NOT log inside calculation
  • Do NOT validate metrics keys (assume caller provides correct format)
  • Do NOT handle divide-by-zero (assume metrics valid)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 3 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def calculate_standard_ev(metrics: dict[str, float]) -> float:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import calculate_standard_ev; ev = calculate_standard_ev({'win_rate': 0.6, 'avg_win': 20, 'avg_loss': -10}); print(f'EV: ${ev:.2f}')"
  • Expected: "EV: $8.00"
  • Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
  • Expected: 3 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0
  • Functional Programming: Verify no side effects:
    • grep -n "metrics\[" src/trade_analyzer.py | grep "=" → only reads, no writes
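
The whole function is the one-line formula from the Reference above. A sketch, noting that avg_loss is expected to be negative, so the second term subtracts:

```python
def calculate_standard_ev(metrics: dict[str, float]) -> float:
    """PURE FUNCTION: EV = win_rate * avg_win + (1 - win_rate) * avg_loss."""
    return (metrics['win_rate'] * metrics['avg_win']
            + (1 - metrics['win_rate']) * metrics['avg_loss'])
```

Worked example from test_standard_ev_positive: 0.6 * 20 + 0.4 * (-10) = 12 - 4 = 8.0.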

US-006: Kelly Criterion with tests (~15 min, ~60 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: calculate_kelly_criterion(metrics: dict[str, float]) -> float
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: Kelly = ((win_rate × |avg_loss|) - ((1 - win_rate) × avg_win)) / |avg_loss|
  • Target: ~10 lines code + ~50 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 5 min, ~50 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import calculate_kelly_criterion (will fail initially)
    • Write 3 test cases:
      • test_kelly_positive - win_rate=0.6, avg_win=20, avg_loss=-10 → kelly=0.4
      • test_kelly_negative - win_rate=0.3, avg_win=10, avg_loss=-15 → kelly<0
      • test_kelly_zero_loss - avg_loss=0 → raise ValueError
    • Run: pytest test/test_trade_analyzer.py::test_kelly -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 7 min, ~10 lines): Implement function

    • Add calculate_kelly_criterion() to src/trade_analyzer.py
    • Extract from metrics: win_rate, avg_win, avg_loss
    • Guard: if avg_loss == 0, raise ValueError("avg_loss cannot be zero")
    • Calculate: kelly = ((win_rate * abs(avg_loss)) - ((1 - win_rate) * avg_win)) / abs(avg_loss)
    • Return float
    • Run: pytest test/test_trade_analyzer.py::test_kelly -v
    • VERIFY: All 3 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 3 min): Break/fix verification

    • Temporarily break: Remove abs() around avg_loss
    • Run: pytest test/test_trade_analyzer.py::test_kelly -v
    • VERIFY: Tests fail on kelly calculation
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input metrics dict not mutated
    • Returns single float value or raises exception
  • Deterministic: Same metrics → same Kelly
  • Immutable: Treat input as read-only
  • Referential Transparency: Can replace call with result
  • Do NOT log inside calculation
  • Do NOT clamp Kelly to [0, 1] (return raw value)
  • Do NOT handle missing keys (let KeyError propagate)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 3 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def calculate_kelly_criterion(metrics: dict[str, float]) -> float:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import calculate_kelly_criterion; kelly = calculate_kelly_criterion({'win_rate': 0.6, 'avg_win': 20, 'avg_loss': -10}); print(f'Kelly: {kelly:.2%}')"
  • Expected: "Kelly: 40.00%"
  • Run: pytest test/test_trade_analyzer.py::test_kelly -v
  • Expected: 3 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0
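
A sketch of the GREEN-phase formula exactly as specified above. Worth noting: for the test_kelly_positive inputs (win_rate=0.6, avg_win=20, avg_loss=-10) this formula yields (6 - 8) / 10 = -0.2, not the 0.4 the RED phase expects (the textbook Kelly fraction p - q/b would give 0.4) — this is the docstring/assertion mismatch the review at the top of this document calls out:

```python
def calculate_kelly_criterion(metrics: dict[str, float]) -> float:
    """PURE FUNCTION: Kelly fraction per the PRD's GREEN-phase formula."""
    win_rate = metrics['win_rate']
    avg_win = metrics['avg_win']
    avg_loss = metrics['avg_loss']
    if avg_loss == 0:
        raise ValueError("avg_loss cannot be zero")
    return ((win_rate * abs(avg_loss)) - ((1 - win_rate) * avg_win)) / abs(avg_loss)
```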

US-007: Sharpe ratio with tests (~15 min, ~60 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: calculate_sharpe_ratio(pnl_list: list[dict]) -> float
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: Sharpe = mean(returns) / stdev(returns), assume risk-free rate = 0
  • Target: ~10 lines code + ~50 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 5 min, ~50 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import calculate_sharpe_ratio (will fail initially)
    • Write 3 test cases:
      • test_sharpe_positive - Consistent positive returns → sharpe > 0
      • test_sharpe_volatile - High volatility → low sharpe
      • test_sharpe_single_trade - stdev=0 → raise ValueError
    • Run: pytest test/test_trade_analyzer.py::test_sharpe -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 7 min, ~10 lines): Implement function

    • Add calculate_sharpe_ratio() to src/trade_analyzer.py
    • Import statistics.mean, statistics.stdev
    • Extract P&L values: returns = [trade['pnl'] for trade in pnl_list]
    • Guard: if len(returns) < 2, raise ValueError("Need at least 2 trades")
    • Calculate: mean_return = mean(returns), std_return = stdev(returns)
    • Guard: if std_return == 0, raise ValueError("stdev cannot be zero")
    • Calculate: sharpe = mean_return / std_return
    • Return float
    • Run: pytest test/test_trade_analyzer.py::test_sharpe -v
    • VERIFY: All 3 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 3 min): Break/fix verification

    • Temporarily break: Use population stdev (pstdev) instead of sample stdev
    • Run: pytest test/test_trade_analyzer.py::test_sharpe -v
    • VERIFY: Tests fail on sharpe calculation
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input pnl_list not mutated
    • Returns single float value or raises exception
  • Deterministic: Same P&L list → same Sharpe
  • Immutable: Treat input as read-only
  • Referential Transparency: Can replace call with result
  • Do NOT log inside calculation
  • Do NOT annualize Sharpe (return raw value)
  • Do NOT use risk-free rate (assume 0)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 3 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def calculate_sharpe_ratio(pnl_list: list[dict]) -> float:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import calculate_sharpe_ratio; sharpe = calculate_sharpe_ratio([{'pnl': 10}, {'pnl': 20}, {'pnl': -5}]); print(f'Sharpe: {sharpe:.2f}')"
  • Expected: "Sharpe: N.NN"
  • Run: pytest test/test_trade_analyzer.py::test_sharpe -v
  • Expected: 3 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0

US-REVIEW-S2: Sprint 2 Review 🚧 GATE (~5 min)

Scope: US-005, US-006, US-007

Review Steps:

  • Run: pytest test/test_trade_analyzer.py -v --tb=short
  • Expected: 22+ tests passed (13 from S1 + 3+3+3 from S2)
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: Success
  • Run: python -c "from src.trade_analyzer import calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio; print('imports ok')"
  • Expected: "imports ok"
  • Run: pytest test/test_trade_analyzer.py --cov=src.trade_analyzer --cov-report=term
  • Expected: Coverage > 90%

Linus 5-Layer Analysis (from linus-prompt-code-review.md):

  1. Data Structure Analysis: "Bad programmers worry about code. Good programmers worry about data structures."

    • What is core data? Metrics dict → EV floats, P&L list → Sharpe float
    • Where does data flow? Metrics (from S1) → EV calculations → results
    • Any unnecessary copying or conversion? No - direct calculations on input
  2. Special Case Identification: "Good code has no special cases"

    • Find all if/else branches: division-by-zero guards (avg_loss, stdev), length checks (< 2 trades)
    • Which are real business logic? All are mathematical constraints (division by zero, sample size)
    • Can we redesign data structures to eliminate branches? No - mathematical requirements
  3. Complexity Review: "If > 3 levels of indentation, redesign it"

    • What is the essence? Apply EV formulas to trade statistics
    • How many concepts does solution use? 3 statistical formulas (Standard EV, Kelly, Sharpe)
    • Can we reduce it to half? Already minimal - one function per formula
  4. Destructive Analysis: "Never break userspace" - backward compatibility

    • List all existing functionality that might be affected: S1 functions unchanged
    • Which dependencies will be broken? None - additive changes only
    • How to improve without breaking anything? All new functions, S1 remains stable
  5. Practicality Verification: "Theory and practice sometimes clash. Theory loses."

    • Does this problem really exist in production? Yes - need risk metrics for position sizing
    • How many users encounter this? Single user (manual trading decisions)
    • Does solution complexity match problem severity? Yes - standard financial metrics

Taste Score: Good taste / Acceptable / Garbage

Test File Checks (from linus-prompt-code-review.md):

  • Tests import from src.trade_analyzer (not inline)
  • No production logic in test file
  • Only test_* function names allowed

Functional Programming Verification:

  • Pure functions? All EV functions (calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio) have no side effects?
  • Deterministic? Same inputs → same outputs verified?
  • Immutable? No input mutation verified?
  • Docstrings? All include "PURE FUNCTION" marker?
  • Separation? Pure functions (calculations) maintained from S1?

Cross-Task Checks:

  • Verify patterns consistent across all EV functions (error handling, type hints)
  • Check no orphaned imports or dead code
  • Validate error handling is uniform (ValueError with descriptive messages)

Gate:

  • If issues found: Create fix tasks (US-005a, US-006a, etc.), output <review-issues-found/>
  • If clean: Mark [x], commit "docs: US-REVIEW-S2 complete", output <review-passed/>

Sprint 3: Output & Integration (~35 min)

Priority: HIGH
Purpose: Create main analysis function and console output
Status: IN PROGRESS (US-008 complete)

  • US-008 Main analysis function with tests (~10 min, ~50 lines) [depends: US-005, US-006, US-007]
  • US-009 Console formatter with tests (~10 min, ~45 lines) [depends: US-008]
  • US-010 Integration tests (~10 min, ~40 lines) [depends: US-008]
  • US-REVIEW-S3 Sprint 3 Review 🚧 GATE (~5 min) [depends: US-008, US-009, US-010]

US-008: Main analysis function with tests (~10 min, ~50 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: analyze_trades(filepath: str) -> dict[str, Any]
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: Orchestrate all functions from S1+S2
  • Target: ~15 lines code + ~35 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 3 min, ~35 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import analyze_trades (will fail initially)
    • Write 2 test cases:
      • test_analyze_trades_full_pipeline - Use sample CSV, verify output keys
      • test_analyze_trades_structure - Verify weekly_data, overall_metrics, ev_metrics keys
    • Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 5 min, ~15 lines): Implement function

    • Add analyze_trades() to src/trade_analyzer.py
    • Call: trades = load_trades(filepath)
    • Call: pnls = calculate_pnl(trades)
    • Call: weeks = aggregate_by_week(pnls)
    • Call: overall_metrics = calculate_metrics(pnls)
    • Call: standard_ev = calculate_standard_ev(overall_metrics)
    • Call: kelly = calculate_kelly_criterion(overall_metrics)
    • Call: sharpe = calculate_sharpe_ratio(pnls)
    • Build result dict: {"weekly_data": weeks, "overall_metrics": overall_metrics, "ev_metrics": {"standard_ev": standard_ev, "kelly": kelly, "sharpe": sharpe}}
    • Return dict
    • Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
    • VERIFY: All 2 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 2 min): Break/fix verification

    • Temporarily break: Skip calculate_pnl call
    • Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
    • VERIFY: Tests fail (missing P&L data)
    • Revert change
    • VERIFY: Tests pass again
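The GREEN-phase orchestration reduces to straight-line calls. In the sketch below, the S1/S2 helpers are trivial stand-ins (the real implementations live in src/trade_analyzer.py); only the shape of the result dict is being illustrated:

```python
from typing import Any

# Trivial stand-ins for the S1/S2 helpers -- placeholders, not the real logic.
def load_trades(filepath: str) -> list[dict[str, Any]]:
    return [{'pnl': 10.0}, {'pnl': -5.0}]

def calculate_pnl(trades: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return trades

def aggregate_by_week(pnls: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    return {'2025-W02': pnls}

def calculate_metrics(pnls: list[dict[str, Any]]) -> dict[str, Any]:
    return {'win_rate': 0.5}

def calculate_standard_ev(metrics: dict[str, Any]) -> float:
    return 2.5

def calculate_kelly_criterion(metrics: dict[str, Any]) -> float:
    return 0.1

def calculate_sharpe_ratio(pnls: list[dict[str, Any]]) -> float:
    return 0.47

def analyze_trades(filepath: str) -> dict[str, Any]:
    """Imperative shell: file I/O plus orchestration; all logic is delegated."""
    trades = load_trades(filepath)
    pnls = calculate_pnl(trades)
    metrics = calculate_metrics(pnls)
    return {
        "weekly_data": aggregate_by_week(pnls),
        "overall_metrics": metrics,
        "ev_metrics": {
            "standard_ev": calculate_standard_ev(metrics),
            "kelly": calculate_kelly_criterion(metrics),
            "sharpe": calculate_sharpe_ratio(pnls),
        },
    }
```

Note that calculate_metrics is called once and its result reused for both EV calls, so nothing is recomputed.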

Functional Programming Requirements (CRITICAL):

  • Pure Function: NO - this is I/O layer (reads file)
    • File I/O is side effect (acceptable in shell layer)
    • All calculations delegate to pure functions
  • Deterministic: Same file content → same output
  • Immutable: Don't modify intermediate results
  • Separation: I/O (file read) + orchestration, pure logic in S1+S2 functions
  • Do NOT add logging (caller handles output)
  • Do NOT filter or transform data (use raw results)
  • Do NOT catch exceptions (let them propagate)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 2 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def analyze_trades(filepath: str) -> dict[str, Any]:
  • Run: python -c "from src.trade_analyzer import analyze_trades; result = analyze_trades('trade_snapshots.csv'); print(f'Analyzed {len(result[\"weekly_data\"])} weeks')"
  • Expected: "Analyzed N weeks"
  • Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
  • Expected: 2 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0

US-009: Console formatter with tests (~10 min, ~45 lines)

Implementation:

  • File: src/trade_analyzer.py (modify)
  • Function: format_output(analysis: dict[str, Any]) -> str
  • Test additions: test/test_trade_analyzer.py (modify)
  • Reference: Format weekly data + metrics as readable console output
  • Target: ~15 lines code + ~30 lines tests

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 3 min, ~30 lines): Write tests first

    • Add to test/test_trade_analyzer.py
    • Import format_output (will fail initially)
    • Write 2 test cases:
      • test_format_output_contains_headers - Verify "Week", "Trades", "P&L", "Win Rate", "EV"
      • test_format_output_contains_values - Verify actual values appear in output
    • Run: pytest test/test_trade_analyzer.py::test_format_output -v
    • VERIFY: Tests fail (function doesn't exist) - RED phase
  2. GREEN Phase (~50% - 5 min, ~15 lines): Implement function

    • Add format_output() to src/trade_analyzer.py
    • Build string with sections:
      • Header: "Trade Analysis Report"
      • Weekly section: Loop through weekly_data, format as table
      • Metrics section: Display overall_metrics (win_rate, avg_win, avg_loss, total_pnl)
      • EV section: Display ev_metrics (standard_ev, kelly, sharpe)
    • Use f-strings for formatting: {value:.2f}, {percent:.1%}
    • Return formatted string
    • Run: pytest test/test_trade_analyzer.py::test_format_output -v
    • VERIFY: All 2 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 2 min): Break/fix verification

    • Temporarily break: Remove weekly section
    • Run: pytest test/test_trade_analyzer.py::test_format_output -v
    • VERIFY: Tests fail (missing headers)
    • Revert change
    • VERIFY: Tests pass again
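A sketch of the formatter, assuming weekly_data maps ISO week → list of P&L dicts (the shape produced by aggregate_by_week) and showing only a subset of the overall_metrics fields:

```python
from typing import Any

def format_output(analysis: dict[str, Any]) -> str:
    """PURE FUNCTION: renders the analysis dict as a plain-text report."""
    lines = ["Trade Analysis Report", ""]
    lines.append(f"{'Week':<12}{'Trades':>8}{'P&L':>12}")
    for week, trades in analysis["weekly_data"].items():  # preserve input order
        week_pnl = sum(t['pnl'] for t in trades)
        lines.append(f"{week:<12}{len(trades):>8}{week_pnl:>12.2f}")
    metrics = analysis["overall_metrics"]
    lines.append("")
    lines.append(f"Win Rate: {metrics['win_rate']:.1%}")
    ev = analysis["ev_metrics"]
    lines.append(
        f"EV: {ev['standard_ev']:.2f}  "
        f"Kelly: {ev['kelly']:.2f}  Sharpe: {ev['sharpe']:.2f}"
    )
    return "\n".join(lines)
```

The function builds and returns a string; the caller decides whether to print it, which keeps the formatter referentially transparent.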

Functional Programming Requirements (CRITICAL):

  • Pure Function: No side effects
    • Input analysis dict not mutated
    • Returns formatted string
  • Deterministic: Same analysis → same output string
  • Immutable: Treat input as read-only
  • Referential Transparency: Can replace call with result
  • Do NOT print (return string, caller prints)
  • Do NOT sort weeks (preserve order from aggregate_by_week)
  • Do NOT add colors/formatting codes (plain text only)

Acceptance Criteria:

  • RED: Tests fail before implementation
  • GREEN: All 2 tests pass
  • VERIFY: Break/fix cycle works
  • Function signature: def format_output(analysis: dict[str, Any]) -> str:
  • Docstring: Must include "PURE FUNCTION" marker
  • Run: python -c "from src.trade_analyzer import analyze_trades, format_output; analysis = analyze_trades('trade_snapshots.csv'); output = format_output(analysis); print(output[:100])"
  • Expected: First 100 chars of report
  • Run: pytest test/test_trade_analyzer.py::test_format_output -v
  • Expected: 2 passed
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0
  • Functional Programming: Verify no side effects:
    • grep -n "print" src/trade_analyzer.py | grep format_output → empty

US-010: Integration tests (~10 min, ~40 lines)

Implementation:

  • File: test/test_trade_analyzer_integration.py (create new)
  • Tests: End-to-end workflow with real CSV
  • Reference: trade_snapshots.csv
  • Target: ~40 lines test code

Approach (TDD RED-GREEN-VERIFY):

  1. RED Phase (~30% - 3 min, ~40 lines): Write tests first

    • Create test/test_trade_analyzer_integration.py
    • Import from src.trade_analyzer
    • Write 2 test cases:
      • test_full_workflow_real_csv - Load → analyze → format → verify output
      • test_all_ev_methods_computed - Verify standard_ev, kelly, sharpe all present
    • Run: pytest test/test_trade_analyzer_integration.py -v
    • VERIFY: Tests should PASS (all code exists from S1+S2+S3) - GREEN already
  2. GREEN Phase (~50% - 5 min): Already implemented

    • All functions exist from previous user stories
    • Integration test verifies they work together
    • Run: pytest test/test_trade_analyzer_integration.py -v
    • VERIFY: All 2 tests pass - GREEN phase
  3. VERIFY Phase (~20% - 2 min): Break/fix verification

    • Temporarily break: Comment out one function (e.g., calculate_pnl)
    • Run: pytest test/test_trade_analyzer_integration.py -v
    • VERIFY: Tests fail (pipeline broken)
    • Revert change
    • VERIFY: Tests pass again

Functional Programming Requirements (CRITICAL):

  • ✅ Integration tests verify pure function composition
  • ✅ No mocks needed (pure functions compose naturally)
  • ✅ Deterministic output verified end-to-end
  • Do NOT use fixtures with side effects
  • Do NOT mock pure functions (test real implementations)
  • Do NOT test implementation details (only inputs/outputs)

Acceptance Criteria:

  • Create test/test_trade_analyzer_integration.py
  • Import from src.trade_analyzer
  • 2 test functions: test_full_workflow_real_csv, test_all_ev_methods_computed
  • Run: pytest test/test_trade_analyzer_integration.py -v
  • Expected: 2 passed
  • Run: pytest test/ -v
  • Expected: 28+ tests total (26 unit + 2 integration), all pass
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: exit 0

US-REVIEW-S3: Sprint 3 Review 🚧 GATE (~5 min)

Scope: US-008, US-009, US-010

Review Steps:

  • Run: pytest test/ -v --tb=short
  • Expected: 28+ tests passed (22 from S1+S2 + 2+2+2 from S3)
  • Run: mypy src/trade_analyzer.py --strict
  • Expected: Success
  • Run: python -c "from src.trade_analyzer import analyze_trades, format_output; print('imports ok')"
  • Expected: "imports ok"
  • Run: pytest test/ --cov=src.trade_analyzer --cov-report=term-missing
  • Expected: Coverage > 90%

Linus 5-Layer Analysis (from linus-prompt-code-review.md):

  1. Data Structure Analysis: "Bad programmers worry about code. Good programmers worry about data structures."

    • What is core data? Analysis result dict with weekly_data, overall_metrics, ev_metrics
    • Where does data flow? File → trades → P&L → aggregation → metrics → formatted output
    • Any unnecessary copying or conversion? No - pipeline flows naturally
  2. Special Case Identification: "Good code has no special cases"

    • Find all if/else branches: None in orchestration layer (delegates to pure functions)
    • Which are real business logic? All logic in S1+S2 pure functions
    • Can we redesign data structures to eliminate branches? Already clean separation
  3. Complexity Review: "If > 3 levels of indentation, redesign it"

    • What is the essence? Orchestrate pure functions into analysis pipeline
    • How many concepts does solution use? 2 (orchestration + formatting)
    • Can we reduce it to half? Already minimal - straight-line calls
  4. Destructive Analysis: "Never break userspace" - backward compatibility

    • List all existing functionality that might be affected: S1+S2 functions unchanged
    • Which dependencies will be broken? None - additive changes only
    • How to improve without breaking anything? All new functions, previous sprints stable
  5. Practicality Verification: "Theory and practice sometimes clash. Theory loses."

    • Does this problem really exist in production? Yes - need complete analysis tool
    • How many users encounter this? Single user (ad-hoc analysis)
    • Does solution complexity match problem severity? Yes - simple orchestration of pure functions

Taste Score: Good taste / Acceptable / Garbage

Test File Checks (from linus-prompt-code-review.md):

  • Tests import from src.trade_analyzer (not inline)
  • No production logic in test files
  • Only test_* function names allowed
  • Integration tests use real data (no mocks)

Functional Programming Verification:

  • Pure functions? Format function is pure (no side effects)?
  • Deterministic? Same inputs → same outputs verified end-to-end?
  • Immutable? No input mutation in orchestration layer?
  • Docstrings? All pure functions include "PURE FUNCTION" marker?
  • Separation? Clear boundary: I/O (analyze_trades) vs Pure (all calculations + format)?

Cross-Task Checks:

  • Verify patterns consistent across all sprint 3 functions
  • Check no orphaned imports or dead code
  • Validate error handling is uniform (propagate exceptions)
  • Integration tests verify full pipeline works

Gate:

  • If issues found: Create fix tasks (US-008a, US-009a, etc.), output <review-issues-found/>
  • If clean: Mark [x], commit "docs: US-REVIEW-S3 complete", output <review-passed/>

  • US-REVIEW-FINAL Final Cross-Sprint Review 🚧 GATE (~10 min) [depends: US-REVIEW-S3]

US-REVIEW-FINAL: Final Cross-Sprint Review 🚧 GATE (~10 min)

Scope: All sprints (US-001 through US-010)

Purpose: Verify cross-sprint consistency and overall quality after all individual sprint reviews have passed.

Review Steps:

  • Run: git log --oneline | head -20
  • Verify all sprint review commits exist (US-REVIEW-S1, US-REVIEW-S2, US-REVIEW-S3)
  • Run: pytest test/ -v
  • Expected: 28+ tests pass, 0 failures
  • Run: mypy src/ --strict
  • Expected: No errors
  • Run: pytest test/ --cov=src --cov-report=term-missing
  • Expected: Coverage >= 90%
  • Run: python -c "from src.trade_analyzer import analyze_trades, format_output; result = analyze_trades('trade_snapshots.csv'); print(format_output(result)[:200])"
  • Expected: Formatted report output (first 200 chars)

Cross-Sprint Consistency Checks:

  • Naming conventions consistent across all sprints?
    • All functions use snake_case
    • All test functions start with test_
  • Error handling patterns uniform?
    • Pure functions raise ValueError with descriptive messages
    • I/O functions propagate exceptions
  • No duplicate code between sprint modules?
    • All functions in single module (src/trade_analyzer.py)
    • No code duplication
  • Import structure clean (no circular dependencies)?
    • Single module, no internal imports
  • All TODOs resolved?
    • Run: grep -r "TODO" src/
    • Expected: Empty

Linus 5-Layer Analysis (Whole Feature):

  1. Data Structure Analysis: Does data flow cleanly across all sprints?

    • CSV → list[dict] (S1) → list[pnl_dict] (S1) → dict[week, list] (S1) → metrics dict (S1) → EV floats (S2) → analysis dict (S3) → formatted string (S3)
    • Flow is linear, no backtracking, each transformation builds on previous
  2. Special Case Identification: Any special cases that could be eliminated?

    • Zero checks in S2 (mathematical constraints, can't eliminate)
    • Side adjustment in S1 (domain logic, can't eliminate)
    • All special cases are legitimate business/math requirements
  3. Complexity Review: Is overall architecture simple and elegant?

    • Yes - pure functional pipeline with clear separation
    • Functional Core (S1 calculations + S2 EV + S3 format) vs Imperative Shell (S3 I/O)
    • No deep nesting, straight-line logic
  4. Destructive Analysis: Does complete feature break existing functionality?

    • No - new module, no existing dependencies
    • All functions composable, no breaking changes possible
  5. Practicality Verification: Does complete solution match problem scope?

    • Yes - lightweight analysis tool for CSV data
    • No over-engineering (no database, no web UI, no async)
    • Appropriate for ad-hoc trading analysis

Taste Score: Good taste / Acceptable / Garbage

Functional Programming Verification (If Applicable):

  • Pure functions consistently used across sprints?
    • S1: calculate_pnl, aggregate_by_week, calculate_metrics (pure)
    • S2: calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio (pure)
    • S3: format_output (pure)
  • Clear separation: Functional Core vs Imperative Shell?
    • Core: All calculation functions (pure, no I/O)
    • Shell: load_trades (I/O), analyze_trades (orchestration + I/O)
  • No side effects leaked into calculation functions?
    • Verified via grep: no print, no logger, no mutations
  • All pure functions marked in docstrings?
    • Checklist: verify "PURE FUNCTION" in all S1/S2/S3 calculation function docstrings

Gate:

  • If issues found: Create fix tasks, output <review-issues-found/>
  • If clean: Mark [x], commit "docs: US-REVIEW-FINAL complete", output <review-passed/>

Non-Goals

  • No database storage (CSV input only, in-memory processing)
  • No web UI or API (console output only)
  • No real-time data streaming (batch CSV analysis)
  • No email validation or market name validation (trust input data)
  • No multi-threaded processing (single-threaded analysis sufficient)
  • No position sizing recommendations (metrics only, not advice)
  • No charting or visualization (text output only)
  • No persistence of analysis results (ephemeral output)

Technical Considerations

  • Python 3.10+ required for dict[str, str | int] type hints
  • Dependencies: csv, datetime, statistics (stdlib only, no external packages)
  • File I/O limited to single CSV read (no writes)
  • Functional programming: Pure functions for all calculations (S1+S2+S3 core), I/O isolated to shell (S3 orchestration)
  • Test coverage target: 90%+
  • All pure functions documented with "PURE FUNCTION" marker in docstrings

Validation Gate

  • All tests pass? pytest test/ -v exits 0
  • Typecheck clean? mypy src/ --strict exits 0
  • Coverage adequate? pytest test/ --cov=src --cov-fail-under=90 exits 0
  • No TODO comments left? grep -r "TODO" src/ returns empty
  • Integration test passes? python -c "from src.trade_analyzer import analyze_trades, format_output; print(format_output(analyze_trades('trade_snapshots.csv')))" runs without error

STOP if any check fails. Fix before proceeding.

# Progress Log
## Learnings
(Patterns discovered during implementation)
---
```
timestamp event side fill_price fill_size market btc_price deviation book_bid book_ask book_depth asks_10 asks_20 asks_30 bids_60 bids_70 bids_80
2025-01-06 10:00:00 ENTRY YES 21.0 50 MKT-A 95000.00 -1.50 21 48 2500 0 0 2500 0 0 0
2025-01-06 10:15:00 EXIT YES 56.0 50 MKT-A 95200.00 -1.20 54 58 2200 0 0 2200 0 0 0
2025-01-06 11:00:00 ENTRY YES 45.0 30 MKT-B 95100.00 -1.30 44 47 1800 0 0 1800 0 0 0
2025-01-06 11:30:00 EXIT YES 32.0 30 MKT-B 94800.00 -1.60 30 34 1900 0 0 1900 0 0 0
2025-01-07 09:00:00 ENTRY NO 72.0 40 MKT-C 96000.00 0.50 70 74 2100 0 0 2100 0 0 0
2025-01-07 09:20:00 EXIT NO 41.0 40 MKT-C 96500.00 1.00 39 43 2300 0 0 2300 0 0 0
2025-01-07 14:00:00 ENTRY YES 38.0 25 MKT-A 95500.00 -0.80 36 40 2000 0 0 2000 0 0 0
2025-01-07 14:45:00 EXIT YES 65.0 25 MKT-A 95800.00 -0.50 63 67 1800 0 0 1800 0 0 0
2025-01-08 10:00:00 ENTRY YES 55.0 35 MKT-B 94500.00 -2.00 53 57 2400 0 0 2400 0 0 0
2025-01-08 10:30:00 EXIT YES 71.0 35 MKT-B 94900.00 -1.50 69 73 2200 0 0 2200 0 0 0
2025-01-13 09:00:00 ENTRY YES 25.0 60 MKT-A 97000.00 1.00 23 27 2600 0 0 2600 0 0 0
2025-01-13 09:30:00 EXIT YES 48.0 60 MKT-A 97200.00 1.20 46 50 2400 0 0 2400 0 0 0
2025-01-14 11:00:00 ENTRY NO 80.0 20 MKT-C 98000.00 2.00 78 82 1500 0 0 1500 0 0 0
2025-01-14 11:20:00 EXIT NO 55.0 20 MKT-C 97500.00 1.50 53 57 1700 0 0 1700 0 0 0
2025-01-15 10:00:00 ENTRY YES 30.0 45 MKT-B 96500.00 0.50 28 32 2100 0 0 2100 0 0 0
2025-01-15 10:15:00 EXIT YES 18.0 45 MKT-B 96200.00 0.20 16 20 2300 0 0 2300 0 0 0
2025-01-15 14:00:00 ENTRY YES 42.0 30 MKT-A 96800.00 0.80 40 44 2000 0 0 2000 0 0 0
2025-01-15 14:30:00 EXIT YES 67.0 30 MKT-A 97100.00 1.10 65 69 1800 0 0 1800 0 0 0
2025-01-16 09:00:00 ENTRY YES 50.0 25 MKT-B 97500.00 1.50 48 52 2200 0 0 2200 0 0 0
2025-01-16 09:45:00 ENTRY YES 33.0 40 MKT-C 97300.00 1.30 31 35 2000 0 0 2000 0 0 0
```