Analyze trading snapshots from CSV to calculate P&L, win rate, and expected value metrics (Standard EV, Kelly Criterion, Sharpe Ratio) aggregated by ISO week. Pure functional core for calculations with console output.
Load trade snapshots from CSV with type validation
Calculate realized P&L by pairing ENTRY/EXIT events
Aggregate trades by ISO week (Monday start)
Calculate win rate, average win/loss, and risk metrics
Output weekly analysis to console in readable format
Total Tasks : 14 (10 implementation + 4 reviews)
Estimated Time : ~155 min (~2.6 hours)
Progress : 0/14 complete (0%)
Status : NOT STARTED
Next Task : US-001
graph TD
US001[US-001: CSV loader] --> US002[US-002: P&L calc]
US001 --> US003[US-003: Weekly agg]
US001 --> US004[US-004: Win rate]
US002 --> REVIEW1[🚧 US-REVIEW-S1]
US003 --> REVIEW1
US004 --> REVIEW1
REVIEW1 --> US005[US-005: Standard EV]
REVIEW1 --> US006[US-006: Kelly]
REVIEW1 --> US007[US-007: Sharpe]
US005 --> REVIEW2[🚧 US-REVIEW-S2]
US006 --> REVIEW2
US007 --> REVIEW2
REVIEW2 --> US008[US-008: Main analysis]
REVIEW2 --> US009[US-009: Console formatter]
REVIEW2 --> US010[US-010: Integration tests]
US008 --> REVIEW3[🚧 US-REVIEW-S3]
US009 --> REVIEW3
US010 --> REVIEW3
REVIEW3 --> REVIEWF[🚧 US-REVIEW-FINAL]
Sprint 1: Data Loading & P&L Calculation (~60 min)
Priority: HIGH
Purpose: Load CSV data and calculate realized P&L per trade
Status: NOT STARTED
US-001: CSV loader with tests (~15 min, ~80 lines)
Implementation:
File: src/trade_analyzer.py (create new)
Function: load_trades(filepath: str) -> list[dict[str, Any]]
Test File: test/test_trade_analyzer.py (create new)
Reference: trade_snapshots.csv (place CSV in project root)
Target: ~20 lines code + ~60 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~60 lines): Write tests first
Create test/test_trade_analyzer.py
Import from src.trade_analyzer (will fail - module doesn't exist yet)
Write 3 test cases:
test_load_trades_valid_csv - Verify loads sample CSV
test_load_trades_converts_types - Check timestamp→datetime, prices→float
test_load_trades_row_count - Verify row count matches file
Run: pytest test/test_trade_analyzer.py -v
VERIFY: Tests fail with ImportError (function doesn't exist) - this is RED phase
GREEN Phase (~50% - 7 min, ~20 lines): Implement function
Create src/trade_analyzer.py
Use csv.DictReader to read CSV
Parse timestamp with datetime.fromisoformat()
Convert numeric fields: fill_price, fill_size, btc_price, deviation, book_bid, book_ask, book_depth
Convert asks/bids fields: asks_10, asks_20, asks_30, bids_60, bids_70, bids_80
Return list of dicts with converted types
Run: pytest test/test_trade_analyzer.py -v
VERIFY: All 3 tests pass - this is GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break code: Comment out timestamp parsing
Run: pytest test/test_trade_analyzer.py -v
VERIFY: Test fails on datetime conversion
Revert change
VERIFY: Tests pass again
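A minimal sketch of the GREEN-phase implementation, keeping parsing in a pure helper and file I/O at the edge. The column names follow the field list above and may need adjusting to the real CSV header:

```python
import csv
from datetime import datetime
from typing import Any

# Field names assumed from the plan above; adjust to the actual CSV header.
_NUMERIC_FIELDS = ("fill_price", "fill_size", "btc_price", "deviation",
                   "book_bid", "book_ask", "book_depth",
                   "asks_10", "asks_20", "asks_30",
                   "bids_60", "bids_70", "bids_80")

def _parse_row(row: dict[str, str]) -> dict[str, Any]:
    """PURE FUNCTION: convert the string fields of one CSV row to typed values."""
    parsed: dict[str, Any] = dict(row)  # copy; never mutate the input row
    parsed["timestamp"] = datetime.fromisoformat(row["timestamp"])
    for field in _NUMERIC_FIELDS:
        parsed[field] = float(row[field])
    return parsed

def load_trades(filepath: str) -> list[dict[str, Any]]:
    """I/O layer: read the CSV and delegate conversion to the pure parser."""
    with open(filepath, newline="") as f:
        return [_parse_row(row) for row in csv.DictReader(f)]
```

Malformed rows are not handled here by design: a missing column raises `KeyError`/`ValueError` naturally, consistent with "let csv.DictReader raise" above.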
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects in parsing logic
File I/O happens in I/O layer (load_trades does file read)
Parsing logic extracted to pure functions
✅ Deterministic: Same CSV content → same output always
✅ Immutable: Don't modify input data
✅ Referential Transparency: Can cache results
Do NOT log inside parsing functions
Do NOT validate market names (out of scope)
Do NOT handle malformed CSV (let csv.DictReader raise)
Acceptance Criteria:
RED: Tests written and verified failing (ImportError)
GREEN: Tests pass after implementation
VERIFY: Break/fix cycle confirms tests catch bugs
Create src/trade_analyzer.py
Function signature: def load_trades(filepath: str) -> list[dict[str, Any]]:
Run: pytest test/test_trade_analyzer.py::test_load_trades_valid_csv -v
Expected: Pass, returns list of dicts
Run: python -c "from src.trade_analyzer import load_trades; trades = load_trades('trade_snapshots.csv'); print(f'Loaded {len(trades)} trades')"
Expected: "Loaded N trades"
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-002: P&L calculator with tests (~15 min, ~85 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_pnl(trades: list[dict]) -> list[dict]
Test additions: test/test_trade_analyzer.py (modify)
Reference: Pair ENTRY/EXIT events by market, calculate P&L = (exit_price - entry_price) * size
Target: ~25 lines code + ~60 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~60 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_pnl (will fail initially)
Write 4 test cases:
test_calculate_pnl_winning_trade - YES ENTRY@21 EXIT@56 = +$35 profit
test_calculate_pnl_losing_trade - YES ENTRY@56 EXIT@21 = -$35 loss
test_calculate_pnl_multiple_markets - Verify independent P&L per market
test_calculate_pnl_unpaired_entry - ENTRY without EXIT = None (ignore)
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
VERIFY: Tests fail (function doesn't exist or not imported) - RED phase
GREEN Phase (~50% - 8 min, ~25 lines): Implement function
Add calculate_pnl() to src/trade_analyzer.py
Group trades by market field
For each market, find ENTRY/EXIT pairs (chronological order)
Calculate P&L: (exit_fill_price - entry_fill_price) * fill_size / 100 (cents to dollars)
Side adjustment: YES = normal, NO = inverted (100 - price)
Return list with: {market, entry_time, exit_time, side, entry_price, exit_price, size, pnl}
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
VERIFY: All 4 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Remove side adjustment logic
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
VERIFY: Tests fail on NO side calculations
Revert change
VERIFY: Tests pass again
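The pairing and side-adjustment logic above can be sketched as follows. The event labels (`ENTRY`/`EXIT`), side labels (`YES`/`NO`), and field names are assumptions taken from the plan, not confirmed against the real data:

```python
from typing import Any

def calculate_pnl(trades: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """PURE FUNCTION: pair ENTRY/EXIT events per market and compute realized P&L."""
    open_entries: dict[str, dict[str, Any]] = {}
    results: list[dict[str, Any]] = []
    for trade in sorted(trades, key=lambda t: t["timestamp"]):  # chronological order
        market = trade["market"]
        if trade["event"] == "ENTRY":
            open_entries[market] = trade
        elif trade["event"] == "EXIT" and market in open_entries:
            entry = open_entries.pop(market)
            # NO side: invert prices (100 - price) so the profit direction matches YES
            adjust = (lambda p: p) if entry["side"] == "YES" else (lambda p: 100 - p)
            pnl = (adjust(trade["fill_price"]) - adjust(entry["fill_price"])) \
                  * entry["fill_size"] / 100  # cents to dollars
            results.append({
                "market": market,
                "entry_time": entry["timestamp"],
                "exit_time": trade["timestamp"],
                "side": entry["side"],
                "entry_price": entry["fill_price"],
                "exit_price": trade["fill_price"],
                "size": entry["fill_size"],
                "pnl": pnl,
            })
    return results  # unpaired ENTRYs are silently dropped, per the spec above
```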
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input trades list not mutated
Returns new list of P&L dicts
✅ Deterministic: Same trades → same P&L
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT modify trades list
Do NOT handle partial fills (assume all fills complete)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 4 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_pnl(trades: list[dict]) -> list[dict]:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); print(f'{len(pnls)} completed trades')"
Expected: "N completed trades"
Run: pytest test/test_trade_analyzer.py::test_calculate_pnl -v
Expected: 4 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
grep -n "logger\." src/trade_analyzer.py | grep calculate_pnl → empty
grep -n "print" src/trade_analyzer.py | grep calculate_pnl → empty
US-003: Weekly aggregator with tests (~15 min, ~90 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: aggregate_by_week(pnl_list: list[dict]) -> dict[str, list[dict]]
Test additions: test/test_trade_analyzer.py (modify)
Reference: ISO week format (Monday start)
Target: ~20 lines code + ~70 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~70 lines): Write tests first
Add to test/test_trade_analyzer.py
Import aggregate_by_week (will fail initially)
Write 4 test cases:
test_aggregate_by_week_single_week - All trades in one week
test_aggregate_by_week_multiple_weeks - Trades span 3 weeks
test_aggregate_by_week_iso_format - Verify "2025-W01" format
test_aggregate_by_week_monday_boundary - Monday trade starts new week
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 8 min, ~20 lines): Implement function
Add aggregate_by_week() to src/trade_analyzer.py
Use entry_time.isocalendar() to get ISO week
Format as "YYYY-WNN" (e.g., "2025-W01")
Group P&L dicts by week string
Return dict: {"2025-W01": [pnl1, pnl2], "2025-W02": [pnl3]}
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
VERIFY: All 4 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Use Sunday week start instead of Monday
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
VERIFY: test_aggregate_by_week_monday_boundary fails
Revert change
VERIFY: Tests pass again
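A possible shape for the aggregator, using `isocalendar()` exactly as described (ISO weeks start on Monday; the ISO year can differ from the calendar year near January 1st):

```python
from typing import Any

def aggregate_by_week(pnl_list: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """PURE FUNCTION: group P&L dicts by ISO week of entry_time ("YYYY-WNN")."""
    weeks: dict[str, list[dict[str, Any]]] = {}
    for pnl in pnl_list:
        iso_year, iso_week, _ = pnl["entry_time"].isocalendar()
        key = f"{iso_year}-W{iso_week:02d}"
        weeks.setdefault(key, []).append(pnl)
    return weeks  # insertion order preserved (Python 3.7+), no sorting by design
```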
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input pnl_list not mutated
Returns new dict structure
✅ Deterministic: Same P&L list → same weekly grouping
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can cache results
Do NOT log inside aggregation
Do NOT sort weeks (dict preserves insertion order in Python 3.7+)
Do NOT filter weeks (include all)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 4 tests pass
VERIFY: Break/fix cycle works
Function signature: def aggregate_by_week(pnl_list: list[dict]) -> dict[str, list[dict]]:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, aggregate_by_week; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); weeks = aggregate_by_week(pnls); print(f'{len(weeks)} weeks')"
Expected: "N weeks"
Run: pytest test/test_trade_analyzer.py::test_aggregate_by_week -v
Expected: 4 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
grep -n "pnl_list\.append\|pnl_list\.sort" src/trade_analyzer.py → empty (no mutation)
US-004: Win rate calculator with tests (~10 min, ~45 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_metrics(pnl_list: list[dict]) -> dict[str, float]
Test additions: test/test_trade_analyzer.py (modify)
Reference: Win rate = wins / total, avg_win = mean(wins), avg_loss = mean(losses)
Target: ~15 lines code + ~30 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~30 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_metrics (will fail initially)
Write 2 test cases:
test_calculate_metrics_mixed_trades - 3 wins (+10, +20, +5), 2 losses (-15, -5) → win_rate=0.6
test_calculate_metrics_all_wins - 5 wins → win_rate=1.0
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 5 min, ~15 lines): Implement function
Add calculate_metrics() to src/trade_analyzer.py
Separate P&L list into wins (pnl > 0) and losses (pnl <= 0)
Calculate: win_rate = len(wins) / len(pnl_list)
Calculate: avg_win = sum(wins) / len(wins) if wins else 0
Calculate: avg_loss = sum(losses) / len(losses) if losses else 0
Return: {"win_rate": float, "avg_win": float, "avg_loss": float, "total_pnl": float}
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
VERIFY: All 2 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Count breakeven (pnl=0) as win
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
VERIFY: Tests fail on win rate calculation
Revert change
VERIFY: Tests pass again
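The metric formulas above fit in a few lines. A sketch, with breakeven trades counted as losses and the deliberate `ZeroDivisionError` on an empty list:

```python
def calculate_metrics(pnl_list: list[dict[str, float]]) -> dict[str, float]:
    """PURE FUNCTION: win rate, average win/loss, and total P&L from a P&L list."""
    values = [t["pnl"] for t in pnl_list]
    wins = [v for v in values if v > 0]
    losses = [v for v in values if v <= 0]  # breakeven counts as a loss
    return {
        "win_rate": len(wins) / len(values),  # empty list raises, by design
        "avg_win": sum(wins) / len(wins) if wins else 0.0,
        "avg_loss": sum(losses) / len(losses) if losses else 0.0,
        "total_pnl": sum(values),
    }
```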
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input pnl_list not mutated
Returns new dict with metrics
✅ Deterministic: Same P&L list → same metrics
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT round values (caller handles formatting)
Do NOT handle empty list (let division raise ZeroDivisionError)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 2 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_metrics(pnl_list: list[dict]) -> dict[str, float]:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, calculate_metrics; trades = load_trades('trade_snapshots.csv'); pnls = calculate_pnl(trades); metrics = calculate_metrics(pnls); print(f'Win rate: {metrics[\"win_rate\"]:.1%}')"
Expected: "Win rate: N.N%"
Run: pytest test/test_trade_analyzer.py::test_calculate_metrics -v
Expected: 2 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-REVIEW-S1: Sprint 1 Review 🚧 GATE (~5 min)
Scope: US-001, US-002, US-003, US-004
Review Steps:
Run: pytest test/test_trade_analyzer.py -v --tb=short
Expected: 13+ tests passed (3+4+4+2)
Run: mypy src/trade_analyzer.py --strict
Expected: Success
Run: python -c "from src.trade_analyzer import load_trades, calculate_pnl, aggregate_by_week, calculate_metrics; print('imports ok')"
Expected: "imports ok"
Run: pytest test/test_trade_analyzer.py --cov=src.trade_analyzer --cov-report=term
Expected: Coverage > 90%
Linus 5-Layer Analysis (from linus-prompt-code-review.md):
Data Structure Analysis : "Bad programmers worry about code. Good programmers worry about data structures."
What is core data? CSV rows → dicts → P&L dicts → weekly aggregation
Where does data flow? File → list[dict] → list[pnl_dict] → dict[week, list[pnl_dict]]
Any unnecessary copying or conversion? Single-pass where possible, transformations create new structures (immutable)
Special Case Identification : "Good code has no special cases"
Find all if/else branches: Empty list checks, win/loss separation, side adjustment (YES/NO)
Which are real business logic? Side adjustment (YES/NO inversion), win/loss classification
Can we redesign data structures to eliminate branches? Side adjustment is domain logic (can't eliminate)
Complexity Review : "If > 3 levels of indentation, redesign it"
What is the essence? Transform CSV rows → P&L calculations → weekly metrics
How many concepts does solution use? 4 transformations (load → calculate → aggregate → metrics)
Can we reduce it to half? Already minimal - each function has single responsibility
Destructive Analysis : "Never break userspace" - backward compatibility
List all existing functionality that might be affected: None (new module)
Which dependencies will be broken? None
How to improve without breaking anything? N/A - greenfield
Practicality Verification : "Theory and practice sometimes clash. Theory loses."
Does this problem really exist in production? Yes - need to analyze trade performance
How many users encounter this? Single user (manual trading analysis)
Does solution complexity match problem severity? Yes - straightforward CSV analysis with pure functions
Taste Score: Good taste / Acceptable / Garbage
Test File Checks (from linus-prompt-code-review.md):
Tests import from src.trade_analyzer (not inline)
No production logic in test file
Only test_* function names allowed
Functional Programming Verification:
Pure functions? All calculation functions (calculate_pnl, aggregate_by_week, calculate_metrics) have no side effects?
Deterministic? Same inputs → same outputs verified?
Immutable? No input mutation verified?
Docstrings? All include "PURE FUNCTION" marker?
Separation? Pure functions (calculations) vs I/O (load_trades)?
Cross-Task Checks:
Verify patterns consistent across all functions (type hints, docstrings)
Check no orphaned imports or dead code
Validate error handling is uniform (let exceptions propagate)
Gate:
If issues found: Create fix tasks (US-001a, US-002a, etc.), output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-S1 complete", output <review-passed/>
Sprint 2: Expected Value Calculations (~50 min)
Priority: HIGH
Purpose: Calculate EV metrics (Standard, Kelly, Sharpe)
Status: NOT STARTED
US-005: Standard EV with tests (~15 min, ~60 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_standard_ev(metrics: dict[str, float]) -> float
Test additions: test/test_trade_analyzer.py (modify)
Reference: EV = (win_rate × avg_win) + ((1 - win_rate) × avg_loss)
Target: ~10 lines code + ~50 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~50 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_standard_ev (will fail initially)
Write 3 test cases:
test_standard_ev_positive - win_rate=0.6, avg_win=20, avg_loss=-10 → EV=8.0
test_standard_ev_negative - win_rate=0.4, avg_win=10, avg_loss=-15 → EV=-5.0
test_standard_ev_breakeven - win_rate=0.5, avg_win=10, avg_loss=-10 → EV=0.0
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 7 min, ~10 lines): Implement function
Add calculate_standard_ev() to src/trade_analyzer.py
Extract from metrics: win_rate, avg_win, avg_loss
Calculate: ev = (win_rate * avg_win) + ((1 - win_rate) * avg_loss)
Return float
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
VERIFY: All 3 tests pass - GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break: Forget to multiply by (1 - win_rate)
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
VERIFY: Tests fail on EV calculation
Revert change
VERIFY: Tests pass again
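This one is a direct transcription of the EV formula above:

```python
def calculate_standard_ev(metrics: dict[str, float]) -> float:
    """PURE FUNCTION: EV = win_rate * avg_win + (1 - win_rate) * avg_loss."""
    # avg_loss is expected to be negative, so the second term subtracts
    return (metrics["win_rate"] * metrics["avg_win"]
            + (1 - metrics["win_rate"]) * metrics["avg_loss"])
```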
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input metrics dict not mutated
Returns single float value
✅ Deterministic: Same metrics → same EV
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT validate metrics keys (assume caller provides correct format)
Do NOT handle divide-by-zero (assume metrics valid)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 3 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_standard_ev(metrics: dict[str, float]) -> float:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import calculate_standard_ev; ev = calculate_standard_ev({'win_rate': 0.6, 'avg_win': 20, 'avg_loss': -10}); print(f'EV: ${ev:.2f}')"
Expected: "EV: $8.00"
Run: pytest test/test_trade_analyzer.py::test_standard_ev -v
Expected: 3 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
grep -n "metrics\[" src/trade_analyzer.py | grep "=" → only reads, no writes
US-006: Kelly Criterion with tests (~15 min, ~60 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_kelly_criterion(metrics: dict[str, float]) -> float
Test additions: test/test_trade_analyzer.py (modify)
Reference: Kelly = win_rate - (1 - win_rate) / payoff_ratio, where payoff_ratio = avg_win / |avg_loss| (equivalently: (win_rate × avg_win - (1 - win_rate) × |avg_loss|) / avg_win)
Target: ~10 lines code + ~50 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~50 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_kelly_criterion (will fail initially)
Write 3 test cases:
test_kelly_positive - win_rate=0.6, avg_win=20, avg_loss=-10 → kelly=0.4
test_kelly_negative - win_rate=0.3, avg_win=10, avg_loss=-15 → kelly<0
test_kelly_zero_loss - avg_loss=0 → raise ValueError
Run: pytest test/test_trade_analyzer.py::test_kelly -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 7 min, ~10 lines): Implement function
Add calculate_kelly_criterion() to src/trade_analyzer.py
Extract from metrics: win_rate, avg_win, avg_loss
Guard: if avg_loss == 0, raise ValueError("avg_loss cannot be zero")
Calculate: payoff_ratio = avg_win / abs(avg_loss), then kelly = win_rate - (1 - win_rate) / payoff_ratio
Return float
Run: pytest test/test_trade_analyzer.py::test_kelly -v
VERIFY: All 3 tests pass - GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break: Remove abs() around avg_loss
Run: pytest test/test_trade_analyzer.py::test_kelly -v
VERIFY: Tests fail on kelly calculation
Revert change
VERIFY: Tests pass again
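A sketch of the Kelly fraction f = p - q/b (p = win rate, q = 1 - p, b = payoff ratio), chosen to match the test expectations above (win_rate=0.6, avg_win=20, avg_loss=-10 → 0.4):

```python
def calculate_kelly_criterion(metrics: dict[str, float]) -> float:
    """PURE FUNCTION: Kelly fraction f = p - q/b, with payoff ratio b = avg_win/|avg_loss|."""
    if metrics["avg_loss"] == 0:
        raise ValueError("avg_loss cannot be zero")  # payoff ratio undefined
    b = metrics["avg_win"] / abs(metrics["avg_loss"])
    return metrics["win_rate"] - (1 - metrics["win_rate"]) / b
```

The raw value is returned unclamped, per the spec above; a negative Kelly signals an edge-less strategy rather than an error.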
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input metrics dict not mutated
Returns single float value or raises exception
✅ Deterministic: Same metrics → same Kelly
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT clamp Kelly to [0, 1] (return raw value)
Do NOT handle missing keys (let KeyError propagate)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 3 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_kelly_criterion(metrics: dict[str, float]) -> float:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import calculate_kelly_criterion; kelly = calculate_kelly_criterion({'win_rate': 0.6, 'avg_win': 20, 'avg_loss': -10}); print(f'Kelly: {kelly:.2%}')"
Expected: "Kelly: 40.00%"
Run: pytest test/test_trade_analyzer.py::test_kelly -v
Expected: 3 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-007: Sharpe ratio with tests (~15 min, ~60 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: calculate_sharpe_ratio(pnl_list: list[dict]) -> float
Test additions: test/test_trade_analyzer.py (modify)
Reference: Sharpe = mean(returns) / stdev(returns), assume risk-free rate = 0
Target: ~10 lines code + ~50 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 5 min, ~50 lines): Write tests first
Add to test/test_trade_analyzer.py
Import calculate_sharpe_ratio (will fail initially)
Write 3 test cases:
test_sharpe_positive - Consistent positive returns → sharpe > 0
test_sharpe_volatile - High volatility → low sharpe
test_sharpe_single_trade - stdev=0 → raise ValueError
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 7 min, ~10 lines): Implement function
Add calculate_sharpe_ratio() to src/trade_analyzer.py
Import statistics.mean, statistics.stdev
Extract P&L values: returns = [trade['pnl'] for trade in pnl_list]
Guard: if len(returns) < 2, raise ValueError("Need at least 2 trades")
Calculate: mean_return = mean(returns), std_return = stdev(returns)
Guard: if std_return == 0, raise ValueError("stdev cannot be zero")
Calculate: sharpe = mean_return / std_return
Return float
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
VERIFY: All 3 tests pass - GREEN phase
VERIFY Phase (~20% - 3 min): Break/fix verification
Temporarily break: Use population stdev (pstdev) instead of sample stdev
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
VERIFY: Tests fail on sharpe calculation
Revert change
VERIFY: Tests pass again
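The guard-then-divide structure above can be sketched as (note `statistics.stdev` is the sample standard deviation, which is what the break/fix step distinguishes from `pstdev`):

```python
from statistics import mean, stdev
from typing import Any

def calculate_sharpe_ratio(pnl_list: list[dict[str, Any]]) -> float:
    """PURE FUNCTION: per-trade Sharpe = mean(returns) / stdev(returns), risk-free rate 0."""
    returns = [t["pnl"] for t in pnl_list]
    if len(returns) < 2:
        raise ValueError("Need at least 2 trades")
    std_return = stdev(returns)  # sample stdev, not pstdev
    if std_return == 0:
        raise ValueError("stdev cannot be zero")
    return mean(returns) / std_return
```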
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input pnl_list not mutated
Returns single float value or raises exception
✅ Deterministic: Same P&L list → same Sharpe
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT log inside calculation
Do NOT annualize Sharpe (return raw value)
Do NOT use risk-free rate (assume 0)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 3 tests pass
VERIFY: Break/fix cycle works
Function signature: def calculate_sharpe_ratio(pnl_list: list[dict]) -> float:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import calculate_sharpe_ratio; sharpe = calculate_sharpe_ratio([{'pnl': 10}, {'pnl': 20}, {'pnl': -5}]); print(f'Sharpe: {sharpe:.2f}')"
Expected: "Sharpe: N.NN"
Run: pytest test/test_trade_analyzer.py::test_sharpe -v
Expected: 3 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-REVIEW-S2: Sprint 2 Review 🚧 GATE (~5 min)
Scope: US-005, US-006, US-007
Review Steps:
Run: pytest test/test_trade_analyzer.py -v --tb=short
Expected: 22+ tests passed (13 from S1 + 3+3+3 from S2)
Run: mypy src/trade_analyzer.py --strict
Expected: Success
Run: python -c "from src.trade_analyzer import calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio; print('imports ok')"
Expected: "imports ok"
Run: pytest test/test_trade_analyzer.py --cov=src.trade_analyzer --cov-report=term
Expected: Coverage > 90%
Linus 5-Layer Analysis (from linus-prompt-code-review.md):
Data Structure Analysis : "Bad programmers worry about code. Good programmers worry about data structures."
What is core data? Metrics dict → EV floats, P&L list → Sharpe float
Where does data flow? Metrics (from S1) → EV calculations → results
Any unnecessary copying or conversion? No - direct calculations on input
Special Case Identification : "Good code has no special cases"
Find all if/else branches: Zero checks (avg_loss, stdev), length checks (< 2 trades)
Which are real business logic? All are mathematical constraints (division by zero, sample size)
Can we redesign data structures to eliminate branches? No - mathematical requirements
Complexity Review : "If > 3 levels of indentation, redesign it"
What is the essence? Apply EV formulas to trade statistics
How many concepts does solution use? 3 statistical formulas (Standard EV, Kelly, Sharpe)
Can we reduce it to half? Already minimal - one function per formula
Destructive Analysis : "Never break userspace" - backward compatibility
List all existing functionality that might be affected: S1 functions unchanged
Which dependencies will be broken? None - additive changes only
How to improve without breaking anything? All new functions, S1 remains stable
Practicality Verification : "Theory and practice sometimes clash. Theory loses."
Does this problem really exist in production? Yes - need risk metrics for position sizing
How many users encounter this? Single user (manual trading decisions)
Does solution complexity match problem severity? Yes - standard financial metrics
Taste Score: Good taste / Acceptable / Garbage
Test File Checks (from linus-prompt-code-review.md):
Tests import from src.trade_analyzer (not inline)
No production logic in test file
Only test_* function names allowed
Functional Programming Verification:
Pure functions? All EV functions (calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio) have no side effects?
Deterministic? Same inputs → same outputs verified?
Immutable? No input mutation verified?
Docstrings? All include "PURE FUNCTION" marker?
Separation? Pure functions (calculations) maintained from S1?
Cross-Task Checks:
Verify patterns consistent across all EV functions (error handling, type hints)
Check no orphaned imports or dead code
Validate error handling is uniform (ValueError with descriptive messages)
Gate:
If issues found: Create fix tasks (US-005a, US-006a, etc.), output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-S2 complete", output <review-passed/>
Sprint 3: Output & Integration (~35 min)
Priority: HIGH
Purpose: Create main analysis function and console output
Status: NOT STARTED
US-008: Main analysis function with tests (~10 min, ~50 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: analyze_trades(filepath: str) -> dict[str, Any]
Test additions: test/test_trade_analyzer.py (modify)
Reference: Orchestrate all functions from S1+S2
Target: ~15 lines code + ~35 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~35 lines): Write tests first
Add to test/test_trade_analyzer.py
Import analyze_trades (will fail initially)
Write 2 test cases:
test_analyze_trades_full_pipeline - Use sample CSV, verify output keys
test_analyze_trades_structure - Verify weekly_data, overall_metrics, ev_metrics keys
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 5 min, ~15 lines): Implement function
Add analyze_trades() to src/trade_analyzer.py
Call: trades = load_trades(filepath)
Call: pnls = calculate_pnl(trades)
Call: weeks = aggregate_by_week(pnls)
Call: overall_metrics = calculate_metrics(pnls)
Call: standard_ev = calculate_standard_ev(overall_metrics)
Call: kelly = calculate_kelly_criterion(overall_metrics)
Call: sharpe = calculate_sharpe_ratio(pnls)
Build result dict: {"weekly_data": weeks, "overall_metrics": overall_metrics, "ev_metrics": {"standard_ev": standard_ev, "kelly": kelly, "sharpe": sharpe}}
Return dict
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
VERIFY: All 2 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Skip calculate_pnl call
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
VERIFY: Tests fail (missing P&L data)
Revert change
VERIFY: Tests pass again
Functional Programming Requirements (CRITICAL):
✅ Pure Function: NO - this is I/O layer (reads file)
File I/O is side effect (acceptable in shell layer)
All calculations delegate to pure functions
✅ Deterministic: Same file content → same output
✅ Immutable: Don't modify intermediate results
✅ Separation: I/O (file read) + orchestration, pure logic in S1+S2 functions
Do NOT add logging (caller handles output)
Do NOT filter or transform data (use raw results)
Do NOT catch exceptions (let them propagate)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 2 tests pass
VERIFY: Break/fix cycle works
Function signature: def analyze_trades(filepath: str) -> dict[str, Any]:
Run: python -c "from src.trade_analyzer import analyze_trades; result = analyze_trades('trade_snapshots.csv'); print(f'Analyzed {len(result[\"weekly_data\"])} weeks')"
Expected: "Analyzed N weeks"
Run: pytest test/test_trade_analyzer.py::test_analyze_trades -v
Expected: 2 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-009: Console formatter with tests (~10 min, ~45 lines)
Implementation:
File: src/trade_analyzer.py (modify)
Function: format_output(analysis: dict[str, Any]) -> str
Test additions: test/test_trade_analyzer.py (modify)
Reference: Format weekly data + metrics as readable console output
Target: ~15 lines code + ~30 lines tests
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~30 lines): Write tests first
Add to test/test_trade_analyzer.py
Import format_output (will fail initially)
Write 2 test cases:
test_format_output_contains_headers - Verify "Week", "Trades", "P&L", "Win Rate", "EV"
test_format_output_contains_values - Verify actual values appear in output
Run: pytest test/test_trade_analyzer.py::test_format_output -v
VERIFY: Tests fail (function doesn't exist) - RED phase
GREEN Phase (~50% - 5 min, ~15 lines): Implement function
Add format_output() to src/trade_analyzer.py
Build string with sections:
Header: "Trade Analysis Report"
Weekly section: Loop through weekly_data, format as table
Metrics section: Display overall_metrics (win_rate, avg_win, avg_loss, total_pnl)
EV section: Display ev_metrics (standard_ev, kelly, sharpe)
Use f-strings for formatting: {value:.2f}, {percent:.1%}
Return formatted string
Run: pytest test/test_trade_analyzer.py::test_format_output -v
VERIFY: All 2 tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Remove weekly section
Run: pytest test/test_trade_analyzer.py::test_format_output -v
VERIFY: Tests fail (missing headers)
Revert change
VERIFY: Tests pass again
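One possible layout for the report string, with the exact column widths and separators left as implementation choices:

```python
from typing import Any

def format_output(analysis: dict[str, Any]) -> str:
    """PURE FUNCTION: render the analysis dict as a plain-text report."""
    m = analysis["overall_metrics"]
    ev = analysis["ev_metrics"]
    lines = ["Trade Analysis Report", "",
             f"{'Week':<10}{'Trades':>8}{'P&L':>10}{'Win Rate':>10}"]
    for week, pnls in analysis["weekly_data"].items():  # insertion order, no sorting
        total = sum(t["pnl"] for t in pnls)
        wins = sum(1 for t in pnls if t["pnl"] > 0)
        lines.append(f"{week:<10}{len(pnls):>8}{total:>10.2f}{wins / len(pnls):>10.1%}")
    lines.append("")
    lines.append(f"Win Rate: {m['win_rate']:.1%} | Avg Win: {m['avg_win']:.2f} | "
                 f"Avg Loss: {m['avg_loss']:.2f} | Total P&L: {m['total_pnl']:.2f}")
    lines.append(f"EV: {ev['standard_ev']:.2f} | Kelly: {ev['kelly']:.2%} | "
                 f"Sharpe: {ev['sharpe']:.2f}")
    return "\n".join(lines)  # caller prints; no side effects here
```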
Functional Programming Requirements (CRITICAL):
✅ Pure Function: No side effects
Input analysis dict not mutated
Returns formatted string
✅ Deterministic: Same analysis → same output string
✅ Immutable: Treat input as read-only
✅ Referential Transparency: Can replace call with result
Do NOT print (return string, caller prints)
Do NOT sort weeks (preserve order from aggregate_by_week)
Do NOT add colors/formatting codes (plain text only)
Acceptance Criteria:
RED: Tests fail before implementation
GREEN: All 2 tests pass
VERIFY: Break/fix cycle works
Function signature: def format_output(analysis: dict[str, Any]) -> str:
Docstring: Must include "PURE FUNCTION" marker
Run: python -c "from src.trade_analyzer import analyze_trades, format_output; analysis = analyze_trades('trade_snapshots.csv'); output = format_output(analysis); print(output[:100])"
Expected: First 100 chars of report
Run: pytest test/test_trade_analyzer.py::test_format_output -v
Expected: 2 passed
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
Functional Programming: Verify no side effects:
Check format_output's body contains no print call: sed -n '/def format_output/,/^def /p' src/trade_analyzer.py | grep "print(" → empty
US-010: Integration tests (~10 min, ~40 lines)
Implementation:
File: test/test_trade_analyzer_integration.py (create new)
Tests: End-to-end workflow with real CSV
Reference: trade_snapshots.csv
Target: ~40 lines test code
Approach (TDD RED-GREEN-VERIFY):
RED Phase (~30% - 3 min, ~40 lines): Write tests first
Create test/test_trade_analyzer_integration.py
Import from src.trade_analyzer
Write 2 test cases:
test_full_workflow_real_csv - Load → analyze → format → verify output
test_all_ev_methods_computed - Verify standard_ev, kelly, sharpe all present
Run: pytest test/test_trade_analyzer_integration.py -v
VERIFY: Tests should PASS (all code exists from US-001 through US-009) - GREEN already
GREEN Phase (~50% - 5 min): Already implemented
All functions exist from previous user stories
Integration test verifies they work together
Run: pytest test/test_trade_analyzer_integration.py -v
VERIFY: Both tests pass - GREEN phase
VERIFY Phase (~20% - 2 min): Break/fix verification
Temporarily break: Comment out one function (e.g., calculate_pnl)
Run: pytest test/test_trade_analyzer_integration.py -v
VERIFY: Tests fail (pipeline broken)
Revert change
VERIFY: Tests pass again
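The "no mocks" claim above can be illustrated with a toy pure pipeline. These are inline stand-ins only, not the real src.trade_analyzer functions; the actual integration test imports those:

```python
# Toy stand-ins for pipeline stages, purely to show why pure functions
# compose end-to-end without mocks: every stage is deterministic, so the
# whole chain can be asserted directly on real data.
def parse(rows: list[str]) -> list[float]:
    # stand-in for the CSV parsing step
    return [float(r) for r in rows]

def price_changes(values: list[float]) -> list[float]:
    # stand-in for calculate_pnl: difference between consecutive prices
    return [b - a for a, b in zip(values, values[1:])]

def total_pnl(changes: list[float]) -> float:
    # stand-in for the metrics step
    return sum(changes)

def test_full_workflow() -> None:
    rows = ["100.0", "105.0", "103.0"]
    # Compose the stages directly -- no patching, no fixtures with side effects.
    assert total_pnl(price_changes(parse(rows))) == 3.0

test_full_workflow()
```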
Functional Programming Requirements (CRITICAL):
✅ Integration tests verify pure function composition
✅ No mocks needed (pure functions compose naturally)
✅ Deterministic output verified end-to-end
Do NOT use fixtures with side effects
Do NOT mock pure functions (test real implementations)
Do NOT test implementation details (only inputs/outputs)
Acceptance Criteria:
Create test/test_trade_analyzer_integration.py
Import from src.trade_analyzer
2 test functions: test_full_workflow_real_csv, test_all_ev_methods_computed
Run: pytest test/test_trade_analyzer_integration.py -v
Expected: 2 passed
Run: pytest test/ -v
Expected: 28 tests total (26 unit + 2 integration), all pass
Run: mypy src/trade_analyzer.py --strict
Expected: exit 0
US-REVIEW-S3: Sprint 3 Review 🚧 GATE (~5 min)
Scope: US-008, US-009, US-010
Review Steps:
Run: pytest test/ -v --tb=short
Expected: 28 tests passed (22 from S1+S2 + 2+2+2 from S3)
Run: mypy src/trade_analyzer.py --strict
Expected: Success
Run: python -c "from src.trade_analyzer import analyze_trades, format_output; print('imports ok')"
Expected: "imports ok"
Run: pytest test/ --cov=src.trade_analyzer --cov-report=term-missing
Expected: Coverage > 90%
Linus 5-Layer Analysis (from linus-prompt-code-review.md):
Data Structure Analysis : "Bad programmers worry about code. Good programmers worry about data structures."
What is core data? Analysis result dict with weekly_data, overall_metrics, ev_metrics
Where does data flow? File → trades → P&L → aggregation → metrics → formatted output
Any unnecessary copying or conversion? No - pipeline flows naturally
Special Case Identification : "Good code has no special cases"
Find all if/else branches: None in orchestration layer (delegates to pure functions)
Which are real business logic? All logic in S1+S2 pure functions
Can we redesign data structures to eliminate branches? Already clean separation
Complexity Review : "If > 3 levels of indentation, redesign it"
What is the essence? Orchestrate pure functions into analysis pipeline
How many concepts does solution use? 2 (orchestration + formatting)
Can we reduce it to half? Already minimal - straight-line calls
Destructive Analysis : "Never break userspace" - backward compatibility
List all existing functionality that might be affected: S1+S2 functions unchanged
Which dependencies will be broken? None - additive changes only
How to improve without breaking anything? All new functions, previous sprints stable
Practicality Verification : "Theory and practice sometimes clash. Theory loses."
Does this problem really exist in production? Yes - need complete analysis tool
How many users encounter this? Single user (ad-hoc analysis)
Does solution complexity match problem severity? Yes - simple orchestration of pure functions
Taste Score: Good taste / Acceptable / Garbage
Test File Checks (from linus-prompt-code-review.md):
Tests import from src.trade_analyzer (not inline)
No production logic in test files
Only test_* function names allowed
Integration tests use real data (no mocks)
Functional Programming Verification:
Pure functions: is format_output free of side effects?
Deterministic: same inputs → same outputs, verified end-to-end?
Immutable: no input mutation in the orchestration layer?
Docstrings: do all pure functions include the "PURE FUNCTION" marker?
Separation: clear boundary between I/O (analyze_trades) and pure code (all calculations + format)?
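The determinism and immutability checks can be spot-checked mechanically. A toy example of the pattern (fmt is a stand-in; the real target is format_output):

```python
import copy

def fmt(metrics: dict[str, float]) -> str:
    """Stand-in pure formatter used only to demonstrate the checks."""
    return ", ".join(f"{k}={v:.2f}" for k, v in metrics.items())

metrics = {"win_rate": 0.60, "total_pnl": 12.50}
snapshot = copy.deepcopy(metrics)

out1 = fmt(metrics)
out2 = fmt(metrics)
assert out1 == out2         # deterministic: same input, same output
assert metrics == snapshot  # immutable: input not mutated
```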
Cross-Task Checks:
Verify patterns consistent across all sprint 3 functions
Check no orphaned imports or dead code
Validate error handling is uniform (propagate exceptions)
Integration tests verify full pipeline works
Gate:
If issues found: Create fix tasks (US-008a, US-009a, etc.), output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-S3 complete", output <review-passed/>
US-REVIEW-FINAL: Final Cross-Sprint Review 🚧 GATE (~10 min)
Scope: All sprints (US-001 through US-010)
Purpose: Verify cross-sprint consistency and overall quality after all individual sprint reviews have passed.
Review Steps:
Run: git log --oneline | head -20
Verify all sprint review commits exist (US-REVIEW-S1, US-REVIEW-S2, US-REVIEW-S3)
Run: pytest test/ -v
Expected: 26+ tests pass, 0 failures
Run: mypy src/ --strict
Expected: No errors
Run: pytest test/ --cov=src --cov-report=term-missing
Expected: Coverage >= 90%
Run: python -c "from src.trade_analyzer import analyze_trades, format_output; result = analyze_trades('trade_snapshots.csv'); print(format_output(result)[:200])"
Expected: Formatted report output (first 200 chars)
Cross-Sprint Consistency Checks:
Naming conventions consistent across all sprints?
All functions use snake_case
All test functions start with test_
Error handling patterns uniform?
Pure functions raise ValueError with descriptive messages
I/O functions propagate exceptions
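As an illustration of that error pattern (formula simplified, no annualization; not necessarily the US-007 implementation):

```python
import statistics

def calculate_sharpe_ratio(pnl_values: list[float]) -> float:
    """PURE FUNCTION: mean/stdev of per-trade P&L (illustrative sketch)."""
    if len(pnl_values) < 2:
        raise ValueError(f"need at least 2 P&L values, got {len(pnl_values)}")
    stdev = statistics.stdev(pnl_values)
    if stdev == 0:
        raise ValueError("zero variance: Sharpe ratio is undefined")
    return statistics.mean(pnl_values) / stdev
```

Callers let these exceptions propagate rather than catching them, matching the rule above that I/O functions do not swallow errors.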
No duplicate code between sprint modules?
All functions in single module (src/trade_analyzer.py)
No code duplication
Import structure clean (no circular dependencies)?
Single module, no internal imports
All TODOs resolved?
Run: grep -r "TODO" src/
Expected: Empty
Linus 5-Layer Analysis (Whole Feature):
Data Structure Analysis : Does data flow cleanly across all sprints?
CSV → list[dict] (S1) → list[pnl_dict] (S1) → dict[week, list] (S1) → metrics dict (S1) → EV floats (S2) → analysis dict (S3) → formatted string (S3)
Flow is linear, no backtracking, each transformation builds on previous
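A condensed sketch of that flow on in-memory data. These are stand-in implementations of two of the stages; key names, side handling, and signatures are assumptions from this plan:

```python
from collections import defaultdict
from datetime import datetime

def calculate_pnl(trades: list[dict]) -> list[dict]:
    """PURE FUNCTION (sketch): pair ENTRY/EXIT events into realized P&L."""
    open_entries: dict[int, dict] = {}
    realized = []
    for t in trades:
        if t["event"] == "ENTRY":
            open_entries[t["trade_id"]] = t
        else:  # EXIT: close against the matching entry
            entry = open_entries.pop(t["trade_id"])
            direction = 1 if entry["side"] == "LONG" else -1
            realized.append({
                "timestamp": t["timestamp"],
                "pnl": (t["price"] - entry["price"]) * direction,
            })
    return realized

def aggregate_by_week(records: list[dict]) -> dict[str, list[dict]]:
    """PURE FUNCTION (sketch): group records by ISO week (Monday start)."""
    weekly: dict[str, list[dict]] = defaultdict(list)
    for r in records:
        year, week, _ = r["timestamp"].isocalendar()
        weekly[f"{year}-W{week:02d}"].append(r)
    return dict(weekly)

trades = [
    {"trade_id": 1, "event": "ENTRY", "side": "LONG", "price": 100.0,
     "timestamp": datetime(2024, 1, 2)},
    {"trade_id": 1, "event": "EXIT", "side": "LONG", "price": 105.0,
     "timestamp": datetime(2024, 1, 3)},
]
weekly = aggregate_by_week(calculate_pnl(trades))
```

Each stage consumes the previous stage's output unchanged, which is what makes the pipeline linear with no backtracking.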
Special Case Identification : Any special cases that could be eliminated?
Zero checks in S2 (mathematical constraints, can't eliminate)
Side adjustment in S1 (domain logic, can't eliminate)
All special cases are legitimate business/math requirements
Complexity Review : Is overall architecture simple and elegant?
Yes - pure functional pipeline with clear separation
Functional Core (S1 calculations + S2 EV + S3 format) vs Imperative Shell (S3 I/O)
No deep nesting, straight-line logic
Destructive Analysis : Does complete feature break existing functionality?
No - new module, no existing dependencies
All functions composable, no breaking changes possible
Practicality Verification : Does complete solution match problem scope?
Yes - lightweight analysis tool for CSV data
No over-engineering (no database, no web UI, no async)
Appropriate for ad-hoc trading analysis
Taste Score: Good taste / Acceptable / Garbage
Functional Programming Verification (If Applicable):
Pure functions consistently used across sprints?
S1: calculate_pnl, aggregate_by_week, calculate_metrics (pure)
S2: calculate_standard_ev, calculate_kelly_criterion, calculate_sharpe_ratio (pure)
S3: format_output (pure)
Clear separation: Functional Core vs Imperative Shell?
Core: All calculation functions (pure, no I/O)
Shell: load_trades (I/O), analyze_trades (orchestration + I/O)
No side effects leaked into calculation functions?
Verified via grep: no print, no logger, no mutations
All pure functions marked in docstrings?
Checklist: verify "PURE FUNCTION" in all S1/S2/S3 calculation function docstrings
Gate:
If issues found: Create fix tasks, output <review-issues-found/>
If clean: Mark [x], commit "docs: US-REVIEW-FINAL complete", output <review-passed/>
No database storage (CSV input only, in-memory processing)
No web UI or API (console output only)
No real-time data streaming (batch CSV analysis)
No email validation or market name validation (trust input data)
No multi-threaded processing (single-threaded analysis sufficient)
No position sizing recommendations (metrics only, not advice)
No charting or visualization (text output only)
No persistence of analysis results (ephemeral output)
Python 3.10+ required for dict[str, str | int] type hints
Dependencies: csv, datetime, statistics (stdlib only, no external packages)
File I/O limited to single CSV read (no writes)
Functional programming: Pure functions for all calculations (S1+S2+S3 core), I/O isolated to shell (S3 orchestration)
Test coverage target: 90%+
All pure functions documented with "PURE FUNCTION" marker in docstrings
All tests pass? pytest test/ -v exits 0
Typecheck clean? mypy src/ --strict exits 0
Coverage adequate? pytest test/ --cov=src --cov-fail-under=90 exits 0
No TODO comments left? grep -r "TODO" src/ returns empty
Integration test passes? python -c "from src.trade_analyzer import analyze_trades, format_output; print(format_output(analyze_trades('trade_snapshots.csv')))" runs without error
STOP if any check fails. Fix before proceeding.