@bojanrajkovic
Created March 9, 2026 19:02
Loupe: Comment Re-Anchoring Design Doc

Comment Re-Anchoring Design

Summary

When a reviewer leaves a comment on a specific line or passage, that anchor point refers to a particular version of the file. As the author continues working — pushing new commits, rebasing, or amending — the underlying content shifts. Without re-anchoring, comments become misleading: they point to wrong lines or silently disappear. This design introduces a three-layer pipeline that keeps comments positioned correctly as branches evolve, and gives each comment a precise state describing how confident the system is in its current placement.

The pipeline combines two complementary techniques that none of the tools surveyed in the prior-art comparison applies together. Layer 1 applies git's own diff math: given the commit the comment was created against and the current HEAD, it builds a LineAdjustmentMap that translates line numbers through the diff hunks deterministically. Layer 2 verifies that the translated position still contains the expected text. If it does not, and Layer 1 did not flag the line as deleted, Layer 3 searches the file using a fallback chain of progressively looser text matching (whitespace normalization, prefix/suffix anchoring, substring search, edit-distance similarity) and assigns a confidence score. The result is one of four anchor states: Anchored (exact match at the translated position), Shifted (found at a new position with confidence above the threshold), Outdated (the content itself changed), or Orphaned (not found). Re-anchoring is triggered asynchronously whenever the file watcher detects a branch HEAD change, so the diff view refreshes immediately and comments snap to their new positions a moment later via SSE.

Definition of Done

  1. Unified re-anchoring framework for both diff-mode and rendered-mode comments
  2. Four-state anchor model: Anchored / Shifted / Outdated / Orphaned
  3. Fallback chain algorithm: git diff-based translation → exact text match → fuzzy match (edit distance, substring, prefix/suffix) → orphaned
  4. Progressive confidence UX — visual indicators when anchor has shifted, distinct treatment for outdated vs orphaned
  5. Structural context exploration — heading-based context for rendered-mode as a starting point, AST-level for code as a future enhancement
  6. Trigger timing integrated with the existing file watcher design (idea.md lines 47-51), with async resolution
  7. Caching strategy for line adjustment maps and resolved anchor positions
  8. Comparison to GitHub (outdated), Hypothesis (multi-selector), Phabricator (re-anchor)
  9. Data model changes: new columns on comments, new line_adjustment_maps cache table

Acceptance Criteria

reanchoring.AC1: Pipeline determines correct anchor state

  • reanchoring.AC1.1 Success: Comment on unchanged line → Anchored (Layer 1 translates, Layer 2 validates)
  • reanchoring.AC1.2 Success: Comment on line that moved (context line in different position) → Layer 2 validates at new position → Anchored
  • reanchoring.AC1.3 Success: Comment on line that moved but text at translated position doesn't match → Layer 3 fuzzy finds it → Shifted
  • reanchoring.AC1.4 Success: Comment on deleted/modified line → Outdated with best-guess position
  • reanchoring.AC1.5 Success: Comment text completely gone from file → Orphaned
  • reanchoring.AC1.6 Edge: File deleted entirely → Orphaned
  • reanchoring.AC1.7 Edge: File renamed → pipeline follows rename, re-anchors in new file
  • reanchoring.AC1.8 Edge: branch_head commit unreachable (force-push) → skip Layer 1, proceed to text matching

reanchoring.AC2: Fuzzy matching fallback chain

  • reanchoring.AC2.1 Success: Whitespace-only change (reformat) → normalized match finds it
  • reanchoring.AC2.2 Success: Exact text changed but prefix/suffix context survives → prefix/suffix search finds region
  • reanchoring.AC2.3 Success: Text partially modified → edit distance match above threshold → Shifted
  • reanchoring.AC2.4 Failure: Edit-distance similarity below threshold → Orphaned (not false-matched)
  • reanchoring.AC2.5 Edge: Duplicate text in file → prefix/suffix or structural context disambiguates

reanchoring.AC3: Pipeline bypass

  • reanchoring.AC3.1 Success: Lockfile comment skips Layers 2-3, uses Layer 1 result or Orphaned
  • reanchoring.AC3.2 Success: File over 500KB bypasses fuzzy matching
  • reanchoring.AC3.3 Success: Per-workspace override adds/removes bypass patterns

reanchoring.AC4: Caching

  • reanchoring.AC4.1 Success: LineAdjustmentMap cached after first computation, reused for subsequent comments on same file
  • reanchoring.AC4.2 Success: Resolved anchor positions cached, pipeline skipped when anchor_head = current HEAD
  • reanchoring.AC4.3 Success: Pipeline skip when branch_head = current HEAD (no re-anchoring needed)
  • reanchoring.AC4.4 Success: Cache entries cleaned up on review deletion

reanchoring.AC5: Async trigger and resolution

  • reanchoring.AC5.1 Success: Branch HEAD change triggers re-anchoring without blocking diff refresh
  • reanchoring.AC5.2 Success: Resolved positions pushed to client via SSE
  • reanchoring.AC5.3 Success: Comments show transient "resolving" state before snapping to final position

reanchoring.AC6: Structural context (rendered-mode)

  • reanchoring.AC6.1 Success: Rendered-mode comment stores heading breadcrumb on creation
  • reanchoring.AC6.2 Success: Fuzzy matching searches within heading scope before full file
  • reanchoring.AC6.3 Edge: Heading structure changed → falls back to full-file search

Glossary

  • Anchor: The stored reference that ties a comment to a specific location in a file — either line numbers in diff-mode or a text quote selector in rendered-mode. Re-anchoring is the process of updating this reference when the file changes.
  • Anchor state: One of four values (Anchored, Shifted, Outdated, Orphaned) indicating how confidently the pipeline resolved a comment's position after file changes.
  • branch_head: The git commit SHA recorded when a comment was created. The "old" side of the diff when re-anchoring: identifies which version of the file the comment originally referenced.
  • context_snippet: For diff-mode comments, the verbatim text of the line(s) the comment was placed on, stored at creation time. Used in Layer 2 to verify that the translated position still contains the same content.
  • diff hunk: A contiguous block of changed lines in a unified diff, preceded by a @@ header. The LineAdjustmentMap is built by iterating over hunks and recording how each line number shifts.
  • Drizzle ORM: A TypeScript ORM used in this project for type-safe SQLite queries. New schema additions follow existing Drizzle conventions.
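  • edit distance (Levenshtein / Dice similarity): Measures of how different two strings are. Levenshtein distance counts single-character insertions, deletions, and substitutions; Dice similarity measures the overlap of character bigrams (adjacent character pairs). Used in Layer 3 to score fuzzy candidates. Scores are expressed as a similarity between 0 and 1 (a raw Levenshtein distance has to be normalized by length first), and ≥ 0.8 is treated as a match.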
  • heading breadcrumb / heading_path: An array of Markdown heading labels from the document root down to the section containing a rendered-mode comment (e.g. ["Getting Started", "Installation", "macOS"]). Stored at creation and used in Layer 3 to narrow fuzzy search scope.
  • Hypothesis: A web annotation platform that attaches comments to web pages using text-quote selectors. Prior art for text-based anchor fallback, as distinct from git-based line math.
  • LineAdjustmentMap: A data structure mapping old line numbers to new line numbers for a specific file between two commits. Built from diff hunks; keyed by (file_path, old_commit, new_commit) and cached in the database.
  • Phabricator: A code review tool originally developed at Facebook (now Meta) that uses line adjustment maps to translate comment positions when a revision is updated. Prior art for git-based line translation.
  • pipeline bypass: Mechanism that skips Layers 2 and 3 for files where fuzzy search is meaningless — lockfiles, minified bundles, large generated files. Layer 1 is still attempted; if it fails, the comment goes directly to Orphaned.
  • rendered-mode: Comment mode where the target file is displayed as rendered output (e.g. Markdown as formatted prose) rather than as a raw diff. Anchoring uses a TextQuoteSelector rather than line numbers.
  • SSE (Server-Sent Events): A one-way HTTP streaming mechanism used by Loupe to push updates from server to client over a single persistent connection.
  • TextQuoteSelector: A W3C Web Annotation standard for identifying a text passage by its exact content plus short prefix and suffix context strings. Used as the anchor for rendered-mode comments.
  • three-dot diff vs two-dot diff: Three-dot (base...branch) diffs from the merge-base — used by gitDiff() for review display. Two-dot (old..new) diffs between specific commits — needed by the pipeline to compute LineAdjustmentMap.
  • Tree-sitter: A parser-generator framework for source code syntax trees. Mentioned as a possible future mechanism for symbol-based structural context in diff-mode comments, deferred until fuzzy matching produces false positives.

Architecture

Approach: Layered Pipeline

Re-anchoring uses a three-layer pipeline that combines git-based positional translation with text-based validation. Each layer narrows the problem for the next:

Comment needs re-anchoring (branch_head ≠ current HEAD)
  │
  ├─ Layer 1: Git Diff Translation
  │   Build LineAdjustmentMap from diff hunks between branch_head and current HEAD.
  │   Translate comment's line range to new positions.
  │
  ├─ Layer 2: Text Validation
  │   Verify that the translated position still contains the expected text
  │   (context_snippet for diff-mode, TextQuoteSelector.exact for rendered-mode).
  │   If yes → Anchored. If line was deleted → Outdated. Otherwise → Layer 3.
  │
  └─ Layer 3: Fuzzy Fallback
      Search the file for the anchor text using progressively looser matching:
      normalized exact → prefix/suffix search → substring → edit distance.
      If found above threshold → Shifted. Otherwise → Orphaned.

This combines Phabricator's deterministic line math with Hypothesis-style text matching. Phabricator has the former and Hypothesis has the latter; none of the systems compared in this document does both.

Anchor State Model

Four states, reflecting progressive degradation through the pipeline:

| State | Meaning | Determined by | Behavior |
|---|---|---|---|
| Anchored | Comment is exactly where it should be | Layer 1 translated + Layer 2 validated text match | Normal display, no indicator |
| Shifted | Comment moved, re-anchored with confidence | Layer 3 fuzzy match above threshold | Display at new position with "shifted" badge |
| Outdated | Underlying content changed substantively | Layer 1 flagged line as deleted/modified | "Outdated" marker, original context preserved |
| Orphaned | Cannot find the anchor | All layers failed, or file deleted | Surfaced (not hidden) at file or review level |

The key distinction between Shifted and Outdated: Shifted means the same content moved. Outdated means the content itself changed — the comment may no longer be relevant.
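
The state table can be read as a small decision function. The sketch below is illustrative only: the LayerOutcome shape, the function name, and the constant are assumptions rather than Loupe's actual API, and the 0.8 threshold is the initial value this design proposes.

```typescript
// Hypothetical types for illustration; not part of the existing codebase.
type AnchorState = "anchored" | "shifted" | "outdated" | "orphaned";

interface LayerOutcome {
  fileExists: boolean;             // false when the file was deleted
  exactMatchAtTranslated: boolean; // Layer 2: text at the translated position matches
  lineDeleted: boolean;            // Layer 1: the commented line was deleted/modified
  fuzzyConfidence: number | null;  // Layer 3: best score, or null if nothing was found
}

const SHIFT_THRESHOLD = 0.8; // initial value from this design; expected to be tuned

function determineAnchorState(o: LayerOutcome): AnchorState {
  if (!o.fileExists) return "orphaned";
  if (o.exactMatchAtTranslated) return "anchored";
  if (o.lineDeleted) return "outdated";
  if (o.fuzzyConfidence !== null && o.fuzzyConfidence >= SHIFT_THRESHOLD) {
    return "shifted";
  }
  return "orphaned";
}
```

The checks run in pipeline order, so a deleted line is reported as Outdated before any fuzzy score is consulted.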

Orphaned comments are surfaced, not hidden. GitHub hides outdated comments from Files Changed, a recurring complaint in its community discussions because reviewers can't tell whether their feedback was addressed. Loupe surfaces orphans with their original context so nothing is silently lost. The exact placement of orphaned comments in the UI (nearest line, file header, detached panel) is deferred to the inline commenting UI design.

Pipeline Bypass

Noisy files skip Layers 2 and 3 entirely. If Layer 1's line translation works, use it. If not, go straight to Orphaned. No fuzzy searching through lockfiles.

Default bypass patterns:

  • *.lock, *-lock.* (pnpm-lock.yaml, package-lock.json, yarn.lock, Gemfile.lock)
  • *.min.js, *.min.css (minified bundles)
  • Generated files (*.generated.*, *.pb.go, *.g.dart)
  • Files over 500KB (catch-all for large generated content)

Bypass patterns are a hardcoded default list with per-workspace override for projects where unusual files are review-worthy.
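
A minimal sketch of how the bypass check might look, assuming patterns are matched against the file's base name, that "500KB" means 500 × 1024 bytes, and that the generated-file pattern is treated as *.generated.*. The function names and the overrides shape are hypothetical.

```typescript
// Default patterns taken from the list above.
const DEFAULT_BYPASS_PATTERNS = [
  "*.lock", "*-lock.*", "*.min.js", "*.min.css",
  "*.generated.*", "*.pb.go", "*.g.dart",
];
const MAX_FUZZY_BYTES = 500 * 1024; // assumption: "500KB" read as 500 KiB

// Convert a simple glob (only "*" is special) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// Hypothetical per-workspace override shape.
interface BypassOverrides {
  add?: string[];
  remove?: string[];
}

function shouldBypassFuzzy(
  filePath: string,
  sizeBytes: number,
  overrides: BypassOverrides = {},
): boolean {
  if (sizeBytes > MAX_FUZZY_BYTES) return true;
  const removed = overrides.remove ?? [];
  const patterns = DEFAULT_BYPASS_PATTERNS
    .filter((p) => removed.indexOf(p) === -1)
    .concat(overrides.add ?? []);
  const baseName = filePath.split("/").pop() ?? filePath;
  return patterns.some((p) => globToRegExp(p).test(baseName));
}
```

A workspace that reviews snapshot files could pass { add: ["*.snap"] }, and one that wants fuzzy matching on lockfiles could pass { remove: ["*.lock"] }.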


Pipeline Algorithm

Layer 1: Git Diff Translation

Input: comment's branch_head, current HEAD, file_path, line numbers.

  1. Compute git diff <branch_head>..<current_HEAD> -- <file_path>.
  2. Parse the diff hunks into a LineAdjustmentMap: for each hunk, build a mapping of old_line → new_line.
    • Context lines: map 1:1 (line exists in both old and new).
    • Deleted lines: map to the nearest surviving line with a deleted flag.
    • Added lines: shift subsequent line mappings by the insertion size.
  3. Translate the comment's line range through the map.
  4. Output: new line numbers + metadata (lines_deleted: boolean, offset: number).
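
The hunk-walking steps above can be sketched as follows. This is a sketch under stated assumptions, not the planned implementation: the class and method names are hypothetical, the input is raw unified diff text for a single file, and a deleted line is snapped to the next position in the new file (one reasonable reading of "nearest surviving line").

```typescript
interface MappedLine {
  line: number;     // 1-based line number in the new file
  deleted: boolean; // true when the old line no longer exists
}

class LineMapSketch {
  private kept = new Map<number, number>();    // old context line -> new line
  private removed = new Map<number, number>(); // deleted old line -> snapped new line
  private tails: { oldNext: number; delta: number }[] = []; // cumulative offset after each hunk

  static fromUnifiedDiff(diff: string): LineMapSketch {
    const map = new LineMapSketch();
    let oldLine = 0, newLine = 0, oldLeft = 0, newLeft = 0;
    for (const raw of diff.split("\n")) {
      const header = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(raw);
      if (header) {
        oldLeft = header[2] === undefined ? 1 : Number(header[2]);
        newLeft = header[4] === undefined ? 1 : Number(header[4]);
        // A zero count means the start number names the line before the hunk.
        oldLine = Number(header[1]) + (oldLeft === 0 ? 1 : 0);
        newLine = Number(header[3]) + (newLeft === 0 ? 1 : 0);
        continue;
      }
      if (oldLeft === 0 && newLeft === 0) continue; // file headers, non-hunk lines
      const tag = raw.charAt(0);
      if (tag === " ") {
        map.kept.set(oldLine++, newLine++); // context line: exists on both sides
        oldLeft--; newLeft--;
      } else if (tag === "-") {
        map.removed.set(oldLine++, newLine); // deleted: snap to next new position
        oldLeft--;
      } else if (tag === "+") {
        newLine++; // added: shifts everything after it
        newLeft--;
      } else {
        continue; // e.g. the "\ No newline at end of file" marker
      }
      if (oldLeft === 0 && newLeft === 0) {
        map.tails.push({ oldNext: oldLine, delta: newLine - oldLine });
      }
    }
    return map;
  }

  mapLine(oldLine: number): MappedLine {
    const kept = this.kept.get(oldLine);
    if (kept !== undefined) return { line: kept, deleted: false };
    const snapped = this.removed.get(oldLine);
    if (snapped !== undefined) return { line: snapped, deleted: true };
    // Outside every hunk: apply the offset of the last hunk above this line.
    let delta = 0;
    for (const t of this.tails) if (t.oldNext <= oldLine) delta = t.delta;
    return { line: oldLine + delta, deleted: false };
  }

  mapRange(start: number, end: number) {
    let linesDeleted = false;
    for (let l = start; l <= end; l++) {
      if (this.removed.has(l)) linesDeleted = true;
    }
    const a = this.mapLine(start).line;
    const b = this.mapLine(end).line;
    return { start: a, end: Math.max(a, b), linesDeleted };
  }
}
```

Lines outside any hunk never appear in the diff, so the sketch records one cumulative offset per hunk instead of storing every line of the file.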

For rendered-mode comments, Layer 1 translates source_line_start/end — the best-effort source mapping stored on the anchor — not the TextQuoteSelector itself. The selector is text-based and doesn't need line translation.

Layer 2: Text Validation

Input: translated position from Layer 1, plus the comment's text anchor (context_snippet for diff-mode, TextQuoteSelector.exact for rendered-mode).

  1. Extract text at the translated position in the current file.
  2. Compare against the stored anchor text.
  3. Exact match → state = Anchored. Done.
  4. No match, and Layer 1 flagged the line as deleted/modified → state = Outdated. Store translated position as best guess. Done.
  5. No match, and Layer 1 didn't flag deletion → proceed to Layer 3.
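
For diff-mode comments, the validation steps reduce to a small pure function. The following is a sketch with hypothetical names; it assumes the current file has already been split into lines and that context_snippet joins multi-line anchors with "\n".

```typescript
// Hypothetical result type: "needs_fuzzy" means "hand over to Layer 3".
type ValidationResult = "anchored" | "outdated" | "needs_fuzzy";

function validateAtPosition(
  fileLines: string[],    // current file content, one entry per line
  startLine: number,      // 1-based translated start from Layer 1
  endLine: number,        // 1-based translated end, inclusive
  contextSnippet: string, // text stored when the comment was created
  linesDeleted: boolean,  // Layer 1 flag for deleted/modified lines
): ValidationResult {
  // Extract the text currently sitting at the translated position.
  const actual = fileLines.slice(startLine - 1, endLine).join("\n");
  if (actual === contextSnippet) return "anchored";
  // No exact match: a deletion flag settles the state, otherwise keep searching.
  return linesDeleted ? "outdated" : "needs_fuzzy";
}
```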

For rendered-mode: search the rendered document for TextQuoteSelector.exact in a window around the translated source_line_start. Validate with prefix/suffix if found.

Layer 3: Fuzzy Fallback

Input: anchor text and full current file content. Only reached when Layer 2 validation failed and the line wasn't flagged as deleted.

Fallback chain (stop at first match above confidence threshold):

  1. Normalized exact match: strip whitespace differences, normalize quotes/encoding. Handles reformatting without semantic change.
  2. Prefix/suffix anchored search: search for prefix + ??? + suffix pattern. If the exact text changed but surrounding context survived, this finds the region.
  3. Substring match: if the exact text was trimmed or extended, search for the longest common substring above a minimum length.
  4. Edit distance: Levenshtein or Dice similarity against sliding windows of similar length. Threshold: ≥ 0.8 similarity.
  5. Structural hint (when available): a scoping rule rather than a separate matching step. If heading-based structural context was stored, run the steps above within that scope first to reduce false matches in repeated content, then fall back to the full file.

Each match gets a confidence score. Best match above threshold → Shifted. No match above threshold → Orphaned.

Confidence thresholds may differ between diff-mode (code) and rendered-mode (prose). Code is more repetitive and structurally similar, requiring higher thresholds to avoid false matches.
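
To make the chain concrete, here is a deliberately reduced sketch covering only two of its steps: the normalized exact match and a Dice-similarity scan. It assumes single-line anchors compared one file line at a time. The prefix/suffix and substring steps are omitted, and every name in it is hypothetical.

```typescript
interface FuzzyMatch {
  line: number;       // 1-based line where the anchor text was found
  confidence: number; // 1 for a normalized exact match, else the Dice score
  method: "normalized" | "dice";
}

// Collapse whitespace runs so reformatting alone does not break a match.
const normalizeText = (s: string): string => s.replace(/\s+/g, " ").trim();

function bigramCounts(s: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < s.length - 1; i++) {
    const g = s.slice(i, i + 2);
    counts.set(g, (counts.get(g) ?? 0) + 1);
  }
  return counts;
}

// Sørensen-Dice coefficient over character bigrams: 2 * shared / (|A| + |B|).
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const ga = bigramCounts(a);
  const gb = bigramCounts(b);
  let shared = 0;
  ga.forEach((n, g) => {
    shared += Math.min(n, gb.get(g) ?? 0);
  });
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

function fuzzyFindLine(
  fileLines: string[],
  anchorText: string,
  threshold = 0.8,
): FuzzyMatch | null {
  const target = normalizeText(anchorText);
  // Normalized exact match: stop at the first hit.
  for (let i = 0; i < fileLines.length; i++) {
    if (normalizeText(fileLines[i]) === target) {
      return { line: i + 1, confidence: 1, method: "normalized" };
    }
  }
  // Similarity scan: keep the best candidate at or above the threshold.
  let best: FuzzyMatch | null = null;
  for (let i = 0; i < fileLines.length; i++) {
    const score = diceSimilarity(normalizeText(fileLines[i]), target);
    if (score >= threshold && (best === null || score > best.confidence)) {
      best = { line: i + 1, confidence: score, method: "dice" };
    }
  }
  return best; // null means the caller should mark the comment Orphaned
}
```

A multi-line anchor would need windows of several lines rather than single lines, and the threshold parameter is where a per-mode value (code versus prose) could be passed in.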


Caching Strategy

Cache 1: Line Adjustment Maps

Keyed by (file_path, old_commit, new_commit). Deterministic — the same diff between two commits always produces the same map.

  • Storage: line_adjustment_maps DB table.
  • Sharing: A review with 50 comments across 10 files needs only 10 maps, not 50.
  • Invalidation: Never — commit SHAs are immutable. Cache grows monotonically.
  • Cleanup (v1): Inline during normal operations. Cascade-delete when review is deleted. Prune maps not referenced by any active review when a review completes.
  • Cleanup (future): Background sweep for maps older than 30 days not referenced by any active review, if cache size becomes a concern.

Cache 2: Resolved Anchor Positions

Keyed by (comment_id, current_HEAD). The final pipeline result — what the UI consumes.

  • Storage: anchor_state, resolved_anchor, and anchor_head columns on the comments table.
  • Invalidation: When current_HEAD changes (new commit pushed). Old resolved positions become stale and the pipeline re-runs on next trigger.
  • Cleanup: Self-cleaning — positions are overwritten on re-run. Cascade-delete handles review deletion.

Pipeline Skip Optimization

If a comment's branch_head equals current_HEAD, skip the entire pipeline — the comment is at its original position by definition. This is the common case when viewing a review without new commits.
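
The skip rule and the two caches combine into a simple per-comment plan. The sketch below uses a hypothetical CommentRecord shape that mirrors the branch_head and anchor_head columns, and it also shows why comments that share a file and starting commit share one map.

```typescript
// Hypothetical record shape mirroring the columns described in this design.
interface CommentRecord {
  id: string;
  filePath: string;
  branchHead: string;        // commit the comment was created against
  anchorHead: string | null; // HEAD the anchor was last resolved against
}

type ReanchorPlan = "skip_original" | "use_cached" | "run_pipeline";

function planReanchor(c: CommentRecord, currentHead: string): ReanchorPlan {
  if (c.branchHead === currentHead) return "skip_original"; // nothing has moved
  if (c.anchorHead === currentHead) return "use_cached";    // Cache 2 hit
  return "run_pipeline";
}

// Distinct (file_path, old_commit, new_commit) keys: one map per key (Cache 1).
function mapKeysNeeded(comments: CommentRecord[], currentHead: string): string[] {
  const keys: string[] = [];
  for (const c of comments) {
    if (planReanchor(c, currentHead) !== "run_pipeline") continue;
    const key = JSON.stringify([c.filePath, c.branchHead, currentHead]);
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  return keys;
}
```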


Trigger Timing

The file watcher design in idea.md (lines 47-51) already specifies watching .git/refs/heads/<branch> to detect local changes and pushing updates via SSE. Re-anchoring hooks into this same event.

When the watcher detects a ref change:

  1. Server re-runs git diff and pushes the updated diff view via SSE (existing design).
  2. Server triggers re-anchoring for all comments on the affected review (new).
  3. Re-anchoring runs asynchronously — the diff refresh is not blocked. Comments briefly show a "resolving..." state, then snap to their new positions when the pipeline completes.
  4. Resolved positions are pushed to the client via the same SSE connection.

This async model is important for perceived performance: the diff (which the reviewer is waiting for) appears immediately, and comments resolve shortly after.
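
As an illustration of what travels over the SSE connection, the sketch below formats a transient "resolving" message followed by final positions. The "event:" and "data:" framing is the standard Server-Sent Events wire format; the event names and payload shape are assumptions, since this design does not fix them.

```typescript
// Hypothetical payload shape; the design leaves the exact fields open.
interface AnchorUpdate {
  commentId: string;
  state: "resolving" | "anchored" | "shifted" | "outdated" | "orphaned";
  line: number | null; // null while resolving or when orphaned
}

// SSE framing: an "event:" line, a "data:" line, then a blank line.
function formatSseEvent(event: string, data: AnchorUpdate): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Messages for one ref change: each comment is first marked "resolving",
// then the final positions follow once the pipeline has finished.
function anchorMessages(commentIds: string[], resolved: AnchorUpdate[]): string[] {
  const pending = commentIds.map((id) =>
    formatSseEvent("anchor-resolving", { commentId: id, state: "resolving", line: null }),
  );
  const done = resolved.map((u) => formatSseEvent("anchor-resolved", u));
  return pending.concat(done);
}
```

Because JSON.stringify emits a single line, each payload fits in one data field and needs no multi-line handling.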


Structural Context

Starting Point: Heading-Based Context (Rendered-Mode Only)

For rendered-mode comments on Markdown files, store the heading breadcrumb at comment creation time:

{
  "selector": { "type": "TextQuoteSelector", ... },
  "source_line_start": 12,
  "source_line_end": 14,
  "branch_head": "e4f5g6h",
  "heading_path": ["Getting Started", "Installation", "macOS"]
}

This is a new optional field on RenderedAnchor. During Layer 3 fuzzy matching, search within the section under the same heading path first, falling back to full-file search if the heading structure changed.

Implementation is lightweight: parse Markdown headings (regex or simple AST), walk up to build the breadcrumb. No language-aware parsing needed.
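
A regex-based version of that walk might look like the sketch below. It is an assumption-laden illustration: it handles only ATX headings (lines starting with #), skips fenced code blocks, ignores Setext headings, and the function name is hypothetical.

```typescript
// Returns the heading breadcrumb in effect at a 1-based source line.
function headingPathAt(markdown: string, lineNumber: number): string[] {
  const stack: { level: number; text: string }[] = [];
  const lines = markdown.split("\n");
  const last = Math.min(lineNumber, lines.length);
  let inFence = false;
  for (let i = 0; i < last; i++) {
    const line = lines[i];
    // Toggle on fence markers so "#" inside code blocks is not read as a heading.
    if (/^\s{0,3}(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const m = /^\s{0,3}(#{1,6})\s+(.*?)(?:\s+#+)?\s*$/.exec(line);
    if (!m) continue;
    const level = m[1].length;
    // A new heading closes every open heading at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    stack.push({ level, text: m[2] });
  }
  return stack.map((h) => h.text);
}
```

For a document whose sections are Getting Started > Installation > macOS, a comment inside the macOS section would yield ["Getting Started", "Installation", "macOS"], matching the heading_path example above.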

Future Enhancement: Symbol-Based Context (Code)

For diff-mode comments on code files, storing the enclosing function/class name would disambiguate fuzzy matches in repetitive code. This requires language-aware parsing (Tree-sitter, LSP symbol extraction, or regex heuristics) and is significantly more complex to implement.

Do not implement symbol-based context initially. Add it only if fuzzy matching produces false positives in practice. The trigger: multiple fuzzy matches with similar confidence scores in the same file, where structural context would have disambiguated.


Existing Patterns

Git Infrastructure

gitDiff(base, branch) in src/lib/server/git.ts produces raw unified diff strings using three-dot diff (git diff base...branch). The re-anchoring pipeline needs two-dot diff (git diff old_commit..new_commit) for computing line translations between specific commits. This is a new function alongside the existing one, not a replacement.
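
The difference between the two invocations is small enough to show as argument builders. The helper names below are hypothetical, and the SHA check assumes SHA-1 object names (7 to 40 hex characters); the resulting arrays would be passed to whatever process-spawning call the existing gitDiff() uses.

```typescript
// Assumption: commits are identified by abbreviated or full SHA-1 hex strings.
const COMMIT_SHA = /^[0-9a-f]{7,40}$/;

// Two-dot: diff between two specific commits, limited to one file.
function twoDotDiffArgs(oldCommit: string, newCommit: string, filePath: string): string[] {
  for (const sha of [oldCommit, newCommit]) {
    if (!COMMIT_SHA.test(sha)) throw new Error(`not a commit SHA: ${sha}`);
  }
  // "--" separates revisions from paths so a file name is never read as a revision.
  return ["diff", `${oldCommit}..${newCommit}`, "--", filePath];
}

// Three-dot: diff from the merge-base of base and branch to branch (review display).
function threeDotDiffArgs(base: string, branch: string): string[] {
  return ["diff", `${base}...${branch}`];
}
```

Validating the commit arguments before building the command keeps a malformed value from being interpreted as a git option.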

Diff Parsing

parseDiff() in src/lib/server/diff.ts returns TokenizedDiffFile[] with TokenizedBlock and TokenizedLine structures (src/lib/types/diff.ts). The LineAdjustmentMap builds from raw diff hunks (before tokenization), so it operates at the same level as diff2html's parser — it can potentially reuse diff2html's hunk parsing.

Caching

The shiki highlighter in src/lib/server/highlighter.ts uses a module-level singleton pattern. The line adjustment map cache uses a different pattern (DB-backed) because maps need to persist across server restarts and be shared across requests.

Database

The DB layer (src/lib/server/db/) uses Drizzle ORM with better-sqlite3. New columns on comments and the new line_adjustment_maps table follow the existing schema conventions: text PKs with UUID defaults, ISO 8601 text timestamps, JSON columns via text({ mode: "json" }).


Data Model Changes

New columns on comments

anchor_state: text("anchor_state", {
  enum: ["anchored", "shifted", "outdated", "orphaned"]
}).notNull().default("anchored"),

resolved_anchor: text("resolved_anchor", { mode: "json" })
  .$type<ResolvedAnchor | null>()
  .default(null),

anchor_head: text("anchor_head"),
  // HEAD sha the anchor was last resolved against
  // null = resolved against original branch_head

ResolvedAnchor holds the translated position — new line numbers for diff-mode, new text offset for rendered-mode. Shape TBD during implementation.

New table: line_adjustment_maps

line_adjustment_maps (
  id           text primary key,
  file_path    text not null,
  old_commit   text not null,
  new_commit   text not null,
  map_data     text not null,  -- JSON: serialized LineAdjustmentMap
  created_at   text not null,
  unique(file_path, old_commit, new_commit)
)

New field on RenderedAnchor (optional)

heading_path?: string[]
// e.g. ["Getting Started", "Installation", "macOS"]

No migration needed — this is a new optional field in the JSON anchor blob.


Prior Art Comparison

| System | Re-anchoring | States | Text Matching | Git Translation | Structural Context |
|---|---|---|---|---|---|
| GitHub | None | Active / Outdated | No | No | No |
| Phabricator | Line adjustment maps from hunks | Active / Ghost | No | Yes | No |
| Hypothesis | Multi-selector fallback | Anchored / Orphaned | Yes (exact + fuzzy) | No (web, no VCS) | No |
| Apache Annotator | Dice coefficient sliding window | Anchored / Failed | Yes (Dice similarity) | No (web, no VCS) | No |
| Loupe | Layered pipeline | Anchored / Shifted / Outdated / Orphaned | Yes (full fallback chain) | Yes (LineAdjustmentMap) | Future (heading path) |

Among the systems compared here, Loupe is the only one combining git-based positional translation with text-based validation. Its four-state model with progressive confidence distinguishes more cases than the two-state models of the other systems. Async resolution via the file watcher means comments re-anchor live as the branch evolves.


Implementation Phases

Phase 1: Data Model and Types

Goal: Schema changes and TypeScript type definitions for the re-anchoring framework.

Components:

  • Migration adding anchor_state, resolved_anchor, anchor_head columns to comments table
  • Migration adding line_adjustment_maps table
  • TypeScript types in src/lib/server/db/schema.ts: ResolvedAnchor, anchor state enum, LineAdjustmentMap table definition
  • Updated RenderedAnchor type with optional heading_path field
  • LineAdjustmentMap TypeScript type in src/lib/server/reanchor/ (the serializable map structure)

Dependencies: Persistence layer (PR #24) merged.

Done when: Migration applies cleanly, types compile, existing tests pass.

Phase 2: LineAdjustmentMap (Layer 1)

Goal: Parse git diff hunks into a line adjustment map and translate line ranges.

Components:

  • src/lib/server/reanchor/line-map.ts: LineAdjustmentMap class with newFromHunks() to build from parsed diff, mapLine() to translate a line number, and mapRange() for multi-line comments
  • src/lib/server/reanchor/git-diff.ts — wrapper around git diff <old>..<new> -- <file> (two-dot diff between specific commits)
  • Unit tests covering: context line translation, deleted line snapping, added line offset, multi-hunk files, empty diffs

Dependencies: Phase 1 (types).

Done when: LineAdjustmentMap correctly translates line numbers through diff hunks, including deleted-line snapping and offset computation. All unit tests pass.

Phase 3: Text Validation and Fuzzy Matching (Layers 2 & 3)

Goal: Text-based anchor verification and fuzzy fallback search.

Components:

  • src/lib/server/reanchor/text-match.ts — text matching functions: exactMatch(), normalizedMatch(), prefixSuffixSearch(), substringMatch(), editDistanceMatch(), confidenceScore()
  • src/lib/server/reanchor/validate.ts — Layer 2 validation: extract text at position, compare against stored anchor, determine if match/mismatch/deleted
  • Unit tests covering: exact match, whitespace normalization, prefix/suffix finding, edit distance thresholds, confidence scoring, diff-mode vs rendered-mode paths

Dependencies: Phase 1 (types).

Done when: Text validation correctly identifies matches and mismatches. Fuzzy fallback chain finds shifted text with appropriate confidence scores. All unit tests pass.

Phase 4: Pipeline Orchestrator

Goal: Connect all three layers into the unified re-anchoring pipeline with state determination and bypass logic.

Components:

  • src/lib/server/reanchor/pipeline.ts — main reanchorComment() function: runs Layer 1 → 2 → 3, determines final anchor state, returns resolved position
  • src/lib/server/reanchor/bypass.ts — pipeline bypass pattern matching: default patterns, per-workspace overrides
  • src/lib/server/reanchor/index.ts — public API: reanchorComments(reviewId, currentHead) batch function
  • Integration tests covering: full pipeline for each state (Anchored, Shifted, Outdated, Orphaned), bypass pattern skipping, batch processing multiple comments

Dependencies: Phase 2 (Layer 1), Phase 3 (Layers 2 & 3).

Done when: Pipeline correctly determines anchor state for all four cases. Bypass patterns prevent fuzzy matching on noisy files. Batch function processes all comments for a review. All tests pass.

Phase 5: Caching

Goal: Persist line adjustment maps and resolved anchor positions to avoid redundant computation.

Components:

  • DB operations in src/lib/server/reanchor/cache.ts — CRUD for line_adjustment_maps table, read/write for anchor_state/resolved_anchor/anchor_head columns on comments
  • Cache integration into pipeline: check cache before computing, write after computing
  • Cache cleanup on review completion/deletion (inline, via existing cascade deletes for comments; explicit cleanup for line_adjustment_maps)
  • Unit tests covering: cache hit/miss, stale cache detection (anchor_head ≠ current HEAD), cleanup on review delete

Dependencies: Phase 4 (pipeline).

Done when: Pipeline uses cached results when available. Cache is populated on first run. Stale entries are detected and recomputed. Cleanup removes orphaned map entries. All tests pass.

Phase 6: File Watcher Integration

Goal: Trigger async re-anchoring when the branch HEAD changes, integrated with the existing file watcher and SSE design.

Components:

  • Hook into the ref-change event from the file watcher (.git/refs/heads/<branch> — design specified in idea.md lines 47-51)
  • Async re-anchoring: dispatch pipeline run without blocking diff refresh
  • SSE push of resolved anchor positions to client after pipeline completes
  • Client-side handling: "resolving..." transient state on comments, snap to resolved positions on SSE update

Dependencies: Phase 5 (caching), file watcher infrastructure (separate implementation — this phase integrates with it).

Done when: Branch HEAD change triggers re-anchoring. Diff refresh is not blocked. Resolved positions arrive via SSE. Comments transition through resolving → final state.

Phase 7: Heading-Based Structural Context

Goal: Store and use Markdown heading breadcrumbs to improve fuzzy matching accuracy for rendered-mode comments.

Components:

  • src/lib/server/reanchor/heading.ts — extract heading path from Markdown source at a given line number
  • Integration into comment creation: populate heading_path on RenderedAnchor when a rendered-mode comment is created
  • Integration into Layer 3: search within the matching heading section first, fall back to full-file search
  • Unit tests covering: heading extraction from Markdown, heading-scoped fuzzy search, fallback when heading structure changes

Dependencies: Phase 4 (pipeline), inline commenting UI (separate — this phase adds structural context to existing rendered-mode anchors).

Done when: Rendered-mode comments store heading breadcrumbs. Fuzzy matching uses heading scope to reduce false matches. Falls back gracefully when headings change. All tests pass.

Additional Considerations

Confidence threshold tuning: The edit distance threshold (≥ 0.8) and the boundary between Shifted and Orphaned are initial values. Real-world usage will reveal whether these need adjustment. Consider logging match confidence scores during early usage to inform tuning.

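File renames: Git detects renames when diffing. When Layer 1 sees a rename (git diff reports "rename from <old path>" and "rename to <new path>" in the file header), the pipeline should follow the rename and search the new filename. Rename detection needs both paths in the diff's scope, so a diff restricted to the old file_path alone will report a deletion instead; Layer 1 will likely need to widen the pathspec or run a separate rename check in that case. This is a detail within Layer 1, not a separate mechanism.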

Force-push / rebase: A force-push replaces the branch history. The branch_head stored on the comment may no longer be reachable from any ref, and may eventually be garbage-collected. Layer 1 should handle git diff failures gracefully: if the old commit is unavailable, skip Layer 1 and go directly to text matching (Layers 2-3 using the current file content).

Multi-commit jumps: A single git diff branch_head..current_HEAD handles any number of intermediate commits in one operation. Unlike Phabricator (which chains maps through revisions), we diff directly between the two endpoints. This is simpler and sufficient — we don't need to trace the path through intermediate states.

Background cache cleanup (future): If the line_adjustment_maps table grows large, add a periodic sweep that prunes entries older than 30 days not referenced by any active review. Not needed for v1 — inline cleanup during review completion is sufficient for a local app.
