When a reviewer leaves a comment on a specific line or passage, that anchor point refers to a particular version of the file. As the author continues working — pushing new commits, rebasing, or amending — the underlying content shifts. Without re-anchoring, comments become misleading: they point to wrong lines or silently disappear. This design introduces a three-layer pipeline that keeps comments positioned correctly as branches evolve, and gives each comment a precise state describing how confident the system is in its current placement.
The pipeline works by combining two complementary techniques that no existing tool does together. Layer 1 applies git's own diff math: given the commit the comment was created against and the current HEAD, it builds a LineAdjustmentMap that translates line numbers through the diff hunks deterministically. Layer 2 verifies that the translated position still contains the expected text. If it does not, Layer 3 searches the file using a fallback chain of progressively looser text matching — whitespace normalization, prefix/suffix anchoring, substring search, edit-distance similarity — and assigns a confidence score. The result is one of four anchor states: Anchored (exact match), Shifted (found at a new position with confidence), Outdated (content changed), or Orphaned (not found). Re-anchoring triggers asynchronously whenever the file watcher detects a branch HEAD change, so the diff view refreshes immediately and comments snap to their new positions a moment later via SSE.
- Unified re-anchoring framework for both diff-mode and rendered-mode comments
- Four-state anchor model: Anchored / Shifted / Outdated / Orphaned
- Fallback chain algorithm: git diff-based translation → exact text match → fuzzy match (edit distance, substring, prefix/suffix) → orphaned
- Progressive confidence UX — visual indicators when anchor has shifted, distinct treatment for outdated vs orphaned
- Structural context exploration — heading-based context for rendered-mode as a starting point, AST-level for code as a future enhancement
- Trigger timing integrated with the existing file watcher design (`idea.md` lines 47-51), with async resolution
- Caching strategy for line adjustment maps and resolved anchor positions
- Comparison to GitHub (outdated), Hypothesis (multi-selector), Phabricator (re-anchor)
- Data model changes: new columns on `comments`, new `line_adjustment_maps` cache table
- reanchoring.AC1.1 Success: Comment on unchanged line → Anchored (Layer 1 translates, Layer 2 validates)
- reanchoring.AC1.2 Success: Comment on line that moved (context line in different position) → Layer 2 validates at new position → Anchored
- reanchoring.AC1.3 Success: Comment on line that moved but text at translated position doesn't match → Layer 3 fuzzy finds it → Shifted
- reanchoring.AC1.4 Success: Comment on deleted/modified line → Outdated with best-guess position
- reanchoring.AC1.5 Success: Comment text completely gone from file → Orphaned
- reanchoring.AC1.6 Edge: File deleted entirely → Orphaned
- reanchoring.AC1.7 Edge: File renamed → pipeline follows rename, re-anchors in new file
- reanchoring.AC1.8 Edge: branch_head commit unreachable (force-push) → skip Layer 1, proceed to text matching
- reanchoring.AC2.1 Success: Whitespace-only change (reformat) → normalized match finds it
- reanchoring.AC2.2 Success: Exact text changed but prefix/suffix context survives → prefix/suffix search finds region
- reanchoring.AC2.3 Success: Text partially modified → edit distance match above threshold → Shifted
- reanchoring.AC2.4 Failure: Edit-distance similarity below threshold → Orphaned (not false-matched)
- reanchoring.AC2.5 Edge: Duplicate text in file → prefix/suffix or structural context disambiguates
- reanchoring.AC3.1 Success: Lockfile comment skips Layers 2-3, uses Layer 1 result or Orphaned
- reanchoring.AC3.2 Success: File over 500KB bypasses fuzzy matching
- reanchoring.AC3.3 Success: Per-workspace override adds/removes bypass patterns
- reanchoring.AC4.1 Success: LineAdjustmentMap cached after first computation, reused for subsequent comments on same file
- reanchoring.AC4.2 Success: Resolved anchor positions cached, pipeline skipped when anchor_head = current HEAD
- reanchoring.AC4.3 Success: Pipeline skip when branch_head = current HEAD (no re-anchoring needed)
- reanchoring.AC4.4 Success: Cache entries cleaned up on review deletion
- reanchoring.AC5.1 Success: Branch HEAD change triggers re-anchoring without blocking diff refresh
- reanchoring.AC5.2 Success: Resolved positions pushed to client via SSE
- reanchoring.AC5.3 Success: Comments show transient "resolving" state before snapping to final position
- reanchoring.AC6.1 Success: Rendered-mode comment stores heading breadcrumb on creation
- reanchoring.AC6.2 Success: Fuzzy matching searches within heading scope before full file
- reanchoring.AC6.3 Edge: Heading structure changed → falls back to full-file search
- Anchor: The stored reference that ties a comment to a specific location in a file — either line numbers in diff-mode or a text quote selector in rendered-mode. Re-anchoring is the process of updating this reference when the file changes.
- Anchor state: One of four values (`Anchored`, `Shifted`, `Outdated`, `Orphaned`) indicating how confidently the pipeline resolved a comment's position after file changes.
- branch_head: The git commit SHA recorded when a comment was created. It is the "old" side of the diff when re-anchoring and identifies which version of the file the comment originally referenced.
- context_snippet: For diff-mode comments, the verbatim text of the line(s) the comment was placed on, stored at creation time. Used in Layer 2 to verify that the translated position still contains the same content.
- diff hunk: A contiguous block of changed lines in a unified diff, preceded by a `@@` header. The `LineAdjustmentMap` is built by iterating over hunks and recording how each line number shifts.
- Drizzle ORM: A TypeScript ORM used in this project for type-safe SQLite queries. New schema additions follow existing Drizzle conventions.
- edit distance (Levenshtein / Dice similarity): Measures of how different two strings are. Levenshtein counts single-character edits and has to be normalized by string length to yield a 0-1 similarity; Dice similarity scores shared character pairs (bigrams) on a 0-1 scale. Used in Layer 3 to score fuzzy candidates; a similarity ≥ 0.8 is treated as a match.
- heading breadcrumb / heading_path: An array of Markdown heading labels from the document root down to the section containing a rendered-mode comment (e.g. `["Getting Started", "Installation", "macOS"]`). Stored at creation and used in Layer 3 to narrow fuzzy search scope.
- Hypothesis: A web annotation platform that attaches comments to web pages using text-quote selectors. Prior art for text-based anchor fallback, as distinct from git-based line math.
- LineAdjustmentMap: A data structure mapping old line numbers to new line numbers for a specific file between two commits. Built from diff hunks; keyed by `(file_path, old_commit, new_commit)` and cached in the database.
- Phabricator: A code review tool originally developed at Facebook (now Meta) that uses line adjustment maps to translate comment positions when a revision is updated. Prior art for git-based line translation.
- pipeline bypass: Mechanism that skips Layers 2 and 3 for files where fuzzy search is meaningless — lockfiles, minified bundles, large generated files. Layer 1 is still attempted; if it fails, the comment goes directly to Orphaned.
- rendered-mode: Comment mode where the target file is displayed as rendered output (e.g. Markdown as formatted prose) rather than as a raw diff. Anchoring uses a `TextQuoteSelector` rather than line numbers.
- SSE (Server-Sent Events): A one-way HTTP streaming mechanism used by Loupe to push updates from server to client over a single persistent connection.
- TextQuoteSelector: A W3C Web Annotation standard for identifying a text passage by its exact content plus short prefix and suffix context strings. Used as the anchor for rendered-mode comments.
- three-dot diff vs two-dot diff: Three-dot (`base...branch`) diffs from the merge-base and is used by `gitDiff()` for review display. Two-dot (`old..new`) diffs between two specific commits and is what the pipeline needs to compute the `LineAdjustmentMap`.
- Tree-sitter: A parser-generator framework for source code syntax trees. Mentioned as a possible future mechanism for symbol-based structural context in diff-mode comments, deferred until fuzzy matching produces false positives.
Re-anchoring uses a three-layer pipeline that combines git-based positional translation with text-based validation. Each layer narrows the problem for the next:
```text
Comment needs re-anchoring (branch_head ≠ current HEAD)
│
├─ Layer 1: Git Diff Translation
│    Build LineAdjustmentMap from diff hunks between branch_head and current HEAD.
│    Translate comment's line range to new positions.
│
├─ Layer 2: Text Validation
│    Verify that the translated position still contains the expected text
│    (context_snippet for diff-mode, TextQuoteSelector.exact for rendered-mode).
│    If yes → Anchored. If line was deleted → Outdated. Otherwise → Layer 3.
│
└─ Layer 3: Fuzzy Fallback
     Search the file for the anchor text using progressively looser matching:
     normalized exact → prefix/suffix search → substring → edit distance.
     If found above threshold → Shifted. Otherwise → Orphaned.
```
This combines Phabricator's deterministic line math with Hypothesis-style text matching. Phabricator has the former, Hypothesis has the latter — no existing system does both.
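The control flow of the three layers can be sketched as one function over injected layer implementations. This is a minimal sketch and not the project's implementation: `Translation`, `Resolution`, `Layers`, and `resolveAnchor` are hypothetical names, and the layers are passed in as plain functions so the state decisions can be read in isolation.

```typescript
type AnchorState = "anchored" | "shifted" | "outdated" | "orphaned";

// Hypothetical shapes; the design leaves the final ResolvedAnchor shape open.
interface Translation { line: number; deleted: boolean }
interface Resolution { state: AnchorState; line: number | null; confidence: number }

interface Layers {
  // Layer 1: translated position, or null if the old commit is unreachable.
  translate(oldLine: number): Translation | null;
  // Layer 2: does the current file contain the stored anchor text at `line`?
  textMatchesAt(line: number): boolean;
  // Layer 3: best fuzzy candidate above the threshold, or null.
  fuzzyFind(): { line: number; confidence: number } | null;
}

function resolveAnchor(oldLine: number, layers: Layers): Resolution {
  const translated = layers.translate(oldLine);
  if (translated !== null) {
    if (layers.textMatchesAt(translated.line)) {
      return { state: "anchored", line: translated.line, confidence: 1 };
    }
    if (translated.deleted) {
      // Keep the translated position as a best guess.
      return { state: "outdated", line: translated.line, confidence: 0 };
    }
  }
  // Layer 2 failed without a deletion flag, or Layer 1 was skipped.
  const hit = layers.fuzzyFind();
  return hit !== null
    ? { state: "shifted", line: hit.line, confidence: hit.confidence }
    : { state: "orphaned", line: null, confidence: 0 };
}
```

Returning `null` from `translate` models the force-push case, where the sketch falls through to text matching directly.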
Four states, reflecting progressive degradation through the pipeline:
| State | Meaning | Determined by | Behavior |
|---|---|---|---|
| Anchored | Comment is exactly where it should be | Layer 1 translated + Layer 2 validated text match | Normal display, no indicator |
| Shifted | Comment moved, re-anchored with confidence | Layer 3 fuzzy match above threshold | Display at new position with "shifted" badge |
| Outdated | Underlying content changed substantively | Layer 1 flagged line as deleted/modified | "Outdated" marker, original context preserved |
| Orphaned | Cannot find the anchor | All layers failed, or file deleted | Surfaced (not hidden) at file or review level |
The key distinction between Shifted and Outdated: Shifted means the same content moved. Outdated means the content itself changed — the comment may no longer be relevant.
Orphaned comments are surfaced, not hidden. GitHub hides outdated comments from Files Changed, a frequently raised complaint in its community discussions because reviewers can't tell whether their feedback was addressed. Loupe surfaces orphans with their original context so nothing is silently lost. The exact placement of orphaned comments in the UI (nearest line, file header, detached panel) is deferred to the inline commenting UI design.
Noisy files skip Layers 2 and 3 entirely. If Layer 1's line translation works, use it. If not, go straight to Orphaned. No fuzzy searching through lockfiles.
Default bypass patterns:
- `*.lock`, `*-lock.*` (pnpm-lock.yaml, package-lock.json, yarn.lock, Gemfile.lock)
- `*.min.js`, `*.min.css` (minified bundles)
- Generated files (`.generated.*`, `*.pb.go`, `*.g.dart`)
- Files over 500KB (catch-all for large generated content)
Bypass patterns are a hardcoded default list with per-workspace override for projects where unusual files are review-worthy.
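A bypass check along these lines might look like the following. It is a sketch under stated assumptions: `shouldBypass`, `BypassOverride`, and `globToRegExp` are hypothetical names, globs are matched against the file's basename only, the generated-file pattern is written as `*.generated.*`, and 500KB is taken to mean 500 × 1024 bytes.

```typescript
// Assumed default globs, matched against the basename of the path.
const DEFAULT_BYPASS_GLOBS = [
  "*.lock", "*-lock.*", "*.min.js", "*.min.css",
  "*.generated.*", "*.pb.go", "*.g.dart",
];
const MAX_FUZZY_BYTES = 500 * 1024; // assumption: 500KB = 500 * 1024 bytes

// Hypothetical per-workspace override shape.
interface BypassOverride { add?: string[]; remove?: string[] }

// Convert a simple glob (only `*` is special) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function shouldBypass(
  filePath: string,
  sizeBytes: number,
  override: BypassOverride = {},
): boolean {
  if (sizeBytes > MAX_FUZZY_BYTES) return true;
  const removed = override.remove ?? [];
  const globs = DEFAULT_BYPASS_GLOBS
    .filter((g) => removed.indexOf(g) === -1)
    .concat(override.add ?? []);
  const base = filePath.split("/").pop() ?? filePath;
  return globs.some((g) => globToRegExp(g).test(base));
}
```

A workspace that reviews its lockfiles could pass `{ remove: ["*.lock"] }`, and one with unusual generated output could pass `{ add: ["*.snap"] }`.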
Input: comment's branch_head, current HEAD, file_path, line numbers.
- Compute `git diff <branch_head>..<current_HEAD> -- <file_path>`.
- Parse the diff hunks into a `LineAdjustmentMap`: for each hunk, build a mapping of `old_line → new_line`.
  - Context lines: map 1:1 (line exists in both old and new).
  - Deleted lines: map to the nearest surviving line with a `deleted` flag.
  - Added lines: shift subsequent line mappings by the insertion size.
- Translate the comment's line range through the map.
- Output: new line numbers + metadata (`lines_deleted: boolean`, `offset: number`).
For rendered-mode comments, Layer 1 translates source_line_start/end — the best-effort source mapping stored on the anchor — not the TextQuoteSelector itself. The selector is text-based and doesn't need line translation.
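The hunk walk described above can be sketched as a small class that reads raw unified diff text. The method names `fromUnifiedDiff`, `mapLine`, and `mapRange` and the internal fields are illustrative assumptions (the implementation plan names the builder `newFromHunks()`), and the sketch assumes a single-file diff. Deleted lines are mapped to the next surviving line in the new file.

```typescript
interface MappedLine { line: number; deleted: boolean }

class LineAdjustmentMap {
  // Old lines that appear inside a hunk, mapped explicitly.
  private readonly inHunk = new Map<number, MappedLine>();
  // Offsets that apply to unchanged lines following each hunk.
  private readonly offsets: { fromOld: number; delta: number }[] = [];

  static fromUnifiedDiff(diff: string): LineAdjustmentMap {
    const map = new LineAdjustmentMap();
    const header = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;
    let oldLine = 0, newLine = 0, oldLeft = 0, newLeft = 0;
    for (const raw of diff.split("\n")) {
      const m = header.exec(raw);
      if (m !== null) {
        oldLeft = m[2] === undefined ? 1 : Number(m[2]);
        newLeft = m[4] === undefined ? 1 : Number(m[4]);
        // With a zero count, the start refers to the line before the change.
        oldLine = Number(m[1]) + (oldLeft === 0 ? 1 : 0);
        newLine = Number(m[3]) + (newLeft === 0 ? 1 : 0);
        const afterOld = oldLine + oldLeft;
        map.offsets.push({ fromOld: afterOld, delta: newLine + newLeft - afterOld });
        continue;
      }
      if (oldLeft === 0 && newLeft === 0) continue; // file headers, trailing text
      const kind = raw.charAt(0);
      if (kind === " ") {          // context line: present on both sides
        map.inHunk.set(oldLine, { line: newLine, deleted: false });
        oldLine++; newLine++; oldLeft--; newLeft--;
      } else if (kind === "-") {   // deleted: snap to the next surviving line
        map.inHunk.set(oldLine, { line: newLine, deleted: true });
        oldLine++; oldLeft--;
      } else if (kind === "+") {   // added: shifts everything after it
        newLine++; newLeft--;
      }                            // "\ No newline at end of file" is ignored
    }
    return map;
  }

  mapLine(oldLine: number): MappedLine {
    const hit = this.inHunk.get(oldLine);
    if (hit !== undefined) return hit;
    let delta = 0;
    for (const o of this.offsets) if (o.fromOld <= oldLine) delta = o.delta;
    return { line: oldLine + delta, deleted: false };
  }

  mapRange(start: number, end: number): { start: number; end: number; linesDeleted: boolean } {
    let linesDeleted = false;
    for (let l = start; l <= end; l++) if (this.mapLine(l).deleted) linesDeleted = true;
    return { start: this.mapLine(start).line, end: this.mapLine(end).line, linesDeleted };
  }
}
```

Because hunk headers carry absolute line numbers on both sides, each hunk's offset is already cumulative, so lines outside any hunk only need the offset of the last hunk that precedes them.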
Input: translated position from Layer 1, plus the comment's text anchor (context_snippet for diff-mode, TextQuoteSelector.exact for rendered-mode).
- Extract text at the translated position in the current file.
- Compare against the stored anchor text.
- Exact match → state = Anchored. Done.
- No match, but Layer 1 didn't flag deletion → proceed to Layer 3.
- Layer 1 flagged the line as deleted/modified → state = Outdated. Store translated position as best guess. Done.
For rendered-mode: search the rendered document for TextQuoteSelector.exact in a window around the translated source_line_start. Validate with prefix/suffix if found.
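For diff-mode comments, the Layer 2 decision reduces to a single comparison plus the deletion flag from Layer 1. The sketch below assumes 1-based inclusive line numbers and a `context_snippet` joined with `\n`; `validateAt` and `Layer2Outcome` are hypothetical names.

```typescript
type Layer2Outcome = "anchored" | "outdated" | "needs-fuzzy";

function validateAt(
  fileLines: string[],
  startLine: number,      // 1-based, inclusive (translated by Layer 1)
  endLine: number,        // 1-based, inclusive
  contextSnippet: string, // verbatim text stored at comment creation
  flaggedDeleted: boolean,
): Layer2Outcome {
  const actual = fileLines.slice(startLine - 1, endLine).join("\n");
  if (actual === contextSnippet) return "anchored";
  // A mismatch on a line Layer 1 flagged as deleted is Outdated;
  // any other mismatch is handed to Layer 3.
  return flaggedDeleted ? "outdated" : "needs-fuzzy";
}
```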
Input: anchor text and full current file content. Only reached when Layer 2 validation failed and the line wasn't flagged as deleted.
Fallback chain (stop at first match above confidence threshold):
- Normalized exact match: strip whitespace differences, normalize quotes/encoding. Handles reformatting without semantic change.
- Prefix/suffix anchored search: search for `prefix + ??? + suffix` pattern. If the exact text changed but surrounding context survived, this finds the region.
- Substring match: if the exact text was trimmed or extended, search for the longest common substring above a minimum length.
- Edit distance: Levenshtein or Dice similarity against sliding windows of similar length. Threshold: ≥ 0.8 similarity.
- Structural hint (when available): if heading-based structural context was stored, search within that scope first to reduce false matches in repeated content.
Each match gets a confidence score. Best match above threshold → Shifted. No match above threshold → Orphaned.
Confidence thresholds may differ between diff-mode (code) and rendered-mode (prose). Code is more repetitive and structurally similar, requiring higher thresholds to avoid false matches.
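Two links of the fallback chain, whitespace normalization and Dice similarity, can be sketched as follows. This is a simplified illustration: it scores one file line at a time instead of sliding windows of arbitrary length, and `normalize`, `dice`, and `fuzzyFindLine` are hypothetical names. The 0.8 default mirrors the threshold stated above.

```typescript
// Collapse whitespace runs so reformatting does not defeat the match.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

// Count character bigrams (adjacent pairs) in a string.
function bigrams(s: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < s.length - 1; i++) {
    const g = s.slice(i, i + 2);
    counts.set(g, (counts.get(g) ?? 0) + 1);
  }
  return counts;
}

// Dice coefficient over bigram multisets: 2 * shared / (total bigrams).
function dice(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const gb = bigrams(b);
  let shared = 0;
  bigrams(a).forEach((n, g) => { shared += Math.min(n, gb.get(g) ?? 0); });
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

interface FuzzyHit { line: number; confidence: number }

// Return the best-scoring line at or above the threshold, or null.
function fuzzyFindLine(fileLines: string[], anchor: string, threshold = 0.8): FuzzyHit | null {
  const target = normalize(anchor);
  let best: FuzzyHit | null = null;
  for (let i = 0; i < fileLines.length; i++) {
    const score = dice(target, normalize(fileLines[i] ?? ""));
    if (score >= threshold && (best === null || score > best.confidence)) {
      best = { line: i + 1, confidence: score };
    }
  }
  return best;
}
```

A result of `null` corresponds to Orphaned. A hit with confidence 1 after normalization corresponds to the normalized exact match step, and anything between the threshold and 1 is a candidate for Shifted.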
Keyed by (file_path, old_commit, new_commit). Deterministic — the same diff between two commits always produces the same map.
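- Storage: `line_adjustment_maps` DB table.
- Sharing: Maps are shared by every comment with the same file and `branch_head`, so a review with 50 comments across 10 files, all created against one commit, needs only 10 maps, not 50.
- Invalidation: Never, because commit SHAs are immutable. Entries leave the cache only through the cleanup steps below.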
- Cleanup (v1): Inline during normal operations. Cascade-delete when review is deleted. Prune maps not referenced by any active review when a review completes.
- Cleanup (future): Background sweep for maps older than 30 days not referenced by any active review, if cache size becomes a concern.
Keyed by (comment_id, current_HEAD). The final pipeline result — what the UI consumes.
- Storage: `anchor_state`, `resolved_anchor`, and `anchor_head` columns on the `comments` table.
- Invalidation: When `current_HEAD` changes (new commit pushed). Old resolved positions become stale and the pipeline re-runs on next trigger.
- Cleanup: Self-cleaning. Positions are overwritten on re-run, and cascade-delete handles review deletion.
If a comment's branch_head equals current_HEAD, skip the entire pipeline — the comment is at its original position by definition. This is the common case when viewing a review without new commits.
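The two cache levels and the skip rule can be summarized in a few lines. In this sketch an in-memory `Map` stands in for the `line_adjustment_maps` table, and `cacheDecision`, `mapKey`, and `getOrBuild` are hypothetical helper names.

```typescript
interface CommentAnchorRow { branch_head: string; anchor_head: string | null }
type CacheDecision = "skip-original" | "use-cached" | "run-pipeline";

function cacheDecision(row: CommentAnchorRow, currentHead: string): CacheDecision {
  if (row.branch_head === currentHead) return "skip-original"; // nothing moved
  if (row.anchor_head === currentHead) return "use-cached";    // already resolved
  return "run-pipeline";
}

// Cache key for a line adjustment map; NUL cannot appear in paths or SHAs.
const mapKey = (filePath: string, oldCommit: string, newCommit: string): string =>
  [filePath, oldCommit, newCommit].join("\u0000");

// Build a value once per key and reuse it afterwards.
function getOrBuild<T>(cache: Map<string, T>, key: string, build: () => T): T {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = build();
  cache.set(key, fresh);
  return fresh;
}
```

With 50 comments spread over 10 files and a single `branch_head`, `build` runs 10 times and the other 40 lookups are cache hits.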
The file watcher design in idea.md (lines 47-51) already specifies watching .git/refs/heads/<branch> to detect local changes and pushing updates via SSE. Re-anchoring hooks into this same event.
When the watcher detects a ref change:
- Server re-runs `git diff` and pushes the updated diff view via SSE (existing design).
- Server triggers re-anchoring for all comments on the affected review (new).
- Re-anchoring runs asynchronously — the diff refresh is not blocked. Comments briefly show a "resolving..." state, then snap to their new positions when the pipeline completes.
- Resolved positions are pushed to the client via the same SSE connection.
This async model is important for perceived performance: the diff (which the reviewer is waiting for) appears immediately, and comments resolve shortly after.
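The ordering guarantee (diff first, anchors later) can be sketched with injected dependencies. The event names, `TriggerDeps`, and `onHeadChanged` are assumptions made for illustration; the real SSE payloads and watcher hook are defined elsewhere in the design.

```typescript
// Hypothetical event shapes pushed over the SSE connection.
type SseEvent =
  | { type: "diff-updated"; head: string }
  | { type: "anchors-resolving"; head: string }
  | { type: "anchors-resolved"; head: string; resolved: number };

interface TriggerDeps {
  push(event: SseEvent): void;                // stand-in for the SSE channel
  reanchorAll(head: string): Promise<number>; // stand-in for the batch pipeline run
}

function onHeadChanged(head: string, deps: TriggerDeps): Promise<void> {
  // 1. The diff refresh goes out first and never waits on re-anchoring.
  deps.push({ type: "diff-updated", head });
  // 2. Clients show the transient "resolving..." state.
  deps.push({ type: "anchors-resolving", head });
  // 3. Re-anchoring runs in the background; results follow on the same channel.
  return deps.reanchorAll(head).then((resolved) => {
    deps.push({ type: "anchors-resolved", head, resolved });
  });
}
```

The caller does not need to await the returned promise; it is returned only so that tests and error handling can observe completion.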
For rendered-mode comments on Markdown files, store the heading breadcrumb at comment creation time:
```json
{
  "selector": { "type": "TextQuoteSelector", ... },
  "source_line_start": 12,
  "source_line_end": 14,
  "branch_head": "e4f5g6h",
  "heading_path": ["Getting Started", "Installation", "macOS"]
}
```

This is a new optional field on `RenderedAnchor`. During Layer 3 fuzzy matching, search within the section under the same heading path first, falling back to full-file search if the heading structure changed.
Implementation is lightweight: parse Markdown headings (regex or simple AST), walk up to build the breadcrumb. No language-aware parsing needed.
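A regex-based breadcrumb builder could look like the sketch below. It handles ATX headings (`#` through `######`) only, ignores setext headings, and uses a deliberately simple fence toggle to skip code blocks; `headingPathAt` is a hypothetical name.

```typescript
// Return the heading breadcrumb for a 1-based source line in a Markdown file.
function headingPathAt(markdown: string, line: number): string[] {
  const stack: { level: number; text: string }[] = [];
  const lines = markdown.split("\n");
  let inFence = false;
  for (let i = 0; i < Math.min(line, lines.length); i++) {
    const text = lines[i] ?? "";
    // Toggle on fence markers so "#" comments inside code are not headings.
    if (/^\s{0,3}(`{3,}|~{3,})/.test(text)) { inFence = !inFence; continue; }
    if (inFence) continue;
    const m = /^(#{1,6})\s+(.+?)\s*#*\s*$/.exec(text);
    if (m === null) continue;
    const level = (m[1] ?? "").length;
    // Pop siblings and deeper headings before pushing the new one.
    let last = stack[stack.length - 1];
    while (last !== undefined && last.level >= level) {
      stack.pop();
      last = stack[stack.length - 1];
    }
    stack.push({ level, text: m[2] ?? "" });
  }
  return stack.map((h) => h.text);
}
```

The same walk can also yield the line range of the current section, which is what the heading-scoped search in Layer 3 would need.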
For diff-mode comments on code files, storing the enclosing function/class name would disambiguate fuzzy matches in repetitive code. This requires language-aware parsing (Tree-sitter, LSP symbol extraction, or regex heuristics) and is significantly more complex to implement.
Do not implement symbol-based context initially. Add it only if fuzzy matching produces false positives in practice. The trigger: multiple fuzzy matches with similar confidence scores in the same file, where structural context would have disambiguated.
gitDiff(base, branch) in src/lib/server/git.ts produces raw unified diff strings using three-dot diff (git diff base...branch). The re-anchoring pipeline needs two-dot diff (git diff old_commit..new_commit) for computing line translations between specific commits. This is a new function alongside the existing one, not a replacement.
parseDiff() in src/lib/server/diff.ts returns TokenizedDiffFile[] with TokenizedBlock and TokenizedLine structures (src/lib/types/diff.ts). The LineAdjustmentMap builds from raw diff hunks (before tokenization), so it operates at the same level as diff2html's parser — it can potentially reuse diff2html's hunk parsing.
The shiki highlighter in src/lib/server/highlighter.ts uses a module-level singleton pattern. The line adjustment map cache uses a different pattern (DB-backed) because maps need to persist across server restarts and be shared across requests.
The DB layer (src/lib/server/db/) uses Drizzle ORM with better-sqlite3. New columns on comments and the new line_adjustment_maps table follow the existing schema conventions: text PKs with UUID defaults, ISO 8601 text timestamps, JSON columns via text({ mode: "json" }).
```typescript
anchor_state: text("anchor_state", {
  enum: ["anchored", "shifted", "outdated", "orphaned"]
}).notNull().default("anchored"),
resolved_anchor: text("resolved_anchor", { mode: "json" })
  .$type<ResolvedAnchor | null>()
  .default(null),
anchor_head: text("anchor_head"),
  // HEAD sha the anchor was last resolved against
  // null = resolved against original branch_head
```

`ResolvedAnchor` holds the translated position: new line numbers for diff-mode, new text offset for rendered-mode. Shape TBD during implementation.
```sql
CREATE TABLE line_adjustment_maps (
  id          text primary key,
  file_path   text not null,
  old_commit  text not null,
  new_commit  text not null,
  map_data    text not null,  -- JSON: serialized LineAdjustmentMap
  created_at  text not null,
  unique(file_path, old_commit, new_commit)
);
```

```typescript
heading_path?: string[]
// e.g. ["Getting Started", "Installation", "macOS"]
```

No migration needed: this is a new optional field in the JSON anchor blob.
| System | Re-anchoring | States | Text Matching | Git Translation | Structural Context |
|---|---|---|---|---|---|
| GitHub | None | Active / Outdated | No | No | No |
| Phabricator | Line adjustment maps from hunks | Active / Ghost | No | Yes | No |
| Hypothesis | Multi-selector fallback | Anchored / Orphaned | Yes (exact + fuzzy) | No (web, no VCS) | No |
| Apache Annotator | Dice coefficient sliding window | Anchored / Failed | Yes (Dice similarity) | No (web, no VCS) | No |
| Loupe | Layered pipeline | Anchored / Shifted / Outdated / Orphaned | Yes (full fallback chain) | Yes (LineAdjustmentMap) | Future (heading path) |
Loupe is the only system combining git-based positional translation with text-based validation. The four-state model with progressive confidence is richer than any existing system. Async resolution via the file watcher means comments re-anchor live as the branch evolves.
Goal: Schema changes and TypeScript type definitions for the re-anchoring framework.
Components:
- Migration adding `anchor_state`, `resolved_anchor`, `anchor_head` columns to `comments` table
- Migration adding `line_adjustment_maps` table
- TypeScript types in `src/lib/server/db/schema.ts`: `ResolvedAnchor`, anchor state enum, `LineAdjustmentMap` table definition
- Updated `RenderedAnchor` type with optional `heading_path` field
- `LineAdjustmentMap` TypeScript type in `src/lib/server/reanchor/` (the serializable map structure)
Dependencies: Persistence layer (PR #24) merged.
Done when: Migration applies cleanly, types compile, existing tests pass.
Goal: Parse git diff hunks into a line adjustment map and translate line ranges.
Components:
- `src/lib/server/reanchor/line-map.ts`: `LineAdjustmentMap` class with `newFromHunks()` to build from a parsed diff, `mapLine()` to translate a line number, and `mapRange()` for multi-line comments
- `src/lib/server/reanchor/git-diff.ts`: wrapper around `git diff <old>..<new> -- <file>` (two-dot diff between specific commits)
- Unit tests covering: context line translation, deleted line snapping, added line offset, multi-hunk files, empty diffs
Dependencies: Phase 1 (types).
Done when: LineAdjustmentMap correctly translates line numbers through diff hunks, including deleted-line snapping and offset computation. All unit tests pass.
Goal: Text-based anchor verification and fuzzy fallback search.
Components:
- `src/lib/server/reanchor/text-match.ts`: text matching functions `exactMatch()`, `normalizedMatch()`, `prefixSuffixSearch()`, `substringMatch()`, `editDistanceMatch()`, `confidenceScore()`
- `src/lib/server/reanchor/validate.ts`: Layer 2 validation that extracts text at a position, compares it against the stored anchor, and determines match/mismatch/deleted
- Unit tests covering: exact match, whitespace normalization, prefix/suffix finding, edit distance thresholds, confidence scoring, diff-mode vs rendered-mode paths
Dependencies: Phase 1 (types).
Done when: Text validation correctly identifies matches and mismatches. Fuzzy fallback chain finds shifted text with appropriate confidence scores. All unit tests pass.
Goal: Connect all three layers into the unified re-anchoring pipeline with state determination and bypass logic.
Components:
- `src/lib/server/reanchor/pipeline.ts`: main `reanchorComment()` function that runs Layer 1 → 2 → 3, determines the final anchor state, and returns the resolved position
- `src/lib/server/reanchor/bypass.ts`: pipeline bypass pattern matching with default patterns and per-workspace overrides
- `src/lib/server/reanchor/index.ts`: public API, the `reanchorComments(reviewId, currentHead)` batch function
- Integration tests covering: full pipeline for each state (Anchored, Shifted, Outdated, Orphaned), bypass pattern skipping, batch processing multiple comments
Dependencies: Phase 2 (Layer 1), Phase 3 (Layers 2 & 3).
Done when: Pipeline correctly determines anchor state for all four cases. Bypass patterns prevent fuzzy matching on noisy files. Batch function processes all comments for a review. All tests pass.
Goal: Persist line adjustment maps and resolved anchor positions to avoid redundant computation.
Components:
- DB operations in `src/lib/server/reanchor/cache.ts`: CRUD for the `line_adjustment_maps` table, read/write for the `anchor_state` / `resolved_anchor` / `anchor_head` columns on `comments`
- Cache cleanup on review completion/deletion (inline, via existing cascade deletes for comments; explicit cleanup for line_adjustment_maps)
- Unit tests covering: cache hit/miss, stale cache detection (anchor_head ≠ current HEAD), cleanup on review delete
Dependencies: Phase 4 (pipeline).
Done when: Pipeline uses cached results when available. Cache is populated on first run. Stale entries are detected and recomputed. Cleanup removes orphaned map entries. All tests pass.
Goal: Trigger async re-anchoring when the branch HEAD changes, integrated with the existing file watcher and SSE design.
Components:
- Hook into the ref-change event from the file watcher (`.git/refs/heads/<branch>`, design specified in `idea.md` lines 47-51)
- Async re-anchoring: dispatch pipeline run without blocking diff refresh
- SSE push of resolved anchor positions to client after pipeline completes
- Client-side handling: "resolving..." transient state on comments, snap to resolved positions on SSE update
Dependencies: Phase 5 (caching), file watcher infrastructure (separate implementation — this phase integrates with it).
Done when: Branch HEAD change triggers re-anchoring. Diff refresh is not blocked. Resolved positions arrive via SSE. Comments transition through resolving → final state.
Goal: Store and use Markdown heading breadcrumbs to improve fuzzy matching accuracy for rendered-mode comments.
Components:
- `src/lib/server/reanchor/heading.ts`: extract heading path from Markdown source at a given line number
- Integration into comment creation: populate `heading_path` on `RenderedAnchor` when a rendered-mode comment is created
- Integration into Layer 3: search within the matching heading section first, fall back to full-file search
- Unit tests covering: heading extraction from Markdown, heading-scoped fuzzy search, fallback when heading structure changes
Dependencies: Phase 4 (pipeline), inline commenting UI (separate — this phase adds structural context to existing rendered-mode anchors).
Done when: Rendered-mode comments store heading breadcrumbs. Fuzzy matching uses heading scope to reduce false matches. Falls back gracefully when headings change. All tests pass.
Confidence threshold tuning: The edit distance threshold (≥ 0.8) and the boundary between Shifted and Orphaned are initial values. Real-world usage will reveal whether these need adjustment. Consider logging match confidence scores during early usage to inform tuning.
File renames: Git tracks renames. When Layer 1 detects a rename (git diff reports rename from → rename to), the pipeline should follow the rename and search the new filename. This is a detail within Layer 1, not a separate mechanism.
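Rename detection from diff output can be sketched as a scan for git's `rename from` / `rename to` extended header lines. `findRename` is a hypothetical helper, and it ignores git's quoting of unusual path characters. One caveat worth verifying: when the diff is limited by a pathspec to the old path alone, git typically reports a deletion instead of a rename, so this sketch assumes the diff text covers both paths.

```typescript
// Return the new path if the diff text records a rename of `oldPath`, else null.
function findRename(diffText: string, oldPath: string): string | null {
  const marker = "rename to ";
  const lines = diffText.split("\n");
  for (let i = 0; i < lines.length - 1; i++) {
    if (lines[i] !== `rename from ${oldPath}`) continue;
    const next = lines[i + 1] ?? "";
    if (next.indexOf(marker) === 0) return next.slice(marker.length);
  }
  return null;
}
```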
Force-push / rebase: A force-push replaces the branch history. The branch_head stored on the comment may no longer be reachable from any ref, and the commit object may eventually be garbage-collected. Layer 1 should handle git diff failures gracefully: if the old commit is unavailable, skip Layer 1 and go directly to text matching (Layers 2-3 using the current file content).
Multi-commit jumps: A single git diff branch_head..current_HEAD handles any number of intermediate commits in one operation. Unlike Phabricator (which chains maps through revisions), we diff directly between the two endpoints. This is simpler and sufficient — we don't need to trace the path through intermediate states.
Background cache cleanup (future): If the line_adjustment_maps table grows large, add a periodic sweep that prunes entries older than 30 days not referenced by any active review. Not needed for v1 — inline cleanup during review completion is sufficient for a local app.