Skip to content

Instantly share code, notes, and snippets.

@tbg
Created March 4, 2026 10:15
Show Gist options
  • Select an option

  • Save tbg/052dae1e3f6449811fca92e275905f1f to your computer and use it in GitHub Desktop.

Select an option

Save tbg/052dae1e3f6449811fca92e275905f1f to your computer and use it in GitHub Desktop.
review-crdb skill example: PR #79134 (SKIP LOCKED implementation)

Review: PR #79134 — kv: support FOR {UPDATE,SHARE} SKIP LOCKED

Summary

This PR implements the KV portion of SKIP LOCKED support for SELECT ... FOR UPDATE SKIP LOCKED and SELECT ... FOR SHARE SKIP LOCKED. The change spans the MVCC scanner, KV concurrency control, optimistic evaluation, timestamp cache, refresh spans, and the lock table. The SQL optimizer still rejects SKIP LOCKED (the SQL portion was extracted into a separate PR, #83627), so this is plumbing-only from the KV side.

Overall assessment: the change is well-structured across five focused commits, the isolation model is clearly articulated, and the test coverage (data-driven MVCC tests, lock table tests, concurrency manager tests, integration tests) is thorough. The commit arc is clean and tells a reviewable story. There are a few correctness and design concerns worth discussing, and some comments that could be improved.

Findings

Correctness

IsKeyLockedByConflictingTxn races on the lock table snapshot -- pebble_mvcc_scanner.go, lock_table.go

  • Severity: blocking

The IsKeyLockedByConflictingTxn implementation in lockTableGuardImpl takes l.mu.Lock() on the lockState found in the tableSnapshot btree. This is correct for reading the lock holder, but there is a subtle concern: the tableSnapshot is captured at sequencing time, while evaluation happens later. Between those two points, a lock could be released and the lockState object could be reused or its fields updated. The l.mu.Lock() protects the read, but the question is whether the lockState itself could have been garbage collected or reused for a different key.

In practice, CockroachDB's lock table retains lockState objects in the btree even after release (they are only removed lazily), and the btree snapshot is taken under lockTableImpl.mu. So this should be safe. However, the PR does not document this invariant -- a comment in IsKeyLockedByConflictingTxn explaining why it is safe to access lockState from the snapshot after sequencing would be valuable, since a future refactor could break this assumption.


Non-locking reads with skipLocked and failOnMoreRecent: WriteTooOldError on unlocked keys is correct but surprising -- pebble_mvcc_scanner.go (cases 2a, 4a)

  • Severity: suggestion

When skipLocked and failOnMoreRecent are both set, the scanner skips keys that are locked by conflicting transactions but still raises WriteTooOldError for unlocked keys with more recent versions. This is logically consistent: the weaker isolation applies only to lock conflicts, not to MVCC conflicts on unlocked keys.

However, this means a SELECT ... FOR UPDATE SKIP LOCKED can still fail with WriteTooOldError and force a retry. The MVCC test coverage confirms this behavior. A comment somewhere (perhaps in the MVCCScan or MVCCGet documentation) should call this out explicitly, as it may surprise users who expect SKIP LOCKED to "never block and never error on conflicts."


collectSpansRead casts req.(roachpb.LockingReadRequest) without guarding against non-locking reads -- replica_read.go

  • Severity: suggestion

In collectSpansRead, when ba.WaitPolicy == lock.WaitPolicy_SkipLocked && roachpb.CanSkipLocked(req), the code does:

getAlloc.get.KeyLocking = req.(roachpb.LockingReadRequest).KeyLockingStrength()

If a CanSkipLocked request were ever added that does not implement LockingReadRequest, this would panic. Currently CanSkipLocked is only true for GetRequest, ScanRequest, and ReverseScanRequest, all of which implement LockingReadRequest, so this is safe today. But the assertion is implicit. A comment or a defensive check would prevent a future surprise.


Intent resolution cost for SKIP LOCKED -- intent_resolver.go

  • Severity: nit

The TODO comment added at CleanupIntentsAsync raises a valid concern: if SKIP LOCKED scans frequently encounter intents from live transactions, the async intent resolution will attempt to resolve them, only to find they are still active. The comment says "is this needed if the intents could not have expired yet (i.e. they are not at least 5s old)?" This is a good question for a follow-up, but does not block this PR. The comment is appropriately marked as a TODO.

Commit Structure

  • Severity: nit

The commit arc is well done:

  1. Rename SKIP -> SKIP_LOCKED (mechanical, clear)
  2. Refactor getAndAdvance case ordering (mechanical, enables next commit)
  3. Teach MVCC scanner about skip-locked (core behavior change)
  4. Teach concurrency control about skip-locked (KV integration)
  5. Teach optimistic evaluation about skip-locked (performance optimization)

This is a clean progression. The refactor commit (#2) is properly separated from the behavioral change (#3), which makes both easier to review. The behavioral changes in #3 are well-tested with the extensive data-driven test file.

One minor observation: commit 4 is quite large because it touches concurrency control, timestamp cache, refresh spans, and span collection all at once. These are tightly coupled for correctness (they all relate to the weaker isolation semantics), so splitting them further might not improve reviewability. The PR description's implementation section explains these touchpoints well.

Comments

LockTableView interface doc: grammar -- pkg/storage/mvcc.go

  • Severity: nit
// LockTableView is a transaction-bound view into an in-memory collections of

"collections" -> "collection"


MVCCGet doc: grammar -- pkg/storage/mvcc.go

  • Severity: nit
// In this mode, the LockTableView provided in the options is consulted any
// observed key to determine whether it is locked with an unreplicated lock.

Missing "for" -- should be "is consulted for any observed key".


collectSpans comment: typo -- replica_send.go

  • Severity: nit
// able to constraint their read spans

"constraint" -> "constrain"

Error Handling

The change to RefreshSpanIterate to return an error is a good improvement. The previous signature silently discarded potential errors from ResponseKeyIterate. The callers are updated appropriately.

Go Conventions

Duplicate scanLockStrength function -- lock_table_test.go and datadriven_util_test.go

  • Severity: suggestion

The scanLockStrength function is defined identically in both lock_table_test.go and datadriven_util_test.go (same package). Since datadriven_util_test.go appears to be the shared utility file for the concurrency tests, the one in lock_table_test.go is redundant. Only one definition should be needed.

PR Description

The PR description is thorough and well-written. The implementation section explains the isolation model clearly, the interaction with latching, the timestamp cache, and refresh spans. The example SQL session at the top orients the reader immediately. The honest "I haven't done serious benchmarking" note is appreciated.

The note about extracting the SQL changes to a separate PR (#83627) is helpful for understanding the scope.

Aspects Skipped

  • Redactability: no new types with String() or Error() methods are added; the LockTableView interface has no string representation.
  • Compatibility: the new WaitPolicy_SkipLocked proto enum value is additive (value 2) and the SQL layer still rejects it, so there is no mixed-version compatibility concern for this PR specifically. A cluster version gate will be needed when the SQL layer is wired up.

Missing Sources

None -- all referenced rules and skills were available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment