Skip to content

Instantly share code, notes, and snippets.

@tbg
Created March 5, 2026 09:54
Show Gist options
  • Select an option

  • Save tbg/c2db34097525e5570ad1dca4771b44cb to your computer and use it in GitHub Desktop.

Select an option

Save tbg/c2db34097525e5570ad1dca4771b44cb to your computer and use it in GitHub Desktop.
Review of cockroachdb/cockroach PR #164900: mmaintegration: introduce physical capacity model

Review: PR #164900 — mmaintegration: introduce physical capacity model

Author: wenyihu6 | Branch: oldmodel2 | Epic: CRDB-55052

Blocking Issues (must fix)

  1. [correctness] highDiskSpaceUtilization comment is now stale (capacity_model.go:703-724): The comment explains that fractionUsed = load/capacity = LogicalBytes / (LogicalBytes / diskUtil) = diskUtil. Under the new model, load=Used, capacity=Used+Available — the math still recovers actual disk utilization, but the comment references the old LogicalBytes-based derivation and is now misleading.

  2. [correctness] minCapacity floor is dramatically lower than the old floor (physical_model.go): The old model had cpuCapacityFloorPerStore = 0.1 * 1e9 (0.1 cores). The new minCapacity = 1.0 means 1 ns/s — effectively zero CPU capacity. The old floor existed to prevent utilization from going to infinity on overloaded nodes (its comment explains this in detail). If a store has non-zero load and capacity=1 ns/s, utilization becomes astronomical. The PR description doesn't mention this behavioral change. Either the new floor should be comparable to the old one, or the PR should explain why the protection is no longer needed.

  3. [conventions] Amp.SafeFormat doesn't mark the float as safe for redaction (load.go): w.Printf("%.2f", af) passes af without redact.SafeFloat. The established pattern in this file (e.g., meanStoreLoad.SafeFormat) uses redact.SafeFloat() for floats. Fix with either redact.SafeFloat(float64(af)) or by implementing SafeValue() (amp factors contain no PII).

  4. [conventions] Numbered list in computePhysicalCPU design comment skips item 2 (physical_model.go:~line 80): The "Trade-off" section enumerates items 1, 3, 4, 5 — item 2 is missing.

Suggestions (should fix)

  1. [types] Amp type has no doc comment (load.go): Every other named type in this file (LoadDimension, LoadValue, LoadVector, etc.) has a doc comment. Amp should explain what it represents, its valid range, and that it converts logical to physical units. Same for AmpVector.

  2. [tests] MakePhysicalRangeLoad has no direct test (physical_model.go:270-282): The existing mma_conversion_test.go only passes IdentityAmpVector() (amp=1.0), which is a no-op. A bug in dimension indexing (e.g., applying the wrong dimension's amp factor) would go undetected. Add a test with non-identity amp factors verifying each output field.

  3. [tests] maxDiskSpaceAmplification cap is never exercised in tests: The testdata covers amp=2.0 and amp=0.5 but never hits the 5.0 cap. Add a case like logical-gib=10, used-gib=100 to verify capping.

  4. [conventions] amplificationFactors() shadows builtin cap (mma_replica_store.go:296): cap := s.storeGossip.CachedCapacity() shadows the Go builtin. Use sc or storeCap instead.

  5. [commits] Commit 1 message claims "wires it into all callers": The opening sentence says "wires it into all callers, replacing the logical (capped multiplier) model" but commit 1 only introduces the model. The wiring happens in commit 3. The message describes the full PR, not this specific commit.

  6. [correctness] Old computeCPUCapacityWithCap and computeStoreByteSizeCapacity are now dead code (capacity_model.go): After commit 3, these are only referenced from tests. Either remove them or add a comment noting they're retained as test baselines.

  7. [types] Compile-time assertion is one-directional and undocumented (load.go): var _ [3 - NumLoadDimensions]struct{} catches additions but not removals. Add a bidirectional check and a comment explaining it guards AmpVector.SafeFormat:

    // NumLoadDimensions must be 3. Update SafeFormat methods if changed.
    var _ [NumLoadDimensions - 3]struct{}
    var _ [3 - NumLoadDimensions]struct{}
  8. [conventions] Redundant float64() casts in MakePhysicalRangeLoad (physical_model.go:~line 700): requestCPUNanos, raftCPUNanos, writeBytesPerSec, and cpuNanos are already float64. The float64(cpuNanos) etc. wrappers are no-ops.

Nits (take it or leave it)

  1. [types] Amp is terse for this codebase: Other types use full words (LoadDimension, LoadValue). AmpFactor would be clearer.

  2. [commits] Commits 1 and 2 lack Release note: trailers: Only commit 3 has one.

  3. [commits] Commit 2 (minCapacity floor) could be folded into commit 1: It's ~10 lines and is a design decision of the physical model, not separable follow-up.

  4. [tests] fmtBytes wrapper is redundant (physical_model_test.go): fmtBytes just calls fmtIBytes — could use fmtIBytes directly.

  5. [correctness] amplificationFactors() constructs a partial StoreDescriptor: Only Capacity and NodeCapacity are populated. If computePhysicalStore ever reads other fields, this would silently break. The existing design note in the code acknowledges the dual-path concern, which is good.

Strengths

  • Excellent design documentation: The computePhysicalCPU comment with worked examples ("background in load, not capacity", "per-store load via CPUPerSecond", "capacity split evenly") is some of the best design documentation I've seen in this codebase. It makes the tradeoffs explicit and auditable.
  • Clean separation of concerns: The physical model cleanly separates store-level computation from range-level conversion via MakePhysicalRangeLoad.
  • Good use of data-driven testing: 12 well-commented scenarios with inline derivations make the test easily auditable.
  • Eliminates code duplication: The two separate mmaRangeLoad implementations are consolidated into MakePhysicalRangeLoad.
  • Correct math: The model is internally consistent — sum of per-store loads recovers node usage (proved in the comment), and disk load/capacity recovers actual utilization.

Aspects skipped

  • Simplification: Deferred — the code is newly introduced and already clean.

(made with /review-crdb)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment