@hoffmang9
Created February 24, 2026 07:39
Upstream chiavdf fork plan (PR1/PR2)
**name:** Upstream chiavdf fork plan
**overview:** A phased plan to upstream the Ealrann/chiavdf fork's bluebox compaction optimizations into Chia-Network/chiavdf, enabling WesoForge to drop its fork dependency and adopt upstream directly.
**isProject:** false

**todos:**

| id | content | status |
| --- | --- | --- |
| pr1-streaming-prover | PR 1: Streaming one-weso prover for bluebox compaction. Includes library-mode infra (vdf.h `quiet_mode` + pairindex, threading.h 512 slots, Makefile `fastlib` target) plus new fast_wrapper.h/cpp. Do NOT change `enable_threads` in parameters.h. | pending |
| pr2-batch-proving | PR 2 (future): Batch proving with discriminant reuse (Trick 2). Refactor StreamingOneWesolowskiCallback into StreamingWesolowskiBuckets, add BatchOneWesolowskiCallback and batch C API. Depends on PR 1. | pending |
| wesoforge-migration | After PR 1 merge: Switch WesoForge submodule from Ealrann/chiavdf to Chia-Network/chiavdf. No Rust code changes needed — FFI API is identical. | pending |

Upstream chiavdf Fork Into Chia-Network/chiavdf

Current State of the Fork

The fork (Ealrann/chiavdf, branch bbr) diverges from upstream at commit 7d1f1d6 (Update license #295) with 7 commits on the main bbr branch and 2 additional commits on an unreleased "trick 2" side branch.

What the fork changes

The diff is remarkably clean — only 7 files changed total, with ~1,030 lines added:

Modifications to existing upstream files (very small, ~35 lines of real change):

| File | Change |
| --- | --- |
| [src/vdf.h](chiavdf/src/vdf.h) | Added `quiet_mode` flag and `vdf_fast_pairindex()`; changed hardcoded `pairindex=0` to `vdf_fast_pairindex()`; wrapped a `cout` in a `quiet_mode` check; removed one print |
| [src/threading.h](chiavdf/src/threading.h) | Increased `master_counter[100]` and `slave_counter[100]` to `[512]` |
| [src/parameters.h](chiavdf/src/parameters.h) | Changed `enable_threads=true` to `enable_threads=false` |
| [src/Makefile.vdf-client](chiavdf/src/Makefile.vdf-client) | Added `fastlib` target and PIC/PIE flags |

Purely additive new files (~1,000 lines):

| File | Purpose |
| --- | --- |
| [src/c_bindings/fast_wrapper.h](chiavdf/src/c_bindings/fast_wrapper.h) | C FFI header (146 lines) |
| [src/c_bindings/fast_wrapper.cpp](chiavdf/src/c_bindings/fast_wrapper.cpp) | Streaming prover implementation (796 lines) |

The Three Optimizations

Trick 1: Streaming One-Wesolowski (known y_ref)

For bluebox compaction, y_ref is already known from the block. This means B = GetB(D, x, y_ref) can be computed before squaring starts. Instead of storing O(ceil(T/kl)) checkpoint forms and scanning them post-squaring, the prover updates proof buckets inline during the squaring loop via StreamingOneWesolowskiCallback. Result: ~3x memory reduction.
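
As a sketch of the streaming idea (not the fork's API): below, integers modulo a prime stand in for class-group forms, `l` is fixed to 1 for brevity, and a hypothetical `prove_streaming` updates one bucket per base-2^k digit of ⌊2^T/B⌋ inline during the squaring loop, then folds the buckets into the proof.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = uint64_t;
static const u64 N = 1000000007ULL;  // stand-in group: (Z/N)*, N prime

static u64 mulmod(u64 a, u64 b) { return (u64)((unsigned __int128)a * b % N); }

static u64 powmod(u64 base, u64 e, u64 mod) {
    u64 r = 1; base %= mod;
    while (e) {
        if (e & 1) r = (u64)((unsigned __int128)r * base % mod);
        base = (u64)((unsigned __int128)base * base % mod);
        e >>= 1;
    }
    return r;
}

// Streaming one-weso: y_ref is known up front, so the challenge prime B is
// fixed before squaring starts and proof buckets can be updated inline.
static u64 prove_streaming(u64 x, unsigned T, u64 B, unsigned k) {
    // q = floor(2^T / B), written in base 2^k; digit i multiplies x^(2^(k*i)).
    // (T <= 126 here so 2^T fits an __int128; the real prover uses big ints.)
    unsigned __int128 q = ((unsigned __int128)1 << T) / B;
    std::vector<u64> bucket(1u << k, 1);   // one bucket per digit value
    u64 cur = x % N;                       // cur = x^(2^(k*i)) at step i
    for (unsigned i = 0; k * i < T; ++i) {
        unsigned d = (unsigned)((q >> (k * i)) & ((1u << k) - 1));
        bucket[d] = mulmod(bucket[d], cur);          // inline bucket update
        for (unsigned s = 0; s < k; ++s) cur = mulmod(cur, cur);
    }
    u64 pi = 1;  // fold: pi = prod_d bucket[d]^d = x^q
    for (unsigned d = 1; d < (unsigned)bucket.size(); ++d)
        pi = mulmod(pi, powmod(bucket[d], d, N));
    return pi;
}
```

The fold identity `pi = prod_d bucket[d]^d` is what lets the running buckets replace the stored checkpoint forms.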

GetBlock Optimization: Incremental Mapping

The naive GetBlock(p, k, T, B) does a full modular exponentiation per call. The fork observes that r_{p+1} = r_p * inv(2^k) mod B, so it maintains rolling state and computes each successive GetBlock with just a multiply+mod+div instead of an exponentiation. No lookup table needed.
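
A minimal check of that recurrence (illustrative code, not the fork's `get_block_opt`): since B is prime in Wesolowski proofs, 2^k is invertible mod B, and the rolling state needs only one modular multiply per block.

```cpp
#include <cassert>
#include <cstdint>

using u64 = uint64_t;

static u64 powmod_b(u64 base, u64 e, u64 mod) {
    u64 r = 1; base %= mod;
    while (e) {
        if (e & 1) r = (u64)((unsigned __int128)r * base % mod);
        base = (u64)((unsigned __int128)base * base % mod);
        e >>= 1;
    }
    return r;
}

// Naive GetBlock: a full modular exponentiation on every call.
static u64 get_block_direct(unsigned i, unsigned k, unsigned T, u64 B) {
    u64 r = powmod_b(2, T - k * (i + 1), B);       // r_i = 2^(T-k(i+1)) mod B
    return (u64)(((unsigned __int128)r << k) / B); // floor(2^k * r_i / B)
}

// Rolling version: r_{i+1} = r_i * inv(2^k) mod B, one multiply per block.
struct GetBlockState {
    u64 B, inv2k, r; unsigned k;
    GetBlockState(unsigned k_, unsigned T, u64 B_) : B(B_), k(k_) {
        // inverse via Fermat: valid because B is prime in Wesolowski proofs
        inv2k = powmod_b(powmod_b(2, k, B), B - 2, B);
        r = powmod_b(2, T - k, B);                 // state for block i = 0
    }
    u64 next() {                                   // emit block i, advance
        u64 b = (u64)(((unsigned __int128)r << k) / B);
        r = (u64)((unsigned __int128)r * inv2k % B);
        return b;
    }
};
```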

(k, l) Tuner: Memory-Budgeted Parameter Selection

Upstream's ApproximateParameters() is a fixed heuristic. The fork adds a grid search over (k, l) space constrained by a configurable per-worker memory budget, picking the minimum-cost parameters. This matters when running many workers in parallel with limited RAM per worker.
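
A sketch of what such a tuner can look like; the memory and cost formulas below are generic placeholders for the bucket method, not the fork's actual models.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Memory: the prover keeps roughly l * 2^k proof buckets of form_bytes each.
static double mem_bytes(unsigned k, unsigned l, double form_bytes) {
    return (double)l * (double)(1ULL << k) * form_bytes;
}

// Cost: ~T/(k*l) bucket updates during squaring plus ~l*2^k fold work,
// a generic shape for bucket-method provers (placeholder model).
static double cost(unsigned k, unsigned l, double T) {
    return T / (double)(k * l) + (double)l * (double)(1ULL << k);
}

// Grid search: cheapest (k, l) whose buckets fit the per-worker budget.
static std::pair<unsigned, unsigned>
tune(double T, double budget_bytes, double form_bytes) {
    std::pair<unsigned, unsigned> best{1, 1};
    double best_cost = cost(1, 1, T);
    for (unsigned k = 1; k <= 16; ++k)
        for (unsigned l = 1; l <= 64; ++l) {
            if (mem_bytes(k, l, form_bytes) > budget_bytes) continue;
            double c = cost(k, l, T);
            if (c < best_cost) { best_cost = c; best = {k, l}; }
        }
    return best;
}
```

Shrinking the budget forces the search toward smaller `l * 2^k`, trading fold work for more frequent bucket updates, which is exactly the lever needed when many workers share limited RAM.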

Trick 2 (unreleased, on side branch): Discriminant Reuse / Batch Proving

Jobs sharing the same (challenge, size_bits, x0) have an identical squaring trajectory f(t). The "trick 2" branch refactors StreamingOneWesolowskiCallback into a reusable StreamingWesolowskiBuckets class and adds a BatchOneWesolowskiCallback that runs repeated_square once for T_max while updating per-job bucket state at each job's checkpoint times. This reduces squaring work from sum(T_j) to max(T_j) across grouped jobs. This branch also includes a thorough design doc at BBR_BLUEBOX_COMPACTION_OVERVIEW.md.
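
The batching idea in miniature (integers mod a prime again standing in for forms; `BatchJob` and `prove_batch` are illustrative names, not the branch's API): one squaring pass to T_max feeds every job's bucket state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = uint64_t;
static const u64 P = 1000000007ULL;           // stand-in prime group

static u64 mm(u64 a, u64 b) { return (u64)((unsigned __int128)a * b % P); }
static u64 pm(u64 b, u64 e) {
    u64 r = 1; b %= P;
    while (e) { if (e & 1) r = mm(r, b); b = mm(b, b); e >>= 1; }
    return r;
}

struct BatchJob {
    unsigned T;               // this job's iteration count (T_j <= T_max)
    u64 B;                    // this job's challenge prime
    unsigned k;               // digit width
    std::vector<u64> bucket;  // per-job bucket state
    unsigned __int128 q;      // floor(2^T / B), demo-sized
};

// One repeated-squaring pass; each job consumes checkpoints up to its own T_j,
// so total squaring work is max(T_j) instead of sum(T_j).
static std::vector<u64> prove_batch(u64 x, std::vector<BatchJob>& jobs) {
    unsigned T_max = 0;
    for (auto& j : jobs) {
        j.q = ((unsigned __int128)1 << j.T) / j.B;
        j.bucket.assign(1u << j.k, 1);
        if (j.T > T_max) T_max = j.T;
    }
    u64 cur = x % P;                           // cur = x^(2^t)
    for (unsigned t = 0; t < T_max; ++t) {     // squared ONCE for all jobs
        for (auto& j : jobs)
            if (t < j.T && t % j.k == 0) {
                unsigned i = t / j.k;
                unsigned d = (unsigned)((j.q >> (j.k * i)) & ((1u << j.k) - 1));
                j.bucket[d] = mm(j.bucket[d], cur);
            }
        cur = mm(cur, cur);
    }
    std::vector<u64> proofs;                   // fold each job's buckets
    for (auto& j : jobs) {
        u64 pi = 1;
        for (unsigned d = 1; d < (unsigned)j.bucket.size(); ++d)
            pi = mm(pi, pm(j.bucket[d], d));
        proofs.push_back(pi);
    }
    return proofs;
}
```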

Why Upstreaming is Feasible

The fork is well-structured for upstreaming because:

  1. Changes to existing files are minimal (~35 lines across 4 files)
  2. The bulk of the work is purely additive (new files that don't conflict with anything)
  3. No behavioral changes for existing callers — the streaming prover is an entirely new code path behind a new API
  4. The existing c_bindings/c_wrapper.h API is untouched
  5. The fork is only 7 commits ahead of a recent upstream commit

Recommended Upstreaming Strategy: 2 PRs

PR 1: "Streaming one-weso prover for bluebox compaction"

A single PR containing both the library-mode infrastructure changes and the new streaming prover. This is a natural unit because the infrastructure changes (~35 lines across 3 existing files) exist solely to support the streaming prover, and reviewers benefit from seeing the "why" alongside the "what".

Modifications to existing files:

  • **src/vdf.h**: Add quiet_mode global (default false, no behavior change). Add vdf_fast_pairindex(). Replace hardcoded pairindex=0 with vdf_fast_pairindex(). Gate the "VDF loop finished" cout on !quiet_mode.
  • **src/threading.h**: Increase counter arrays from [100] to [512]. (Upstream only uses slot 0, so this is a zero-risk expansion.)
  • **src/Makefile.vdf-client**: Add PIC/PIE build flags and the fastlib static library target.
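
A hypothetical sketch of what a `vdf_fast_pairindex()`-style allocator could look like given the 512-slot counter arrays; the fork's actual implementation may differ, and `acquire_pairindex`/`release_pairindex` are invented names for illustration.

```cpp
#include <atomic>
#include <cassert>

static const int kMaxSlots = 512;            // matches the expanded arrays
static std::atomic<bool> slot_used[kMaxSlots];

// Claim the lowest free slot; slot 0 stays "first choice", so a single
// in-process VDF instance behaves exactly like the old hardcoded pairindex=0.
static int acquire_pairindex() {
    for (int i = 0; i < kMaxSlots; ++i) {
        bool expected = false;
        if (slot_used[i].compare_exchange_strong(expected, true))
            return i;                        // atomically claimed slot i
    }
    return -1;                               // all slots busy
}

static void release_pairindex(int i) {
    if (i >= 0 && i < kMaxSlots) slot_used[i].store(false);
}
```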

New files (purely additive):

  • **src/c_bindings/fast_wrapper.h**: New C FFI header declaring the streaming prover API, memory budget setter, stats/parameter introspection.
  • **src/c_bindings/fast_wrapper.cpp**: Complete implementation including:
    • StreamingOneWesolowskiCallback (Trick 1: streaming bucket accumulation)
    • Incremental get_block_opt() (GetBlock optimization)
    • tune_streaming_parameters() ((k,l) tuner)
    • All chiavdf_prove_one_weso_fast* entry points
    • Memory budget, stats, and parameter introspection APIs

This PR should include the fork's README content as documentation (perhaps as docs/bluebox_compaction.md rather than replacing the main README).

Critical note on enable_threads: The fork flips enable_threads from true to false globally in parameters.h. This would break timelord operation if applied upstream. Instead:

  • Do NOT change the parameters.h default.
  • The streaming prover in fast_wrapper.cpp already sets fast_algorithm=false and two_weso=false, which bypasses the threaded proof paths. The enable_threads flip in the fork is belt-and-suspenders safety, not functionally required.
  • If a runtime flag is truly needed, propose a chiavdf_set_enable_threads(bool) setter, but this can be deferred.

PR 2 (future): "Batch proving with discriminant reuse" (Trick 2)

Based on the unreleased side branch (b6cc20a). Refactors StreamingOneWesolowskiCallback into StreamingWesolowskiBuckets (reusable per-job bucket state) and adds:

  • BatchOneWesolowskiCallback with event queue
  • ChiavdfBatchJob struct and chiavdf_prove_one_weso_fast_streaming_getblock_opt_batch() C API
  • Finalization offloading to background threads
  • BBR_BLUEBOX_COMPACTION_OVERVIEW.md design document

This PR depends on PR 1 being merged first.

WesoForge Migration Path

After PR 1 is merged upstream:

  1. Switch submodule in [.gitmodules](.gitmodules) from https://github.com/Ealrann/chiavdf.git (branch bbr) to https://github.com/Chia-Network/chiavdf.git (branch main)
  2. No changes needed to [crates/chiavdf-fast/](crates/chiavdf-fast/) — the Rust FFI layer calls the same C API (fast_wrapper.h)
  3. No changes needed to [crates/client-engine/](crates/client-engine/) or [crates/client/](crates/client/) — they call the Rust API which wraps the same C API
  4. Build system ([crates/chiavdf-fast/build.rs](crates/chiavdf-fast/build.rs)) already uses make -f Makefile.vdf-client fastlib, which will work once the fastlib target is upstream
  5. Remove the patches/ directory if the GMP 6.3 patch gets merged upstream or becomes unnecessary
  6. Test that all platforms (Linux x86, macOS Intel, macOS ARM, Windows) still build and produce valid proofs

After PR 2:

  • WesoForge can adopt the batch API to replace its current per-job-per-worker model with grouped discriminant reuse, further improving throughput

Risk Assessment

  • PR 1 risk: LOW — The infrastructure changes are minimal and backward-compatible. Counter array expansion is safe (only slot 0 is used upstream). quiet_mode defaults to false. vdf_fast_pairindex() is backward-compatible (slot 0 is still used first). The new fast_wrapper files are entirely additive and don't change any existing behavior.
  • PR 2 risk: MEDIUM — Refactors the streaming callback, but only within fast_wrapper.cpp (no upstream code changes). The batch API is more complex and may need iteration.
  • **enable_threads risk: NONE if we don't touch it** — The fork's global flip is unnecessary for correctness and should not be upstreamed.

Diagram: Data Flow with Streaming Prover

```mermaid
flowchart TD
    subgraph inputs [Job Inputs]
        challenge[challenge bytes]
        y_ref[y_ref from block]
        T[iterations T]
    end

    subgraph setup [Setup Phase]
        D["D = CreateDiscriminant(challenge)"]
        B["B = GetB(D, x, y_ref)"]
        kl["(k,l) = tune or ApproximateParameters"]
        buckets["Allocate l * 2^k buckets"]
    end

    subgraph squaring [Squaring Loop]
        loop["repeated_square(T, x, D, ...)"]
        checkpoint{"iteration % kl == 0?"}
        getblock["b = get_block_opt(p)"]
        update["bucket[j][b] *= checkpoint"]
    end

    subgraph finalize [Finalization]
        fold["Fold buckets into proof form"]
        serialize["Serialize y || proof"]
    end

    challenge --> D
    y_ref --> B
    D --> B
    T --> kl
    kl --> buckets
    B --> getblock

    buckets --> loop
    loop --> checkpoint
    checkpoint -->|yes| getblock
    getblock --> update
    update --> loop
    checkpoint -->|no| loop

    loop -->|"t == T"| fold
    fold --> serialize
```