@roman01la
Created March 9, 2026 08:49
VS Code Copilot Chat — Token Optimization Spec

Reverse-engineered from github.copilot-chat-0.37.9 extension bundle. Source: ~/.vscode/extensions/github.copilot-chat-0.37.9/dist/extension.js


Table of Contents

  1. Architecture Overview
  2. Prompt Tree & JSX-Like Element System
  3. Flexbox-Style Token Budget Allocation
  4. Priority-Based Pruning
  5. Tree-Sitter Document Summarization
  6. Conversation History Summarization
  7. Tool Result Truncation
  8. Prompt Caching (Cache Breakpoints)
  9. Workspace Search (TF-IDF & Embeddings)
  10. Token Counting & Estimation
  11. Inline Completion Context Budgets

1. Architecture Overview

Copilot Chat models the entire LLM prompt as a tree of typed nodes (JSX-like elements). Each node declares layout properties (flexGrow, flexBasis, flexReserve, priority) that control two distinct phases:

  1. Allocation phase — distributes the model's token budget across prompt sections using a CSS-flexbox-inspired algorithm.
  2. Pruning phase — if the rendered prompt exceeds the budget, iteratively removes the lowest-priority leaf nodes until it fits.

Key Components

| Component | Role |
| --- | --- |
| Prompt Tree | JSX-like tree of PromptElement nodes |
| Flex Allocator | Distributes token budget proportionally across sibling groups |
| Priority Pruner | Removes lowest-priority nodes when over budget |
| Token Budget Tracker (N8) | Per-element budget accounting |
| Document Summarizer (QC) | Tree-sitter-based code summarization |
| History Summarizer (KAe) | LLM-based conversation compaction |
| Cache Breakpoints | Marks positions for API-level prefix caching |
| TF-IDF / Embeddings Search | Budget-aware workspace context retrieval |
| Tiktoken Tokenizer | Precise BPE token counting in a worker thread |

Lifecycle

JSX Element Tree
    │
    ▼
┌─────────────────────────────┐
│  1. Flex Budget Allocation  │  Groups sorted by flexGrow (desc).
│     (_processPromptPieces)  │  Each group gets proportional budget.
│                             │  flexReserve holds tokens for later groups.
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  2. Element Rendering       │  Each element's prepare() + render()
│     (prepare → render)      │  called with its allocated N8 budget.
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  3. Materialization         │  Render tree nodes → runtime nodes:
│     (materialize)           │  DD (containers), Wx (messages),
│                             │  Sde (text chunks), D8 (images),
│                             │  Hx (cache breakpoints)
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  4. Growth Phase            │  If under budget, re-render Expandable
│     (_grow)                 │  elements with the remaining slack.
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  5. Pruning Loop            │  While tokenCount > limit:
│     (_getFinalElementTree)  │    removeLowestPriorityChild()
│                             │  Uses 1.25x heuristic to reduce recounts.
└─────────────┬───────────────┘
              │
              ▼
       Final Chat Messages

2. Prompt Tree & JSX-Like Element System

Node Types

The materialized tree consists of these node types:

| Class | Type | Description |
| --- | --- | --- |
| DD (GenericMaterializedContainer) | Container | Groups children; has priority, flags, metadata |
| Wx (MaterializedChatMessage) | Message | A chat message (system/user/assistant/tool) with role, content children, optional tool calls |
| Sde (MaterializedChatMessageTextChunk) | Leaf | A text segment within a message |
| D8 (MaterializedChatMessageImage) | Leaf | An image content part |
| rL (MaterializedChatMessageOpaque) | Leaf | Opaque content with pre-computed token usage |
| Hx (MaterializedChatMessageBreakpoint) | Leaf | Cache breakpoint marker (protected from pruning) |

Container Flags (Bitmask)

| Flag | Value | Meaning |
| --- | --- | --- |
| LegacyPrioritization | 1 | Use flat leaf-walk pruning instead of hierarchical |
| Chunk | 2 | Treat entire subtree as atomic unit for pruning |
| passPriority | 4 | Flatten children into parent's priority comparison |
| EmptyAlternate | 8 | Pick between two children based on emptiness |
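As a sketch of how these bitmask flags combine and are tested (the object shape and the hasFlag helper are illustrative, not decompiled names):

```javascript
// Flag values from the table above; hasFlag is an illustrative helper.
const ContainerFlags = {
  LegacyPrioritization: 1,
  Chunk: 2,
  passPriority: 4,
  EmptyAlternate: 8,
};

const hasFlag = (flags, flag) => (flags & flag) !== 0;

// A container that is both an atomic pruning unit and priority-transparent:
const objFlags = ContainerFlags.Chunk | ContainerFlags.passPriority;  // 6
```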

Render Tree Node (vBe)

Before materialization, the tree exists as vBe render nodes:

vBe = class {
    parent; childIndex; id;
    _obj;        // The PromptElement instance
    _state;      // State from prepare() call
    _children;   // Child render nodes
    _metadata;
    _objFlags;   // Bitmask (LegacyPrioritization | Chunk | passPriority | EmptyAlternate)

    materialize(parent) {
        // Image → MaterializedChatMessageImage
        // BaseChatMessage → MaterializedChatMessage
        // Everything else → GenericMaterializedContainer
    }
};

Token Counting (Memoized)

All materialized nodes memoize their token counts via once(). When a child is removed during pruning, onChunksChange() propagates upward, clearing cached values:

onChunksChange() {
    this._tokenCount.clear();
    this._upperBound.clear();
    this._text.clear();
    this.parent?.onChunksChange();
}
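The memoize-then-invalidate pattern can be sketched as follows; once() here is a minimal stand-in for the bundled helper, and the 4-characters-per-token leaf count is an illustrative assumption:

```javascript
// Minimal sketch of memoized token counting with upward invalidation.
function once(fn) {
  let cached, has = false;
  const wrapped = () => {
    if (!has) { cached = fn(); has = true; }
    return cached;
  };
  wrapped.clear = () => { has = false; cached = undefined; };
  return wrapped;
}

class MaterializedNode {
  constructor(parent, text) {
    this.parent = parent;
    this.children = [];
    this.text = text;
    // Memoized: re-walks children only after an invalidation.
    this._tokenCount = once(() =>
      this.children.reduce((n, c) => n + c._tokenCount(),
        Math.ceil(this.text.length / 4)));
  }
  onChunksChange() {
    this._tokenCount.clear();
    this.parent?.onChunksChange();  // propagate invalidation upward
  }
}
```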

3. Flexbox-Style Token Budget Allocation

Properties

Each prompt element can declare these flex properties (analogous to CSS flexbox):

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| flexGrow | number | Infinity | Render-order priority. Higher = rendered first, gets first pick of budget. |
| flexBasis | number | 1 | Relative weight when splitting budget among siblings in the same flexGrow group. |
| flexReserve | number \| string | none | Tokens to hold back for lower-priority groups. String form "/N" means 1/N of remaining budget. |
| priority | number | MAX_SAFE_INTEGER | Used during post-allocation pruning. Lower = pruned first. |

Token Budget Tracker (N8)

N8 = class {
    tokenBudget;     // Total budget allocated
    _consumed = 0;   // Tokens used so far
    endpoint;        // Model endpoint metadata

    get remainingTokenBudget() {
        return Math.max(0, this.tokenBudget - this._consumed);
    }
    consume(amount) {
        this._consumed += amount;  // Can be negative (release reservation)
    }
};
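A usage sketch of the reserve/release pattern this class enables, with the class restated under an illustrative name (same shape as the decompiled N8 above):

```javascript
// Same shape as the decompiled N8 class; name is illustrative.
class TokenBudget {
  constructor(tokenBudget) { this.tokenBudget = tokenBudget; this._consumed = 0; }
  get remainingTokenBudget() { return Math.max(0, this.tokenBudget - this._consumed); }
  consume(amount) { this._consumed += amount; }  // negative amount releases a reservation
}

const budget = new TokenBudget(1000);
budget.consume(250);   // reserve tokens for a lower-flexGrow group
// ... render the current group against the remaining 750 ...
budget.consume(-250);  // release the reservation
budget.consume(400);   // record the group's actual token usage
```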

Allocation Algorithm (_processPromptPieces)

The algorithm runs in 4 phases:

Phase 1: Group by flexGrow

Elements are bucketed by their flexGrow value into a Map<number, Element[]>.

Phase 2: Sort groups descending

Groups are sorted by flexGrow descending. Higher flexGrow groups render first.

Phase 3: Process each group

For each group (highest flexGrow first):

  1. Reserve — Temporarily consume tokens for all lower-priority groups that declared flexReserve:

    // String form: "/3" means "reserve 1/3 of remaining budget"
    let reserved = typeof flexReserve === "string"
        ? Math.floor(remaining / Number(flexReserve.slice(1)))
        : flexReserve;
    budget.consume(reserved);
  2. Cap detection — For elements with a TokenLimit: if their proportional share exceeds the cap, lock them at the cap and remove their weight from the distribution pool.

  3. Budget calculation — Uncapped elements split the remaining budget by flexBasis ratios:

    tokenBudget = capped
        ? tokenLimit
        : Math.floor((remaining - lockedTokens) * (flexBasis / totalBasis));
  4. Release reservation — budget.consume(-reserved) to give tokens back.

  5. Render — Call prepare() then render() on all elements in the group (in parallel via Promise.all).

  6. Consume actual — Deduct the real token consumption from the parent budget.
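The reserve-and-split arithmetic of steps 1–3 can be sketched as follows; parseReserve, allocateGroup, and the element shape are illustrative, not decompiled names:

```javascript
// Step 1: resolve a flexReserve declaration against the remaining budget.
function parseReserve(flexReserve, remaining) {
  if (flexReserve == null) return 0;
  return typeof flexReserve === "string"
    ? Math.floor(remaining / Number(flexReserve.slice(1)))  // "/3" => 1/3 of remaining
    : flexReserve;
}

// Step 3: uncapped siblings split the remaining budget by flexBasis ratios.
function allocateGroup(elements, remaining) {
  const totalBasis = elements.reduce((s, e) => s + (e.flexBasis ?? 1), 0);
  return elements.map(e =>
    Math.floor(remaining * ((e.flexBasis ?? 1) / totalBasis)));
}
```

For example, three siblings with flexBasis 2:1:1 split a 900-token budget into 450/225/225.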

Phase 4: Post-Allocation Pruning

After all groups render, _getFinalElementTree enforces TokenLimit constraints from innermost to outermost, pruning lowest-priority children as needed.

Actual Flex Values in the Prompt

flexGrow=∞  : System messages (always rendered first)
flexGrow=7  : User query text, custom instructions
flexGrow=5  : Chat variable attachments
flexGrow=3  : File attachments (capped at budget/6 via TokenLimit)
flexGrow=2  : Current tool call rounds, current user context
flexGrow=1  : Conversation history (rendered last, gets remaining budget)

4. Priority-Based Pruning

Overview

When the materialized prompt exceeds the token budget, the system iteratively finds and removes the lowest-priority node. This continues until the prompt fits.

Main Pruning Loop (_getFinalElementTree)

async _getFinalElementTree(maxBudget) {
    let tree = this._root.materialize();
    let limits = [{ limit: maxBudget, id: tree.id }, ...this._tokenLimits];

    // Process limits from innermost to outermost
    for (let i = limits.length - 1; i >= 0; i--) {
        let { limit, id } = limits[i];
        let subtree = tree.findById(id);
        let count = await subtree.tokenCount(tokenizer);

        // If under budget, try to grow Expandable elements
        if (count < limit) { await this._grow(subtree, count, limit); continue; }

        // Prune loop
        while (count > limit) {
            do {
                for (let removed of subtree.removeLowestPriorityChild()) {
                    let savings = removed.upperBoundTokenCount(tokenizer);
                    count -= savings * 1.25;  // Heuristic: overestimate savings by 25% to reduce recounts
                }
            } while (count > limit);
            count = await subtree.tokenCount(tokenizer);  // Precise recount
        }
    }
}

Node Selection Algorithm ($et)

The walk to find the lowest-priority node:

  1. LegacyPrioritization (flag 1): Flat recursive walk across all leaves; find the single leaf with the lowest priority.

  2. Standard walk: Iterate direct children of the container:

    • Skip nodes containing cache breakpoints (at root level) — breakpoints are protected.
    • Flatten containers with passPriority flag (flag 4) — their children compete directly with siblings.
    • Track the child with the lowest priority.
    • Tie-break equal priorities using tHt() — the node whose children have the lower minimum priority loses first.
  3. Recurse vs Remove:

    • If the target is a leaf, a Chunk (flag 2), or an empty container → remove it directly.
    • Otherwise → recurse into the container to find its lowest-priority leaf.
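A simplified sketch of the standard walk, covering only the passPriority flattening and lowest-priority selection (flag value from the table in section 2; node shape and function names are illustrative):

```javascript
const PASS_PRIORITY = 4;

// Yield direct children, flattening passPriority containers so their
// children compete directly with the container's siblings.
function* candidates(container) {
  for (const child of container.children) {
    if (((child.flags ?? 0) & PASS_PRIORITY) !== 0) yield* candidates(child);
    else yield child;
  }
}

// Pick the candidate with the lowest priority (lower = pruned first).
function lowestPriorityChild(container) {
  let best = null;
  for (const c of candidates(container)) {
    if (best === null || c.priority < best.priority) best = c;
  }
  return best;
}
```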

Node Removal (Tde)

function Tde(node, removedList) {
    let parent = node.parent;
    parent.children.splice(parent.children.indexOf(node), 1);
    removedList.push(node);
    cascadeKeepWith(node, removedList);  // Remove related nodes (e.g., tool call + result pairs)
    if (parent.isEmpty) Tde(parent, removedList);  // Cascade empty parents
    else parent.onChunksChange();  // Invalidate cached token counts
}

keepWith Cascading

When a node is removed, all nodes sharing the same keepWithId are also removed. This ensures paired content (tool calls and their results) are always removed together.
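A minimal sketch of this cascade, assuming a flat keepWithId field on sibling nodes (the function name and node shape are illustrative):

```javascript
// Removing a node also removes every sibling sharing its keepWithId,
// e.g. a tool call and its paired result.
function removeWithKeepWith(children, node) {
  const removed = [node];
  if (node.keepWithId !== undefined) {
    for (const sibling of children) {
      if (sibling !== node && sibling.keepWithId === node.keepWithId) {
        removed.push(sibling);
      }
    }
  }
  return children.filter(c => !removed.includes(c));
}
```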

Growth Phase (_grow)

If the prompt is under budget after rendering, Expandable elements are re-rendered with the full available budget and swapped into the tree:

async _grow(tree, currentTokens, limit) {
    for (let growable of this._growables) {
        let budget = limit - currentTokens + growable.initialConsume;
        // Re-render with expanded budget, swap into tree
        tree.replaceNode(growable.id, rerendered);
    }
}

5. Tree-Sitter Document Summarization

Architecture

Source Code
    │
    ▼
┌─────────────────────┐     ┌──────────────────────┐
│  Tree-Sitter WASM   │────▶│  Overlay Node Tree   │
│  (parse AST)        │     │  (TR nodes: FOLD/LINE)│
└─────────────────────┘     └──────────┬───────────┘
                                       │
              ┌────────────────────────┘
              ▼                    ▼
    ┌──────────────────┐  ┌──────────────────┐
    │  QC Summarizer   │  │  Fallback: A4    │
    │  (budget-aware)  │  │  (returns full)  │
    └────────┬─────────┘  └──────────────────┘
             │
             ▼
    Summarized Document
    (elided regions → "...")

Overlay Nodes (TR)

The overlay node is a lightweight tree representing document regions:

TR = class {
    startIndex;   // Start character offset
    endIndex;     // End character offset
    kind;         // "LINE" (leaf) or "FOLD" (collapsible region)
    children;     // Child overlay nodes
};

Construction priority:

  1. Tree-sitter AST → parserService.getTreeSitterAST(doc).getStructure() — rich structural overlay
  2. Folding ranges (fallback) → indentation-based heuristics via Zcn(), with language-specific adjustments for offside-rule languages (Python, YAML) vs brace-based (JS, Java)

Tree-Sitter Setup

  • WASM-based parsing via tree-sitter.wasm + per-language grammar files (tree-sitter-{language}.wasm)
  • Singleton parser service (MHe) with an LRU cache of 5 parse trees per language
  • Parse trees are ref-counted for safe sharing across consumers

Supported languages: JavaScript, TypeScript, TSX, Python, Ruby, Rust, Go, Java, C++, C#, PHP

Summarization Algorithm (Qsn)

The core algorithm uses a greedy cost-based approach:

  1. Convert overlay nodes into a "summarizable tree" where each node can be toggled between showing full text or an ellipsis (...).

  2. Mark selection-intersecting nodes as must-survive (never elided).

  3. Compute per-node cost — determines how "expensive" it is to keep a node visible:

    cost = 100 * min_selection_distance + depth + 10 * (distance_ratio)
    

    Key factors:

    • Selection proximity: Nodes intersecting the selection cost 0.
    • Asymmetric distance: Nodes AFTER the selection get 3× distance penalty (context before the cursor is more valuable).
    • Tree depth: Deeper nodes cost slightly more.
    • Import statements: Cost 0 when tryPreserveTypeChecking is enabled.
  4. Greedy fill: Sort nodes cheapest-first, add them one by one until the character budget is exceeded.

  5. Produce edits: Replace elided regions with "..." markers.
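The greedy fill of steps 3–5 can be sketched as follows, assuming a flat list of candidate nodes with precomputed costs (the node shape and function name are illustrative, and the real algorithm operates on a tree):

```javascript
// Sort by cost ascending, keep nodes until the character budget is
// exceeded; everything not kept is elided to "...".
function greedyFill(nodes, charBudget) {
  const kept = new Set(nodes.filter(n => n.mustSurvive));  // step 2: never elided
  let used = [...kept].reduce((s, n) => s + n.text.length, 0);
  for (const n of [...nodes].sort((a, b) => a.cost - b.cost)) {
    if (kept.has(n)) continue;
    if (used + n.text.length > charBudget) break;  // budget exceeded: stop
    kept.add(n);
    used += n.text.length;
  }
  // Emit in document order, eliding whatever was not kept.
  return nodes.map(n => (kept.has(n) ? n.text : "...")).join("\n");
}
```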

Iterative Document Fitting

When fitting summarized documents into the prompt, the system uses a shrink loop:

let budget = promptTokenBudget * 0.85 - 300;  // Start at 85% minus overhead
let summary = summarizer.summarizeDocument(budget);

for (let i = 0; i < 5; i++) {
    if (await countTokens(summary) <= budget) break;
    budget *= 0.85;  // Shrink by 15%
    summary = summarizer.summarizeDocument(budget);
}
// Effective minimum after 5 iterations: original × 0.85^6 ≈ 38%

Definition-Aware Reduction

When code changes exceed the budget, the system progressively reduces documents by finding the enclosing definition (function/class) for each change hunk via tree-sitter:

┌─────────────────────────┐
│ class MyService {       │ ← definition header (kept)
│   ...                   │ ← elided (removed)
│   handleRequest() {     │ ← enclosing definition (kept)
│ +   const x = validate  │ ← changed line (always kept)
│ +   return process(x)   │ ← changed line (always kept)
│   }                     │ ← closing brace (kept)
│   ...                   │ ← elided (removed)
│ }                       │ ← closing brace (kept)
└─────────────────────────┘

If the result still exceeds the budget with multiple files, a "split_input" error is thrown to trigger prompt splitting.


6. Conversation History Summarization

When Triggered

  • Reactively: When BudgetExceededError is thrown during prompt rendering.
  • Preemptively: When the previous turn's token usage exceeds a budget threshold.

Summarization Flow

BudgetExceededError or threshold exceeded
    │
    ▼
┌─────────────────────────┐
│  Execute PreCompact     │  Run registered extension hooks
│  hooks                  │  (e.g., MCP extensions)
└─────────┬───────────────┘
          │
          ▼
┌─────────────────────────┐
│  Render summarization   │  Mode: "full" (with tools, tool_choice:"none")
│  prompt                 │  or "simple" (lightweight fallback)
└─────────┬───────────────┘
          │
          ▼
┌─────────────────────────┐
│  Send to LLM            │  Model: gpt-4.1 (if available & sufficient context)
│  (temperature=0,        │  Otherwise: current model
│   stream=false)         │
└─────────┬───────────────┘
          │
          ▼
┌─────────────────────────┐
│  Validate summary       │  Token count must fit within budget.
│                         │  If too large → throw, use "simple" fallback.
└─────────┬───────────────┘
          │
          ▼
  Store summary in conversation history.
  Subsequent renders use condensed text.

Modes

| Mode | Description |
| --- | --- |
| "full" | Renders complete conversation (including tool schemas with tool_choice: "none"). If prompt cache is enabled, uses 105% of normal budget for extra headroom. |
| "simple" | Lightweight rendering without tool schemas. Used as fallback if "full" mode errors or if config forces it. |

Summarization Prompt

The system prompt instructs the LLM to produce a structured summary with sections:

  • Conversation Overview
  • Technical Foundation
  • Codebase Status
  • Problem Resolution
  • Progress Tracking
  • Active Work State
  • Recent Operations
  • Continuation Plan

Includes an <analysis> step for chain-of-thought before the final summary. For Claude models, an extra instruction is appended: "Do NOT call any tools."


7. Tool Result Truncation

Two-Stage Strategy

Stage 1: Large Results to Disk

When enabled (LargeToolResultsToDiskEnabled experiment flag) and the result exceeds a configurable threshold:

  1. If the result is JSON, pretty-print it and extract a schema (lDe).
  2. Write the full content to a session-specific directory on disk.
  3. Replace the tool result with a pointer message:
    Large tool result (42KB) written to file.
    Use the read_file tool to access the content at: /path/to/content.json
    
    Data schema found at: /path/to/schema.json
    

Stage 2: Token-Based Head/Tail Truncation

When the result exceeds the truncate token limit:

let ratio = text.length / tokenCount;           // chars-per-token
let keepChars = ratio * (budget - marker.length);
let head = Math.round(keepChars * 0.4);          // 40% from start
let tail = keepChars - head;                      // 60% from end

return text.slice(0, head)
     + "\n[Tool response was too long and was truncated.]\n"
     + text.slice(-tail);

Rationale: The tail (most recent output) is typically more relevant than the head, so it gets 60% of the budget.
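A runnable restatement of the truncation above; the function name is illustrative, and subtracting the marker length in characters (rather than tokens) is a simplifying assumption:

```javascript
const MARKER = "\n[Tool response was too long and was truncated.]\n";

// Keep 40% of the budget from the head and 60% from the tail,
// joined by the truncation marker.
function truncateToolResult(text, tokenCount, budgetTokens) {
  if (tokenCount <= budgetTokens) return text;
  const ratio = text.length / tokenCount;  // chars per token
  const keepChars = Math.max(0, Math.floor(ratio * budgetTokens) - MARKER.length);
  const head = Math.round(keepChars * 0.4);  // 40% from the start
  const tail = keepChars - head;             // 60% from the end
  return text.slice(0, head) + MARKER + text.slice(-tail);
}
```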


8. Prompt Caching (Cache Breakpoints)

Overview

Cache breakpoints are markers placed at strategic positions in the prompt, enabling API-level prefix caching (supported by both Anthropic and OpenAI). The cache type is always "ephemeral".

Placement Points

Breakpoints are placed at two levels:

A. In the Prompt Tree (during rendering)

| Location | When |
| --- | --- |
| After system/environment info (HNe) | New chats |
| After each user message (k4, Jdt) | Always |
| After the last tool result in each round (Vrt) | When enableCacheBreakpoints=true |

Historical turns do not get their own breakpoints (enableCacheBreakpoints: false).

B. Post-Render Dynamic Placement (MNe)

After materialization, up to 4 cache breakpoints total are placed at strategic message boundaries:

First pass (reverse):

  • Tool-to-non-tool boundaries (first tool message in a sequence)
  • Most recent user message
  • Pure assistant messages (no tool calls)

Second pass (forward):

  • Early System/User messages (the stable prompt prefix)

Cache Effectiveness Tracking

The API response handler tracks:

{
    prompt_tokens: inputTokens + cacheCreationTokens + cacheReadTokens,
    prompt_tokens_details: {
        cached_tokens: cacheReadTokens  // Cache hit metric
    }
}

Both cache_creation_input_tokens and cache_read_input_tokens are tracked from Anthropic's message_start and message_delta events.


9. Workspace Search (TF-IDF & Embeddings)

Strategy Fallback Chain

The WorkspaceChunkSearch orchestrator tries strategies in order:

1. Full Workspace Search (EM)
   │ Available? ──yes──▶ Return immediately
   │ no
   ▼
2. Remote Code Search (YE) ──── 12.5s timeout
   │                              │
   │ timeout?                     │ success?
   │   ▼                          ▼
   │ Race against local      Return result
   │   ▼
3. Local Embeddings Search (cw) ── 8s timeout
   │                                  │
   │ timeout?                         │ success?
   │   ▼                              ▼
   │ Fall through                Return result
   │
   ▼
4. TF-IDF + Semantic Reranking (yU)
   │
   ▼
5. Pure TF-IDF (IM) ──── always available

TF-IDF Search (IM)

  • Runs in a dedicated Web Worker (tfidfWorker.js)
  • Backed by SQLite database (local-index.1.db) for persistent index
  • Indexes up to 25,000 files on initialization
  • Subscribes to file create/change/delete events for incremental updates
  • Uses maxSpread: 0.75 — only returns results within 75% of the best score
  • Queries are built by joining extracted keywords with commas

Embeddings Search

  • Uses text-embedding-3-small-512 model for vector similarity
  • Optional reranking service for result quality improvement

Budget-Based Result Limiting

const TOKENS_PER_CHUNK = 250;
maxChunks = Math.floor(tokenBudget / TOKENS_PER_CHUNK);

10. Token Counting & Estimation

Precise: Tiktoken BPE Tokenizer

Encodings:

  • o200k_base — GPT-4o and newer (200K vocab)
  • cl100k_base — GPT-4/3.5-turbo (100K vocab)

Architecture:

  • Runs in a dedicated worker thread (tikTokenizerWorker.js)
  • LRU cache of 5,000 entries (text → token count)
  • Worker auto-terminates after 15 seconds of inactivity
  • Falls back to in-process mode if worker unavailable

Token counting for special content types:

| Content Type | Method |
| --- | --- |
| Text | BPE tokenization (cached) |
| Opaque | Pre-computed tokenUsage field |
| Image | Vision token formula (GBe) |
| Cache breakpoint | 0 tokens |
| Tool definitions | 16 base + 8 per tool + countObjectTokens() × 1.1 (10% overhead) |
| Tool calls | countMessageObjectTokens() × 1.5 (50% overhead) |
| Messages array | 3 base tokens + sum of message tokens |
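The tool-definition row can be sketched as follows; countObjectTokens is assumed here to be a simple JSON-length estimate, which is a guess at its behavior rather than the decompiled implementation:

```javascript
// Assumption: estimate object tokens from JSON length at ~4 chars/token.
const countObjectTokens = (obj) => Math.ceil(JSON.stringify(obj).length / 4);

// 16 base tokens + 8 per tool + 10%-padded object token count.
function toolDefinitionTokens(tools) {
  const objectTokens = tools.reduce((s, t) => s + countObjectTokens(t), 0);
  return 16 + 8 * tools.length + Math.ceil(objectTokens * 1.1);
}
```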

Fast Estimation

When precise counting is too expensive:

estimatedTokens = text.length * 3 / 4   // ~0.75 tokens per character

The inverse is used for character budgets:

characterBudget = tokenBudget * 4       // ~4 characters per token
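Both estimates as one-line helpers (names are illustrative):

```javascript
const estimateTokens = (text) => (text.length * 3) / 4;  // ~0.75 tokens per character
const charBudgetFor = (tokenBudget) => tokenBudget * 4;  // ~4 characters per token
```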

11. Inline Completion Context Budgets

Token Budget Computation

const DEFAULT_TOKEN_BUDGET = 8192;  // 8K tokens

// Available budget after accounting for current document:
availableBudget = 8192 - (documentLength / 4) - 256;
//                        └── estimated tokens ──┘   └── overhead ──┘

// Character budget conversion:
primaryCharacterBudget = (tokenBudget ?? 7168) * 4;     // ~28,672 chars
secondaryCharacterBudget = 8192 * 4;                     // ~32,768 chars
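A worked example of the availableBudget formula for a hypothetical 10,000-character document:

```javascript
const DEFAULT_TOKEN_BUDGET = 8192;
const documentLength = 10000;  // hypothetical document size in characters
// Estimated document tokens (length / 4) plus a fixed 256-token overhead:
const availableBudget = DEFAULT_TOKEN_BUDGET - documentLength / 4 - 256;
// 8192 - 2500 - 256 = 5436
```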

Two-Tier Budget System (Z7)

Context items are split into mandatory (priority ≥ 0.7) and optional pools:

Z7 = class {
    mandatory;   // High-priority items consume from this
    optional;    // Lower-priority items consume from this

    spend(amount) {
        this.mandatory -= amount;
        this.optional -= amount;
    }
    isExhausted() { return this.mandatory <= 0; }
    isOptionalExhausted() { return this.optional <= 0; }
};
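A usage sketch with the class restated under an illustrative name: every spend debits both pools, so whichever pool starts smaller (the optional one, in "fillHalf" mode) exhausts first:

```javascript
// Same shape as the decompiled Z7 class; name is illustrative.
class TwoTierBudget {
  constructor(mandatory, optional) { this.mandatory = mandatory; this.optional = optional; }
  spend(amount) { this.mandatory -= amount; this.optional -= amount; }
  isExhausted() { return this.mandatory <= 0; }
  isOptionalExhausted() { return this.optional <= 0; }
}

const budgetPools = new TwoTierBudget(1000, 500);  // e.g. "fillHalf": optional = budget / 2
budgetPools.spend(600);
// Optional pool is now exhausted (500 - 600 < 0); mandatory still has 400.
```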

Usage Modes

| Mode | Optional Budget | Use Case |
| --- | --- | --- |
| "minimal" | 0 | Only mandatory context |
| "fillHalf" | budget / 2 | Moderate context |
| "double" | min(budget, docLength) | Up to document size extra |
| "fill" | budget | Maximum context (2× mandatory) |

Neighbor File Context

  • Tracks up to 32 recently active/visible editors in an LRU cache
  • Provides up to 10 most recently active neighbor files (excluding current document)
  • Used for inline completion context and TypeScript server plugin

Cache Population

  • Proactive cache warming triggered on: cursor moves, text changes, inline completion requests
  • Time budget: 50ms for cache population
  • Race timeout: 20ms — if the TypeScript server doesn't respond in time, yield what we have

Appendix: Key Constants

| Constant | Value | Description |
| --- | --- | --- |
| modelMaxPromptTokens | Model-specific | Maximum prompt tokens for the model |
| TOKENS_PER_CHUNK (yhe) | 250 | Average tokens per workspace search chunk |
| MAX_CACHE_BREAKPOINTS (Lai) | 4 | Maximum cache breakpoints in rendered prompt |
| LRU_TOKENIZER_CACHE_SIZE | 5,000 | Tiktoken LRU cache entries |
| LRU_PARSE_TREE_CACHE_SIZE | 5 per language | Tree-sitter parse tree cache |
| MAX_WORKSPACE_FILES | 25,000 | Maximum files indexed by TF-IDF |
| WORKER_IDLE_TIMEOUT | 15,000ms | Tokenizer worker auto-termination |
| DEFAULT_INLINE_TOKEN_BUDGET (zFt) | 8,192 | Default inline completion token budget |
| MAX_NEIGHBOR_FILES | 10 | Neighbor files for inline completions |
| NEIGHBOR_FILE_LRU_SIZE | 32 | LRU capacity for tracking recent editors |
| SUMMARIZATION_SHRINK_FACTOR | 0.85 | Per-iteration budget reduction for document fitting |
| SUMMARIZATION_MAX_ITERATIONS | 5 | Maximum shrink iterations |
| TOOL_RESULT_HEAD_RATIO | 0.4 | Head portion of truncated tool results |
| TOOL_RESULT_TAIL_RATIO | 0.6 | Tail portion of truncated tool results |
| TOOL_TOKEN_OVERHEAD | 1.1× | Overhead multiplier for tool definition tokens |
| TOOL_CALL_OVERHEAD | 1.5× | Overhead multiplier for tool call tokens |
| FAST_TOKEN_ESTIMATE | length × 0.75 | Approximate tokens from character count |
| CHAR_PER_TOKEN_ESTIMATE | 4 | Approximate characters per token |
| BASE_TOKENS_PER_MESSAGE | 3 | Overhead tokens per chat message |
| PRUNING_SAVINGS_MARGIN | 1.25× | Heuristic margin to reduce token recounts during pruning |
| TFIDF_MAX_SPREAD | 0.75 | Only return TF-IDF results within 75% of best score |
| REMOTE_SEARCH_TIMEOUT | 12,500ms | Timeout for remote code search |
| LOCAL_EMBEDDINGS_TIMEOUT | 8,000ms | Timeout for local embeddings search |
| CACHE_POPULATION_TIMEOUT | 50ms | Time budget for proactive cache warming |
| CACHE_RACE_TIMEOUT | 20ms | Race timeout for TypeScript server response |