bkase/phase1.md

## phase1.md

      
    Phases

Phase 1 — Language core + toolchain + test harness (foundation)

Build the minimal compile-to-JS language end-to-end (lexer → parser → compiler → runtime), define the v0.1 spec, stand up Bun+TypeScript repo tooling (oxlint/oxfmt), and establish a serious testing spine (unit + a first e2e smoke test). Deliver a tiny “runner” web page and a build pipeline that can emit a single dist/index.html even if the UI is still barebones.
Phase 2 — In-browser “studio”: virtual filesystem + editor + live recompile + graphics sandbox

Implement the graphical computing environment (canvas-based), virtual filesystem (seeded with the project’s own sources), file explorer, and the live recompilation loop. Establish robust browser sandboxing for running user code safely. Grow e2e coverage around IDE workflows.
Phase 3 — “Product” layer: two small games + polish + documentation-grade code + final single-file build

Ship two small games written in the new language (or at least using its stdlib + graphics module), improve ergonomics (errors, traces, perf), add doc-quality in-code documentation, ensure everything works offline as a single index.html, and harden tests (including flake controls + CI gates).

Engineering Plan — Phase 1

0. Executive summary

We will create a tiny language (“Twig”, placeholder name) that compiles to JavaScript and runs in a browser with no FFI exposed to user programs (i.e., no direct “call arbitrary JS” escape hatch). The compiler and runtime will be written in TypeScript and bundled with Bun. We’ll establish strict correctness with Bun’s built-in test runner (TypeScript-supported, Jest-like) and snapshots for compiler output, plus a minimal Playwright smoke test to verify the browser runner loads and executes a sample program. (bun.com)
Phase 1 is explicitly about de-risking:

“Can we implement a language pipeline cleanly?”
“Can we run compiled output deterministically (tests)?”
“Can we emit a single index.html artifact from Bun tooling?”


1. Goals and non-goals

Goals (Phase 1)


Language v0.1 implemented:

Lexer → parser → AST with spans
Compiler to JS
Minimal runtime + stdlib
Deterministic behavior for tests


Tooling:

Bun workspace (TS-first)
oxlint for lint, oxfmt for format


Tests:

Many unit tests for each layer (lexer/parser/compiler/runtime)
Snapshot (“golden”) tests for compiler output
Minimal e2e smoke test using Playwright (loads runner page, runs “hello”)


Build artifact proof:

A build script that produces one dist/index.html containing everything needed to run the minimal system offline.
This is a proof that the final “single file” constraint is feasible early. (The full IDE arrives in later phases.)


Non-goals (Phase 1)


Full-feature language (types, macros, advanced modules, optimization passes)
Rich IDE (file tree, editor UX, live recompile UI) — Phase 2+
Two games — Phase 3
Sophisticated security sandbox — Phase 2+ (Phase 1 just establishes the execution strategy and constraints)


2. Key constraints translated into concrete engineering requirements

“No FFI”

Interpretation we will implement:

User programs cannot call arbitrary JavaScript APIs.
Compiled programs run against a sealed runtime object (e.g., __rt) that exposes only our standard library and (later) graphics/file APIs.
We do not implement constructs like js.call("document.querySelector", ...).

Enforcement strategy (Phase 1 baseline):


Generated JS references only:

local variables
__rt (runtime)


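Generated JS executes in a wrapper that shadows common globals (window, document, globalThis, Function) to reduce accidental or intentional escape. eval cannot be shadowed this way in strict-mode code (binding the name eval is a SyntaxError), so the compiler must simply never emit a bare eval reference (e.g., by mangling user identifiers). This is not perfect security, but it is aligned with "minimal system"; hardening comes later.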


“Pure bun as much as possible; TypeScript as much as possible”


Compiler, runtime, web runner, build scripts: TypeScript.
Use Bun bundler and Bun test runner as defaults. (bun.com)
For Playwright: we’ll use Bun for dependency management and scripts; but we will not force Playwright to run under Bun runtime if it causes instability (there are known Bun-runtime compatibility issues when forcing Bun execution). (GitHub)

“Use ox tools for lint and format”


Use oxlint for linting and oxfmt for formatting. (Oxc)

“Compile it to index.html”


Phase 1 will build a minimal dist/index.html single-file output.
We’ll implement a bundling+inlining build script using Bun.build() outputs (which are BuildArtifacts and can be read via .text()), then inject JS/CSS into an HTML template. (bun.com)


3. Proposed language v0.1 (minimal but future-friendly)

3.1 Syntax choice

Pick an s-expression syntax to minimize parser complexity and maximize extensibility:
Examples:
```lisp
; hello
(print "hi")

; let + function
(let add (fn (a b) (+ a b)))
(print (add 1 2))

; if
(if (> x 10)
  (print "big")
  (print "small"))
```
Rationale:

Lexer+parser are straightforward.
“Special forms” are explicit.
We can grow features later without grammar wars.
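A minimal lexer sketch in TypeScript shows what "straightforward" means here. The token kinds, the Token shape and the name lex are assumptions for illustration, not a settled API. For brevity the sketch throws on an unterminated string, whereas the planned lexer would return a Diagnostic. Comments and whitespace are skipped, and every token records its start/end offsets for spans.

```typescript
// Hypothetical token shape for a Twig lexer sketch.
type TokenKind = "lparen" | "rparen" | "number" | "string" | "symbol";

interface Token {
  kind: TokenKind;
  text: string; // raw source slice (string tokens keep their quotes)
  start: number; // inclusive offset into the source
  end: number; // exclusive offset
}

const NUMBER_RE = /^-?(\d+(\.\d*)?|\.\d+)$/;

function lex(src: string): Token[] {
  const tokens: Token[] = [];
  const isDelimiter = (c: string) => /[\s();"]/.test(c);
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (/\s/.test(c)) {
      i++;
      continue;
    }
    if (c === ";") {
      // Comment: skip to end of line.
      while (i < src.length && src[i] !== "\n") i++;
      continue;
    }
    const start = i;
    if (c === "(" || c === ")") {
      i++;
      tokens.push({ kind: c === "(" ? "lparen" : "rparen", text: c, start, end: i });
      continue;
    }
    if (c === '"') {
      i++; // opening quote
      while (i < src.length && src[i] !== '"') {
        i += src[i] === "\\" ? 2 : 1; // a backslash consumes the next char
      }
      if (i >= src.length) throw new Error(`unterminated string at offset ${start}`);
      i++; // closing quote
      tokens.push({ kind: "string", text: src.slice(start, i), start, end: i });
      continue;
    }
    // Anything else runs to the next delimiter: a number or a symbol.
    while (i < src.length && !isDelimiter(src[i])) i++;
    const text = src.slice(start, i);
    tokens.push({ kind: NUMBER_RE.test(text) ? "number" : "symbol", text, start, end: i });
  }
  return tokens;
}
```

Under this rule a lone "-" lexes as a symbol and "-1" as a number, which is one of the identifier/number edge cases the spec needs to pin down.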

3.2 Core forms (Phase 1)


Literals: number, string, boolean, nil


Identifier: foo, bar-baz (decide exact identifier charset in spec)


Call: (f a b c)


Special forms:

(do expr1 expr2 ... exprN) → sequencing
(let name expr) → immutable binding in current scope
(fn (arg1 arg2 ...) body) → function
(if cond then else) → expression if


Comments:

; to end of line


Phase 1 deliberately excludes:

mutation / assignment
macros
pattern matching
user-defined types

3.3 Semantics (Phase 1)


Lexical scoping, closures.
Everything is an expression; do returns last expression.
Truthiness: only false and nil are falsy (or we can do JS-like; pick and document).

3.4 Error model

All compiler stages return structured diagnostics:

Diagnostic { kind, message, span, notes[] }
span references the source text by { start, end, line/col } computed from a line map.
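Here is a minimal sketch of the line map. Line-start offsets are computed once, and each span offset is then mapped to a 1-based line/column with binary search. The LineMap class, the diagnostic kind values and the message format are hypothetical.

```typescript
// Hypothetical line map: precompute line starts, then binary-search them.
class LineMap {
  private readonly starts: number[] = [0];

  constructor(src: string) {
    for (let i = 0; i < src.length; i++) {
      if (src[i] === "\n") this.starts.push(i + 1);
    }
  }

  lineCol(offset: number): { line: number; col: number } {
    // Find the last line start <= offset.
    let lo = 0;
    let hi = this.starts.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.starts[mid] <= offset) lo = mid;
      else hi = mid - 1;
    }
    return { line: lo + 1, col: offset - this.starts[lo] + 1 };
  }
}

interface Span {
  start: number;
  end: number;
}

interface Diagnostic {
  kind: "lex" | "parse" | "compile" | "runtime"; // assumed kinds
  message: string;
  span: Span;
  notes: string[];
}

function formatDiagnostic(d: Diagnostic, map: LineMap): string {
  const { line, col } = map.lineCol(d.span.start);
  return `${line}:${col}: ${d.kind} error: ${d.message}`;
}
```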


4. High-level architecture

4.1 Repository layout

A clean “documentation-as-code” structure:
```text
/src
  /lang
    ast.ts
    span.ts
    lexer.ts
    parser.ts
    printer.ts        (optional, for debugging + tests)
    diagnostics.ts
    compiler.ts
  /runtime
    index.ts
    std.ts
    values.ts         (tagging / runtime helpers)
  /host
    host.ts           (Host interface: readFile, now, random, etc.)
    bun-host.ts
    browser-host.ts   (stub in phase1; real use in phase2)
  /web
    template.html
    app.ts            (minimal runner UI: run one example program)
  /build
    build-single-html.ts
/tests
  lexer.test.ts
  parser.test.ts
  compiler.test.ts
  runtime.test.ts
  integration.test.ts
/e2e
  smoke.spec.ts
```

4.2 Interfaces to keep Phase 2 easy

Even though Phase 1 won’t ship a full IDE, we should design for it now.
Host abstraction
```ts
interface Host {
  readText(path: string): Promise<string | null>; // phase2: virtual FS
  now(): number;
  random(): number; // injectable deterministic PRNG for tests
}
```
Compiler API
```ts
type CompileResult =
  | { ok: true; js: string; diagnostics: Diagnostic[] }
  | { ok: false; js?: string; diagnostics: Diagnostic[] };

// Signature only; the Promise return type marks it as async.
declare function compileSource(source: string, opts: CompileOpts): Promise<CompileResult>;
```
Later, Phase 2 will extend this to compileModule(entryPath, host) and a module graph. We keep the types aligned so we don’t rewrite everything.

5. Compiler pipeline design (Phase 1)

5.1 Stages


Lex: input string → tokens with spans
Parse: tokens → AST with spans
Lower: AST → "core AST" (optional in Phase 1, but we should reserve a file/module for it)
Emit: core AST → JS string

5.2 JS output structure (for “no FFI” posture)

Generated JS is a module-like string that exports a single function:
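```js
export function __run(__rt) {
  "use strict";
  // Shadow common globals so stray references resolve to undefined.
  const window = undefined, document = undefined, globalThis = undefined;
  const Function = undefined;
  // `eval` cannot be rebound in strict code (`const eval` is a SyntaxError),
  // so the compiler must never emit a bare `eval` reference instead.
  let result = null;
  // compiled code... (assigns the value of the last expression to `result`)
  return result;
}
```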
The runtime provides a controlled API surface:
```ts
type Runtime = {
  std: StdLib;
  // phase2: gfx, fs, ui...
};
```
5.3 Execution strategy (tests + browser)


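In unit/integration tests: evaluate generated JS via dynamic import from a data URL, or via new Function. Note that new Function cannot evaluate export declarations, so that path needs a codegen mode that emits a plain function body.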
In browser runner: create a Blob URL from generated JS and import() it, then call __run(runtime).
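The sketch below shows the test-side execution path. It assumes a hypothetical codegen mode that emits a plain function body, because new Function cannot evaluate export. Parameters named after common globals receive undefined, which shadows those globals inside the body. This is not a security boundary: constructor chains on ordinary values, such as (() => {}).constructor, still reach the real Function.

```typescript
// Hypothetical runtime surface for this sketch.
interface Runtime {
  std: { print: (x: unknown) => void };
}

// Assumes codegen can emit a plain function *body* (no `export`), since
// `new Function` cannot evaluate module syntax.
function runCompiledBody(body: string, rt: Runtime): unknown {
  // Parameters named after common globals shadow them inside the body;
  // only `__rt` is passed, so the others are undefined.
  const fn = new Function(
    "__rt",
    "window",
    "document",
    "globalThis",
    "Function",
    `"use strict";\n${body}`,
  ) as (rt: Runtime) => unknown;
  return fn(rt); // omitted arguments are undefined
}
```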


6. Runtime + stdlib (Phase 1 subset)

6.1 Value model

We will keep values “JS-native” where possible:

numbers → JS number
strings → JS string
booleans → JS boolean
nil → null (or a unique symbol; pick and document)
functions → JS functions, but only those created by compiled code or stdlib

6.2 Stdlib v0.1 (small but usable)

Core

print(x) → append stringified output to a runtime buffer (and optionally console.log)
str(x) → string conversion
=, <, >, <=, >=

Math

+, -, *, /, %

Logic

and, or, not (or just rely on if + truthiness; choose explicit functions for clarity)

Lists (minimal)

list(a b c) → array
len(xs)
get(xs i) with bounds checks → nil or error (decide)
push(xs x) (returns new array, immutable style)

We should keep phase1 stdlib pure and deterministic, except print.
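The sketch below covers part of the stdlib under two of the open choices above, picked only for illustration: nil is null, and get returns nil when out of bounds. Truthiness follows the "only false and nil are falsy" option. makeStd and the shape of the output buffer are hypothetical.

```typescript
// Hypothetical value type: nil is modeled as `null`.
type Value = number | string | boolean | null | Value[] | ((...args: Value[]) => Value);

function makeStd() {
  const output: string[] = []; // print appends here; tests read it back

  const str = (x: Value): string => {
    if (x === null) return "nil";
    if (Array.isArray(x)) return `(${x.map(str).join(" ")})`;
    if (typeof x === "function") return "<fn>";
    return String(x);
  };

  // Truthiness option: only false and nil are falsy.
  const truthy = (x: Value): boolean => x !== false && x !== null;

  return {
    output,
    str,
    truthy,
    print: (x: Value): null => {
      output.push(str(x));
      return null;
    },
    list: (...xs: Value[]): Value[] => xs,
    len: (xs: Value[]): number => xs.length,
    // Bounds policy chosen for the sketch: out of range -> nil.
    get: (xs: Value[], i: number): Value =>
      Number.isInteger(i) && i >= 0 && i < xs.length ? xs[i] : null,
    // Immutable-style push: returns a new array, leaves `xs` untouched.
    push: (xs: Value[], x: Value): Value[] => [...xs, x],
  };
}
```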
6.3 Determinism hooks for testing


runtime gets random() from Host; tests inject a seeded PRNG.
runtime gets now() from Host; tests can fix it.
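For the seeded PRNG, a small, widely used generator such as mulberry32 is enough for tests. The sketch assumes the Host shape from 4.2 without readText. makeTestHost is a hypothetical helper.

```typescript
interface Host {
  now(): number;
  random(): number;
}

// mulberry32: a small, well-known 32-bit seeded PRNG, chosen here for
// illustration; any deterministic generator works for tests.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Hypothetical test host: fixed clock, seeded randomness.
function makeTestHost(seed: number, fixedNow = 0): Host {
  return { now: () => fixedNow, random: mulberry32(seed) };
}
```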


7. Tooling and quality gates

7.1 Bun

We will rely on:

Bun test runner for unit/integration/snapshot tests. (bun.com)
Bun bundler/build APIs for producing browser bundles and the single-file artifact. (bun.com)
Bun can run HTML entrypoints in dev (useful later), but Phase 1 only needs a minimal runner and a build. (bun.com)

7.2 Lint + format (“ox tools”)


Add oxlint and oxfmt as dev dependencies and wire scripts. (Oxc)
Use .oxlintrc.json (json/jsonc supported) and keep rules sane for a compiler codebase. (Oxc)

Recommended scripts

bun run lint → oxlint
bun run lint:fix → oxlint --fix
bun run format → oxfmt .
bun run format:check → oxfmt --check . (verify flag support when implementing; if it is not available, run the formatter and fail if git diff is non-empty)

(We’ll keep formatting config minimal and documented in-repo.)

8. Build plan: emit a single dist/index.html in Phase 1

8.1 Why this in Phase 1?

Single-file build is a high-risk “integration constraint.” If we postpone it, we risk a painful Phase 3 surprise. Phase 1 will prove the mechanism with the minimal runner.
8.2 Strategy


Use Bun.build() on src/web/app.ts (browser target, minify optional).


From BuildOutput.outputs, pick:

the JS entry-point/chunk
optionally a CSS artifact (if we produce CSS as a separate file; we can also inline CSS by writing it in the HTML template directly)


Read artifact content using await artifact.text() (supported on BuildArtifact). (bun.com)


Inject JS into src/web/template.html:

<script type="module"> ...bundled JS... </script>
<style> ...css... </style>


Write the resulting HTML to dist/index.html.


Result: a single file that can be opened directly.
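The inlining step has to handle one detail. If the bundle contains the text `</script` (for example in a string literal), the HTML parser ends the inline script early. The sketch below is a pure helper that the build script could call with the string returned by await artifact.text(). The helper name inlineIntoTemplate and the placeholder comments are hypothetical.

```typescript
// Rewrite "</script" as "<\/script": inside JS strings, regexes and
// comments this means the same thing, but it no longer closes the element.
function escapeInlineScript(js: string): string {
  return js.replace(/<\/script/gi, "<\\/script");
}

// Assumes template.html contains these two placeholder comments.
const SCRIPT_MARK = "<!-- INLINE_JS -->";
const STYLE_MARK = "<!-- INLINE_CSS -->";

function inlineIntoTemplate(template: string, js: string, css = ""): string {
  if (!template.includes(SCRIPT_MARK)) {
    throw new Error(`template is missing ${SCRIPT_MARK}`);
  }
  // Replacer functions stop String.prototype.replace from interpreting
  // "$&"-style patterns that may appear inside the bundle.
  return template
    .replace(STYLE_MARK, () => `<style>\n${css}\n</style>`)
    .replace(SCRIPT_MARK, () => `<script type="module">\n${escapeInlineScript(js)}\n</script>`);
}
```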

9. Testing plan (Phase 1)

9.1 Unit tests (Bun)

Bun’s test runner supports TypeScript and snapshots, so we will lean on that heavily. (bun.com)
Lexer tests


tokenization of:

identifiers
numbers (including edge cases: -1, 1.0, .5 if allowed)
strings (escapes, unterminated)
parentheses
comments


span accuracy: start/end indexes correct


Parser tests


parse each syntactic form


precedence is irrelevant in s-exprs, but we test:

nesting
empty lists ()
error recovery: unexpected EOF, mismatched parens


spans: AST nodes carry correct ranges


Compiler tests


“golden” snapshot tests:

compile small inputs and snapshot emitted JS
keep snapshots stable by normalizing whitespace and deterministic temp names


semantic tests:

compile expression → execute → compare result
compile if, let, fn + closure capture


Runtime/std tests


print output buffer behavior
numeric ops correctness
list ops correctness and bounds behavior

9.2 Integration tests (Bun)

Test full pipeline:


compile sample programs (fibonacci, map/reduce, closure) and run them through runtime host.


verify both:

final returned value
captured printed output (exact strings)


9.3 E2E tests (Playwright) — smoke only in Phase 1

We want e2e “as we go,” but Phase 1 keeps it minimal:


Build dist/index.html


Serve it via a tiny HTTP server (recommended; avoids file:// quirks)


Playwright opens the page and asserts:

the page loads without console errors
the sample program output area contains expected text (e.g., “hi”)


Playwright usage is via its standard playwright test workflow. (playwright.dev)
Important runtime note (pragmatic Bun posture):
There are documented issues when forcing Playwright to run under Bun’s runtime; we will not force that mode in Phase 1. We will run Playwright in its stable configuration (typically Node execution) while still using Bun for dependency management and scripting. (GitHub)
9.4 Test-writing workflow guidance


Every new language feature ships with:

parser unit tests
compiler snapshot
runtime semantic test (compile+run)


Bugs get a “regression test first” rule.


9.5 Quality gates for Phase 1


bun test is green
bun run lint is green
bun run format:check is green
bun run build produces a single dist/index.html
bun run e2e passes smoke test


10. Concrete Phase 1 deliverables checklist

Deliverable A — Language v0.1 spec (repo docs)


docs/language-v0.1.md describing:

syntax
semantics
stdlib surface
error philosophy


Small examples (hello, let/fn/if)


Deliverable B — Compiler + runtime in TS


Clean module boundaries
Diagnostics with spans
Deterministic test harness host

Deliverable C — Minimal browser runner


template.html + app.ts:

compiles embedded example program string
runs it
shows output in the DOM (simple <pre id="output">)


Deliverable D — Single-file build script


build-single-html.ts (Bun-run) producing dist/index.html fully self-contained (inline JS/CSS)

Deliverable E — Tests


Unit tests for lexer/parser/compiler/runtime
Integration tests for compile+run
Playwright smoke test loads the page and checks expected output


11. Risks and mitigations (Phase 1)

Risk: “Single-file index.html bundling gets messy”

Mitigation:

De-risk immediately by implementing the inlining build step in Phase 1 using Bun.build() + BuildArtifact.text(). (bun.com)
Avoid external assets (images/fonts) in Phase 1; everything is inline code.

Risk: “Playwright + Bun runtime instability”

Mitigation:

Don’t force Bun runtime for Playwright execution; use the stable Playwright runner path.
Keep e2e smoke minimal in Phase 1; expand in Phase 2. (GitHub)

Risk: “Language design painted into a corner”

Mitigation:

Keep the compiler API and host abstraction future-proof (module graph later).
Adopt a tiny core language; add features only when demanded by Phase 2/3 needs.


12. Definition of Done for Phase 1

Phase 1 is complete when:


We can write a small program in the new language and:

compile it to JS
execute it under Bun tests
execute it in the browser runner


We can produce a single dist/index.html that runs offline.


Lint/format/test/e2e gates are wired and green.


If you want, when you ask for Phase 2, I’ll write it as a similarly detailed engineering doc focused on:

virtual filesystem representation + seeding with source code
file explorer UI model
editor component plan (minimal textarea vs richer editor)
live compilation loop and sandbox strategy
graphics runtime API design and rendering loop
deeper Playwright coverage for IDE workflows


## phase2.md

      
    Engineering Plan — Phase 2

Main-thread studio with VFS + editor + live recompile + graphics sandbox (no Web Worker)

Phase 2 builds the “graphical computing environment” on top of the Phase 1 compiler/runtime, while honoring your updated constraint:

No Web Worker: user programs execute on the main thread.
UI stays responsive: execution is cooperative (step-budgeted) and yields to the event loop between slices.
Single-file output: still emits one offline-openable dist/index.html.

The big technical shift vs the earlier Phase 2 draft is the runner: instead of isolating user code in a Worker, we’ll make the runtime/compiled output structurally incapable of hogging the thread for long by running under a trampoline + step budget.

1. Goals and non-goals

Goals (Phase 2)


Studio UI

File explorer (system + workspace)
Text editor
Output panel (stdout)
Diagnostics panel (compile/runtime errors)
Canvas panel (graphics)
Run controls: Run, Auto-run, Stop, Reset


Virtual filesystem (VFS)

In-memory hierarchical FS
Seeded with the project’s own source code (so you can explore it)
Workspace persisted (browser storage)
Read-only system tree by default, with “Copy to workspace” affordance


Live recompilation

Debounced compile on edit when Auto-run is enabled
Diagnostics with spans (file + line/col + excerpt)
Successful compile triggers a clean “restart”


Graphics computing environment

Canvas-based gfx/* stdlib API
input/* for keyboard/mouse
gfx/on-frame for animation sketches


Main-thread cooperative execution

Programs run in bounded slices (N steps per slice)
Yield between slices (setTimeout(0) / requestAnimationFrame) to keep UI alive
Stop button cancels the job


Testing expands substantially

Lots of Bun unit tests (VFS, runner, scheduling, gfx state machine)
DOM/UI tests under Bun (Bun’s test runner supports “UI & DOM testing”) (Bun)
Playwright e2e covering core workflows (playwright test, --ui for debugging) (Playwright)


Non-goals (Phase 2)


Two “little games” (Phase 3)
Fancy editor features (syntax highlighting, multi-cursor, LSP, etc.)
Hardened security isolation from a malicious program (no Worker means we can only do “best effort” sandboxing)


2. Architecture overview

2.1 High-level layers


UI Layer

Explorer, Editor, Output, Diagnostics, Canvas


Studio Model

Pure state + actions (open file, edit buffer, run, etc.)


Services

VfsService (overlay + persistence)
CompileService (compile entry file → runnable artifact)
RunService (cooperative executor + cancellation + on-frame loop)


Runtime

stdlib modules: std/*, gfx/*, input/*, ref/*, fs/*
stdout buffer + diagnostic helpers


Renderer

Canvas rendering + input event wiring


This structure keeps most code testable without a real browser.
2.2 Repo layout (Phase 2 additions)

```text
/src
  /studio
    model.ts            # state + actions + reducer (pure)
    controller.ts       # wires UI events to services (thin)
    view.ts             # DOM creation + minimal component helpers
    debounce.ts
  /vfs
    path.ts
    vfs.ts
    seed.ts             # types
    persist.ts
  /runner
    step.ts             # Step/Thunk types, trampoline
    scheduler.ts        # yield strategy, cancellation, budgets
    run-service.ts      # run/stop/on-frame orchestration
  /runtime
    runtime.ts          # Runtime object + buffer
    std.ts
    ref.ts
    fs.ts
    gfx.ts
    input.ts
  /web
    template.html
    app.ts              # boots studio
  /build
    generate-seed.ts
    build-single-html.ts
/tests
  ...lots...
/e2e
  ...playwright specs...
```


3. The critical piece: main-thread cooperative execution

With no Worker, the only real way to prevent UI lock-ups is to ensure the language runtime never runs “unbounded” in one JS turn.
3.1 Execution strategy: trampoline + step budget

We will execute compiled programs using a trampoline:

The compiler emits code where evaluation proceeds via thunks (zero-arg functions).
Each thunk represents “one small step”.
The trampoline runs up to budget steps, then yields to the event loop, then continues.

Core types:
```ts
type Value = unknown;

type Done = { done: true; value: Value };
type Thunk = () => Step;
type Step = Thunk | Done;
```
Trampoline slice:
```ts
function runSlice(step: Thunk, budget: number): { next?: Thunk; done?: Value } {
  let current: Thunk = step;
  for (let i = 0; i < budget; i++) {
    const s = current();
    if (typeof s === "function") {
      current = s;
    } else {
      return { done: s.value };
    }
  }
  return { next: current };
}
```
3.2 Yield strategy

We need rendering opportunities, not just microtasks. So the yield strategy should schedule a macrotask or a frame:

For “one-shot” runs (press Run / auto-run compile success): yield via setTimeout(0) to let the browser paint.
For animation (gfx/on-frame): run the callback in a rAF tick and cap steps per frame. If it doesn’t finish inside the per-frame budget, we carry it over to the next frame (and optionally warn).

3.3 Why this implies a compiler constraint

To make step budgeting meaningful, the generated JS must avoid:

raw while(true) / for(;;) loops
deep JS recursion
calling user-defined functions through JS array methods that hide control flow

So in Phase 2, we enforce:

the language does not have built-in loop constructs yet (animation uses gfx/on-frame)
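function calls are trampolined and compiled in continuation-passing style (CPS), so even non-tail recursion grows heap-allocated continuations rather than the JS stack (trampolining alone only bounds the stack for tail calls)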

This is the main technical work in Phase 2’s language/runtime area: adjusting codegen so it can resume execution after a yield.
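Resumable execution points toward continuation-passing style, as the sketch below shows. A non-tail-recursive sum, roughly what CPS codegen might emit for a recursive Twig function, returns thunks instead of calling itself. The pending additions therefore live in heap-allocated continuations, and the JS stack stays flat across slices. The names sumCps, runToCompletion and SliceResult are hypothetical. runSlice is the slice function from 3.1 with a tagged result, so that a done value is never ambiguous.

```typescript
type Value = unknown;
type Done = { done: true; value: Value };
type Thunk = () => Step;
type Step = Thunk | Done;
type Cont = (v: Value) => Step;

// Roughly what CPS codegen might emit for:
//   (let sum (fn (n) (if (= n 0) 0 (+ n (sum (- n 1))))))
function sumCps(n: number, k: Cont): Thunk {
  return () => {
    if (n === 0) return k(0);
    // The pending "n + ..." moves into a new continuation on the heap.
    return sumCps(n - 1, (r) => () => k(n + (r as number)));
  };
}

type SliceResult = { kind: "done"; value: Value } | { kind: "yield"; next: Thunk };

function runSlice(start: Thunk, budget: number): SliceResult {
  let current = start;
  for (let i = 0; i < budget; i++) {
    const s = current();
    if (typeof s !== "function") return { kind: "done", value: s.value };
    current = s;
  }
  return { kind: "yield", next: current };
}

// Synchronous driver for tests; a real scheduler yields between slices.
function runToCompletion(entry: Thunk, budget: number): { value: Value; slices: number } {
  let current = entry;
  let slices = 0;
  while (true) {
    slices++;
    const r = runSlice(current, budget);
    if (r.kind === "done") return { value: r.value, slices };
    current = r.next;
  }
}
```

Summing to 100000 takes 2n + 1 = 200001 steps. With a budget of 1000 that finishes in 201 slices, and no slice ever nests JS calls more than a few frames deep.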
3.4 Cancellation

A “Stop” button should:

mark the current job as cancelled
prevent further slices from scheduling
clear animation loop callback(s)

Because we’re stepping, we can check job.cancelled at slice boundaries and optionally every N steps.
3.5 Runtime errors

Runner wraps each slice in try/catch:

On exception, stop the job and report a runtime diagnostic in the diagnostics panel.
Keep the UI alive.
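The job runner sketched below combines slicing, cancellation checked at slice boundaries and the try/catch policy above. The Job and JobCallbacks shapes and the injected schedule function are assumptions. Injecting the scheduler lets unit tests drive slices synchronously, while the browser could pass (f) => setTimeout(f, 0).

```typescript
type Value = unknown;
type Done = { done: true; value: Value };
type Thunk = () => Step;
type Step = Thunk | Done;

interface Job {
  cancelled: boolean;
  cancel(): void;
}

interface JobCallbacks {
  onDone(value: Value): void;
  onError(err: unknown): void;
}

// Runs `entry` in slices of at most `budget` steps, yielding via `schedule`.
function startJob(
  entry: Thunk,
  budget: number,
  schedule: (next: () => void) => void,
  callbacks: JobCallbacks,
): Job {
  const job: Job = {
    cancelled: false,
    cancel() {
      job.cancelled = true;
    },
  };
  let current: Thunk = entry;

  const slice = (): void => {
    if (job.cancelled) return; // cancellation checked at slice boundaries
    let finished: Done | undefined;
    try {
      for (let i = 0; i < budget && finished === undefined; i++) {
        const s = current();
        if (typeof s === "function") current = s;
        else finished = s;
      }
    } catch (err) {
      // A runtime error stops this job but never escapes into the UI.
      job.cancelled = true;
      callbacks.onError(err);
      return;
    }
    if (finished !== undefined) callbacks.onDone(finished.value);
    else schedule(slice); // budget exhausted: yield, then resume
  };

  schedule(slice);
  return job;
}
```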


4. Virtual filesystem design

4.1 Requirements


Two top-level trees:

/system/** — seeded, read-only
/workspace/** — user-editable, persisted


Deterministic directory listings


Strict path normalization (no .. escape)
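The sketch below shows one possible normalization policy. It requires absolute paths, collapses empty and "." segments, and rejects ".." outright rather than resolving it, so no path can climb out of its tree. normalizePath and treeOf are hypothetical names.

```typescript
function normalizePath(input: string): string {
  if (!input.startsWith("/")) throw new Error(`path must be absolute: ${input}`);
  const parts: string[] = [];
  for (const seg of input.split("/")) {
    if (seg === "" || seg === ".") continue; // collapse "//" and "/./"
    if (seg === "..") throw new Error(`".." is not allowed: ${input}`);
    parts.push(seg);
  }
  return "/" + parts.join("/");
}

// Which top-level tree a path belongs to (null if neither).
function treeOf(path: string): "system" | "workspace" | null {
  const top = normalizePath(path).split("/")[1];
  return top === "system" || top === "workspace" ? top : null;
}
```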


4.2 Data model

Use an overlay filesystem:

SeedFS: Map<string, string> (path → file text)
UserFS: Map<string, string> + directory index
OverlayFS: resolves reads from UserFS first, then SeedFS

Operations:
```ts
interface VFS {
  readText(path: string): string | null;
  writeText(path: string, text: string): void;   // workspace only
  listDir(path: string): DirEntry[];
  stat(path: string): Stat | null;
  mkdir(path: string): void;
  remove(path: string): void;
  rename(from: string, to: string): void;
}
```
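Here is a minimal overlay sketch of readText, writeText and listDir from the interface above. To stay short, it derives directories from file paths, so empty directories cannot exist, and it enforces workspace-only writes with a prefix check. The real design keeps a directory index instead. OverlayVfs is a hypothetical class name.

```typescript
type DirEntry = { name: string; kind: "file" | "dir" };

class OverlayVfs {
  private readonly seed: ReadonlyMap<string, string>; // SeedFS (/system)
  private readonly user = new Map<string, string>(); // UserFS (/workspace)

  constructor(seed: ReadonlyMap<string, string>) {
    this.seed = seed;
  }

  private static norm(path: string): string {
    const parts = path.split("/").filter((s) => s !== "" && s !== ".");
    if (!path.startsWith("/") || parts.includes("..")) {
      throw new Error(`invalid path: ${path}`);
    }
    return "/" + parts.join("/");
  }

  // Reads resolve from UserFS first, then SeedFS.
  readText(path: string): string | null {
    const p = OverlayVfs.norm(path);
    return this.user.get(p) ?? this.seed.get(p) ?? null;
  }

  writeText(path: string, text: string): void {
    const p = OverlayVfs.norm(path);
    if (!p.startsWith("/workspace/")) throw new Error(`read-only path: ${p}`);
    this.user.set(p, text);
  }

  listDir(path: string): DirEntry[] {
    const dir = OverlayVfs.norm(path);
    const prefix = dir === "/" ? "/" : dir + "/";
    const found = new Map<string, DirEntry["kind"]>();
    for (const file of [...this.seed.keys(), ...this.user.keys()]) {
      if (!file.startsWith(prefix)) continue;
      const rest = file.slice(prefix.length).split("/");
      found.set(rest[0], rest.length > 1 ? "dir" : "file");
    }
    // Code-unit ordering keeps listings deterministic (no locale rules).
    return [...found.entries()]
      .map(([name, kind]) => ({ name, kind }))
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  }
}
```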
4.3 Persistence

Phase 2 baseline: persist UserFS to localStorage as JSON.

Pros: simple, no async, small code
Cons: size limit (but acceptable for this project’s goals)

We keep the persistence logic behind an interface so IndexedDB can replace it later without rewriting VFS.
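The sketch below shows that persistence boundary. It assumes a StorageLike interface that localStorage satisfies and a versioned key whose name is hypothetical. Loading falls back to an empty workspace on corrupted data, which is the "corrupted storage recovery" case the tests should cover. localStorage.setItem can also throw when the quota is exceeded, and the real code should surface that to the user.

```typescript
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "twig.workspace.v1"; // hypothetical, versioned key

function saveWorkspace(storage: StorageLike, files: Map<string, string>): void {
  // Sort entries so the serialized form is stable across runs.
  const entries = [...files.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  storage.setItem(STORAGE_KEY, JSON.stringify({ version: 1, files: entries }));
}

function loadWorkspace(storage: StorageLike): Map<string, string> {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) return new Map();
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return new Map();
    const { version, files: list } = data as { version?: unknown; files?: unknown };
    if (version !== 1 || !Array.isArray(list)) return new Map();
    const files = new Map<string, string>();
    for (const entry of list) {
      // Skip malformed entries instead of failing the whole load.
      if (Array.isArray(entry) && typeof entry[0] === "string" && typeof entry[1] === "string") {
        files.set(entry[0], entry[1]);
      }
    }
    return files;
  } catch {
    // Corrupted JSON: start from an empty workspace rather than crashing.
    return new Map();
  }
}
```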
4.4 Seeding with “its own source code”

Build step generates a module containing a manifest:
```ts
export const SEED_FILES: Record<string, string> = {
  "/system/src/runtime/runtime.ts": "....",
  ...
};
```
This makes the environment “self-inspecting” offline.

5. Studio UI (Explorer + Editor + Output + Canvas)

5.1 Layout

A simple three-pane layout:

Left: file explorer
Center: editor
Right: tabs or stacked panels: Output / Diagnostics / Canvas

Top toolbar:

Entry file (default /workspace/main.tw)
Run
Auto-run toggle
Stop
Reset (clears stdout + gfx state, restarts runtime without changing editor buffer)

5.2 Editor MVP

Use <textarea> (monospace) for Phase 2:

loads file content
Ctrl/Cmd+S saves to VFS
optionally show line numbers (not required, but helpful)

This keeps the system dependency-free and readable.
5.3 File explorer MVP


Expand/collapse directories
Click file to open
Create/rename/delete for /workspace/** only
System tree is read-only; opening works; writing triggers “Copy to workspace” flow


6. Live compilation loop

6.1 Triggers


On edit:

mark dirty
if Auto-run: debounce (e.g. 300ms), then compile + run


On Run:

save (optional policy: either auto-save current buffer or run buffer directly; choose one and test it)
compile + run


On Stop:

cancel current run job


On Reset:

clear runtime state + restart animation loop if program registers it again


6.2 Compile service

Phase 2 compile reads from VFS:
```ts
type CompileResult =
  | { ok: true; program: CompiledProgram; diagnostics: Diagnostic[] }
  | { ok: false; diagnostics: Diagnostic[] };

declare function compileFromVfs(entryPath: string, vfs: VFS): CompileResult;
```
CompiledProgram is the thing the runner executes (e.g., a top-level Thunk, plus metadata for sourcemaps/diagnostics).
6.3 Diagnostics UX

Show:

file path
line:col
excerpt with caret underline
diagnostic kind: parse / compile / runtime


7. Graphics environment design (gfx/*)

Even on main thread, we keep gfx as a disciplined API rather than letting programs poke canvas directly.
7.1 Canvas host


One <canvas> element
Resizable via gfx/size

7.2 Command buffer

gfx/* functions enqueue drawing commands into a buffer:
```ts
type GfxCommand =
  | { op: "size"; w: number; h: number }
  | { op: "clear"; r: number; g: number; b: number; a: number }
  | { op: "fill"; r: number; g: number; b: number; a: number }
  | { op: "rect"; x: number; y: number; w: number; h: number }
  | ...
```
The runner flushes:

after each run slice yield (optional)
and at least once per rAF frame if an animation is active

This keeps rendering consistent and testable.
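Here is a sketch of the buffer and its flush, covering only the four commands listed above. Two assumptions are made. Color channels run 0 to 255 for r, g, b and 0 to 1 for a, matching CSS rgba(). clear paints a filled rectangle, because clearRect can only clear to transparent. Ctx2DLike is a hypothetical minimal interface that a CanvasRenderingContext2D should satisfy structurally, which lets tests use a recording fake.

```typescript
type GfxCommand =
  | { op: "size"; w: number; h: number }
  | { op: "clear"; r: number; g: number; b: number; a: number }
  | { op: "fill"; r: number; g: number; b: number; a: number }
  | { op: "rect"; x: number; y: number; w: number; h: number };

interface Ctx2DLike {
  canvas: { width: number; height: number };
  fillStyle: unknown;
  fillRect(x: number, y: number, w: number, h: number): void;
}

const rgba = (c: { r: number; g: number; b: number; a: number }) =>
  `rgba(${c.r}, ${c.g}, ${c.b}, ${c.a})`;

class GfxBuffer {
  private commands: GfxCommand[] = [];

  push(cmd: GfxCommand): void {
    this.commands.push(cmd);
  }

  // Replays queued commands onto the context, empties the queue, and
  // returns how many commands were drawn.
  flush(ctx: Ctx2DLike): number {
    const cmds = this.commands;
    this.commands = [];
    for (const cmd of cmds) {
      switch (cmd.op) {
        case "size":
          // Note: resizing a real canvas also resets its context state.
          ctx.canvas.width = cmd.w;
          ctx.canvas.height = cmd.h;
          break;
        case "clear":
          ctx.fillStyle = rgba(cmd);
          ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
          break;
        case "fill":
          ctx.fillStyle = rgba(cmd);
          break;
        case "rect":
          ctx.fillRect(cmd.x, cmd.y, cmd.w, cmd.h);
          break;
      }
    }
    return cmds.length;
  }
}
```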
7.3 Animation: gfx/on-frame

Design:


gfx/on-frame registers a callback (fn (t) ...)


Studio has a rAF loop:

each frame calls the callback with t (ms since start)
callback executes with a per-frame step budget
flush gfx buffer once per frame


If the callback does not complete within budget:

we pause it and resume next frame
(optional) show “frame budget exceeded” warning in diagnostics

7.4 Input: input/*

We wire DOM events and maintain state:

mouse x/y
mouse down
set of keys down

Stdlib accessors:

input/mouse-x, input/mouse-y
input/mouse-down?
input/key-down? "ArrowLeft"
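Here is a sketch of the input state those accessors could read from. InputState and its method names are hypothetical. Clearing keys on window blur is an added assumption, so that a key released outside the page does not stay "down". DOM wiring appears in comments only.

```typescript
// Hypothetical input tracker. Example wiring (browser only):
//   canvas.addEventListener("pointermove", (e) => input.onPointerMove(e.offsetX, e.offsetY));
//   window.addEventListener("keydown", (e) => input.onKeyDown(e.key)); // e.key like "ArrowLeft"
//   window.addEventListener("blur", () => input.onBlur());
class InputState {
  mouseX = 0;
  mouseY = 0;
  mouseDown = false;
  private readonly keys = new Set<string>();

  onPointerMove(x: number, y: number): void {
    this.mouseX = x;
    this.mouseY = y;
  }
  onPointerDown(): void {
    this.mouseDown = true;
  }
  onPointerUp(): void {
    this.mouseDown = false;
  }
  onKeyDown(key: string): void {
    this.keys.add(key);
  }
  onKeyUp(key: string): void {
    this.keys.delete(key);
  }
  onBlur(): void {
    this.keys.clear();
    this.mouseDown = false;
  }

  // Backs input/key-down?
  keyDown(key: string): boolean {
    return this.keys.has(key);
  }
}
```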


8. Runtime stdlib extensions for Phase 2

In addition to Phase 1 core stdlib:
8.1 ref/* (mutable cells for interactive programs)


ref/new x
ref/get r
ref/set r x

This avoids adding assignment syntax yet still allows stateful sketches and, later, games.
8.2 fs/* (controlled access to VFS)

We do not expose the real browser FS. This is still “no FFI”.

fs/read-text "/workspace/foo.tw" → string or nil
fs/write-text "/workspace/foo.tw" "..." → ok/nil
fs/list "/workspace" → list of entries

Enforce workspace-only writes.

9. Build pipeline: still emits one dist/index.html

9.1 Inputs that must be embedded


Studio app bundle (JS + CSS)
Seed file manifest (source code text)
Any default workspace files (like /workspace/main.tw sample)

9.2 Bun build approach

Use Bun.build() to bundle the browser app. (Bun)
Bundler outputs are BuildArtifact objects (Blob-like), so we can read them and inline them into HTML. (Bun)
Key build config choices:

splitting: false (single output) (Bun)
minify: false (keeps the shipped code readable, so what users inspect in the studio stays close to the documented source)
sourcemap: "inline" in dev build (optional; controlled by env)

9.3 Single-file HTML emitter

build-single-html.ts:

Read template.html
Run Bun.build({ entrypoints: ["src/web/app.ts"], ... })
Inline the JS bundle into <script type="module">...</script>
Inline CSS similarly (or keep CSS inside JS if bundler does that)
Write dist/index.html


10. Tooling and quality gates (Bun + oxlint + oxfmt)

10.1 Bun as the default toolchain


Tests: Bun’s test runner supports TypeScript and snapshot testing, plus UI/DOM testing. (Bun)
Watch mode available (bun test --watch) per Bun test docs. (Bun)

10.2 Linting: oxlint


oxlint --fix for safe automatic fixes. (Oxc)

10.3 Formatting: oxfmt


oxfmt --check for CI validation. (Oxc)

Suggested scripts:

lint: oxlint
lint:fix: oxlint --fix
format: oxfmt --write .
format:check: oxfmt --check . (Oxc)
test: bun test
test:watch: bun test --watch (Bun)
build: bun run src/build/build-single-html.ts
e2e: playwright test


11. Testing plan (Phase 2)

11.1 Unit tests (Bun) — heavy emphasis

Bun test runner supports TS + snapshots and explicitly calls out UI/DOM testing support. (Bun)
VFS tests


Path normalization and traversal rules
Overlay precedence correctness
Read-only enforcement for /system/**
Workspace mutations: create/rename/delete, directory listing stability
Persistence: serialize/deserialize; corrupted storage recovery

Studio model reducer tests (pure)


Open file updates buffer
Edit marks dirty
Save writes to VFS and clears dirty
Auto-run toggle changes behavior
Status transitions: compiling → ok/error, running → stopped/error
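Here is a sketch of a pure reducer covering those transitions. The state fields, status values and action names are assumptions. The VFS write itself would happen in the controller, which dispatches "saved" afterwards, so the reducer stays free of side effects.

```typescript
interface StudioState {
  openPath: string | null;
  buffer: string;
  dirty: boolean;
  autoRun: boolean;
  status: "idle" | "compiling" | "ok" | "error" | "running" | "stopped";
}

type Action =
  | { type: "open"; path: string; text: string }
  | { type: "edit"; text: string }
  | { type: "saved" }
  | { type: "toggleAutoRun" }
  | { type: "status"; status: StudioState["status"] };

const initialState: StudioState = { openPath: null, buffer: "", dirty: false, autoRun: false, status: "idle" };

// Pure: returns a new state object and never touches the VFS or DOM.
function reduce(state: StudioState, action: Action): StudioState {
  switch (action.type) {
    case "open":
      return { ...state, openPath: action.path, buffer: action.text, dirty: false };
    case "edit":
      return { ...state, buffer: action.text, dirty: true };
    case "saved":
      return { ...state, dirty: false };
    case "toggleAutoRun":
      return { ...state, autoRun: !state.autoRun };
    case "status":
      return { ...state, status: action.status };
  }
}
```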

Cooperative scheduler / trampoline tests (core!)


Budget slicing: run a program that requires many steps and assert it completes across multiple slices
Yield behavior: with a tiny budget, assert the scheduler yields and resumes
Cancellation: start a long job, cancel, ensure it stops scheduling further slices
No stack blowup: recursive function compiled into thunks should not overflow JS stack

Runtime/std/gfx tests


print buffering
ref/* correctness
gfx command queue correctness
gfx/on-frame registration semantics (single callback vs multiple; define and test)

11.2 DOM/UI tests under Bun

Because Bun supports UI/DOM testing as a built-in test runner feature, we can run lightweight DOM tests for:

file explorer renders expected nodes
clicking a file opens it
Ctrl/Cmd+S triggers save
diagnostics panel shows errors

(Keep these tests narrow: wiring + rendering correctness, not “full browser fidelity”.)
11.3 E2E tests (Playwright)

Playwright’s standard workflow is playwright test, with --ui for interactive debugging. (Playwright)
E2E scenarios for Phase 2:


Boot smoke

open served dist/index.html
assert explorer + editor + output + canvas exist


Edit → live recompile → stdout updates

open /workspace/main.tw
type (print "hello")
assert output contains hello


Compile error surfaced

introduce parse error (missing ))
assert diagnostics panel shows message + line/col


Canvas deterministic draw

program draws a solid rect at known coordinates
sample canvas pixel at center; assert RGBA


Stop button works

run a program that registers gfx/on-frame and increments a counter in stdout each frame
press Stop
assert counter stops increasing


Note on Bun runtime mode for Playwright

There are known issues reported when forcing Playwright to run under Bun’s runtime flag (e.g., hangs/segfaults). We should run Playwright in its stable default mode (Node) while still using Bun for installs/scripts. (GitHub)

12. Implementation sequence (Phase 2 milestones)

Milestone 2.1 — VFS + seed generation


Implement path normalization utilities + tests
Implement SeedFS + UserFS + OverlayFS + tests
Implement generate-seed.ts build step
Minimal explorer UI listing seeded files (read-only)

Milestone 2.2 — Editor MVP + open/save


Textarea editor
Open file from explorer
Save to /workspace + persistence
Add DOM tests for open/save wiring

Milestone 2.3 — CompileService from VFS + diagnostics panel


Compile entry file read from VFS
Display diagnostics with excerpts
Auto-run debounce plumbing

Milestone 2.4 — Main-thread RunService with cooperative scheduling


Implement trampoline engine + scheduler + cancellation
Adjust compiler backend (or codegen mode) so execution is thunked/stepwise
Run button executes program without freezing UI

Milestone 2.5 — Graphics + input + on-frame loop


Canvas panel
gfx command queue + flush
gfx/on-frame + rAF integration with per-frame budget
Input state from DOM

Milestone 2.6 — Single-file build hardening


Ensure dist/index.html contains everything
Ensure it runs offline (no network requests)
Add a build “smoke test” (optional): parse output and check required markers exist

Milestone 2.7 — Expand Playwright coverage


Add the five e2e tests above
CI-friendly headless mode
Add --ui debugging instructions for local dev


13. Risks and mitigations (no Worker edition)

Risk: You can’t truly preempt main-thread code

Mitigation:

Ensure language execution is always under trampoline control.
Avoid emitting JS constructs that can run unbounded without returning control.

Risk: Compilation itself can block the UI

Mitigation:

Debounce compile.
Keep language small.
If it becomes an issue later: split compile work into chunks and yield between pipeline stages (rarely needed for small projects).

Risk: Users can write “forever” programs

Mitigation:

Distinguish “top-level run must finish” vs “on-frame is allowed to be endless”.
Provide Stop button and a clear “running” indicator.

Risk: Single-file seed manifest becomes huge

Mitigation:


Only seed what’s needed for exploration:

/src/**, /docs/** (optional), maybe omit /tests/** in the shipped index, or provide a build flag to include/exclude.


Keep a manifest size report in build output so it’s visible.
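A sketch of the seed filtering and size report, assuming the build step already has the repo files as a path-to-contents map. The prefixes mirror the bullets above; the option names are hypothetical, and sizes are UTF-8 bytes of file contents, which slightly underestimates the escaped size inside index.html.

```typescript
interface SeedOptions {
  includeDocs: boolean;
  includeTests: boolean; // the "build flag to include/exclude" from the bullet above
}

function selectSeedFiles(files: Record<string, string>, opts: SeedOptions): Record<string, string> {
  const keep = (path: string): boolean => {
    if (path.startsWith("/src/")) return true;
    if (path.startsWith("/docs/")) return opts.includeDocs;
    if (path.startsWith("/tests/")) return opts.includeTests;
    return false;
  };
  return Object.fromEntries(Object.entries(files).filter(([p]) => keep(p)));
}

// UTF-8 bytes per top-level directory, meant to be printed during the build.
function sizeReport(files: Record<string, string>): Map<string, number> {
  const enc = new TextEncoder();
  const totals = new Map<string, number>();
  for (const [path, content] of Object.entries(files)) {
    const top = "/" + (path.split("/")[1] ?? "");
    totals.set(top, (totals.get(top) ?? 0) + enc.encode(content).length);
  }
  return totals;
}
```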


14. Definition of Done (Phase 2)

Phase 2 is done when:


dist/index.html (single file) opens offline and shows:

explorer + editor + stdout + diagnostics + canvas


Editing /workspace/main.tw:

auto-compiles (when enabled)
runs on success (stdout/canvas updates)
shows diagnostics on error


Graphics demo runs via gfx/on-frame without freezing UI.


Stop works (cancels animation).


Tests:

substantial Bun unit tests across VFS/model/scheduler/runtime/gfx
Playwright e2e covers at least the five workflows above using playwright test (Playwright)
oxlint and oxfmt checks are wired (oxlint --fix available; oxfmt --check in CI). (Oxc)


If you want, when we move to Phase 3, I’ll structure it around: (1) two small games built in the language using gfx/* + input/* + ref/*, (2) better in-app documentation tour (openable files + “start here”), and (3) tightening the runtime/compiler ergonomics (errors, performance, UX polish) while keeping the single-file constraint.

  
## phase3.md

      
Engineering Plan — Phase 3

Two little games + “documentation as code” structure (clear, navigable, runnable)

Phase 3 is intentionally narrow:

Ship two tiny games written in the new language (using gfx/*, input/*, ref/*, plus a couple small helper libs).
Make the repo and the in-browser filesystem feel like a self-teaching system: the code is the documentation, and the docs are runnable.

Everything still compiles into one offline index.html and is browsable/editable via the studio.

1. Deliverables at the end of Phase 3

1.1 Two games (in-language)


Pong (classic paddles + ball + score)
Snake (grid snake + food + score)

Both:


run from the studio with a single click (or “Run current file”)


have a short “How to play” header comment inside the .tw file


are structured as a readable reference implementation for:

gfx/on-frame
input handling
state management with ref
deterministic update loops (fixed timestep or frame-delta)


1.2 Documentation-as-code structure


A numbered, curated docs directory that reads top-to-bottom
A set of runnable examples (small .tw programs) that the docs reference
A source tour that points you to the most important TypeScript files (compiler/runtime/studio) and tells you what to notice
A “Docs don’t rot” test suite: examples compile and (where feasible) run under Bun tests

1.3 Testing


Unit tests (Bun) for:

shared game math / collision helpers
snake movement + food placement determinism
pong collision + scoring invariants


Integration tests that compile & run game modules in a headless runtime with stubbed gfx/input


Playwright e2e flows:

launch each game
simulate a few inputs
assert visible scoreboard changes / canvas pixel changes
assert Stop works and UI stays responsive


2. Game selection rationale (why these two)

Pong


Minimal asset needs: rectangles + text.
Demonstrates continuous motion + collision response.
Great showcase for gfx/on-frame and per-frame dt.

Snake


Discrete grid logic + deterministic random placement (food).
Demonstrates “game state as data” and list operations.
Great showcase for fixed-timestep updates and using host-seeded RNG deterministically in tests.

These two together prove the platform is useful without needing fancy language features.

3. Required minimal language/runtime helpers (only what games need)

To keep game code readable (and documentation-grade), Phase 3 should add/standardize a couple small helpers. Keep these tiny and well-tested.
3.1 Minimal “record” helpers (for readable state)

If game state is only nested lists, it becomes hard to read. A tiny record/dict API makes code dramatically clearer.
Proposed stdlib additions (backed by plain JS objects internally):

rec/new k1 v1 k2 v2 ... → record/object
rec/get r k → value or nil
rec/set r k v → record (mutable or persistent; pick one and document)
rec/has? r k

This is still “no FFI” because the only way to interact is through stdlib.
Testing requirement: strict unit tests around behavior, including missing keys and key type constraints (strings only).
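A minimal sketch of the rec/* backing functions, picking the persistent option for rec/set (copy-on-write), which fits the pure update(state) -> state style recommended for testing. Modeling nil as null and using frozen null-prototype objects are assumptions; the null prototype keeps keys like "toString" from appearing present.

```typescript
// Hypothetical runtime value model: nil is null, records are frozen null-prototype objects.
type Value = number | string | boolean | null | Value[] | Rec;
interface Rec {
  readonly [key: string]: Value;
}

function requireKey(fn: string, k: Value): string {
  if (typeof k !== "string") throw new Error(`${fn}: record keys must be strings`);
  return k;
}

// rec/new k1 v1 k2 v2 ...
function recNew(...kv: Value[]): Rec {
  if (kv.length % 2 !== 0) throw new Error("rec/new: expected key/value pairs");
  const out: Record<string, Value> = Object.create(null);
  for (let i = 0; i < kv.length; i += 2) out[requireKey("rec/new", kv[i])] = kv[i + 1];
  return Object.freeze(out);
}

// rec/get r k: missing keys yield nil rather than throwing.
function recGet(r: Rec, k: Value): Value {
  const key = requireKey("rec/get", k);
  return Object.prototype.hasOwnProperty.call(r, key) ? r[key] : null;
}

// rec/has? r k
function recHas(r: Rec, k: Value): boolean {
  return Object.prototype.hasOwnProperty.call(r, requireKey("rec/has?", k));
}

// rec/set r k v: returns a new record; the original is left untouched.
function recSet(r: Rec, k: Value, v: Value): Rec {
  const out: Record<string, Value> = Object.assign(Object.create(null), r);
  out[requireKey("rec/set", k)] = v;
  return Object.freeze(out);
}
```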
3.2 Math helpers


math/abs, math/min, math/max
math/clamp x lo hi
math/floor (for Snake grid)
math/rand-int n (uses host random; returns integer in [0, n))
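The math helpers are mostly thin wrappers over Math.*; the interesting one is math/rand-int, sketched here with an injected random source so Bun tests can seed it. mulberry32 is one well-known small seedable PRNG, chosen for illustration and not mandated by the plan; the studio could pass Math.random instead.

```typescript
type RandomSource = () => number; // returns a float in [0, 1)

// mulberry32: tiny 32-bit seedable PRNG, deterministic for a given seed.
function mulberry32(seed: number): RandomSource {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// math/clamp x lo hi (math/abs, min, max, floor map directly to Math.*).
function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// math/rand-int n: integer in [0, n); rejects n that is not a positive integer.
function randInt(n: number, random: RandomSource): number {
  if (!Number.isInteger(n) || n <= 0) throw new Error("math/rand-int: n must be a positive integer");
  return Math.floor(random() * n);
}
```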

3.3 List helpers (just enough for Snake)

Prefer keeping this minimal and orthogonal:

list/cons x xs (prepend)
list/head xs
list/tail xs
list/slice xs start end (or list/take xs n)
list/contains? xs x (requires equality semantics; for Snake positions we can store as strings "x,y" to simplify)

If you want to keep stdlib smaller, we can implement these in a shared game library file instead of stdlib—but then we want a way to reuse them cleanly (see includes below).
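A sketch of the list helpers, assuming lists are backed by immutable JS arrays (Twig's real list representation is not fixed by this plan). Encoding cells as "x,y" strings means list/contains? can rely on plain ===; cell and parseCell are hypothetical helpers.

```typescript
type Cell = string; // "x,y"

const cons = <T>(x: T, xs: readonly T[]): readonly T[] => [x, ...xs]; // list/cons
const head = <T>(xs: readonly T[]): T | null => (xs.length > 0 ? xs[0] : null); // nil when empty
const tail = <T>(xs: readonly T[]): readonly T[] => xs.slice(1); // list/tail
const take = <T>(xs: readonly T[], n: number): readonly T[] => xs.slice(0, Math.max(0, n));
const contains = <T>(xs: readonly T[], x: T): boolean => xs.includes(x); // === equality

// "x,y" encoding used by Snake.
const cell = (x: number, y: number): Cell => `${x},${y}`;
function parseCell(c: Cell): [number, number] {
  const [x, y] = c.split(",").map(Number);
  return [x, y];
}
```

Copying on cons is O(n), which is acceptable for Snake-sized lists; a linked cons list would make cons O(1) at the cost of slower slicing.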
3.4 Minimal “include” mechanism (recommended for docs-quality)

To make the games readable, we want shared helpers in one place.
Add a compile-time special form:

(include "path") where path is a string literal
The compiler resolves relative to the including file, reads from VFS, parses, and inlines the AST.
No dynamic imports, no runtime loading, no cycles (detect and error).
This is not an FFI; it’s a compile-time convenience.

If you absolutely don’t want include yet, keep each game as a single file and accept some duplication. For “documentation as code,” I strongly recommend include.
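Include expansion with cycle detection can be sketched as a recursive walk that carries the stack of files being expanded. To stay self-contained, the toy parseTopLevel only recognizes include lines and treats every other line as an opaque form; the real compiler would parse s-expressions and splice AST nodes. All names here are hypothetical.

```typescript
type Form = { kind: "include"; path: string } | { kind: "form"; text: string };

// Toy parser: one top-level form per line, (include "...") recognized by regex.
function parseTopLevel(source: string): Form[] {
  const forms: Form[] = [];
  for (const line of source.split("\n")) {
    const m = /^\s*\(include\s+"([^"]+)"\)\s*$/.exec(line);
    if (m) forms.push({ kind: "include", path: m[1] });
    else if (line.trim() !== "") forms.push({ kind: "form", text: line.trim() });
  }
  return forms;
}

// Resolve `rel` against the directory of the including file `from`.
function resolveRelative(from: string, rel: string): string {
  const parts = rel.startsWith("/") ? [""] : from.split("/").slice(0, -1);
  for (const seg of rel.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (parts.length <= 1) throw new Error(`include escapes root: ${rel}`);
      parts.pop();
    } else {
      parts.push(seg);
    }
  }
  return parts.join("/");
}

// Inline includes depth-first; `stack` holds the files currently being expanded.
function expandIncludes(
  path: string,
  read: (p: string) => string | undefined,
  stack: string[] = [],
): Form[] {
  if (stack.includes(path)) throw new Error(`include cycle: ${[...stack, path].join(" -> ")}`);
  const source = read(path);
  if (source === undefined) throw new Error(`include not found: ${path}`);
  const out: Form[] = [];
  for (const form of parseTopLevel(source)) {
    if (form.kind === "include") {
      out.push(...expandIncludes(resolveRelative(path, form.path), read, [...stack, path]));
    } else {
      out.push(form);
    }
  }
  return out;
}
```

The stack only rejects true cycles; including the same helper file from two places is allowed and would inline it twice, so the compiler may also want to deduplicate.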

4. Game architecture pattern (shared by Pong and Snake)

Make both games follow the same structure so they teach the system consistently.
4.1 The “three-function” pattern

Each game file exposes:

init() -> State
update(state, input, dtMs) -> State
render(state) -> nil (emits gfx commands)

Then the program wires it up:


stateRef = ref/new(init())


lastTRef = ref/new(0)


(gfx/on-frame (fn (t) ...)):

compute dt = clamp(t - lastT, 0, 33) (or similar), then set lastT := t
snapshot input (read input/* once per frame)
state := update(state, input, dt)
render(state)
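The wiring above, written host-side in TypeScript for illustration (in the studio it would be .tw code). Game, wireGame and the injected onFrame registration function are assumptions. One deliberate difference from lastTRef = 0 is that the first frame uses dt = 0, which avoids a spurious 33 ms jump on startup.

```typescript
interface Game<S, I> {
  init(): S;
  update(state: S, input: I, dtMs: number): S;
  render(state: S): void; // emits gfx commands only
}

// `onFrame` stands in for gfx/on-frame: it registers a callback receiving a timestamp in ms.
function wireGame<S, I>(
  game: Game<S, I>,
  snapshotInput: () => I,
  onFrame: (cb: (tMs: number) => void) => void,
): () => S {
  let state = game.init(); // stateRef
  let lastT: number | null = null; // lastTRef
  onFrame((t) => {
    const dt = lastT === null ? 0 : Math.min(33, Math.max(0, t - lastT));
    lastT = t;
    const input = snapshotInput(); // read input/* exactly once per frame
    state = game.update(state, input, dt);
    game.render(state);
  });
  return () => state; // lets tests inspect the current state
}
```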


4.2 Input snapshot shape

Do not sprinkle input/key-down? throughout update; keep it centralized.
Example record:

input = rec/new
  "up"    (input/key-down? "ArrowUp")
  "down"  (input/key-down? "ArrowDown")
  "left"  (input/key-down? "ArrowLeft")
  "right" (input/key-down? "ArrowRight")
  "w"     (input/key-down? "w")
  "s"     (input/key-down? "s")

That input object is easy to stub in tests.
4.3 Rendering convention


Always clear the screen (gfx/clear) each frame.
Draw UI text (score, instructions) as a consistent overlay.
Keep rendering side-effect-only; state changes happen in update.

4.4 Determinism conventions (for tests)


Snake’s food placement must use math/rand-int and the host RNG should be seedable in Bun tests.
Pong should not use randomness.
Both games should avoid relying on “real time” except through dtMs provided by on-frame, which tests can simulate.


5. Pong design and implementation details

5.1 Gameplay


Two paddles: left controlled by W/S, right by Up/Down.
Ball bounces off paddles and top/bottom walls.
If the ball passes a side boundary, the opposite player scores and the ball resets to center.

5.2 State representation (record)

Example keys:

"w", "h": canvas size
"paddleH", "paddleW"
"p1y", "p2y": paddle y positions
"ballX", "ballY", "ballVx", "ballVy"
"score1", "score2"
"serving": 1 or 2 (optional)

5.3 Update logic (continuous)


Paddle movement:

p1y += speed * dt * (s? - w?)   (canvas y grows downward, so W must decrease y)
clamp to [0, h - paddleH]


Ball movement:

ballX += ballVx * dt
ballY += ballVy * dt


Wall collision:

if ballY < 0 or ballY > h: invert ballVy, clamp inside bounds


Paddle collision:


treat paddles as AABBs; if intersect and ball moving toward paddle:


invert ballVx


optionally add “english” based on impact point:

ballVy += (ballY - paddleCenterY) * factor


clamp ballVy to avoid runaway speeds


Score:

if ballX < 0: score2++, reset ball
if ballX > w: score1++, reset ball
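A sketch of Pong's update step in TypeScript, following the rules above with the record keys as fields. Assumptions beyond the plan: canvas y grows downward, speeds are in px/ms, the ball is a square of side ballSize, and the paddle x position, serve speed and english factor are arbitrary tuning constants.

```typescript
interface Pong {
  w: number; h: number; paddleW: number; paddleH: number; ballSize: number;
  p1y: number; p2y: number;
  ballX: number; ballY: number; ballVx: number; ballVy: number;
  score1: number; score2: number;
}
interface PongInput { w: boolean; s: boolean; up: boolean; down: boolean }

const PADDLE_SPEED = 0.4; // px per ms (hypothetical tuning)
const ENGLISH = 0.005;    // vy added per px of offset from paddle center (hypothetical)
const MAX_VY = 0.5;       // cap to avoid runaway speeds
const PADDLE_X1 = 10;     // left paddle x; the right paddle mirrors it (hypothetical layout)

const clamp = (x: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, x));
const axis = (neg: boolean, pos: boolean) => (pos ? 1 : 0) - (neg ? 1 : 0);

// Recenter; serve toward the player who just lost the point (an arbitrary choice).
function resetBall(s: Pong, towardRight: boolean): Pong {
  return { ...s, ballX: s.w / 2, ballY: s.h / 2, ballVx: towardRight ? 0.3 : -0.3, ballVy: 0 };
}

function updatePong(s: Pong, input: PongInput, dt: number): Pong {
  const maxY = s.h - s.paddleH;
  // W/Up move toward y = 0 (top); S/Down toward the bottom.
  const p1y = clamp(s.p1y + PADDLE_SPEED * dt * axis(input.w, input.s), 0, maxY);
  const p2y = clamp(s.p2y + PADDLE_SPEED * dt * axis(input.up, input.down), 0, maxY);
  let { ballVx, ballVy } = s;
  const ballX = s.ballX + ballVx * dt;
  let ballY = s.ballY + ballVy * dt;

  // Top/bottom walls: reflect and clamp back inside.
  if (ballY < 0 || ballY > s.h - s.ballSize) {
    ballVy = -ballVy;
    ballY = clamp(ballY, 0, s.h - s.ballSize);
  }

  // Paddles as AABBs; bounce only while moving toward the paddle (prevents double hits).
  const hits = (px: number, py: number) =>
    ballX < px + s.paddleW && ballX + s.ballSize > px &&
    ballY < py + s.paddleH && ballY + s.ballSize > py;
  const english = (py: number) => (ballY + s.ballSize / 2 - (py + s.paddleH / 2)) * ENGLISH;
  const p2x = s.w - PADDLE_X1 - s.paddleW;
  if (ballVx < 0 && hits(PADDLE_X1, p1y)) {
    ballVx = -ballVx;
    ballVy = clamp(ballVy + english(p1y), -MAX_VY, MAX_VY);
  } else if (ballVx > 0 && hits(p2x, p2y)) {
    ballVx = -ballVx;
    ballVy = clamp(ballVy + english(p2y), -MAX_VY, MAX_VY);
  }

  const next = { ...s, p1y, p2y, ballX, ballY, ballVx, ballVy };
  // Scoring: leaving one side scores for the opposite player.
  if (ballX < 0) return resetBall({ ...next, score2: s.score2 + 1 }, false);
  if (ballX > s.w) return resetBall({ ...next, score1: s.score1 + 1 }, true);
  return next;
}
```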


5.4 Render logic


gfx/size w h
clear background
draw center line (optional)
draw paddles + ball as rectangles
draw score text

5.5 “Doc header” inside the file

At the top of /system/games/01-pong.tw:

How to play
What to look for (update loop, collision)
Pointers to shared helpers used (e.g., (include "lib/math.tw"), resolved relative to /system/games/)

5.6 Pong acceptance criteria


Ball bounces correctly
Scoring works
UI remains responsive
“Stop” stops animation
Code is readable and heavily commented at key points (not every line)


6. Snake design and implementation details

6.1 Gameplay


Grid-based snake moves at fixed tick (e.g., 120ms per step).
Arrow keys change direction (no immediate 180° reversal).
Eat food → grow by 1, score increases.
Hit wall or self → game over; press R to restart.

6.2 Fixed timestep update (important)

Snake should not advance one cell per frame (that would make its speed frame-rate dependent). Use an accumulator:
State keys:

"tickMs" (e.g., 120)
"accMs" accumulated ms since last move
"dir" one of "up"|"down"|"left"|"right"
"nextDir" (input wants to turn)
"snake" list of cells, head first (cells can be "x,y" strings for simplicity)
"food" cell string
"score"
"alive" boolean

Update(state, input, dt):


accMs += dt


process turn input → update nextDir


while accMs >= tickMs:


accMs -= tickMs


perform one grid step:


apply direction change rules


compute new head cell


if wall collision → alive=false


if self collision → alive=false


else:


prepend new head


if head == food:

score++
place new food using RNG avoiding snake cells


else:

remove last tail cell


return new state


Food placement:


use math/rand-int with a bounded attempt loop:

randomly pick cell
if not in snake → use
else retry (cap attempts; after the cap, fall back to scanning for a free cell; if none exists, the grid is full → win state)
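A sketch of Snake's fixed-timestep update and food placement. Assumptions: grid size lives in cols/rows fields, input arrives as a requested direction or null, random is injected (seeded in tests), and a won flag marks a full grid. Self-collision is checked against the whole body, including the tail cell, matching the step order above.

```typescript
type Dir = "up" | "down" | "left" | "right";
interface Snake {
  cols: number; rows: number; tickMs: number; accMs: number;
  dir: Dir; nextDir: Dir; snake: string[]; food: string | null;
  score: number; alive: boolean; won: boolean;
}
const DELTA: Record<Dir, [number, number]> = { up: [0, -1], down: [0, 1], left: [-1, 0], right: [1, 0] };
const OPPOSITE: Record<Dir, Dir> = { up: "down", down: "up", left: "right", right: "left" };
const MAX_ATTEMPTS = 50; // hypothetical cap on random picks

function placeFood(s: Snake, random: () => number): string | null {
  for (let i = 0; i < MAX_ATTEMPTS; i++) {
    const c = `${Math.floor(random() * s.cols)},${Math.floor(random() * s.rows)}`;
    if (!s.snake.includes(c)) return c;
  }
  // Fallback: first free cell in row-major order.
  for (let y = 0; y < s.rows; y++)
    for (let x = 0; x < s.cols; x++)
      if (!s.snake.includes(`${x},${y}`)) return `${x},${y}`;
  return null; // grid full: the caller treats this as a win
}

function step(s: Snake, random: () => number): Snake {
  // No immediate 180° reversal: ignore nextDir if it points back into the body.
  const dir = s.nextDir === OPPOSITE[s.dir] ? s.dir : s.nextDir;
  const [hx, hy] = s.snake[0].split(",").map(Number);
  const [dx, dy] = DELTA[dir];
  const nx = hx + dx;
  const ny = hy + dy;
  const newHead = `${nx},${ny}`;
  const wall = nx < 0 || ny < 0 || nx >= s.cols || ny >= s.rows;
  if (wall || s.snake.includes(newHead)) return { ...s, dir, alive: false };
  const eats = newHead === s.food;
  const snake = eats ? [newHead, ...s.snake] : [newHead, ...s.snake.slice(0, -1)];
  if (!eats) return { ...s, dir, snake };
  const grown = { ...s, dir, snake, score: s.score + 1 };
  const food = placeFood(grown, random);
  return { ...grown, food, won: food === null };
}

function updateSnake(s: Snake, input: Dir | null, dt: number, random: () => number): Snake {
  if (!s.alive || s.won) return s; // restart on R is handled outside this sketch
  let next: Snake = { ...s, accMs: s.accMs + dt, nextDir: input ?? s.nextDir };
  while (next.alive && !next.won && next.accMs >= next.tickMs) {
    next = step({ ...next, accMs: next.accMs - next.tickMs }, random);
  }
  return next;
}
```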


6.3 Rendering


Compute cell size from canvas size / grid dims.
Draw background.
Draw snake head in brighter color (or same; keep minimal).
Draw food.
Draw score + instructions (“R to restart”).

6.4 Snake acceptance criteria


Deterministic tick-based movement.
No 180° immediate reversal.
Eating food grows snake.
Game over state and restart.
Fully testable update logic (see testing plan).


7. How the studio exposes the games (UX + doc friendliness)

Phase 3 should make the games and docs discoverable without needing external instructions.
7.1 Default startup behavior

On first load:


open /system/docs/00-start-here.md in the editor


show a “Quick links” section at the top of that file:

open Pong
open Snake
open Graphics tutorial
where to find compiler/runtime sources


7.2 Run current file (tiny but huge UX win)

Add a toolbar button:


Run current file

compiles and runs whatever file is currently open
if it’s a read-only system file, it still runs (that’s fine)


Keep existing “entryPath” mechanism as advanced usage, but make “Run current file” the primary flow for docs + examples + games.


7.3 “Fork to workspace” for safe editing

When the user tries to edit a /system/** file:


show non-intrusive banner:

“System files are read-only. [Fork to workspace]”


Fork action copies file to /workspace/... and opens the fork.


This makes exploration safe and encourages modification.

8. Documentation-as-code structure

This is the “clear structure” part: the filesystem itself is a guided textbook.
8.1 Seeded directory layout (in VFS)

Use numeric prefixes so the explorer naturally orders things:
/system
  /docs
    00-start-here.md
    01-how-to-run.md
    02-language-tour.md
    03-stdlib-tour.md
    04-graphics-tour.md
    05-studio-architecture-tour.md
    06-testing-tour.md
    07-games-tour.md

  /examples
    00-hello.tw
    01-functions.tw
    02-refs.tw
    03-graphics-basics.tw
    04-input-basics.tw
    05-animation-basics.tw

  /games
    00-readme.md
    lib/
      00-readme.md
      math.tw
      collision.tw
      input.tw
      render.tw
    01-pong.tw
    02-snake.tw

And separately, the “system source” is already seeded from your repo, e.g.:

/system/src/lang/*
/system/src/runtime/*
/system/src/studio/*

8.2 What goes in each doc file (content contracts)

Keep docs short, concrete, and file-path oriented. The system is a codebase, not a blog post.
00-start-here.md


What this is


60-second checklist:

open Pong, press Run current file
open Snake, press Run current file
open graphics example, tweak numbers


How to fork a file to workspace


02-language-tour.md


Tiny language spec (s-exprs)


Links to examples:

/system/examples/00-hello.tw
/system/examples/01-functions.tw


Where the parser/compiler live in TS and what to read first


04-graphics-tour.md

gfx/* overview
runnable examples that draw and animate
how gfx/on-frame works

07-games-tour.md


game architecture pattern: init/update/render


explicit pointers:

“read Pong first”
“Snake shows fixed timestep + RNG”


how to run and how to modify


8.3 “Documentation in code” conventions (for TS and .tw)

Conventions that make the source teach itself:
Every file starts with a header comment answering:

What this module does
What it exports (or “top-level program”)
Invariants / assumptions
How to test it (path to test file)
For .tw programs: how to run it and controls

Keep helpers tiny and named

prefer clamp, aabb-intersect?, step-snake over inline arithmetic
name intermediate values; avoid “magic math” without a comment

Tests are part of documentation

test file names mirror module names
tests are written as readable behavioral specs

8.4 Executable docs (“examples are chapters”)

Every .md doc should link to at least one runnable .tw file that demonstrates the feature, so users can:

open the example
hit Run current file
tweak it and see changes

This is what makes the environment a computing environment rather than a set of static docs.

9. Testing plan for Phase 3 (games + docs)

9.1 Unit tests (Bun) — shared helpers


math/clamp correctness
collision helpers (AABB intersections) correctness
deterministic RNG behaviors if math/rand-int is new/modified

9.2 Language-level integration tests (Bun) — game logic

Goal: verify gameplay rules without needing a browser canvas.
Approach:


compile the game module (or an extracted pure module) and evaluate it in Bun tests


provide a runtime with:

stubbed gfx that records commands (optional)
deterministic random() for Snake tests


call init, update, and inspect returned state (records)
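A headless host for the Bun integration tests might look like the following. The Host shape is hypothetical: it assumes the runtime hands gfx, input and random capabilities to compiled code instead of touching the DOM directly, which is what makes stubbing possible. The random source is a small seedable PRNG (mulberry32).

```typescript
interface Host {
  gfx: {
    clear(color: string): void;
    rect(x: number, y: number, w: number, h: number, color: string): void;
    text(x: number, y: number, s: string): void;
  };
  input: { keyDown(key: string): boolean };
  random(): number;
}

// Builds a recording, deterministic host for tests (hypothetical helper).
function makeTestHost(opts: { seed: number; keysDown?: string[] }) {
  const calls: string[] = [];
  const keys = new Set(opts.keysDown ?? []);
  let a = opts.seed >>> 0; // mulberry32 state
  const host: Host = {
    gfx: {
      clear: (color) => { calls.push(`clear ${color}`); },
      rect: (x, y, w, h, color) => { calls.push(`rect ${x} ${y} ${w} ${h} ${color}`); },
      text: (x, y, s) => { calls.push(`text ${x} ${y} ${s}`); },
    },
    input: { keyDown: (key) => keys.has(key) },
    random: () => {
      a = (a + 0x6d2b79f5) >>> 0;
      let t = Math.imul(a ^ (a >>> 15), a | 1);
      t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    },
  };
  // press/release let a test script key presses between simulated frames.
  return { host, calls, press: (k: string) => keys.add(k), release: (k: string) => keys.delete(k) };
}
```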


Recommended: keep update pure and make it return the next state (record), so TS tests can assert:


Pong:

ball reflects off paddle
scoring increments and ball resets


Snake:

snake advances exactly one cell per tick
reversal is disallowed
food-eating grows snake and increments score
self-collision causes alive=false


9.3 “Docs don’t rot” tests

Add a test that compiles all files under:

/system/examples/*.tw
/system/games/*.tw

For animation examples, don’t run indefinitely:

run only a bounded number of frames/steps (your cooperative scheduler makes this easy)
ensure no runtime errors and at least one render flush occurred (for gfx examples)
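Target selection and the bounded run could be sketched as below. The regexes mirror the two globs above (note they are single-level, so /system/games/lib/*.tw is only compiled through includes). How the harness obtains a program's on-frame callback depends on the runtime and is assumed here.

```typescript
const DOC_TARGETS = [/^\/system\/examples\/[^/]+\.tw$/, /^\/system\/games\/[^/]+\.tw$/];

// Pick the seeded files the "docs don't rot" test must compile.
function docTargets(paths: string[]): string[] {
  return paths.filter((p) => DOC_TARGETS.some((re) => re.test(p))).sort();
}

interface BoundedRunResult {
  framesRun: number;
  flushes: number;
  error?: unknown;
}

// Drive an on-frame callback for a fixed number of simulated 60 fps frames, never forever.
function runBoundedFrames(
  frameCallback: (tMs: number) => void,
  flushCount: () => number,
  maxFrames = 30,
): BoundedRunResult {
  let framesRun = 0;
  try {
    for (let i = 0; i < maxFrames; i++) {
      frameCallback(i * (1000 / 60));
      framesRun++;
    }
  } catch (error) {
    return { framesRun, flushes: flushCount(), error };
  }
  return { framesRun, flushes: flushCount() };
}
```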

9.4 Playwright e2e (browser) — minimal but meaningful

E2E scenarios:


Pong launches

open /system/games/01-pong.tw
click Run current file
assert canvas changes from blank (pixel sample) or scoreboard text appears


Pong input affects paddle

press ArrowUp
wait a few frames
verify paddle position changed (can be via visible text overlay like P2Y: or via pixel sample in paddle region)


Snake launches and moves

open snake, run
wait ~500ms
verify snake head moved (again via overlay text like HEAD: x,y or via pixel sample)


Snake eats food deterministically

run with a known seed (either default fixed seed in studio for e2e, or expose a “seed” input)
simulate direction changes
verify score increments


Stop works

start a game
click Stop
verify scoreboard stops updating / animation halts


Reliability tip: For tests, render some state text (score/head position) to a dedicated DOM element (e.g., <pre id="hud">) so e2e can assert without brittle pixel sampling. That is not user-facing “FFI”—it’s just UI.
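A sketch of the HUD contract, assuming one "KEY: value" pair per line inside <pre id="hud">. The keys come from the examples above (SCORE, HEAD, P2Y); formatHud and parseHud are hypothetical helpers shared by the studio and the e2e specs.

```typescript
// Studio side: render state fields as "KEY: value" lines.
function formatHud(fields: Record<string, string | number>): string {
  return Object.entries(fields).map(([k, v]) => `${k}: ${v}`).join("\n");
}

// Test side: parse the HUD text back into a map, splitting on the first colon only.
function parseHud(text: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const line of text.split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) out.set(line.slice(0, i).trim(), line.slice(i + 1).trim());
  }
  return out;
}

// In a Playwright spec (running under Node), an assertion could then read:
//   const hud = parseHud(await page.locator("#hud").innerText());
//   expect(hud.get("SCORE")).not.toBe("0");
```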

10. Implementation sequence (Phase 3 work plan)

Milestone 3.1 — Docs skeleton + quick links + Run current file


Create /system/docs/* and /system/examples/*
Ensure explorer ordering (numeric prefixes)
Add “Run current file” button
On startup, open /system/docs/00-start-here.md

Milestone 3.2 — Shared game library (tiny, readable)


Add /system/games/lib/* with:

math helpers (clamp, etc.)
collision helpers (pong)
input snapshot helpers


Unit tests for helpers


Milestone 3.3 — Pong


Implement init/update/render pattern
Add “How to play” header comment
Add Bun integration tests for collision + scoring invariants
Add Playwright “Pong launches” e2e

Milestone 3.4 — Snake


Implement fixed timestep accumulator
Deterministic RNG food placement (seeded)
Add Bun integration tests for movement/eating/game over
Add Playwright “Snake moves” + “Snake eats food (seeded)” e2e

Milestone 3.5 — Docs polish + “docs compile” gate


Ensure every doc references runnable examples
Add “compile all examples/games” test
Add /system/games/00-readme.md that lists controls and explains structure


11. Definition of Done (Phase 3)

Phase 3 is complete when:


There are two playable games in /system/games/ (Pong + Snake), runnable from the studio, responsive, with Stop working.


The seeded filesystem contains a numbered docs set that:

teaches usage in 5–10 minutes
points to runnable examples
points to source code locations for deeper reading


Tests:

Bun tests verify core game invariants and compile all examples
Playwright e2e verifies both games launch and respond to input


The final build remains a single dist/index.html containing docs, games, and system source.


If you want, next I can turn this Phase 3 plan into:

a concrete file-by-file outline (exact contents + key functions per file), and
a prioritized test list with specific test names and the minimal harness needed to run .tw game logic headlessly under Bun.