@mikehostetler
Created February 22, 2026 18:47
Jido Fly control plane plan for jido_lib

Jido Fly Control Plane Guide (jido_lib)

Date: 2026-02-22

1) Purpose

Build a production-grade Fly control plane inside jido_lib for all GitHub bots while preserving current synchronous bot APIs.

Core stack:

  • Fly Machines API writes via req_fly
  • Fly GraphQL API for read-side metadata/inventory
  • Optional FLAME execution profile for burst workloads
  • ETS-first queue/state/lease/idempotency storage
  • Existing Jido/Runic bot workflows + jido_vfs artifact checkpoints

2) Design Constraints

  • jido_harness remains provider normalization only.
  • jido_runic remains workflow/delegation runtime.
  • jido_vfs remains artifact persistence boundary.
  • jido_lib owns GitHub orchestration and control-plane behavior.
  • No Postgres/Oban in v1 (ETS-first).
  • GraphQL is read-side only; do not depend on GraphQL for critical write paths.

3) Public API Surface

Add lib/jido_lib/github/control_plane.ex with:

@spec submit(atom(), map(), keyword()) :: {:ok, Jido.Lib.Github.ControlPlane.RunRef.t()} | {:error, term()}
@spec await(String.t(), keyword()) :: {:ok, map()} | {:error, term()}
@spec get(String.t()) :: {:ok, Jido.Lib.Github.ControlPlane.Run.t()} | {:error, term()}
@spec list(map()) :: [Jido.Lib.Github.ControlPlane.Run.t()]
@spec cancel(String.t(), keyword()) :: :ok | {:error, term()}
@spec retry(String.t(), keyword()) :: {:ok, Jido.Lib.Github.ControlPlane.RunRef.t()} | {:error, term()}
@spec reconcile(keyword()) :: {:ok, map()} | {:error, term()}

Behavior:

  • submit/3: validates bot + intake, creates queued run, returns RunRef.
  • await/2: blocks until terminal state or timeout.
  • get/1: full run envelope.
  • list/1: filter by status/bot/owner/repo.
  • cancel/2: marks cancellation and propagates to worker/provider.
  • retry/2: creates a new attempt from prior run envelope.
  • reconcile/1: executes orphan/stale run reconciliation pass.
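The blocking behavior of await/2 can be illustrated with a small polling loop. This is only a sketch under stated assumptions: AwaitSketch, the :timeout and :poll_interval options, and the zero-arity fetch function are hypothetical stand-ins, and a real implementation might subscribe to run events instead of polling.

```elixir
defmodule AwaitSketch do
  # Hypothetical helper, not part of jido_lib: polls a fetch function
  # until the run reaches a terminal state or the deadline passes.
  @terminal [:succeeded, :failed, :canceled, :timed_out]

  def await(fetch_fun, opts \\ []) when is_function(fetch_fun, 0) do
    timeout = Keyword.get(opts, :timeout, 5_000)
    interval = Keyword.get(opts, :poll_interval, 50)
    deadline = System.monotonic_time(:millisecond) + timeout
    loop(fetch_fun, interval, deadline)
  end

  defp loop(fetch_fun, interval, deadline) do
    case fetch_fun.() do
      # Terminal state: hand the run envelope back to the caller.
      {:ok, %{status: status} = run} when status in @terminal ->
        {:ok, run}

      # Still in flight: sleep and poll again unless the deadline has passed.
      {:ok, _run} ->
        if System.monotonic_time(:millisecond) >= deadline do
          {:error, :timeout}
        else
          Process.sleep(interval)
          loop(fetch_fun, interval, deadline)
        end

      {:error, _reason} = error ->
        error
    end
  end
end
```

In this sketch the fetch function plays the role of get/1 for a single run id, so the caller decides where run state is read from.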

4) Bot Integration Contract

For all bot agents under lib/jido_lib/github/agents/:

  • IssueTriageBot
  • IssueTriageCriticBot
  • PrBot
  • QualityBot
  • ReleaseBot
  • RoadmapBot

Add additive APIs:

  • enqueue_* helper returning RunRef
  • Existing sync methods remain available
  • Add mode: :inline | :control_plane (default :inline for backward compatibility)

Suggested wrappers:

  • IssueTriageBot.enqueue_issue/2
  • IssueTriageCriticBot.enqueue_issue/2
  • PrBot.enqueue_issue/2
  • QualityBot.enqueue_target/2
  • ReleaseBot.enqueue_repo/2
  • RoadmapBot.enqueue_plan/2
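One way the additive mode option could sit in front of an existing synchronous entry point is sketched below. ModeDispatchSketch, run_inline/1, enqueue/1, and the run id format are hypothetical; the only details taken from this guide are the mode values and the :inline default.

```elixir
defmodule ModeDispatchSketch do
  # Hypothetical bot entry point: :inline stays the default so existing
  # callers see no behavior change.
  def run(intake, opts \\ []) do
    case Keyword.get(opts, :mode, :inline) do
      :inline -> run_inline(intake)
      :control_plane -> enqueue(intake)
      other -> {:error, {:invalid_mode, other}}
    end
  end

  # Stand-in for the bot's existing synchronous workflow.
  defp run_inline(intake), do: {:ok, %{status: :completed, intake: intake}}

  # Stand-in for an enqueue_* helper; the id format is an assumption.
  defp enqueue(_intake) do
    run_id = "run-" <> Integer.to_string(:erlang.unique_integer([:positive]))
    {:ok, %{run_id: run_id, status: :queued}}
  end
end
```

Rejecting unknown modes explicitly, rather than falling back to :inline, keeps a typo from silently changing where the work runs.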

5) Mix Task Changes

Extend existing tasks with:

  • --control-plane
  • --async
  • --wait
  • --run-id

Add operator tasks:

  • mix jido_lib.github.runs
  • mix jido_lib.github.runs.cancel <run_id>
  • mix jido_lib.github.runs.retry <run_id>
  • mix jido_lib.github.runs.reconcile

CLI semantics:

  • --async returns immediately with run ref.
  • --wait blocks for terminal result.
  • If neither --async nor --wait is given and --control-plane is present, the task defaults to --wait.
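The flag semantics above could be resolved with the standard OptionParser module, roughly as follows. CliModeSketch is a hypothetical name, and two behaviors are assumptions this guide does not specify: passing both --async and --wait is treated as an error, and --async or --wait without --control-plane leaves the task inline.

```elixir
defmodule CliModeSketch do
  # Switch names mirror the flags listed above; OptionParser maps
  # "--control-plane" to :control_plane and "--run-id" to :run_id.
  @switches [control_plane: :boolean, async: :boolean, wait: :boolean, run_id: :string]

  def resolve(argv) do
    {opts, _args, _invalid} = OptionParser.parse(argv, strict: @switches)

    cond do
      # Without --control-plane the task keeps its inline behavior (assumption).
      !opts[:control_plane] -> :inline
      # Conflicting flags are rejected (assumption; not specified in the guide).
      opts[:async] && opts[:wait] -> {:error, :conflicting_flags}
      opts[:async] -> :async
      # Explicit --wait, or neither flag given: block for the terminal result.
      true -> :wait
    end
  end
end
```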

6) Module Layout

6.1 Core Control Plane

Create under lib/jido_lib/github/control_plane/:

  • supervisor.ex
  • state_machine.ex
  • queue.ex
  • scheduler.ex
  • dispatcher.ex
  • worker.ex
  • run_store.ex
  • lease_store.ex
  • idempotency_store.ex
  • reconciler.ex
  • quota.ex
  • policy.ex
  • telemetry.ex
  • run.ex
  • run_ref.ex
  • event.ex

6.2 Fly Platform Boundary

Create under lib/jido_lib/github/platform/fly/:

  • client.ex (behaviour)
  • req_fly_client.ex (Machines write path)
  • graphql_client.ex (read-side path)
  • machine_spec.ex (deterministic machine payloads)

6.3 Executors

Create under lib/jido_lib/github/control_plane/executor/:

  • direct.ex (default path)
  • flame.ex (optional)

6.4 Reused Existing Components

Keep using:

  • lib/jido_lib/bots/foundation/artifact_store.ex
  • lib/jido_lib/bots/foundation/role_runner.ex
  • Existing bot result contracts (no forced schema unification)

7) OTP Runtime Design

Root supervisor: Jido.Lib.Github.ControlPlane.Supervisor

Children:

  1. RunStore (ETS owner)
  2. LeaseStore (ETS owner)
  3. IdempotencyStore (ETS owner)
  4. Queue (GenServer)
  5. Scheduler (GenServer with tick)
  6. WorkerSupervisor (DynamicSupervisor)
  7. Reconciler (periodic GenServer)
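As an illustration of the "ETS owner" children, the sketch below shows a GenServer that owns a named table and lets callers claim an idempotency key directly against ETS. The module name, table name, and claim/1 function are hypothetical; the real IdempotencyStore may expose a different interface.

```elixir
defmodule IdempotencyStoreSketch do
  use GenServer

  @table :idempotency_sketch

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Returns :ok the first time a key is claimed and {:error, :duplicate}
  # on every later claim. :ets.insert_new/2 is atomic, so concurrent
  # workers cannot both win the same key.
  def claim(key) do
    if :ets.insert_new(@table, {key, System.system_time(:millisecond)}) do
      :ok
    else
      {:error, :duplicate}
    end
  end

  @impl true
  def init(_opts) do
    # The table lives as long as this process does; if the owner dies,
    # the table and every claimed key are lost.
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    {:ok, %{}}
  end
end
```

Starting the ETS owners before the Queue and Scheduler, as in the child order above, means the tables exist before any process tries to read or write them.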

Execution flow:

  1. submit validates and enqueues run.
  2. Scheduler admits run by quota/policy.
  3. Dispatcher starts worker.
  4. Worker acquires lease.
  5. Worker executes bot via selected executor profile.
  6. Worker checkpoints manifest/artifacts.
  7. Worker publishes/comments idempotently.
  8. Worker emits terminal event and releases lease.
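The lease steps in this flow (acquire in step 4, release in step 8) can be sketched as pure functions over a map, which keeps the rules easy to test deterministically. LeaseSketch and its data shape are hypothetical; an ETS-backed LeaseStore would need to apply the same rules atomically, for example by serializing writes through the owner process.

```elixir
defmodule LeaseSketch do
  # leases has the assumed shape %{run_id => {owner, expires_at_ms}}.
  # The clock is passed in so tests do not depend on wall time.
  def acquire(leases, run_id, owner, ttl_ms, now_ms) do
    case Map.get(leases, run_id) do
      # A live lease held by someone else blocks the acquire.
      {holder, expires_at} when expires_at > now_ms and holder != owner ->
        {:error, {:held_by, holder}}

      # Missing, expired, or already ours: take or renew the lease.
      _missing_expired_or_own ->
        {:ok, Map.put(leases, run_id, {owner, now_ms + ttl_ms})}
    end
  end

  # Only the current holder may release; anything else is a no-op.
  def release(leases, run_id, owner) do
    case Map.get(leases, run_id) do
      {^owner, _expires_at} -> Map.delete(leases, run_id)
      _ -> leases
    end
  end
end
```

Ignoring a release from a non-holder matters for late completions: a worker whose lease was taken over cannot free the lease of the worker that replaced it.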

8) Run State Machine

States:

  • :queued
  • :admitted
  • :provisioning
  • :running
  • :finalizing
  • :publishing
  • :succeeded
  • :failed
  • :canceled
  • :timed_out

Rules:

  • All transitions validated in state_machine.ex.
  • Terminal states are immutable.
  • retry creates a new run attempt rather than mutating terminal state.
  • Every transition emits telemetry + control-plane event.
  • manifest.json checkpoint updated at each major phase.
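A minimal transition validator consistent with these rules might look like the sketch below. The guide lists the states but not the allowed edges, so the linear happy path and the rule that any non-terminal state may move to :failed, :canceled, or :timed_out are assumptions, as is the StateMachineSketch name.

```elixir
defmodule StateMachineSketch do
  @terminal [:succeeded, :failed, :canceled, :timed_out]

  # Assumed happy path; the actual edge set belongs in state_machine.ex.
  @next %{
    queued: :admitted,
    admitted: :provisioning,
    provisioning: :running,
    running: :finalizing,
    finalizing: :publishing,
    publishing: :succeeded
  }

  def terminal?(state), do: state in @terminal

  def transition(from, to) do
    cond do
      # Terminal states are immutable.
      terminal?(from) -> {:error, {:terminal, from}}
      not Map.has_key?(@next, from) -> {:error, {:unknown_state, from}}
      # Next step on the happy path.
      Map.fetch!(@next, from) == to -> {:ok, to}
      # Assumed: any in-flight run may fail, be canceled, or time out.
      to in [:failed, :canceled, :timed_out] -> {:ok, to}
      true -> {:error, {:invalid_transition, from, to}}
    end
  end
end
```

Because transition/2 is a pure function, the telemetry and event emission required on every transition can wrap it in one place rather than being repeated per state.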

9) Fly Integration Policy

9.1 Machines API (req_fly) — Write Path

Authoritative path for:

  • Machine create/start/stop/restart/destroy
  • Metadata tags (run_id, bot, attempt, repo, owner)
  • TTL/cleanup metadata
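To make the metadata tagging concrete, the sketch below builds a create-machine payload as a plain map from a run and a few options. It deliberately does not call req_fly, whose function signatures are not shown in this guide. MachineSpecSketch, the machine naming scheme, and the ttl_seconds metadata key are hypothetical, and the payload field names should be checked against the Fly Machines API reference before use.

```elixir
defmodule MachineSpecSketch do
  # Same inputs always produce the same payload, so the output can be
  # asserted exactly in tests. No timestamps or random values are used.
  def build(%{run_id: run_id, bot: bot, attempt: attempt, repo: repo, owner: owner}, opts) do
    %{
      # Naming scheme is an assumption made for this sketch.
      "name" => "jido-#{bot}-#{run_id}-#{attempt}",
      "region" => Keyword.fetch!(opts, :region),
      "config" => %{
        "image" => Keyword.fetch!(opts, :image),
        # Metadata values are stored as strings.
        "metadata" => %{
          "run_id" => to_string(run_id),
          "bot" => to_string(bot),
          "attempt" => to_string(attempt),
          "repo" => to_string(repo),
          "owner" => to_string(owner),
          # Hypothetical cleanup hint that a reconciler could read.
          "ttl_seconds" => to_string(Keyword.get(opts, :ttl_seconds, 3_600))
        }
      }
    }
  end
end
```

Carrying run_id and attempt in the machine metadata is what would let a reconciler match a machine back to its run, or flag it as an orphan when no such run exists.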

9.2 GraphQL API — Read Path

Read-only usage for:

  • Fleet inventory
  • Region/capacity metadata
  • Historical lookup/diagnostic enrichment

9.3 FLAME — Optional Executor Profile

  • execution_profile: :direct | :flame
  • Default profile is :direct
  • :flame enabled only when configured and available
  • Clear fallback policy (fallback_to_direct?)
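The profile selection and fallback policy above reduce to a small decision table, sketched here. ExecutorProfileSketch and the explicit flame_available? argument are hypothetical, and defaulting fallback_to_direct? to false is an assumption chosen to match the fail-closed stance elsewhere in this guide.

```elixir
defmodule ExecutorProfileSketch do
  # flame_available? would come from a runtime check in a real system;
  # here it is passed in so the decision stays a pure function.
  def resolve(opts, flame_available?) when is_boolean(flame_available?) do
    profile = Keyword.get(opts, :execution_profile, :direct)
    fallback? = Keyword.get(opts, :fallback_to_direct?, false)

    case {profile, flame_available?, fallback?} do
      {:direct, _, _} -> {:ok, :direct}
      {:flame, true, _} -> {:ok, :flame}
      # FLAME requested but unavailable: fall back only when allowed.
      {:flame, false, true} -> {:ok, :direct}
      {:flame, false, false} -> {:error, :flame_unavailable}
      {other, _, _} -> {:error, {:unknown_profile, other}}
    end
  end
end
```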

10) Reliability Controls

  • Bounded retries with backoff + jitter
  • Lease expiration and stale-worker takeover
  • Cancellation propagation to provider + machine
  • Reconciler for orphan machines and stale queued/running runs
  • Idempotency keys for publish/comment side effects
  • Fail-closed on invalid provider/runtime prerequisites
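The first control, bounded retries with backoff and jitter, can be sketched as follows. BackoffSketch, its option names, and its default values are hypothetical. The sketch uses exponential backoff with "full jitter" (a uniform random delay between zero and the capped exponential ceiling), which is one common variant and not necessarily the one the control plane will adopt.

```elixir
defmodule BackoffSketch do
  # Returns {:retry, delay_ms} while attempts remain, otherwise :give_up.
  # attempt is the 1-based number of the attempt that just failed.
  def schedule(attempt, max_attempts, opts \\ [])
      when is_integer(attempt) and attempt >= 1 do
    if attempt >= max_attempts do
      :give_up
    else
      {:retry, delay_ms(attempt, opts)}
    end
  end

  def delay_ms(attempt, opts \\ []) when is_integer(attempt) and attempt >= 1 do
    base = Keyword.get(opts, :base_ms, 500)
    cap = Keyword.get(opts, :max_ms, 30_000)
    # Exponential growth, capped so late attempts do not wait unboundedly.
    ceiling = min(cap, base * Integer.pow(2, attempt - 1))
    # Full jitter: :rand.uniform/1 returns 1..n, so shift to 0..ceiling.
    :rand.uniform(ceiling + 1) - 1
  end
end
```

The jitter spreads out retries from runs that failed at the same moment, for example after a provider outage, so they do not all hit the Machines API again in the same instant.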

11) Implementation Plan

Phase 0: Baseline Quality Debt (Cross-Repo)

  • Commit pending quality fixes in:
    • jido_workspace
    • jido_runic
    • jido_codex
    • jido_gemini
  • Verify mix quality passes in touched repos.

Phase 1: Core Control Plane

  • Add stores + queue + scheduler + worker scaffolding.
  • Add run/event structs and telemetry hooks.
  • Add deterministic unit tests for transitions, retries, cancellation.

Phase 2: Fly Platform Layer

  • Add Fly behaviour + req_fly Machines client.
  • Add GraphQL read adapter.
  • Add deterministic machine payload/spec tests.

Phase 3: Bot Integration (All Bots)

  • Add enqueue_* API and mode: :control_plane.
  • Keep synchronous APIs unchanged by default.
  • Add inline-vs-queued parity tests.

Phase 4: Mix Task/Operator Surface

  • Extend task flags.
  • Add operator tasks (runs, cancel, retry, reconcile).
  • Add deterministic task tests.

Phase 5: Hardening

  • Add orphan sweeps, dead-letter categorization.
  • Harden race handling around cancellation and late completion.
  • Add chaos-style deterministic tests.

Phase 6: Documentation

Add docs:

  • docs/fly_control_plane_architecture.md
  • docs/fly_control_plane_ops.md
  • docs/fly_control_plane_rollout.md

Update:

  • README.md
  • mix.exs docs.extras

12) Test Plan

Unit

  • test/jido_lib/github/control_plane/state_machine_test.exs
  • test/jido_lib/github/control_plane/queue_test.exs
  • test/jido_lib/github/control_plane/scheduler_test.exs
  • test/jido_lib/github/control_plane/run_store_test.exs
  • test/jido_lib/github/control_plane/reconciler_test.exs
  • test/jido_lib/github/platform/fly/machine_spec_test.exs
  • test/jido_lib/github/platform/fly/req_fly_client_test.exs
  • test/jido_lib/github/platform/fly/graphql_client_test.exs

Bot Integration

  • Add queued-mode tests for all bot run suites under test/jido_lib/github/agents/.
  • Assert result-map compatibility with existing inline outputs.

Mix Task

  • Extend task tests for --control-plane, --async, --wait, --run-id.
  • Add run-operator task tests.

Integration (@tag :integration)

  • Machines lifecycle write calls.
  • GraphQL read queries.
  • Optional FLAME profile smoke.
  • Idempotent repost behavior with repeated run_id.

Gates

  • mix test
  • mix quality
  • Ignore jido_workspace_scenarios for this workstream.

13) Definition of Done

  1. All GitHub bots support control-plane mode and keep synchronous mode behavior.
  2. Fly Machines write path uses req_fly boundary.
  3. GraphQL is read-only in control plane.
  4. FLAME profile is optional and tested.
  5. ETS queue/state/lease/reconcile are covered by deterministic tests.
  6. Operator run-management Mix tasks are implemented.
  7. Documentation is publish-ready for internal/external handoff.
  8. jido_lib quality and test gates are green.

14) Defaults and Assumptions

  • req_fly is the canonical Fly write client.
  • ETS-only state (in memory, lost when the owning process or node restarts) is acceptable for v1.
  • .jido/runs/<run_id>/manifest.json is the audit/recovery backbone.
  • Existing bot result shapes are preserved.
  • Feature flags gate rollout by bot and execution profile.