Grant's Codex Tech Plan Template
# Technical Plan Template — Use For New Milestones/Initiatives
- Status: Draft | Proposed | Accepted | Superseded
- Doc Version: 0.2
- Owner: <name/team>
- Reviewers: <names>
- Date: <YYYY-MM-DD>
> LLM Authoring Instructions
> - Use this template as the canonical checklist. Match or exceed the depth defined here so an engineer can implement without guessing.
> - Include, at minimum: concrete interface contracts (Java signatures), config keys (names, defaults, semantics), metrics (names/tags), data model changes (nodes/tables, properties, indexes/constraints), sequence diagrams, JSON schemas, prompts/templates (if relevant), testing strategy, rollout and rollback plans.
> - Favor actionable guidance over narrative. Specify file paths to change, classes to introduce/update, and example code signatures and Cypher/SQL where applicable.
> - Explicitly address multi-tenancy, safety, auditability, performance, and cost. State how tenant isolation is enforced at the data layer (filters/partitions), how outputs are audited, and how latency/tokens are managed.
> - Provide feature flags/toggles for risky features, with kill-switches and fallback paths, and document safe defaults.
> - List risks with concrete mitigations and measurable success criteria and thresholds.
## Authority & Precedence (Required)
- This plan is the single source of truth for its scope. It supersedes conflicting guidance in older milestone docs. When conflicts arise, follow this plan and update prior docs/tests (including scenarios/fixtures) to align.
> Codex Execution Contract (Required)
> - Maintain a live plan using the `update_plan` tool with each milestone as a step. Exactly one step is `in_progress` at a time.
> - When completing a milestone, append a “Milestone Completion Report” (see template below) to this doc and mark the step `completed`.
> - Always run Gradle tests using the global `gradle` CLI with progress streaming: `gradle test -i --console=plain` (allow up to 30 minutes).
> - Prefer imports in code examples; avoid fully qualified class names in code snippets.
> - Update docs (including ADRs) and YAML property descriptions when behavior or configuration changes.
> - Follow AGENTS.md for coding standards, tenancy guardrails, debug/admin gating, lane order, and CI gates. Treat AGENTS.md as a contract.
### Execution Rules (Milestone-by-Milestone)
- Before marking any milestone complete:
  - Update this plan with a Milestone Completion Report that includes: summary; detailed code changes; explanation of how the milestone advances goals; and identified enhancements/refactoring opportunities to schedule into upcoming milestones.
  - Run and fix the full test suite: first `gradle test -i --console=plain` (30‑min timeout allowance), then `gradle e2eTest -i --console=plain`. If Pepper is available, optionally run readiness/demo lanes per AGENTS.md. All failures must be triaged and resolved.
- Always follow the Golden Path; avoid feature flags. If a toggle is strictly necessary for tests, confine it to test profiles, document its purpose and removal plan, and keep production defaults unchanged.
- Identify and remove legacy/deprecated code as you go; do not leave duplicate or dead paths.
### Milestone Completion Procedure (Copy into Plan)
- Append a “Milestone Completion Report” under the milestone with:
  - Summary of changes delivered
  - Detailed code/config/test changes (files/classes/keys)
  - How the milestone advances the plan’s goals
  - Enhancements/refactors discovered → add to upcoming milestones (owner/ETA)
- Run required “Test Lanes (DoD)” for the milestone using global Gradle with progress streaming (`-i --console=plain`) and a 30‑minute budget.
- Commit locally with detailed, scoped messages; do not push from this workflow.
- Update older docs/tests that conflict with the plan.
### Repo & Git Process (Required)
- Work on the `develop` branch (no new branch creation for this workflow).
- Commit changes with a detailed changelog (scoped, descriptive commits). Do not push from this workflow.
- Keep commits minimal and focused (one concern per commit). Update docs and YAML comments alongside code changes.
## Production Code Quality Guardrails
- Do not add production workarounds to satisfy tests. Fix the tests (or provide test‑profile scaffolding) rather than introducing brittle heuristics into main code.
- If a temporary workaround must be introduced due to a hard blocker, it must:
  - Live under test profiles (src/test) or be guarded by a test‑only bean/config (see the sketch below); and
  - Include an explicit removal task in the next milestone with owner/date; and
  - Be documented in the Milestone Completion Report as “temporary, test‑only.”
- Any production heuristic added for tests must be removed in the next milestone and replaced with proper stubs/mocks or policy/test fixtures.
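For illustration, a minimal sketch of such test‑only scaffolding, assuming Spring profiles; the class and bean are hypothetical:
```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Profile;

// Hypothetical example: a deterministic clock that exists only under the
// test profile, so production wiring stays untouched.
@TestConfiguration
@Profile("test")
public class TestOnlyStubsConfig {

    @Bean
    public Clock fixedClock() {
        // Fixed time keeps assertions stable; track removal in the next milestone.
        return Clock.fixed(Instant.EPOCH, ZoneOffset.UTC);
    }
}
```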
## Plan Execution State (Live Tracker)
- Overall status: Draft | In‑Progress | Blocked | Completed.
- Milestones:
  - [ ] M1 — <title> (owner, ETA)
  - [ ] M2 — <title> (owner, ETA)
  - [ ] M3 — <title> (owner, ETA)
  - …
- Links: Recent commits (hashes), test reports, dashboards.
> Tip: for each milestone list the “Test Lanes Required (DoD)” to make acceptance unambiguous (see Test Plan).
---
## Summary
One-paragraph summary of the problem and the proposed solution.
Example: “Enable LLM-first composition with verifiers to increase creative capability while preserving tenant isolation and audit.”
## Background & Context
- Current behavior/architecture (cite components/classes).
- Pain points and gaps.
- Related documents (ADRs, deep dives, runbooks).
- e.g., Architecture Overview, existing milestone docs, relevant ADRs.
### Why Now
- Explain the motivation and timing: what triggered this work, what downstream efforts it enables, and the cost of deferring.
## Problem Statement & Non-Goals
- Problem to solve, with examples and constraints.
- Explicit non-goals to avoid scope creep.
### Assumptions & Preconditions
- List assumptions about data, tenants, external systems, availability of APIs, or feature flags.
- Preconditions for starting execution (e.g., sample data seeded, env vars set).
## Goals & Success Criteria
- Functional goals (bullet list) with measurable targets.
- Non-functional goals: performance/latency, tokens/cost, reliability, security, maintainability.
- Success metrics and thresholds (e.g., +10% nDCG@10, −15% tokens/answer, p95 < 1.2× baseline).
## Requirements
- Functional requirements (behaviors, edge cases). For each requirement, capture both a *system requirement* (technical behavior) and a *user requirement* (plain language description of the customer impact).
- Non-functional requirements (SLOs, availability, data retention).
- Constraints (tenancy, compliance, data locality, dependencies).
### Out of Scope (Explicit)
- Call out adjacent ideas intentionally excluded to prevent duplication or hidden scope.
## Stakeholders & Tenancy
- Stakeholders (teams, roles).
- Tenancy boundaries and access controls; how requests are scoped and enforced at the data layer.
### Tenancy Checklist (Must Stay Green)
- All queries are tenant-scoped (filters in vector/graph/SQL layers); see the sketch below.
- No free-form Cypher; only curated graph templates with tenant filters and LIMIT caps.
- Attributes/labels resolved via `AttributesRegistry`; no hard-coded tenant fields/values.
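A minimal sketch of tenant scoping enforced at the data layer, assuming the Neo4j Java driver; the label, properties, and class name are placeholders:
```java
import java.util.List;
import java.util.Map;

import org.neo4j.driver.Session;

// Minimal sketch (names hypothetical): every read goes through a curated
// template that binds the tenant as a parameter and caps result size.
public class TenantScopedGraphReads {

    private static final String FIND_DOCS_TEMPLATE = """
            MATCH (d:Document {tenantId: $tenantId})
            WHERE d.status = $status
            RETURN d.id AS id, d.title AS title
            LIMIT $limit
            """;

    private final Session session;

    public TenantScopedGraphReads(Session session) {
        this.session = session;
    }

    public List<Map<String, Object>> findDocuments(String tenantId, String status, int limit) {
        // Tenant filter and LIMIT live in the template, never in caller-supplied Cypher.
        return session
                .run(FIND_DOCS_TEMPLATE, Map.of("tenantId", tenantId, "status", status, "limit", limit))
                .list(r -> r.asMap());
    }
}
```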
## Architecture Overview
### Current State (As-Is)
- Components involved.
- Key data flows (brief).
- Diagrams (optional) — sequence or block.
### Proposed Design (To-Be)
- Component changes: new services, updated classes, interfaces.
- Data model changes (graph nodes/rels, SQL tables), with indexes/constraints.
- Sequence diagrams for critical flows (mermaid).
- Config/feature flags required.
- Security, compliance, and privacy considerations (tenant isolation, PII/PHI handling).
- Observability: logs, metrics, traces; dashboards to update or create.
- Performance & scalability: concurrency, batching, backpressure, caching.
### Golden Path & Backwards‑Compat Policy
- Golden path by default. Avoid feature flags; when unavoidable, document: key, default, scope, safe fallback, and removal date.
- Backout plan and ownership must be stated.
- Remove legacy/deprecated code paths proactively; do not leave parallel implementations.
### Parity & Compatibility
- Parity Contract Matrix: explicitly compare the proposed path/mode to the golden path across:
  - Sidecar semantics (fields, validation, Confidence Gate)
  - Verifiers integration (Citation/Evidence/Policy)
  - Audit behavior and storage
  - Tenancy enforcement at all data layers
  - DTO shapes at boundaries (API/SSE/GraphQL)
  - Error envelopes and status codes
  - Metrics and tracing spans
  - Backpressure/timeout/cancellation behavior
- For any intentional deviations, list the follow-up plan id, owner, and timeline; provide compatibility strategy (adapters, deprecations, removal criteria).
### Design Rationale (Why These Choices)
- Summarize key decisions, alternatives considered, trade-offs, and why the chosen approach best meets goals.
#### Example Sequence (replace with your flow)
```mermaid
sequenceDiagram
autonumber
participant C as Client
participant S as Service
participant D as Data Store
C->>S: Request (tenant, payload)
S->>D: Query/Write (tenant‑scoped)
D-->>S: Result
S-->>C: Response (+audit id)
```
## Detailed Design Elements
### Interfaces & DTOs (Java)
Provide exact method signatures and minimal javadocs. Example:
```java
import java.util.List;

import org.springframework.ai.document.Document;

/** Retrieves documents by keyword match, always scoped to a tenant. */
public interface KeywordRetriever {

    /** Returns at most {@code topK} matches for {@code query} within {@code tenantId}. */
    List<Document> search(String tenantId, String query, int topK);
}

/** Aggregation request; {@code QuerySpec.Filter} and {@code Sort} are assumed domain types. */
public record AggregateSpec(String groupBy, List<QuerySpec.Filter> filters, Sort order, Integer limit) {}
```
### Configuration & Feature Flags
List all keys, defaults, and behavior:
- `app.feature.name.enabled=true` — enables X.
- `app.feature.name.limit=200` — cap for Y.
- `spring.ai.vector-store.neo4j.index-name` — index routing notes.
For every new property added to Spring YAML, include a one-line description and valid values in this plan and inline comments in the YAML.
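A minimal binding sketch, assuming Spring Boot constructor binding; the prefix and fields mirror the example keys above and are placeholders:
```java
import org.springframework.boot.context.properties.ConfigurationProperties;

// Hypothetical binding for the example keys above; the real prefix and
// fields come from the plan's actual feature.
@ConfigurationProperties(prefix = "app.feature.name")
public record FeatureNameProperties(
        boolean enabled,   // app.feature.name.enabled — enables X
        int limit          // app.feature.name.limit — cap for Y
) {
    public FeatureNameProperties {
        if (limit <= 0) {
            throw new IllegalArgumentException("app.feature.name.limit must be positive");
        }
    }
}
```
A typed binding like this keeps defaults and validation next to the key documentation.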
#### Feature Flags Assessment (Optional but Recommended)
- Add a short table enumerating flags in scope (name, type: capability/kill‑switch/debug/budget/test, prod/non‑prod/CI defaults, rationale, recommendation).
- Document removal/consolidation plans and update inline YAML comments.
### Data Model Updates
- Graph: nodes, relationships, properties; constraints and indexes (Cypher) — see the migration sketch below.
- SQL: tables, columns, indexes; migration steps.
- Keys/joins between stores.
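For illustration, a hypothetical migration sketch that applies constraints and indexes via the Neo4j Java driver; label, property, and constraint names are placeholders:
```java
import org.neo4j.driver.Session;

// Illustrative only: constraint/index DDL for a new node type, run once at
// migration time.
public class GraphMigration {

    public void apply(Session session) {
        // Uniqueness per (tenant, id) makes cross-tenant collisions impossible.
        session.run("""
                CREATE CONSTRAINT doc_tenant_id IF NOT EXISTS
                FOR (d:Document)
                REQUIRE (d.tenantId, d.id) IS UNIQUE
                """);
        // Index the common tenant-scoped lookup path.
        session.run("""
                CREATE INDEX doc_tenant_status IF NOT EXISTS
                FOR (d:Document)
                ON (d.tenantId, d.status)
                """);
    }
}
```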
### Algorithms & Policies
- Describe any selection/merge policy (e.g., MMR, RRF) with formulas, parameters, and early-stop conditions.
- Provide pseudocode if non-trivial; an RRF sketch follows.
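As one concrete example, a self-contained sketch of Reciprocal Rank Fusion (RRF), where a document's fused score is the sum of 1/(k + rank) across input rankings, with k ≈ 60 by convention:
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of RRF as one example merge policy:
// score(d) = sum over rankings of 1 / (k + rank(d)).
public final class ReciprocalRankFusion {

    public static List<String> fuse(List<List<String>> rankings, int k, int topK) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank is 0-based here, so add 1 to make it 1-based in the formula.
                scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused.subList(0, Math.min(topK, fused.size()));
    }
}
```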
### Safety & Compliance
- Tenant enforcement (filters at the store, never only in prompts).
- Policy profiles (disclaimers, banned terms, audience rules).
- Redaction or minimization where needed.
- Audit retention & access.
#### Security & Tenancy Authority
- Resolve tenant server‑side (principal/session) and validate against headers; reject mismatches (see the filter sketch below). Apply the same rule to streaming (SSE) and cancellation endpoints.
- Avoid logging PII/secrets; gate full prompt logging behind explicit non‑prod flags.
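A minimal sketch of that rule as a servlet filter, assuming Spring Boot 3 and jakarta servlet APIs; the request attribute and header names are placeholders:
```java
import java.io.IOException;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

import org.springframework.web.filter.OncePerRequestFilter;

// Illustrative filter (names hypothetical): the tenant of record comes from
// the authenticated principal; a client-supplied header may only confirm it.
public class TenantAuthorityFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String principalTenant = (String) request.getAttribute("resolvedTenantId"); // set upstream by auth
        String headerTenant = request.getHeader("X-Tenant-Id");
        if (principalTenant == null
                || (headerTenant != null && !principalTenant.equals(headerTenant))) {
            // Reject mismatches rather than trusting the header; same rule for SSE and cancel endpoints.
            response.sendError(HttpServletResponse.SC_FORBIDDEN, "tenant mismatch");
            return;
        }
        chain.doFilter(request, response);
    }
}
```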
### Observability
- Metrics: names and tags.
- Tracing spans.
- Log events.
- Example dashboards and alerts (a Micrometer-style sketch follows).
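A Micrometer-style sketch of the naming/tagging shape; the metric and tag names are placeholders for the ones this plan should pin down:
```java
import java.util.function.Supplier;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

// Sketch only: one timer per (tenant, route). Avoid high-cardinality tags
// such as raw queries or document ids.
public class RetrievalMetrics {

    private final MeterRegistry registry;

    public RetrievalMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    public <T> T timedSearch(String tenantId, String route, Supplier<T> search) {
        return Timer.builder("retrieval.search.latency")
                .tag("tenant", tenantId)
                .tag("route", route)
                .register(registry)
                .record(search);
    }
}
```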
### Boundary DTO Reuse Checklist (Must Address)
- Reuse canonical DTOs for domain and sidecar where feasible. If a boundary needs a different wire shape (e.g., SSE event), provide a mapper at the edge and avoid ad‑hoc DTOs (see the mapper sketch below).
- Document any temporary duplication and the removal plan (milestone id + owner).
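A minimal edge-mapper sketch; both record shapes are hypothetical stand-ins for the canonical sidecar DTO and the SSE wire event:
```java
import java.util.List;

// Hypothetical canonical DTO and wire event: conversion happens once, at
// the edge, instead of spawning parallel DTOs throughout the codebase.
public final class SidecarSseMapper {

    public record Sidecar(List<String> citations, String gate) {}           // canonical (placeholder)
    public record CompleteEvent(List<String> citations, String gateMeta) {} // SSE wire shape (placeholder)

    public static CompleteEvent toCompleteEvent(Sidecar sidecar) {
        // Field-by-field mapping keeps the wire contract explicit and testable.
        return new CompleteEvent(sidecar.citations(), sidecar.gate());
    }
}
```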
### Cross‑Plan Dependency Grid
- Upstream dependencies (plans/features this relies on) and how their invariants are maintained.
- Downstream dependents likely affected.
- For each dependency, state: invariant, impact, and mitigation if not in scope.
## Implementation Plan
Break into phases/milestones with deliverables, owners, and acceptance criteria. Include toggles and rollbacks.
### Milestone X (N weeks): Title
- Summary of changes and why they help us achieve our goal: …
- Scope: …
- Deliverables: …
- Interfaces/Files: …
- Acceptance Criteria: …
- Flags: …
- Risks/Mitigations: …
Repeat per milestone as needed.
### Milestone Completion Report (Append After Each Milestone)
Fill this block when the milestone is marked complete. Include: summary, detailed changes (files/classes/config/tests), goals alignment, and follow‑up items with owners/ETAs.
## Test Plan
Define lanes to run per milestone (fill concrete tasks in the plan):
- Unit + integration: `gradle test -i --console=plain` (30‑minute budget)
- E2E (offline): `gradle e2eTest -i --console=plain`
- Policy/schema validation (as applicable)
- ArchUnit/structural rules (as applicable)
- Perf advisory gate (as applicable)
- Pepper online + demo readiness + demo script suites (as applicable)
- Transport/contract tests (e.g., SSE or GraphQL contracts)
For each milestone, add “Test Lanes Required (DoD)” so acceptance is explicit.
## Optional Audit Appendices (Use as Needed)
- Policy Map & Gap vs Code — YAML rule → code sites → action → rationale.
- Fallbacks Inventory & RCA — location, trigger, behavior, frequency, RCA, keep/remove/guard.
- Legacy/Unused Code Audit — jdeps reachability, ArchUnit bans, coverage signals, candidate actions.
### Milestone Completion Report Template
- Summary of Changes: one paragraph capturing what shipped and why it matters.
- Code Changes: key classes/files updated or added (paths).
- Tests & Results: commands run and outcomes. Must run and pass `gradle test -i --console=plain` (30‑min allowance) and `gradle e2eTest -i --console=plain`; include any focused lanes (e.g., demoScript if Pepper is reachable). Document failures triaged and fixes applied.
- Validation Evidence: screenshots/log snippets/metrics deltas (token/latency, KPIs).
- Docs Updated: docs/ files, ADRs created/updated, YAML comments added.
- Observations: notable findings, caveats, follow-up items uncovered.
- Enhancements & Refactoring Opportunities Identified: items to fold into upcoming milestones with brief rationale.
- Next Steps: options considered; recommended next step and why (tie to goals).
- Backout Plan Used? yes/no (if used, describe).
- Owner/Date: who completed, when.
Add the following if applicable:
- Changelog (concise bullets of the most important deltas).
- SSE/Contract Invariants Verified (met/not met).
- Metrics Observed (e.g., `advisor.*` counters/timers; deltas vs baseline).
### CI Gates (Add To PR Checklist and Pipelines)
- Contract tests covering acceptance criteria for new/changed interfaces (e.g., SSE event payloads).
- Parity matrix items implemented or explicitly deferred with plan linkage.
- Static rule to prevent boundary DTO drift (e.g., forbid deprecated streaming DTOs where `SidecarDto` is required).
- Secret scan, dependency scan, and SBOM generation remain green.
## Testing Strategy
- Unit: classes, edge cases, error handling.
- Integration: Testcontainers for Neo4j/DB; WireMock for external APIs (sketch below).
- E2E: flows through ChatService/Planner/Tools if applicable.
- Evaluations (LLM/Retrieval): datasets, metrics (P@K, nDCG, tokens, latency), reporting.
- Performance/Soak: concurrency, throughput, memory.
- Chaos: retry/backoff, DLQ, recovery.
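For the integration lane, a minimal Testcontainers sketch; the image tag, data, and assertion are placeholders:
```java
import org.junit.jupiter.api.Test;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.testcontainers.containers.Neo4jContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

import static org.junit.jupiter.api.Assertions.assertEquals;

// Minimal Testcontainers sketch for a Neo4j-backed integration test.
@Testcontainers
class Neo4jIntegrationTest {

    @Container
    static final Neo4jContainer<?> neo4j = new Neo4jContainer<>(DockerImageName.parse("neo4j:5"));

    @Test
    void roundTripsATenantScopedNode() {
        try (Driver driver = GraphDatabase.driver(
                neo4j.getBoltUrl(), AuthTokens.basic("neo4j", neo4j.getAdminPassword()));
             var session = driver.session()) {
            session.run("CREATE (:Document {tenantId: 't1', id: 'd1'})");
            long count = session
                    .run("MATCH (d:Document {tenantId: 't1'}) RETURN count(d) AS c")
                    .single().get("c").asLong();
            assertEquals(1, count);
        }
    }
}
```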
### Contract Tests (Required for Interfaces)
- For any public API or streaming interface, add a contract test that asserts event/response shape, required fields, and parity vs. the golden path (e.g., `complete.sidecar.citations` present under retrieval).
- Ensure cancellation/timeout semantics are exercised if applicable.
#### SSE/Streaming Invariants (If Applicable)
- Event order: `start` → `meta` → `delta*` → `complete|error` (see the order check below).
- `meta` includes `route`, `modelId`, `memoryUsed` (when available).
- `complete.sidecar` present when sidecar exists; `complete.sidecar.meta.gate` on non‑PASS outcomes.
- Backpressure: delta batching cadence (e.g., `app.chat.streaming.batch-ms`) observed within tolerance.
- Cancellation: `/api/chat/stop?runId=…` emits `complete` with `finishReason=cancelled`.
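One way to assert the order invariant over captured event names; a sketch only, with event capture left to the transport/contract tests:
```java
import java.util.List;
import java.util.regex.Pattern;

import static org.junit.jupiter.api.Assertions.assertTrue;

// Sketch: validates the start → meta → delta* → complete|error ordering
// over a list of collected SSE event names.
final class SseOrderInvariant {

    private static final Pattern ORDER = Pattern.compile("start meta( delta)* (complete|error)");

    static void assertEventOrder(List<String> eventNames) {
        String sequence = String.join(" ", eventNames);
        assertTrue(ORDER.matcher(sequence).matches(),
                "unexpected SSE sequence: " + sequence);
    }
}
```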
### Test Commands (Codex Must Use)
- `gradle test -i --console=plain` — stream progress; allow up to 30 minutes.
- Sequence per AGENTS.md: `test` → `e2eTest` → optional `llmTest` → `e2ePepperOnline` → `demoReadinessGate` → `demoScriptE2ENonVector` → optional `demoScriptE2EVector`.
- Stop rules: stop on `test` red; if `e2eTest` fails, fix wiring and skip online/demo lanes.
#### Readiness Gates (When External Systems/Data Are Required)
- Pepper readiness: `gradle e2ePepperProbe -i --console=plain` and `gradle demoReadinessGate -i --console=plain` prior to demo/e2e lanes.
- Tenant data assumptions: list required docs/attributes; link to seeding source.
- Env vars and local secrets: document keys and where they are read (.env, application-local-secrets.yaml).
### Prompt Testing (When Debugging Prompts)
- Add the prompt to `src/test/resources/demo-script/scenarios.yaml` as per plan-demo-script guidelines.
- Run `gradle demoScriptE2ENonVector -i --console=plain -Dtests.demo.scenarios=<id>` and ensure the test fails before fixes; then iterate until green.
## Rollout Plan
- Dark launch flags and staging validation.
- Shadow A/B (if applicable).
- Canary tenants; expand cohort.
- Kill switch and fallback plan.
- Communication and documentation updates.
## Formatting & Compliance (PR Checklist)
- Run `gradle spotlessApply` (or note repo-wide formatter constraints if present).
- Prefer Java imports; avoid fully qualified class names in code.
- Optionally run `gradle rewriteRun` with recipe `com.vodori.FqcnCleanup` to normalize imports.
- Run `gradle agentsCompliance -i --console=plain` and address advisory notes.
## Risks & Mitigations
- Enumerate top risks with concrete mitigations and detection signals.
## Documentation Updates
- Files to update (Architecture Overview, Tools Catalog, API Reference, Runbooks).
- New docs to add.
## Success Metrics & Review
- Target metrics, measurement windows, and data sources.
- Who reviews and when; exit criteria for the effort.
## Appendix
### JSON Schemas (if applicable)
Provide draft schemas for structured outputs (e.g., sidecars), request/response payloads.
### Prompt Templates (if applicable)
Guidance for dual‑channel outputs (human + JSON), variable placeholders, and compliance reminders.
### Example Config Blocks
Show production‑ready toggles and safe defaults with environment overrides.
---
> Authoring Checklist (Must Do)
> - [ ] Background, problems, non-goals, goals
> - [ ] Architecture (current vs proposed), diagrams
> - [ ] Interfaces/DTOs with signatures
> - [ ] Config keys, defaults, flags
> - [ ] Data model changes, constraints, migrations
> - [ ] Algorithms/policies with parameters
> - [ ] Safety/compliance, multi-tenancy enforcement
> - [ ] Observability metrics and traces
> - [ ] Parity & Compatibility matrix completed
> - [ ] Boundary DTO reuse checklist addressed
> - [ ] Cross‑plan dependency grid
> - [ ] Detailed milestones with acceptance criteria
> - [ ] Milestone Completion Report for each completed milestone
> - [ ] Testing plan (unit/integration/evals/perf/chaos) with Gradle commands
> - [ ] Contract tests for public interfaces (API/SSE)
> - [ ] CI gates (contract tests, parity checks, static DTO rules, SBOM, scans)
> - [ ] Rollout and rollback
> - [ ] Risks/mitigations
> - [ ] Success metrics and review plan
> - [ ] ADR needed? If yes, add under `docs/adr` with background, context, reasoning, goals, outcome, and implementation changes
---
## Decision Log (Keep Updated)
Record key decisions, alternatives considered, and rationale.
- <YYYY-MM-DD> — Decision: <what>. Alternatives: <A,B>. Why: <reasoning>. Impact: <perf/safety/UX>. Owner: <name>.
## Open Questions
- List unanswered questions and the plan to resolve them (ownership, timeline).
## Performance Gates
- Define perf budgets (e.g., “perf advisory gate”) and how to validate them (tests/CI metrics).