Created
September 20, 2025 11:39
Grant's Codex Tech Plan Template
| # Technical Plan Template — Use For New Milestones/Initiatives | |
| Status: Draft | Proposed | Accepted | Superseded | |
| Doc Version: 0.2 | |
| Owner: <name/team> | |
| Reviewers: <names> | |
| Date: <YYYY-MM-DD> | |
| > LLM Authoring Instructions | |
| > - Use this template as the canonical checklist. Match or exceed the depth defined here so an engineer can implement without guessing. | |
| > - Include, at minimum: concrete interface contracts (Java signatures), config keys (names, defaults, semantics), metrics (names/tags), data model changes (nodes/tables, properties, indexes/constraints), sequence diagrams, JSON schemas, prompts/templates (if relevant), testing strategy, rollout and rollback plans. | |
| > - Favor actionable guidance over narrative. Specify file paths to change, classes to introduce/update, and example code signatures and Cypher/SQL where applicable. | |
| > - Explicitly address multi-tenancy, safety, auditability, performance, and cost. State how tenant isolation is enforced at the data layer (filters/partitions), how outputs are audited, and how latency/tokens are managed. | |
| > - Provide feature flags/toggles for risky features, with kill-switches and fallback paths, and document safe defaults. | |
| > - List risks with concrete mitigations and measurable success criteria and thresholds. | |
| ## Authority & Precedence (Required) | |
| - This plan is the single source of truth for its scope. It supersedes conflicting guidance in older milestone docs. When conflicts arise, follow this plan and update prior docs/tests (including scenarios/fixtures) to align. | |
| > Codex Execution Contract (Required) | |
| > - Maintain a live plan using the `update_plan` tool with each milestone as a step. Exactly one step is `in_progress` at a time. | |
| > - When completing a milestone, append a “Milestone Completion Report” (see template below) to this doc and mark the step `completed`. | |
| > - Always run Gradle tests using the global `gradle` CLI with progress streaming: `gradle test -i --console=plain` (allow up to 30 minutes). | |
| > - Prefer imports in code examples; avoid fully qualified class names in code snippets. | |
| > - Update docs (including ADRs) and YAML property descriptions when behavior or configuration changes. | |
| > - Follow AGENTS.md for coding standards, tenancy guardrails, debug/admin gating, lane order, and CI gates. Treat AGENTS.md as a contract. | |
| ### Execution Rules (Milestone-by-Milestone) | |
| - Before marking any milestone complete: | |
|   - Update this plan with a Milestone Completion Report that includes: summary; detailed code changes; explanation of how the milestone advances goals; and identified enhancements/refactoring opportunities to schedule into upcoming milestones. | |
|   - Run and fix the full test suite: first `gradle test -i --console=plain` (30‑min timeout allowance), then `gradle e2eTest -i --console=plain`. If Pepper is available, optionally run readiness/demo lanes per AGENTS.md. All failures must be triaged and resolved. | |
| - Always follow the Golden Path; avoid feature flags. If a toggle is strictly necessary for tests, confine it to test profiles, document its purpose and removal plan, and keep production defaults unchanged. | |
| - Identify and remove legacy/deprecated code as you go; do not leave duplicate or dead paths. | |
| ### Milestone Completion Procedure (Copy into Plan) | |
| - Append a “Milestone Completion Report” under the milestone with: | |
|   - Summary of changes delivered | |
|   - Detailed code/config/test changes (files/classes/keys) | |
|   - How the milestone advances the plan’s goals | |
|   - Enhancements/refactors discovered → add to upcoming milestones (owner/ETA) | |
| - Run required “Test Lanes (DoD)” for the milestone using global Gradle with progress streaming (`-i --console=plain`) and a 30‑minute budget. | |
| - Commit locally with detailed, scoped messages; do not push from this workflow. | |
| - Update older docs/tests that conflict with the plan. | |
| ### Repo & Git Process (Required) | |
| - Work on the `develop` branch (no new branch creation for this workflow). | |
| - Commit changes with a detailed changelog (scoped, descriptive commits). Do not push from this workflow. | |
| - Keep commits minimal and focused (one concern per commit). Update docs and YAML comments alongside code changes. | |
| ## Production Code Quality Guardrails | |
| - Do not add production workarounds to satisfy tests. Fix the tests (or provide test‑profile scaffolding) rather than introducing brittle heuristics into main code. | |
| - If a temporary workaround must be introduced due to a hard blocker, it must: | |
|   - Live under test profiles (src/test) or be guarded by a test‑only bean/config; and | |
|   - Include an explicit removal task in the next milestone with owner/date; and | |
|   - Be documented in the Milestone Completion Report as “temporary, test‑only.” | |
| - Any production heuristic added for tests must be removed in the next milestone and replaced with proper stubs/mocks or policy/test fixtures. | |
| ## Plan Execution State (Live Tracker) | |
| - Overall status: Draft | In‑Progress | Blocked | Completed. | |
| - Milestones | |
| - [ ] M1 — <title> (owner, ETA) | |
| - [ ] M2 — <title> (owner, ETA) | |
| - [ ] M3 — <title> (owner, ETA) | |
| - … | |
| - Links: Recent commits (hashes), test reports, dashboards. | |
| > Tip: for each milestone list the “Test Lanes Required (DoD)” to make acceptance unambiguous (see Test Plan). | |
| --- | |
| ## Summary | |
| One-paragraph summary of the problem and the proposed solution. | |
| Example: “Enable LLM-first composition with verifiers to increase creative capability while preserving tenant isolation and audit.” | |
| ## Background & Context | |
| - Current behavior/architecture (cite components/classes). | |
| - Pain points and gaps. | |
| - Related documents (ADRs, deep dives, runbooks). | |
|   - e.g., Architecture Overview, existing milestone docs, relevant ADRs. | |
| ### Why Now | |
| - Explain the motivation and timing: what triggered this work, what downstream efforts it enables, and the cost of deferring. | |
| ## Problem Statement & Non-Goals | |
| - Problem to solve, with examples and constraints. | |
| - Explicit non-goals to avoid scope creep. | |
| ### Assumptions & Preconditions | |
| - List assumptions about data, tenants, external systems, availability of APIs, or feature flags. | |
| - Preconditions for starting execution (e.g., sample data seeded, env vars set). | |
| ## Goals & Success Criteria | |
| - Functional goals (bullet list) with measurable targets. | |
| - Non-functional goals: performance/latency, tokens/cost, reliability, security, maintainability. | |
| - Success metrics and thresholds (e.g., +10% nDCG@10, −15% tokens/answer, p95 < 1.2× baseline). | |
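The example thresholds above can be turned into a mechanical check. The following is a minimal sketch, not part of the template: `EvalSnapshot` and `SuccessGate` are hypothetical names, and the three rules simply restate the example targets (+10% nDCG@10, −15% tokens/answer, p95 < 1.2× baseline).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical metrics snapshot; field names are illustrative only. */
record EvalSnapshot(double ndcgAt10, double tokensPerAnswer, double p95LatencyMs) {}

/** Checks a candidate run against a baseline using the example thresholds from the plan. */
final class SuccessGate {
    private SuccessGate() {}

    /** Returns one message per missed threshold; an empty list means all targets were met. */
    static List<String> violations(EvalSnapshot baseline, EvalSnapshot candidate) {
        List<String> out = new ArrayList<>();
        // Target: at least +10% nDCG@10 relative to baseline.
        if (candidate.ndcgAt10() < baseline.ndcgAt10() * 1.10) {
            out.add("nDCG@10 gain below +10%");
        }
        // Target: at least 15% fewer tokens per answer.
        if (candidate.tokensPerAnswer() > baseline.tokensPerAnswer() * 0.85) {
            out.add("tokens/answer reduction below 15%");
        }
        // Target: p95 latency strictly under 1.2x baseline.
        if (candidate.p95LatencyMs() >= baseline.p95LatencyMs() * 1.2) {
            out.add("p95 latency at or above 1.2x baseline");
        }
        return out;
    }
}
```

A check like this could back an evaluation lane, so that the plan's thresholds and the acceptance test cannot drift apart.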
| ## Requirements | |
| - Functional requirements (behaviors, edge cases). For each requirement, capture both a *system requirement* (technical behavior) and a *user requirement* (plain language description of the customer impact). | |
| - Non-functional requirements (SLOs, availability, data retention). | |
| - Constraints (tenancy, compliance, data locality, dependencies). | |
| ### Out of Scope (Explicit) | |
| - Call out adjacent ideas intentionally excluded to prevent duplication or hidden scope. | |
| ## Stakeholders & Tenancy | |
| - Stakeholders (teams, roles). | |
| - Tenancy boundaries and access controls; how requests are scoped and enforced at the data layer. | |
| ### Tenancy Checklist (Must Stay Green) | |
| - All queries are tenant-scoped (filters in vector/graph/SQL layers). | |
| - No free-form Cypher; only curated graph templates with tenant filters and LIMIT caps. | |
| - Attributes/labels resolved via `AttributesRegistry`; no hard-coded tenant fields/values. | |
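As an illustration of the checklist, a curated template can refuse to exist without a tenant filter and a capped limit. This is a sketch under assumptions: `GraphTemplate`, the `$tenantId`/`$limit` parameter names, and the sample Cypher are hypothetical, and a real template would resolve property names through `AttributesRegistry` rather than hard-coding them.

```java
import java.util.Map;

/** Hypothetical curated template: fixed Cypher text with tenant filter and LIMIT as parameters. */
record GraphTemplate(String name, String cypher, int maxLimit) {

    GraphTemplate {
        // Reject templates that do not reference the tenant and limit parameters.
        if (!cypher.contains("$tenantId") || !cypher.contains("LIMIT $limit")) {
            throw new IllegalArgumentException(
                    "Template must filter by $tenantId and end with LIMIT $limit: " + name);
        }
    }

    /** Builds query parameters; the requested limit is clamped to the range 1..maxLimit. */
    Map<String, Object> params(String tenantId, int requestedLimit) {
        if (tenantId == null || tenantId.isBlank()) {
            throw new IllegalArgumentException("tenantId is required");
        }
        int limit = Math.max(1, Math.min(requestedLimit, maxLimit));
        return Map.of("tenantId", tenantId, "limit", limit);
    }
}
```

Because callers only supply parameter values, never query text, the tenant filter cannot be bypassed by input, and oversized limits are silently capped.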
| ## Architecture Overview | |
| ### Current State (As-Is) | |
| - Components involved. | |
| - Key data flows (brief). | |
| - Diagrams (optional) — sequence or block. | |
| ### Proposed Design (To-Be) | |
| - Component changes: new services, updated classes, interfaces. | |
| - Data model changes (graph nodes/rels, SQL tables), with indexes/constraints. | |
| - Sequence diagrams for critical flows (mermaid). | |
| - Config/feature flags required. | |
| - Security, compliance, and privacy considerations (tenant isolation, PII/PHI handling). | |
| - Observability: logs, metrics, traces; dashboards to update or create. | |
| - Performance & scalability: concurrency, batching, backpressure, caching. | |
| ### Golden Path & Backwards‑Compat Policy | |
| - Golden path by default. Avoid feature flags; when unavoidable, document: key, default, scope, safe fallback, and removal date. | |
| - Backout plan and ownership must be stated. | |
| - Remove legacy/deprecated code paths proactively; do not leave parallel implementations. | |
| ### Parity & Compatibility | |
| - Parity Contract Matrix: explicitly compare the proposed path/mode to the golden path across: | |
|   - Sidecar semantics (fields, validation, Confidence Gate) | |
|   - Verifiers integration (Citation/Evidence/Policy) | |
|   - Audit behavior and storage | |
|   - Tenancy enforcement at all data layers | |
|   - DTO shapes at boundaries (API/SSE/GraphQL) | |
|   - Error envelopes and status codes | |
|   - Metrics and tracing spans | |
|   - Backpressure/timeout/cancellation behavior | |
| - For any intentional deviations, list the follow-up plan id, owner, and timeline; provide compatibility strategy (adapters, deprecations, removal criteria). | |
| ### Design Rationale (Why These Choices) | |
| - Summarize key decisions, alternatives considered, trade-offs, and why the chosen approach best meets goals. | |
| #### Example Sequence (replace with your flow) | |
| ```mermaid | |
| sequenceDiagram | |
| autonumber | |
| participant C as Client | |
| participant S as Service | |
| participant D as Data Store | |
| C->>S: Request (tenant, payload) | |
| S->>D: Query/Write (tenant‑scoped) | |
| D-->>S: Result | |
| S-->>C: Response (+audit id) | |
| ``` | |
| ## Detailed Design Elements | |
| ### Interfaces & DTOs (Java) | |
| Provide exact method signatures and minimal Javadoc. Example: | |
| ```java | |
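| import java.util.List; | |
| import org.springframework.ai.document.Document; | |
| /** Keyword search over a single tenant's documents. */ | |
| public interface KeywordRetriever { | |
| /** Returns at most {@code topK} documents matching {@code query}, scoped to {@code tenantId}. */ | |
| List<Document> search(String tenantId, String query, int topK); | |
| } | |
| /** Aggregation request; {@code QuerySpec} and {@code Sort} are assumed to be project types (imports omitted). */ | |
| public record AggregateSpec(String groupBy, List<QuerySpec.Filter> filters, Sort order, Integer limit) {} | |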
| ``` | |
| ### Configuration & Feature Flags | |
| List all keys, defaults, and behavior: | |
| - `app.feature.name.enabled=true` — enables X. | |
| - `app.feature.name.limit=200` — cap for Y. | |
| - `spring.ai.vector-store.neo4j.index-name` — index routing notes. | |
| For every new property added to Spring YAML, include a one-line description and valid values in this plan and inline comments in the YAML. | |
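One way to keep keys, defaults, and validation in a single place is a small typed view over the properties. The sketch below is dependency-free and uses the placeholder keys listed above; `FeatureNameConfig` is a hypothetical name and the `limit >= 1` rule is an assumption. In a Spring Boot application the same shape would more likely be bound with `@ConfigurationProperties`.

```java
import java.util.Properties;

/** Typed view of the placeholder keys; defaults mirror the values listed in the plan. */
record FeatureNameConfig(boolean enabled, int limit) {

    static final String ENABLED_KEY = "app.feature.name.enabled";
    static final String LIMIT_KEY = "app.feature.name.limit";

    /** Reads both keys, falling back to the documented defaults when a key is absent. */
    static FeatureNameConfig from(Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLED_KEY, "true"));
        int limit = Integer.parseInt(props.getProperty(LIMIT_KEY, "200"));
        // Assumed validation rule: a cap below 1 is treated as a configuration error.
        if (limit < 1) {
            throw new IllegalArgumentException(LIMIT_KEY + " must be >= 1, got " + limit);
        }
        return new FeatureNameConfig(enabled, limit);
    }
}
```

Failing fast on invalid values at startup keeps a bad override from surfacing later as a runtime error.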
| #### Feature Flags Assessment (Optional but Recommended) | |
| - Add a short table enumerating flags in scope (name, type: capability/kill‑switch/debug/budget/test, prod/non‑prod/CI defaults, rationale, recommendation). | |
| - Document removal/consolidation plans and update inline YAML comments. | |
| ### Data Model Updates | |
| - Graph: nodes, relationships, properties; constraints and indexes (Cypher). | |
| - SQL: tables, columns, indexes; migration steps. | |
| - Keys/joins between stores. | |
| ### Algorithms & Policies | |
| - Describe any selection/merge policy (e.g., MMR, RRF) with formulas, parameters, and early-stop conditions. | |
| - Provide pseudocode if non-trivial. | |
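For instance, Reciprocal Rank Fusion (RRF) scores each document as the sum over input rankings of 1 / (k + rank), with rank starting at 1 and k a smoothing constant (60 is a common choice). The sketch below shows the level of detail a plan could give for such a policy; `ReciprocalRankFusion` and its parameters are hypothetical names, and ties are broken by id only to keep the output deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReciprocalRankFusion {
    private ReciprocalRankFusion() {}

    /** Fuses ranked id lists with score(d) = sum of 1 / (k + rank) and returns the topK ids. */
    static List<String> fuse(List<List<String>> rankings, int k, int topK) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // i is zero-based, so the one-based rank is i + 1.
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

With a keyword ranking `[a, b, c]` and a vector ranking `[b, c, d]`, documents that appear in both lists (`b`, `c`) outrank those that appear in only one.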
| ### Safety & Compliance | |
| - Tenant enforcement (filters at the store, never only in prompts). | |
| - Policy profiles (disclaimers, banned terms, audience rules). | |
| - Redaction or minimization where needed. | |
| - Audit retention & access. | |
| #### Security & Tenancy Authority | |
| - Resolve tenant server‑side (principal/session) and validate against headers; reject mismatches. Apply the same rule to streaming (SSE) and cancellation endpoints. | |
| - Avoid logging PII/secrets; gate full prompt logging behind explicit non‑prod flags. | |
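The resolve-and-validate rule can be sketched as follows. This is an illustration under assumptions: `TenantResolver` and `TenantMismatchException` are hypothetical names, and the tenant header is treated as optional. The authenticated principal is the only source of truth, and the header value is never echoed into the error message.

```java
import java.util.Optional;

/** Thrown when the client-supplied tenant disagrees with the authenticated tenant. */
final class TenantMismatchException extends RuntimeException {
    TenantMismatchException(String message) {
        super(message);
    }
}

final class TenantResolver {
    private TenantResolver() {}

    /**
     * Returns the tenant taken from the authenticated principal. A header value,
     * when present, must match it exactly; otherwise the request is rejected.
     */
    static String resolve(String principalTenantId, Optional<String> headerTenantId) {
        if (principalTenantId == null || principalTenantId.isBlank()) {
            throw new IllegalStateException("No tenant on authenticated principal");
        }
        headerTenantId.ifPresent(header -> {
            if (!principalTenantId.equals(header)) {
                throw new TenantMismatchException("Tenant header does not match authenticated tenant");
            }
        });
        return principalTenantId;
    }
}
```

Calling the same resolver from the streaming and cancellation endpoints keeps the rule identical across transports.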
| ### Observability | |
| - Metrics: names and tags. | |
| - Tracing spans. | |
| - Log events. | |
| - Example dashboards and alerts. | |
| ### Boundary DTO Reuse Checklist (Must Address) | |
| - Reuse canonical DTOs for domain and sidecar where feasible. If a boundary needs a different wire shape (e.g., SSE event), provide a mapper at the edge and avoid ad‑hoc DTOs. | |
| - Document any temporary duplication and the removal plan (milestone id + owner). | |
| ### Cross‑Plan Dependency Grid | |
| - Upstream dependencies (plans/features this relies on) and how their invariants are maintained. | |
| - Downstream dependents likely affected. | |
| - For each dependency, state: invariant, impact, and mitigation if not in scope. | |
| ## Implementation Plan | |
| Break into phases/milestones with deliverables, owners, and acceptance criteria. Include toggles and rollbacks. | |
| ### Milestone X (N weeks): Title | |
| - Summary of changes and why they help us achieve our goal: … | |
| - Scope: … | |
| - Deliverables: … | |
| - Interfaces/Files: … | |
| - Acceptance Criteria: … | |
| - Flags: … | |
| - Risks/Mitigations: … | |
| Repeat per milestone as needed. | |
| ### Milestone Completion Report (Append After Each Milestone) | |
| Fill this block when the milestone is marked complete. Include: summary, detailed changes (files/classes/config/tests), goals alignment, and follow‑up items with owners/ETAs. | |
| ## Test Plan | |
| Define lanes to run per milestone (fill concrete tasks in the plan): | |
| - Unit + integration: `gradle test -i --console=plain` (30‑minute budget) | |
| - E2E (offline): `gradle e2eTest -i --console=plain` | |
| - Policy/schema validation (as applicable) | |
| - ArchUnit/structural rules (as applicable) | |
| - Perf advisory gate (as applicable) | |
| - Pepper online + demo readiness + demo script suites (as applicable) | |
| - Transport/contract tests (e.g., SSE or GraphQL contracts) | |
| For each milestone, add “Test Lanes Required (DoD)” so acceptance is explicit. | |
| ## Optional Audit Appendices (Use as Needed) | |
| - Policy Map & Gap vs Code — YAML rule → code sites → action → rationale. | |
| - Fallbacks Inventory & RCA — location, trigger, behavior, frequency, RCA, keep/remove/guard. | |
| - Legacy/Unused Code Audit — jdeps reachability, ArchUnit bans, coverage signals, candidate actions. | |
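| Milestone Completion Report fields (fill these in for the report described in the Implementation Plan): | |
| - Summary of Changes: one paragraph capturing what shipped and why it matters. | |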
| - Code Changes: key classes/files updated or added (paths). | |
| - Tests & Results: commands run and outcomes. Must run and pass `gradle test -i --console=plain` (30‑min allowance) and `gradle e2eTest -i --console=plain`; include any focused lanes (e.g., demoScript if Pepper is reachable). Document failures triaged and fixes applied. | |
| - Validation Evidence: screenshots/log snippets/metrics deltas (token/latency, KPIs). | |
| - Docs Updated: docs/ files, ADRs created/updated, YAML comments added. | |
| - Observations: notable findings, caveats, follow-up items uncovered. | |
| - Enhancements & Refactoring Opportunities Identified: items to fold into upcoming milestones with brief rationale. | |
| - Next Steps: options considered; recommended next step and why (tie to goals). | |
| - Backout Plan Used? yes/no (if used, describe). | |
| - Refactoring opportunities identified. | |
| - Owner/Date: who completed, when. | |
| Add the following if applicable: | |
| - Changelog (concise bullets of the most important deltas). | |
| - SSE/Contract Invariants Verified (met/not met). | |
| - Metrics Observed (e.g., `advisor.*` counters/timers; deltas vs baseline). | |
| ### CI Gates (Add To PR Checklist and Pipelines) | |
| - Contract tests covering acceptance criteria for new/changed interfaces (e.g., SSE event payloads). | |
| - Parity matrix items implemented or explicitly deferred with plan linkage. | |
| - Static rule to prevent boundary DTO drift (e.g., forbid deprecated streaming DTOs where `SidecarDto` is required). | |
| - Secret scan, dependency scan, and SBOM generation remain green. | |
| ## Testing Strategy | |
| - Unit: classes, edge cases, error handling. | |
| - Integration: Testcontainers for Neo4j/DB; WireMock for external APIs. | |
| - E2E: flows through ChatService/Planner/Tools if applicable. | |
| - Evaluations (LLM/Retrieval): datasets, metrics (P@K, nDCG, tokens, latency), reporting. | |
| - Performance/Soak: concurrency, throughput, memory. | |
| - Chaos: retry/backoff, DLQ, recovery. | |
| ### Contract Tests (Required for Interfaces) | |
| - For any public API or streaming interface, add a contract test that asserts event/response shape, required fields, and parity vs. the golden path (e.g., `complete.sidecar.citations` present under retrieval). | |
| - Ensure cancellation/timeout semantics are exercised if applicable. | |
| #### SSE/Streaming Invariants (If Applicable) | |
| - Event order: `start` → `meta` → `delta*` → `complete|error`. | |
| - `meta` includes `route`, `modelId`, `memoryUsed` (when available). | |
| - `complete.sidecar` present when sidecar exists; `complete.sidecar.meta.gate` on non‑PASS outcomes. | |
| - Backpressure: delta batching cadence (e.g., `app.chat.streaming.batch-ms`) observed within tolerance. | |
| - Cancellation: `/api/chat/stop?runId=…` emits `complete` with `finishReason=cancelled`. | |
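The event-order invariant lends itself to a small contract-test helper. The sketch below reads the order literally (`start`, `meta`, zero or more `delta`, then exactly one terminal `complete` or `error`); `SseOrderValidator` is a hypothetical name, and a stream that may emit `error` before `meta` would need a relaxed rule.

```java
import java.util.List;

final class SseOrderValidator {
    private SseOrderValidator() {}

    /** Returns true when event names follow start, meta, delta*, then one complete or error. */
    static boolean isValid(List<String> eventNames) {
        int n = eventNames.size();
        // The shortest valid stream is start, meta, and one terminal event.
        if (n < 3) {
            return false;
        }
        if (!eventNames.get(0).equals("start") || !eventNames.get(1).equals("meta")) {
            return false;
        }
        String last = eventNames.get(n - 1);
        if (!last.equals("complete") && !last.equals("error")) {
            return false;
        }
        // Everything between meta and the terminal event must be a delta.
        for (int i = 2; i < n - 1; i++) {
            if (!eventNames.get(i).equals("delta")) {
                return false;
            }
        }
        return true;
    }
}
```

A contract test can collect the event names from a recorded stream and assert `isValid` before checking payload fields.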
| ### Test Commands (Codex Must Use) | |
| - `gradle test -i --console=plain` — stream progress; allow up to 30 minutes. | |
| - Sequence per AGENTS.md: `test` → `e2eTest` → optional `llmTest` → `e2ePepperOnline` → `demoReadinessGate` → `demoScriptE2ENonVector` → optional `demoScriptE2EVector`. | |
| - Stop rules: stop if `test` fails; if `e2eTest` fails, fix wiring and skip online/demo lanes. | |
| #### Readiness Gates (When External Systems/Data Are Required) | |
| - Pepper readiness: `gradle e2ePepperProbe -i --console=plain` and `gradle demoReadinessGate -i --console=plain` prior to demo/e2e lanes. | |
| - Tenant data assumptions: list required docs/attributes; link to seeding source. | |
| - Env vars and local secrets: document keys and where they are read (.env, application-local-secrets.yaml). | |
| ### Prompt Testing (When Debugging Prompts) | |
| - Add the prompt to `src/test/resources/demo-script/scenarios.yaml` as per plan-demo-script guidelines. | |
| - Run `gradle demoScriptE2ENonVector -i --console=plain -Dtests.demo.scenarios=<id>` and ensure the test fails before fixes; then iterate until green. | |
| ## Rollout Plan | |
| - Dark launch flags and staging validation. | |
| - Shadow A/B (if applicable). | |
| - Canary tenants; expand cohort. | |
| - Kill switch and fallback plan. | |
| - Communication and documentation updates. | |
| ## Formatting & Compliance (PR Checklist) | |
| - Run `gradle spotlessApply` (or note repo-wide formatter constraints if present). | |
| - Prefer Java imports; avoid fully qualified class names in code. | |
| - Optionally run `gradle rewriteRun` with recipe `com.vodori.FqcnCleanup` to normalize imports. | |
| - Run `gradle agentsCompliance -i --console=plain` and address advisory notes. | |
| ## Risks & Mitigations | |
| - Enumerate top risks with concrete mitigations and detection signals. | |
| ## Documentation Updates | |
| - Files to update (Architecture Overview, Tools Catalog, API Reference, Runbooks). | |
| - New docs to add. | |
| ## Success Metrics & Review | |
| - Target metrics, measurement windows, and data sources. | |
| - Who reviews and when; exit criteria for the effort. | |
| ## Appendix | |
| ### JSON Schemas (if applicable) | |
| Provide draft schemas for structured outputs (e.g., sidecars), request/response payloads. | |
| ### Prompt Templates (if applicable) | |
| Guidance for dual‑channel outputs (human + JSON), variable placeholders, and compliance reminders. | |
| ### Example Config Blocks | |
| Show production‑ready toggles and safe defaults with environment overrides. | |
| --- | |
| > Authoring Checklist (Must Do) | |
| > - [ ] Background, problems, non-goals, goals | |
| > - [ ] Architecture (current vs proposed), diagrams | |
| > - [ ] Interfaces/DTOs with signatures | |
| > - [ ] Config keys, defaults, flags | |
| > - [ ] Data model changes, constraints, migrations | |
| > - [ ] Algorithms/policies with parameters | |
| > - [ ] Safety/compliance, multi-tenancy enforcement | |
| > - [ ] Observability metrics and traces | |
| > - [ ] Parity & Compatibility matrix completed | |
| > - [ ] Boundary DTO reuse checklist addressed | |
| > - [ ] Cross‑plan dependency grid | |
| > - [ ] Detailed milestones with acceptance criteria | |
| > - [ ] Milestone Completion Report for each completed milestone | |
| > - [ ] Testing plan (unit/integration/evals/perf/chaos) with Gradle commands | |
| > - [ ] Contract tests for public interfaces (API/SSE) | |
| > - [ ] CI gates (contract tests, parity checks, static DTO rules, SBOM, scans) | |
| > - [ ] Rollout and rollback | |
| > - [ ] Risks/mitigations | |
| > - [ ] Success metrics and review plan | |
| > - [ ] ADR needed? If yes, add under `docs/adr` with background, context, reasoning, goals, outcome, and implementation changes | |
| --- | |
| ## Decision Log (Keep Updated) | |
| Record key decisions, alternatives considered, and rationale. | |
| - <YYYY-MM-DD> — Decision: <what>. Alternatives: <A,B>. Why: <reasoning>. Impact: <perf/safety/UX>. Owner: <name>. | |
| ## Open Questions | |
| - List unanswered questions and the plan to resolve them (ownership, timeline). | |
| ## Performance Gates | |
| - Define perf budgets (e.g., “perf advisory gate”) and how to validate them (tests/CI metrics). |