Grant's Codex Tech Plan Template
# Technical Plan Template — Use For New Milestones/Initiatives
- Status: Draft | Proposed | Accepted | Superseded
- Doc Version: 0.2
- Owner: <name/team>
- Reviewers: <names>
- Date: <YYYY-MM-DD>
> LLM Authoring Instructions
> - Use this template as the canonical checklist. Match or exceed the depth defined here so an engineer can implement without guessing.
> - Include, at minimum: concrete interface contracts (Java signatures), config keys (names, defaults, semantics), metrics (names/tags), data model changes (nodes/tables, properties, indexes/constraints), sequence diagrams, JSON schemas, prompts/templates (if relevant), testing strategy, rollout and rollback plans.
> - Favor actionable guidance over narrative. Specify file paths to change, classes to introduce/update, and example code signatures and Cypher/SQL where applicable.
> - Explicitly address multi-tenancy, safety, auditability, performance, and cost. State how tenant isolation is enforced at the data layer (filters/partitions), how outputs are audited, and how latency/tokens are managed.
> - Provide feature flags/toggles for risky features, with kill-switches and fallback paths, and document safe defaults.
> - List risks with concrete mitigations and measurable success criteria and thresholds.
## Authority & Precedence (Required)
- This plan is the single source of truth for its scope. It supersedes conflicting guidance in older milestone docs. When conflicts arise, follow this plan and update prior docs/tests (including scenarios/fixtures) to align.
> Codex Execution Contract (Required)
> - Maintain a live plan using the `update_plan` tool with each milestone as a step. Exactly one step is `in_progress` at a time.
> - When completing a milestone, append a “Milestone Completion Report” (see template below) to this doc and mark the step `completed`.
> - Always run Gradle tests using the global `gradle` CLI with progress streaming: `gradle test -i --console=plain` (allow up to 30 minutes).
> - Prefer imports in code examples; avoid fully qualified class names in code snippets.
> - Update docs (including ADRs) and YAML property descriptions when behavior or configuration changes.
> - Follow AGENTS.md for coding standards, tenancy guardrails, debug/admin gating, lane order, and CI gates. Treat AGENTS.md as a contract.
### Execution Rules (Milestone-by-Milestone)
- Before marking any milestone complete:
  - Update this plan with a Milestone Completion Report that includes: summary; detailed code changes; explanation of how the milestone advances goals; and identified enhancements/refactoring opportunities to schedule into upcoming milestones.
  - Run and fix the full test suite: first `gradle test -i --console=plain` (30‑min timeout allowance), then `gradle e2eTest -i --console=plain`. If Pepper is available, optionally run readiness/demo lanes per AGENTS.md. All failures must be triaged and resolved.
- Always follow the Golden Path; avoid feature flags. If a toggle is strictly necessary for tests, confine it to test profiles, document its purpose and removal plan, and keep production defaults unchanged.
- Identify and remove legacy/deprecated code as you go; do not leave duplicate or dead paths.
### Milestone Completion Procedure (Copy into Plan)
- Append a “Milestone Completion Report” under the milestone with:
  - Summary of changes delivered
  - Detailed code/config/test changes (files/classes/keys)
  - How the milestone advances the plan’s goals
  - Enhancements/refactors discovered → add to upcoming milestones (owner/ETA)
- Run required “Test Lanes (DoD)” for the milestone using global Gradle with progress streaming (`-i --console=plain`) and a 30‑minute budget.
- Commit locally with detailed, scoped messages; do not push from this workflow.
- Update older docs/tests that conflict with the plan.
### Repo & Git Process (Required)
- Work on the `develop` branch (no new branch creation for this workflow).
- Commit changes with a detailed changelog (scoped, descriptive commits). Do not push from this workflow.
- Keep commits minimal and focused (one concern per commit). Update docs and YAML comments alongside code changes.
## Production Code Quality Guardrails
- Do not add production workarounds to satisfy tests. Fix the tests (or provide test‑profile scaffolding) rather than introducing brittle heuristics into main code.
- If a temporary workaround must be introduced due to a hard blocker, it must:
  - Live under test profiles (src/test) or be guarded by a test‑only bean/config (see the sketch below); and
  - Include an explicit removal task in the next milestone with owner/date; and
  - Be documented in the Milestone Completion Report as “temporary, test‑only.”
- Any production heuristic added for tests must be removed in the next milestone and replaced with proper stubs/mocks or policy/test fixtures.
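For illustration, a minimal sketch of such test‑only scaffolding, assuming Spring profiles; the class and bean are hypothetical:
```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Profile;

// Hypothetical example: a deterministic clock that exists only under the
// test profile, so production wiring stays untouched.
@TestConfiguration
@Profile("test")
public class TestOnlyStubsConfig {

    @Bean
    public Clock fixedClock() {
        // Fixed time keeps assertions stable; track removal in the next milestone.
        return Clock.fixed(Instant.EPOCH, ZoneOffset.UTC);
    }
}
```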
## Plan Execution State (Live Tracker)
- Overall status: Draft | In‑Progress | Blocked | Completed.
- Milestones:
  - [ ] M1 — <title> (owner, ETA)
  - [ ] M2 — <title> (owner, ETA)
  - [ ] M3 — <title> (owner, ETA)
  - …
- Links: Recent commits (hashes), test reports, dashboards.
> Tip: for each milestone list the “Test Lanes Required (DoD)” to make acceptance unambiguous (see Test Plan).
---
## Summary
One-paragraph summary of the problem and the proposed solution.
Example: “Enable LLM-first composition with verifiers to increase creative capability while preserving tenant isolation and audit.”
## Background & Context
- Current behavior/architecture (cite components/classes).
- Pain points and gaps.
- Related documents (ADRs, deep dives, runbooks).
- e.g., Architecture Overview, existing milestone docs, relevant ADRs.
### Why Now
- Explain the motivation and timing: what triggered this work, what downstream efforts it enables, and the cost of deferring.
## Problem Statement & Non-Goals
- Problem to solve, with examples and constraints.
- Explicit non-goals to avoid scope creep.
### Assumptions & Preconditions
- List assumptions about data, tenants, external systems, availability of APIs, or feature flags.
- Preconditions for starting execution (e.g., sample data seeded, env vars set).
## Goals & Success Criteria
- Functional goals (bullet list) with measurable targets.
- Non-functional goals: performance/latency, tokens/cost, reliability, security, maintainability.
- Success metrics and thresholds (e.g., +10% nDCG@10, −15% tokens/answer, p95 < 1.2× baseline).
## Requirements
- Functional requirements (behaviors, edge cases). For each requirement, capture both a *system requirement* (technical behavior) and a *user requirement* (plain language description of the customer impact).
- Non-functional requirements (SLOs, availability, data retention).
- Constraints (tenancy, compliance, data locality, dependencies).
### Out of Scope (Explicit)
- Call out adjacent ideas intentionally excluded to prevent duplication or hidden scope.
## Stakeholders & Tenancy
- Stakeholders (teams, roles).
- Tenancy boundaries and access controls; how requests are scoped and enforced at the data layer.
### Tenancy Checklist (Must Stay Green)
- All queries are tenant-scoped (filters in vector/graph/SQL layers); see the sketch below.
- No free-form Cypher; only curated graph templates with tenant filters and LIMIT caps.
- Attributes/labels resolved via `AttributesRegistry`; no hard-coded tenant fields/values.
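A minimal sketch of tenant scoping enforced at the data layer, assuming the Neo4j Java driver; the label, properties, and class name are placeholders:
```java
import java.util.List;
import java.util.Map;

import org.neo4j.driver.Session;

// Minimal sketch (names hypothetical): every read goes through a curated
// template that binds the tenant as a parameter and caps result size.
public class TenantScopedGraphReads {

    private static final String FIND_DOCS_TEMPLATE = """
            MATCH (d:Document {tenantId: $tenantId})
            WHERE d.status = $status
            RETURN d.id AS id, d.title AS title
            LIMIT $limit
            """;

    private final Session session;

    public TenantScopedGraphReads(Session session) {
        this.session = session;
    }

    public List<Map<String, Object>> findDocuments(String tenantId, String status, int limit) {
        // Tenant filter and LIMIT live in the template, never in caller-supplied Cypher.
        return session
                .run(FIND_DOCS_TEMPLATE, Map.of("tenantId", tenantId, "status", status, "limit", limit))
                .list(r -> r.asMap());
    }
}
```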
## Architecture Overview
### Current State (As-Is)
- Components involved.
- Key data flows (brief).
- Diagrams (optional) — sequence or block.
### Proposed Design (To-Be)
- Component changes: new services, updated classes, interfaces.
- Data model changes (graph nodes/rels, SQL tables), with indexes/constraints.
- Sequence diagrams for critical flows (mermaid).
- Config/feature flags required.
- Security, compliance, and privacy considerations (tenant isolation, PII/PHI handling).
- Observability: logs, metrics, traces; dashboards to update or create.
- Performance & scalability: concurrency, batching, backpressure, caching.
### Golden Path & Backwards‑Compat Policy
- Golden path by default. Avoid feature flags; when unavoidable, document: key, default, scope, safe fallback, and removal date.
- Backout plan and ownership must be stated.
- Remove legacy/deprecated code paths proactively; do not leave parallel implementations.
### Parity & Compatibility
- Parity Contract Matrix: explicitly compare the proposed path/mode to the golden path across:
  - Sidecar semantics (fields, validation, Confidence Gate)
  - Verifiers integration (Citation/Evidence/Policy)
  - Audit behavior and storage
  - Tenancy enforcement at all data layers
  - DTO shapes at boundaries (API/SSE/GraphQL)
  - Error envelopes and status codes
  - Metrics and tracing spans
  - Backpressure/timeout/cancellation behavior
- For any intentional deviations, list the follow-up plan id, owner, and timeline; provide compatibility strategy (adapters, deprecations, removal criteria).
### Design Rationale (Why These Choices)
- Summarize key decisions, alternatives considered, trade-offs, and why the chosen approach best meets goals.
#### Example Sequence (replace with your flow)
```mermaid
sequenceDiagram
autonumber
participant C as Client
participant S as Service
participant D as Data Store
C->>S: Request (tenant, payload)
S->>D: Query/Write (tenant‑scoped)
D-->>S: Result
S-->>C: Response (+audit id)
```
## Detailed Design Elements
### Interfaces & DTOs (Java)
Provide exact method signatures and minimal javadocs. Example:
```java
import java.util.List;

import org.springframework.ai.document.Document;

/** Retrieves documents by keyword match, always scoped to a tenant. */
public interface KeywordRetriever {

    /** Returns at most {@code topK} matches for {@code query} within {@code tenantId}. */
    List<Document> search(String tenantId, String query, int topK);
}

/** Aggregation request; {@code QuerySpec.Filter} and {@code Sort} are assumed domain types. */
public record AggregateSpec(String groupBy, List<QuerySpec.Filter> filters, Sort order, Integer limit) {}
```
### Configuration & Feature Flags
List all keys, defaults, and behavior:
- `app.feature.name.enabled=true` — enables X.
- `app.feature.name.limit=200` — cap for Y.
- `spring.ai.vector-store.neo4j.index-name` — index routing notes.
For every new property added to Spring YAML, include a one-line description and valid values in this plan and inline comments in the YAML.
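A minimal binding sketch, assuming Spring Boot constructor binding; the prefix and fields mirror the example keys above and are placeholders:
```java
import org.springframework.boot.context.properties.ConfigurationProperties;

// Hypothetical binding for the example keys above; the real prefix and
// fields come from the plan's actual feature.
@ConfigurationProperties(prefix = "app.feature.name")
public record FeatureNameProperties(
        boolean enabled,   // app.feature.name.enabled — enables X
        int limit          // app.feature.name.limit — cap for Y
) {
    public FeatureNameProperties {
        if (limit <= 0) {
            throw new IllegalArgumentException("app.feature.name.limit must be positive");
        }
    }
}
```
A typed binding like this keeps defaults and validation next to the key documentation.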
#### Feature Flags Assessment (Optional but Recommended)
- Add a short table enumerating flags in scope (name, type: capability/kill‑switch/debug/budget/test, prod/non‑prod/CI defaults, rationale, recommendation).
- Document removal/consolidation plans and update inline YAML comments.
### Data Model Updates
- Graph: nodes, relationships, properties; constraints and indexes (Cypher) — see the migration sketch below.
- SQL: tables, columns, indexes; migration steps.
- Keys/joins between stores.
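For illustration, a hypothetical migration sketch that applies constraints and indexes via the Neo4j Java driver; label, property, and constraint names are placeholders:
```java
import org.neo4j.driver.Session;

// Illustrative only: constraint/index DDL for a new node type, run once at
// migration time.
public class GraphMigration {

    public void apply(Session session) {
        // Uniqueness per (tenant, id) makes cross-tenant collisions impossible.
        session.run("""
                CREATE CONSTRAINT doc_tenant_id IF NOT EXISTS
                FOR (d:Document)
                REQUIRE (d.tenantId, d.id) IS UNIQUE
                """);
        // Index the common tenant-scoped lookup path.
        session.run("""
                CREATE INDEX doc_tenant_status IF NOT EXISTS
                FOR (d:Document)
                ON (d.tenantId, d.status)
                """);
    }
}
```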
### Algorithms & Policies
- Describe any selection/merge policy (e.g., MMR, RRF) with formulas, parameters, and early-stop conditions.
- Provide pseudocode if non-trivial; an RRF sketch follows.
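As one concrete example, a self-contained sketch of Reciprocal Rank Fusion (RRF), where a document's fused score is the sum of 1/(k + rank) across input rankings, with k ≈ 60 by convention:
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of RRF as one example merge policy:
// score(d) = sum over rankings of 1 / (k + rank(d)).
public final class ReciprocalRankFusion {

    public static List<String> fuse(List<List<String>> rankings, int k, int topK) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank is 0-based here, so add 1 to make it 1-based in the formula.
                scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused.subList(0, Math.min(topK, fused.size()));
    }
}
```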
### Safety & Compliance
- Tenant enforcement (filters at the store, never only in prompts).
- Policy profiles (disclaimers, banned terms, audience rules).
- Redaction or minimization where needed.
- Audit retention & access.
#### Security & Tenancy Authority
- Resolve tenant server‑side (principal/session) and validate against headers; reject mismatches (see the filter sketch below). Apply the same rule to streaming (SSE) and cancellation endpoints.
- Avoid logging PII/secrets; gate full prompt logging behind explicit non‑prod flags.
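A minimal sketch of that rule as a servlet filter, assuming Spring Boot 3 and jakarta servlet APIs; the request attribute and header names are placeholders:
```java
import java.io.IOException;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

import org.springframework.web.filter.OncePerRequestFilter;

// Illustrative filter (names hypothetical): the tenant of record comes from
// the authenticated principal; a client-supplied header may only confirm it.
public class TenantAuthorityFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String principalTenant = (String) request.getAttribute("resolvedTenantId"); // set upstream by auth
        String headerTenant = request.getHeader("X-Tenant-Id");
        if (principalTenant == null
                || (headerTenant != null && !principalTenant.equals(headerTenant))) {
            // Reject mismatches rather than trusting the header; same rule for SSE and cancel endpoints.
            response.sendError(HttpServletResponse.SC_FORBIDDEN, "tenant mismatch");
            return;
        }
        chain.doFilter(request, response);
    }
}
```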
### Observability
- Metrics: names and tags.
- Tracing spans.
- Log events.
- Example dashboards and alerts (a Micrometer-style sketch follows).
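A Micrometer-style sketch of the naming/tagging shape; the metric and tag names are placeholders for the ones this plan should pin down:
```java
import java.util.function.Supplier;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

// Sketch only: one timer per (tenant, route). Avoid high-cardinality tags
// such as raw queries or document ids.
public class RetrievalMetrics {

    private final MeterRegistry registry;

    public RetrievalMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    public <T> T timedSearch(String tenantId, String route, Supplier<T> search) {
        return Timer.builder("retrieval.search.latency")
                .tag("tenant", tenantId)
                .tag("route", route)
                .register(registry)
                .record(search);
    }
}
```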
### Boundary DTO Reuse Checklist (Must Address)
- Reuse canonical DTOs for domain and sidecar where feasible. If a boundary needs a different wire shape (e.g., SSE event), provide a mapper at the edge and avoid ad‑hoc DTOs (see the mapper sketch below).
- Document any temporary duplication and the removal plan (milestone id + owner).
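A minimal edge-mapper sketch; both record shapes are hypothetical stand-ins for the canonical sidecar DTO and the SSE wire event:
```java
import java.util.List;

// Hypothetical canonical DTO and wire event: conversion happens once, at
// the edge, instead of spawning parallel DTOs throughout the codebase.
public final class SidecarSseMapper {

    public record Sidecar(List<String> citations, String gate) {}           // canonical (placeholder)
    public record CompleteEvent(List<String> citations, String gateMeta) {} // SSE wire shape (placeholder)

    public static CompleteEvent toCompleteEvent(Sidecar sidecar) {
        // Field-by-field mapping keeps the wire contract explicit and testable.
        return new CompleteEvent(sidecar.citations(), sidecar.gate());
    }
}
```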
### Cross‑Plan Dependency Grid
- Upstream dependencies (plans/features this relies on) and how their invariants are maintained.
- Downstream dependents likely affected.
- For each dependency, state: invariant, impact, and mitigation if not in scope.
## Implementation Plan
Break into phases/milestones with deliverables, owners, and acceptance criteria. Include toggles and rollbacks.
### Milestone X (N weeks): Title
- Summary of changes and why they help us achieve our goal: …
- Scope: …
- Deliverables: …
- Interfaces/Files: …
- Acceptance Criteria: …
- Flags: …
- Risks/Mitigations: …
Repeat per milestone as needed.
### Milestone Completion Report (Append After Each Milestone)
Fill this block when the milestone is marked complete. Include: summary, detailed changes (files/classes/config/tests), goals alignment, and follow‑up items with owners/ETAs.
## Test Plan
Define lanes to run per milestone (fill concrete tasks in the plan):
- Unit + integration: `gradle test -i --console=plain` (30‑minute budget)
- E2E (offline): `gradle e2eTest -i --console=plain`
- Policy/schema validation (as applicable)
- ArchUnit/structural rules (as applicable)
- Perf advisory gate (as applicable)
- Pepper online + demo readiness + demo script suites (as applicable)
- Transport/contract tests (e.g., SSE or GraphQL contracts)
For each milestone, add “Test Lanes Required (DoD)” so acceptance is explicit.
## Optional Audit Appendices (Use as Needed)
- Policy Map & Gap vs Code — YAML rule → code sites → action → rationale.
- Fallbacks Inventory & RCA — location, trigger, behavior, frequency, RCA, keep/remove/guard.
- Legacy/Unused Code Audit — jdeps reachability, ArchUnit bans, coverage signals, candidate actions.
### Milestone Completion Report Template
- Summary of Changes: one paragraph capturing what shipped and why it matters.
- Code Changes: key classes/files updated or added (paths).
- Tests & Results: commands run and outcomes. Must run and pass `gradle test -i --console=plain` (30‑min allowance) and `gradle e2eTest -i --console=plain`; include any focused lanes (e.g., demoScript if Pepper is reachable). Document failures triaged and fixes applied.
- Validation Evidence: screenshots/log snippets/metrics deltas (token/latency, KPIs).
- Docs Updated: docs/ files, ADRs created/updated, YAML comments added.
- Observations: notable findings, caveats, follow-up items uncovered.
- Enhancements & Refactoring Opportunities Identified: items to fold into upcoming milestones with brief rationale.
- Next Steps: options considered; recommended next step and why (tie to goals).
- Backout Plan Used? yes/no (if used, describe).
- Owner/Date: who completed, when.
Add the following if applicable:
- Changelog (concise bullets of the most important deltas).
- SSE/Contract Invariants Verified (met/not met).
- Metrics Observed (e.g., `advisor.*` counters/timers; deltas vs baseline).
### CI Gates (Add To PR Checklist and Pipelines)
- Contract tests covering acceptance criteria for new/changed interfaces (e.g., SSE event payloads).
- Parity matrix items implemented or explicitly deferred with plan linkage.
- Static rule to prevent boundary DTO drift (e.g., forbid deprecated streaming DTOs where `SidecarDto` is required).
- Secret scan, dependency scan, and SBOM generation remain green.
## Testing Strategy
- Unit: classes, edge cases, error handling.
- Integration: Testcontainers for Neo4j/DB; WireMock for external APIs (sketch below).
- E2E: flows through ChatService/Planner/Tools if applicable.
- Evaluations (LLM/Retrieval): datasets, metrics (P@K, nDCG, tokens, latency), reporting.
- Performance/Soak: concurrency, throughput, memory.
- Chaos: retry/backoff, DLQ, recovery.
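For the integration lane, a minimal Testcontainers sketch; the image tag, data, and assertion are placeholders:
```java
import org.junit.jupiter.api.Test;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.testcontainers.containers.Neo4jContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

import static org.junit.jupiter.api.Assertions.assertEquals;

// Minimal Testcontainers sketch for a Neo4j-backed integration test.
@Testcontainers
class Neo4jIntegrationTest {

    @Container
    static final Neo4jContainer<?> neo4j = new Neo4jContainer<>(DockerImageName.parse("neo4j:5"));

    @Test
    void roundTripsATenantScopedNode() {
        try (Driver driver = GraphDatabase.driver(
                neo4j.getBoltUrl(), AuthTokens.basic("neo4j", neo4j.getAdminPassword()));
             var session = driver.session()) {
            session.run("CREATE (:Document {tenantId: 't1', id: 'd1'})");
            long count = session
                    .run("MATCH (d:Document {tenantId: 't1'}) RETURN count(d) AS c")
                    .single().get("c").asLong();
            assertEquals(1, count);
        }
    }
}
```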
### Contract Tests (Required for Interfaces)
- For any public API or streaming interface, add a contract test that asserts event/response shape, required fields, and parity vs. the golden path (e.g., `complete.sidecar.citations` present under retrieval).
- Ensure cancellation/timeout semantics are exercised if applicable.
#### SSE/Streaming Invariants (If Applicable)
- Event order: `start` → `meta` → `delta*` → `complete|error` (see the order check below).
- `meta` includes `route`, `modelId`, `memoryUsed` (when available).
- `complete.sidecar` present when sidecar exists; `complete.sidecar.meta.gate` on non‑PASS outcomes.
- Backpressure: delta batching cadence (e.g., `app.chat.streaming.batch-ms`) observed within tolerance.
- Cancellation: `/api/chat/stop?runId=…` emits `complete` with `finishReason=cancelled`.
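One way to assert the order invariant over captured event names; a sketch only, with event capture left to the transport/contract tests:
```java
import java.util.List;
import java.util.regex.Pattern;

import static org.junit.jupiter.api.Assertions.assertTrue;

// Sketch: validates the start → meta → delta* → complete|error ordering
// over a list of collected SSE event names.
final class SseOrderInvariant {

    private static final Pattern ORDER = Pattern.compile("start meta( delta)* (complete|error)");

    static void assertEventOrder(List<String> eventNames) {
        String sequence = String.join(" ", eventNames);
        assertTrue(ORDER.matcher(sequence).matches(),
                "unexpected SSE sequence: " + sequence);
    }
}
```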
### Test Commands (Codex Must Use)
- `gradle test -i --console=plain` — stream progress; allow up to 30 minutes.
- Sequence per AGENTS.md: `test` → `e2eTest` → optional `llmTest` → `e2ePepperOnline` → `demoReadinessGate` → `demoScriptE2ENonVector` → optional `demoScriptE2EVector`.
- Stop rules: stop on `test` red; if `e2eTest` fails, fix wiring and skip online/demo lanes.
#### Readiness Gates (When External Systems/Data Are Required)
- Pepper readiness: `gradle e2ePepperProbe -i --console=plain` and `gradle demoReadinessGate -i --console=plain` prior to demo/e2e lanes.
- Tenant data assumptions: list required docs/attributes; link to seeding source.
- Env vars and local secrets: document keys and where they are read (.env, application-local-secrets.yaml).
### Prompt Testing (When Debugging Prompts)
- Add the prompt to `src/test/resources/demo-script/scenarios.yaml` as per plan-demo-script guidelines.
- Run `gradle demoScriptE2ENonVector -i --console=plain -Dtests.demo.scenarios=<id>` and ensure the test fails before fixes; then iterate until green.
## Rollout Plan
- Dark launch flags and staging validation.
- Shadow A/B (if applicable).
- Canary tenants; expand cohort.
- Kill switch and fallback plan.
- Communication and documentation updates.
## Formatting & Compliance (PR Checklist)
- Run `gradle spotlessApply` (or note repo-wide formatter constraints if present).
- Prefer Java imports; avoid fully qualified class names in code.
- Optionally run `gradle rewriteRun` with recipe `com.vodori.FqcnCleanup` to normalize imports.
- Run `gradle agentsCompliance -i --console=plain` and address advisory notes.
## Risks & Mitigations
- Enumerate top risks with concrete mitigations and detection signals.
## Documentation Updates
- Files to update (Architecture Overview, Tools Catalog, API Reference, Runbooks).
- New docs to add.
## Success Metrics & Review
- Target metrics, measurement windows, and data sources.
- Who reviews and when; exit criteria for the effort.
## Appendix
### JSON Schemas (if applicable)
Provide draft schemas for structured outputs (e.g., sidecars), request/response payloads.
### Prompt Templates (if applicable)
Guidance for dual‑channel outputs (human + JSON), variable placeholders, and compliance reminders.
### Example Config Blocks
Show production‑ready toggles and safe defaults with environment overrides.
---
> Authoring Checklist (Must Do)
> - [ ] Background, problems, non-goals, goals
> - [ ] Architecture (current vs proposed), diagrams
> - [ ] Interfaces/DTOs with signatures
> - [ ] Config keys, defaults, flags
> - [ ] Data model changes, constraints, migrations
> - [ ] Algorithms/policies with parameters
> - [ ] Safety/compliance, multi-tenancy enforcement
> - [ ] Observability metrics and traces
> - [ ] Parity & Compatibility matrix completed
> - [ ] Boundary DTO reuse checklist addressed
> - [ ] Cross‑plan dependency grid
> - [ ] Detailed milestones with acceptance criteria
> - [ ] Milestone Completion Report for each completed milestone
> - [ ] Testing plan (unit/integration/evals/perf/chaos) with Gradle commands
> - [ ] Contract tests for public interfaces (API/SSE)
> - [ ] CI gates (contract tests, parity checks, static DTO rules, SBOM, scans)
> - [ ] Rollout and rollback
> - [ ] Risks/mitigations
> - [ ] Success metrics and review plan
> - [ ] ADR needed? If yes, add under `docs/adr` with background, context, reasoning, goals, outcome, and implementation changes
---
## Decision Log (Keep Updated)
Record key decisions, alternatives considered, and rationale.
- <YYYY-MM-DD> — Decision: <what>. Alternatives: <A,B>. Why: <reasoning>. Impact: <perf/safety/UX>. Owner: <name>.
## Open Questions
- List unanswered questions and the plan to resolve them (ownership, timeline).
## Performance Gates
- Define perf budgets (e.g., “perf advisory gate”) and how to validate them (tests/CI metrics).