Date: 2026-03-02
Participants: @vincentdesmet + Claude Opus 4.6 analysis
Baseline commit: fe8519f on main
Context: Extracted from #49 analysis — Tier 2 (LLM-required features)
Referenced spec: specledger/598-sdd-workflow-streamline/spec.md (4-layer CLI design)
Question: Which adoption strategy for the Tier 2 memory system best fits specledger's 4-layer architecture?
Options Presented:
| # | Strategy | Description |
|---|---|---|
| A | Strict Layer Separation | L1 stores/retrieves pre-computed data only. L2 AI commands do ALL LLM work. L0 hooks trigger capture. L3 skills inject knowledge context. L1 stays "no AI needed". |
| B | L1 Launcher to L2 | sl session summarize / sl memory synthesize are launcher commands (like sl revise) that invoke AI agent sessions. Data in L1, LLM in L2. |
| C | Edge Function Processing | L0 hook triggers capture → Supabase Edge Function does LLM processing server-side → L1 reads pre-computed results. |
| D | Hybrid (L2 explicit + L0 auto) | First run: L2 AI command bootstraps knowledge bank. Thereafter: L0 hook auto-updates incrementally. |
Decision: Option A — Strict Layer Separation
Reasoning: Specledger's hosting layer remains LLM-free and relies entirely on the customer's BYO LLM / agent shell. The memory-backed features are pure cloud storage for cross-team workflow improvements. This means:
- L1 (sl CLI) never calls an LLM — stays a pure Go binary
- L2 (AI commands invoked via agent shell) owns all LLM processing
- L3 (skills) passively inject pre-computed knowledge into agent context
- Customer brings their own LLM through their agent shell (Claude Code, etc.)
Layer Assignment:
| Feature | L0 Hook | L1 CLI | L2 AI Command | L3 Skill |
|---|---|---|---|---|
| Session summaries | — | sl session list displays | /specledger.memory (summarize mode) | — |
| Auto-tagging | — | sl session list --tag | /specledger.memory (tag mode) | — |
| Recurring issues | — | sl memory show | /specledger.memory (patterns mode) | sl-memory injects |
| Knowledge bank synthesis | — | sl memory show | /specledger.memory (synthesize mode) | sl-memory injects |
| Context injection | SessionStart hook | — | — | sl-memory auto-loads |
Question: What storage format for the memory/knowledge bank?
Options Presented:
| # | Format | Description |
|---|---|---|
| A | Markdown files | Memory bank as markdown files in .specledger/memory/cache/. Gitignored, sits next to existing constitution.md. L3 skills can inject directly. |
| B | Structured JSON | JSON documents in Supabase Storage. Machine-queryable, supports cross-project aggregation. |
| C | Both — Markdown local + JSON cloud | Markdown committed to repo for local injection, JSON mirror in Supabase for cross-team queries. |
Decision: Option A — Markdown files (as local rendering format)
Reasoning: Placed at .specledger/memory/cache/ — next to the existing .specledger/memory/constitution.md. The .specledger/ directory is the project config/tooling root; specledger/<spec>/ is reserved for feature-scoped design artifacts (spec.md, plan.md, tasks.md). A cloud-materialized cache is project-level tooling, not a feature artifact. See D-MEM-4 for the full picture — markdown files are a gitignored local cache, not the source of truth.
Question: Should /specledger.memory be a single command with subactions, or separate commands?
Options Presented:
| # | Shape | Description |
|---|---|---|
| A | Single command with modes | /specledger.memory handles summarize, tag, patterns, synthesize as workflow stages within one invocation. |
| B | Separate AI commands | /specledger.memory-summarize, /specledger.memory-patterns, /specledger.memory-synthesize as distinct commands. |
| C | Single + explicit subcommand arg | /specledger.memory summarize, /specledger.memory patterns etc. One skill file, explicit subaction argument. |
Decision: Option A — Single command with modes
Reasoning: One entry point for discovery. The command determines which mode(s) to run based on context (e.g., run summarize first if no summaries exist, then patterns, then synthesize). Follows the pattern of /specledger.implement which handles multiple stages internally.
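The staging logic described above can be made concrete with a small sketch. Note this is an illustration only: in the actual design the ordering lives in the AI command's prompt, not in Go, and selectModes is a hypothetical name.

```go
package main

import "fmt"

// selectModes sketches the staging order: summarize runs first only when
// no summaries exist yet, then patterns, then synthesize.
func selectModes(hasSummaries bool) []string {
	modes := []string{}
	if !hasSummaries {
		modes = append(modes, "summarize")
	}
	return append(modes, "patterns", "synthesize")
}

func main() {
	fmt.Println(selectModes(false)) // [summarize patterns synthesize]
	fmt.Println(selectModes(true))  // [patterns synthesize]
}
```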
Question: How should the memory bank stay consistent across branches?
Options Presented:
| # | Model | Description |
|---|---|---|
| A | Branch-scoped, merge-forward | Memory files live in git per feature branch. On merge to main, memory merges with code. Git-native. |
| B | Cloud-indexed, local-rendered | Supabase is source of truth. Local markdown files are a gitignored materialized cache rendered from cloud data. |
| C | Local-first, cloud-promoted | Feature memory local-only. On promotion, pushed to cloud. Two-tier model. |
Decision: Option B — Cloud-indexed, local-rendered
Reasoning: Memory cache lives at .specledger/memory/cache/ — next to the existing .specledger/memory/constitution.md, following the convention that .specledger/ is the project config/tooling root. This is a gitignored materialized cache — like node_modules/ from a registry. Cloud (Supabase memory_entries table) is the single source of truth. This enables:
- Cross-branch visibility (sl memory show --branch feat-x)
- Cross-team access (shared cloud store)
- Offline L3 injection from local cache
- No git merge conflicts on memory files
- Cache refreshed by L2 command or sl memory pull (L1)
Architecture:
```
.specledger/                      ← project config/tooling root
├── memory/
│   ├── constitution.md           ← existing, git-tracked
│   └── cache/                    ← NEW, .gitignore'd
│       ├── knowledge.md          ← rendered from cloud (promoted entries)
│       └── patterns.md           ← rendered from cloud (observed entries)
├── templates/                    ← existing
└── scripts/                      ← existing

Supabase (source of truth)
├── memory_entries table
│   ├── id, project_id, branch, scope (feature|project)
│   ├── content, score, status (observed|promoted)
│   ├── recurrence_count, impact_score, specificity_score
│   └── created_at, updated_at
```
Data flow:
```
L2 /specledger.memory  ──► writes to Supabase ──► renders .specledger/memory/cache/
L3 sl-memory skill     ──► reads .specledger/memory/cache/ ──► injects into agent context
L1 sl memory show      ──► queries Supabase directly (cross-branch capable)
L1 sl memory pull      ──► refreshes .specledger/memory/cache/ from cloud
```
Question: What scoring dimensions should the agent use to evaluate learning entries?
Options Presented:
| # | Model | Dimensions |
|---|---|---|
| A | 3-axis | Recurrence + Impact + Specificity |
| B | 4-axis + freshness | Recurrence + Impact + Specificity + time-weighted freshness decay |
| C | 4-axis + transferability | Recurrence + Impact + Specificity + cross-project transferability |
| D | Full 5-axis | All five dimensions |
Decision: Option A — 3-axis scoring (Recurrence + Impact + Specificity)
Scoring Rubric (each axis 1-10, composite = average):
Recurrence: How often does this pattern/learning appear across sessions?
| Score | Criteria |
|---|---|
| 1-3 | Appeared once, may be situational |
| 4-6 | Appeared in 2-3 sessions, emerging pattern |
| 7-10 | Appeared in 4+ sessions, confirmed pattern |
Impact: How much time, effort, or debugging does this learning save when applied?
| Score | Criteria |
|---|---|
| 1-3 | Minor convenience, saves seconds |
| 4-6 | Moderate — saves minutes, avoids a known pitfall |
| 7-10 | Critical — prevents hours of debugging, blocks progress without it |
Specificity: How actionable and concrete is the learning? (filters out vague platitudes)
| Score | Criteria |
|---|---|
| 1-3 | Vague ("be careful with async code") |
| 4-6 | Directional ("this API returns paginated results, check for next_page") |
| 7-10 | Precise and actionable ("set pool_mode=transaction in pgbouncer.ini when using prepared statements with Supabase") |
composite = (R + I + S) / 3
- Score >= 7.0 → auto-promoted to knowledge.md (project-level)
- Score 4.0-6.9 → stays in patterns.md (observed, candidate for promotion)
- Score < 4.0 → discarded (too vague, too situational, or low impact)
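The composite arithmetic and threshold routing above are simple enough to pin down in a few lines. This sketch assumes the three axis values are already agent-assigned (1-10 each); only the averaging and the cut-offs come from this record, and the Learning/route names are illustrative.

```go
package main

import "fmt"

// Learning holds the three agent-assigned axis scores (each 1-10).
type Learning struct {
	Recurrence, Impact, Specificity float64
}

// composite averages the three axes, per the rubric above.
func composite(l Learning) float64 {
	return (l.Recurrence + l.Impact + l.Specificity) / 3
}

// route applies the promotion thresholds to decide where an entry lands.
func route(l Learning) string {
	switch c := composite(l); {
	case c >= 7.0:
		return "knowledge.md" // auto-promoted, project-level
	case c >= 4.0:
		return "patterns.md" // observed, candidate for promotion
	default:
		return "discard" // too vague, situational, or low impact
	}
}

func main() {
	fmt.Println(route(Learning{Recurrence: 8, Impact: 9, Specificity: 7})) // knowledge.md
}
```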
Reasoning: Three axes are sufficient to filter signal from noise without over-engineering the prompt. Freshness and transferability can be added later if needed. The threshold-based auto-promote model keeps the workflow automated within L2 — no human approval gate.
Question: What promotion model for moving learnings from session-level to project knowledge bank?
Options Presented:
| # | Model | Description |
|---|---|---|
| A | Threshold-based auto-promote | Learning exceeds composite score >= 7.0 → auto-promoted to knowledge.md. Fully automated within L2. |
| B | N-strike promotion | Learning must appear in N separate sessions (e.g., 3) before promotion. Proves recurrence through repetition. |
| C | Tiered: observe → candidate → promoted | 3-tier lifecycle with explicit promotion and justification. Most auditable. |
| D | Human-in-the-loop | Agent proposes, human approves via sl memory promote --approve. Maximum control. |
Decision: Option A — Threshold-based auto-promote
Reasoning: Already present in the original #49 design. Keeps the workflow fully automated within the L2 AI command — no human bottleneck. The 3-axis scoring provides sufficient quality filtering. Entries below threshold are retained in patterns.md for future re-evaluation (score may increase as recurrence grows across sessions).
| ID | Decision | Choice |
|---|---|---|
| D-MEM-1 | Adoption strategy | Strict Layer Separation — L1 LLM-free, L2 owns all LLM processing |
| D-MEM-2 | Storage format | Markdown files (as local rendering format) |
| D-MEM-3 | Command shape | Single /specledger.memory command with internal modes |
| D-MEM-4 | Sync model | Cloud-indexed (Supabase), local-rendered (gitignored .md cache) |
| D-MEM-5 | Scoring criteria | 3-axis: Recurrence + Impact + Specificity (composite avg, threshold >= 7.0) |
| D-MEM-6 | Promotion model | Threshold-based auto-promote (score >= 7.0 → knowledge.md) |