Date: 2026-03-02
Participants: @vincentdesmet + Claude Opus 4.6 analysis
Baseline commit: fe8519f on main
Context: Extracted from #49 analysis — Tier 2 (LLM-required features)
Referenced spec: specledger/598-sdd-workflow-streamline/spec.md (4-layer CLI design)
Question: Which adoption strategy for the Tier 2 memory system best fits specledger's 4-layer architecture?
Options Presented:
| # | Strategy | Description |
|---|---|---|
| A | Strict Layer Separation | L1 stores/retrieves pre-computed data only. L2 AI commands do ALL LLM work. L0 hooks trigger capture. L3 skills inject knowledge context. L1 stays "no AI needed". |
| B | L1 Launcher to L2 | sl session summarize / sl memory synthesize are launcher commands (like sl revise) that invoke AI agent sessions. Data in L1, LLM in L2. |
| C | Edge Function Processing | L0 hook triggers capture → Supabase Edge Function does LLM processing server-side → L1 reads pre-computed results. |
| D | Hybrid (L2 explicit + L0 auto) | First run: L2 AI command bootstraps knowledge bank. Thereafter: L0 hook auto-updates incrementally. |
Decision: Option A — Strict Layer Separation
Reasoning: Specledger's hosting layer remains LLM-free and relies entirely on the customer's BYO LLM / agent shell. The memory-backed features are pure cloud storage for cross-team workflow improvements. This means:
- L1 (sl CLI) never calls an LLM — stays a pure Go binary
- L2 (AI commands invoked via agent shell) owns all LLM processing
- L3 (skills) passively inject pre-computed knowledge into agent context
- Customer brings their own LLM through their agent shell (Claude Code, etc.)
Layer Assignment:
| Feature | L0 Hook | L1 CLI | L2 AI Command | L3 Skill |
|---|---|---|---|---|
| Session summaries | — | sl session list displays | /specledger.memory (summarize mode) | — |
| Auto-tagging | — | sl session list --tag | /specledger.memory (tag mode) | — |
| Recurring issues | — | sl memory show | /specledger.memory (patterns mode) | sl-memory injects |
| Knowledge bank synthesis | — | sl memory show | /specledger.memory (synthesize mode) | sl-memory injects |
| Context injection | SessionStart hook | — | — | sl-memory auto-loads |
Question: What storage format for the memory/knowledge bank?
Options Presented:
| # | Format | Description |
|---|---|---|
| A | Markdown files | Memory bank as markdown files in .specledger/memory/cache/. Gitignored, sits next to existing constitution.md. L3 skills can inject directly. |
| B | Structured JSON | JSON documents in Supabase Storage. Machine-queryable, supports cross-project aggregation. |
| C | Both — Markdown local + JSON cloud | Markdown committed to repo for local injection, JSON mirror in Supabase for cross-team queries. |
Decision: Option A — Markdown files (as local rendering format)
Reasoning: Placed at .specledger/memory/cache/ — next to the existing .specledger/memory/constitution.md. The .specledger/ directory is the project config/tooling root; specledger/<spec>/ is reserved for feature-scoped design artifacts (spec.md, plan.md, tasks.md). A cloud-materialized cache is project-level tooling, not a feature artifact. See D-MEM-4 for the full picture — markdown files are a gitignored local cache, not the source of truth.
Question: Should /specledger.memory be a single command with subactions, or separate commands?
Options Presented:
| # | Shape | Description |
|---|---|---|
| A | Single command with modes | /specledger.memory handles summarize, tag, patterns, synthesize as workflow stages within one invocation. |
| B | Separate AI commands | /specledger.memory-summarize, /specledger.memory-patterns, /specledger.memory-synthesize as distinct commands. |
| C | Single + explicit subcommand arg | /specledger.memory summarize, /specledger.memory patterns etc. One skill file, explicit subaction argument. |
Decision: Option A — Single command with modes
Reasoning: One entry point for discovery. The command determines which mode(s) to run based on context (e.g., run summarize first if no summaries exist, then patterns, then synthesize). Follows the pattern of /specledger.implement which handles multiple stages internally.
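The staging logic described above can be made concrete with a small sketch. Note this is an illustration only: in the actual design the ordering lives in the AI command's prompt, not in Go, and selectModes is a hypothetical name.

```go
package main

import "fmt"

// selectModes sketches the staging order: summarize runs first only when
// no summaries exist yet, then patterns, then synthesize.
func selectModes(hasSummaries bool) []string {
	modes := []string{}
	if !hasSummaries {
		modes = append(modes, "summarize")
	}
	return append(modes, "patterns", "synthesize")
}

func main() {
	fmt.Println(selectModes(false)) // [summarize patterns synthesize]
	fmt.Println(selectModes(true))  // [patterns synthesize]
}
```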
Question: How should the memory bank stay consistent across branches?
Options Presented:
| # | Model | Description |
|---|---|---|
| A | Branch-scoped, merge-forward | Memory files live in git per feature branch. On merge to main, memory merges with code. Git-native. |
| B | Cloud-indexed, local-rendered | Supabase is source of truth. Local markdown files are a gitignored materialized cache rendered from cloud data. |
| C | Local-first, cloud-promoted | Feature memory local-only. On promotion, pushed to cloud. Two-tier model. |
Decision: Option B — Cloud-indexed, local-rendered
Reasoning: Memory cache lives at .specledger/memory/cache/ — next to the existing .specledger/memory/constitution.md, following the convention that .specledger/ is the project config/tooling root. This is a gitignored materialized cache — like node_modules/ from a registry. Cloud (Supabase memory_entries table) is the single source of truth. This enables:
- Cross-branch visibility (sl memory show --branch feat-x)
- Cross-team access (shared cloud store)
- Offline L3 injection from local cache
- No git merge conflicts on memory files
- Cache refreshed by L2 command or sl memory pull (L1)
Architecture:
```
.specledger/                      ← project config/tooling root
├── memory/
│   ├── constitution.md           ← existing, git-tracked
│   └── cache/                    ← NEW, .gitignore'd
│       ├── knowledge.md          ← rendered from cloud (promoted entries)
│       └── patterns.md           ← rendered from cloud (observed entries)
├── templates/                    ← existing
└── scripts/                      ← existing

Supabase (source of truth)
├── memory_entries table
│   ├── id, project_id, branch, scope (feature|project)
│   ├── content, score, status (observed|promoted)
│   ├── recurrence_count, impact_score, specificity_score
│   └── created_at, updated_at
```
Data flow:
```
L2 /specledger.memory  ──► writes to Supabase ──► renders .specledger/memory/cache/
L3 sl-memory skill     ──► reads .specledger/memory/cache/ ──► injects into agent context
L1 sl memory show      ──► queries Supabase directly (cross-branch capable)
L1 sl memory pull      ──► refreshes .specledger/memory/cache/ from cloud
```
Question: What scoring dimensions should the agent use to evaluate learning entries?
Options Presented:
| # | Model | Dimensions |
|---|---|---|
| A | 3-axis | Recurrence + Impact + Specificity |
| B | 4-axis + freshness | Recurrence + Impact + Specificity + time-weighted freshness decay |
| C | 4-axis + transferability | Recurrence + Impact + Specificity + cross-project transferability |
| D | Full 5-axis | All five dimensions |
Decision: Option A — 3-axis scoring (Recurrence + Impact + Specificity)
Scoring Rubric (each axis 1-10, composite = average):
Recurrence: How often does this pattern/learning appear across sessions?
| Score | Criteria |
|---|---|
| 1-3 | Appeared once, may be situational |
| 4-6 | Appeared in 2-3 sessions, emerging pattern |
| 7-10 | Appeared in 4+ sessions, confirmed pattern |
Impact: How much time, effort, or debugging does this learning save when applied?
| Score | Criteria |
|---|---|
| 1-3 | Minor convenience, saves seconds |
| 4-6 | Moderate — saves minutes, avoids a known pitfall |
| 7-10 | Critical — prevents hours of debugging, blocks progress without it |
Specificity: How actionable and concrete is the learning? (filters out vague platitudes)
| Score | Criteria |
|---|---|
| 1-3 | Vague ("be careful with async code") |
| 4-6 | Directional ("this API returns paginated results, check for next_page") |
| 7-10 | Precise and actionable ("set pool_mode=transaction in pgbouncer.ini when using prepared statements with Supabase") |
composite = (R + I + S) / 3
- Score >= 7.0 → auto-promoted to knowledge.md (project-level)
- Score 4.0-6.9 → stays in patterns.md (observed, candidate for promotion)
- Score < 4.0 → discarded (too vague, too situational, or low impact)
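The composite arithmetic and threshold routing above are simple enough to pin down in a few lines. This sketch assumes the three axis values are already agent-assigned (1-10 each); only the averaging and the cut-offs come from this record, and the Learning/route names are illustrative.

```go
package main

import "fmt"

// Learning holds the three agent-assigned axis scores (each 1-10).
type Learning struct {
	Recurrence, Impact, Specificity float64
}

// composite averages the three axes, per the rubric above.
func composite(l Learning) float64 {
	return (l.Recurrence + l.Impact + l.Specificity) / 3
}

// route applies the promotion thresholds to decide where an entry lands.
func route(l Learning) string {
	switch c := composite(l); {
	case c >= 7.0:
		return "knowledge.md" // auto-promoted, project-level
	case c >= 4.0:
		return "patterns.md" // observed, candidate for promotion
	default:
		return "discard" // too vague, situational, or low impact
	}
}

func main() {
	fmt.Println(route(Learning{Recurrence: 8, Impact: 9, Specificity: 7})) // knowledge.md
}
```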
Reasoning: Three axes are sufficient to filter signal from noise without over-engineering the prompt. Freshness and transferability can be added later if needed. The threshold-based auto-promote model keeps the workflow automated within L2 — no human approval gate.
Question: What promotion model for moving learnings from session-level to project knowledge bank?
Options Presented:
| # | Model | Description |
|---|---|---|
| A | Threshold-based auto-promote | Learning exceeds composite score >= 7.0 → auto-promoted to knowledge.md. Fully automated within L2. |
| B | N-strike promotion | Learning must appear in N separate sessions (e.g., 3) before promotion. Proves recurrence through repetition. |
| C | Tiered: observe → candidate → promoted | 3-tier lifecycle with explicit promotion and justification. Most auditable. |
| D | Human-in-the-loop | Agent proposes, human approves via sl memory promote --approve. Maximum control. |
Decision: Option A — Threshold-based auto-promote
Reasoning: Already present in the original #49 design. Keeps the workflow fully automated within the L2 AI command — no human bottleneck. The 3-axis scoring provides sufficient quality filtering. Entries below threshold are retained in patterns.md for future re-evaluation (score may increase as recurrence grows across sessions).
| ID | Decision | Choice |
|---|---|---|
| D-MEM-1 | Adoption strategy | Strict Layer Separation — L1 LLM-free, L2 owns all LLM processing |
| D-MEM-2 | Storage format | Markdown files (as local rendering format) |
| D-MEM-3 | Command shape | Single /specledger.memory command with internal modes |
| D-MEM-4 | Sync model | Cloud-indexed (Supabase), local-rendered (gitignored .md cache) |
| D-MEM-5 | Scoring criteria | 3-axis: Recurrence + Impact + Specificity (composite avg, threshold >= 7.0) |
| D-MEM-6 | Promotion model | Threshold-based auto-promote (score >= 7.0 → knowledge.md) |