
@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file; it is designed to be copy-pasted to your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea, but your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
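To make this concrete, here is a minimal sketch of what such a schema file might contain. Every path, filename, and convention below is a hypothetical starting point, not a prescribed layout — you and your agent will evolve your own:

```markdown
# Wiki schema (a hypothetical CLAUDE.md)

## Layout
- raw/           immutable sources; read, never modify
- wiki/          LLM-maintained markdown pages
- wiki/index.md  catalog of every page; update on each ingest
- wiki/log.md    append-only activity log

## Ingest workflow
1. Read the new source in raw/ and discuss key takeaways with the user.
2. Write a summary page; update index.md and any affected entity/concept pages.
3. Flag contradictions with existing claims instead of silently overwriting.
4. Append a log entry: `## [YYYY-MM-DD] ingest | Title`.
```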

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
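Parts of a lint pass are easy to script rather than leaving entirely to the LLM. A minimal sketch of orphan detection, assuming Obsidian-style `[[wikilinks]]` and a directory of markdown pages (all names here are hypothetical):

```python
import re
from pathlib import Path

def wikilinks(text: str) -> set[str]:
    """Extract targets of [[Page]] and [[Page|alias]] style links."""
    return {m.group(1).strip() for m in re.finditer(r"\[\[([^\]|#]+)", text)}

def find_orphans(wiki: Path) -> list[str]:
    """Pages that no other page links to (index.md exempt)."""
    pages = {p.stem: p for p in wiki.rglob("*.md")}
    linked: set[str] = set()
    for page in pages.values():
        linked |= wikilinks(page.read_text(encoding="utf-8"))
    return sorted(name for name in pages if name not in linked and name != "index")
```

The LLM can then be pointed at the output and asked whether each orphan should be linked, merged, or deleted.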

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, a few hundred pages) and avoids the need for embedding-based RAG infrastructure.
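One simple consistency check that keeps the index honest: every page on disk should be mentioned in index.md. A minimal sketch, assuming the wiki is a flat-ish directory of markdown files (a stem-substring match is crude but serviceable at this scale):

```python
from pathlib import Path

def missing_from_index(wiki: Path) -> list[str]:
    """Pages that exist on disk but are never mentioned in index.md."""
    index_text = (wiki / "index.md").read_text(encoding="utf-8")
    pages = [p for p in wiki.rglob("*.md") if p.name != "index.md"]
    # A page counts as indexed if its filename stem appears anywhere in the index.
    return sorted(p.stem for p in pages if p.stem not in index_text)
```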

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
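With that prefix convention the log is also trivially parseable from Python, not just grep. A small sketch of pulling the last few entries (entry format as described above):

```python
import re

# Matches entries like: ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")

def last_entries(log_text: str, n: int = 5) -> list[tuple[str, str, str]]:
    """Return the last n (date, kind, title) entries from log.md."""
    matches = (ENTRY.match(line) for line in log_text.splitlines())
    return [m.groups() for m in matches if m][-n:]
```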

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.
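The Dataview tip above hinges on the LLM writing consistent frontmatter. Even without the plugin, a few lines of Python can run the same kind of query; this sketch assumes flat `key: value` frontmatter, and the field names are illustrative:

```python
from pathlib import Path

def frontmatter(path: Path) -> dict[str, str]:
    """Parse simple `key: value` YAML frontmatter (no nesting) from a page."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def pages_tagged(wiki: Path, tag: str) -> list[str]:
    """All pages whose frontmatter `tags` field mentions the tag."""
    return sorted(p.stem for p in wiki.rglob("*.md")
                  if tag in frontmatter(p).get("tags", ""))
```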

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@adagoral

adagoral commented Apr 4, 2026

I have complex PDFs (tables, images, columns), 100-300 technical manuals × 12. Is this idea still feasible for enterprise data?

@freddavis00001-tech

this is amazing! gotta build it. Thanks Andrej

@Equanox

Equanox commented Apr 4, 2026

Let's see if this is the final piece for me to get rid of paper and pen.

@ediestel

ediestel commented Apr 4, 2026

Detected a real bug in this:

Distinction:

“Human” → denotes biological classification (species: Homo sapiens), used in scientific, medical, or taxonomic contexts.
“Person / People” → denotes social, legal, or philosophical entities (agency, rights, identity).

Issue:
Using “human” in non-biological contexts (e.g., ethics, law, UX, sociology) can be imprecise because it reduces the subject to species membership rather than personhood.

Correction guideline:

Use “person / people” when referring to:

  • users, individuals, citizens, patients, actors
  • rights, responsibility, experience, behavior

Use “human” only when referring to:

  • biology, evolution, anatomy, physiology

If you think that this is not important, please take a break for a moment and think about it: it is important, very important.

@laphilosophia

I think the core idea is strong. For personal research, long-running reading projects, due diligence, competitive analysis, or any domain where knowledge accumulates over time, a persistent wiki seems more useful than re-deriving synthesis from raw documents on every query. The index.md / log.md pattern is also a good instinct because it keeps the system simple and inspectable.

That said, I think the hardest part is understated a bit: truth maintenance. The appealing part of the workflow is that the LLM updates summaries, cross-links pages, integrates new sources, and flags contradictions. But that is also exactly where models tend to fail quietly. Bad synthesis, weak generalization, stale claims surviving new evidence, page sprawl, and false consistency can accumulate without being obvious. So for me the risky sentence is effectively “the LLM owns this layer entirely.” That is fine for low-stakes personal use, but it feels too aggressive for team or high-accuracy contexts.

My view is that the robust version of this pattern is not “autonomous wiki,” but “source-grounded, citation-first, review-gated wiki.” The LLM should act more like an editor that proposes patches, summaries, links, and synthesis, not like the final authority on what the wiki believes. If important claims are not tied to sources, uncertainty levels, contradiction states, and recency semantics, the system can drift into a very convincing but low-integrity knowledge base.

If I were implementing this, I would probably enforce a few constraints:

  • Separate facts, inferences, and open questions explicitly.
  • Require source links for important claims, ideally passage-level where possible.
  • Make ingest idempotent so the same source does not slowly distort the wiki.
  • Have the LLM propose diffs instead of silently overwriting pages.
  • Run lint passes for stale claims, unsupported claims, contradiction tracking, and source loss, not just orphan links and missing pages.

So overall: I think the pattern is genuinely useful, but the real product problem is not organization, it is epistemic integrity. If that layer is solved well, this becomes much more than “better RAG.”

@tomjwxf

tomjwxf commented Apr 4, 2026

Hey @karpathy I've built something similar with multi-model verification, signed receipts and zero trust verification on an open-source project called Veritas Acta ("truth record" in Latin).

Instead of one LLM compiling the wiki, I route canonical questions from the wiki to the 4 frontier models leading (in reasoning) at a given point in time (they can then self-reflect, run a council of experts, cross-critique with adversarial roles, etc.), and synthesize their responses into a structured, standardized Knowledge Unit: a wiki where each entry is a living record of frontier knowledge at a proven point in time/context (e.g. model X, with human and/or agent Y and Z input/process), in a cryptographic receipt chain anyone can verify offline.

Example (from yesterday): "Are LLMs approaching a capability plateau?": https://acta.today/s/ku-z36vuoreb2k3
(4 agreed points, 2 disputed - including whether emergent capabilities are real evidence for continued breakthroughs)

Verify the receipt chain: https://acta.today/v/ku-z36vuoreb2k3 (Fully offline, no server contact, no account. Anyone can check the math.)

The "linting" step happens automatically: model disagreements surface inconsistencies. Each Knowledge Unit auto-generates follow-up questions that queue for future deliberation. The corpus compounds without human curation.

Live wiki: https://acta.today/wiki (building out the KU corpus, going to let people develop their own too)
Search API: https://acta-api.tomjwxf.workers.dev/api/ku/search?q=quantum+computing
Receipt format: IETF Internet-Draft (draft-farley-acta-signed-receipts)
Source: https://github.com/scopeblind/scopeblind-gateway (MIT)
Open Protocol: https://veritasacta.com (designed so that no one can rewrite history)

Would love to know what you think!

Best,
Tom

@fakechris

Amazing! Vibe-coded an automated maintenance system from this wiki; check https://github.com/fakechris/obsidian_vault_pipeline/blob/main/README_EN.md. It also has an AutoPilot mode, the fully automated form of the pipeline: generate interpretation → LLM quality scoring → extract Evergreen → update MOC.

@dkushnikov

dkushnikov commented Apr 4, 2026

Arrived at the same pattern independently — and seeing it described so cleanly is a convergent validation that the architecture is fundamentally right. Humans abandon wikis because the maintenance burden grows faster than the value; LLMs remove that bottleneck entirely.

Two open-source tools that together implement this, built around Obsidian and Claude Code:

Obsidian Seed — a discovery-driven wizard that builds a personalized Obsidian vault through conversation. Instead of a template, it asks who you are, what matters to you, and generates your vault structure, conventions, and a reader-context.md — a profile that captures your role, domains, goals, and thinking framework. This is effectively the schema layer you describe: the configuration that makes the LLM a disciplined knowledge maintainer rather than a generic chatbot.

Mnemon — the knowledge extraction pipeline. Implements Raw → Wiki → Frontend with immutable source.md + LLM-generated extract.md. Seven source-type-specific templates (article, video, podcast, book, paper, idea, conversation) — because a paper needs methodology rigor checks while a podcast needs speaker attribution and signal-to-noise analysis. Uses qmd for hybrid BM25/vector search, which you mention — works great.

The key addition: personalization as a first-class layer. Every extract is framed through the reader-context that Seed generates. Same article, different reader → different Executive Summary, different Key Ideas, different domain tags. The "seed" isn't just the source — it's the combination of source + reader-context + template.

We also have a Synthesis/ folder for filing back queries — your point about explorations compounding in the knowledge base, not disappearing into chat history. And an Obsidian-native frontend where the LLM writes and you browse in real time, exactly as you describe.

What we don't have yet: lint (contradiction detection, stale claims, orphan pages). That's next on the roadmap.

@longsco

longsco commented Apr 4, 2026

Thanks for sharing Andrej!

@rajuptvs

rajuptvs commented Apr 4, 2026

I have been thinking along the same lines about having a personal knowledge base, and recently documented it.
Please feel free to share feedback, suggestions, or potential interest in using it.
This is the X post:
https://x.com/i/status/2040472969278042369

And direct blog post:
https://blog.rajuptvs.com/posts/i-keep-learning-things-and-forgetting-all-of-it-so-i-am-building-a-system/

@Datagniel

Claude already wove your idea into our workflow and named it the "Karpathy-Index". I'm loving it. <3

@umbex

umbex commented Apr 5, 2026

I'm testing something similar, with a structured file system and a cron heartbeat able to monitor inbox folders, move stuff into the appropriate section (domain), update foundations with facts that last forever or current data with temporary information, then update the state.md memory in each domain. A final process collects all state.md files and creates a brief.md every morning, then builds a dashboard out of that.
It separates intake, routing, consolidation, and summarization.
So:
inbox/ is the intake layer for unprocessed material.
foundations/ holds stable source-of-truth knowledge.
data/current/ holds active temporal inputs and datasets.
data/archive/ holds superseded datasets.
state.md is the current operational synthesis for a domain.

Typical domain with subdomains:

operating-system/
  <domain>/
    state.md
    foundations/
    data/
      current/
      archive/
    inbox/
    archive/
    <subdomain-a>/
    <subdomain-b>/

@jyothivenkat-hub

Thanks @karpathy, super useful!

@kfchou

kfchou commented Apr 5, 2026

These ideas could be implemented via a set of skill files. Check out wiki-skills!

@peas

peas commented Apr 5, 2026

map of chapter 5 of The Brothers Karamazov

@karpathy It's great to see you as a piece of the current Zeitgeist of how AI is actually being applied. You've been synthesizing a lot of scattered thinking and currents into clear patterns, bringing signal out of the noise of a thousand simultaneous mini-projects. This gist is another example — the pattern needed a name and a shape, and you gave it one.

I've been building a voice-first version of this since February — same core architecture (raw → wiki → schema), with some extensions that might be interesting.

Voice-first capture. Most knowledge systems fail at capture, not synthesis. I record voice memos into Telegram while walking. Whisper transcribes, an LLM classifier tags and routes, a synthesizer updates interlinked KB nodes. No laptop needed. 70+ voice memos have compiled into 100 KB nodes and several published blog posts.

Two wiki layers. I split the wiki into KB (machine-managed reference: concepts, people, projects) and Drafts (a writing workspace). An intent classifier detects when I'm developing a blog post vs. planning a project vs. noting a task, and routes entries to the right draft. Multiple voice memos about the same topic get merged over days. The system doesn't just accumulate — it produces.

No content invention. The hardest constraint and the most important. The LLM must be an editor, not a writer — every sentence must trace to something the user actually said. Gaps get [TODO: ...] markers, not hallucinated filler. Without this you get a wiki full of plausible content you never thought. Dostoevsky dictated to his wife as stenographer; the LLM is my stenographer, not my ghostwriter.

Cross-links are mechanical, not LLM-generated. Title mentions in body text, slug pattern matching, journal co-occurrence. This avoids hallucinated connections and makes the knowledge graph trustworthy. You can see the graph live at paulo.com.br/signals — 169 nodes, 195 links between posts, concepts, and source voice memos.

Provenance. Full traceability from published blog post back to the voice memo that sparked it. Each blog post links to its /signals subpage where you can listen to the original audio and read the raw transcription. The Zettelkasten had numbered cards with cross-references; this system has numbered voice memos with machine-traced lineage.

On why this is an idea, not a product. I think you're right to frame this as an idea rather than a spec. Each solution is deeply personal. How you capture (voice memos vs. web clippings vs. screenshots), how you process (pipeline vs. chat vs. deterministic scripts), how the graph gets wired — it's all particular to each person's thinking patterns. I don't think open source solves this. Each person will fabricate something that's a woven fabric of code and prompts that feed back into each other. It's disposable software that mutates constantly — neither the prompts nor the code are static. The system co-evolves with how you think.

More details:


@tkgally

tkgally commented Apr 5, 2026

Thank you for the idea, Andrej!

For the last few months, I have been using Claude Code to build a Japanese-English dictionary for people studying Japanese (GitHub, live site). The project is moving along smoothly, but its unavoidable complexity is making me uneasy about whether I have a strong enough grasp of the dictionary’s overall design and possible future directions. So I created a new directory in the repository called planning/, put your LLM wiki markdown file in it, and told Claude to start building a knowledge base that it would be able to refer to in the weeks and months ahead as the project continues to grow. I have scheduled a prompt to have Claude Code work on the knowledge base every night. It seems to be off to a good start, and I look forward to seeing how well this might help my project in the future.

@arnoldadlv

Obsidian CLI has been a lifesaver for this.

@bluewater8008

We've been running this pattern in production for a few weeks across multiple related knowledge domains. A few things we learned that might help others:

  1. Classify before you extract. When ingesting sources, don't treat every document the same. Classify by type first (e.g., report vs. letter vs. transcript vs. declaration), then run type-specific extraction. A 50-page report needs different handling than a 2-page letter. This comes from Folio's sensemaking pipeline — classify → narrow → extract → deepen — and it saves significant tokens while producing better results. Without it, you get shallow, uniform summaries of everything.

  2. Give the index a token budget. The progressive disclosure idea is right, but it helps to make it explicit. We use four levels with rough token targets: L0 (~200 tokens, project context, every session), L1 (~1-2K, the index, session start), L2 (~2-5K, search results), L3 (5-20K, full articles). The discipline of not reading full articles until you've checked the index first is what makes this scale. Without it, the agent either reads too little or burns context reading everything.

  3. One template per entity type, not one generic template. A person page needs different sections than an event page or a document summary. Define type-specific required sections in your schema. The LLM follows them consistently, and the wiki stays structurally coherent as it grows. Seven types has been our sweet spot — enough to be useful, not so many that the schema becomes overhead.

  4. Every task produces two outputs. This is the rule that makes the wiki compound. Whatever the user asked for — an analysis, a comparison, a set of questions — that's output one. Output two is updates to the relevant wiki articles. If you don't make this explicit in your schema, the LLM will do the work and let the knowledge evaporate into chat history.

  5. Design for cross-domain from day one. If there's any chance your knowledge spans multiple projects, cases, clients, or research areas — add a domain tag to your frontmatter now. Shared entities (people, organizations, concepts that appear in multiple domains) become the most valuable nodes in your graph. Retrofitting this is painful.

  6. The human owns verification. The wiki pattern works. But "the LLM owns this layer entirely" needs a caveat for anyone using this in high-stakes contexts. The LLM can synthesize without citing, and you won't notice unless you look. Build source citation into your schema rules, and budget time to spot-check the wiki — not just the deliverables. The LLM is the writer. You're the editor-in-chief.
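Point 2 above can be made concrete with a small sketch: greedily assemble context level by level (L0 → L3) until the token budget is hit. The ~4-characters-per-token estimate and the function shape are my assumptions for illustration, not part of the original comment:

```python
def est_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def assemble_context(project_context: str, index: str,
                     search_results: list[str], articles: list[str],
                     budget: int = 8000) -> tuple[str, int]:
    """Fill the context level by level (L0 -> L3), stopping before the
    first chunk that would blow the budget."""
    picked: list[str] = []
    used = 0
    for chunk in [project_context, index, *search_results, *articles]:
        cost = est_tokens(chunk)
        if used + cost > budget:
            break
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked), used
```

The discipline this enforces is exactly the one described: the agent always sees L0 and the index cheaply, and only pays for full articles when the budget allows.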

@xoai

xoai commented Apr 5, 2026

Built this. sage-wiki — a single cross-platform Go binary that does exactly what you described end-to-end:

sage-wiki init --vault on an existing Obsidian vault, or simply run in a new empty folder.

Edit config.yaml to add API key, pick any LLM you want.

sage-wiki compile for the initial compile
sage-wiki compile --watch to incrementally compile sources into wiki articles with concepts, backlinks, and cross-references

The compiled outputs go back into Obsidian as markdown with [[wikilinks]] and YAML frontmatter — graph view spans both your source docs and the compiled articles.

sage-wiki search "any keyword" for searching through the knowledge base
sage-wiki query "ask any question" for Q&A against the wiki with cited answers

Also built the linting piece you described. It catches inconsistencies, suggests missing connections, fills in gaps. Feels like having a research assistant that never forgets what it read.

Want your familiar LLM interface working with your personal knowledge base? No problem.

sage-wiki serve exposes the wiki as an MCP server so any LLM agent can operate on it

The part that clicked for me was the same thing you mentioned, filing query outputs back into the wiki. Once you start doing that, the knowledge base genuinely compounds. Every question you ask makes it better at answering the next one.

@KeremSalman

Andrej, this is an absolute paradigm shift. Thank you.

I am currently going through a massive operational and personal "hard reset" in my life. I’ve been struggling with the stateless, fragmented nature of traditional RAG systems for personal knowledge management. Your concept of treating the LLM not just as a search engine, but as a continuously running "compiler" over a Markdown codebase provided the exact architecture I needed.

I am implementing this today as KS_LIFE_OS. I am feeding my raw daily data (physical rehab logs for a torn Achilles, complex VC meeting transcripts, and mental state markers) into the system, letting the LLM "lint" and compile them into a deterministic, version-controlled personal wiki in Obsidian.

As the lead architect of a Zero-Trust / Fail-Closed verification protocol (Mnemosyne), this approach deeply resonates with me. True memory isn't about semantic retrieval; it's about state management, lineage, and verifiable truth.

Thank you for open-sourcing your clarity. It just became the foundation of my reconstruction.

KS - Chief ArchiTech, Mnemosyne

@karesansui-u

@karpathy
Excuse me for writing in Japanese!
There's something I really want you to know: it's not token volume but the accumulation of contradictions that is the root cause of performance degradation, and making the architecture resolve contradictions autonomously turned out to work really well.

So even if you summarize and compress as-is and keep that data, performance will eventually degrade anyway.

There is a solid mathematical reason behind this logical collapse, and I explain it, so please take a look!

https://github.com/karesansui-u/delta-zero
https://zenodo.org/records/19396452
https://zenodo.org/records/19396459

It's written up in detail. Feed the paper PDF and the GitHub URL to a high-performance LLM; I think you'll get an interesting reaction!

  • Proven in Lean as a conditional theorem
  • Also proven that condition A3 works as an approximation
  • Effectiveness confirmed in LLM experiments as well

I hope everyone catches on soon; I'd been struggling because I couldn't get this across to anyone. Thank you.

@VictorVVedtion

Loved this pattern. We implemented it in Vibe Sensei — an AI trading terminal with 52 historical master guardians (Soros, Livermore, Buffett, etc.) that watch your trades and warn you in character.

Here's how we adapted the LLM Wiki pattern for real-time trading:

Three-Layer Architecture (same spirit, trading twist)

  1. Raw Sources → JSONL Event Store: Every trade, guardian alert, ghost warning, regime change, and circuit breaker fires into ~/.vibe-sensei/events/YYYY-MM.jsonl. Nine event types, append-only, Zod-validated on read-back.

  2. The Wiki → ~/.vibe-sensei/wiki/: Markdown articles organized by domain:

    • markets/BTC-USDT.md — Per-symbol stats, win rate, regime history
    • patterns/overview.md — Behavioral pattern frequency tables
    • self/profile.md — Trader strengths/weaknesses (auto-derived)
    • notes/ — Articles filed back from queries (the compounding loop!)
  3. The Schema → WikiTool: 6 operations matching Karpathy's model — compile, query, ingest, lint, browse, status.
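The first layer above can be sketched in a few lines. This is a minimal stand-in, not code from the repo: the event shape and field names are my guesses, and I use a hand-rolled type guard where the real system uses Zod schemas.

```typescript
// Append-only JSONL event store with validation on read-back.
// Event shape is illustrative; the real system validates with Zod.
import * as fs from "node:fs";

interface TradeEvent {
  type: string;      // one of the nine event types
  ts: number;        // unix ms timestamp
  payload: unknown;  // event-specific data
}

function isTradeEvent(x: unknown): x is TradeEvent {
  const e = x as TradeEvent;
  return (
    typeof e === "object" && e !== null &&
    typeof e.type === "string" && typeof e.ts === "number"
  );
}

// Append one event as a single JSONL line (never rewrite the file).
export function appendEvent(path: string, event: TradeEvent): void {
  fs.appendFileSync(path, JSON.stringify(event) + "\n");
}

// Read everything back, validating each line on the way in.
export function readEvents(path: string): TradeEvent[] {
  if (!fs.existsSync(path)) return [];
  return fs
    .readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .filter(isTradeEvent);
}
```

Append-only plus validate-on-read is what makes the store both git-friendly and safe to compile from.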

Key Adaptations

Dual compilation mode: Gemini 2.5 Flash for rich analysis, but a pure template fallback that generates valid wiki from statistics alone — zero API dependency. The wiki always works.

Incremental compilation: .compile-state.json tracks the last processed event. Only new events get compiled. Template mode reads all events (to avoid erasing history); LLM mode gets a delta + existing article context.
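The incremental step reduces to a cursor over the event list. A sketch under my own assumptions (the state-file shape is a guess; only the `.compile-state.json` name comes from the comment above):

```typescript
// Track the last processed event so only new events get recompiled.
import * as fs from "node:fs";

interface CompileState {
  lastProcessed: number; // index of the last event already compiled
}

export function loadState(path: string): CompileState {
  if (!fs.existsSync(path)) return { lastProcessed: -1 };
  return JSON.parse(fs.readFileSync(path, "utf8")) as CompileState;
}

// Return the events that arrived since the last compile,
// plus the state to persist once they are processed.
export function deltaSince<T>(
  events: T[],
  state: CompileState
): { delta: T[]; next: CompileState } {
  return {
    delta: events.slice(state.lastProcessed + 1),
    next: { lastProcessed: events.length - 1 },
  };
}
```

Persisting `next` only after a successful compile is what makes a crashed run safe to retry.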

Guardian context injection: After every trade, the guardian observer calls queryWikiBySymbol(symbol) → injects ~400 chars of your historical performance with that symbol directly into the guardian's personalized alert. Your guardian literally remembers your trading history with each asset.
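The injection step itself can be as simple as a budgeted truncation. A sketch, not the repo's code (the article map and prompt framing are stand-ins; only the ~400-char budget and the per-symbol lookup come from the comment above):

```typescript
// Inject a ~400-char slice of the symbol's wiki article into a guardian prompt.
const CONTEXT_BUDGET = 400;

export function injectContext(
  articles: Map<string, string>, // symbol -> wiki article body
  symbol: string,
  alertPrompt: string
): string {
  const article = articles.get(symbol);
  if (!article) return alertPrompt; // no history yet: alert goes out untouched
  const snippet = article.slice(0, CONTEXT_BUDGET);
  return `Trader history for ${symbol}:\n${snippet}\n\n${alertPrompt}`;
}
```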

The compounding loop (my favorite part): query with fileBack=true synthesizes an answer from multiple wiki articles, then files the synthesis as a new article in notes/. Next query benefits from the synthesis. Knowledge compounds.
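Reduced to its essentials, the file-back loop looks like this (a sketch: `synthesize` is a stub where the LLM call would go, and the slug scheme is my own):

```typescript
// Query the wiki, then file the synthesized answer back as a notes/ article,
// so the next query can draw on it. `synthesize` stands in for the LLM call.
export function queryWithFileBack(
  wiki: Map<string, string>, // path -> article body
  question: string,
  synthesize: (question: string, articles: string[]) => string,
  fileBack = true
): string {
  const answer = synthesize(question, [...wiki.values()]);
  if (fileBack) {
    const slug = question
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-|-$/g, "");
    wiki.set(`notes/${slug}.md`, answer); // the next query sees this synthesis
  }
  return answer;
}
```

Because the filed answer lands back in the same map the next query reads from, each question permanently enriches the wiki.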

Morning brief: On first startup each day, the system auto-compiles (if needed) then generates a brief: current regime + your top behavioral pattern + discipline streak + alert-heeding accuracy + wiki health score. All voiced by your assigned guardian's personality.

Counterfactual tracking: We track which guardian alerts you heeded vs ignored, then measure outcome accuracy. This feeds back into the wiki's trader profile — the system learns whether its own advice was good.

What we learned

  • Template fallback is non-negotiable. LLM APIs fail; your knowledge base shouldn't.
  • ~400 chars is the sweet spot for context injection — enough to be useful, not enough to distract the LLM.
  • The file-back loop from queries → new articles is where the magic happens. It turns passive Q&A into active knowledge accumulation.
  • JSONL event store + markdown wiki is a surprisingly robust combo. Human-readable, git-friendly, zero infrastructure.

Built with Bun + TypeScript. The wiki system is ~2000 lines across compiler, query engine, ingest pipeline, health auditor, and the guardian integration layer.

Repo: github.com/VictorVVedtion/vibe-sensei

@pjmattingly

Hi, thanks for this. I've been working on implementing something similar, but using NotebookLM as the backing "wiki" layer. Here's the latest ...

see:
https://github.com/pjmattingly/Claude-persistent-memory

It's not ready for release, but I'd welcome feedback.

Take care. <3

@ycc42

ycc42 commented Apr 5, 2026

Thanks for sharing! Excited to put this into practice

@hrishikeshs

This is exactly what I've been trying to do with this PR on claude code: anthropics/claude-code#25879

and a version of it is built into my emacs manager: https://github.com/hrishikeshs/magnus

@mpazik

mpazik commented Apr 5, 2026

I've been doing this for a while now and there are two things that break first.

Queries. Once you're past a few hundred pages you want to ask your wiki things. "What did I add last week about X?" "Show me everything tagged unverified." You can't do that by reading files. The index helps early on but it doesn't scale.

Structure. It creeps in whether you plan it or not. Frontmatter, naming conventions, folder rules. The wiki grows a schema on its own. At some point you realize you're fighting your tools instead of working with them.

That's what got me to flip it. Instead of files that slowly become a database, start from structured data that renders as markdown. The index isn't a file the agent maintains by hand. It's a query. Always current.
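A toy version of "the index is a query", pure and in-memory (Binder itself sits on a transaction log plus SQLite; the entity shape here is my own illustration):

```typescript
// Render an always-current markdown index from structured entity records,
// instead of maintaining an index file by hand.
interface Entity {
  id: string;
  title: string;
  tags: string[];
  updated: string; // ISO date
}

export function renderIndex(entities: Entity[], tag?: string): string {
  const rows = entities
    .filter((e) => !tag || e.tags.includes(tag))
    .sort((a, b) => b.updated.localeCompare(a.updated)); // newest first
  const lines = rows.map((e) => `- [${e.title}](${e.id}.md) (updated ${e.updated})`);
  return [`# Index${tag ? `: #${tag}` : ""}`, "", ...lines].join("\n");
}
```

"Show me everything tagged unverified" is then just `renderIndex(entities, "unverified")`, and the output can never drift out of date.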

I've been building Binder around this. Data goes into a transaction log, gets indexed in SQLite, and every entity shows up as a markdown file you can edit in whatever editor you want. Edits go back in. Agent writes through an API. Both directions.

https://assets.binder.do/binder-demo.mp4

@localwolfpackai

With the Ingest/Query operation, a good idea might be to include a Divergence Check: every time the LLM updates a concept page, it must generate a hidden section called ## Counter-Arguments & Data Gaps.

So if you ingest 5 articles praising a specific UI framework, the LLM should be tasked to search for (or simulate) the most sophisticated critique of that framework. It could make a good sanity check on your own biases.
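One way to enforce that at lint time, sketched under my own assumptions (only the section name comes from the comment above):

```typescript
// Lint rule: every concept page must carry a Counter-Arguments & Data Gaps section.
const DIVERGENCE_HEADING = "## Counter-Arguments & Data Gaps";

export function ensureDivergenceSection(article: string): string {
  if (article.includes(DIVERGENCE_HEADING)) return article; // already compliant
  // Append a stub that the LLM is then required to fill on the next compile pass.
  return (
    article.trimEnd() +
    `\n\n${DIVERGENCE_HEADING}\n\n- TODO: find or simulate the strongest critique of the claims above.\n`
  );
}
```

Running this over every page during the lint pass guarantees no article can silently stay one-sided.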

I've been noticing my own bias more lately... maybe it's just me 😉

@Astro-Han

Turned this into a plug-and-play skill for Claude Code / Cursor / Codex. One install, then just tell your agent "ingest this URL" and it handles the raw → wiki compilation, cross-references, and index.

npx add-skill Astro-Han/karpathy-llm-wiki

The part that clicked for me: once you set up the three-layer flow (raw → wiki → index), each new source genuinely enriches the existing articles instead of just piling up. The wiki compounds.

https://github.com/Astro-Han/karpathy-llm-wiki

@tlk3

tlk3 commented Apr 5, 2026

vibe-coded a potentially better IDE for this kind of thinking flow: https://github.com/anuragrpatil23/Thinking-Space

Curious to hear any thoughts or feedback from folks trying similar setups! TL;DR: Obsidian updated for the Claude Code / agent era: a local-first, AI-native Markdown workspace.

This looks sick.
