@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
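Some of these checks are mechanical enough to script. As a rough sketch of just the orphan-page check, assuming pages link to each other with Obsidian-style `[[wikilinks]]` and that links from index.md and log.md should not count as inbound (both are assumptions; `find_orphans` is a hypothetical helper, not part of any tool mentioned here):

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: str, ignore=("index", "log")) -> list[str]:
    """Return page names that no other content page links to."""
    pages = {p.stem: p for p in Path(wiki_dir).rglob("*.md")}
    linked = set()
    for name, path in pages.items():
        if name in ignore:
            continue  # assumption: catalog/log links are not real inbound links
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip().split("/")[-1]  # drop any folder prefix
            if target != name:  # a page linking to itself does not count
                linked.add(target)
    return sorted(n for n in pages if n not in linked and n not in ignore)
```

The LLM can run something like this first and then spend its attention on the checks that need judgment, such as contradictions and stale claims.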

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
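To make the shape of the catalog concrete, here is a minimal sketch that rebuilds an index from page frontmatter. It assumes each page starts with a flat `---` block containing `category` and `summary` keys; those key names and the `build_index` helper are illustrative assumptions, and in practice the LLM would usually edit index.md directly during ingest.

```python
from collections import defaultdict
from pathlib import Path

def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading --- block (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

def build_index(wiki_dir: str) -> str:
    """Render a category-grouped catalog: one link and one-line summary per page."""
    by_category = defaultdict(list)
    for path in sorted(Path(wiki_dir).rglob("*.md")):
        if path.stem in ("index", "log"):
            continue  # the two special files are not catalog entries
        meta = read_frontmatter(path.read_text(encoding="utf-8"))
        category = meta.get("category", "uncategorized")
        summary = meta.get("summary", "(no summary)")
        by_category[category].append(f"- [[{path.stem}]]: {summary}")
    out = ["# Index", ""]
    for category in sorted(by_category):
        out.append(f"## {category}")
        out.extend(by_category[category])
        out.append("")
    return "\n".join(out)
```

A script like this is also a cheap way to check that the LLM-maintained index has not drifted from the pages actually on disk.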

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
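The same prefix convention is easy to use from a script. A minimal sketch, assuming the `## [YYYY-MM-DD] kind | title` header format from the example above (the helper names `append_entry` and `last_entries` are hypothetical):

```python
import re
from datetime import date
from pathlib import Path

# Matches headers such as: ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")

def append_entry(log_path, kind, title, notes="", day=None):
    """Append one entry; the file is only ever opened in append mode."""
    day = day or date.today()
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"## [{day.isoformat()}] {kind} | {title}\n")
        if notes:
            f.write(notes.rstrip() + "\n")
        f.write("\n")

def last_entries(log_path, n=5):
    """Return (date, kind, title) for the last n entry headers."""
    headers = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        match = ENTRY.match(line)
        if match:
            headers.append(match.groups())
    return headers[-n:]
```

For example, `append_entry("log.md", "ingest", "Article Title", "Touched 12 pages.")` followed by `last_entries("log.md", 5)` gives roughly what the grep one-liner does, but as structured tuples.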

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
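For a sense of what "naive" can mean here, a sketch of a keyword search over the wiki directory might look like the following. The scoring (term density in the body plus a bonus for matches in the filename) is an arbitrary assumption for illustration, and it is far simpler than the hybrid BM25/vector search described above.

```python
import re
from pathlib import Path

def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(wiki_dir: str, query: str, top_k: int = 5) -> list[tuple[float, str]]:
    """Rank pages by query-term density, with a bonus for filename matches."""
    terms = set(tokenize(query))
    results = []
    for path in Path(wiki_dir).rglob("*.md"):
        words = tokenize(path.read_text(encoding="utf-8"))
        hits = sum(1 for w in words if w in terms)
        title_hits = sum(1 for w in tokenize(path.stem) if w in terms)
        if hits or title_hits:
            density = hits / len(words) if words else 0.0
            results.append((density + title_hits, path.stem))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top_k]
```

Wrapped in a small command-line entry point, the LLM can shell out to this the same way it would to a real search tool, and you can swap in something better once the wiki outgrows it.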

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@emailhuynhhuy

Thank you for sharing. Your post gave me the courage to share my own 'raw' progress — and helped me understand why what I built actually works.

The problem that broke my trust in generation:
Using cloud LLMs or NotebookLM to build n8n automation workflows kept producing the same failure mode: plausible-looking JSON that missed critical execution details. The logic looked right. It failed silently in production. For complex automation, "mostly correct" isn't a degraded state — it's a broken state.

What I built instead — a Deterministic Retrieval System:

I organized thousands of validated n8n workflow JSONs on a local NAS. Each is mapped to an Obsidian MD file with rich metadata: tags, process steps, and a direct pointer to the source JSON.

It maps directly to your three-layer architecture:

  • Raw sources: validated JSONs — immutable, never touched by the LLM
  • Wiki layer: Obsidian MD files — not for reading, but for navigation
  • Schema: the local AI acts purely as a router. It traverses the graph, finds the right metadata pointer, and retrieves the pre-validated JSON for the team to paste and run.

Instead of asking an LLM to generate a workflow, we ask it to find one. 100% reliable. No hallucinated logic.

Your framing of the wiki as a "persistent, compounding artifact" is what made this click. The Obsidian graph is my fast navigation layer — seeing how workflows connect, identifying direction. The NAS is the deep execution layer — deterministic, no surprises.

Where I'm taking this next:

I'm now applying this same pointer-based pattern to other knowledge bases beyond workflows — testing whether the same reliability holds when the "source of truth" is less structured than JSON (documentation, SOPs, client briefs). The hypothesis is that the pattern generalizes: as long as the retrieval layer is deterministic and the wiki layer handles navigation, generation becomes optional rather than necessary.

The tension I can't fully resolve yet:

Pointer-based retrieval works perfectly when there's a match. But when a novel request arrives — something that doesn't exist in the library — the system is blind. Falling back to generation breaks the reliability I've built. Staying purely deterministic means the system can't grow into genuinely new territory.

Your wiki pattern handles novelty well because the LLM can still synthesize across existing pages. I'm wondering if there's a hybrid path: deterministic retrieval for known cases, but a wiki-style synthesis layer that absorbs novel cases over time — and promotes them into validated sources once tested in production.

Do you see a way to maintain that level of reliability at the retrieval layer while keeping the system fluid at the edges?

@K-Edmonds-G42

I think the hope is that Grokipedia becomes a large scale version of this.

@bitsofchris

bitsofchris commented Apr 7, 2026

I've been running this pattern against my personal Obsidian vault with 4,000+ journal entries, research notes, and project logs over 2+ years. Not curated papers per topic but like my real, everything second brain.

A few things I hit that might save others time:

  • Index files will break. An index is simple and a great step on the path of "giving the LLM a map" so agentic retrieval can work. At 100 curated articles, auto-maintained indexes work great. At thousands of messy personal notes, with heterogeneous note types and overlapping topics, you need some basic ETL from data engineering. And even then, naive semantic search returns 10 versions of your loudest thought, not 10 facets of your thinking. What actually fixed retrieval quality for me: overfetch 3x, deduplicate near-identical content, then re-rank for diversity (MMR). The difference is night and day. I did try more advanced versions of this by clustering on embeddings and summarizing clusters; this is pretty cool, but the simpler de-dupe on retrieval helped a lot.
  • Links are the whole thing. I treat tags and links as first-class graph nodes, not just metadata. Then the agent can traverse from a search hit into the thought neighborhood around it. That's where the compound value lives. You're building a graph in this pattern whether you call it one or not. It also makes my new capture flow much easier b/c the LLM helps me maintain my taxonomy of work streams and topics.
  • Write-back is the key to compounding. The gist mentions filing outputs back into the wiki almost in passing, but after two years I think it's the single most important part. The knowledge base should grow through use, not just ingestion. Every research session, every synthesis, every new connection the agent finds is written back. This is great for snipping key ideas from AI chat conversations (it's an MCP server I use, so I can export data out of the Claude or GPT UI easily). It also helps me track active work streams. I always make it clear, though, which data I wrote, and I limit the agent to writing to a specific location. That makes it easy to filter out what's my thinking.
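The overfetch, dedupe, and MMR (maximal marginal relevance) steps from the first bullet can be sketched as follows. This is an illustration under assumptions, not the implementation in the tool linked below: each hit is assumed to be a dict with an `id` and an embedding `vec`, hits arrive already ranked by relevance, and the `threshold` and `lam` values are placeholders to tune.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(candidates, threshold=0.98):
    """Drop hits that are near-identical to an earlier (higher-ranked) hit."""
    kept = []
    for c in candidates:
        if all(cosine(c["vec"], k["vec"]) < threshold for k in kept):
            kept.append(c)
    return kept

def mmr(query_vec, candidates, k, lam=0.7):
    """Greedily pick k hits, trading relevance against similarity to picks so far."""
    selected, remaining = [], list(candidates)

    def score(c):
        relevance = cosine(query_vec, c["vec"])
        redundancy = max((cosine(c["vec"], s["vec"]) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best_i = max(range(len(remaining)), key=lambda i: score(remaining[i]))
        selected.append(remaining.pop(best_i))
    return selected

def retrieve(query_vec, ranked_hits, k, overfetch=3):
    """Take the top k * overfetch hits, dedupe them, then re-rank for diversity."""
    pool = dedupe(ranked_hits[: k * overfetch])
    return mmr(query_vec, pool, k)
```

With `lam` closer to 1 the result approaches plain relevance ranking; lowering it pushes the selection toward hits that differ from each other.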

Been building the open source tooling for this as an MCP server — one SQLite file, works with Claude natively: https://github.com/bitsofchris/openaugi.

If anyone's trying this with personal notes instead of research papers, happy to talk. I've also been using it to manage my human context better so I can orchestrate multiple agents.

@emailhuynhhuy

emailhuynhhuy commented Apr 7, 2026

This resonates a lot — especially your point that write-back is the core mechanic. I went through a similar phase, but eventually realized that write-back alone doesn’t complete the learning loop.

What made a real difference for me was reframing the system around decision-based learning, not just knowledge accumulation.

A working learning loop looks like this:

  1. Task → not just retrieval, but decision context
    The system shouldn’t just surface relevant notes, but structure the problem into actionable options (A/B/C with reasoning and risk).

  2. Decision → human stays in the loop
    The user selects an option. This step is critical — without an explicit decision, the system has nothing to learn from.

  3. Outcome → success or failure signal
    After execution, the result is captured. Not just “what was done”, but whether it worked.

  4. Memory → structured experience, not raw notes
    Instead of just writing back summaries, I store:
      • decisions made
      • context
      • outcomes (success / failure)
    Especially failure cases: they become high-priority constraints for future reasoning.

  5. System update → affects future routing and reasoning
    The key is that this memory is not passive. It actively changes:
      • what options are generated
      • which paths are avoided
      • how future tasks are interpreted

So the loop becomes:

task → options → decision → outcome → memory → better options next time

At that point, the system is no longer just retrieving knowledge or maintaining a graph — it’s accumulating operational experience.

That was the shift for me from “second brain” to something closer to a decision system.

Really appreciate you sharing your experience, especially the scaling challenges with real-world data. This is the learning loop I built for my small local LLM.

@waydelyle

Built SwarmVault as an open-source TypeScript CLI that implements this pattern end-to-end: ingest → compile → query → lint, with a persistent markdown wiki, knowledge graph (community detection, god nodes, confidence-scored edges), and local search index. Save-first queries, candidate staging before pages go live, per-project schemas, code-aware ingestion, and an MCP server for agent interop. Works with OpenAI, Anthropic, Gemini, Ollama, or any compatible backend. Directly inspired by this gist. Feedback from the discussion here shaped a lot of the design (candidate buffers, grounding in sources, scheduled agents). https://github.com/swarmclawai/swarmvault

@007bsd

007bsd commented Apr 7, 2026

Great stuff! Are there any examples one could refer to?
