
@secondrealm
Created April 7, 2026 09:51

LLM Local Recall

This is an idea note on extending LLM memory through local retrieval. It captures a system I’ve been using, not a finished product. I’m not a developer, so I’m just trying to describe the system I’ve been hacking together at a level where others can build on it, improve it, or adapt it.

The problem

I kept running into the same problem: I knew I’d already done the work somewhere, but I couldn’t reliably get back to it.

Not because it was gone. Not because I’d failed to think it through. The problem was that my past work was spread across too many places and too many formats to be consistently useful when I needed it. Some of it was in old LLM chats. Some of it was in notes apps. Some of it lived in PDFs, screenshots, exports from tools I no longer use, half-finished drafts, or folders I hadn’t opened in months. The material existed. The retrieval didn’t.

The more I worked with tools like OpenClaw, the more obvious that became. A lot of the conversation around LLMs is framed around generation: better answers, better summaries, better outputs. But that wasn’t the problem I kept having. I didn’t need the system to invent a new answer nearly as often as I needed it to help me recover something I’d already written, already thought through, or already decided.

That’s where LLM Local Recall comes from.

The idea

This is my attempt to think through a different kind of system: not one that mainly generates, but one that helps recover. Not a tool for replacing thought, but a system for making prior thought retrievable.

The basic idea is simple. Take the scattered remains of your digital history and turn them into a local, searchable archive that an agent can query directly. Instead of forcing a model to reconstruct context from scratch every time, the system processes what already exists and indexes it in a form that can return exact passages with source references attached.

That last part matters to me. I don’t just want something that sounds right. I want something grounded enough to show me the actual phrasing, the actual source, and the actual location where the idea already appeared.

The focus isn’t on generating new answers.

It’s on reliably finding what already exists.

How it works

At a high level, the system has three parts:

  • a local library of raw material
  • a processing and indexing layer
  • an agent interface that can search the result

The local library is just the intake point. Files go in as they are. The processing layer transforms those files into a consistent internal form. The index stores the processed results in a way that can be queried quickly and locally. The agent sits on top of that and returns grounded results when asked.

What I like about this is that it doesn’t require perfect organization at the point of capture. That matters because most systems quietly assume the user is going to become more disciplined: better file naming, better tagging, better folder structure, better habits. But most real digital history is messy. It accumulates unevenly. It reflects life as it was actually lived, not as it should’ve been documented.

So I wanted something that standardizes the mess after the fact.

Files get dropped into the library. The system processes them. The agent can then query the index and return grounded results. That’s the whole premise.

What goes in

The input side is intentionally broad because the problem itself is broad.

If something exists somewhere in your digital history, it should be possible to ingest it. That includes:

  • LLM chat transcripts
  • notes
  • PDFs
  • screenshots
  • exports from older tools
  • unfinished drafts
  • old folders that still contain useful work

The point isn’t to privilege one format over another. The point is to collapse unlike formats into a shared retrieval layer.

That matters because the work I want back is rarely all in one kind of file. A useful sentence might be trapped in a screenshot. A decision might live in an old chat. A half-formed but important thought might only exist in a draft. If the system only works well on one type of input, it misses the actual shape of the problem.

So the system has to be format-agnostic. It has to assume that valuable material is already spread across incompatible containers and treat that as normal.
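One way to picture that format-agnostic intake is a simple extension-to-handler dispatch. This is a minimal sketch, not the author's actual ingestion code; the function name and the shape of the chat-export JSON are assumptions, and the PDF/OCR branches are stubs standing in for whatever extractor you plug in.

```python
import json
from pathlib import Path

# Hypothetical routing layer: every input format collapses to plain text.
def extract_text(path: Path) -> str:
    suffix = path.suffix.lower()
    if suffix in {".md", ".txt"}:
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".json":
        # Assumed shape for an LLM chat export: {"messages": [{"content": ...}]}
        data = json.loads(path.read_text(encoding="utf-8"))
        return "\n".join(m.get("content", "") for m in data.get("messages", []))
    if suffix == ".pdf":
        raise NotImplementedError("hand off to a PDF text extractor")
    if suffix in {".png", ".jpg", ".jpeg"}:
        raise NotImplementedError("hand off to an OCR tool")
    raise ValueError(f"no handler for {suffix}")
```

The point of the sketch is only that unsupported formats fail loudly at the edge, while everything that can be read funnels into one text representation.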

What it does

Once material enters the library, the system transforms it into a more consistent internal representation.

For my own setup, that “processing” layer is a chain of scripts.

The first set of scripts pulls metadata from each input and creates two separate outputs:

  • _summaries
  • _exports

Both are tied back to the original input through a unique ID so they can always be traced to the source.

The metadata generated for the _summaries layer looks like this:

{
  "document_id": "conv_2023_04_18_001",
  "source_path": "exports/conversations/conv_2023_04_18_001.json",
  "summary_path": "_summaries/conv_2023_04_18_001.md",
  "full_text_path": "_exports/conv_2023_04_18_001.md",
  "keywords": ["pricing", "positioning", "website copy"],
  "categories": ["llm-chat", "strategy", "writing"],
  "summary": "Conversation about homepage positioning and messaging direction.",
  "processed_at": "2026-04-07T00:00:00Z"
}
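A record like that could be assembled with a small helper. This is a sketch under my own assumptions: I derive the document ID from the source filename (the real IDs above look date-based), and the summary and keywords are passed in from whatever upstream step produces them.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical helper that builds one _summaries metadata record.
def make_record(source_path: str, summary: str,
                keywords: list, categories: list) -> dict:
    doc_id = Path(source_path).stem  # assumed ID scheme: reuse the filename stem
    return {
        "document_id": doc_id,
        "source_path": source_path,
        "summary_path": f"_summaries/{doc_id}.md",
        "full_text_path": f"_exports/{doc_id}.md",
        "keywords": keywords,
        "categories": categories,
        "summary": summary,
        "processed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```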

Another script then chunks the full-text _exports into roughly 250-word segments and attaches related keywords to those as well, with each chunk tied back to its source by a unique ID.
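The chunking step can be sketched in a few lines. This is a minimal word-count splitter, not the author's script; the chunk ID format is an assumption, chosen only to make the source linkage explicit.

```python
# Hypothetical chunker: splits a full-text export into ~250-word segments,
# each carrying an ID that traces back to its source document.
def chunk_text(text: str, doc_id: str, words_per_chunk: int = 250) -> list:
    words = text.split()
    chunks = []
    for i in range(0, len(words), words_per_chunk):
        chunks.append({
            "chunk_id": f"{doc_id}_chunk_{i // words_per_chunk:03d}",
            "document_id": doc_id,
            "text": " ".join(words[i:i + words_per_chunk]),
        })
    return chunks
```

Keyword attachment would happen on top of this, per chunk, the same way it happens per document.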

Once those layers exist, they are imported into a local SQLite database (e.g., recall.db).
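A minimal version of that import might look like the following. The two-table schema (documents joined to chunks on document_id) is my own guess at the structure, not a description of the actual database; keywords are flattened to comma-separated text for simplicity.

```python
import sqlite3

# Hypothetical import step: one table per layer, joined on document_id.
def build_index(db_path: str, records: list, chunks: list) -> None:
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS documents (
        document_id TEXT PRIMARY KEY, source_path TEXT,
        summary TEXT, keywords TEXT)""")
    con.execute("""CREATE TABLE IF NOT EXISTS chunks (
        chunk_id TEXT PRIMARY KEY, document_id TEXT,
        text TEXT, keywords TEXT)""")
    for r in records:
        con.execute("INSERT OR REPLACE INTO documents VALUES (?,?,?,?)",
                    (r["document_id"], r["source_path"], r["summary"],
                     ",".join(r["keywords"])))
    for c in chunks:
        con.execute("INSERT OR REPLACE INTO chunks VALUES (?,?,?,?)",
                    (c["chunk_id"], c["document_id"], c["text"],
                     ",".join(c.get("keywords", []))))
    con.commit()
    con.close()
```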

On top of that, I created a simple skill for my agent to use when searching the database by invoking something like:

recall {query}

That skill calls another script that queries the database and writes the relevant matches out to JSON. My agent then parses those results, sometimes across hundreds of matches, and uses them in relation to the conversation we’re already having.
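The script behind that skill could be as plain as a keyword match over the chunk table, with results written out as JSON for the agent to parse. This sketch uses a naive LIKE search under my assumed two-table schema; the real script may differ, and SQLite's FTS5 full-text extension would be a stronger search backend.

```python
import json
import sqlite3

# Hypothetical "recall {query}" backend: match chunks, keep source paths
# attached, and dump the results to JSON for the agent.
def recall(db_path: str, query: str, out_path: str, limit: int = 50) -> list:
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """SELECT c.chunk_id, c.document_id, c.text, d.source_path
           FROM chunks c JOIN documents d USING (document_id)
           WHERE c.text LIKE ? LIMIT ?""",
        (f"%{query}%", limit)).fetchall()
    con.close()
    matches = [{"chunk_id": r[0], "document_id": r[1],
                "snippet": r[2], "source_path": r[3]} for r in rows]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(matches, f, indent=2)
    return matches
```

Every match keeps its chunk ID and source path, so the agent can quote the passage and point at where it came from.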

I intentionally didn’t overdesign that final step. I wanted to see how my agent would naturally handle the retrieved material before I started forcing a more rigid output structure onto it.

I wanted the archive to feel like something I could own, not a feature I rented.

What retrieval looks like

The retrieval experience should be boring in the best possible way.

You ask the system to search for something. It returns:

  • relevant snippets
  • where they came from
  • enough source detail to verify what you’re seeing
  • ideally exact quoted lines or very tight passages

That’s where I think a lot of systems drift in the wrong direction. They become very good at sounding helpful while losing contact with the underlying material. They reconstruct. They paraphrase. They smooth over uncertainty. They give you an answer that feels coherent, but they don’t always give you the thing itself.

LLM Local Recall is meant to lean the other way.

The system isn’t trying to reassemble a plausible version of your past thinking. It’s trying to surface your past thinking directly.

That’s the difference between synthesis and recall, and for this project recall comes first.

Why this matters

Once your digital history becomes searchable in this way, your archive stops behaving like storage and starts behaving more like memory.

Old decisions become easier to revisit. Prior phrasing becomes reusable. Ideas that were effectively lost because they were buried in the wrong place become available again. Instead of re-deriving the same thought for the fifth time, you can recover the earlier version and decide whether it still holds up.

That shift may sound small, but I don’t think it is. A lot of intellectual work gets wasted not because it was bad, but because it became unretrievable. We keep producing material, but we don’t keep it available. So every new problem gets treated as if it begins from zero.

I wanted something that pushed against that. Not by pretending every old note is profound, but by making more of my own trail usable. Not everything deserves to be elevated into a maintained theory. But a surprising amount of it deserves to remain findable.

That’s what I think this system makes possible.

What it doesn’t do

I also want to be clear about the boundary here.

This system extends local memory, not understanding.

It doesn’t:

  • automatically reconcile contradictions
  • refine old thinking into better thinking by itself
  • produce a unified worldview from years of scattered material
  • decide which version of you was right
  • turn retrieval into judgment

It retrieves what exists.

That may sound narrow, but I think narrowing the promise is part of the point. Too many systems jump too quickly to claiming they understand the user when what they really do is reword what they find. I’d rather build from a more honest foundation: make the archive searchable first, keep the source attached, and let interpretation happen afterward.

Where it could go

What I have here is a system I’ve been using, not a finished product.

I’m describing it from the perspective of trying to solve my own retrieval problem, but I don’t think the problem is unique to me. Anyone with years of digital residue across chats, notes, screenshots, exports, and drafts is dealing with some version of the same thing. The details will vary, but the underlying frustration is familiar: useful work exists, but access to it is unreliable.

That’s why I think this is worth putting in front of other people to build on.

The core system can remain the same even if the implementation changes. Someone else might build:

  • a cleaner ingestion pipeline
  • better OCR handling
  • stronger chunking logic
  • richer metadata
  • a better retrieval interface
  • browsing tools on top of the index
  • a writing layer that maintains higher-level pages over time

What matters is simpler: treat digital sprawl as recoverable material, process it into a consistent retrieval layer, and make exact source-grounded recall the first priority.

Everything else can evolve from there.

Once retrieval works well, it also becomes easier to imagine a second layer built on top of it: persistent pages, updated summaries, tracked contradictions, shifts in viewpoint over time, or more durable syntheses that remain attached to the source material beneath them. That’s where the system starts moving from recall toward maintained understanding.

But I don’t think it makes sense to skip the first step.

The temptation with ideas like this is to jump immediately toward something grander: a living knowledge engine, a self-updating personal archive, a model that tracks all your beliefs and changes over time. I think those things can be enhanced by extending memory this way. But before any of that, I think there’s a more practical and more necessary move: make the past findable locally.

That alone solves a real problem.

TL;DR

LLM Local Recall is a system for turning scattered digital history into a local, searchable archive of prior work.

  • Files go into a library.
  • The system extracts, normalizes, chunks, and indexes them into a local database.
  • An agent can then query that archive and return exact snippets with source references intact.

The point isn’t better generation.

It’s better recall.
