@rsbohn
Created March 4, 2026 19:46
Combining llm-clip with Parallel Coordinates

2026-03-03 — Wiggles

Goal

Explore zero-shot image classification using CLIP embeddings already stored in the artifacts llm collection (125 images), then build an interactive visualization to explore the results.

What We Learned

llm-clip

  • Plugin: llm-clip v0.1, installed in llm's uv tool environment
  • Model: clip-ViT-B-32 via sentence-transformers
  • Embeddings: 512-dimensional float32, stored as blobs in ~/.config/io.datasette.llm/embeddings.db
  • Both text and binary (image) input supported — same embedding space
  • Output of llm embed -m clip -c "text" is a JSON array
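Since the embeddings are stored as raw float32 blobs, they can be decoded without loading any model. A minimal sketch (the `decode_embedding` helper is hypothetical, not part of llm-clip; it assumes the little-endian float32 packing described above):

```python
import struct
import numpy as np

def decode_embedding(blob: bytes) -> np.ndarray:
    """Decode a packed little-endian float32 blob into a numpy vector."""
    n = len(blob) // 4  # 4 bytes per float32
    return np.array(struct.unpack(f"<{n}f", blob), dtype=np.float32)

# Round-trip check with a fake 512-dimensional embedding
vec = np.random.rand(512).astype(np.float32)
decoded = decode_embedding(vec.tobytes())
print(decoded.shape)  # (512,)
```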

llm embed-multi

  • --files <dir> <glob> --binary -m clip embeds whole directories of images
  • Incremental: skips already-stored IDs on re-run
  • --prefix namespaces IDs across batches

Zero-shot classification trick

CLIP's shared text/image embedding space means you can query an image collection with plain text labels via llm similar artifacts -c "birth record". Score ranges:

  • ~0.18 = no match
  • ~0.29–0.33 = reasonable match
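Those ranges suggest a simple interpretation rule. A sketch, with the caveat that the thresholds come from the 125-image collection above and are not universal CLIP constants (`interpret_score` is a hypothetical helper):

```python
def interpret_score(score: float) -> str:
    """Rough interpretation of a CLIP text-vs-image cosine score.
    Thresholds are empirical, taken from the ranges observed above."""
    if score >= 0.29:
        return "reasonable match"
    if score <= 0.20:
        return "no match"
    return "uncertain"

print(interpret_score(0.31))  # reasonable match
print(interpret_score(0.18))  # no match
```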

Parallel coordinates via wigglystuff

What We Built

skills/llm-clip/SKILL.md

Reference for three commands:

  • /llm-clip embed — batch-embed images with llm embed-multi
  • /llm-clip search — text query against image collection
  • /llm-clip classify — zero-shot classification script

skills/llm-clip/classify.py

CLI script: runs llm similar for each label, aggregates scores, outputs a table or CSV of image → best label.

uv run skills/llm-clip/classify.py artifacts "birth record" "census" "photograph"
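The aggregation step — picking the best label per image from per-label scores — can be sketched like this (`best_labels` and the demo data are illustrative; classify.py's actual internals may differ):

```python
def best_labels(scores: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """Map {image_id: {label: score}} to {image_id: (best_label, best_score)}."""
    return {
        image_id: max(per_label.items(), key=lambda kv: kv[1])
        for image_id, per_label in scores.items()
    }

demo = {
    "img-001": {"birth record": 0.31, "census": 0.22, "photograph": 0.19},
    "img-002": {"birth record": 0.18, "census": 0.17, "photograph": 0.30},
}
for image_id, (label, score) in best_labels(demo).items():
    print(f"{image_id}\t{label}\t{score:.2f}")
```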

explore.py — marimo notebook

Pipeline:

  1. Read all 125 CLIP embeddings directly from SQLite (fast, no model load)
  2. Embed each text label via subprocess llm embed -m clip -c "..." (~2s/label)
  3. Compute cosine similarity in numpy
  4. Build polars DataFrame: rows = images, columns = per-label scores
  5. Render ParallelCoordinates — brush axes to filter images interactively
  6. Show filtered image IDs below the plot
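Step 3 is the same norm-guarded cosine similarity the notebook uses; isolated, it looks like this:

```python
import numpy as np

def cosine_sim(vecs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of vecs (N, D) and query (D,),
    guarding against division by zero for zero-norm rows."""
    norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(query)
    return (vecs @ query) / np.where(norms == 0, 1, norms)

vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
print(cosine_sim(vecs, query))  # [1.0, 0.0, ~0.707]
```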

Issues & Fixes

  • polars not in initial uv add — all marimo cells failed with internal errors. Fix: uv add polars.
  • Marimo doesn't auto-open browser in WSL — open http://localhost:2718 manually.
  • Each llm embed subprocess reloads CLIP (~2s each), so 7 labels take ~15s total. Acceptable for now; future optimization: embed all labels in one subprocess call.
  • model_id column doesn't exist in collections table — correct column is model.
  • "Failed to update model / invalid server token" popup: browser localStorage holds a stale token from a previous marimo session. Fix: run with a fixed token password so it's stable across restarts:
    uv run marimo run explore.py --token-password classify
    
    Access at http://localhost:2718?access_token=classify.

Next Steps

  • Show image thumbnails for the filtered selection (need image file paths)
  • Speed up label embedding: one subprocess call, newline-separated labels
  • Try color_by on the highest-scoring label column
import marimo

__generated_with = "0.20.3"

app = marimo.App(width="full")


@app.cell
def _():
    import marimo as mo
    import polars as pl
    import numpy as np
    import sqlite3
    import struct
    import subprocess
    import json
    from pathlib import Path
    return Path, json, mo, np, pl, sqlite3, struct, subprocess
@app.cell
def _(Path, mo, sqlite3):
    # Use Path.home() rather than a literal "$HOME" string,
    # which Python would not expand
    DB = str(Path.home() / ".config/io.datasette.llm/embeddings.db")
    _con = sqlite3.connect(DB)
    _clip_collections = [
        row[0]
        for row in _con.execute(
            "SELECT name FROM collections WHERE model='clip' ORDER BY name"
        )
    ]
    collection_dropdown = mo.ui.dropdown(
        options=_clip_collections,
        value=_clip_collections[0],
        label="Collection",
    )
    collection_dropdown
    return DB, collection_dropdown
@app.cell
def _(mo):
    labels_input = mo.ui.text_area(
        value=(
            "birth record\n"
            "death record\n"
            "census record\n"
            "marriage record\n"
            "photograph\n"
            "printed document\n"
            "handwritten document"
        ),
        label="Labels (one per line)",
        rows=8,
    )
    labels_input
    return (labels_input,)
@app.cell
def _(labels_input):
    labels = [l.strip() for l in labels_input.value.splitlines() if l.strip()]
    labels
    return (labels,)
@app.cell
def _(DB, collection_dropdown, np, sqlite3, struct):
    def load_embeddings(collection):
        con = sqlite3.connect(DB)
        cid = con.execute(
            "SELECT id FROM collections WHERE name=?", (collection,)
        ).fetchone()[0]
        rows = con.execute(
            "SELECT id, embedding FROM embeddings WHERE collection_id=?", (cid,)
        ).fetchall()
        ids = [r[0] for r in rows]
        # Each embedding is a packed float32 blob: 4 bytes per value
        vecs = np.array([
            struct.unpack(f"{len(r[1]) // 4}f", r[1]) for r in rows
        ])
        return ids, vecs

    artifact_ids, artifact_vecs = load_embeddings(collection_dropdown.value)
    print(f"Loaded {len(artifact_ids)} embeddings, shape {artifact_vecs.shape}")
    return artifact_ids, artifact_vecs, load_embeddings
@app.cell
def _(artifact_vecs, json, labels, np, subprocess):
    # Embed each label text via the llm CLI, then compute cosine similarity
    def embed_text(text):
        result = subprocess.run(
            ["llm", "embed", "-m", "clip", "-c", text],
            capture_output=True, text=True,
        )
        return np.array(json.loads(result.stdout.strip()), dtype=np.float32)

    def cosine_sim(vecs, query):
        # vecs: (N, D), query: (D,)
        norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(query)
        return (vecs @ query) / np.where(norms == 0, 1, norms)

    scores = {}
    for _label in labels:
        scores[_label] = cosine_sim(artifact_vecs, embed_text(_label))
    print(f"Scored {len(labels)} labels against {artifact_vecs.shape[0]} images")
    return cosine_sim, embed_text, scores
@app.cell
def _(artifact_ids, labels, mo, pl, scores):
    from wigglystuff import ParallelCoordinates

    score_data = {"id": artifact_ids}
    for _label in labels:
        score_data[_label] = scores[_label].tolist()
    df = pl.DataFrame(score_data)
    widget = mo.ui.anywidget(
        ParallelCoordinates(df.drop("id"), height=360, width=1200)
    )
    widget
    return ParallelCoordinates, df, score_data, widget
@app.cell
def _(df, mo, widget):
    filtered_ids = df["id"][widget.filtered_indices].to_list()
    mo.md(
        f"**{len(filtered_ids)} / {len(df)} images selected**\n\n"
        + "\n".join(f"- `{id_}`" for id_ in filtered_ids[:20])
        + ("\n- ..." if len(filtered_ids) > 20 else "")
    )
    return (filtered_ids,)