@rsbohn
Created March 4, 2026 19:46
Combining llm-clip with Parallel Coordinates

2026-03-03 — Wiggles

Goal

Explore zero-shot image classification using CLIP embeddings already stored in the artifacts llm collection (125 images), then build an interactive visualization to explore the results.

What We Learned

llm-clip

  • Plugin: llm-clip v0.1, installed in llm's uv tool environment
  • Model: clip-ViT-B-32 via sentence-transformers
  • Embeddings: 512-dimensional float32, stored as blobs in ~/.config/io.datasette.llm/embeddings.db
  • Both text and binary (image) input supported — same embedding space
  • Output of llm embed -m clip -c "text" is a JSON array
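Since the embeddings are stored as raw float32 blobs, they can be decoded without loading any model. A minimal sketch (the `decode_embedding` helper is hypothetical, not part of llm-clip; it assumes the little-endian float32 packing described above):

```python
import struct
import numpy as np

def decode_embedding(blob: bytes) -> np.ndarray:
    """Decode a packed little-endian float32 blob into a numpy vector."""
    n = len(blob) // 4  # 4 bytes per float32
    return np.array(struct.unpack(f"<{n}f", blob), dtype=np.float32)

# Round-trip check with a fake 512-dimensional embedding
vec = np.random.rand(512).astype(np.float32)
decoded = decode_embedding(vec.tobytes())
print(decoded.shape)  # (512,)
```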

llm embed-multi

  • --files <dir> <glob> --binary -m clip embeds whole directories of images
  • Incremental: skips already-stored IDs on re-run
  • --prefix namespaces IDs across batches

Zero-shot classification trick

CLIP's shared text/image embedding space means you can query an image collection with plain text labels via llm similar artifacts -c "birth record". Score ranges:

  • ~0.18 = no match
  • ~0.29–0.33 = reasonable match
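Those ranges suggest a simple interpretation rule. A sketch, with the caveat that the thresholds come from the 125-image collection above and are not universal CLIP constants (`interpret_score` is a hypothetical helper):

```python
def interpret_score(score: float) -> str:
    """Rough interpretation of a CLIP text-vs-image cosine score.
    Thresholds are empirical, taken from the ranges observed above."""
    if score >= 0.29:
        return "reasonable match"
    if score <= 0.20:
        return "no match"
    return "uncertain"

print(interpret_score(0.31))  # reasonable match
print(interpret_score(0.18))  # no match
```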

Parallel coordinates via wigglystuff

What We Built

skills/llm-clip/SKILL.md

Reference for three commands:

  • /llm-clip embed — batch-embed images with llm embed-multi
  • /llm-clip search — text query against image collection
  • /llm-clip classify — zero-shot classification script

skills/llm-clip/classify.py

CLI script: runs llm similar for each label, aggregates scores, outputs a table or CSV of image → best label.

uv run skills/llm-clip/classify.py artifacts "birth record" "census" "photograph"
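The aggregation step — picking the best label per image from per-label scores — can be sketched like this (`best_labels` and the demo data are illustrative; classify.py's actual internals may differ):

```python
def best_labels(scores: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """Map {image_id: {label: score}} to {image_id: (best_label, best_score)}."""
    return {
        image_id: max(per_label.items(), key=lambda kv: kv[1])
        for image_id, per_label in scores.items()
    }

demo = {
    "img-001": {"birth record": 0.31, "census": 0.22, "photograph": 0.19},
    "img-002": {"birth record": 0.18, "census": 0.17, "photograph": 0.30},
}
for image_id, (label, score) in best_labels(demo).items():
    print(f"{image_id}\t{label}\t{score:.2f}")
```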

explore.py — marimo notebook

Pipeline:

  1. Read all 125 CLIP embeddings directly from SQLite (fast, no model load)
  2. Embed each text label via subprocess llm embed -m clip -c "..." (~2s/label)
  3. Compute cosine similarity in numpy
  4. Build polars DataFrame: rows = images, columns = per-label scores
  5. Render ParallelCoordinates — brush axes to filter images interactively
  6. Show filtered image IDs below the plot
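Step 3 is the same norm-guarded cosine similarity the notebook uses; isolated, it looks like this:

```python
import numpy as np

def cosine_sim(vecs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of vecs (N, D) and query (D,),
    guarding against division by zero for zero-norm rows."""
    norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(query)
    return (vecs @ query) / np.where(norms == 0, 1, norms)

vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
print(cosine_sim(vecs, query))  # [1.0, 0.0, ~0.707]
```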

Issues & Fixes

  • polars not in initial uv add — all marimo cells failed with internal errors. Fix: uv add polars.
  • Marimo doesn't auto-open browser in WSL — open http://localhost:2718 manually.
  • Each llm embed subprocess reloads CLIP (~2s each), so 7 labels take ~15s total. Acceptable for now; future optimization: embed all labels in one subprocess call.
  • model_id column doesn't exist in collections table — correct column is model.
  • "Failed to update model / invalid server token" popup: browser localStorage holds a stale token from a previous marimo session. Fix: run with a fixed token password so it's stable across restarts:
    uv run marimo run explore.py --token-password classify
    
    Access at http://localhost:2718?access_token=classify.

Next Steps

  • Show image thumbnails for the filtered selection (need image file paths)
  • Speed up label embedding: one subprocess call, newline-separated labels
  • Try color_by on the highest-scoring label column
import marimo

__generated_with = "0.20.3"

app = marimo.App(width="full")


@app.cell
def _():
    import marimo as mo
    import polars as pl
    import numpy as np
    import sqlite3
    import struct
    import subprocess
    import json
    from pathlib import Path
    return Path, json, mo, np, pl, sqlite3, struct, subprocess
@app.cell
def _(Path, mo, sqlite3):
    # Use Path.home() rather than a literal "$HOME" string,
    # which Python would not expand
    DB = str(Path.home() / ".config/io.datasette.llm/embeddings.db")
    _con = sqlite3.connect(DB)
    _clip_collections = [
        row[0]
        for row in _con.execute(
            "SELECT name FROM collections WHERE model='clip' ORDER BY name"
        )
    ]
    collection_dropdown = mo.ui.dropdown(
        options=_clip_collections,
        value=_clip_collections[0],
        label="Collection",
    )
    collection_dropdown
    return DB, collection_dropdown
@app.cell
def _(mo):
    labels_input = mo.ui.text_area(
        value=(
            "birth record\n"
            "death record\n"
            "census record\n"
            "marriage record\n"
            "photograph\n"
            "printed document\n"
            "handwritten document"
        ),
        label="Labels (one per line)",
        rows=8,
    )
    labels_input
    return (labels_input,)
@app.cell
def _(labels_input):
    labels = [l.strip() for l in labels_input.value.splitlines() if l.strip()]
    labels
    return (labels,)
@app.cell
def _(DB, collection_dropdown, np, sqlite3, struct):
    def load_embeddings(collection):
        con = sqlite3.connect(DB)
        cid = con.execute(
            "SELECT id FROM collections WHERE name=?", (collection,)
        ).fetchone()[0]
        rows = con.execute(
            "SELECT id, embedding FROM embeddings WHERE collection_id=?", (cid,)
        ).fetchall()
        ids = [r[0] for r in rows]
        # Each embedding is a packed float32 blob: 4 bytes per value
        vecs = np.array([
            struct.unpack(f"{len(r[1]) // 4}f", r[1]) for r in rows
        ])
        return ids, vecs

    artifact_ids, artifact_vecs = load_embeddings(collection_dropdown.value)
    print(f"Loaded {len(artifact_ids)} embeddings, shape {artifact_vecs.shape}")
    return artifact_ids, artifact_vecs, load_embeddings
@app.cell
def _(artifact_vecs, json, labels, np, subprocess):
    # Embed each label text via the llm CLI, then compute cosine similarity
    def embed_text(text):
        result = subprocess.run(
            ["llm", "embed", "-m", "clip", "-c", text],
            capture_output=True, text=True,
        )
        return np.array(json.loads(result.stdout.strip()), dtype=np.float32)

    def cosine_sim(vecs, query):
        # vecs: (N, D), query: (D,)
        norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(query)
        return (vecs @ query) / np.where(norms == 0, 1, norms)

    scores = {}
    for _label in labels:
        scores[_label] = cosine_sim(artifact_vecs, embed_text(_label))
    print(f"Scored {len(labels)} labels against {artifact_vecs.shape[0]} images")
    return cosine_sim, embed_text, scores
@app.cell
def _(artifact_ids, labels, mo, pl, scores):
    from wigglystuff import ParallelCoordinates

    score_data = {"id": artifact_ids}
    for _label in labels:
        score_data[_label] = scores[_label].tolist()
    df = pl.DataFrame(score_data)
    widget = mo.ui.anywidget(
        ParallelCoordinates(df.drop("id"), height=360, width=1200)
    )
    widget
    return ParallelCoordinates, df, score_data, widget
@app.cell
def _(df, mo, widget):
    filtered_ids = df["id"][widget.filtered_indices].to_list()
    mo.md(
        f"**{len(filtered_ids)} / {len(df)} images selected**\n\n"
        + "\n".join(f"- `{id_}`" for id_ in filtered_ids[:20])
        + ("\n- ..." if len(filtered_ids) > 20 else "")
    )
    return (filtered_ids,)