Explore zero-shot image classification using CLIP embeddings already stored in the
`artifacts` llm collection (125 images), then build an interactive visualization
to explore the results.
- Plugin: `llm-clip` v0.1, installed in llm's uv tool environment
- Model: `clip-ViT-B-32` via sentence-transformers
- Embeddings: 512-dimensional float32, stored as blobs in `~/.config/io.datasette.llm/embeddings.db`
- Both text and binary (image) input supported; they share the same embedding space
- Output of `llm embed -m clip -c "text"` is a JSON array
- `llm embed-multi` with `--files <dir> <glob> --binary -m clip` embeds whole directories of images
- Incremental: skips already-stored IDs on re-run
- `--prefix` namespaces IDs across batches
CLIP's shared text/image embedding space means you can query an image collection
with plain text labels via `llm similar artifacts -c "birth record"`. Score ranges:
- ~0.18 = no match
- ~0.29–0.33 = reasonable match
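The score bands above can be wrapped in a small triage helper. This is a sketch using the notes' empirical thresholds (~0.18 = no match, ~0.29+ = reasonable match); the "borderline" band and the exact cut-offs are assumptions for this collection, not universal CLIP constants:

```python
def interpret_clip_score(score: float) -> str:
    """Rough triage of a CLIP text-image cosine score.

    Cut-offs follow the empirical bands observed for this collection
    (assumed values, not calibrated constants).
    """
    if score >= 0.29:
        return "reasonable match"
    if score > 0.20:
        return "borderline"
    return "no match"
```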
- `ParallelCoordinates(df, height=360, width=1200)`: axes = columns, lines = rows
- Wrap with `mo.ui.anywidget(...)` in marimo
- `widget.filtered_indices` gives row indices of the currently brushed selection
- Source page: https://koaning.github.io/wigglystuff/examples/parallelcoords/
Reference for three commands:
- `/llm-clip embed`: batch-embed images with `llm embed-multi`
- `/llm-clip search`: text query against an image collection
- `/llm-clip classify`: zero-shot classification script
CLI script: runs `llm similar` for each label, aggregates the scores, and outputs a
table or CSV of image → best label.
`uv run skills/llm-clip/classify.py artifacts "birth record" "census" "photograph"`
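A minimal sketch of what such a classify script can do: query the collection once per label with `llm similar`, then keep the best-scoring label per image. It assumes `llm similar` emits newline-delimited JSON objects with `id` and `score` keys; verify that against your llm version before relying on it.

```python
import json
import subprocess


def similar_scores(collection: str, label: str, n: int = 1000) -> dict[str, float]:
    """Query the collection with a text label via `llm similar`.

    Assumes newline-delimited JSON output with "id" and "score" keys
    (check your llm version's actual output format).
    """
    out = subprocess.run(
        ["llm", "similar", collection, "-c", label, "-n", str(n)],
        capture_output=True, text=True, check=True,
    ).stdout
    return {row["id"]: row["score"] for row in map(json.loads, out.splitlines())}


def best_labels(per_label: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """Aggregate {label: {image_id: score}} into image_id -> (best label, score)."""
    best: dict[str, tuple[str, float]] = {}
    for label, scores in per_label.items():
        for image_id, score in scores.items():
            if image_id not in best or score > best[image_id][1]:
                best[image_id] = (label, score)
    return best
```

Keeping the aggregation in a pure function (`best_labels`) makes it testable without the CLI in the loop.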
Pipeline:
- Read all 125 CLIP embeddings directly from SQLite (fast, no model load)
- Embed each text label via subprocess `llm embed -m clip -c "..."` (~2s/label)
- Compute cosine similarity in numpy
- Build a polars DataFrame: rows = images, columns = per-label scores
- Render `ParallelCoordinates`; brush axes to filter images interactively
- Show filtered image IDs below the plot
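The first three pipeline steps can be sketched as below. Decoding the float32 blobs with `np.frombuffer` matches how the embeddings are stored; the table and column names in the query (`embeddings`, `collections`, `collection_id`, `embedding`) are assumptions about llm's schema, so inspect the database with `.schema` if the query fails.

```python
import sqlite3
from pathlib import Path

import numpy as np

DB = Path.home() / ".config/io.datasette.llm/embeddings.db"


def load_embeddings(db_path: Path, collection: str) -> tuple[list[str], np.ndarray]:
    """Read CLIP vectors straight from SQLite; no model load needed."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """
        SELECT e.id, e.embedding FROM embeddings e
        JOIN collections c ON c.id = e.collection_id
        WHERE c.name = ?
        """,
        (collection,),
    ).fetchall()
    ids = [r[0] for r in rows]
    # Each blob is a packed array of 512 float32 values.
    vecs = np.array([np.frombuffer(r[1], dtype=np.float32) for r in rows])
    return ids, vecs


def cosine_matrix(images: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Cosine similarity between every image vector and every label vector."""
    a = images / np.linalg.norm(images, axis=1, keepdims=True)
    b = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    return a @ b.T  # shape: (n_images, n_labels)
```

The resulting matrix feeds directly into the polars DataFrame: one row per image, one column per label.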
- `polars` was not in the initial `uv add`, so every marimo cell failed with internal errors. Fix: `uv add polars`.
- Marimo doesn't auto-open a browser in WSL; open http://localhost:2718 manually.
- Each `llm embed` subprocess reloads CLIP (~2s each), so 7 labels take ~15s total. Acceptable for now; future optimization: embed all labels in one subprocess call.
- The `model_id` column doesn't exist in the `collections` table; the correct column is `model`.
- "Failed to update model / invalid server token" popup: browser localStorage holds a stale token from a previous marimo session. Fix: run with a fixed token password so it's stable across restarts: `uv run marimo run explore.py --token-password classify`, then access http://localhost:2718?access_token=classify.
- Show image thumbnails for the filtered selection (need image file paths)
- Speed up label embedding: one subprocess call, newline-separated labels
- Try `color_by` on the highest-scoring label column