Dominik Weckmüller do-me

@do-me
do-me / parquet_one_liners.sh
Last active March 4, 2026 13:17
Useful Parquet one-liners: view and manipulate files with DuckDB
# view the first 10 rows
uvx duckdb -c "FROM 'results.parquet' LIMIT 10"
# add 2 new columns
uvx duckdb -c "COPY (SELECT *, NULL::INT AS dominik_label, NULL::VARCHAR AS dominik_comments FROM 'results.parquet') TO 'results.parquet'"
# remove 2 columns
uvx duckdb -c "COPY (SELECT * EXCLUDE (dominik_label, dominik_comments) FROM 'results.parquet') TO 'results.parquet'"
# sort by 2 columns
do-me / download.py
Created February 19, 2026 17:08
Download all EO metadata from WEkEO platform via API
# /// script
# dependencies = [
# "requests",
# "pandas",
# "pyarrow",
# "tqdm",
# ]
# ///
import requests
do-me / delete_files_in_hf_dataset.py
Created February 16, 2026 16:59
Delete wrongly prefixed files in a Hugging Face dataset
from huggingface_hub import HfApi, CommitOperationDelete
# Configure your repo details
repo_id = "user/reponame"
token = "your token"
api = HfApi(token=token)
# 1. List files specifically in the target folder
target_folder = "files/2026"
do-me / dir_size.sh
Created February 16, 2026 15:58
Get size of dir, number of files and average file size
read -r s c < <(find images -type f -printf '%s\n' 2>/dev/null | awk '{t+=$1} END{print t+0, NR}'); printf "Total size: %s\nFiles: %d\nAverage: %s\n" "$(numfmt --to=iec --suffix=B "$s")" "$c" "$(numfmt --to=iec --suffix=B $((c ? s/c : 0)))"
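For comparison, the same three numbers (total size, file count, average file size) can be computed in Python. This is a minimal sketch, not part of the gist: the function name `dir_stats` is hypothetical, and it returns raw byte counts rather than the IEC-formatted strings that `numfmt` prints.

```python
import os


def dir_stats(root):
    """Return (total_bytes, file_count, average_bytes) for regular files under root."""
    total = 0
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only and skip symlinks, similar to `find -type f`.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            total += os.path.getsize(path)
            count += 1
    # Integer division, matching the shell arithmetic $((s/c)); 0 when there are no files.
    average = total // count if count else 0
    return total, count, average


if __name__ == "__main__":
    # "images" is the directory used in the one-liner above.
    print(dir_stats("images"))
```

A missing or empty directory yields `(0, 0, 0)`, because `os.walk` simply produces nothing in that case.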
do-me / tmux.sh
Created February 16, 2026 11:57
Tmux cheat sheet
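# kill the tmux server and every session it manages
tmux kill-server
# start a new session named "mysession"
tmux new -s mysession
# inside a session: exit the shell (the session ends when its last pane closes)
exit
# kill a single session by name
tmux kill-session -t mysession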
do-me / app.py
Last active February 15, 2026 19:49 — forked from Maxxen/app.py
DuckDB Vector Tile Serve w/ Flask + MapLibre. Loads geoparquet file with spatial sampling and uv tooling. Run with uv run app.py
# /// script
# dependencies = [
# "duckdb",
# "flask"
# ]
# ///
import duckdb
import flask
import gzip
do-me / delete.py
Created February 12, 2026 09:40
Batch delete Parquet files at the root level of a Hugging Face dataset (when accidentally pushed), leaving everything else intact
from huggingface_hub import HfApi, CommitOperationDelete, RepoFile
# Configure your repo details
repo_id = "user/repo"
token = "your token" # Ensure your token has 'write' permissions
api = HfApi(token=token)
# 1. List files in the repo (non-recursive)
files = api.list_repo_tree(repo_id, repo_type="dataset")
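The preview stops before the selection step, so the following is only a sketch of what "Parquet files at the root level" could mean, not the author's code. It assumes plain path strings (for example, the `path` of each entry returned by `list_repo_tree`); the helper name `root_level_parquet` and the example paths are hypothetical.

```python
def root_level_parquet(paths):
    """Keep paths that sit at the repo root (no '/') and end in '.parquet'."""
    return [p for p in paths if "/" not in p and p.endswith(".parquet")]


# Hypothetical repo listing: only the two root-level Parquet files are selected.
example = ["train.parquet", "README.md", "data/part-0.parquet", "extra.parquet"]
print(root_level_parquet(example))  # ['train.parquet', 'extra.parquet']
```

Filtering on the path string keeps files in subfolders such as `data/` untouched, which is the stated goal of the script.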
do-me / query.sql
Created January 30, 2026 15:50
Query the AlphaEarth Embeddings tile index to find which tiles to download for your AOI. You can run it at https://demo.duckui.com/
INSTALL spatial;
LOAD spatial;
-- query for Milan, Italy
SELECT *
FROM 'https://data.source.coop/tge-labs/aef/v1/annual/aef_index.parquet'
WHERE wgs84_west <= 9.25
AND wgs84_east >= 9.10
AND wgs84_south <= 45.55
AND wgs84_north >= 45.40
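The four `WHERE` conditions amount to a bounding-box intersection test: a tile is kept when it overlaps the AOI on both axes. A minimal Python sketch of the same logic follows; the `BBox` type and the two tile extents are made up for illustration, and only the Milan AOI values come from the query.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def intersects(tile, aoi):
    """True when the two boxes overlap, mirroring the four SQL conditions."""
    return (
        tile.west <= aoi.east
        and tile.east >= aoi.west
        and tile.south <= aoi.north
        and tile.north >= aoi.south
    )


# AOI taken from the query: Milan, Italy.
milan = BBox(west=9.10, south=45.40, east=9.25, north=45.55)

# Hypothetical tile extents, for illustration only.
tile_hit = BBox(west=9.0, south=45.0, east=10.0, north=46.0)
tile_miss = BBox(west=10.0, south=45.0, east=11.0, north=46.0)

print(intersects(tile_hit, milan), intersects(tile_miss, milan))  # True False
```

To query a different AOI, replace the four constants in the SQL with that AOI's east, west, north, and south bounds, in that order.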
do-me / load_data.sql
Created January 28, 2026 13:35
Simple DuckUI query with DuckDB-Wasm against a Hugging Face dataset, with filters
SELECT
*
FROM 'hf://datasets/do-me/EUR-LEX/**/*.parquet'
WHERE
-- 1. Date filter (highly efficient for narrowing down files/rows)
CAST(date AS DATE) >= '2026-01-21'
-- 2. Your specific keywords (case-insensitive)
--AND regexp_matches(text, '(?i)copernicus|earth observation')
ORDER BY date DESC
do-me / clone.sh
Created January 27, 2026 16:14
Clone a repo from Hugging Face with the hf CLI via uvx, excluding one directory
rm -rf ~/.cache/huggingface/.gitignore.lock;
HF_HUB_READ_TIMEOUT=300 HF_HUB_HTTP_TIMEOUT=300 uvx hf download \
EuropeanParliament/Eurovoc_2025 \
--repo-type dataset \
--exclude "files/*" \
--local-dir .
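To get a feel for which files a pattern like `files/*` would skip, the sketch below applies Python's `fnmatch` to a made-up file list. It assumes the CLI matches exclude patterns fnmatch-style, which is not verified here; the helper name `kept_after_exclude` and the paths are hypothetical.

```python
from fnmatch import fnmatch


def kept_after_exclude(paths, pattern):
    """Return the paths that do NOT match the exclude pattern."""
    return [p for p in paths if not fnmatch(p, pattern)]


# Hypothetical repo listing, for illustration only.
paths = [
    "README.md",
    "files/2025/a.parquet",
    "files/b.parquet",
    "metadata/index.json",
]

# In fnmatch, `*` also matches '/', so nested files under files/ are excluded too.
print(kept_after_exclude(paths, "files/*"))  # ['README.md', 'metadata/index.json']
```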