Fred Bliss fblissjr
fblissjr / plan_comparison_analysis.md
Created March 2, 2026 18:39
nndex perf-optimization: post-mortem and plan comparison analysis

Last updated: 2026-03-02

Report and Post-Mortem

Written by Claude, analysis and optimization strategy by Gemini Deep Think, implementation and modification by Claude

Summary

Three plans informed the fix for performance regressions on the perf-optimization branch. Gemini Deep Think v1 made the original optimizations -- batch search improved massively (60-64% Rust, 5.7x Python), but single-query APX regressed up to 383% and some exact cases regressed up to 58%. Gemini v2 was a revised plan after the regressions surfaced. Claude independently diagnosed the root causes and produced the final 6-step implementation plan.

The core story: Gemini Deep Think had the high-level strategy basically right for batch, but missed that its own Change 4 ("Remove ndarray allocations from ANN reranking") caused the APX problem. It didn't account for blas-accelerate being enabled in Python builds, which means ndarray::dot() dispatches to Apple Accelerate's AMX coprocessor -- not generic Rust matrixmultiply. Removing
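The batch-vs-single-query tradeoff described above can be illustrated with a minimal numpy sketch (an illustration, not code from the nndex repository): the batch win comes from replacing per-query vector products with one matrix-matrix multiply, which a BLAS backend such as Apple Accelerate executes far more efficiently, while a hand-rolled loop that bypasses the BLAS-backed dot is exactly the kind of change that can regress single-query latency.

```python
import numpy as np

# Sketch only: corpus of 1000 vectors, 8 queries, cosine-style scoring via dot.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 64)).astype(np.float32)
queries = rng.standard_normal((8, 64)).astype(np.float32)

# Single-query path: one BLAS-backed matrix-vector product per query.
per_query = np.stack([corpus @ q for q in queries])

# Batched path: one matrix-matrix product scores all queries at once.
batched = queries @ corpus.T

# Both paths produce identical scores; only the dispatch granularity differs.
assert np.allclose(per_query, batched, atol=1e-4)
```

The same principle applies on the Rust side: `ndarray::dot()` with `blas-accelerate` enabled dispatches to Accelerate, so replacing it with scalar loops trades away hardware-accelerated kernels.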

fblissjr / 4090build.sh
Last active February 5, 2026 14:28
rtx 4090 sm89 build script for various torch source compiles
#!/bin/bash
set -e
# --- Configuration ---
export TORCH_CUDA_ARCH_LIST="8.9"
export USE_CPP=1
export MAX_JOBS=$(nproc)
# Force CUDA 13.0 path if it exists, otherwise use default
if [ -d "/usr/local/cuda-13.0" ]; then
    export CUDA_HOME=/usr/local/cuda-13.0
    export PATH="$CUDA_HOME/bin:$PATH"
fi
fblissjr / claude-code-prompts.js
Created July 12, 2025 03:00 — forked from transitive-bullshit/claude-code-prompts.js
Unminified prompts and tool definitions for Claude Code
// Claude Code is a Beta product per Anthropic's Commercial Terms of Service.
// By using Claude Code, you agree that all code acceptance or rejection decisions you make,
// and the associated conversations in context, constitute Feedback under Anthropic's Commercial Terms,
// and may be used to improve Anthropic's products, including training models.
// You are responsible for reviewing any code suggestions before use.
// (c) Anthropic PBC. All rights reserved. Use is subject to Anthropic's Commercial Terms of Service (https://www.anthropic.com/legal/commercial-terms).
// Version: 0.2.9
fblissjr / macos_memory.sh
Last active June 24, 2025 18:14 — forked from ivanfioravanti/mlx_memory.sh
Script to set MLX memory limits
#!/usr/bin/env bash
# --- Configuration ---
declare -A PRESETS
PRESETS["legacy_default"]="85 75" # Wired Limit: 85%, LWM: 75% (Original script's behavior)
PRESETS["conservative"]="80" # Wired Limit: 80%, LWM: High (Ample RAM for OS)
PRESETS["moderate"]="90" # Wired Limit: 90%, LWM: High (Recommended for general MLX use)
PRESETS["aggressive"]="95" # Wired Limit: 95%, LWM: High (Pushes GPU memory usage)
PRESETS["mlx_tuned"]="92" # Wired Limit: 92%, LWM: High (Optimized for MLX heavy tasks)
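How a preset percentage becomes an actual limit can be sketched in Python (an assumption for illustration, not the gist's own code — the original fork ultimately passes a megabyte value to a wired-limit sysctl):

```python
# Mirror of the shell presets: (wired limit %, low-water mark % or None for "High").
PRESETS = {
    "legacy_default": (85, 75),
    "conservative": (80, None),
    "moderate": (90, None),
    "aggressive": (95, None),
    "mlx_tuned": (92, None),
}

def wired_limit_mb(total_ram_gb: int, preset: str) -> int:
    """Convert a preset's wired-limit percentage into megabytes of RAM."""
    pct, _lwm = PRESETS[preset]
    return total_ram_gb * 1024 * pct // 100
```

For example, a 128 GB machine on the "moderate" preset yields `wired_limit_mb(128, "moderate") == 117964` MB wired for the GPU, leaving roughly 13 GB for the OS.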
// ##################################################################################
// UNIFIED ADAPTIVE ENGINE - MULTI-PHASE EXECUTION PROTOCOL v1.1
// Target Fidelity: {required_fidelity_level} | Resource Profile: {active_resource_profile_name}
// Session ID: {session_guid} | Cycle ID: {execution_cycle_guid}
// Timestamp (Cycle Init): {iso_timestamp_initiation}
// Protocol Objective: End-to-end processing of Natural Language request through state configuration,
// detailed simulation, and finalized output generation within a single execution context.
// ##################################################################################
fblissjr / TailwindCSS_LLMs.txt
Created May 13, 2025 13:35 — forked from ConstantinHvber/TailwindCSS_LLMs.txt
AI generated LLMs.txt for the Tailwind CSS docs (April 21, 2025)
# Tailwind CSS LLMs.txt Documentation
> This document provides a comprehensive overview of Tailwind CSS utility classes, examples, and customization options. It covers various CSS properties like layout, spacing, typography, backgrounds, borders, effects, transitions, transforms, and more, explaining how to use Tailwind's utility classes to style web elements effectively and responsively.
It explains how Tailwind scans source files for classes, the importance of using complete class names, and how utility classes can be applied conditionally using variants for states (hover, focus), responsive breakpoints, dark mode, and other conditions. It also covers customization via theme variables and adding custom styles.
**Core Concepts (from styling-with-utility-classes.mdx & responsive-design.mdx):**
* **Utility-First:** Style elements by combining many single-purpose utility classes directly in HTML.
* **Constraint-Based:** Utilities general
fblissjr / parakeet-nemo-to-mlx.py
Created May 10, 2025 18:10 — forked from senstella/parakeet-nemo-to-mlx.py
A simple script to convert NeMo Parakeet weights to MLX.
import torch
from safetensors.torch import save_file
INPUT_NAME = "model_weights.ckpt"
OUTPUT_NAME = "model.safetensors"
state = torch.load(INPUT_NAME, map_location="cpu")
new_state = {}
for key, value in state.items():
    new_state[key] = value  # the original script's key renaming is truncated in this preview
save_file(new_state, OUTPUT_NAME)
fblissjr / calibration_data_v5_rc.txt
Created May 7, 2025 22:55 — forked from tristandruyen/calibration_data_v5_rc.txt
Adapted from bartowski's v3; added more languages for sparse MoE models like Qwen 57B-A14B. Calibration data provided by Dampf, combining his own efforts with Kalomaze's. Used for calibrating GGUF imatrix files.
===========
; A072257: a(n) = ((6*n-17)*4^n - 1)/3.
; -6,-15,-27,21,597,4437,25941,136533,677205,3233109,15029589,68506965,307582293,1364546901,5995058517,26127717717,113100805461,486762960213,2084490794325,8887718991189,37749899220309,159795689903445,674367131702613,2838206015165781,11915774014084437,49914895870022997,208666782734832981,870695927958295893,3626898899909039445,15084056351939581269,62642068416972019029,259791645704742851925,1076060070966390510933,4451814236455238456661,18397552756179659478357,75951394266153460520277,313250310030353132508501,1290780171984369691743573,5314236415389307413812565,21861408571364544242603349,89863485924687435319825749,369125350255666774676952405,1515187027250335232298407253,6215490613912013463556019541,25480932475290743991673640277,104399609979733736516492809557,427501960233217988265164232021,1749621922190004121857428903253,7156944013788545162616803513685,29261601355268295351215565657429,119581706621529640207855669040469,488468031287944396043396301804885,1994436944359
fblissjr / README.md
Created March 6, 2025 14:26 — forked from awni/README.md
Test Time Scaling with R1-based Models and MLX LM

Test Time Scaling with MLX LM and R1-based LLMs

Install MLX LM:

pip install mlx-lm

And run:

fblissjr / notes_on_sd_vae.md
Created September 5, 2024 21:35 — forked from madebyollin/notes_on_sd_vae.md
notes_on_sd_vae

Notes / Links about Stable Diffusion VAE

Stable Diffusion's VAE is a neural network that encodes images into a compressed "latent" format and decodes them back. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.
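The 48x figure follows directly from the shapes involved (a worked example: the SD VAE downsamples 8x per spatial dimension and widens to 4 latent channels):

```python
# A 512x512 RGB image maps to a 64x64x4 latent (512/8 = 64 per side).
image_elems = 512 * 512 * 3   # 786432 values in
latent_elems = 64 * 64 * 4    # 16384 values out
compression = image_elems / latent_elems
assert compression == 48.0
```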

(Calling this model a "VAE" is sort of a misnomer - it's an encoder with some very slight KL regularization, and a conditional GAN decoder)

This document is a big pile of various links with more info.