Doctor Shotgun (DocShotgun)
@DocShotgun
DocShotgun / llamacpp-moe-offload-guide.md
Last active January 20, 2026 10:17
Guide to optimizing inference performance of large MoE models across CPU+GPU using llama.cpp and its derivatives

Mixture-of-experts CPUmaxxing with GPU acceleration in llama.cpp

Introduction

So you want to try one of those fancy huge mixture-of-experts (MoE) models locally? Well, whether you've got a gaming PC or a large multi-GPU workstation, we've got you covered. As long as you've downloaded enough RAM beforehand.

Anatomy of a MoE Model

MoE models are described in terms of their total parameters and active parameters - e.g. DeepSeek V3 671B A37B has 671B total parameters, but only about 37B of them are active during each forward pass through the model.
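To make the total-vs-active distinction concrete, here's a rough back-of-the-envelope sketch. The function names and the bytes-per-parameter figure are illustrative assumptions, not values from the guide:

```python
def moe_memory_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory (GB) for a model at a given quantization.
    All weights must be resident, so this scales with TOTAL parameters.
    bytes_per_param is illustrative, e.g. ~0.5 for a ~4-bit quant."""
    return total_params_b * bytes_per_param

def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Fraction of weights actually touched per token: compute (and per-token
    memory bandwidth) scales with ACTIVE parameters, not total."""
    return active_params_b / total_params_b

# DeepSeek V3: 671B total, ~37B active per token
print(round(moe_memory_gb(671, 0.5), 1))    # ~335.5 GB of weights at ~4 bits/param
print(round(active_fraction(37, 671), 3))   # only ~5.5% of weights used per token
```

This asymmetry is why CPU+GPU hybrid inference works well for MoE: the full weight set needs capacity (cheap system RAM), but each token only streams a small fraction of it.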

@DocShotgun
DocShotgun / disable-numa-balancing.sh
Created January 14, 2026 18:32
Temporarily disable system-wide NUMA balancing for the duration of execution of a command
#!/usr/bin/env bash
set -euo pipefail
NUMA_KEY="kernel.numa_balancing"
if [[ $# -eq 0 ]]; then
  echo "Usage: $0 <command> [args...]" >&2
  exit 1
fi
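The preview above cuts off before the save/set/restore logic, but the pattern the description implies can be sketched in Python as a context manager. This is a hypothetical helper, not part of the gist; writing to `/proc/sys/kernel/numa_balancing` requires root, so the path is parameterized here to allow a dry run against a scratch file:

```python
import contextlib
from pathlib import Path

@contextlib.contextmanager
def sysctl_override(path: str, value: str):
    """Temporarily write `value` to a sysctl file, restoring the original
    contents on exit - the same save/set/restore pattern a shell script
    would implement with a trap. Real use: path='/proc/sys/kernel/numa_balancing',
    value='0' (needs root)."""
    p = Path(path)
    original = p.read_text()
    p.write_text(value)
    try:
        yield
    finally:
        # Restore even if the wrapped command fails
        p.write_text(original)
```

Usage would look like `with sysctl_override("/proc/sys/kernel/numa_balancing", "0"): subprocess.run(cmd)`, mirroring the gist's "for the duration of execution of a command" behavior.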
@DocShotgun
DocShotgun / numactl-bind-socket.sh
Created January 14, 2026 18:31
Run a command bound to a specific socket's CPUs and memory
#!/bin/bash
# numactl-bind-socket.sh — Run a command bound to a specific socket's CPUs and memory
# Usage:
# ./numactl-bind-socket.sh --socket <id> --mode <physical|all> [--interleave <on|off>] <command> [args...]
set -euo pipefail
usage() {
  echo "Usage: $0 --socket <id> --mode <physical|all> [--interleave <on|off>] <command> [args...]"
  exit 1
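The numactl invocation this script wraps can be sketched as a small argv builder. This is a hypothetical helper for illustration; `--cpunodebind`, `--membind`, and `--interleave=all` are standard numactl flags, but the full script's `physical|all` core-selection logic is omitted:

```python
def numactl_argv(socket: int, interleave: bool = False) -> list[str]:
    """Build a numactl command line pinning a process to one NUMA node.
    --cpunodebind pins CPU scheduling to the socket's cores; memory is
    either bound to the same node (--membind, best locality) or spread
    across all nodes (--interleave=all, more usable bandwidth/capacity)."""
    argv = ["numactl", f"--cpunodebind={socket}"]
    if interleave:
        argv.append("--interleave=all")
    else:
        argv.append(f"--membind={socket}")
    return argv

print(numactl_argv(1))  # ['numactl', '--cpunodebind=1', '--membind=1']
```

The resulting list would be prepended to the wrapped command, e.g. `subprocess.run(numactl_argv(1) + ["llama-server", ...])`.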
@DocShotgun
DocShotgun / axolotl_ROCm_setup_v2.sh
Created September 16, 2024 20:11
Bash script to setup axolotl+FA2+BnB+liger-kernel on Runpod MI300X
#!/bin/bash
# Setup Axolotl with FA2 and BnB ROCm - doctorshotgun Sept 16, 2024
# Runpod image: RunPod Pytorch 2.1.2 ROCm 6.1 runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04
# Install torch and flash-attn
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/rocm6.1
#pip install https://download.pytorch.org/whl/nightly/pytorch_triton_rocm-3.0.0%2Bdafe145982-cp310-cp310-linux_x86_64.whl
pip install https://github.com/DocShotgun/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+rocm6.1+torch2.4.0-cp310-cp310-linux_x86_64.whl
@DocShotgun
DocShotgun / jank_convert_novel_to_markdown.py
Last active September 1, 2023 03:22
Scuffed script to convert novel style roleplay text into markdown format
import sys
read_file = open(sys.argv[1],"r")
roleplay_text = read_file.read()
read_file.close()
def convert_novel_to_md(text):
    md = ""
    if '"' in text:
        parts = text.split('"')
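The preview truncates before the conversion logic. One plausible completion (hypothetical, not the gist's actual code): novel-style roleplay keeps dialogue in double quotes with plain narration, while markdown style wraps narration in *asterisks*. Splitting on `"` puts narration at even indices and dialogue at odd ones:

```python
def convert_novel_to_md(text: str) -> str:
    """Hypothetical sketch: wrap narration in *asterisks*, keep dialogue
    quoted. Assumes well-paired double quotes in the input."""
    if '"' not in text:
        stripped = text.strip()
        return f"*{stripped}*" if stripped else text
    out = []
    for i, part in enumerate(text.split('"')):
        part = part.strip()
        if not part:
            continue
        # even index = narration, odd index = dialogue
        out.append(f"*{part}*" if i % 2 == 0 else f'"{part}"')
    return " ".join(out)

print(convert_novel_to_md('She waved. "Hello there."'))  # *She waved.* "Hello there."
```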
@DocShotgun
DocShotgun / jank_convert_markdown_to_novel.py
Last active September 1, 2023 03:21
Scuffed script to convert markdown roleplay text into novel format
import sys
read_file = open(sys.argv[1],"r")
roleplay_text = read_file.read()
read_file.close()
def convert_md_to_novel(text):
    novel = ""
    if '*' in text:
        parts = text.split('*')
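As with its companion gist, the preview ends before the conversion logic. A hypothetical sketch of the inverse direction: markdown style wraps narration in *asterisks*, and novel style simply drops them, leaving plain narration next to quoted dialogue. Splitting on `*` puts the emphasized narration at odd indices, but since every part is kept, stripping the delimiters suffices:

```python
def convert_md_to_novel(text: str) -> str:
    """Hypothetical sketch: drop *asterisk* emphasis so narration reads as
    plain novel prose. Assumes well-paired asterisks in the input."""
    if '*' not in text:
        return text
    # Splitting on '*' removes the emphasis markers; rejoin the remaining
    # narration and dialogue fragments with single spaces.
    parts = [p.strip() for p in text.split('*')]
    return " ".join(p for p in parts if p)

print(convert_md_to_novel('*She waved.* "Hello there."'))  # She waved. "Hello there."
```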