@possibilities
Created March 9, 2026 18:07
# Models Alongside QMD on an RTX 3060 Ti (8 GB)

With QMD using ~3 GB, you'd have ~5 GB free.

## Audio Transcription

| Model | VRAM | Quality | Fits alongside QMD? |
|---|---|---|---|
| Whisper small | ~2 GB | Good for clear audio | Yes, easily |
| Whisper medium | ~5 GB | Great, near-SOTA | Tight but yes |
| faster-whisper small | ~1.5 GB | Same quality, 4× faster | Yes, best pick |
| faster-whisper medium | ~3–4 GB | Excellent | Yes |
| Whisper large-v3 | ~10 GB | Best | No, won't fit |

faster-whisper (CTranslate2 backend) is the move: it uses int8 quantization, runs about 4× faster than vanilla Whisper, and needs less VRAM. The medium model gives you excellent transcription quality in ~3–4 GB alongside QMD.
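A minimal sketch of what that looks like, assuming `pip install faster-whisper` and a CUDA-capable card (the audio file name is a hypothetical placeholder):

```python
"""Sketch: int8 transcription with faster-whisper.

Assumes the faster-whisper package is installed and a CUDA GPU is
available; the import is guarded so this reads as a sketch otherwise.
"""
try:
    from faster_whisper import WhisperModel
except ImportError:  # library not installed; sketch only
    WhisperModel = None


def transcribe(path: str, size: str = "medium") -> str:
    # compute_type="int8" is the quantization that keeps the medium
    # model in roughly 3-4 GB of VRAM instead of ~5 GB at float16
    model = WhisperModel(size, device="cuda", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=5)
    return " ".join(seg.text.strip() for seg in segments)
```

If the card is also holding QMD and things get tight, passing `size="small"` drops the footprint to ~1.5 GB at the cost of some accuracy.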

## Summarization

| Model | VRAM (4-bit) | Quality | Fits? |
|---|---|---|---|
| Qwen2.5-1.5B | ~1.5 GB | Decent for short summaries | Yes |
| Phi-3-mini (3.8B) | ~2.5 GB | Good | Yes |
| Gemma 2 2B | ~2 GB | Good | Yes |
| Llama 3.1 8B | ~5 GB | Great | Only if QMD models are unloaded |
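The VRAM column roughly follows from parameter count: at 4-bit quantization each weight takes ~0.5 bytes, plus overhead for the KV cache, activations, and runtime buffers. A back-of-the-envelope estimator (the 1.3× overhead factor is my assumption, not a measured number):

```python
def est_vram_gb(params_billions: float, bits: int = 4, overhead: float = 1.3) -> float:
    """Rough VRAM estimate: weights at `bits` per parameter, times an
    overhead factor for KV cache and activations (assumed, not measured)."""
    weight_gb = params_billions * 1e9 * (bits / 8) / 1e9
    return round(weight_gb * overhead, 1)


print(est_vram_gb(3.8))  # Phi-3-mini: ~2.5 GB, matching the table
print(est_vram_gb(8.0))  # Llama 3.1 8B: ~5.2 GB, close to the table's ~5 GB
print(est_vram_gb(1.5))  # Qwen2.5-1.5B: the naive estimate lands around 1 GB
```

Small models carry proportionally more fixed overhead (embeddings, cache), which is why the table's ~1.5 GB for Qwen2.5-1.5B runs higher than this naive estimate.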

## The sweet spot

QMD (~3 GB) + faster-whisper medium (~3.5 GB) = ~6.5 GB, which fits in 8 GB with headroom. You could transcribe audio and search your notes on the same card simultaneously.

If you want transcription + summarization at once, swap to faster-whisper small (~1.5 GB) and add Qwen2.5-1.5B (~1.5 GB): QMD (3) + Whisper (1.5) + summarizer (1.5) = 6 GB. Totally doable.
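The budgeting above is just addition against the card's 8 GB, but a tiny helper makes the headroom explicit (model names and sizes are the estimates from this note):

```python
CARD_GB = 8.0  # RTX 3060 Ti


def fits(loads: dict[str, float], card_gb: float = CARD_GB) -> tuple[bool, float]:
    """Return whether the combined models fit and how much VRAM is left."""
    used = sum(loads.values())
    return used <= card_gb, round(card_gb - used, 1)


# Transcription-only plan: QMD + faster-whisper medium
print(fits({"qmd": 3.0, "faster-whisper medium": 3.5}))  # (True, 1.5)

# Transcription + summarization plan
print(fits({"qmd": 3.0, "faster-whisper small": 1.5, "qwen2.5-1.5b": 1.5}))  # (True, 2.0)
```

These are estimates, not measurements: actual usage shifts with context length, batch size, and driver overhead, so leaving ~1 GB spare is prudent.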
