With QMD using ~3GB, you'd have ~5GB free.
| Model | VRAM | Quality | Fits alongside QMD? |
|---|---|---|---|
| Whisper small | ~2 GB | Good for clear audio | Yes, easily |
| Whisper medium | ~5 GB | Great, near-SOTA | Tight but yes |
| faster-whisper small | ~1.5 GB | Same quality as small, ~4x faster | Yes, best pick |
| faster-whisper medium | ~3-4 GB | Excellent | Yes |
| Whisper large-v3 | ~10 GB | Best | No, won't fit |
faster-whisper (the CTranslate2 backend) is the move: int8 quantization makes it roughly 4x faster than vanilla Whisper while using less VRAM. The medium model gives you excellent transcription quality in ~3-4GB alongside QMD.
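A minimal sketch of what that looks like with the faster-whisper Python API. The function name and audio path are my own; the key knob is `compute_type="int8"`, which is what keeps medium in the ~3-4GB range (`device="cuda"` assumes the 8GB card is your only GPU):

```python
def transcribe(audio_path: str, model_size: str = "medium") -> str:
    """Transcribe an audio file with faster-whisper using int8 weights."""
    # Imported lazily so this sketch loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    # int8 quantization is what keeps the medium model in the ~3-4GB range.
    model = WhisperModel(model_size, device="cuda", compute_type="int8")

    # transcribe() returns a lazy generator of segments; join them into one string.
    segments, info = model.transcribe(audio_path)
    return " ".join(seg.text.strip() for seg in segments)
```

Install with `pip install faster-whisper`; for a quick CPU test, swap `device="cpu"` and keep `compute_type="int8"`.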
| Model | VRAM (4-bit) | Quality | Fits? |
|---|---|---|---|
| Qwen2.5-1.5B | ~1.5 GB | Decent for short summaries | Yes |
| Phi-3-mini (3.8B) | ~2.5 GB | Good | Yes |
| Gemma 2 2B | ~2 GB | Good | Yes |
| Llama 3.1 8B | ~5 GB | Great | Only if QMD's models are unloaded |
QMD (~3GB) + faster-whisper medium (~3.5GB) = ~6.5GB, leaving ~1.5GB of headroom on an 8GB card. You could transcribe audio and search your notes on the same card simultaneously.
If you want transcription + summarization at once, swap to faster-whisper small (~1.5GB) and add Qwen2.5-1.5B (~1.5GB): QMD (3) + whisper (1.5) + summarizer (1.5) = 6GB. Totally doable.
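The budgets above are simple enough to sanity-check in code. A tiny helper (the function name and figures are just the estimates from the tables, not anything QMD provides) makes it easy to try other combinations:

```python
VRAM_GB = 8.0  # total VRAM on the card

def fits(*loads_gb: float, total: float = VRAM_GB) -> bool:
    """Return True if the summed model footprints fit in total VRAM."""
    return sum(loads_gb) <= total

# Transcription-only: QMD (~3GB) + faster-whisper medium (~3.5GB)
print(fits(3.0, 3.5))        # True: 6.5GB of 8GB
# Transcription + summarization: QMD + faster-whisper small + Qwen2.5-1.5B
print(fits(3.0, 1.5, 1.5))   # True: 6.0GB of 8GB
# Whisper large-v3 alongside QMD does not fit
print(fits(3.0, 10.0))       # False: 13GB > 8GB
```

In practice leave a few hundred MB of slack beyond these sums for the CUDA context and activation buffers.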