With QMD using ~3GB, you'd have ~5GB free.
| Model | VRAM | Quality | Fits alongside QMD? |
|---|---|---|---|
| Whisper small | ~2 GB | Good for clear audio | Yes, easily |
| Whisper medium | ~5 GB | Great, near-SOTA | Tight but yes |
| faster-whisper small | ~1.5 GB | Same quality as small, ~4x faster | Yes, best pick |
| faster-whisper medium | ~3-4 GB | Excellent | Yes |
| Whisper large-v3 | ~10 GB | Best | No, won't fit |
faster-whisper (the CTranslate2 backend) is the move: int8 quantization makes it roughly 4x faster than vanilla Whisper while using less VRAM. The medium model gives you excellent transcription quality in ~3-4GB alongside QMD.
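A minimal sketch of what that looks like with the faster-whisper Python API. The function name and audio path are my own; the key knob is `compute_type="int8"`, which is what keeps medium in the ~3-4GB range (`device="cuda"` assumes the 8GB card is your only GPU):

```python
def transcribe(audio_path: str, model_size: str = "medium") -> str:
    """Transcribe an audio file with faster-whisper using int8 weights."""
    # Imported lazily so this sketch loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    # int8 quantization is what keeps the medium model in the ~3-4GB range.
    model = WhisperModel(model_size, device="cuda", compute_type="int8")

    # transcribe() returns a lazy generator of segments; join them into one string.
    segments, info = model.transcribe(audio_path)
    return " ".join(seg.text.strip() for seg in segments)
```

Install with `pip install faster-whisper`; for a quick CPU test, swap `device="cpu"` and keep `compute_type="int8"`.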
| Model | VRAM (4-bit) | Quality | Fits? |
|---|---|---|---|
| Qwen2.5-1.5B | ~1.5 GB | Decent for short summaries | Yes |
| Phi-3-mini (3.8B) | ~2.5 GB | Good | Yes |
| Gemma 2 2B | ~2 GB | Good | Yes |
| Llama 3.1 8B | ~5 GB | Great | Only if QMD's models are unloaded |
QMD (~3GB) + faster-whisper medium (~3.5GB) = ~6.5GB, leaving ~1.5GB of headroom on an 8GB card. You could transcribe audio and search your notes on the same card simultaneously.
If you want transcription + summarization at once, swap to faster-whisper small (~1.5GB) and add Qwen2.5-1.5B (~1.5GB): QMD (3) + whisper (1.5) + summarizer (1.5) = 6GB. Totally doable.
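The budgets above are simple enough to sanity-check in code. A tiny helper (the function name and figures are just the estimates from the tables, not anything QMD provides) makes it easy to try other combinations:

```python
VRAM_GB = 8.0  # total VRAM on the card

def fits(*loads_gb: float, total: float = VRAM_GB) -> bool:
    """Return True if the summed model footprints fit in total VRAM."""
    return sum(loads_gb) <= total

# Transcription-only: QMD (~3GB) + faster-whisper medium (~3.5GB)
print(fits(3.0, 3.5))        # True: 6.5GB of 8GB
# Transcription + summarization: QMD + faster-whisper small + Qwen2.5-1.5B
print(fits(3.0, 1.5, 1.5))   # True: 6.0GB of 8GB
# Whisper large-v3 alongside QMD does not fit
print(fits(3.0, 10.0))       # False: 13GB > 8GB
```

In practice leave a few hundred MB of slack beyond these sums for the CUDA context and activation buffers.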