@zoltanctoth
Created February 21, 2026 13:33
| Solution | Transcription Speed (1hr audio) | Cost/hr | Hungarian Quality | Speaker Diarization | Notes |
|---|---|---|---|---|---|
| MacBook Air (local) | ~45-60 min | Free | Good | Needs separate model (pyannote) | Slow, pyannote adds ~20-30 min extra per hour |
| OpenAI Whisper API | ~1-3 min | $0.36 | Good | ❌ Not supported | Need to combine with separate diarization |
| Whisper + pyannote | ~5-10 min (GPU) | Free | Good | ✅ Yes (local) | Best free option, needs GPU + HuggingFace token |
| Deepgram | ~30-60 sec | $0.25 | Decent | ✅ Built-in | Fast, easy API, HU quality not the best |
| Google Speech-to-Text | ~1-2 min | $1.44 | Good | ✅ Built-in | Good HU support, expensive |
| Azure Speech | ~1-2 min | $1.00 | Good | ✅ Built-in | Good quality, mid-price |
| Amazon Transcribe | ~2-5 min | $0.72 | Decent | ✅ Built-in | HU supported, decent quality, nice S3 integration |
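
Several rows above rely on combining a transcript with a separate diarization pass. The sketch below shows one possible way to do that merge: each transcript segment gets the speaker whose turn overlaps it the most. The segment dictionaries use `start`, `end`, and `text` keys, as Whisper's segments do; the `(start, end, speaker)` turn tuples, the function name `assign_speakers`, and the sample data are assumptions made for illustration, not the output of any particular tool.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    segments: list of dicts with "start", "end", "text" (times in seconds).
    turns:    list of (start, end, speaker) tuples -- a hypothetical format for
              diarization output, chosen here for illustration only.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for start, end, speaker in turns:
            # Length of the time interval shared by the segment and the turn;
            # negative when they do not intersect at all.
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # Segments with no overlapping turn keep speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Made-up sample data, not real transcription output.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Jó napot kívánok."},
    {"start": 4.0, "end": 9.0, "text": "Jó napot, miben segíthetek?"},
]
turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.5, "SPEAKER_01")]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Maximum overlap is a simple heuristic: a segment that spans a speaker change is assigned wholly to one speaker, so word-level timestamps would be needed for finer attribution.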

For 100 hours of audio:

| Solution | Total Time | Total Cost | Has Diarization | Recommended? |
|---|---|---|---|---|
| MacBook Air (local) | 5-7 days | $0 | ⚠️ Extra setup | Only if budget is zero |
| Whisper API only | 2-5 hours | $36 | ❌ | Not if you need speakers |
| Whisper + pyannote | 8-16 hours | $0 | ✅ | Best free option, needs decent GPU |
| Deepgram | 1-2 hours | $25 | ✅ | Cheapest managed with diarization |
| Amazon Transcribe | 3-8 hours | $72 | ✅ | Good if already on AWS |
| Azure Speech | 2-3 hours | $100 | ✅ | Good quality, mid-price |
| Google STT | 2-3 hours | $144 | ✅ | Best quality, most expensive |
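
The 100-hour figures follow from the per-hour numbers by straight multiplication. A minimal sketch of that arithmetic, assuming files are processed one after another with no parallelism; the `PER_HOUR` dictionary and `project` function are names introduced here for illustration, with values copied from the per-hour comparison:

```python
# Per-hour cost (USD) and processing time range (minutes per hour of audio),
# copied from the per-hour comparison.
PER_HOUR = {
    "OpenAI Whisper API": (0.36, (1, 3)),
    "Whisper + pyannote": (0.00, (5, 10)),
    "Deepgram": (0.25, (0.5, 1)),
    "Amazon Transcribe": (0.72, (2, 5)),
    "Azure Speech": (1.00, (1, 2)),
    "Google Speech-to-Text": (1.44, (1, 2)),
}


def project(audio_hours, cost_per_hr, minutes_per_hr):
    """Scale per-hour cost and time to a batch, assuming sequential processing."""
    fastest, slowest = minutes_per_hr
    return {
        "total_cost": round(audio_hours * cost_per_hr, 2),
        "hours_min": audio_hours * fastest / 60,
        "hours_max": audio_hours * slowest / 60,
    }


for name, (cost, minutes) in PER_HOUR.items():
    p = project(100, cost, minutes)
    print(f'{name}: ${p["total_cost"]:.2f}, '
          f'{p["hours_min"]:.1f}-{p["hours_max"]:.1f} hours')
```

The time ranges in the table are these results rounded to whole hours. Managed APIs that accept concurrent requests could finish sooner than the sequential estimate.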