Below is a mental model I’ve found useful when turning FFmpeg (or yt‑dlp + FFmpeg) into a tiny “MCP server” that feels as friendly as, say, an image‑resize API:
- Job = (source → transform → destination)
  - source – URL, upload, or pipe
  - transform – one of a handful of named presets (audio‑only/mp3, "YouTube → HLS 360p", sprite‑sheet, etc.)
  - destination – file download, cloud bucket, or a streamable response
- The user POSTs a tiny JSON blob; the server turns it into the full FFmpeg command, runs it, and returns a `job_id`:
```
POST /jobs
{
  "source": "https://youtu.be/abc123",
  "preset": "mp3",
  "callback": "https://example.com/webhook"
}
```

Presets are just JSON files checked into the repo; see the `mp3` example at the end of this post.
Why it helps:
- Users never see CLI flags.
- You can tune presets without breaking API clients.
- Power users can still POST `"custom_args": ["-vf", "scale=1280:-1"]` if you expose an escape‑hatch field (sketched below).
| verb | path | purpose |
|---|---|---|
| GET | /presets | list & describe presets |
| POST | /jobs | create a job (returns `job_id`) |
| GET | /jobs/{id} | live status JSON (queued, running, …) |
| GET | /jobs/{id}/log | stream stderr/stdout (`tail -f` style) |
| GET | /jobs/{id}/out | download final artifact |
| DELETE | /jobs/{id} | cancel |
Everything else (auth, rate‑limit, etc.) sits in a gateway.
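For orientation, here is roughly what that surface looks like wired up. Express and the in‑memory job map are assumptions for the sketch; a real server would enqueue onto the job queue described next:

```ts
// Route sketch for the table above. Express and the in-memory Map are
// assumptions; handlers show shape, not production behavior.
import express from "express";
import { randomUUID } from "node:crypto";

const app = express();
app.use(express.json());

type JobRecord = { status: string; log: string[] };
const jobs = new Map<string, JobRecord>();

app.get("/presets", (_req, res) => {
  res.json([{ id: "mp3", description: "Extract best audio → MP3 192 kbps" }]);
});

app.post("/jobs", (req, res) => {
  const id = randomUUID();
  jobs.set(id, { status: "queued", log: [] });
  // Real version: validate req.body, then enqueue { id, ...req.body }.
  res.status(202).json({ job_id: id });
});

app.get("/jobs/:id", (req, res) => {
  res.json(jobs.get(req.params.id) ?? { status: "unknown" });
});

app.get("/jobs/:id/log", (req, res) => {
  res.type("text/plain").send((jobs.get(req.params.id)?.log ?? []).join("\n"));
});

app.get("/jobs/:id/out", (req, res) => {
  res.download(`/data/${req.params.id}/out`); // artifact path is an assumption
});

app.delete("/jobs/:id", (req, res) => {
  // Real version: signal the worker to kill the running ffmpeg process.
  jobs.delete(req.params.id);
  res.sendStatus(204);
});

app.listen(8080);
```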
```
┌──────────┐  enqueue  ┌───────────────┐  spawn  ┌──────────┐
│   API    ├──────────▶│   Job Queue   ├────────▶│  Worker  │
└──────────┘           └───────────────┘         │  ffmpeg  │
                                                 └──────────┘
```
- Queue = Redis, Rabbit, or just BullMQ.
- Worker pulls the preset, does string interpolation (`${SOURCE}`, `${DEST}`), then `execa()`s the full ffmpeg/yt‑dlp chain (see the sketch below).
- Progress: parse FFmpeg's stderr lines (`frame=…`) → push to Redis pub/sub → surface via `/jobs/{id}`.
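The worker loop itself is only a few dozen lines. A sketch under those assumptions (BullMQ + execa; the `presets/` path, `render` helper, and queue name are illustrative, and real code must validate `source`/`dest` before handing them to a shell):

```ts
// Worker sketch: BullMQ consumer that renders a preset's pipeline and runs it.
import { Worker } from "bullmq";
import { execa } from "execa";
import { readFile } from "node:fs/promises";

// Substitute ${SOURCE}/${DEST} placeholders in a preset step.
// NOTE: these values are user-supplied and go to a shell below --
// a real worker must validate/escape them first.
function render(step: string, vars: Record<string, string>): string {
  return step.replace(/\$\{(\w+)\}/g, (_, key) => vars[key] ?? "");
}

new Worker(
  "jobs",
  async (job) => {
    const preset = JSON.parse(
      await readFile(`presets/${job.data.preset}.json`, "utf8"),
    );
    const vars = { SOURCE: job.data.source, DEST: job.data.dest };
    // Steps like "yt-dlp … -o - | ffmpeg -i pipe:0 …" are joined into one
    // shell pipeline, matching the preset format shown at the end.
    const cmd = preset.pipeline.map((s: string) => render(s, vars)).join(" | ");
    const proc = execa(cmd, { shell: true, timeout: preset.timeout });
    // FFmpeg reports progress ("frame=… time=…") on stderr.
    proc.stderr?.on("data", (chunk: Buffer) => {
      const m = /time=(\S+)/.exec(chunk.toString());
      if (m) void job.updateProgress({ time: m[1] });
    });
    await proc; // rejects on non-zero exit or timeout
  },
  { connection: { host: "localhost", port: 6379 } },
);
```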
| Feature | Implementation sketch |
|---|---|
| Self‑documenting | /presets plus OpenAPI (Swagger) generated from same JSON |
| Dry‑run | dry_run=true → return the full command string |
| Webhook / SSE | POST callback or Server‑Sent Events for real‑time UI |
| Version pinning | Accept-Version: 2025‑05‑02 header, or /v1/ path |
| Batch jobs | Accept an array in POST; respond with an array of IDs |
| Health & metrics | /healthz, /metrics (Prometheus) |
| CLI concept (yt‑dlp/ffmpeg) | API surface |
|---|---|
| -S sort descriptors | part of a "quality" preset param ("quality": "720p") |
| --compat-options | hidden behind "legacy": true toggle |
| --ignore-errors | job‑level flag "best_effort": true |
| Long -vf filterchains | separate pipeline step in preset |
| Updates / channels | container image tag; handled by dev‑ops, not the API |
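For the first row, the public quality value can compile down to a yt‑dlp `-S` descriptor behind the scenes. A sketch; the mapping table itself is an assumption:

```ts
// Hypothetical mapping from the public "quality" param to yt-dlp's
// -S format-sorting descriptors ("res:720" prefers resolutions up to 720p).
const QUALITY_SORT: Record<string, string> = {
  "480p": "res:480",
  "720p": "res:720",
  "1080p": "res:1080",
};

function sortArgs(quality: string): string[] {
  const sort = QUALITY_SORT[quality];
  if (!sort) throw new Error(`unknown quality: ${quality}`);
  return ["-S", sort]; // e.g. yt-dlp -S res:720 …
}

console.log(sortArgs("720p")); // ["-S", "res:720"]
```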
```bash
curl -s -X POST https://ff.mcp/api/jobs \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"source":"https://youtu.be/dQw4w9WgXcQ","preset":"gif:5s"}' \
  | jq -r '.job_id'
```

Five seconds later:

```bash
curl -O https://ff.mcp/api/jobs/$ID/out   # rickroll.gif
```

Treat FFmpeg like a render farm with opinionated presets, not a Swiss‑army CLI.
People shouldn’t need to know about -b:v vs. -maxrate; they just pick “1080p‑hq” or “mp3‑128k” and your server owns the rest.
Hope this gives you a clear mental framework. Once you’re comfortable, you can expose a second “expert” endpoint that takes raw CLI strings—just keep it separate so your simple path stays simple.
{ "id": "mp3", "description": "Extract best audio → MP3 192 kbps", "pipeline": [ "yt-dlp -f bestaudio -o - ${SOURCE}", "ffmpeg -i pipe:0 -vn -codec:a libmp3lame -b:a 192k ${DEST}" ], "priority": 5, "timeout": 600000 }