ffmpeg is a command line utility that presents a uniform interface for interacting with a wide variety of media types and encodings.
Depending on the ffmpeg distribution, you may also get access to utilities such as ffprobe (which prints information about a file) and ffplay (which plays a file back). Those tools are critical.
Those tools, by default, will print all the arguments that ffmpeg was compiled with, which can get a little verbose. If you're going to run many ffmpeg commands, I suggest you get used to passing the -hide_banner argument.
While you can often do just about everything media-file related with ffmpeg, that does not mean you should; I would recommend also installing sox, a command line utility that primarily deals with audio files. It's nowhere near as feature-complete, but the arguments are a lot easier to remember.
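For instance, sox ships with soxi for quick file inspection, and simple edits read naturally (the file names here are just placeholders):
ognyan@wfh:~$ soxi input.wav
ognyan@wfh:~$ sox input.wav first10s.wav trim 0 10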
Uses of ffmpeg are often sort of last-resort-ish... we do not like modifying our data in /dat/corpora, and while some audio transformations can be done without losing any data, doing so can cause other issues. When feasible, it's best to leave the files alone and have us modify our tools accordingly.
I'm going to start this bit with a rant. While I am perfectly fine with nonsensical names for things, please take care to pick names that are relatively easy to search for. If you name an audio encoding "shorten", please consider the headache you will cause when people have to google "shorten audio file" or something along those lines.
Early in Fred's development, I got reports that Fred was unable to read one of the test audio files from the toolkit. The explanation I got was that the file was a NIST encoding. I thought this was odd, as the audio library I use to interact with audio files (libsndfile) does support NIST files. So what gives? What makes this file so special?
ognyan@wfh:~$ ffprobe -hide_banner -i testSample.wav
Input #0, nistsphere, from 'testSample.wav':
Metadata:
microphone : Sennheiser
recording_site : SRI
database_id : wsj1
database_version: 1.0
recording_environment: quiet
speaker_session_number: 01
session_utterance_number: 03
prompt_id : adapt.03
utterance_id : 44aa0103
speaking_mode : read-adaptation
speaker_id : 44a
sample_min : -854
sample_max : 683
sample_checksum : 63835
recording_date : 02-Dec-1992
recording_time : 12:45:30.00
Duration: 00:00:05.41, bitrate: 87 kb/s
Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p
As you can see, ffprobe gives all sorts of good information, but this bottom bit is the bit of interest:
Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p
It took a fair amount of googling to discover that shorten is an encoding scheme that was used ages ago. Our toolkit and wave package have support for it (which is why this file works in SView), but Fred does not.
I created an issue with libsndfile to support it; they said they're open to a PR, but given that nobody (but us) uses shorten-encoded audio files, they aren't particularly motivated.
ffmpeg is able to play back this audio file via the ffplay command with ease. A potential feature I'll roll out down the line is embedding ffmpeg into Fred to handle reading of data.
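For example (decoding to a plain PCM wav here is purely an illustration; as noted above, we'd rather not rewrite corpus files):
ognyan@wfh:~$ ffplay -hide_banner testSample.wav
ognyan@wfh:~$ ffmpeg -hide_banner -i testSample.wav -c:a pcm_s16le testSample_pcm.wav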
I came across this issue almost by accident, as I was added to an email chain about corrupt audio files. Only a handful of the files were corrupt, which struck me as somewhat odd.
First step: on macOS there is a command line utility, afplay.
ognyan@wfh:~$ afplay ambiance_20200412_21h.wav
This immediately exits with no audio playing. Given this, and the email report saying our tools were not handling this file, clearly there was something wrong.
ognyan@wfh:~$ ffplay -hide_banner ambiance_20200412_21h.wav
Here we can hear the audio, so what gives?
ognyan@wfh:~$ ffprobe -hide_banner ambiance_20200412_21h.wav
[wav @ 0x7fe3a3808200] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from 'ambiance_20200412_21h.wav':
Duration: 01:08:55.68, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
So it has this warning about the duration... most likely the length recorded in the wav header doesn't match the actual amount of audio data, which is why ffmpeg falls back to estimating the duration from the bitrate (and presumably why afplay gives up entirely).
A quick google search turned up a sox option that can patch it up:
ognyan@wfh:~$ sox --ignore-length ambiance_20200412_21h.wav fixed.wav
After this, we can verify we don't get the warning:
ognyan@wfh:~$ ffprobe -hide_banner fixed.wav
Input #0, wav, from 'fixed.wav':
Duration: 01:08:55.68, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
👍
I saw a message from Grace requesting this late at night; there were already some email chains going around regarding 4-channel audio files. After asking for the path to the file, the first thing I do is make sure it actually has 4 channels:
ognyan@wfh:~$ ffprobe -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav
Input #0, wav, from '200629_prePV_SET_Max_Noise_micin.wav':
Duration: 00:15:36.00, bitrate: 1024 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 4 channels, s16, 1024 kb/s
yup... 4 channels... who does that... anyway
Splitting it up is pretty easy with ffmpeg, though I still had to google the command. Before I show you what I did, I want to reference this awesome Super User post:
https://superuser.com/questions/685910/ffmpeg-stereo-channels-into-two-mono-channels
ognyan@wfh:~$ ffmpeg -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav \
-map_channel 0.0.0 first.wav \
-map_channel 0.0.1 second.wav \
-map_channel 0.0.2 third.wav \
-map_channel 0.0.3 fourth.wav
This generates 4 files, first.wav, second.wav, third.wav and fourth.wav, each corresponding to one of the audio channels.
The link above showcases a few other ways of handling this as well.
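One alternative worth knowing about: newer ffmpeg releases have deprecated -map_channel in favour of the channelsplit filter, so something along these lines should do the same job (the quad channel layout is an assumption about how the 4 channels are laid out):
ognyan@wfh:~$ ffmpeg -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav \
-filter_complex "channelsplit=channel_layout=quad[c0][c1][c2][c3]" \
-map "[c0]" first.wav \
-map "[c1]" second.wav \
-map "[c2]" third.wav \
-map "[c3]" fourth.wav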
This isn't really a request, but an issue was brought up about some files that were in "amr" format.
After getting the path to a file:
ognyan@wfh:~$ ffplay -hide_banner -i 2_26134184.amr
Sure enough, audio plays fine... now Stan did not request this, but let's say we want to convert the file into some kind of usable wav file.
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr file.wav
You can get fancy and specify different sample rates if you want, or convert to some other audio encoding entirely.
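For instance, resampling to 16 kHz while converting (the -ar value is just an illustrative choice):
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr -ar 16000 file_16k.wav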
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr -c:a libopus file.ogg
Here, we told ffmpeg to use the libopus audio codec, which can be stored in ogg-type audio files.
I have a cute video of my cats playing... but I have a bazillion of these videos and I'd like to squish them down some...
ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-crf 28 \
-c:a copy \
first.mp4
Let's say I know I'm going to play this on a chromecast, which does have native x265 decoding capability, but I'm going to want to enable some flags to make the video a bit easier to play back.
ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-crf 28 \
-level 4.1 \
-tune fastdecode \
-c:a copy \
second.mp4
Now let's say I have a fair amount of time to encode this video, but I really want to compress the file size.
ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-preset veryslow \
-crf 36 \
-level 4.1 \
-tune fastdecode \
-c:a copy \
third.mp4
Sorry, don't have the original on hand, so you'll have to bear with my explanations.
I have a Star Wars Theatrical Release DVD from China (from ~20 years ago). While the movie on it certainly worked, there were some very significant problems regarding playback.
- Hard-coded black bars: while the video was in a widescreen format, the people who made it decided to force the video into a 4:3 aspect ratio by adding black bars above and below the picture. When you played the video on a 16:9 screen (the current standard) you would also get black bars on the sides; the end result was a very thick black border all the way around the video, taking up a majority of the pixels on the screen
- The subtitles were in VOBSUB format, which is effectively a series of bitmaps. On high resolution screens they look like rubbish, and for media playback software such as Plex they make transcoding substantially more difficult
- The video encoding was something ancient (maybe MPEG-1?), which is not only horribly inefficient, but a lot of modern video players (VLC, for instance) would display many artifacts throughout playback, and skipping forward/backward was a problematic operation as well.
First thing to address was getting rid of those black bars.
ognyan@wfh:~$ ffmpeg -i input.mkv -vf cropdetect=24:16:0 dummy.mkv
...
[Parsed_cropdetect_0 @ 0x3704360] x1:0 x2:639 y1:43 y2:317 w:640 h:272 x:0 y:46 pts:181320 t:181.320000 crop=640:272:0:46
...
Here the bit we care about is 640:272:0:46, i.e. the crop=width:height:x:y values that cropdetect suggests.
ognyan@wfh:~$ ffmpeg -i Star\ Wars\ -\ Episode\ VI\ -\ Return\ of\ the\ Jedi.mkv \
-vf crop=704:272:8:104 \
-aspect 704:272 \
-c:v libx264 \
-crf 17 \
-c:a copy \
-profile:v high \
-preset medium \
-tune fastdecode \
-tune film \
-tune grain \
-level 4.1 \
-movflags +faststart \
output.mkv
In this command, I do the following operations:
- crop the video (to a different value, as this is a different video)
- set the aspect to 704:272, figuring this would guarantee that the video would not be stretched inappropriately in either direction
- re-encode the video to h264 format using libx264 (the original encoding does not matter)
- set the "constant rate factor" (crf) to 17, which should make the output indistinguishable from the source material (a crf of 0 is technically lossless, but you would get huge output file sizes)
- -c:a copy means copy the audio and do nothing to it
- -profile:v high is an h264-specific encoding parameter that restricts which filters are applied to the video. Without this argument the video is compatible with the largest variety of devices, but virtually all modern phones, tablets, video players, etc. support the high profile setting, so this allows for slightly smaller file sizes as a result
- -preset medium: as we discussed, the preset settings have to do with how much effort goes into encoding; the idea is that for a slower preset, the quality of the result will be higher for the same file size. This should be set to the slowest setting you can tolerate; often fast or veryfast are used
- -tune <parameter>: more about these can be found in the ffmpeg wiki documentation for x264 encoding
- -level 4.1: another x264 setting; by specifying the level I reduce compatibility of the output video, however this setting is compatible with virtually all my devices, chromecasts, raspberry pis, etc.
- -movflags +faststart: this setting is only applicable to some output containers, but the idea is that the video/audio index goes at the very beginning of the file (not compatible with all applications, but offers significant benefits such as videos being able to start playing right away)
While this takes care of the video, I still have problems with the subtitles. While I'm sure there is a way I could deal with this with ffmpeg, I elected to use tools such as mkvtoolnix, tesseract (which is usable with ffmpeg) and vobsub2srt.
First, I identify the subtitle tracks:
ognyan@wfh:~$ mkvinfo some_movie.mkv
| + Track
| + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
| + Track UID: 3
| + Track type: subtitles
| + Enabled: 1
| + Default track flag: 1
| + Forced track flag: 0
| + Lacing flag: 0
| + Minimum cache: 0
| + Maximum block additional ID: 0
| + Codec ID: S_VOBSUB
| + Codec decode all: 1
| + Language: eng
| + Codec's private data: size 508
| + Track
| + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
| + Track UID: 4
| + Track type: subtitles
| + Enabled: 1
| + Default track flag: 0
| + Forced track flag: 0
| + Lacing flag: 0
| + Minimum cache: 0
| + Maximum block additional ID: 0
| + Codec ID: S_VOBSUB
| + Codec decode all: 1
| + Language: fre
| + Codec's private data: size 508
| + Track
| + Track number: 5 (track ID for mkvmerge & mkvextract: 4)
| + Track UID: 5
| + Track type: subtitles
| + Enabled: 1
| + Default track flag: 0
| + Forced track flag: 0
| + Lacing flag: 0
| + Minimum cache: 0
| + Maximum block additional ID: 0
| + Codec ID: S_VOBSUB
| + Codec decode all: 1
| + Language: spa
| + Codec's private data: size 508
I can see tracks 3, 4, and 5 are subtitle tracks (which mkvextract addresses by their track IDs 2, 3 and 4)... I then extract them
ognyan@wfh:~$ mkvextract tracks \
some_movie.mkv \
2:some_movie.eng.idx \
3:some_movie.fre.idx \
4:some_movie.spa.idx
Since the tracks are VOBSUB, each extraction yields an .idx/.sub pair rather than text. Here, I had to manually install a tool that's not part of the homebrew distribution:
ognyan@wfh:~$ brew install --with-all-languages tesseract
brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb
I then create my srt subtitles:
ognyan@wfh:~$ vobsub2srt some_movie.eng
vobsub2srt some_movie.fre
vobsub2srt some_movie.spa
ass is considered the most versatile/compatible subtitle format, so I use ffmpeg to convert the subtitles from srt to ass:
ognyan@wfh:~$ ffmpeg -i some_movie.eng.srt some_movie.eng.ass
I then add the subtitle file back into the video:
ognyan@wfh:~$ ffmpeg -i some_movie.mkv -i some_movie.eng.ass \
-codec copy \
-map 0 \
-map 1 \
-metadata:s:s:0 language=eng \
output.mkv
ffmpeg has cuda support for video transcoding and some filters... I won't go into specific use-cases, but you can generally transcode from most video formats to either h264 or h265... in addition, the cuda support covers some video filters such as scaling.
The hardware-accelerated ffmpeg offers significant performance improvements (testing some videos, I was transcoding at almost 200x playback speed vs. 2-5x).
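As a rough sketch of what that looks like (this assumes an ffmpeg build with NVENC/NVDEC enabled; the file names are placeholders):
ognyan@wfh:~$ ffmpeg -hide_banner -hwaccel cuda -i input.mkv \
-c:v hevc_nvenc \
-c:a copy \
output.mkv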