ntrrgc/ffmpeg_sbr.md

## ffmpeg_sbr.md

      
How ffmpeg detects SBR in AAC files
First frame decoding and configuration

static int decode_frame_ga(AVCodecContext *avctx, AACDecContext *ac,
                           GetBitContext *gb, int *got_frame_ptr)
{
    // ... parse frame ...

    multiplier = (ac->oc[1].m4ac.sbr == 1) ? ac->oc[1].m4ac.ext_sample_rate > ac->oc[1].m4ac.sample_rate : 0;
    samples <<= multiplier;

    spectral_to_sample(ac, samples);

    if (ac->oc[1].status && audio_found) {
        avctx->sample_rate = ac->oc[1].m4ac.sample_rate << multiplier;
        avctx->frame_size = samples;
        ac->oc[1].status = OC_LOCKED;
    }
ac->oc[1].m4ac.sbr == 1 means SBR is considered present, either because a TYPE_FIL element with EXT_SBR_DATA was found, or because there was explicit out-of-band signalling (e.g. in MP4 esds box).
multiplier is actually a shift offset that is applied to the base sample rate of AAC LC (ac->oc[1].m4ac.sample_rate) to obtain the output sample rate (avctx->sample_rate).
A multiplier value of 1 (i.e. doubling the sample rate) implies that we have dual-rate (i.e. non-downsampled) SBR. A value of 0 implies that either we don't have SBR or that SBR shall operate in downsampled mode.
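As a minimal sketch of the arithmetic described above, the helpers below reproduce the multiplier expression with plain integers instead of the decoder context. The function names `output_shift` and `output_sample_rate` are made up for illustration and are not part of ffmpeg.

```c
/* Hypothetical helper: mirrors the multiplier expression quoted above.
 * sbr is 1 when SBR is considered present; rates are in Hz. */
static int output_shift(int sbr, int sample_rate, int ext_sample_rate)
{
    return (sbr == 1) ? ext_sample_rate > sample_rate : 0;
}

/* Hypothetical helper: the base rate shifted left by 0 or 1,
 * i.e. kept as-is or doubled. */
static int output_sample_rate(int sbr, int sample_rate, int ext_sample_rate)
{
    return sample_rate << output_shift(sbr, sample_rate, ext_sample_rate);
}
```

With a 24000 Hz core, an ext_sample_rate of 48000 gives a 48000 Hz output (dual-rate), while an ext_sample_rate of 24000 leaves the output at 24000 Hz (downsampled SBR, or no SBR at all).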
The output configuration status (ac->oc[1].status) takes one of the following states:
/**
 * Output configuration status
 */
enum OCStatus {
    OC_NONE,        ///< Output unconfigured
    OC_TRIAL_PCE,   ///< Output configuration under trial specified by an inband PCE
    OC_TRIAL_FRAME, ///< Output configuration under trial specified by a frame header
    OC_GLOBAL_HDR,  ///< Output configuration set in a global header but not yet locked
    OC_LOCKED,      ///< Output configuration locked in place
};
OC_LOCKED is reached once the first frame in which audio is found has been decoded (the status && audio_found branch above). After that point, the configuration is final.
SBR extension detection

static int decode_extension_payload(AACDecContext *ac, GetBitContext *gb, int cnt,
                                    ChannelElement *che, enum RawDataBlockType elem_type)
{ // ...
    switch (type) { // extension type
    case EXT_SBR_DATA_CRC:
        crc_flag++;
    case EXT_SBR_DATA:
        /* ... */ if (ac->oc[1].m4ac.ps == -1 && ac->oc[1].status < OC_LOCKED &&
                   ac->avctx->ch_layout.nb_channels == 1) {
            ac->oc[1].m4ac.sbr = 1;
            ac->oc[1].m4ac.ps = 1;
            ac->avctx->profile = AV_PROFILE_AAC_HE_V2;
            ff_aac_output_configure(ac, ac->oc[1].layout_map, ac->oc[1].layout_map_tags,
                                    ac->oc[1].status, 1);
        } else {
            ac->oc[1].m4ac.sbr = 1;
            ac->avctx->profile = AV_PROFILE_AAC_HE;
        }

        ac->proc.sbr_decode_extension(ac, che, gb, crc_flag, cnt, elem_type);
The code above is what makes implicit signalling work.
The logic that tells PS apart from bare mono SBR is not clear to me. A mono .wav encoded with qaac --he into .m4a is reported as stereo by ffmpeg. I don't know whether this is the fault of qaac (or Apple's AAC encoder) or of ffmpeg (see the next section). I'm already too sidetracked to look into it further.
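To make the branch in decode_extension_payload easier to reason about, here is a rough sketch of its decision in isolation. The enums and the function `profile_on_sbr_payload` are hypothetical stand-ins; only the condition itself is taken from the snippet above.

```c
/* Hypothetical mirror of OCStatus; only the ordering matters here. */
enum oc_status_sketch {
    SK_NONE, SK_TRIAL_PCE, SK_TRIAL_FRAME, SK_GLOBAL_HDR, SK_LOCKED
};

/* Hypothetical stand-ins for AV_PROFILE_AAC_HE and AV_PROFILE_AAC_HE_V2. */
enum profile_sketch { SK_PROFILE_HE, SK_PROFILE_HE_V2 };

/* Returns the profile the quoted branch picks when an SBR payload is seen.
 * ps == -1 means PS has not been signalled either way. */
static enum profile_sketch profile_on_sbr_payload(int ps,
                                                  enum oc_status_sketch status,
                                                  int nb_channels)
{
    if (ps == -1 && status < SK_LOCKED && nb_channels == 1)
        return SK_PROFILE_HE_V2; /* PS is assumed for not-yet-locked mono */
    return SK_PROFILE_HE;
}
```

Under this reading, a mono stream whose configuration is not yet locked and whose PS state is unknown is treated as HE-AAC v2 as soon as the first SBR payload shows up, which would be consistent with the mono file being reported as stereo.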
The sbr_decode_extension() function pointer points to ff_aac_sbr_decode_extension, which includes this code:
    if (!sbr->sample_rate)
        sbr->sample_rate = 2 * ac->oc[1].m4ac.sample_rate; //TODO use the nominal sample rate for arbitrary sample rate support
    if (!ac->oc[1].m4ac.ext_sample_rate)
        ac->oc[1].m4ac.ext_sample_rate = 2 * ac->oc[1].m4ac.sample_rate;
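The effect of these two defaults can be sketched with plain integers. The helper names below are invented for illustration, and the second one assumes that nothing has set ext_sample_rate before the first SBR payload is parsed (which is what I would expect with implicit signalling, but I have not verified it for every container).

```c
/* Hypothetical helper: keeps a rate that is already set, otherwise
 * falls back to twice the core rate, as both ifs above do. */
static int rate_or_default(int current_rate, int core_sample_rate)
{
    return current_rate ? current_rate : 2 * core_sample_rate;
}

/* Hypothetical helper: output rate for implicit signalling, assuming
 * ext_sample_rate starts at 0 and then goes through the multiplier
 * expression from decode_frame_ga. */
static int implicit_output_rate(int core_sample_rate)
{
    int ext_sample_rate = rate_or_default(0, core_sample_rate);
    return core_sample_rate << (ext_sample_rate > core_sample_rate);
}
```

In this sketch an implicitly signalled stream with a 44100 Hz core ends up with an 88200 Hz output, because the default always picks the dual-rate interpretation.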
PS detection (unsure)

It seems that PS is assumed whenever SBR is explicitly signalled, the layout is mono, and PS itself has not been signalled (ps == -1)?
static int decode_ga_specific_config(AACDecContext *ac, AVCodecContext *avctx,
                                     GetBitContext *gb,
                                     int get_bit_alignment,
                                     MPEG4AudioConfig *m4ac,
                                     int channel_config)
{
    // ...

    if (count_channels(layout_map, tags) > 1) {
        m4ac->ps = 0;
    } else if (m4ac->sbr == 1 && m4ac->ps == -1)
        m4ac->ps = 1;
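The same rule, pulled out into a standalone function so the cases can be enumerated. `resolve_ps` is a hypothetical name, and the tri-state convention (-1 unknown, 0 absent, 1 present) is my reading of how the fields are used in the excerpts above.

```c
/* Hypothetical helper: returns the ps value that the quoted branch
 * would leave in m4ac->ps. -1 = unknown, 0 = absent, 1 = present. */
static int resolve_ps(int channel_count, int sbr, int ps)
{
    if (channel_count > 1)
        return 0;          /* more than one channel: PS ruled out */
    if (sbr == 1 && ps == -1)
        return 1;          /* mono + explicit SBR + unknown PS: PS assumed */
    return ps;             /* otherwise left untouched */
}
```

So a mono layout with explicitly signalled SBR and no PS information resolves to ps = 1, while a mono layout without SBR keeps whatever value ps already had.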
sbr->sample_rate

sbr->sample_rate represents the sample rate of the SBR tool. Its value is one of:

0 if no SBR frames are found (which triggers https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/20671).
Twice the base sample rate if AAC LC is used (regardless of ext_sample_rate in DecoderSpecificConfig).
ext_sample_rate from DecoderSpecificConfig if USAC is used.

I don't know if downsampled SBR exists in USAC, but if it does and it is signalled the same way as for AAC LC, wouldn't this be broken?


Notice that in downsampled mode sbr->sample_rate is still double the base sample rate. SBR always operates at twice the base sample rate.
Downsampling check

void AAC_RENAME(ff_aac_sbr_apply)(AACDecContext *ac, ChannelElement *che,
                                  int id_aac, void *L_, void *R_)
{
    int downsampled = ac->oc[1].m4ac.ext_sample_rate < sbr->sample_rate;


| | Base sample rate | ext_sample_rate | SBR sample rate | sbr->sample_rate |
|---|---|---|---|---|
| SBR dual-rate | 16000 | 36000 | 36000 | 36000 |
| SBR single-rate | 36000 | 36000 | 72000 | 72000 |
| LC signaled as single-rate SBR | 24000 | 24000 | 48000 | 0 |
| LC signaled as single-rate SBR | 48000 | 48000 | 96000 | 0 |
| LC signaled as dual-rate SBR | 24000 | 48000 | 48000 | 0 |


When should we actually downsample?

To prevent audible corruption we must downsample if and only if we're operating the SBR tool at a sample rate above the output sample rate.
Recap: This is how the output sample rate is computed (assuming ac->oc[1].m4ac.sbr == 1).
multiplier = ac->oc[1].m4ac.ext_sample_rate > ac->oc[1].m4ac.sample_rate;
avctx->sample_rate = ac->oc[1].m4ac.sample_rate << multiplier;
Does this mean that a 44.1 kHz SBR single-rate file with implicit signaling would be output at 88.2 kHz? I have to be wrong, right? I made a test vector for this (Quizas_sbr_sr_44k.adts) and found I was actually right. See Footnote 1.
ext_sample_rate contains the author-intended (or worst-case assumed) output sample rate. This can be equal to sample_rate (downsampled) or double that (dual-rate).
For the rest of this, I will assume that other values of ext_sample_rate won't occur, as this assumption is already implicit throughout the rest of the decoder.
We must downsample when ext_sample_rate equals sample_rate. Alternatively, because the SBR tool always operates at 2 * sample_rate, we must downsample when ext_sample_rate is half the SBR tool sample rate.
int downsampled = ac->oc[1].m4ac.ext_sample_rate < sbr->sample_rate;
The above line would therefore work if sbr->sample_rate were always initialized, or if it were replaced by 2 * ac->oc[1].m4ac.sample_rate.
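A small sketch comparing the two variants of the check. Both function names are hypothetical; the first reproduces the quoted line, the second is the replacement suggested above and is not something ffmpeg does today as far as I know.

```c
/* The check as quoted from ff_aac_sbr_apply. */
static int downsampled_current(int ext_sample_rate, int sbr_sample_rate)
{
    return ext_sample_rate < sbr_sample_rate;
}

/* Suggested alternative: derive the SBR tool rate from the core rate,
 * so an uninitialized sbr->sample_rate (0) cannot defeat the check. */
static int downsampled_from_core(int ext_sample_rate, int core_sample_rate)
{
    return ext_sample_rate < 2 * core_sample_rate;
}
```

For the "LC signaled as single-rate SBR" case with a 24000 Hz core, the current check sees sbr->sample_rate == 0 and reports no downsampling, while the core-based check reports downsampling as intended. For a dual-rate file (ext_sample_rate 48000, core 24000) both agree that no downsampling is needed once sbr->sample_rate is initialized.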
Footnote 1

To be fair to ffmpeg, with implicit signaling you don't have any way to query whether the author intended downsampling or not.
The spec has a bunch of clauses like «For level 3, level 4 and level 6 decoders, it is mandatory to operate the SBR tool in downsampled mode if the sampling rate of the AAC core is higher than 24kHz» but that is not enough to cover all cases.
For instance, you couldn't tell 16k single-rate from 32k dual-rate. You can't cap output sample rate at 48k either because levels 5 and 7 require support of up to 96kHz output.
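The ambiguity can be written down as a sketch. The helper names are invented, and the 24 kHz threshold is only the one from the clause quoted above (levels 3, 4 and 6), not a complete model of the spec.

```c
/* Rule quoted above for level 3, 4 and 6 decoders: downsampled mode
 * is mandatory when the AAC core rate is higher than 24 kHz. */
static int level_rule_forces_downsampling(int core_sample_rate)
{
    return core_sample_rate > 24000;
}

/* The two output rates an implicitly signalled stream could mean. */
static int downsampled_output(int core_sample_rate)
{
    return core_sample_rate;
}

static int dual_rate_output(int core_sample_rate)
{
    return 2 * core_sample_rate;
}
```

For a 16000 Hz core the rule does not force anything, so 16000 Hz (single-rate) and 32000 Hz (dual-rate) remain equally plausible outputs, and nothing in the bitstream tells them apart.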