@ntrrgc
Last active October 27, 2025 01:56
How ffmpeg detects SBR in AAC files


First frame decoding and configuration

static int decode_frame_ga(AVCodecContext *avctx, AACDecContext *ac,
                           GetBitContext *gb, int *got_frame_ptr)
{
    // ... parse frame ...

    multiplier = (ac->oc[1].m4ac.sbr == 1) ? ac->oc[1].m4ac.ext_sample_rate > ac->oc[1].m4ac.sample_rate : 0;
    samples <<= multiplier;

    spectral_to_sample(ac, samples);

    if (ac->oc[1].status && audio_found) {
        avctx->sample_rate = ac->oc[1].m4ac.sample_rate << multiplier;
        avctx->frame_size = samples;
        ac->oc[1].status = OC_LOCKED;
    }

ac->oc[1].m4ac.sbr == 1 means SBR is considered present, either because a TYPE_FIL element with EXT_SBR_DATA was found, or because there was explicit out-of-band signalling (e.g. in MP4 esds box).

multiplier is actually a shift offset that is applied to the base sample rate of AAC LC (ac->oc[1].m4ac.sample_rate) to obtain the output sample rate (avctx->sample_rate).

A multiplier value of 1 (i.e. doubling the sample rate) implies that we have dual-rate (i.e. non-downsampled) SBR. A value of 0 implies that either we don't have SBR or that SBR shall operate in downsampled mode.
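
As a rough illustration of the shift described above, here is a minimal standalone sketch. The struct and function names are hypothetical stand-ins (they are not FFmpeg API); only the arithmetic is taken from the excerpt.

```c
#include <assert.h>

/* Hypothetical stand-in for the three MPEG4AudioConfig fields used above. */
struct rates {
    int sbr;             /* 1 when SBR is considered present */
    int sample_rate;     /* AAC core (base) sample rate in Hz */
    int ext_sample_rate; /* extension sample rate in Hz, 0 if unset */
};

/* Mirrors the multiplier expression: 1 for dual-rate SBR, 0 otherwise. */
static int sbr_shift(const struct rates *r)
{
    return r->sbr == 1 ? r->ext_sample_rate > r->sample_rate : 0;
}

/* Output sample rate, computed the way avctx->sample_rate is assigned. */
static int output_sample_rate(const struct rates *r)
{
    assert(r->sample_rate > 0);
    return r->sample_rate << sbr_shift(r);
}

/* Samples per frame after the shift (e.g. 1024 core samples become 2048). */
static int output_frame_size(const struct rates *r, int core_samples)
{
    return core_samples << sbr_shift(r);
}
```

With a 24000 Hz core and ext_sample_rate of 48000 the shift is 1, so both the rate and the frame size double; with ext_sample_rate equal to the core rate, or without SBR, nothing changes.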

There are the following states:

/**
 * Output configuration status
 */
enum OCStatus {
    OC_NONE,        ///< Output unconfigured
    OC_TRIAL_PCE,   ///< Output configuration under trial specified by an inband PCE
    OC_TRIAL_FRAME, ///< Output configuration under trial specified by a frame header
    OC_GLOBAL_HDR,  ///< Output configuration set in a global header but not yet locked
    OC_LOCKED,      ///< Output configuration locked in place
};

OC_LOCKED is reached after the first frame. After that point, the configuration is final.
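
The two places where the status matters in the excerpts (the lock at the end of the first frame, and the `status < OC_LOCKED` guard in the extension parser) can be sketched in isolation. The helper names below are hypothetical; the enum is copied so that the block compiles on its own.

```c
/* Copy of the enum so this sketch is self-contained. */
enum OCStatus {
    OC_NONE,
    OC_TRIAL_PCE,
    OC_TRIAL_FRAME,
    OC_GLOBAL_HDR,
    OC_LOCKED,
};

/* Hypothetical helper: the transition performed at the end of decode_frame_ga.
 * Any configured state becomes OC_LOCKED once a frame with audio is decoded. */
static enum OCStatus status_after_frame(enum OCStatus s, int audio_found)
{
    return (s != OC_NONE && audio_found) ? OC_LOCKED : s;
}

/* Hypothetical helper: the guard used before switching to HE-AAC v2.
 * It relies on OC_LOCKED being the largest enum value. */
static int may_reconfigure(enum OCStatus s)
{
    return s < OC_LOCKED;
}
```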

SBR extension detection

static int decode_extension_payload(AACDecContext *ac, GetBitContext *gb, int cnt,
                                    ChannelElement *che, enum RawDataBlockType elem_type)
{ // ...
    switch (type) { // extension type
    case EXT_SBR_DATA_CRC:
        crc_flag++;
    case EXT_SBR_DATA:
        /* ... */ if (ac->oc[1].m4ac.ps == -1 && ac->oc[1].status < OC_LOCKED &&
                   ac->avctx->ch_layout.nb_channels == 1) {
            ac->oc[1].m4ac.sbr = 1;
            ac->oc[1].m4ac.ps = 1;
            ac->avctx->profile = AV_PROFILE_AAC_HE_V2;
            ff_aac_output_configure(ac, ac->oc[1].layout_map, ac->oc[1].layout_map_tags,
                                    ac->oc[1].status, 1);
        } else {
            ac->oc[1].m4ac.sbr = 1;
            ac->avctx->profile = AV_PROFILE_AAC_HE;
        }

        ac->proc.sbr_decode_extension(ac, che, gb, crc_flag, cnt, elem_type);

The code above is what makes implicit signalling work.

The logic to tell PS from bare mono SBR is not clear to me. A mono .wav encoded with qaac --he into .m4a is reported as stereo by ffmpeg. I don't know whether this is the fault of qaac/Apple's AAC encoder or of ffmpeg (see next section). I'm already too sidetracked to look into it further.

The sbr_decode_extension() function pointer points to ff_aac_sbr_decode_extension, which includes this code:

    if (!sbr->sample_rate)
        sbr->sample_rate = 2 * ac->oc[1].m4ac.sample_rate; //TODO use the nominal sample rate for arbitrary sample rate support
    if (!ac->oc[1].m4ac.ext_sample_rate)
        ac->oc[1].m4ac.ext_sample_rate = 2 * ac->oc[1].m4ac.sample_rate;
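
To see what these defaults imply, here is a small sketch under the assumption that both fields start at 0 when nothing was signalled out of band. The struct and function are hypothetical mirrors of the two assignments, not FFmpeg code.

```c
#include <assert.h>

/* Hypothetical mirror of the two fields touched by the defaults above. */
struct sbr_rates {
    int sbr_sample_rate; /* stands in for sbr->sample_rate */
    int ext_sample_rate; /* stands in for m4ac.ext_sample_rate */
};

/* Fill in only the fields that are still zero, as the excerpt does. */
static void apply_sbr_defaults(struct sbr_rates *r, int core_sample_rate)
{
    assert(core_sample_rate > 0);
    if (!r->sbr_sample_rate)
        r->sbr_sample_rate = 2 * core_sample_rate;
    if (!r->ext_sample_rate)
        r->ext_sample_rate = 2 * core_sample_rate;
}
```

With implicit signalling and a 44100 Hz core, both fields end up at 88200, so ext_sample_rate is greater than the core rate and the stream is treated as dual-rate. If ext_sample_rate was already set (explicit signalling), only sbr_sample_rate is filled in.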

PS detection (unsure)

It seems that PS is assumed whenever SBR is explicitly signalled, PS itself is not signalled (ps == -1) and the layout is mono, but I'm not sure.

static int decode_ga_specific_config(AACDecContext *ac, AVCodecContext *avctx,
                                     GetBitContext *gb,
                                     int get_bit_alignment,
                                     MPEG4AudioConfig *m4ac,
                                     int channel_config)
{
    // ...

    if (count_channels(layout_map, tags) > 1) {
        m4ac->ps = 0;
    } else if (m4ac->sbr == 1 && m4ac->ps == -1)
        m4ac->ps = 1;
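
The branch above can be restated as a pure function to make the cases easier to enumerate. This is only a sketch: resolve_ps is a hypothetical name, and it assumes that -1 means "not signalled" for the ps field.

```c
#include <assert.h>

/* ps is treated as tri-state: -1 = not signalled, 0 = absent, 1 = present.
 * Returns the value m4ac->ps would hold after the branch above. */
static int resolve_ps(int channels, int sbr, int ps)
{
    assert(channels > 0);
    if (channels > 1)
        return 0;          /* more than one channel: PS is ruled out */
    if (sbr == 1 && ps == -1)
        return 1;          /* mono + SBR + unsignalled PS: PS is assumed */
    return ps;             /* otherwise the previous value is kept */
}
```

Under this reading, a mono stream with explicit SBR and no PS signalling always gets ps == 1, which would be consistent with the mono qaac file being reported as stereo.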

sbr->sample_rate

sbr->sample_rate represents the sample rate of the SBR tool.

  • 0 if no SBR frames are found (which triggers https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/20671).
  • Twice the base sample rate if AAC LC is used (regardless of ext_sample_rate in DecoderSpecificConfig).
  • ext_sample_rate from DecoderSpecificConfig if USAC is used.
    • I don't know if downsampled SBR is a thing in USAC, but if it is and it is signalled the same way as for AAC LC, wouldn't this be broken?

Notice that in downsampled mode sbr->sample_rate is still double the base sample rate. SBR always operates at twice the base sample rate.

Downsampling check

void AAC_RENAME(ff_aac_sbr_apply)(AACDecContext *ac, ChannelElement *che,
                                  int id_aac, void *L_, void *R_)
{
    int downsampled = ac->oc[1].m4ac.ext_sample_rate < sbr->sample_rate;

                                 Base sample rate  ext_sample_rate  SBR sample rate  sbr->sample_rate
SBR dual-rate                    16000             36000            36000            36000
SBR single-rate                  36000             36000            72000            72000
LC signaled as single-rate SBR   24000             24000            48000            0
LC signaled as single-rate SBR   48000             48000            96000            0
LC signaled as dual-rate SBR     24000             48000            48000            0
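
Evaluating the check on the rows of the table makes the problem with an uninitialized sbr->sample_rate visible. The function below is a hypothetical wrapper around the same comparison, fed with the ext_sample_rate and sbr->sample_rate columns.

```c
/* Same comparison as in ff_aac_sbr_apply, taken out of context. */
static int downsampled_as_written(int ext_sample_rate, int sbr_sample_rate)
{
    return ext_sample_rate < sbr_sample_rate;
}
```

For the two SBR rows this gives 0 (36000 vs 36000) and 1 (36000 vs 72000). For the three LC rows sbr->sample_rate is 0, so the result is always 0, including the rows signalled as single-rate.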

When should we actually downsample?

To prevent audible corruption we must downsample if and only if we're operating the SBR tool at a sample rate above the output sample rate.

Recap: This is how the output sample rate is computed (assuming ac->oc[1].m4ac.sbr == 1).

multiplier = ac->oc[1].m4ac.ext_sample_rate > ac->oc[1].m4ac.sample_rate;
avctx->sample_rate = ac->oc[1].m4ac.sample_rate << multiplier

Does this mean that a 44.1kHz SBR single-rate file with implicit signaling would be output at 88.2kHz? I have to be wrong, right? I just made a test vector for this (Quizas_sbr_sr_44k.adts) and found I was actually right. See Footnote 1.

ext_sample_rate contains the author-intended (or worst-case assumed) output sample rate. This can be equal to sample_rate (downsampled) or double that (dual-rate).

For the rest of this, I will assume that other values of ext_sample_rate won't occur, as this assumption is already implicit through the rest of the decoder.

We must downsample when ext_sample_rate equals sample_rate. Alternatively, because the SBR tool always operates at 2 * sample_rate, we must downsample when ext_sample_rate is half the SBR tool sample rate.

int downsampled = ac->oc[1].m4ac.ext_sample_rate < sbr->sample_rate;

The above line would therefore work if sbr->sample_rate were always initialized, or if it were replaced by 2 * ac->oc[1].m4ac.sample_rate.
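
The second option can be sketched as follows. This is not a patch against FFmpeg, just the comparison written as a standalone function with a hypothetical name, and it assumes ext_sample_rate is already set to either sample_rate or 2 * sample_rate.

```c
#include <assert.h>

/* Variant of the check that does not read sbr->sample_rate at all. */
static int sbr_downsampled(int core_sample_rate, int ext_sample_rate)
{
    assert(core_sample_rate > 0);
    /* The SBR tool runs at 2 * core rate; downsample when the
     * requested output rate is lower than that. */
    return ext_sample_rate < 2 * core_sample_rate;
}
```

Because it only depends on the two rates from the configuration, the result is the same whether or not an SBR frame has been seen yet.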

Footnote 1

To be fair to ffmpeg, with implicit signaling you don't have any way to query whether the author intended downsampling or not.

The spec has a bunch of clauses like «For level 3, level 4 and level 6 decoders, it is mandatory to operate the SBR tool in downsampled mode if the sampling rate of the AAC core is higher than 24kHz» but that is not enough to cover all cases.

For instance, you couldn't tell 16k single-rate from 32k dual-rate. You can't cap output sample rate at 48k either because levels 5 and 7 require support of up to 96kHz output.
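
To make the gap concrete, here is a sketch that encodes only the clause quoted above and nothing else from the spec. The function name and the "undecided" return value are my own; a real decoder would also need some way to know the level, which is assumed to be given here.

```c
/* Returns 1 when the quoted clause mandates downsampled SBR,
 * and -1 when the clause alone does not decide. */
static int downsampling_mandated(int level, int core_sample_rate)
{
    int affected_level = (level == 3 || level == 4 || level == 6);

    if (affected_level && core_sample_rate > 24000)
        return 1;
    return -1; /* e.g. a 16000 Hz core, or a level 5 / level 7 decoder */
}
```

A 16000 Hz core returns -1 at every level, which is the 16k single-rate versus 32k dual-rate ambiguity: the clause gives no hint about which one the author meant.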
