Opened 4 months ago

Closed 4 months ago

#11082 closed defect (duplicate)

Converting multichannel audio in FLTP sample format to stereo in S16 attenuates volume unexpectedly

Reported by: Jiamin.X
Owned by:
Priority: important
Component: swresample
Version: unspecified
Keywords: resampling
Cc: Jiamin.X
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

When converting multichannel audio in FLTP sample format to stereo in S16 sample format, volume is decreased unexpectedly.

The original 6-channel audio input file in FLTP sample format:

% ffprobe multich-audio.mp4
ffprobe version 6.0 Copyright (c) 2007-2023 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.0.40.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomdby1iso2mp41
    encoder         : www.aliyun.com - Media Transcoding
  Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
    Side data:
      audio service type: main

Converted from 6-channel in fltp to 2-channel in flt, output: stereo-flt.mkv

% ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_f32le stereo-flt.mkv
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_f32le (native))
Output #0, matroska, to 'stereo-flt.mkv':
  Stream #0:0(und): Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s (default)

Converted from 6-channel in fltp to 2-channel in s16, output: stereo-s16.mkv

% ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_s16le stereo-s16.mkv
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
Output #0, matroska, to 'stereo-s16.mkv':
  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s (default)

Converted from 6-channel in fltp to 2-channel in s16, with -rematrix_maxval 1000, output: stereo-s16-rematrix_maxval-1000.mkv

% ffmpeg -i multich-audio.mp4 -rematrix_maxval 1000 -ac 2 -c:a pcm_s16le stereo-s16-rematrix_maxval-1000.mkv
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
Output #0, matroska, to 'stereo-s16-rematrix_maxval-1000.mkv':
  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s (default)

Using volumedetect to check the max and mean volumes of the original file and the 3 generated files above:

  1. Volume statistics of multich-audio.mp4 (the original file):
    % ffmpeg -i multich-audio.mp4 -af "volumedetect" -vn -sn -f null /dev/null
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
      Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
    Stream mapping:
      Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
    Output #0, null, to '/dev/null':
      Stream #0:0(und): Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s (default)
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] n_samples: 19150848
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] mean_volume: -25.4 dB
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] max_volume: -1.9 dB
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_1db: 68
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_2db: 2022
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_3db: 3665
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_4db: 6371
    [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_5db: 10144
    
  2. Volume statistics of stereo-flt.mkv:
    % ffmpeg -i stereo-flt.mkv -af "volumedetect" -vn -sn -f null /dev/null
    Input #0, matroska,webm, from 'stereo-flt.mkv':
      Stream #0:0: Audio: pcm_f32le, 48000 Hz, 2 channels, flt, 3072 kb/s (default)
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
    Output #0, null, to '/dev/null':
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
    [Parsed_volumedetect_0 @ 0x7fce1e404200] n_samples: 6383616
    [Parsed_volumedetect_0 @ 0x7fce1e404200] mean_volume: -21.6 dB
    [Parsed_volumedetect_0 @ 0x7fce1e404200] max_volume: 0.0 dB
    [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_0db: 1466
    [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_1db: 1310
    [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_2db: 3452
    [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_3db: 4591
    
  3. Volume statistics of stereo-s16-rematrix_maxval-1000.mkv:
    % ffmpeg -i stereo-s16-rematrix_maxval-1000.mkv -af "volumedetect" -vn -sn -f null /dev/null
    Input #0, matroska,webm, from 'stereo-s16-rematrix_maxval-1000.mkv':
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
    Output #0, null, to '/dev/null':
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
    [Parsed_volumedetect_0 @ 0x7febc6a06180] n_samples: 6383616
    [Parsed_volumedetect_0 @ 0x7febc6a06180] mean_volume: -21.6 dB
    [Parsed_volumedetect_0 @ 0x7febc6a06180] max_volume: 0.0 dB
    [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_0db: 1466
    [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_1db: 1310
    [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_2db: 3452
    [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_3db: 4591
    
  4. Volume statistics of stereo-s16.mkv:
    % ffmpeg -i stereo-s16.mkv -af "volumedetect" -vn -sn -f null /dev/null
    Input #0, matroska,webm, from 'stereo-s16.mkv':
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
    Output #0, null, to '/dev/null':
      Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] n_samples: 6383616
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] mean_volume: -29.3 dB
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] max_volume: -5.9 dB
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_5db: 21
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_6db: 294
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_7db: 598
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_8db: 831
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_9db: 1816
    [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_10db: 4223
    

From the above, we can see that converting to flt, or to s16 with -rematrix_maxval 1000, gives exactly the same max and mean volume (0.0 dB and -21.6 dB), while converting to s16 directly without setting -rematrix_maxval results in a much weaker volume (-5.9 dB max, -29.3 dB mean). If I convert from 2-channel flt to 2-channel s16, the volume is not affected at all.

I checked the code related to the rematrix_maxval setting in libswresample/rematrix.c (shown below). If rematrix_maxval is not set manually, the limit maxval is chosen from the sample formats: when either the output sample format or the internal sample format (int_sample_fmt) is an integer format such as s16, maxval is set to 1.0; otherwise it is set to INT_MAX. This limit is then used when building the rematrix coefficients: if the largest coefficient sum exceeds maxval, all coefficients are scaled down, which attenuates the downmix and therefore the output volume.

My confusion is why we have to check the output sample format and adjust rematrix_maxval accordingly before downmixing. It looks to me as if only the input sample format should affect the rematrix/downmix process, because rematrix/downmix operates on the input data the same way regardless of the output sample format. If I am right, we may need to remove the av_get_packed_sample_fmt(s->out_sample_fmt) < AV_SAMPLE_FMT_FLT check in the following code (if this is correct, I may send a pull request later):

av_cold static int auto_matrix(SwrContext *s)
{
    double maxval;
    int ret;

    if (s->rematrix_maxval > 0) {
        maxval = s->rematrix_maxval;
    } else if (   av_get_packed_sample_fmt(s->out_sample_fmt) < AV_SAMPLE_FMT_FLT
               || av_get_packed_sample_fmt(s->int_sample_fmt) < AV_SAMPLE_FMT_FLT) {
        maxval = 1.0;
    } else
        maxval = INT_MAX;
    ...
}

av_cold int swr_build_matrix(uint64_t in_ch_layout_param, uint64_t out_ch_layout_param,
                             double center_mix_level, double surround_mix_level,
                             double lfe_mix_level, double maxval,
                             double rematrix_volume, double *matrix_param,
                             int stride, enum AVMatrixEncoding matrix_encoding, void *log_context)
{
    ...

    if(maxcoef > maxval || rematrix_volume  < 0){
        maxcoef /= maxval;
        for(i=0; i<SWR_CH_MAX; i++)
            for(j=0; j<SWR_CH_MAX; j++){
                matrix_param[stride*i + j] /= maxcoef;
            }
    }
    ....
}

Change History (1)

comment:1 by Jiamin.X, 4 months ago

Resolution: duplicate
Status: new → closed