Opened 10 years ago
Last modified 12 days ago
#4175 open enhancement
support phase shift for Dolby Pro Logic II / Dolby matrix downmix
Reported by: | AndreG | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | swresample |
Version: | git-master | Keywords: | dplii |
Cc: | otonvm | Blocked By: | |
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Summary of the bug:
Hello guys. I am trying to downmix 5.1 audio to 2.0 (stereo) using the audio resampler filter option "matrix_encoding=dplii". The resulting stereo audio shows a level bias towards the right speaker whenever more than one channel of the surround stream is active. That is, the audio volume on the right speaker seems higher than it should be. This is, of course, using a test signal where all surround channels produce the same sound at the same volume.
How to reproduce:
% ffmpeg -i ChID-BLITS-EBU-Narration441-16b.wav -ac 2 -filter:a "aresample=matrix_encoding=dplii" dplii.wav
ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg developers
built on Dec 4 2014 14:13:11 with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
You may freely obtain the surround test file I used in the command above from Fraunhofer's website.
Direct link to the surround wave file: https://www2.iis.fraunhofer.de/AAC/ChID-BLITS-EBU-Narration441-16b.wav
Fraunhofer's page where that file's URL is located: https://www2.iis.fraunhofer.de/AAC/multichannel.html
If you analyze both the surround input file and the resulting stereo wave file in Audacity, for example, you will notice that by 19 seconds into the audio stream all channels are active with a test signal. At the same point in time, the DPL-II downmix shows a "higher volume" on the right speaker. This effect is even more noticeable at 22 seconds, when the test signal changes. It is interesting to note that in the initial seconds of the file, when a male voice announces each channel individually, I could not notice this right speaker bias. The channel downmixing in that segment sounds correct. The effect is very noticeable on sound tracks with very active surround channels: they all seem to be stronger on the right speaker.
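The "higher volume on the right" observation can be put into numbers instead of being read off a waveform display. The following is only a minimal sketch: it assumes the stereo samples have already been loaded as (left, right) float pairs in the range [-1.0, 1.0], and the function names `channel_levels_db` and `balance_db` are made up for this illustration. The tone at the end is synthetic, not the BLITS file.

```python
import math

def channel_levels_db(frames):
    """Return (left_dB, right_dB) RMS levels for a sequence of (left, right) samples."""
    sum_l = sum_r = 0.0
    n = 0
    for left, right in frames:
        sum_l += left * left
        sum_r += right * right
        n += 1
    if n == 0:
        raise ValueError("no samples")

    def to_db(sum_sq):
        rms = math.sqrt(sum_sq / n)
        # Digital silence has no finite level in dB.
        return 20.0 * math.log10(rms) if rms > 0 else float("-inf")

    return to_db(sum_l), to_db(sum_r)

def balance_db(frames):
    """Right minus left level in dB; a positive value means the right channel is louder."""
    left_db, right_db = channel_levels_db(frames)
    return right_db - left_db

# Synthetic check: a 1 kHz tone with the right channel at twice the left amplitude.
rate = 44100
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(rate)]
frames = [(0.25 * s, 0.5 * s) for s in tone]
print(round(balance_db(frames), 2))  # about 6.02 dB towards the right
```

Running the same measurement on the segment around 19 seconds of both the downmix and a reference encode would give a figure that can be compared directly.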
Thank you.
Change History (13)
comment:1 by , 10 years ago
Keywords: | downmix surround stereo removed |
---|
follow-up: 3 comment:2 by , 10 years ago
Hello!
I was just getting ready to write a similar report before I found this one.
I'm doing this (same surround wave):
- Extract all channels to mono (as described in the wiki)
ffmpeg started on 2014-12-17 at 09:21:14 Report written to "ffmpeg-20141217-092114.log" Command line: ffmpeg -report -i ChID-BLITS-EBU-Narration441-16b.wav -filter_complex "channelsplit=channel_layout=5.1[FL][FR][FC][LFE][BL][BR]" -map "[FL]" test2_front_left.wav -map "[FR]" test2_front_right.wav -map "[FC]" test2_front_center.wav -map "[LFE]" test2_lfe.wav -map "[BL]" test2_back_left.wav -map "[BR]" test2_back_right.wav ffmpeg version N-68500-g3ba1050 Copyright (c) 2000-2014 the FFmpeg developers built on Dec 17 2014 01:55:42 with gcc 4.9.2 (GCC) configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-decklink --enable-zlib libavutil 54. 15.100 / 54. 15.100 libavcodec 56. 15.100 / 56. 15.100 libavformat 56. 15.105 / 56. 15.105 libavdevice 56. 3.100 / 56. 3.100 libavfilter 5. 4.100 / 5. 4.100 libswscale 3. 1.101 / 3. 1.101 libswresample 1. 1.100 / 1. 1.100 libpostproc 53. 3.100 / 53. 3.100 Splitting the commandline. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-i' ... matched as input file with argument 'ChID-BLITS-EBU-Narration441-16b.wav'. Reading option '-filter_complex' ... 
matched as option 'filter_complex' (create a complex filtergraph) with argument 'channelsplit=channel_layout=5.1[FL][FR][FC][LFE][BL][BR]'. Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[FL]'. Reading option 'test2_front_left.wav' ... matched as output file. Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[FR]'. Reading option 'test2_front_right.wav' ... matched as output file. Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[FC]'. Reading option 'test2_front_center.wav' ... matched as output file. Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[LFE]'. Reading option 'test2_lfe.wav' ... matched as output file. Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[BL]'. Reading option 'test2_back_left.wav' ... matched as output file. Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[BR]'. Reading option 'test2_back_right.wav' ... matched as output file. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option filter_complex (create a complex filtergraph) with argument channelsplit=channel_layout=5.1[FL][FR][FC][LFE][BL][BR]. Successfully parsed a group of options. Parsing a group of options: input file ChID-BLITS-EBU-Narration441-16b.wav. Successfully parsed a group of options. Opening an input file: ChID-BLITS-EBU-Narration441-16b.wav. [wav @ 000000000036d0c0] Format wav probed with size=2048 and score=99 [wav @ 000000000036d0c0] Before avformat_find_stream_info() pos: 68 bytes read:46722 seeks:2 [wav @ 000000000036d0c0] parser not found for codec pcm_s16le, packets or times may be invalid. 
[wav @ 000000000036d0c0] probing stream 0 pp:14 [wav @ 000000000036d0c0] probing stream 0 pp:13 [wav @ 000000000036d0c0] probing stream 0 pp:12 [wav @ 000000000036d0c0] probing stream 0 pp:11 [wav @ 000000000036d0c0] probing stream 0 pp:10 [wav @ 000000000036d0c0] probing stream 0 pp:9 [wav @ 000000000036d0c0] probing stream 0 pp:8 [wav @ 000000000036d0c0] probing stream 0 pp:7 [wav @ 000000000036d0c0] probing stream 0 pp:6 [wav @ 000000000036d0c0] probing stream 0 pp:5 [wav @ 000000000036d0c0] probing stream 0 pp:4 [wav @ 000000000036d0c0] probing stream 0 pp:3 [wav @ 000000000036d0c0] probing stream 0 pp:2 [wav @ 000000000036d0c0] probing stream 0 pp:1 [wav @ 000000000036d0c0] probed stream 0 [wav @ 000000000036d0c0] parser not found for codec pcm_s16le, packets or times may be invalid. [wav @ 000000000036d0c0] All info found [wav @ 000000000036d0c0] After avformat_find_stream_info() pos: 204668 bytes read:276098 seeks:2 frames:50 Input #0, wav, from 'ChID-BLITS-EBU-Narration441-16b.wav': Metadata: encoder : Adobe Audition CS6 (Windows) date : 2012-05-15 creation_time : 20:53:02 time_reference : 0 Duration: 00:00:46.53, bitrate: 4236 kb/s Stream #0:0, 50, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 5.1, s16, 4233 kb/s Successfully opened the file. Parsing a group of options: output file test2_front_left.wav. Applying option map (set input stream mapping) with argument [FL]. Successfully parsed a group of options. Opening an output file: test2_front_left.wav. 
detected 4 logical cores [Parsed_channelsplit_0 @ 0000000002bb67a0] Setting 'channel_layout' to value '5.1' [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'time_base' to value '1/44100' [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'sample_rate' to value '44100' [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'sample_fmt' to value 's16' [graph 0 input from stream 0:0 @ 0000000002c12a40] Setting 'channel_layout' to value '0x3f' [graph 0 input from stream 0:0 @ 0000000002c12a40] tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3f [audio format for output stream 0:0 @ 0000000002c05060] Setting 'sample_fmts' to value 's16' Successfully opened the file. Parsing a group of options: output file test2_front_right.wav. Applying option map (set input stream mapping) with argument [FR]. Successfully parsed a group of options. Opening an output file: test2_front_right.wav. [audio format for output stream 1:0 @ 0000000002c1b9a0] Setting 'sample_fmts' to value 's16' Successfully opened the file. Parsing a group of options: output file test2_front_center.wav. Applying option map (set input stream mapping) with argument [FC]. Successfully parsed a group of options. Opening an output file: test2_front_center.wav. [audio format for output stream 2:0 @ 0000000002c1c1c0] Setting 'sample_fmts' to value 's16' Successfully opened the file. Parsing a group of options: output file test2_lfe.wav. Applying option map (set input stream mapping) with argument [LFE]. Successfully parsed a group of options. Opening an output file: test2_lfe.wav. [audio format for output stream 3:0 @ 0000000002c39c80] Setting 'sample_fmts' to value 's16' Successfully opened the file. Parsing a group of options: output file test2_back_left.wav. Applying option map (set input stream mapping) with argument [BL]. Successfully parsed a group of options. Opening an output file: test2_back_left.wav. 
[audio format for output stream 4:0 @ 0000000002c455e0] Setting 'sample_fmts' to value 's16' Successfully opened the file. Parsing a group of options: output file test2_back_right.wav. Applying option map (set input stream mapping) with argument [BR]. Successfully parsed a group of options. Opening an output file: test2_back_right.wav. [audio format for output stream 5:0 @ 0000000002c50a60] Setting 'sample_fmts' to value 's16' Successfully opened the file. [Parsed_channelsplit_0 @ 0000000002bb67a0] auto-inserting filter 'auto-inserted resampler 0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_channelsplit_0' [audio format for output stream 0:0 @ 0000000002c05060] auto-inserting filter 'auto-inserted resampler 1' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 0:0' [audio format for output stream 1:0 @ 0000000002c1b9a0] auto-inserting filter 'auto-inserted resampler 2' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 1:0' [audio format for output stream 2:0 @ 0000000002c1c1c0] auto-inserting filter 'auto-inserted resampler 3' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 2:0' [audio format for output stream 3:0 @ 0000000002c39c80] auto-inserting filter 'auto-inserted resampler 4' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 3:0' [audio format for output stream 4:0 @ 0000000002c455e0] auto-inserting filter 'auto-inserted resampler 5' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 4:0' [audio format for output stream 5:0 @ 0000000002c50a60] auto-inserting filter 'auto-inserted resampler 6' between the filter 'Parsed_channelsplit_0' and the filter 'audio format for output stream 5:0' [AVFilterGraph @ 0000000002c01440] query_formats: 14 queried, 18 merged, 21 already done, 0 delayed [auto-inserted resampler 0 @ 0000000002c50b20] 
ch:6 chl:5.1 fmt:s16 r:44100Hz -> ch:6 chl:5.1 fmt:s16p r:44100Hz [auto-inserted resampler 1 @ 0000000002c50be0] ch:1 chl:1 channels (FL) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (FL) fmt:s16 r:44100Hz [auto-inserted resampler 2 @ 0000000002c50ca0] ch:1 chl:1 channels (FR) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (FR) fmt:s16 r:44100Hz [auto-inserted resampler 3 @ 0000000002c50e20] ch:1 chl:mono fmt:s16p r:44100Hz -> ch:1 chl:mono fmt:s16 r:44100Hz [auto-inserted resampler 4 @ 0000000002c50d60] ch:1 chl:1 channels (LFE) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (LFE) fmt:s16 r:44100Hz [auto-inserted resampler 5 @ 0000000002c50ee0] ch:1 chl:1 channels (BL) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (BL) fmt:s16 r:44100Hz [auto-inserted resampler 6 @ 0000000002c50fa0] ch:1 chl:1 channels (BR) fmt:s16p r:44100Hz -> ch:1 chl:1 channels (BR) fmt:s16 r:44100Hz Output #0, wav, to 'test2_front_left.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT : Lavf56.15.105 Stream #0:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (FL), s16, 705 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Output #1, wav, to 'test2_front_right.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT : Lavf56.15.105 Stream #1:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (FR), s16, 705 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Output #2, wav, to 'test2_front_center.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT : Lavf56.15.105 Stream #2:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Output #3, wav, to 'test2_lfe.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT : Lavf56.15.105 Stream #3:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (LFE), s16, 705 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Output #4, wav, to 'test2_back_left.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT 
: Lavf56.15.105 Stream #4:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (BL), s16, 705 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Output #5, wav, to 'test2_back_right.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT : Lavf56.15.105 Stream #5:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels (BR), s16, 705 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Stream mapping: Stream #0:0 (pcm_s16le) -> channelsplit channelsplit:FL -> Stream #0:0 (pcm_s16le) channelsplit:FR -> Stream #1:0 (pcm_s16le) channelsplit:FC -> Stream #2:0 (pcm_s16le) channelsplit:LFE -> Stream #3:0 (pcm_s16le) channelsplit:BL -> Stream #4:0 (pcm_s16le) channelsplit:BR -> Stream #5:0 (pcm_s16le) Press [q] to stop, [?] for help size= 1338kB time=00:00:15.53 bitrate= 705.7kbits/s size= 2494kB time=00:00:28.95 bitrate= 705.6kbits/s [output stream 0:0 @ 0000000002c132e0] EOF on sink link output stream 0:0:default. [output stream 2:0 @ 0000000002c30580] EOF on sink link output stream 2:0:default. [output stream 5:0 @ 0000000002c509a0] EOF on sink link output stream 5:0:default. [output stream 4:0 @ 0000000002c454a0] EOF on sink link output stream 4:0:default. [output stream 1:0 @ 0000000002c1b8e0] EOF on sink link output stream 1:0:default. [output stream 3:0 @ 0000000002c39bc0] EOF on sink link output stream 3:0:default. No more output streams to write to, finishing. 
size= 4008kB time=00:00:46.52 bitrate= 705.6kbits/s video:0kB audio:24045kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (ChID-BLITS-EBU-Narration441-16b.wav): Input stream #0:0 (audio): 6018 packets read (24621912 bytes); 6018 frames decoded (2051826 samples); Total: 6018 packets (24621912 bytes) demuxed Output file #0 (test2_front_left.wav): Output stream #0:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); Total: 6018 packets (4103652 bytes) muxed Output file #1 (test2_front_right.wav): Output stream #1:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); Total: 6018 packets (4103652 bytes) muxed Output file #2 (test2_front_center.wav): Output stream #2:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); Total: 6018 packets (4103652 bytes) muxed Output file #3 (test2_lfe.wav): Output stream #3:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); Total: 6018 packets (4103652 bytes) muxed Output file #4 (test2_back_left.wav): Output stream #4:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); Total: 6018 packets (4103652 bytes) muxed Output file #5 (test2_back_right.wav): Output stream #5:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (4103652 bytes); Total: 6018 packets (4103652 bytes) muxed 6018 frames successfully decoded, 0 decoding errors [AVIOContext @ 0000000002c1a4c0] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 0000000002c2f600] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 0000000002c1c540] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 0000000002c3b000] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 0000000002c45980] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 0000000002c46fc0] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 000000000036d780] Statistics: 24682588 bytes read, 2 seeks
- Mix mono channels with Surcode (Dolby certified) to use as a reference. I've used this encoder before and I'm quite confident that it produces correct results. Surcode downmix (This is on my mega account; should I put it somewhere else?)
Also, when muxed together with the video that identifies the channels, it sounds right. The sound stage is centered both on stereo and on 5.1.
It should be centered on stereo because DPLII is supposed to be for both stereo and 5.1 delivery.
- I get an almost identical result (points of a dB in difference) when using qaac with this matrix:
1 0 0.7071067811865476 0 -0.8717797887081347j -0.4898979485566356j
0 1 0.7071067811865476 0 0.4898979485566356j 0.8717797887081347j
These numbers are from Wikipedia.
%ffmpeg% -loglevel quiet -i ChID-BLITS-EBU-Narration441-16b.wav -f wav -y - | %qaac% --tvbr 127 --quality 2 --verbose --native-resampler=bats,127 --matrix-preset=dpl2 --no-matrix-normalize - -o ChID-BLITS-EBU-Narration441-16b_qaac_dpl2.m4a
%ffmpeg% -i ChID-BLITS-EBU-Narration441-16b_qaac_dpl2.m4a -y ChID-BLITS-EBU-Narration441-16b_qaac_dpl2.wav
- For reference, this is the result from aresample:
ffmpeg started on 2014-12-17 at 10:37:23 Report written to "ffmpeg-20141217-103723.log" Command line: ffmpeg -report -i ChID-BLITS-EBU-Narration441-16b.wav -af "aresample=out_channel_layout=stereo:matrix_encoding=dplii" -y ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav ffmpeg version N-68500-g3ba1050 Copyright (c) 2000-2014 the FFmpeg developers built on Dec 17 2014 01:55:42 with gcc 4.9.2 (GCC) configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-decklink --enable-zlib libavutil 54. 15.100 / 54. 15.100 libavcodec 56. 15.100 / 56. 15.100 libavformat 56. 15.105 / 56. 15.105 libavdevice 56. 3.100 / 56. 3.100 libavfilter 5. 4.100 / 5. 4.100 libswscale 3. 1.101 / 3. 1.101 libswresample 1. 1.100 / 1. 1.100 libpostproc 53. 3.100 / 53. 3.100 Splitting the commandline. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-i' ... matched as input file with argument 'ChID-BLITS-EBU-Narration441-16b.wav'. Reading option '-af' ... matched as option 'af' (set audio filters) with argument 'aresample=out_channel_layout=stereo:matrix_encoding=dplii'. Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'. Reading option 'ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav' ... 
matched as output file. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option y (overwrite output files) with argument 1. Successfully parsed a group of options. Parsing a group of options: input file ChID-BLITS-EBU-Narration441-16b.wav. Successfully parsed a group of options. Opening an input file: ChID-BLITS-EBU-Narration441-16b.wav. [wav @ 000000000032bf60] Format wav probed with size=2048 and score=99 [wav @ 000000000032bf60] Before avformat_find_stream_info() pos: 68 bytes read:46722 seeks:2 [wav @ 000000000032bf60] parser not found for codec pcm_s16le, packets or times may be invalid. [wav @ 000000000032bf60] probing stream 0 pp:14 [wav @ 000000000032bf60] probing stream 0 pp:13 [wav @ 000000000032bf60] probing stream 0 pp:12 [wav @ 000000000032bf60] probing stream 0 pp:11 [wav @ 000000000032bf60] probing stream 0 pp:10 [wav @ 000000000032bf60] probing stream 0 pp:9 [wav @ 000000000032bf60] probing stream 0 pp:8 [wav @ 000000000032bf60] probing stream 0 pp:7 [wav @ 000000000032bf60] probing stream 0 pp:6 [wav @ 000000000032bf60] probing stream 0 pp:5 [wav @ 000000000032bf60] probing stream 0 pp:4 [wav @ 000000000032bf60] probing stream 0 pp:3 [wav @ 000000000032bf60] probing stream 0 pp:2 [wav @ 000000000032bf60] probing stream 0 pp:1 [wav @ 000000000032bf60] probed stream 0 [wav @ 000000000032bf60] parser not found for codec pcm_s16le, packets or times may be invalid. [wav @ 000000000032bf60] All info found [wav @ 000000000032bf60] After avformat_find_stream_info() pos: 204668 bytes read:276098 seeks:2 frames:50 Input #0, wav, from 'ChID-BLITS-EBU-Narration441-16b.wav': Metadata: encoder : Adobe Audition CS6 (Windows) date : 2012-05-15 creation_time : 20:53:02 time_reference : 0 Duration: 00:00:46.53, bitrate: 4236 kb/s Stream #0:0, 50, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 5.1, s16, 4233 kb/s Successfully opened the file. 
Parsing a group of options: output file ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav. Applying option af (set audio filters) with argument aresample=out_channel_layout=stereo:matrix_encoding=dplii. Successfully parsed a group of options. Opening an output file: ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav. Successfully opened the file. detected 4 logical cores [Parsed_aresample_0 @ 00000000003296a0] Setting 'out_channel_layout' to value 'stereo' [Parsed_aresample_0 @ 00000000003296a0] Setting 'matrix_encoding' to value 'dplii' [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'time_base' to value '1/44100' [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'sample_rate' to value '44100' [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'sample_fmt' to value 's16' [graph 0 input from stream 0:0 @ 000000000032be20] Setting 'channel_layout' to value '0x3f' [graph 0 input from stream 0:0 @ 000000000032be20] tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3f [audio format for output stream 0:0 @ 0000000002bb4080] Setting 'sample_fmts' to value 's16' [AVFilterGraph @ 0000000002bbbae0] query_formats: 4 queried, 9 merged, 0 already done, 0 delayed 0.325401 0.000000 0.230093 0.000000 -0.281805 -0.162700 0.000000 0.325401 0.230093 0.000000 0.162700 0.281805 [Parsed_aresample_0 @ 00000000003296a0] ch:6 chl:5.1 fmt:s16 r:44100Hz -> ch:2 chl:stereo fmt:s16 r:44100Hz Output #0, wav, to 'ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav': Metadata: time_reference : 0 ICRD : 2012-05-15 ISFT : Lavf56.15.105 Stream #0:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s Metadata: encoder : Lavc56.15.100 pcm_s16le Stream mapping: Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help [output stream 0:0 @ 0000000002bb3fc0] EOF on sink link output stream 0:0:default. No more output streams to write to, finishing. 
size= 8015kB time=00:00:46.52 bitrate=1411.2kbits/s video:0kB audio:8015kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001194% Input file #0 (ChID-BLITS-EBU-Narration441-16b.wav): Input stream #0:0 (audio): 6018 packets read (24621912 bytes); 6018 frames decoded (2051826 samples); Total: 6018 packets (24621912 bytes) demuxed Output file #0 (ChID-BLITS-EBU-Narration441-16b_ff_aresample.wav): Output stream #0:0 (audio): 6018 frames encoded (2051826 samples); 6018 packets muxed (8207304 bytes); Total: 6018 packets (8207304 bytes) muxed 6018 frames successfully decoded, 0 decoding errors [AVIOContext @ 000000000032d6a0] Statistics: 4 seeks, 6021 writeouts [AVIOContext @ 000000000032c7e0] Statistics: 24682588 bytes read, 2 seeks
Let me know if you need anything else.
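The twelve coefficients printed in the aresample log above (two rows of six, which appear to be in the input order FL, FR, FC, LFE, BL, BR) already hint at the imbalance. The sketch below applies them to one sample that is identical on every channel, which is an assumption based on the description of the test signal, not something measured from the file. The helper name `downmix` is made up for this illustration.

```python
import math

# Coefficients as printed by aresample in the log above.
# Assumed input order: FL, FR, FC, LFE, BL, BR.
LT = [0.325401, 0.000000, 0.230093, 0.000000, -0.281805, -0.162700]
RT = [0.000000, 0.325401, 0.230093, 0.000000, 0.162700, 0.281805]

def downmix(samples):
    """Plain real-valued matrix mix of one 6-channel sample into (Lt, Rt)."""
    lt = sum(c * s for c, s in zip(LT, samples))
    rt = sum(c * s for c, s in zip(RT, samples))
    return lt, rt

# Assumption: all six channels carry the same value at the same instant.
lt, rt = downmix([1.0] * 6)
imbalance_db = 20 * math.log10(abs(rt) / abs(lt))
print(lt, rt, imbalance_db)  # Lt is near 0.11, Rt near 1.0, roughly 19 dB apart
```

With in-phase content, the negative surround terms cancel most of the left output while the positive ones add to the right, which would be consistent with the bias heard at 19 seconds and with the individually announced channels sounding fine.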
comment:3 by , 10 years ago
Cc: | added |
---|---|
Version: | 2.4.4 → git-master |
follow-up: 5 comment:4 by , 10 years ago
Done!
Did a build from git and it works!
Result
The mix is overall quieter but otherwise matches reference.
Since it's not a huge patch, how long does approval usually take?
comment:5 by , 10 years ago
Replying to otonvm:
The mix is overall quieter but otherwise matches reference.
Since it's not a huge patch, how long does approval usually take?
It is not a question of time but of your email to the development mailing list confirming that you tested the patch (and compared it with a reference encoder) and that you believe it is correct.
comment:6 by , 10 years ago
Just to get some clarity..
You quoted a matrix you used for qaac, and you said this matrix gives you a good result, right?
This quoted matrix has two negative operations for Lt, and two positive operations for Rt, just like the swresample code (BEFORE the patch!).
If anything, the coefficients may be slightly wrong, but the +/- are in the proper places from what I can tell.
The only limitation is that swresample cannot actually apply the phase shift the DPLII spec asks for, so the result will never be perfect.
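The effect of dropping the phase shift can be illustrated with phasors. This is a sketch under a simplifying assumption: the input is a single steady sinusoid, so a 90-degree shift is just multiplication by j. The rows are the matrix quoted in comment:2, and `drop_phase` is a hypothetical helper that keeps the signs and magnitudes but makes every coefficient real, which is all a plain matrix mixer can do.

```python
# Matrix quoted in comment:2 (assumed input order FL, FR, FC, LFE, BL, BR).
# A trailing j marks a coefficient that carries a 90-degree phase shift.
LT_SHIFT = [1, 0, 0.7071067811865476, 0, -0.8717797887081347j, -0.4898979485566356j]
RT_SHIFT = [0, 1, 0.7071067811865476, 0, 0.4898979485566356j, 0.8717797887081347j]

def drop_phase(row):
    """Same signs and magnitudes as `row`, but with no phase shift on any term."""
    return [complex(c).real + complex(c).imag for c in row]

def mix(row, phasors):
    """Combine one phasor per input channel into a single output phasor."""
    return sum(c * p for c, p in zip(row, phasors))

# Assumption: the same tone, in phase, on every channel.
same = [1 + 0j] * 6

lt_shift, rt_shift = mix(LT_SHIFT, same), mix(RT_SHIFT, same)
lt_real, rt_real = mix(drop_phase(LT_SHIFT), same), mix(drop_phase(RT_SHIFT), same)

print(abs(lt_shift), abs(rt_shift))  # equal magnitudes: balanced
print(abs(lt_real), abs(rt_real))    # right is several times larger than left
```

With the shift, the surround terms sit at right angles to the front terms, so they change the phase of Lt and Rt in opposite directions but leave both magnitudes equal. Without it, the same terms subtract on the left and add on the right.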
comment:7 by , 10 years ago
Summary: | Dolby Pro Logic II / Dolby matrix downmixing level balance → support phase shift for Dolby Pro Logic II / Dolby matrix downmix |
---|---|
Type: | defect → enhancement |
comment:8 by , 10 years ago
Please check all the samples provided, including the final post-patch result.
Put them into Audacity or similar and even visually the differences are obvious.
Then listen by switching from solo to solo while playing each piece on a loop.
Post-patch, the differences from the reference and/or the qaac matrix are minimal at best.
I actually cannot explain what happens... I tried to fix this before I found this patch and could not, mostly because I did not know how to recreate that phase shift.
comment:9 by , 10 years ago
It's possible that the missing phase shift causes the difference you are observing; however, blindly adjusting the formula to something that "feels" good in one particular circumstance is not the way to go here.
Any reference and independent sources I can find suggest that the current code is correct, albeit missing the phase shift. It also matches your qaac matrix (sans phase shift).
So we know it's incomplete and therefore results in "wrong" output; there is no reason to actually "break" it more (by diverging the formula from the references), imho. The mixer in swresample (and avresample, for that matter) is only a simple matrix mixer; until someone teaches it how to apply the phase shift, any modifications to the DPL/DPLII mixing are rather pointless.
Fact of the matter is, no change to the matrix coefficients will give you actual proper DPLII encoded audio.
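For readers wondering what "applying the phase shift" would involve, one textbook way to shift every frequency component by 90 degrees is a discrete Hilbert transform. The sketch below is not how swresample works and is not a proposal for its implementation: it processes a whole buffer offline with a deliberately naive DFT, whereas a streaming mixer would need a filter approximation. The function names `dft` and `shift_90` are made up for this illustration, and the direction of the shift (a lag, turning a cosine into a sine) is an assumption.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short buffers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def shift_90(x):
    """Lag every frequency component of a real buffer by 90 degrees."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            spectrum[k] = 0j      # DC and Nyquist have no meaningful 90-degree shift
        elif k < n / 2:
            spectrum[k] *= -1j    # positive frequencies
        else:
            spectrum[k] *= 1j     # negative frequencies
    return [v.real for v in dft(spectrum, inverse=True)]

# A cosine with 4 cycles in 64 samples should come out as the matching sine.
n = 64
cosine = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
shifted = shift_90(cosine)
sine = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
max_error = max(abs(a - b) for a, b in zip(shifted, sine))
print(max_error)  # tiny, limited by floating-point rounding
```

In a downmix, such a shifted copy of the surround channels would be what the j coefficients of the quoted matrix multiply, while the front channels are mixed unshifted.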
comment:10 by , 10 years ago
I'm sorry but I must disagree.
And by just listening to those samples you could easily hear the difference yourself.
I believe whatever Surcode produces is a reliable reference; my qaac matrix matches it (and therefore also what's noted on Wikipedia), but most importantly it sounds right.
Any other audio that identifies each channel and mixes between each channel produces the same results.
comment:11 by , 10 years ago
I wouldn't recommend using the patch, I didn't realise about the phase shifting when I wrote it (It will stop the waveform cancellation, but also get steered to the wrong channel).
To quote some documentation from Dolby:
"The 90-Degree Phase Shift filter provides a means for an encoding engineer to create a multichannel Dolby Digital bitstream that can be downmixed to a Dolby Surround compatible Lt/Rt output. Without this filter, point-source elements panned from Surround to Center in the multichannel mix would seem to pan from Surround to Left and then to Center when downmixed to Lt/Rt and reproduced using a Dolby Surround Pro Logic decoder.
This filter should generally be used whenever encoding a multichannel signal unless it is known that the 5.1-channel source does not contain point-source element pans. For example, if the source was recorded using five discrete microphones placed in the corners of an auditorium, there is no panning between channels and the filter could be safely disabled. If in doubt, use a DP562 to downmix the 5.1-channel program to Lt/Rt, Dolby Surround Pro Logic decode the Lt/Rt signals, and then set the filter to the setting that sounds best."
This is pretty much what happened in #3455. Basically, as heleppkes says, it will not give you proper encoded audio even if the coefficients are corrected (I do think they're off too).
comment:12 by , 4 years ago
Status: | new → open |
---|
So will 637c154a5048f60d0b59d35941d4d528edf56370 do something for this issue? Also, after reading https://en.wikipedia.org/wiki/Talk:Matrix_decoder#Mathematical_definition_of_matrix and https://en.wikipedia.org/wiki/Hilbert_transform, IMHO this is quite insane stuff to implement.
comment:13 by , 12 days ago
Some progress, whatever that does https://patchwork.ffmpeg.org/project/ffmpeg/patch/20241226181306.691-3-jamrial@gmail.com/
Please test current FFmpeg git head and please provide the command line you tested together with the complete, uncut console output to make this a valid ticket.
Are you sure that it isn't just a characteristic of Dolby downmixing that the right speaker gets louder if you play back on stereo equipment?