Opened 12 years ago
Last modified 8 months ago
#2325 open defect
AAC audio delayed ~ 20 ms after conversion to PCM
Reported by: | BChap | Owned by: | |
---|---|---|---|
Priority: | important | Component: | avcodec |
Version: | git-master | Keywords: | aac regression |
Cc: | MasterQuestionable | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
When using ffmpeg to convert an AAC audio stream from an MP4 to PCM, the result is out of sync by about 2 ms. Adding -ss 00:00:00.02 after the input makes the output correctly aligned.
How to reproduce:
% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git
git revision: faa0068
built on Mar 4 2013 11:40:27
Attachments (2)
Change History (38)
by , 12 years ago
Attachment: | test100.mp4 added |
---|
by , 12 years ago
Attachment: | Screen Shot 2013-03-04 at 8.05.40 PM.png added |
---|
comment:1 by , 12 years ago
Version: | unspecified → git-master |
---|
comment:2 by , 12 years ago
Keywords: | mov added; mp4 pcm removed |
---|
Please provide your failing command line together with complete, uncut console output to make this a valid ticket.
comment:3 by , 12 years ago
The command doesn't fail, but here's the output
% ffmpeg -i test100.mp4 -c:a pcm_s16le test100_audio.wav
ffmpeg version 1.1.git Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar 4 2013 11:40:27 with Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --cc=cc --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid --enable-libfreetype --enable-ffplay
  libavutil 52. 17.103 / 52. 17.103
  libavcodec 54. 92.100 / 54. 92.100
  libavformat 54. 63.102 / 54. 63.102
  libavdevice 54. 3.103 / 54. 3.103
  libavfilter 3. 41.100 / 3. 41.100
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    creation_time : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time : 2013-03-04 21:40:01
      handler_name : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time : 2013-03-04 21:40:01
      handler_name : Mainconcept MP4 Sound Media Handler
File 'test100_audio.wav' already exists. Overwrite ? [y/N] y
Output #0, wav, to 'test100_audio.wav':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    ISFT : Lavf54.63.102
    Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      creation_time : 2013-03-04 21:40:01
      handler_name : Mainconcept MP4 Sound Media Handler
Stream mapping:
  Stream #0:1 -> #0:0 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
size= 2344kB time=00:00:12.50 bitrate=1536.1kbits/s
video:0kB audio:2344kB subtitle:0 global headers:0kB muxing overhead 0.003333%
follow-up: 6 comment:4 by , 12 years ago
Keywords: | regression added |
---|---|
Priority: | normal → important |
If there is an issue, it is a regression since 1edea05
Could you explain how you know that the delay is a bug? What is the other application you are testing?
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac? I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?
follow-up: 7 comment:6 by , 12 years ago
Replying to cehoyos:
If there is an issue, it is a regression since 1edea05
Just tried pulling that revision, and building. I still get the same delay.
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
What is the other application you are testing?
Adobe After Effects
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
I see the same delay when decoding the extracted aac file with FFmpeg, but does your reference application also decode the extracted aac file without this delay?
After Effects decodes it without the delay.
However, I just noticed that when exporting a wave file from the mp4 using Quicktime 7 Pro, its audio delay was reversed. The Quicktime audio is 2ms ahead of the source, whereas ffmpeg's audio is 2ms behind the source.
If there is an alternate solution using command line flags, I'd be up for that as well. Just trying to figure out how to make sure the source and output audio lines up exactly.
follow-up: 8 comment:7 by , 12 years ago
Replying to brchapman:
Replying to cehoyos:
If there is an issue, it is a regression since 1edea05
Just tried pulling that revision, and building. I still get the same delay.
Since it is a regression since that revision, you will have to test an earlier version;-)
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
follow-up: 9 comment:8 by , 12 years ago
Replying to cehoyos:
Replying to brchapman:
Replying to cehoyos:
If there is an issue, it is a regression since 1edea05
Just tried pulling that revision, and building. I still get the same delay.
Since it is a regression since that revision, you will have to test an earlier version;-)
Just tried pulling 0332324, and everything lines up great! there's no 2ms delay, and the other ticket about a duplicate first frame I posted #2324 is also fixed!
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
follow-up: 10 comment:9 by , 12 years ago
Replying to brchapman:
Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,
and the other ticket about a duplicate first frame I posted #2324 is also fixed!
No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?
Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
And if you transcode the out.aac file with FFmpeg and compare it in AfterEffects, you see the same delay as when transcoding the original mp4 file, or am I wrong?
follow-up: 11 comment:10 by , 12 years ago
Replying to cehoyos:
Replying to brchapman:
Just tried pulling 0332324, and everything lines up great! there's no 2ms delay,
and the other ticket about a duplicate first frame I posted #2324 is also fixed!
No, the output file is not valid.
(You can easily change the FFmpeg source to allow writing VFR mov files, but they are not conforming to any specification.)
Could you explain how you know that the delay is a bug?
When converting through After Effects, I don't get this delay. Everything lines up exactly in the output wave file.
How does it "line up"? (I don't understand how the numbers should relate to the sound. I am certainly not claiming there is no issue - I don't know - but since we know already of three - very different - applications that decode the sample differently from After Effects, I wonder how you can be sure that it is correct.)
I'm defining correct as placing the source mp4 and output wave file in a timeline together and checking if the waveforms match between them. This is shown in the attached screenshot. Since I'm not doing anything other than just reading a file in and encoding it to a different format, I would expect the input and output sound to line up exactly. Is this how you would expect it to work?
Your reasoning basically assumes that After Effects is right and FFmpeg, nero and QuickTime are wrong. While I am not saying this isn't the case, it is no proof imo.
(There is a mov sample from a camera somewhere on this tracker that shows a "visible" noise (knocking on a table iirc), it would be interesting to test that sample with all applications, I unfortunately fail to find it atm.)
Do you see the same problem if you extract the audio stream from the mov file with "ffmpeg -i test100.mp4 -acodec copy out.aac" ? Ie, is the problem in any way related to the container or only to aac?
No, I don't get the delay. It lines up perfectly.
You mean you get the same delay if you use FFmpeg but no delay with After Effects if you try with the aac file - or do I misunderstand?
When I run the command "ffmpeg -i test100.mp4 -acodec copy out.aac", the out.aac file audio matches the source mp4's audio exactly, without any delay.
And if you transcode the out.aac file with FFmpeg and compare it in AfterEffects, you see the same delay as when transcoding the original mp4 file, or am I wrong?
Yes, if I first transcode the original:
% ffmpeg -i test100.mp4 -c:a copy test100.aac
then:
% ffmpeg -i test100.aac -c:a pcm_s16le test100_audio.wav
test100_audio.wav is delayed.
Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:
% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
I don't get the duplicate first frame bug in #2324
Based on this I would guess that this would work:
ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
However, it doesn't. The first frame is still duplicated.
comment:11 by , 12 years ago
Replying to brchapman:
Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:
% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
I don't get the duplicate first frame bug in #2324
You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).
Based on this I would guess that this would work:
ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
However, it doesn't. The first frame is still duplicated.
Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.
follow-up: 13 comment:12 by , 12 years ago
I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)
At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?
follow-up: 14 comment:13 by , 12 years ago
Replying to cehoyos:
Replying to brchapman:
Also, if I encode test100.mp4 without the aac audio stream (ie with no audio) and then convert it:
% ffmpeg -i test100_no_aac.mp4 -c:v prores test100_ffmpeg.mov
I don't get the duplicate first frame bug in #2324
You are using a different input file that is cfr, your original sample has a longer first frame (that needs to be duplicated to get cfr output).
Based on this I would guess that this would work:
ffmpeg -i test100.mp4 -c:v prores -an test100_ffmpeg.mov
However, it doesn't. The first frame is still duplicated.
Because the timestamps still require a duplication (they do not change just because you don't encode the audio). Use -vsync 0 to ignore the timestamps so that no frame duplication happens.
So when I use -vsync 0, the first frame isn't duplicated, but rather the first frame is now completely black. Any other flags I can use to get rid of this?
I'd use -ss to skip past the first frame, which works on its own. However, if I try to use it with -filter_complex overlay and an image sequence that's overlaid on top of the source video, the sequence doesn't end up starting until frame 2 (frame 1 on screen). Here's that command:
% ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24 -start_number 1 -i test100_hu
ffmpeg version N-37747-g058e1f8 Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar 5 2013 19:38:09 with llvm-gcc 4.2.1 (LLVM build 2336.11.00)
  configuration: --prefix=/usr/local/ --enable-shared --enable-pthreads --enable-gpl
  libavutil 52. 17.103 / 52. 17.103
  libavcodec 54. 92.100 / 54. 92.100
  libavformat 54. 63.103 / 54. 63.103
  libavdevice 54. 3.103 / 54. 3.103
  libavfilter 3. 42.103 / 3. 42.103
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test100.mp4':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    creation_time : 2013-03-04 21:40:01
  Duration: 00:00:12.50, start: 0.000000, bitrate: 283 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 85 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc
    Metadata:
      creation_time : 2013-03-04 21:40:01
      handler_name : Mainconcept MP4 Video Media Handler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s
    Metadata:
      creation_time : 2013-03-04 21:40:01
      handler_name : Mainconcept MP4 Sound Media Handler
[image2 @ 0x7fe4d9033c00] max_analyze_duration 5000000 reached at 5000000 microseconds
Input #1, image2, from 'test100_hud/test100_transcoder%05d.png':
  Duration: 00:00:12.50, start: 0.000000, bitrate: N/A
    Stream #1:0: Video: png, rgba, 1280x720, 24 fps, 24 tbr, 24 tbn, 24 tbc
[prores @ 0x7fe4d9466400] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946a800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946d000] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d946f800] encoding with ProRes standard (apcn) profile
[prores @ 0x7fe4d9038000] encoding with ProRes standard (apcn) profile
Output #0, mov, to 'test100_ffmpeg.mov':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    encoder : Lavf54.63.103
    Stream #0:0: Video: prores (apcn) (apcn / 0x6E637061), yuv422p10le, 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 12288 tbn, 24 tbc
Stream mapping:
  Stream #0:0 (h264) -> overlay:main
  Stream #1:0 (png) -> overlay:overlay
  overlay -> Stream #0:0 (prores)
Press [q] to stop, [?] for help
frame= 300 fps= 34 q=0.0 Lsize= 8944kB time=00:00:12.50 bitrate=5861.8kbits/s
video:8942kB audio:0kB subtitle:0 global headers:0kB muxing overhead 0.021776%
ffmpeg -y -ss 00:00:00.042 -i test100.mp4 -vsync 0 -f image2 -force_fps -r 24 15.18s user 0.24s system 174% cpu 8.838 total
Looking at mediainfo for test100.mp4, I can see that the video track is cfr, whereas the audio is variable, which I'm guessing causes the "Overall bit rate mode" to become variable. Is this what you're talking about?
% mediainfo test100.mp4
General
Complete name : test100.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42
File size : 433 KiB
Duration : 12s 500ms
Overall bit rate mode : Variable
Overall bit rate : 284 Kbps
Encoded date : UTC 2013-03-04 21:40:01
Tagged date : UTC 2013-03-04 21:41:11
?TIM : 00:00:00:00
?TSC : 24
?TSZ : 1

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main@L5.1
Format settings, CABAC : Yes
Format settings, ReFrames : 3 frames
Format settings, GOP : M=4, N=33
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 12s 500ms
Bit rate : 85.5 Kbps
Width : 1 280 pixels
Height : 720 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 24.000 fps
Standard : NTSC
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.004
Stream size : 130 KiB (30%)
Language : English
Encoded date : UTC 2013-03-04 21:40:01
Tagged date : UTC 2013-03-04 21:40:01

Audio
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 12s 500ms
Source duration : 12s 501ms
Bit rate mode : Variable
Bit rate : 192 Kbps
Maximum bit rate : 329 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 289 KiB (67%)
Source stream size : 289 KiB (67%)
Language : English
Encoded date : UTC 2013-03-04 21:40:01
Tagged date : UTC 2013-03-04 21:40:01
Replying to cehoyos:
I tried different players and re-encoded the original sample and FFmpeg's behaviour is consistent afaict. (It may of course be wrong.)
At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?
Gongs start on 0, 41, 84, 116, 167, 207, 246, 285
comment:14 by , 12 years ago
Replying to brchapman:
At what frames are the gongs supposed to play? Ie, which numbers should be shown on screen at the time each gong starts?
Gongs start on 0, 41, 84, 116, 167, 207, 246, 285
This is exactly what I see here with different players, both with the original and with a sample re-encoded by FFmpeg. If FFmpeg cut the first 0.02 seconds of audio, this would become wrong, or am I missing something?
comment:15 by , 12 years ago
0.02 seconds of audio is about half a video frame.
And 0.002 seconds, as stated in the title of this ticket, is the audio delay you get by lounging on the couch instead of sitting straight (70 cm more for the sound to travel).
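The two figures in this comment can be checked with a few lines of arithmetic. This is only a sketch: the 24 fps frame rate is taken from the console output for test100.mp4, while the speed of sound (343 m/s, roughly dry air at 20 °C) is an assumption that the ticket does not state.

```python
# Assumed constant: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0
# Frame rate reported by ffmpeg for the video stream of test100.mp4.
VIDEO_FPS = 24


def delay_in_video_frames(delay_s, fps=VIDEO_FPS):
    """Express an audio delay as a fraction of one video frame."""
    return delay_s * fps


def sound_travel_distance_m(delay_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance sound travels in the given time."""
    return delay_s * speed


if __name__ == "__main__":
    print(f"0.02 s  = {delay_in_video_frames(0.02):.2f} video frames at 24 fps")
    print(f"0.002 s = {sound_travel_distance_m(0.002) * 100:.0f} cm of sound travel")
```

Under these assumptions, 0.02 s is just under half a frame and 0.002 s corresponds to a little under 70 cm, which matches the rough figures given in the comment.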
comment:16 by , 9 years ago
I have run into this as well and the way I see it, ffmpeg reads but does not write the custom mov metadata (udta/meta) "iTunSMPB". Check around line 3121 in mov.c, where priming is set based on the values from that metadata entry. If the mov muxer wrote that metadata, ffmpeg would at least be compatible with itself as far as priming is concerned (i.e. not decode an offset) and with Apple tools. That would be an improvement. I don't know if trailing samples are treated in any way in ffmpeg, but AFAIC the priming problem is the more serious one, as it leads to a/v desync when the delay is long enough (2112 samples is what Apple uses by default; that is a bit more than a 25 FPS video frame, which is noticeable/significant for some people).
For further explanations see also http://ffmpeg.org/pipermail/ffmpeg-devel/2012-July/127834.html
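To put the 2112-sample priming figure next to a 25 FPS video frame, here is a minimal sketch. The 48000 Hz rate is the one reported for test100.mp4; 44100 Hz is included only as another common rate and is an assumption, not something the ticket mentions.

```python
def samples_to_ms(samples, sample_rate_hz):
    """Duration of a number of PCM samples in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def frame_duration_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    priming = 2112  # Apple's default priming, as stated in the comment above
    for rate in (48000, 44100):  # 44100 Hz is an illustrative assumption
        print(f"{priming} samples at {rate} Hz = {samples_to_ms(priming, rate):.2f} ms")
    print(f"one frame at 25 fps = {frame_duration_ms(25):.2f} ms")
```

At either rate the priming lasts longer than the 40 ms of a 25 FPS frame, which is the comparison the comment makes.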
comment:17 by , 3 years ago
Status: | new → open |
---|
ffmpeg can write a pgap inside the track's udta via -metadata:s:a gapless_playback=X where X is an 8-bit value.
No, that is just a flag. Oh, there is a Remainder in seamless playback (iTunSMPB). See https://bugs.chromium.org/p/chromium/issues/detail?id=668999 and https://stackoverflow.com/questions/31093736/removing-both-leading-and-trailing-silence-from-m4a-files-using-ffmpeg
Full description of iTunSMPB https://hydrogenaud.io/index.php?topic=48231.msg430949#msg430949
"Additionally, the sixth value is the byte offset from the first audio frame to the 8th-from-last frame. This provides a resynchronization mechanism to restore a decoder's true sample number after a seek."
Apparently a real problem, wow. https://www.reddit.com/r/PrismMusic/comments/a9r7h5/comment/ecmou6i/
comment:18 by , 3 years ago
iTunSMPB is available for mp3 too! https://patchwork.ffmpeg.org/project/ffmpeg/list/?submitter=41
Not applied, yet:
For example exiftool on (yes, those are already in FATE, yet not in USE, LOL) https://fate-suite.ffmpeg.org/gapless/gapless-itunes.mp3
shows
Encoded By : iTunes 12.7.0.166
Comment : (iTunSMPB) 00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D 00000000 00000000 00000000 00000000 00000000 00000000
and exiftool -v
Comment = (iTunPGAP) 0 | EncodedBy = iTunes 12.7.0.166 | Comment = (iTunNORM) 00000362 000004C0 0000308F 00003CC5 00000DAC 00000DAC 00007D1[snip] | Comment = (iTunSMPB) 00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D[snip]
and ffmpeg itself:
encoded_by : iTunes 12.7.0.166
iTunNORM : 00000362 000004C0 0000308F 00003CC5 00000DAC 00000DAC 00007D14 00007AC9 000007C1 0000175E
iTunSMPB : 00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D 00000000 00000000 00000000 00000000 00000000 00000000
This means: 528 (0x210) priming samples, 2154 (0x86A) remainder samples, and 418950 (0x66486) valid samples in total (which does not include the 528 + 2154 garbage samples); 0x2DA9D is the byte offset from the first audio frame to the 8th-from-last frame. So altogether there are 421 632 samples. Audacity shows the same count when you open a WAV decoded by ffmpeg, i.e. the garbage samples are not trimmed. Very bad.
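The field interpretation above can be reproduced with a short parser. This is a sketch only: the field positions (second = priming, third = remainder, fourth = valid sample count) follow the description in this comment and the linked hydrogenaud.io post, and `parse_itunsmpb` and `GaplessInfo` are hypothetical names, not part of ffmpeg or exiftool.

```python
from typing import NamedTuple


class GaplessInfo(NamedTuple):
    priming: int        # samples to drop at the start
    remainder: int      # samples to drop at the end
    valid_samples: int  # samples that belong to the actual audio


def parse_itunsmpb(value: str) -> GaplessInfo:
    """Parse the hex fields of an iTunSMPB comment (hypothetical helper)."""
    fields = value.split()
    if len(fields) < 4:
        raise ValueError("iTunSMPB needs at least four hex fields")
    return GaplessInfo(
        priming=int(fields[1], 16),
        remainder=int(fields[2], 16),
        valid_samples=int(fields[3], 16),
    )


if __name__ == "__main__":
    tag = ("00000000 00000210 0000086A 0000000000066486 00000000 0002DA9D "
           "00000000 00000000 00000000 00000000 00000000 00000000")
    info = parse_itunsmpb(tag)
    print(info)
    # Total decoded length if nothing is trimmed.
    print(info.priming + info.remainder + info.valid_samples)
```

For the gapless-itunes.mp3 tag quoted above, the sum of the three fields is the untrimmed sample count.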
What is even more funny is that it prints those warnings:
[mp3float @ 000001545cdc72c0] overread, skip -6 enddists: -4 -4A [mp3float @ 000001545cdc72c0] overread, skip -5 enddists: -2 -2
Some implementation of it: https://git.rockbox.org/cgit/rockbox.git/log/?qt=grep&q=gapless
comment:19 by , 3 years ago
I just found a mp4/aac file that has crazy 10 116 priming samples!! WOW. It is "Advanced Audio Codec with Spectral Band Replication".
It has media time set to 5058, and you do x2 because it is HC since mdhd_TimeScale (?). Nice, just nice, I suppose you should look into that too. File Abba - Don't Shut Me Down.mp4 from Tidal. It does have sgpd and sbgp though, all good but another file Abba - I Still Have Faith In You.mp4 has 310720 ms duration which is BS, because it is actually (after discarding priming) 8 ms longer than the audio. WTF. mdhd.duration is also different after -c copy. Oogh.
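The doubling described in this comment can be sketched as a timescale conversion. This assumes the edit-list media time is expressed in the track timescale and that the SBR decoder outputs at twice that rate; the 22050 Hz and 44100 Hz values below are illustrative assumptions, since the ticket does not give the actual rates of the Tidal file.

```python
def priming_in_output_samples(media_time, track_timescale_hz, output_rate_hz):
    """Convert an edit-list media time into samples at the decoder output rate."""
    # Integer arithmetic avoids rounding surprises for exact rate ratios.
    return media_time * output_rate_hz // track_timescale_hz


if __name__ == "__main__":
    # 5058 is the media time quoted in the comment; the rates are assumed.
    print(priming_in_output_samples(5058, 22050, 44100))
```

With a 2:1 ratio between output rate and timescale, a media time of 5058 corresponds to the 10 116 priming samples mentioned above.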
follow-up: 24 comment:20 by , 3 years ago
Please apply https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220128052107.1678032-1-kode54@gmail.com/
For example exiftool on (yes, those are already in FATE, yet not in USE, LOL) https://fate-suite.ffmpeg.org/gapless/gapless-itunes.mp3
Hey!
comment:21 by , 3 years ago
I ran into this problem a few days ago.
From my perspective it's a problem with AAC on FFmpeg's side, not on the side of the codec.
I generate 1 frame (1024 samples) of silence or sine samples, but ffmpeg adds about two frames of silence before my frame.
The same happens when I push packets of existing audio from some source. FFmpeg adds two frames of silence at the beginning.
No such thing happens for other codecs. Checked with ac3 and eac3. Audio starts correctly from the first sample. No silence is added at the beginning.
I'm absolutely sure about the data I feed into ffmpeg, so please don't argue.
There's some kind of problem with the ffmpeg AAC code. I've tested it with
- aac builtin
- libfdk-aac
BOTH have a problem with adding about two frames of silence in front of passed audio.
comment:22 by , 3 years ago
OK. I know all :)
Some codecs need ramp-up samples. Some of them need plenty, for example 2048 or more, which can be more than one frame. It depends on the codec.
Those samples are added in front of the data that comes from the codec.
One can check how many samples are added via
AVCodecContext::initial_padding
Those samples need to be discarded!
This can be easily done if you're using ffmpeg as a library. But from the command line this (to the best of my knowledge) cannot be discarded. If somebody knows a way, please add a comment.
I came up with a solution for the case where initial_padding is not a multiple of the frame size. You have to add the missing samples to fill up to the next full frame and discard those frames. Now you can push real audio data and enjoy output without latency and padding.
This can be achieved only using ffmpeg as a library.
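The rounding step described in this comment can be sketched as follows. `initial_padding` and `frame_size` correspond to the libavcodec `AVCodecContext` fields of the same names, but the function itself is a hypothetical helper and the numbers in the example are illustrative, not measured values from the ticket.

```python
def padding_plan(initial_padding, frame_size):
    """Return (extra_samples, frames_to_discard).

    extra_samples: filler samples to push so that the encoder padding plus
    the filler ends exactly on a frame boundary.
    frames_to_discard: number of whole frames to drop afterwards.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    # Python's % always returns a non-negative result for a positive modulus.
    extra_samples = (-initial_padding) % frame_size
    frames_to_discard = (initial_padding + extra_samples) // frame_size
    return extra_samples, frames_to_discard


if __name__ == "__main__":
    # Illustrative values: 2112 samples of padding with 1024-sample frames.
    extra, frames = padding_plan(2112, 1024)
    print(f"push {extra} filler samples, then discard {frames} frames")
```

When the padding is already a whole number of frames, no filler is needed and only the padded frames are dropped.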
comment:23 by , 2 years ago
mdhd.duration is also different after -c copy. Oogh.
It is actually the same, while post-editlist durations (mvhd and tkhd) become wrong, 238915 vs 238800.
Anyway, after #9671 it looks like decoding HE-AAC to WAV is more correct, i.e. it no longer cuts off too much good sound (after priming) at the beginning. By "looks like" I mean that I compared against Don't Shut Me Down.flac, which is 3m 58s 800ms long (to get the 800 ms I used MediaInfo in "advanced mode"). After #9671, decoding HE-AAC to WAV gives 3m 58s 819ms, but MediaInfo says of the original AAC file that 19 ms should be removed (Duration_LastFrame: -19 ms) because of the edit-list duration. With that applied, both are the same. We really need to do https://bugs.chromium.org/p/chromium/issues/detail?id=668999
comment:24 by , 8 months ago
Cc: | added |
---|---|
Component: | undetermined → avcodec |
Keywords: | mov removed |
Summary: | MP4 AAC Audio is delayed by 2ms when converted to PCM → AAC audio delayed ~ 20 ms after conversion to PCM |
I don't think the "mp3dec" patch is directly related, though the gapless metadata could be somewhat relevant.
However, I question whether implementing such metadata is necessary. A player able to make use of it could equally pre-filter the audio outright, without needing the metadata.
(The metadata gives nothing meaningful in the end: what is needed to support gapless playback can be obtained programmatically by silence detection.)
What causes the ~ 20 ms offset is primarily:
https://developer.apple.com/documentation/quicktime-file-format/background_aac_encoding
https://en.wikipedia.org/wiki/Gapless_playback#Compression_artifacts
I am unsure what the appropriate handling really is:
Technically, these are part of the samples... whether sensible or not.
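For orientation, the duration of one AAC frame can be compared with the ~ 20 ms offset and turned into a value for the -ss workaround from the description. This is a sketch under assumptions: 1024 samples per frame and 48000 Hz come from this ticket, but whether the offset in test100.mp4 is exactly one frame is not established here, and `to_ffmpeg_timestamp` is a hypothetical helper that merely formats seconds as HH:MM:SS.mmm.

```python
def aac_frame_duration_s(frame_size=1024, sample_rate_hz=48000):
    """Duration of one AAC frame in seconds."""
    return frame_size / sample_rate_hz


def to_ffmpeg_timestamp(seconds):
    """Format a non-negative duration as HH:MM:SS.mmm (hypothetical helper)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    one_frame = aac_frame_duration_s()
    print(f"one AAC frame = {one_frame * 1000:.2f} ms")
    print(f"-ss {to_ffmpeg_timestamp(one_frame)}")
```

One 1024-sample frame at 48 kHz lasts a little over 21 ms, which is in the same range as the 0.02 s skip the reporter used.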
comment:25 by , 8 months ago
Same code will have to be written for mp4, yes. Not directly related, but file for this is in FATE.
comment:26 by , 8 months ago
I doubt that adding this would address the real problems.
And only MP4? (What about ADTS AAC, MKV, etc.?)
comment:27 by , 8 months ago
And only MP4?
In fact only Apple-style MP4. This will not fix the newer ISO-style edit lists; there, only the remainder needs fixing. A patch that applies the duration field of the edit list, and thus removes the remainder, is here: https://patchwork.ffmpeg.org/project/ffmpeg/patch/20190429225027.81295-1-fumoboy007@me.com/
ADTS
Cannot be solved. It only has the author of the audio stream, not the priming info. Same for mkv.
comment:28 by , 8 months ago
Hard to tell whether it is really a fix... or whether it breaks things further.
Comparable to the previous PNG havoc. [ #11002 ]
comment:29 by , 8 months ago
Well, actually it is very simple just as with png. Apple decoder decodes as I said, aac mp4 and mp3. Chrome uses remainder fix for ffmpeg, here is a test for that, https://jakearchibald.github.io/aac-decode-bug/
As for the mp3dec patch, it is used by the author of the patch here and file in FATE also is his. https://github.com/losnoco/Cog/tree/main/ThirdParty/ffmpeg/patches
comment:30 by , 8 months ago
The primary issue is that this is broken inconsistently across implementations.
(as in [ https://trac.ffmpeg.org/ticket/11002#comment:11 ])
And logically there's no strong reason to justify such unusual handling by default:
These are part of the samples, even if they are due to compression artifacts.
As an analogy, that would be:
The decoder deeming some of the frames ugly and refusing to display them...
comment:31 by , 8 months ago
These are part of the samples: due to compression artifacts, still.
You are correct! The remainder, which comes after 5 seconds, is not always just silence. It can be some strange noise of decreasing amplitude, as in the case of a perfect sinusoid. But also see my report #9471. Paul, who helped with Dolby EAC3, has left the ffmpeg project too. So we have mp3, mp4, eac3. Apple actually handles EAC3 correctly too.
follow-up: 33 comment:32 by , 8 months ago
Any non-broken codec should not cause mere silence to be significantly translated into anything more sophisticated (less compressible).
Files that have strange leading/trailing patterns typically contain unhandled recording noise, which is preferably dealt with at the source.
comment:33 by , 8 months ago
Replying to MasterQuestionable:
Any non-broken codec should not cause mere silence be significantly
Think about it: all codecs have frames or something like that, and thus after the end of the audio there will be artificial sound. E.g. truehd uses shorten_by to remove trailing silence. And flac also has some metadata.
comment:34 by , 8 months ago
͏ "significantly" refers to no more than 1 audio frame: for some codecs' overlapping transformation.
͏ Otherwise, definitely broken.
comment:35 by , 8 months ago
Haha, yes, multiple frames by Apple aac is funny and different amount used in HE aac and again different in HEv2 is even funnier. Not to mention EAC3 having different priming in different encoders.
You would love to learn that aac can have not 1024 but 960 samples and that is used in DAB+ digital radio. #1407
In both cases standards suggest minimal amount: 1024 and 256 to be used. LOL.
Sample Screenshot of waveform