#2484 closed defect (fixed)
atempo - low accuracy
Reported by: | bars | Owned by: | Pavel Koshevoy |
---|---|---|---|
Priority: | normal | Component: | avfilter |
Version: | git-master | Keywords: | atempo |
Cc: | pkoshevoy@gmail.com | Blocked By: | |
Blocking: | | Reproduced by developer: | yes |
Analyzed by developer: | no | | |
Description
Summary of the bug: I can't get an exact output length (down to the millisecond). It looks as if the atempo value is only honored to 3 decimal places (x.xxx).
How to reproduce:
Example test.wav (Duration 00:40:44.864)
To convert from 25 fps to 23.976fps (PAL->NTSC)
(24000/1001)/25=0.959040959041 => atempo=0.959040959041
00:40:44.864 => 00:42:29.407 (407ms) (atempo=0.959040959041)
00:40:44.864 => 00:42:29.407 (407ms) (atempo=0.959)
00:40:44.864 => 00:42:29.407 (407ms) (atempo=0.959999999999)
00:40:44.864 => 00:42:29.407 (407ms) (atempo=0.959111111111)
00:40:44.864 => 00:42:29.407 (407ms) (atempo=24000/25025)
With atempo=0.959040959041 the correct length should be 00:42:29.280 (280ms!)
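The 280ms figure can be checked by hand: the ideal output length is the input length divided by the tempo. The following is only a sketch of that arithmetic, not of anything atempo does internally; `expected_duration` and `format_timestamp` are hypothetical helper names introduced here for illustration.

```python
from fractions import Fraction


def expected_duration(input_seconds: Fraction, tempo: Fraction) -> Fraction:
    """Ideal output length if time were scaled exactly: input / tempo."""
    return input_seconds / tempo


def format_timestamp(seconds: Fraction) -> str:
    """Format seconds as HH:MM:SS.mmm, rounded to the nearest millisecond."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


tempo = Fraction(24000, 1001) / 25      # PAL -> NTSC ratio, equal to 24000/25025
source = Fraction("2444.864")           # 00:40:44.864 expressed in seconds
ideal = expected_duration(source, tempo)
print(format_timestamp(ideal))          # 00:42:29.280
```

Exact fractions are used so that the check itself adds no rounding error of its own.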
ffmpeg -i "test.wav" -y -acodec pcm_s16le -af "atempo=0.959040959041" "test_atempo.wav"
ffmpeg version N-52233-gee94362 Copyright (c) 2000-2013 the FFmpeg developers
built on Apr 18 2013 02:55:39 with gcc 4.8.0 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
libavutil 52. 26.100 / 52. 26.100
libavcodec 55. 2.100 / 55. 2.100
libavformat 55. 2.100 / 55. 2.100
libavdevice 55. 0.100 / 55. 0.100
libavfilter 3. 56.103 / 3. 56.103
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 3.100 / 52. 3.100
[wav @ 0000000000329fe0] max_analyze_duration 5000000 reached at 5013333 microseconds
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'test.wav':
Metadata:
encoder : Lavf55.2.100
Duration: 00:40:44.86, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Output #0, wav, to 'test_atempo.wav':
Metadata:
ISFT : Lavf55.2.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
size= 16642kB time=00:01:28.75 bitrate=1536.0kbits/s
size= 36550kB time=00:03:14.93 bitrate=1536.0kbits/s
size= 46708kB time=00:04:09.11 bitrate=1536.0kbits/s
size= 64952kB time=00:05:46.41 bitrate=1536.0kbits/s
size= 78344kB time=00:06:57.83 bitrate=1536.0kbits/s
size= 96750kB time=00:08:35.99 bitrate=1536.0kbits/s
size= 109712kB time=00:09:45.13 bitrate=1536.0kbits/s
size= 129966kB time=00:11:33.15 bitrate=1536.0kbits/s
size= 144280kB time=00:12:49.49 bitrate=1536.0kbits/s
size= 159015kB time=00:14:08.08 bitrate=1536.0kbits/s
size= 172411kB time=00:15:19.52 bitrate=1536.0kbits/s
size= 188794kB time=00:16:46.90 bitrate=1536.0kbits/s
size= 203688kB time=00:18:06.33 bitrate=1536.0kbits/s
size= 221898kB time=00:19:43.45 bitrate=1536.0kbits/s
size= 234376kB time=00:20:50.00 bitrate=1536.0kbits/s
size= 254330kB time=00:22:36.42 bitrate=1536.0kbits/s
size= 264514kB time=00:23:30.73 bitrate=1536.0kbits/s
size= 284585kB time=00:25:17.78 bitrate=1536.0kbits/s
size= 305369kB time=00:27:08.63 bitrate=1536.0kbits/s
size= 326328kB time=00:29:00.41 bitrate=1536.0kbits/s
size= 342023kB time=00:30:24.12 bitrate=1536.0kbits/s
size= 362457kB time=00:32:13.10 bitrate=1536.0kbits/s
size= 374388kB time=00:33:16.73 bitrate=1536.0kbits/s
size= 391906kB time=00:34:50.16 bitrate=1536.0kbits/s
size= 401618kB time=00:35:41.96 bitrate=1536.0kbits/s
size= 415769kB time=00:36:57.43 bitrate=1536.0kbits/s
size= 431130kB time=00:38:19.35 bitrate=1536.0kbits/s
size= 449682kB time=00:39:58.30 bitrate=1536.0kbits/s
size= 467467kB time=00:41:33.15 bitrate=1536.0kbits/s
size= 478014kB time=00:42:29.40 bitrate=1536.0kbits/s
video:0kB audio:478014kB subtitle:0 global headers:0kB muxing overhead 0.000016%
Change History (18)
follow-up: 2 comment:1 by , 12 years ago
Component: | FFmpeg → avfilter |
---|---|
Keywords: | atempo added |
Version: | unspecified → git-master |
comment:2 by , 12 years ago
Replying to cehoyos:
I tested the following - could you confirm if this is the problem you are seeing?
Encoding 24000 seconds (6:40:00) of audio with 48kHz:
Slowing audio down as proposed by you (6:57:05), the output size is ~1.5 seconds off for seven hours, indicating noticeable A/V desync for a two-hour movie (iiuc)
$ ffmpeg -i out24000.wav -af "atempo=24000/25025" out25025.wav
...
size= 2346232kB time=06:57:06.47 bitrate= 768.0kbits/s
Yes, this is the problem. For 6:40:00.00 the correct result is 06:57:05.00 (PAL->NTSC, 25->23.976fps).
A/V desync is ~1470ms if source duration=06:40:00
A/V desync is ~440ms for a two-hour movie (440ms is a lot!)
P.S.
24000/25025 = (24000/1001)/25 = 0.959040959041
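The two desync figures in this comment follow from the `time=06:57:06.47` value in the quoted log. A minimal sketch of the arithmetic, assuming the error grows linearly with source duration (that assumption is mine, for illustration; `drift_ms` and `scale_drift` are hypothetical names):

```python
def drift_ms(measured_s: float, source_s: float, tempo: float) -> float:
    """Measured output length minus the ideal source/tempo, in milliseconds."""
    return (measured_s - source_s / tempo) * 1000.0


def scale_drift(drift: float, from_source_s: float, to_source_s: float) -> float:
    """Rescale a drift to another source length, assuming linear accumulation."""
    return drift * to_source_s / from_source_s


tempo = 24000 / 25025
measured = 6 * 3600 + 57 * 60 + 6.47            # 06:57:06.47 from the quoted log
long_drift = drift_ms(measured, 24000.0, tempo)  # about 1470 ms over 6:40:00
movie_drift = scale_drift(long_drift, 24000.0, 2 * 3600.0)  # about 441 ms over 2 h
print(round(long_drift), round(movie_drift))
```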
comment:3 by , 12 years ago
Reproduced by developer: | set |
---|---|
Status: | new → open |
comment:5 by , 12 years ago
Owner: | set to |
---|
The error is much worse at lower sample rates (8000Hz) -- off by more than 52 seconds.
$ /Developer/x86_64/bin/ffmpeg -f lavfi -i 'aevalsrc=s=8000:c=mono:exprs=sin(0.5*2*PI*t)*sin(440*2*PI*t):d=24000' -af atempo=23976/25000 -y /nfs/scratch/DataSets/Video/au/06h-40m-23976to25000-v2.au
ffmpeg version N-52339-g0dd25e4 Copyright (c) 2000-2013 the FFmpeg developers
built on Apr 20 2013 13:29:29 with llvm-gcc 4.2.1 (LLVM build 2336.11.00)
configuration: --prefix=/Developer/x86_64 --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-postproc --enable-libx264 --enable-libxvid --enable-yasm --enable-runtime-cpudetect --extra-cflags=-I/opt/local/include --extra-ldflags='-headerpad_max_install_names -L/opt/local/lib'
libavutil 52. 27.100 / 52. 27.100
libavcodec 55. 5.100 / 55. 5.100
libavformat 55. 3.100 / 55. 3.100
libavdevice 55. 0.100 / 55. 0.100
libavfilter 3. 58.100 / 3. 58.100
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 3.100 / 52. 3.100
Input #0, lavfi, from 'aevalsrc=s=8000:c=mono:exprs=sin(0.5*2*PI*t)*sin(440*2*PI*t):d=24000':
Duration: N/A, start: 0.000000, bitrate: 512 kb/s
Stream #0:0: Audio: pcm_f64le, 8000 Hz, mono, dbl, 512 kb/s
Output #0, au, to '/nfs/scratch/DataSets/Video/au/06h-40m-23976to25000-v2.au':
Metadata:
encoder : Lavf55.3.100
Stream #0:0: Audio: pcm_s16be ([3][0][0][0] / 0x0003), 8000 Hz, mono, s16, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_f64le -> pcm_s16be)
Press [q] to stop, [?] for help
size= 391837kB time=06:57:57.54 bitrate= 128.0kbits/s
video:0kB audio:391837kB subtitle:0 global headers:0kB muxing overhead 0.000008%
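As a rough check of the "more than 52 seconds" figure, the final `time=` value in this log can be compared with the ideal duration for `d=24000` and `atempo=23976/25000`. This is only a sketch; `parse_timestamp` is a hypothetical helper, not part of ffmpeg.

```python
def parse_timestamp(text: str) -> float:
    """Convert an HH:MM:SS.ff timestamp (as in the progress line) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


source_s = 24000.0                  # d=24000 passed to aevalsrc
tempo = 23976 / 25000               # value passed to atempo
ideal_s = source_s / tempo          # roughly 25025.025 s, i.e. about 06:57:05
reported_s = parse_timestamp("06:57:57.54")
error_s = reported_s - ideal_s      # positive: the output is longer than ideal
print(f"ideal {ideal_s:.3f} s, reported {reported_s:.2f} s, error {error_s:.2f} s")
```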
comment:7 by , 12 years ago
Analyzed by developer: | unset |
---|
Use "Analyzed by developer" if you added the analysis here to the bug tracker (not necessary anymore). Thank you for fixing this!
follow-up: 9 comment:8 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | open → closed |
Fixed by Pavel Koshevoy, thank you for the report!
comment:9 by , 12 years ago
Replying to cehoyos:
Fixed
Not fully fixed yet.
For the test file from the first post:
before: 00:40:44.864 => 00:42:29.407 (407ms) (atempo=24000/25025)
now: 00:40:44.864 => 00:42:29.257 (257ms) (atempo=24000/25025)
With atempo=24000/25025 (0.959040959041) the correct length should be 00:42:29.280 (280ms!)
My test file, link: http://www.sendspace.com/file/oa1org
link(mirror): http://sendfile.su/801269
$ ffmpeg -i Real_Sound(=2444.864).wav -af "atempo=24000/25025" out_Real_Sound.wav
ffmpeg version N-52458-gaa96439 Copyright (c) 2000-2013 the FFmpeg developers
built on Apr 24 2013 22:24:12 with gcc 4.8.0 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
libavutil 52. 27.101 / 52. 27.101
libavcodec 55. 6.100 / 55. 6.100
libavformat 55. 3.100 / 55. 3.100
libavdevice 55. 0.100 / 55. 0.100
libavfilter 3. 60.101 / 3. 60.101
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 3.100 / 52. 3.100
[wav @ 000000000031b5e0] max_analyze_duration 5000000 reached at 5034667 microseconds
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Real_Sound(=2444.864).wav':
Metadata:
encoder : Lavf55.3.100
Duration: 00:40:44.86, bitrate: 768 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
Output #0, wav, to 'out_Real_Sound.wav':
Metadata:
ISFT : Lavf55.3.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
size= 14153kB time=00:02:30.96 bitrate= 768.0kbits/s
size= 28214kB time=00:05:00.94 bitrate= 768.0kbits/s
size= 42475kB time=00:07:33.06 bitrate= 768.0kbits/s
size= 56673kB time=00:10:04.51 bitrate= 768.0kbits/s
size= 70780kB time=00:12:34.98 bitrate= 768.0kbits/s
size= 85096kB time=00:15:07.68 bitrate= 768.0kbits/s
size= 99353kB time=00:17:39.76 bitrate= 768.0kbits/s
size= 113576kB time=00:20:11.47 bitrate= 768.0kbits/s
size= 127875kB time=00:22:43.99 bitrate= 768.0kbits/s
size= 142136kB time=00:25:16.11 bitrate= 768.0kbits/s
size= 156406kB time=00:27:48.32 bitrate= 768.0kbits/s
size= 170708kB time=00:30:20.88 bitrate= 768.0kbits/s
size= 184961kB time=00:32:52.91 bitrate= 768.0kbits/s
size= 199143kB time=00:35:24.19 bitrate= 768.0kbits/s
size= 213275kB time=00:37:54.93 bitrate= 768.0kbits/s
size= 227332kB time=00:40:24.87 bitrate= 768.0kbits/s
size= 238993kB time=00:42:29.25 bitrate= 768.0kbits/s
video:0kB audio:238993kB subtitle:0 global headers:0kB muxing overhead 0.000033%
With a generated file of duration 00:40:44.864, atempo works correctly and gives 00:42:29.279 (nearly 280ms):
$ ffmpeg -f lavfi -i "aevalsrc=sin(2*PI*t*440):d=2444.864:s=48k" 2444.864.wav
ffmpeg version N-52458-gaa96439 Copyright (c) 2000-2013 the FFmpeg developers
built on Apr 24 2013 22:24:12 with gcc 4.8.0 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
libavutil 52. 27.101 / 52. 27.101
libavcodec 55. 6.100 / 55. 6.100
libavformat 55. 3.100 / 55. 3.100
libavdevice 55. 0.100 / 55. 0.100
libavfilter 3. 60.101 / 3. 60.101
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 3.100 / 52. 3.100
Input #0, lavfi, from 'aevalsrc=sin(2*PI*t*440):d=2444.864:s=48k':
Duration: N/A, start: 0.000000, bitrate: 3072 kb/s
Stream #0:0: Audio: pcm_f64le, 48000 Hz, mono, dbl, 3072 kb/s
Output #0, wav, to '2444.864.wav':
Metadata:
ISFT : Lavf55.3.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_f64le -> pcm_s16le)
Press [q] to stop, [?] for help
size= 7070kB time=00:01:15.41 bitrate= 768.0kbits/s
size= 14170kB time=00:02:31.14 bitrate= 768.0kbits/s
size= 21234kB time=00:03:46.49 bitrate= 768.0kbits/s
size= 28354kB time=00:05:02.44 bitrate= 768.0kbits/s
size= 35470kB time=00:06:18.34 bitrate= 768.0kbits/s
size= 42594kB time=00:07:34.33 bitrate= 768.0kbits/s
size= 49738kB time=00:08:50.53 bitrate= 768.0kbits/s
size= 56842kB time=00:10:06.31 bitrate= 768.0kbits/s
size= 63962kB time=00:11:22.26 bitrate= 768.0kbits/s
size= 71002kB time=00:12:37.35 bitrate= 768.0kbits/s
size= 78112kB time=00:13:53.19 bitrate= 768.0kbits/s
size= 85176kB time=00:15:08.54 bitrate= 768.0kbits/s
size= 92306kB time=00:16:24.59 bitrate= 768.0kbits/s
size= 99430kB time=00:17:40.58 bitrate= 768.0kbits/s
size= 106554kB time=00:18:56.57 bitrate= 768.0kbits/s
size= 113678kB time=00:20:12.56 bitrate= 768.0kbits/s
size= 120794kB time=00:21:28.46 bitrate= 768.0kbits/s
size= 127910kB time=00:22:44.37 bitrate= 768.0kbits/s
size= 135030kB time=00:24:00.32 bitrate= 768.0kbits/s
size= 142092kB time=00:25:15.64 bitrate= 768.0kbits/s
size= 149204kB time=00:26:31.50 bitrate= 768.0kbits/s
size= 156338kB time=00:27:47.60 bitrate= 768.0kbits/s
size= 163466kB time=00:29:03.63 bitrate= 768.0kbits/s
size= 170552kB time=00:30:19.22 bitrate= 768.0kbits/s
size= 177674kB time=00:31:35.18 bitrate= 768.0kbits/s
size= 184804kB time=00:32:51.24 bitrate= 768.0kbits/s
size= 191918kB time=00:34:07.12 bitrate= 768.0kbits/s
size= 199032kB time=00:35:23.00 bitrate= 768.0kbits/s
size= 206140kB time=00:36:38.82 bitrate= 768.0kbits/s
size= 213266kB time=00:37:54.83 bitrate= 768.0kbits/s
size= 220322kB time=00:39:10.10 bitrate= 768.0kbits/s
size= 227446kB time=00:40:26.09 bitrate= 768.0kbits/s
size= 229206kB time=00:40:44.86 bitrate= 768.0kbits/s
video:0kB audio:229206kB subtitle:0 global headers:0kB muxing overhead 0.000034%
$ ffmpeg -i 2444.864.wav -af "atempo=24000/25025" out.wav
ffmpeg version N-52458-gaa96439 Copyright (c) 2000-2013 the FFmpeg developers
built on Apr 24 2013 22:24:12 with gcc 4.8.0 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
libavutil 52. 27.101 / 52. 27.101
libavcodec 55. 6.100 / 55. 6.100
libavformat 55. 3.100 / 55. 3.100
libavdevice 55. 0.100 / 55. 0.100
libavfilter 3. 60.101 / 3. 60.101
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 3.100 / 52. 3.100
[wav @ 00000000002cd6c0] max_analyze_duration 5000000 reached at 5034667 microseconds
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '2444.864.wav':
Metadata:
encoder : Lavf55.3.100
Duration: 00:40:44.86, bitrate: 768 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
Output #0, wav, to 'out.wav':
Metadata:
ISFT : Lavf55.3.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
size= 14186kB time=00:02:31.31 bitrate= 768.0kbits/s
size= 28426kB time=00:05:03.21 bitrate= 768.0kbits/s
size= 42583kB time=00:07:34.22 bitrate= 768.0kbits/s
size= 56824kB time=00:10:06.11 bitrate= 768.0kbits/s
size= 71060kB time=00:12:37.96 bitrate= 768.0kbits/s
size= 85321kB time=00:15:10.08 bitrate= 768.0kbits/s
size= 99582kB time=00:17:42.20 bitrate= 768.0kbits/s
size= 113831kB time=00:20:14.19 bitrate= 768.0kbits/s
size= 128083kB time=00:22:46.22 bitrate= 768.0kbits/s
size= 142311kB time=00:25:17.98 bitrate= 768.0kbits/s
size= 156481kB time=00:27:49.12 bitrate= 768.0kbits/s
size= 170650kB time=00:30:20.26 bitrate= 768.0kbits/s
size= 184844kB time=00:32:51.67 bitrate= 768.0kbits/s
size= 199072kB time=00:35:23.43 bitrate= 768.0kbits/s
size= 213271kB time=00:37:54.88 bitrate= 768.0kbits/s
size= 227473kB time=00:40:26.38 bitrate= 768.0kbits/s
size= 238995kB time=00:42:29.27 bitrate= 768.0kbits/s
video:0kB audio:238995kB subtitle:0 global headers:0kB muxing overhead 0.000033%
comment:10 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
follow-up: 12 comment:11 by , 12 years ago
How do you recognise a time offset of 23 ms?
What effect does this have?
comment:12 by , 12 years ago
Replying to cehoyos:
How do you recognise a time offset of 23 ms?
What effect does this have?
The error is now 23ms; for other files it may be larger than 23ms (I don't know by how much), especially when the length is ~2 hours.
P.S. For example, an A/V desync of ~50ms is already audible.
comment:13 by , 12 years ago
atempo is never going to produce an exact output_duration=input_duration/tempo. However, there is an upper bound on the number of samples it may be off by. It is the closest power-of-two greater than or equal to (input_sample_rate/24), I think. So, approximately +/- 42ms.
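To make the quoted estimate concrete, the bound can be evaluated for a given sample rate. This is only a sketch of the formula as stated in this comment (which itself is hedged with "I think"), not code taken from the atempo filter; `next_power_of_two` and `drift_bound` are hypothetical names.

```python
from typing import Tuple


def next_power_of_two(n: int) -> int:
    """Smallest power of two greater than or equal to n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def drift_bound(sample_rate: int) -> Tuple[int, float]:
    """Bound in samples and in milliseconds, per the estimate in the comment."""
    per_frame = -(-sample_rate // 24)            # ceil(sample_rate / 24)
    window = next_power_of_two(per_frame)
    return window, 1000.0 * window / sample_rate


samples, ms = drift_bound(48000)
print(samples, round(ms, 2))                     # 2048 42.67
```

At 48000 Hz this gives 2048 samples, or about 42.67 ms, which matches the "approximately +/- 42ms" figure; the reported offsets of 23 ms and less fall inside it.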
comment:14 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
follow-up: 16 comment:15 by , 12 years ago
Also, when specifying tempo scale, 23976/25000 is more accurate than 24000/25025
comment:16 by , 12 years ago
Replying to pkoshevoy:
Also, when specifying tempo scale, 23976/25000 is more accurate than 24000/25025
Hmmm...thanks
For my test file:
for 24000/25025 => 00:42:29.257 (offset 23ms)
for 23976/25000 => 00:42:29.260 (offset 20ms)
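Note that the two tempo values are not the same number: 24000/25025 is the exact (24000/1001)/25, while 23976/25000 uses the rounded 23.976. A small sketch of the ideal durations each one implies for this test file (an illustration of the arithmetic only, not a statement about how atempo treats either value internally):

```python
from fractions import Fraction

source = Fraction("2444.864")        # 00:40:44.864 in seconds
exact = Fraction(24000, 25025)       # (24000/1001)/25
rounded = Fraction(23976, 25000)     # 23.976/25

ideal_exact = source / exact         # about 2549.280 s -> 00:42:29.280
ideal_rounded = source / rounded     # about 2549.283 s -> 00:42:29.283
gap_ms = float((ideal_rounded - ideal_exact) * 1000)
print(f"{float(ideal_exact):.3f} {float(ideal_rounded):.3f} {gap_ms:.2f}")
```

The ideal lengths differ by roughly 2.5 ms, close to the 3 ms difference between the two measured results, so most of that difference may simply come from the ratio itself. Measured against its own ideal, the 23976/25000 run is about 23 ms short, not 20 ms.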
comment:17 by , 12 years ago
Try a different file, it will probably be off by some other amount (unless both files end the same way, same credits music, etc...)
If the error is about 1 frame or less (at 24fps) I don't really consider it a bug. It's a limitation of the algorithm. The output from atempo is off a little pretty much all of the time (unless the input is something trivial, like silence). How much off the output is at any instance in time depends on the waveform at that time point. Overall mean error will be converging to 0.
comment:18 by , 12 years ago
Replying to pkoshevoy:
Try a different file, it will probably be off by some other amount (unless both files end the same way, same credits music, etc...)
On another file (also 00:40:44.864) => 00:42:29.266 (offset 14ms)
Replying to pkoshevoy:
If the error is about 1 frame or less (at 24fps) I don't really consider it a bug. It's a limitation of the algorithm. The output from atempo is off a little pretty much all of the time (unless the input is something trivial, like silence). How much off the output is at any instance in time depends on the waveform at that time point.
OK