Opened 11 years ago
Closed 11 years ago
#3363 closed defect (fixed)
ffprobe silently drops non-ASCII metadata in VQF files
Reported by: | Trejkaz | Owned by: | |
---|---|---|---|
Priority: | important | Component: | ffprobe |
Version: | git-master | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | yes |
Analyzed by developer: | no | | |
Description
Summary of the bug:
How to reproduce:
    % ffprobe -show_format -show_streams -print_format json test.vqf

    % ffprobe -version
    ffprobe version N-60503-g28975cb-tessus
    built on Jan 28 2014 18:43:59 with llvm-gcc 4.2.1 (LLVM build 2336.1.00)
    configuration: --prefix=/Users/tessus/data/ext/ffmpeg/sw --as=yasm --extra-version=tessus --disable-shared --enable-static --disable-ffplay --enable-gpl --enable-pthreads --enable-postproc --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid --enable-libspeex --enable-bzlib --enable-zlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libxavs --enable-version3 --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvpx --enable-libgsm --enable-libopus --enable-libmodplug --enable-fontconfig --enable-libfreetype --enable-libass --enable-libbluray --enable-filters --enable-runtime-cpudetect
    libavutil 52. 63.100 / 52. 63.100
    libavcodec 55. 49.101 / 55. 49.101
    libavformat 55. 28.100 / 55. 28.100
    libavdevice 55. 7.100 / 55. 7.100
    libavfilter 4. 1.101 / 4. 1.101
    libswscale 2. 5.101 / 2. 5.101
    libswresample 0. 17.104 / 0. 17.104
    libpostproc 52. 3.100 / 52. 3.100
[json @ 0x103000000] 1 invalid UTF-8 sequence(s) found in string 'Bl?mchen', replaced with
The value ffprobe emits is "Blchen".
The value it emitted before the fix for #2502 was "Bl�mchen" (invalid character intentional), which, although it contained an invalid character, at least retained all the valid characters. Current builds drop the "m" as well as the invalid character.
The value I would like to see, however, is "Blümchen".
If the issue is that the VQF module is doing something wrong to convert to Unicode, it would be good to get that fixed.
If the issue is that VQF is one of those legacy formats where the encoding isn't known, would it be possible to have some way to specify the system encoding? I can't just change the encoding of the entire system, because doing that in a cross-platform way is not really practical.
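To illustrate what specifying the encoding would buy, here is a minimal Python sketch. It assumes the author tag is stored as ISO-8859-1 (Latin-1) bytes, so that "ü" is the single byte 0xFC; the ticket does not confirm the actual encoding of the sample, and the byte string below is constructed by hand rather than read from the file.

```python
# Assumption: the tag bytes are ISO-8859-1, where 'ü' is the single byte 0xFC.
# This byte string is hand-built for illustration, not read from the VQF sample.
raw_author = b"Bl\xfcmchen"

# Treating the bytes as UTF-8 cannot succeed: 0xFC is not a valid UTF-8 lead
# byte, so Python substitutes U+FFFD for it and carries on with the 'm'.
as_utf8 = raw_author.decode("utf-8", errors="replace")

# Decoding with the assumed legacy encoding recovers the intended text.
as_latin1 = raw_author.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

With a caller-supplied encoding, the second decode is what a tool could do for legacy formats whose tags carry no encoding information.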
There is a sample exhibiting the issue in the mplayer samples:
Change History (4)
comment:1 by , 11 years ago
follow-up: 3 comment:2 by , 11 years ago
ffmpeg outputs:

    Input #0, vqf, from '/Users/trejkaz/Downloads/handinha.vqf':
      Metadata:
        title           : Hand in Hand (Gewalt ist doof!)
        comment         : http://bluemchen.koti.com.pl
        copyright       : Edel Records GmbH
        filename        : handinha.vqf
        author          : Bl?mchen
        size            : 300441
So it hasn't lost the character, but it has still mangled it.
comment:3 by , 11 years ago
Replying to trejkaz:
> ffmpeg outputs:
>
>     Input #0, vqf, from '/Users/trejkaz/Downloads/handinha.vqf':
>       Metadata:
>         title           : Hand in Hand (Gewalt ist doof!)
>         comment         : http://bluemchen.koti.com.pl
>         copyright       : Edel Records GmbH
>         filename        : handinha.vqf
>         author          : Bl?mchen
>         size            : 300441
>
> So it hasn't lost the character, but it has still mangled it.
This depends on our UTF-8 decoding mechanism.
The '?' byte and the character following it are interpreted together as one invalid UTF-8 sequence, and are therefore consumed as a single "invalid" unit. We could add a new flag for lazy decoding (if the whole sequence is invalid, resume from its second byte, which seems to be what the terminal does), or allow the text encoding to be set.
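The difference between the two strategies can be sketched with a toy decoder. This is not FFmpeg's code: `decode` and its `lazy` parameter are hypothetical names, the input again assumes the offending byte is 0xFC, and the lead byte is treated as announcing a multi-byte sequence by counting its leading one bits, which is an assumption about how such a byte ends up swallowing its neighbour.

```python
REPLACEMENT = "\ufffd"

def leading_ones(byte):
    """Count the leading 1 bits of a byte (0 for ASCII)."""
    n = 0
    while n < 8 and byte & (0x80 >> n):
        n += 1
    return n

def decode(data, lazy):
    """Toy UTF-8 decoder; lazy=True resumes right after a bad lead byte."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = leading_ones(lead)
        if n == 0:                      # plain ASCII
            out.append(chr(lead))
            i += 1
            continue
        if n == 1 or n > 6:             # stray continuation or 0xFE/0xFF
            out.append(REPLACEMENT)
            i += 1
            continue
        code = lead & (0x7F >> n)
        j = i + 1
        ok = True
        for _ in range(n - 1):
            if j >= len(data):
                ok = False
                break
            byte = data[j]
            j += 1                      # the byte is consumed before it is checked
            if byte & 0xC0 != 0x80:
                ok = False
                break
            code = (code << 6) | (byte & 0x3F)
        if ok and code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF:
            out.append(chr(code))
            i = j
        else:
            out.append(REPLACEMENT)
            # Greedy: skip everything read so far. Lazy: skip only the lead byte.
            i = i + 1 if lazy else j
    return "".join(out)

greedy_result = decode(b"Bl\xfcmchen", lazy=False)
lazy_result = decode(b"Bl\xfcmchen", lazy=True)
```

In greedy mode the "m" is read as part of the failed sequence and disappears along with the bad byte; if the replacement string is empty, that yields "Blchen" as reported. In lazy mode only the bad byte is replaced and the "m" survives.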
comment:4 by , 11 years ago
Priority: | normal → important |
---|---|
Reproduced by developer: | set |
Resolution: | → fixed |
Status: | new → closed |
Version: | unspecified → git-master |
The ffprobe issue was a regression afaict.
Fixed by Michael in a31547ce
Is this not reproducible with ffmpeg?