Opened 5 years ago
Last modified 5 years ago
#7970 new enhancement
Guess character encoding of ID3v1 tags
Reported by: | Jyrki Vesterinen | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avformat |
Version: | git-master | Keywords: | id3v1 |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
This ticket is essentially a variant of #7203. That ticket was closed because its file uses ID3v2 tags, and the tag in question was encoded as Windows-1251 but declared as ISO-8859-1.
However, as far as I can tell, ID3v1 has no way to specify the character encoding. FFmpeg simply assumes UTF-8, and it's not always right. The attached file has the "artist" field encoded as Shift-JIS. Its value is すずき けいこ.
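To illustrate why the encoding cannot be read from the tag itself, here is a minimal Python sketch of the ID3v1 layout: a fixed 128-byte block at the end of the file, starting with `TAG`, followed by fixed-width title, artist, album, year, and comment fields and a one-byte genre. None of those fields describes a character set. The function name `parse_id3v1` is hypothetical and is not part of FFmpeg.

```python
import struct

ID3V1_SIZE = 128  # an ID3v1 tag occupies the last 128 bytes of the file


def parse_id3v1(data: bytes):
    """Return the raw (undecoded) ID3v1 fields, or None if no tag is found.

    `data` is the file content, or at least its last 128 bytes.
    """
    if len(data) < ID3V1_SIZE:
        return None
    tag = data[-ID3V1_SIZE:]
    if tag[:3] != b"TAG":
        return None

    # 30 + 30 + 30 + 4 + 30 + 1 = 125 bytes after the "TAG" marker.
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", tag[3:]
    )

    def strip(field: bytes) -> bytes:
        # Fields are padded with NUL bytes (or sometimes spaces).
        return field.split(b"\x00", 1)[0].rstrip(b" ")

    return {
        "title": strip(title),
        "artist": strip(artist),
        "album": strip(album),
        "year": strip(year),
        "comment": strip(comment),
        "genre": genre,
    }
```

The values come back as `bytes` on purpose: whoever reads the tag has to pick an encoding, because the format does not record one.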
It would be great if FFmpeg attempted to heuristically detect the character encoding of ID3v1 tags.
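As a rough sketch of what such a heuristic could look like, the Python snippet below tries strict UTF-8 first and falls back to a list of candidate legacy encodings. This is only an illustration of the idea, not how FFmpeg or any existing library does it: the function name `guess_decode` and the candidate list are assumptions, the result depends on the order of the candidates, and short strings can decode without error in several encodings. A real detector would need statistical scoring.

```python
def guess_decode(raw: bytes, candidates=("utf-8", "shift_jis")):
    """Decode `raw` with the first candidate encoding that accepts it.

    Returns a (text, encoding) tuple. Falls back to ISO-8859-1, which
    maps every byte to a code point and therefore never fails.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1"), "latin-1"


# The artist field from the attached file is not valid UTF-8,
# so the strict UTF-8 attempt fails and Shift-JIS is tried next.
artist_raw = "すずき けいこ".encode("shift_jis")
text, encoding = guess_decode(artist_raw)
```

Trying UTF-8 first is a deliberate choice: text in a legacy multi-byte encoding rarely happens to be valid UTF-8, so a successful strict UTF-8 decode is a fairly strong signal, whereas most byte sequences are accepted by single-byte encodings.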
Attachments (1)
Change History (2)
by , 5 years ago
Attachment: shift_jis_artist_name.mp3 added
comment:1 by , 5 years ago
I looked into how much effort it would take to implement such heuristics. If it were easy enough (e.g. by reusing an existing function somewhere), I could implement it myself and send a pull request.
It looks like FFmpeg doesn't currently have automatic character set detection anywhere. See #4054 for related discussion.
The mpv media player uses the uchardet library for character set detection: https://github.com/mpv-player/mpv/blob/c9e7473d67893d9248bedf63530a1e0325a3036a/misc/charset_conv.c#L136
It seems that implementing this would require pulling in a new library, either uchardet or something else. Such a change would be much larger than what I'm willing to do.