Opened 11 years ago
Last modified 8 years ago
#3118 open enhancement
SAMI: multiple languages not detected
Reported by: | eelco | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avformat |
Version: | git-master | Keywords: | |
Cc: | nfxjfg@googlemail.com | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
SAMI files can contain multiple languages, but ffmpeg handles such a file as a single stream, with no way to select only one language.
How to reproduce:
./ffmpeg -i multiple_languages.smi out.srt
ffmpeg version N-57932-g89a3be8 Copyright (c) 2000-2013 the FFmpeg developers
  built on Nov 5 2013 16:30:18 with Apple LLVM version 5.0 (clang-500.2.78) (based on LLVM 3.3svn)
  configuration: --prefix=/Users/eelco/Projects/Beamer/FFmpeg/build --disable-shared
  libavutil 52. 52.100 / 52. 52.100
  libavcodec 55. 41.100 / 55. 41.100
  libavformat 55. 21.100 / 55. 21.100
  libavdevice 55. 5.100 / 55. 5.100
  libavfilter 3. 90.102 / 3. 90.102
  libswscale 2. 5.101 / 2. 5.101
  libswresample 0. 17.104 / 0. 17.104
Input #0, sami, from 'multiple_languages.smi':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Subtitle: sami
Output #0, srt, to 'out.srt':
  Metadata:
    encoder : Lavf55.21.100
    Stream #0:0: Subtitle: subrip
Stream mapping:
  Stream #0:0 -> #0:0 (sami -> subrip)
Press [q] to stop, [?] for help
size= 38kB time=00:11:43.56 bitrate= 0.4kbits/s
video:0kB audio:0kB subtitle:23 global headers:0kB muxing overhead 63.508757%
The input file (multiple_languages.smi) defines the different languages in the ‘style sheet’:
...
<STYLE TYPE="text/css">
<!--
P { margin-left:2pt; margin-right:2pt; margin-bottom:1pt;
    font-size:20pt; text-align:center; font-weight:bold; color:white; }
.ENCC { Name:English; lang:en-US; SAMIType:CC; }
.KRCC { Name:한국어; lang:ko-KR; SAMIType:CC; }
-->
</STYLE>
...
And uses the classes to mark the language:
...
<SYNC Start=10109><P Class=KRCC>
<br>사랑과 배신<br>탐욕과 살육의 이야기죠
<SYNC Start=13977><P Class=KRCC>
<SYNC Start=17667><P Class=KRCC>
<br>선악의 정의에 대해서<br>대립하는 가치관을 가진
...
The output, however, mixes both languages:
...
4
00:00:10,109 --> 00:00:13,979
There is love and betrayal, greed and murder.

5
00:00:17,667 --> 00:00:17,667
선악의 정의에 대해서
대립하는 가치관을 가진

6
00:00:17,667 --> 00:00:21,717
It's set in this interesting world of contrasting ideology,
...
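To illustrate the expected behaviour, here is a minimal Python sketch that groups SYNC events by their `P Class` attribute, so that each class ends up in its own track instead of being interleaved. It is not FFmpeg code. The `SAMPLE` text is reconstructed for illustration (the English cue is taken from the output above, since the excerpt of the input only shows Korean cues), and `split_by_class` is a hypothetical helper that only handles the unquoted attribute form shown in this ticket.

```python
import re
from collections import defaultdict

# Illustrative input, reconstructed from the excerpts in this ticket.
SAMPLE = """
<SYNC Start=10109><P Class=ENCC>
There is love and betrayal, greed and murder.
<SYNC Start=10109><P Class=KRCC>
<br>사랑과 배신<br>탐욕과 살육의 이야기죠
<SYNC Start=13977><P Class=KRCC>
"""

# One event: a SYNC tag, a P tag with a class, then text up to the next SYNC.
EVENT_RE = re.compile(
    r"<SYNC\s+Start=(\d+)>\s*<P\s+Class=(\w+)>(.*?)(?=<SYNC|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def split_by_class(body):
    """Return {class_name: [(start_ms, text), ...]} in source order."""
    tracks = defaultdict(list)
    for start, cls, text in EVENT_RE.findall(body):
        # Turn <br> into newlines and trim surrounding whitespace.
        text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE).strip()
        tracks[cls.upper()].append((int(start), text))
    return dict(tracks)


tracks = split_by_class(SAMPLE)
for cls, events in tracks.items():
    print(cls, events)
```

An empty text, as in the cue at 13977, is how the file clears the previous subtitle for that class.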
Attachments (3)
Change History (10)
by , 11 years ago
Attachment: | multiple_languages.smi added |
---|
by , 11 years ago
comment:1 by , 11 years ago
Component: | undetermined → avformat |
---|---|
Status: | new → open |
Type: | defect → enhancement |
Version: | unspecified → git-master |
follow-up: 5 comment:2 by , 11 years ago
The different languages should really be handled as separate subtitle tracks.
This is also how they're meant to be used AFAIK.
Note that SAMI has other issues, and the ffmpeg SAMI decoder is apparently completely unusable for Korean users.
comment:3 by , 11 years ago
Cc: | added |
---|
comment:4 by , 8 years ago
I very often get SMI files with several languages in them, which is very annoying: the only way to view them properly in ffmpeg-based media players right now is to edit the file and delete the unwanted language. It would be good to have multi-language support.
Here is a link to the spec, which explains how this works: https://msdn.microsoft.com/en-us/library/ms971327.aspx
comment:5 by , 8 years ago
Replying to gjdfgh:
Note that SAMI has other issues, and the ffmpeg SAMI decoder is apparently completely unusable for Korean users.
By the way, about this specific sentence: maybe it was true 3 years ago when it was written (though I am not sure about that either, because for a dozen years all my media players of choice have been based on ffmpeg, and I have read Korean subtitles for many years). Currently, apart from the fact that I have to edit out the English part of SAMI files before playing, SAMI subtitles work OK. I play videos in mpv (hence with ffmpeg) with Korean SAMI subtitles nearly every day (with a native Korean speaker reading said subtitles).
by , 8 years ago
Attachment: | 0001-avformat-basic-language-support-in-SAMI-subtitles.patch added |
---|
Basic language support in SAMI files.
comment:6 by , 8 years ago
So I have been having a look at the code for SAMI support, and I propose this first basic version. It is not yet full support that extracts several subtitle tracks from a single SAMI file. Instead, it simply extracts only the subtitles of the default language (the first language in the list; cf. section "Class and Localization" in the spec).
Not perfect, yet still better than the current situation.
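The default-language rule described above can be sketched in a few lines of Python. This is not the attached patch, only an illustration of the idea: `default_class` and `keep_default` are hypothetical names, the events are assumed to be already parsed into `(start_ms, class, text)` tuples, and the first class selector in the style sheet is taken to be the default language.

```python
import re

# Style sheet classes as quoted in the ticket description.
STYLE = (
    ".ENCC { Name:English; lang:en-US; SAMIType:CC; } "
    ".KRCC { Name:한국어; lang:ko-KR; SAMIType:CC; }"
)

# Assumed pre-parsed events: (start in ms, class name, text).
EVENTS = [
    (10109, "ENCC", "There is love and betrayal, greed and murder."),
    (10109, "KRCC", "사랑과 배신\n탐욕과 살육의 이야기죠"),
    (17667, "KRCC", "선악의 정의에 대해서\n대립하는 가치관을 가진"),
]


def default_class(style_text):
    """Return the first class selector in the style sheet, or None."""
    match = re.search(r"\.([A-Za-z0-9_-]+)\s*\{", style_text)
    return match.group(1) if match else None


def keep_default(events, style_text):
    """Keep only events whose class is the default (first listed) one."""
    cls = default_class(style_text)
    if cls is None:
        # No language classes declared: leave the events untouched.
        return list(events)
    return [event for event in events if event[1].upper() == cls.upper()]


kept = keep_default(EVENTS, STYLE)
for start, cls, text in kept:
    print(start, cls, text)
```

With the sample file from this ticket, that rule would keep the English cues and drop the Korean ones, so it helps only when the wanted language happens to be listed first.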
comment:7 by , 8 years ago
Please send your patch to the development mailing list; it will be ignored here.
As said above, the lang is set using a CSS property applied using classes.
Currently, the SAMI demuxer just copies the CSS stylesheet into the extradata, and the decoder ignores it.
Handling styled text is still an open issue.
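For reference, the class-to-language mapping carried by the style sheet can be read out with very little code. The following Python sketch is only an illustration under simple assumptions (flat CSS rules, no nested braces or comments inside rules); `parse_language_classes` is a hypothetical helper, not part of FFmpeg, and the style sheet is the one quoted in the description.

```python
import re

STYLE = """
<STYLE TYPE="text/css">
<!--
P { margin-left:2pt; margin-right:2pt; margin-bottom:1pt;
    font-size:20pt; text-align:center; font-weight:bold; color:white; }
.ENCC { Name:English; lang:en-US; SAMIType:CC; }
.KRCC { Name:한국어; lang:ko-KR; SAMIType:CC; }
-->
</STYLE>
"""


def parse_language_classes(style_text):
    """Return {class_name: {property: value}} for classes declaring `lang`.

    Property names are lower-cased; classes keep their source order.
    """
    classes = {}
    # Match ".NAME { declarations }" rules only (class selectors).
    for name, body in re.findall(r"\.([A-Za-z0-9_-]+)\s*\{([^}]*)\}", style_text):
        props = {}
        for decl in body.split(";"):
            if ":" in decl:
                key, value = decl.split(":", 1)
                props[key.strip().lower()] = value.strip()
        if "lang" in props:
            classes[name] = props
    return classes


langs = parse_language_classes(STYLE)
for cls, props in langs.items():
    print(cls, props["lang"], props.get("name"))
```

A demuxer that exposed one stream per class could use the `lang` value as that stream's language tag, though how that would map onto the existing extradata handling is left open here.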