Opened 16 hours ago

Last modified 8 hours ago

#11461 open defect

Corruption of 608 captions data on extraction

Reported by: Zach
Owned by:
Priority: important
Component: avformat
Version: git-master
Keywords: Captions, CC, EIA608
Cc: Marth64
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

Extract:
ffmpeg.exe -f lavfi -i "movie=Ep.12_Origional_CaptionMaker.mp4"[out+subcc] -map 0:1 -c:s copy Ep.12_ffmpeg.scc

Import the captions into Premiere, or view the data with Notepad++, and compare against Ep.12_CaptionMaker.scc

ffmpeg-2025-02-06-git-6da82b4485-full_build
built on 2025-02-06

FFmpeg outputs an SCC file formatted differently from what Premiere or CaptionMaker produce. The resulting file shows corrupted data when imported into Premiere even though

https://drive.google.com/drive/folders/13lBxT4alNCJm60mXO5AKuTvCRxDWkyOe?usp=sharing

Here are the first 10 lines of the SCC from CaptionMaker, which is similar in format to what Premiere outputs.

Scenarist_SCC V1.0

00:00:01:14	9420 9470 9723 d3ef 20c2 ec75 e520 cdef 756e f461 e96e 2054 e5ec e576 e973 e9ef 6e80 942c 8080 8080 942f

00:00:02:11	9420 9452 9723 e973 20ec efe3 61f4 e564 20e9 6e20 5761 ecec 6180 94f2 9723 5761 ecec 612c 2057 6173 68e9 6e67 f4ef 6eae 942c 8080 8080 942f

00:00:06:00	9420 9470 97a1 57e5 a7f2 e520 68e5 6164 942c e564 20f4 ef20 64ef 2073 ef6d e520 e6e9 ec6d e96e 6780 942c 8080 8080 942f

00:00:08:02	9420 94f2 97a2 e6ef f220 6120 68ef 6de5 f4ef f76e 2073 68ef f7ae 942c 8080 8080 942f

FFmpeg instead writes almost every frame on its own line, with only one 4-digit chunk per frame.

Scenarist_SCC V1.0

00:00:01:16	9420

00:00:01:17	94ae

00:00:01:18	9470

00:00:01:19	9723
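The difference between the two layouts can be measured rather than eyeballed. The following is a minimal sketch, not part of FFmpeg, that parses SCC text and counts how many 4-digit words each timecode line carries. The names `parse_scc` and `words_per_line` are made up for illustration, and the regular expression assumes the simple "timecode, tab, hex words" layout shown in the samples above.

```python
import re
from collections import Counter

# One SCC data line: timecode (":" or ";" before the frame field),
# whitespace, then one or more 4-digit hex words.
SCC_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}[:;]\d{2})\s+((?:[0-9a-fA-F]{4}\s*)+)$")

def parse_scc(text):
    """Return a list of (timecode, [words]) for every data line in SCC text."""
    entries = []
    for line in text.splitlines():
        match = SCC_LINE.match(line.strip())
        if match:  # the header and blank lines do not match and are skipped
            entries.append((match.group(1), match.group(2).split()))
    return entries

def words_per_line(entries):
    """Histogram: words on a line -> number of lines with that many words."""
    return Counter(len(words) for _, words in entries)

# The first lines of the ffmpeg output quoted in this ticket.
ffmpeg_sample = (
    "Scenarist_SCC V1.0\n\n"
    "00:00:01:16\t9420\n\n"
    "00:00:01:17\t94ae\n\n"
    "00:00:01:18\t9470\n\n"
    "00:00:01:19\t9723\n"
)
print(words_per_line(parse_scc(ffmpeg_sample)))  # Counter({1: 4})
```

Run against the CaptionMaker file, the same histogram should be dominated by long lines instead of single-word lines.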

Attachments (5)

EP.12_CaptionMaker.scc (89.4 KB ) - added by Zach 16 hours ago.
Original 608 Captions from CaptionMaker
Ep.12_Premiere_DropFrame.scc (85.8 KB ) - added by Zach 16 hours ago.
Premiere export (sourced from CaptionMaker mcc file) drop frame timecode
Ep.12 AIM Dave & Rosa 2.mcc (2.2 MB ) - added by Zach 16 hours ago.
Original captions from CaptionMaker
Ep.12_Premiere_NonDropFrame.scc (85.8 KB ) - added by Zach 16 hours ago.
Premiere export (sourced from CaptionMaker mcc file) non drop frame timecode
Ep.12_ffmpeg.scc (278.8 KB ) - added by Zach 16 hours ago.
Output from the referenced ffmpeg command, in a slightly different format that is not compatible with Premiere.

Change History (10)

by Zach, 16 hours ago

Attachment: EP.12_CaptionMaker.scc added

Original 608 Captions from CaptionMaker

by Zach, 16 hours ago

Attachment: Ep.12_Premiere_DropFrame.scc added

Premiere export (sourced from CaptionMaker mcc file) drop frame timecode

by Zach, 16 hours ago

Attachment: Ep.12 AIM Dave & Rosa 2.mcc added

Original captions from CaptionMaker

by Zach, 16 hours ago

Attachment: Ep.12_Premiere_NonDropFrame.scc added

Premiere export (sourced from CaptionMaker mcc file) non drop frame timecode

by Zach, 16 hours ago

Attachment: Ep.12_ffmpeg.scc added

Output from the referenced ffmpeg command, in a slightly different format that is not compatible with Premiere.

comment:1 by Marth64, 16 hours ago

Cc: Marth64 added
Component: ffmpeg → avformat
Keywords: CC EIA608 added
Priority: critical → important
Reproduced by developer: set
Status: new → open

I have reproduced this same issue before; SubtitleEdit will also fail to read this output. I suspect (but have not had time to prove) that it might be an issue with the SCC muxer.

comment:2 by Zach, 15 hours ago

The errors start at caption numbers 10-12, beginning at 00:00:28:21.

I also tested adding the options "-vf fps=30000/1001 -timecode "00:00:00;00"" to try to force drop frame timecode. This had no effect on whether Premiere could import the file. I am open to other ways to test and identify where the bug is.

in reply to:  1 comment:3 by Zach, 11 hours ago

I was able to get a working .scc file from the ffmpeg output by modifying it in Notepad++ in these two steps.

1: Find and replace all \n with \r\n (to match the working files; not sure if it matters)

2: Find and replace all matches of the following regex with a space character:

\r\n\r\n.{12}(?!9420)
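For anyone who wants to repeat the workaround without Notepad++, here is a rough Python equivalent of the two steps. It is only a sketch of the manual edit described above: the function name `regroup_scc` is hypothetical, and like the regex itself it simply joins lines and inserts no padding, so it only preserves timing when the joined lines sit on consecutive frames.

```python
import re

# Blank line, then the 11-character timecode plus its tab (12 characters),
# unless the first word on that line is 9420.
JOIN_POINT = re.compile(r"\r\n\r\n.{12}(?!9420)")

def regroup_scc(text):
    """Apply the two find-and-replace steps from this comment to SCC text."""
    # Step 1: normalise every line ending to CRLF.
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # Step 2: replace each line break and timecode with a single space,
    # except where the next line starts with 9420.
    return JOIN_POINT.sub(" ", text)

sample = (
    "Scenarist_SCC V1.0\n\n"
    "00:00:01:16\t9420\n\n"
    "00:00:01:17\t94ae\n\n"
    "00:00:01:18\t9470\n\n"
    "00:00:01:19\t9723\n"
)
print(regroup_scc(sample))
```

On the sample, the four single-word lines collapse into one line that starts at 00:00:01:16 and carries `9420 94ae 9470 9723`.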

I noticed that in the working files from other programs, almost every line starts with 9420, which is the resume caption loading command for channel CC1, and 94ae, which is the erase non-displayed memory code. 9420 might only apply to pop-on captions, so a bit of research might be needed.

9420 Resume caption loading (pop-on style) (not frame sensitive)
94ae Clear Buffer (not frame sensitive)
942f End of Caption (display on screen) (frame sensitive)
942c Erase Caption (remove caption from screen) (frame sensitive)
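The codes above can be checked by stripping the parity bit from each byte of an SCC word. The sketch below is illustrative only: the helper names are made up, the table covers just the four CC1 commands listed here, and text is decoded as plain ASCII, which is an approximation because the 608 character set differs from ASCII in a few positions.

```python
# CC1 control pairs after the parity bit (bit 7) has been removed.
CONTROL_CODES = {
    (0x14, 0x20): "Resume Caption Loading",
    (0x14, 0x2C): "Erase Displayed Memory",
    (0x14, 0x2E): "Erase Non-Displayed Memory",
    (0x14, 0x2F): "End Of Caption",
}

def strip_parity(word):
    """Split a 4-digit SCC word into two 7-bit values."""
    value = int(word, 16)
    return (value >> 8) & 0x7F, value & 0x7F

def has_odd_parity(word):
    """True if both bytes of the word have an odd number of set bits."""
    value = int(word, 16)
    return all(bin(b).count("1") % 2 == 1 for b in (value >> 8, value & 0xFF))

def describe(word):
    """Name a control pair, or return the (approximate) text it carries."""
    hi, lo = strip_parity(word)
    if (hi, lo) == (0x00, 0x00):
        return "padding"
    if (hi, lo) in CONTROL_CODES:
        return CONTROL_CODES[(hi, lo)]
    if hi >= 0x20:
        # Printable pair; a null second byte is filler and is skipped.
        return "".join(chr(b) for b in (hi, lo) if b >= 0x20)
    return f"other control pair {hi:02x} {lo:02x}"

for word in ("9420", "94ae", "942f", "942c", "8080"):
    print(word, describe(word))

# Text words from the 00:00:08:02 line of the CaptionMaker sample.
text_words = "e6ef f220 6120 68ef 6de5 f4ef f76e 2073 68ef f7ae".split()
print("".join(describe(w) for w in text_words))  # for a hometown show.
```

Here 942c maps to Erase Displayed Memory (the "Erase Caption" entry) and 94ae to Erase Non-Displayed Memory (the "Clear Buffer" entry).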

Obviously the end of caption and erase caption commands must happen on a specific timecode to preserve the timing of the captions track.

I believe ffmpeg is generating an .scc file with non drop frame timecode, as evidenced by the colon (rather than a semicolon) before the frame field of the timecode. Since the caption codes are written individually on each frame of video, some caption data is guaranteed to land on frame numbers that are absent from a drop frame timeline.

Premiere or SubtitleEdit will then import the .scc file and align it by timecode. Any data stored on the dropped frame numbers is discarded, and the caption track becomes corrupted.
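To make the hypothesis concrete: in 29.97 fps drop frame timecode, frame numbers 00 and 01 are skipped at the start of every minute except minutes divisible by ten. The sketch below (helper names are mine, not FFmpeg's) shows which non drop labels have no drop frame counterpart and how the same frame index is labelled in each scheme. It illustrates the theory in this comment; it does not prove that this is what the importers do.

```python
def is_dropped_label(timecode):
    """True if this hh:mm:ss:ff label does not exist in drop frame timecode."""
    hh, mm, ss, ff = (int(part) for part in timecode.replace(";", ":").split(":"))
    return ss == 0 and ff in (0, 1) and mm % 10 != 0

def frame_to_nondrop(frame):
    """Label a frame index by counting 30 frames per second, no skips."""
    ss, ff = divmod(frame, 30)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

def frame_to_dropframe(frame):
    """Label a frame index with 29.97 fps drop frame timecode."""
    frames_per_10min = 17982  # 10 * 60 * 30 minus 9 minutes * 2 skipped labels
    frames_per_min = 1798     # 60 * 30 minus 2 skipped labels
    tens, rem = divmod(frame, frames_per_10min)
    skipped = 18 * tens
    if rem >= 1800:
        # Past the first (unskipped) minute of this ten-minute block.
        skipped += 2 * ((rem - 1800) // frames_per_min + 1)
    n = frame + skipped
    return f"{n // 108000:02d}:{n // 1800 % 60:02d}:{n // 30 % 60:02d};{n % 30:02d}"

print(frame_to_nondrop(1800))           # 00:01:00:00
print(frame_to_dropframe(1800))         # 00:01:00;02
print(is_dropped_label("00:01:00:00"))  # True
```

With one caption word per frame, a non drop file carries data on labels such as 00:01:00:00 and 00:01:00:01, which is exactly where a drop frame timeline has no slot.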

Required Information:
Identify frame sensitive codes, to make sure their timecode is not modified
Identify start codes, to assist in grouping caption codes
Identify the maximum number of caption codes per frame

Solution: Group caption codes sufficiently to avoid placing caption codes on dropped frame numbers, while following a rule set that prevents altering the caption display times.
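A rough sketch of what such grouping could look like is below. It is not how the SCC muxer works today, and it assumes 9420 is a suitable start code, which still needs the research mentioned earlier. Because an SCC line implies one word per frame starting at the line's timecode, only words on consecutive frames are joined, so every word stays on the frame it was found on.

```python
START_CODE = "9420"  # assumption: Resume Caption Loading opens each caption

def group_words(frames):
    """Group (frame_index, word) pairs, sorted by frame, into SCC lines.

    Returns a list of (start_frame, [words]). A new line starts at every
    start code and wherever the frame indices stop being consecutive.
    """
    groups = []
    for index, word in frames:
        contiguous = bool(groups) and groups[-1][0] + len(groups[-1][1]) == index
        if word == START_CODE or not contiguous:
            groups.append((index, [word]))
        else:
            groups[-1][1].append(word)
    return groups

frames = [
    (46, "9420"), (47, "94ae"), (48, "9470"), (49, "9723"),
    (60, "942c"),
    (61, "9420"), (62, "94ae"),
]
for start, words in group_words(frames):
    print(start, " ".join(words))
```

The frame sensitive codes (942f, 942c) keep their original frame either way; the open question is only where it is safe to begin a line.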

Sources:
http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML

http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML#ccd

https://docs.inqscribe.com/2.2/format_scc.html

Last edited 8 hours ago by Zach

comment:4 by Zach, 10 hours ago

Care needs to be taken that the various components of ffmpeg do not move the caption display codes off the frame they are embedded on unless absolutely required, and then by a maximum of 1 frame.

comment:5 by Zach, 8 hours ago

It should also be possible to have far fewer timecode entries in the .scc output, as long as the proper padding is used to bridge nonconsecutive timecode lines such as the following:

Scenarist_SCC V1.0

00:00:02:04	942c

00:00:02:07	942f

This would be equivalent to the following file.

Scenarist_SCC V1.0

00:00:02:04	942c 8080 8080 942f
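The padding rule can be sketched as follows. This is only an illustration under stated assumptions: timecodes are non drop labels counted at 30 frames per second, 8080 is the padding word, and `max_gap` is an arbitrary limit I picked to avoid bridging long silences with padding.

```python
FPS = 30
PAD = "8080"

def tc_to_frame(timecode):
    """Convert a non drop hh:mm:ss:ff label to a frame index."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * FPS + ff

def frame_to_tc(frame):
    """Convert a frame index back to a non drop hh:mm:ss:ff label."""
    ss, ff = divmod(frame, FPS)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

def merge_with_padding(entries, max_gap=4):
    """Merge (timecode, [words]) lines separated by at most max_gap frames.

    The gap is filled with padding words so that every original word
    stays on the frame it started on.
    """
    merged = []
    for timecode, words in entries:
        start = tc_to_frame(timecode)
        if merged:
            prev_start, prev_words = merged[-1]
            gap = start - (prev_start + len(prev_words))
            if 0 <= gap <= max_gap:
                prev_words.extend([PAD] * gap + list(words))
                continue
        merged.append((start, list(words)))
    return [(frame_to_tc(start), words) for start, words in merged]

entries = [("00:00:02:04", ["942c"]), ("00:00:02:07", ["942f"])]
print(merge_with_padding(entries))
# [('00:00:02:04', ['942c', '8080', '8080', '942f'])]
```

The 942f still lands on frame 00:00:02:07, because the two padding words occupy frames 05 and 06.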
