Context Navigation

#8578 closed defect (fixed)

FFmpeg 4.2 breaks Matroska streaming

Reported by:	Sesse	Owned by:
Priority:	normal	Component:	avformat
Version:	git-master	Keywords:	mkv regression
Cc:	andreas.rheinhardt@gmail.com	Blocked By:
Blocking:		Reproduced by developer:	no
Analyzed by developer:	no

Description

Hi,

I'm having an application that creates Matroska streams on-the-fly by reusing chunks between HTTP clients:

Set up a callback in avio->pb.
Call avformat_write_header(), store what is returned in variable.
Whenever a client connects, immediately send the contents of said variable.
For each packet to mux, call avformat_interleaved_write() as usual, and send the results to every connected client.

In other words, there might be intermediate packets from the start of the stream that a client won't be seeing, but this is fairly common (I believe e.g. VLC does the same thing). It has worked fine in FFmpeg for a few years, but after 4.2, clients now cannot decode anything:

[mkv] Invalid EBML length at position 716
[ffmpeg] Cannot seek backward in linear streams!
[mkv] Unexpected end of file (no clusters found)

I've bisected, and although the results are a bit fuzzy, I believe this is the commit that broke it:

commit add68dcca958f0f6b42cabea6f546ceae5c7f16d
Author: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Date:   Sat Apr 20 01:41:06 2019 +0200

    avformat/matroskaenc: Write CRC-32 in non-seekable mode
    
    Given that in both the seekable as well as the non-seekable mode dynamic
    buffers are used to write level 1 elements and that now no seeks are
    used in the seekable case any more, the two modes can be combined; as a
    consequence, the non-seekable mode automatically inherits the ability to
    write CRC-32 elements.
    
    There are no differences in case the output is seekable; when it is not
    and writing CRC-32 elements is disabled, there can still be minor
    differences because before this commit, the EBML ID and length field
    were counted towards the cluster size limit; now they no longer are.
    
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
    Signed-off-by: James Almer <jamrial@gmail.com>

I can't find any way to fix it with flags. I've tried setting "live" to 1, and I've tried setting "write_crc32" to 0 (both of them with av_dict_set()), but it doesn't seem to change anything.

Attachments (1)

0001-avformat-matroskaenc-Write-level-1-elements-in-one-g.patch (11.5 KB ) - added by mkver 5 years ago.: Patch to write level 1 elements in one go again for release 4.2

Download all attachments as: .zip

Change History (11)

comment:1 by Carl Eugen Hoyos, 5 years ago

Keywords:	mkv regression added; matroska removed
Version:	4.2 → git-master

Please confirm that the issue you see is reproducible with current FFmpeg git head.

comment:2 by Sesse, 5 years ago

Reproduced on c455a28a.

comment:3 by Sesse, 5 years ago

Martin Storsjö helped me debug this. The gist of it is: Earlier, there would be only one write callback per chunk. Now, there is two; one for the EBML ID (4 bytes) and one for the actual data (50–60 kB). Since the Matroska muxer doesn't use data markers (type in the callback, after the header has been written, is always AVIO_DATA_MARKER_UNKNOWN), my code assumes that the point right after the EBML ID is a valid starting point for the stream, whereas it isn't.

Worse yet, since there's a sizeable delay in wall clock time after the EBML ID is written, it's practically guaranteed that the client will indeed hit that point. If I add a hack to my code saying that

        if (buf_size == 4) type = AVIO_DATA_MARKER_SYNC_POINT;

then things instantly start working again.

comment:4 by mkver, 5 years ago

The "variable" returned by avformat_write_header() that you store and send to the client is the header written by avformat_write_header() and not the return value of avformat_write_header(), isn't it? (Yes, a stupid question, but I just want to be sure.)
The AVIOContext's is flagged as non-seekable (i.e. pb->seekable & AVIO_SEEKABLE_NORMAL is wrong), I presume?
Before the patchset that the cited commit belongs to was applied there were two interwoven codepaths for writing clusters (and other level 1 elements). Both assembled the clusters in memory via a dynamic buffer (an AVIOContext that you can open with avio_open_dyn_buf()) (let's call it dyn) before outputting via the real AVIOContext (called pb in the following).

a) The codepath for seekable output: Write the Cluster element ID via pb; also write Matroska's "unknown-length" length via pb (using the maximal length of 8 bytes for the length field). If a CRC-32 element is going to be written, reserve six bytes for it by writing an EBML-Void element in dyn. Then write content of the cluster as usual into dyn. When the cluster is to be finished, output the CRC-32 (if it is to be written) via pb and then write the whole content of dyn (containing the Cluster's content) with the exception of the bytes reserved for the CRC-32 (if any) via pb. Then seek back (in pb) to length field to update it (i.e. overwrite it with the real length that is now known) and seek back to the end of the Cluster again to write the next Cluster (or the Cues if this was the last one and Cues should be written at the end).
b) The codepath for unseekable output: Write the Cluster element ID into dyn; also write an "unknown-length" length field into dyn. Then write the Cluster into dyn (without writing CRC-32 elements, because that was just not supported by the implementation). When closing the Cluster, the length field was updated (requiring seeks but that is possible because we are seeking with dyn, not pb) and the whole Cluster has been output (i.e. sent to pb) in one avio_write().

Given that both codepaths relied on dynamic buffers they could be merged into one: Write Cluster element ID to pb; open dyn and reserve space for the CRC-32 in dyn (if it is to be written). Write Cluster content into dyn. When the Cluster is to be finished, write the correct length field for the Cluster (said size is now known!).* Then write the CRC-32 to pb (if ...) and then write the actual Cluster content (excluding the stuff at the beginning reserved for the CRC-32). This is what the patchset to which the cited patch belongs does. This patch is the very one that stopped the Cluster to be output in one go for nonseekable output; so your fuzzy result makes sense.
As it happens, I already have a patch that should fix this. It modifies the process as follows: Open dyn and reserve space for CRC-32 (if ...). Write Cluster content into dyn. When the Cluster is to be finished, write the Cluster element ID, the length field, the CRC-32 (if ...) and the Cluster content (excluding the part reserved for the CRC-32 element (if ...)). You can find it here; yet it is part of a longer patchset and you will also have to apply this patch, this patch, this patch, this patch, this patch, this patch, this patch and finally this patch. Can you test whether this indeed fixes your problem?

*: We save a few bytes here: Matroska (or actually EBML, the thing Matroska is built on) uses variable-length length-fields where smaller lengths take less bytes to encode and given that the length to write is known at this point the smallest possible length field is chosen.

comment:5 by mkver, 5 years ago

Now that you have mentioned the markers: They are broken: The check for the beginning of a new cluster should be "if (mkv->cluster_pos == -1)" and not "if (!mkv->cluster_pos)". Will send a patch.

by mkver, 5 years ago

Attachment:	0001-avformat-matroskaenc-Write-level-1-elements-in-one-g.patch added

Patch to write level 1 elements in one go again for release 4.2

comment:6 by mkver, 5 years ago

For your convenience I have added a patch for you to apply on top of release 4.2 that you can test. This is what will be backported if you confirm that it fixes the issue.

comment:7 by mkver, 5 years ago

Cc:	andreas.rheinhardt@gmail.com added

comment:8 by Sesse, 5 years ago

Hi,

Thanks for the patch! I'll test it out after work today.

comment:9 by Sesse, 5 years ago

The patch, applied to 4.2.2, fixes the problem.

comment:10 by mkver, 5 years ago

Resolution:	→ fixed
Status:	new → closed

Fixed in d9c21ec63999501afc39ffc26f363245e8624304, backported as 477275795865f074e60635d813a7b765284ca948.

Note: See TracTickets for help on using tickets.

Download in other formats: