Opened 4 months ago

Last modified 2 weeks ago

#11096 new enhancement

Chaotic escaping rules

Reported by: ManDay
Owned by:
Priority: normal
Component: undetermined
Version: git-master
Keywords: filter_complex expression
Cc: MasterQuestionable
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by ManDay)

Consider a filter that draws the text "early" if the time is less than 5.0 seconds and "late" otherwise. Using only a single type of quote, the only reasonable filter-chain syntax to achieve this would be the following:

drawtext=text='%{e:if(lt(t,5),\'early\',\'late\')}'

At the different levels of parsing, we have

...the filter complex: drawtext=text='...'
...the text argument: %{...}
...the expansion: e:....
...the expr parser: if(lt(t,5),'early','late')

Here each ellipsis ... stands for content that is syntactically opaque to the respective receiver, as delimited by that receiver's own notion of quotes/scoping. Logically, no further escaping should be needed. In the current state of affairs, however, the above does not work; it requires escaping so long and complicated that I could not figure out a working combination at all.

Please note

That further escaping outside of ffmpeg is not considered in the above description, as it seems irrelevant to the complexity within ffmpeg alone. If, however, we want to consider the above in an sh context, assume the entire argument to ffmpeg is wrapped in double quotes ("), so that all occurrences of ' remain unaltered.
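As a rough check of the sh assumption above, Python's `shlex.split` can show what ffmpeg would receive. It only approximates POSIX shell word splitting and is not a shell. The file names `in.mp4` and `out.mp4` and the use of `-vf` are placeholders chosen for illustration, not part of the report.

```python
import shlex

# The filter string from the description, exactly as ffmpeg should receive it.
filter_arg = r"drawtext=text='%{e:if(lt(t,5),\'early\',\'late\')}'"

# Wrap the whole argument in double quotes, as assumed above.
# in.mp4 / out.mp4 are hypothetical file names.
command_line = 'ffmpeg -i in.mp4 -vf "' + filter_arg + '" out.mp4'

argv = shlex.split(command_line)

# Inside double quotes, ' is literal and \ stays literal in front of ',
# so the filter string should arrive unchanged.
print(argv[4] == filter_arg)
```

This only covers the shell layer; whatever ffmpeg then does with the quotes and backslashes is the subject of this ticket.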

Change History (10)

comment:1 by ManDay, 4 months ago

Description: modified (diff)

comment:2 by MasterQuestionable, 4 months ago

Cc: MasterQuestionable added

͏    The quoting havoc appears to be a general issue: non-exclusive to FFmpeg.
͏    (informatics and linguistics, overall)
͏    .
͏    Reverse Solidus Toothbrush: "\"\\\\\""

͏    Summary:
͏    How to properly represent the quoted content, without unjustifiable escaping havoc?

͏    I have working, optimal solutions:
͏    I haven't yet found time to synthesize and publish the logic.

͏    However backward compatibility would also be of concern.
͏    (the very reason many poor designs become incorrigible)


͏    Note:
͏    Note:
͏    Cases like `'` are symmetric quoting: the same character opens and closes the quote.
͏    Escaping must therefore occur to include the quote character/symbol itself.
͏    (without it, the boundary generally cannot be located [1])
[1] Unless a certain symbol is reserved as terminator.

͏    See [ https://github.com/MasterInQuestion/Markup/blob/main/Markup%20Intro.htm ] for reference.

Last edited 4 months ago by MasterQuestionable

comment:3 by ManDay, 4 months ago

I don't see a general (conceptual) issue. Could you elaborate on what you mean? (I'm not a computer scientist, and "Reverse Solidus Toothbrush" didn't bring up any results.)

In principle, I think escaping and quoting follow a simple pattern, one that is unrecognizable in FFmpeg:

Starting from the "top" (whole data), recursively define parsers which operate on the input and, by their respective syntactic elements, decompose the data into semantically related blocks, which are fed into "lower" parsers. These blocks are atomic/opaque to the parser at this level.
To prevent data from being (mis)interpreted as syntactic elements, two special syntactic elements are defined:

  • Quotes (prevent interpretation in their range)
  • Escapes (prevent interpretation immediately after)

In this case, one possibility to split the parsing could be:

At the "top" we have the filter-complex parser with escape \ and quote ' which parses drawtext=text='%{e:if(1,\'a\',\'b\')}' into blocks drawtext text %{e:if(1,'a','b')}. By their semantics, drawtext is used to create a filter to which the arguments of type pair[], here {(text,%{e:if(1,'a','b')})} are handed over.

At a lower level we have the drawtext parser for parameter names which parses text (trivial) and the drawtext text parser which parses %{e:if(1,'a','b')}. The drawtext text parser uses escape \ and no quotes (afaik). It yields the blocks: e and if(1,'a','b').

At the lowest level we have the expression parser with escape \ and quote '.
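The three levels above can be sketched as a toy model. This is only an illustration of the proposed decomposition, not FFmpeg's actual parser: `split_blocks` is a hypothetical helper, the `%{`/`}` stripping and the argument extraction are deliberately simplistic, and nested parentheses at the expression level are not handled.

```python
def split_blocks(text, separators, quote="'", escape="\\"):
    """Split text on separators that are neither quoted nor escaped.

    One level of quoting/escaping is consumed: quote characters are dropped
    and an escape makes the following character literal.
    """
    blocks, current = [], []
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            current.append(text[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if quote is not None and ch == quote:
            in_quote = not in_quote      # quote toggles and is not copied
        elif not in_quote and ch in separators:
            blocks.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    blocks.append("".join(current))
    return blocks


source = r"drawtext=text='%{e:if(1,\'a\',\'b\')}'"

# Top level: filter parser, escape \ and quote '.
filter_name, option, value = split_blocks(source, "=")

# Middle level: drawtext text parser, escape \ and no quotes.
# value[2:-1] strips the surrounding %{ and }.
function, expression = split_blocks(value[2:-1], ":", quote=None)

# Lowest level: expression parser, escape \ and quote '.
arguments = split_blocks(expression[expression.index("(") + 1:-1], ",")

print(filter_name, option, value)
print(function, expression)
print(arguments)
```

Each level consumes exactly one layer of quotes and escapes and hands the rest down untouched, which is why the single `\'` in the input is enough in this model.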

One thing we were able to isolate during a short discussion on IRC is that, in the current implementation, the parser for the filter complex does not recognize the ' quotes; they are interpreted by "something else". Only through some other mechanism do such quotes prevent the interpretation of , in cases such as overlay=x='lerp(1,2,0.5)', where the comma would otherwise end the filter. So this looks like perhaps the first design issue to address.
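To make the role of the quotes in the overlay example concrete, here is a minimal sketch comparing a naive comma split with one that skips commas inside ' quotes. `split_outside_quotes` is a hypothetical helper and `hflip` is added only to have a second filter in the chain; the sketch ignores backslash escapes and makes no claim about which FFmpeg component performs this step.

```python
def split_outside_quotes(chain, separator=",", quote="'"):
    """Split on separators that are not inside quotes; quotes are kept."""
    parts, current, in_quote = [], [], False
    for ch in chain:
        if ch == quote:
            in_quote = not in_quote
        if ch == separator and not in_quote:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


chain = "hflip,overlay=x='lerp(1,2,0.5)'"

naive = chain.split(",")               # cuts through the lerp(...) arguments
aware = split_outside_quotes(chain)    # keeps the quoted expression intact

print(naive)
print(aware)
```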

Last edited 4 months ago by ManDay

comment:4 by MasterQuestionable, 4 months ago

͏    Mere analogy for that toothbrush-looking absurdity... "unrecognizable" humanly.

͏    Simple to implement != Simple to use
͏    And such has resulted in unfixable havocs.


͏    Alike:
͏    https://trac.ffmpeg.org/wiki/WikiFormatting#SettingAnchors
͏    (compare "span" macro and "Image")

comment:5 by MasterQuestionable, 4 months ago

͏    Note your description has syntax error:
͏    Missing parenthesis of "if".

comment:6 by ManDay, 4 months ago

Description: modified (diff)

comment:7 by superbonaci, 4 months ago

Question: which characters need to be escaped in ;FFMETADATA1 files?

According to this https://ffmpeg.org/ffmpeg-utils.html#Quoting-and-escaping only the ' and \ need to be escaped, but according to https://ffmpeg.org/ffmpeg-formats.html#metadata it's =, ;, #, \ and a newline.

According to some tests we ran for the project https://github.com/lay295/TwitchDownloader/pull/1161, only the backslash has to be escaped. Single quotes can be escaped or left as they are; both work the same.

Of course, a backslash at the end of a line that continues the value on a new line must not be escaped; we only mean the literal values we want in the output metadata.
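For comparison, the rule as stated on the ffmpeg-formats metadata page (escape =, ;, #, \ and a newline with a backslash) can be sketched as follows. The function names are hypothetical, and the sketch follows the documented list only; it does not reflect the test result above that escaping the backslash alone appeared to be sufficient.

```python
# Characters listed as special on the ffmpeg-formats metadata page.
FFMETADATA_SPECIAL = "=;#\\\n"


def escape_ffmetadata(text):
    """Prefix every documented special character with a backslash."""
    return "".join("\\" + ch if ch in FFMETADATA_SPECIAL else ch for ch in text)


def metadata_line(key, value):
    """Build one key=value line; the separating = itself is not escaped."""
    return escape_ffmetadata(key) + "=" + escape_ffmetadata(value)


print(metadata_line("title", "a=b; c #1 \\ d"))
```

Single quotes are left untouched here, since the metadata page does not list them as special.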

Could you explain why the two pages list different characters and how it really works, ideally with examples?

Last edited 4 months ago by superbonaci

comment:8 by MasterQuestionable, 3 months ago

͏    Because everyone was confused... mostly.
͏    There has been a tradition of over-escaping indiscriminately, just to make things "outright work".
͏    As in URI, RegEx, ...

͏    Straight answer:
͏    Whatever works doesn't need to be escaped.
͏    Though several relevant implementations have to be taken into account for the sake of interoperability.
͏    E.g.
͏    https://github.com/orgs/community/discussions/139005#discussioncomment-11010254
͏    https://github.com/orgs/community/discussions/142344

Last edited 2 weeks ago by MasterQuestionable

comment:9 by superbonaci, 3 months ago

Any chances of updating the docs?

comment:10 by MasterQuestionable, 3 months ago

͏    Unsure how to properly handle Texinfo...
͏    https://trac.ffmpeg.org/ticket/11044#comment:3
