Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1922 closed enhancement (invalid)

Broken or incomplete parser for filters

Reported by: burek Owned by:
Priority: wish Component: undetermined
Version: unspecified Keywords:
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
The parser for filters seems to be broken or incomplete, because it doesn't correctly interpret some cases, for example:

-vf select='not(mod(n,100))'

has to be written like:

-vf select='not(mod(n\,100))'

I guess this is a workaround to solve some problematic parsing problems or something, but imho it should be fixed rather than be ignored or even moved to documentation as such.

A comma character does not need to be escaped, since it is inside the parenthesis, and it is obvious that it can't be a delimiter for another filter in the chain.

Change History (7)

comment:1 by Cigaes, 12 years ago

The graph parser does not know that parentheses must be nested. In order to take it into account, it would need to distinguish select=not(mod(n,100)) from drawtext=text=8-(,drawtext=8-). One of them would require quoting/escaping. Not doing a special case for the parentheses seems the most logical, I am not sure whether it is the most practical.

The other solution would be to have the graph parser aware of the syntax of the various filters. That is not impossible, it could check the class field, but that is rather complex to implement, and possibly not compatible with the principle of least surprise.

comment:2 by Stefano Sabatini, 12 years ago

Resolution: invalid
Status: newclosed

Closing as invalid, there is no bug in the parser. This might be confusing but there is nothing which can be fixed at the coding level. Note also that the usability issues are mentioned in the "Notes on filtergraph escaping" section in the filters manual.

comment:3 by burek, 12 years ago

@Cigaes: I'm not sure why any of those 2 need escaping? The only thing that is not logical is double '=' character, which can be solved the way VLC solved it, by surrounding the text "text=8-(,drawtext=8-)" with curly braces (or quotes/parentheses), so you would get:

drawtext={text=8-(,drawtext=8-)}

or

drawtext=(text=8-(,drawtext=8-))

Making the parser aware of this is as simple as implementing one of several well known algorithms (for which there is also a sample code available for most popular programming languages): http://stackoverflow.com/questions/4582398/writing-a-simple-equation-parser

I believe this won't be as complex as it looks like, because people have solved this kind of a problem a long time ago and it would be wise to not reinvent the wheel. Also, as soon as the parentheses thing is parsed correctly, you can place parentheses in most of expressions, to clarify the expression for reading and to make it simpler for parser to understand what did you want to say with that expression.

@saste: It maybe isn't a bug, but it's better to fix it to have a cleaner usage. People who use ffmpeg already have to think about shell escaping + this escaping because of the filter's inability to parse the expression correctly. I'm just suggesting that this ticket shouldn't be closed as fast as possible, just to lower the number of open/unresolved tickets, but rather should be discussed more to find the best way to fix this issue, because it is an issue and it should be fixed.

in reply to:  3 comment:4 by Cigaes, 12 years ago

Replying to burek:

@Cigaes: I'm not sure why any of those 2 need escaping?

You missed something: the second example is not a single filter, it is two filters, one printing the "8-(" smiley, the other printint the "8-)" smiley.

Making the parser aware of this is as simple as implementing one of several well known algorithms (for which there is also a sample code available for most popular programming languages): http://stackoverflow.com/questions/4582398/writing-a-simple-equation-parser

I believe this won't be as complex as it looks like, because people have solved this kind of a problem a long time ago and it would be wise to not reinvent the wheel.

It is not as simple as you make it: filters expect a flat string, and parse it themselves. More and more use the options system, but some use a custom parser.

Changing the way it works constitutes a big API change, and can not be disregarded as "as simple as" anything.

in reply to:  3 comment:5 by Stefano Sabatini, 12 years ago

Replying to burek:
[...]

@saste: It maybe isn't a bug, but it's better to fix it to have a cleaner usage. People who use ffmpeg already have to think about shell escaping + this escaping because of the filter's inability to parse the expression correctly. I'm just suggesting that this ticket shouldn't be closed as fast as possible, just to lower the number of open/unresolved tickets, but rather should be discussed more to find the best way to fix this issue, because it is an issue and it should be fixed.

If it isn't a bug (and it isn't since that's the expected parser behavior), then it should not be considered as such. Syntax enhancements are possible, but they would break the current behavior. The {} thing may be implemented at some point, but won't change the escaping problem since you would still need to escape some special character ("}" or ")"), so it would just move it to another level. Other solutions have been discussed recently on the mailing list, just don't believe that there is some "simple" silver-bullet solution. Also this is the kind of problem which can not be easily marked as fixed, since different persons perceive the problem in different ways.

comment:6 by burek, 12 years ago

Priority: normalwish
Type: defectenhancement

@Cigaes:
In the 1st filter you provided the "text" parameter and in the 2nd you didn't, hence the confusion. If this is your intended example:

drawtext=text=8-(,drawtext=text=8-)

then this would be a solution:

drawtext=(text="8-("),drawtext=(text="8-)")

When you have a proper parser, it is just a matter of assigning a proper precedence to make the quotes having the highest priority and everything inside quotes should be considered as a text (and in that case it is obvious and logical that you have to escape just a quote itself + escaping character, of course, and nothing more). That being said, this would be an example of it:

drawtext=(text="~!@#$%^&*()_+-=8-(\"'.|,\\<>"),drawtext=(text="8-)")

Also, regarding API changes, I have a feeling that the mistake was made a long time ago when the proper interface towards users/programmers wasn't defined, which later caused all the mess with those famous "API changes that break everything". The interface here would represent an analogy to a class in OOP, which has its internals hidden away from the users/other programmers and the class definition never changes (API never breaks), but internal implementation of the class changes constantly, improving things. Interfaces are usually used for such cases, so I believe that we have a much bigger issue here than just the filter parser(s). Also, I believe that time involved in designing a proper interface now (for future versions of ffmpeg) would save you a lot of time and headaches in the future and would finally allow you to focus on improvements, not taking much of a care about API breakages, like you do right now, which makes implementing/fixing a lot of features a nightmare.

@saste:
Well, ok, I've changed it to an enhancement, although it is an obvious malfunction of a basic parsing feature (I've not seen a single newbee user who figured out on his own that a comma character needs to be escaped, i.e. it's just not intuitive to them as it is to you, developers).

I, as a user, don't know much about internal coding problems of such a feature, but from the outside look, this seems like it's not working well and the first thing to think of is that it's a bug. What is really missing in the filters parser(s) is clarity, that's so obvious and that's the reason for creating this ticket. I just thought that parentheses could help reading someone else's command line, which involves complex filters, more easily than it is right now.

This logic could also be extended to ffmpeg command line in such a way that you can group your options with each input/output, like described here: https://ffmpeg.org/trac/ffmpeg/ticket/1480
That way, there would be no ambiguity, like when you place the -ss option in between 3rd and 4th input, and you never know if that option will be applied to the output (slow seek) or to the 4th input (fast seek).

Long story short, if you practice frequent workarounds instead of proper solutions, sooner or later you'll get into this kind of situation where you just can't change anything, because it will break something. Stacked pile of patches and workarounds will hold the water for some time, but on a long run, there really should be some radical improvement and API breakage, that will resolve some fundamental problems in the coding process.

in reply to:  6 comment:7 by Stefano Sabatini, 12 years ago

Replying to burek:

@saste:
Well, ok, I've changed it to an enhancement, although it is an obvious malfunction of a basic parsing feature (I've not seen a single newbee user who figured out on his own that a comma character needs to be escaped, i.e. it's just not intuitive to them as it is to you, developers).

It is not an "obvious malfunction", I already stated there is no bug in the parser (at least not related to escaping), also the implementation is behaving according to design. I already pointed out that there is no way to workaround this at this level.

You have several filter description of the kind:
FILTER,FILTER,FILTER

since you have some special characters (e.g. ',' to separate each FILTER), you need some way to mark such characters if they are contained in the filter description: the classic solution is escaping, and we adopt a rather standard de-escaping algorithm (no wheel reinvention here) to de-escape literal "," and other special chars contained within FILTER.

Then each FILTER can support a specific syntax (which depends on the filter, each filter has a separate parser for that), which in turn may need a second level of escaping.

There is nothing you can solve on the design level here, we need two levels of escaping, you can change the escaping mechanics but cannot eliminate the fact that two levels of escaping are needed.

I, as a user, don't know much about internal coding problems of such a feature, but from the outside look, this seems like it's not working well and the first thing to think of is that it's a bug. What is really missing in the filters parser(s) is clarity, that's so obvious and that's the reason for creating this ticket. I just thought that parentheses could help reading someone else's command line, which involves complex filters, more easily than it is right now.

We could adopt a different escaping algorithm or use different special characters, or make the special chars selectable, but still can't avoid the need for the double level escaping.

This logic could also be extended to ffmpeg command line in such a way that you can group your options with each input/output, like described here: https://ffmpeg.org/trac/ffmpeg/ticket/1480
That way, there would be no ambiguity, like when you place the -ss option in between 3rd and 4th input, and you never know if that option will be applied to the output (slow seek) or to the 4th input (fast seek).

Long story short, if you practice frequent workarounds instead of proper solutions, sooner or later you'll get into this kind of situation where you just can't change anything, because it will break something. Stacked pile of patches and workarounds will hold the water for some time, but on a long run, there really should be some radical improvement and API breakage, that will resolve some fundamental problems in the coding process.

I object this simplistic view, which also borrows the flawed assumption that "developers don't think hard when coding / designing", which is not justified in this specific instance. Provided that I didn't design the graphparser syntax myself, I don't consider it flawed, still it is not very suited if there is the need to embed literal strings for which there is the hardly avoidable need of understanding the escaping logic (this is not different from other areas where multiple escaping is required).

I'm all for trying to improve the usability if there are suitable solutions (which save backward compatibility), and tried to extend the docs with mention of the syntax issues, and I invite you to point out what could be improved/clarified.

I already proposed to adopt an alternative syntax (more suited for "file" scripting)). Another solution working at a different level would be to write a "visual" graph editor.

Note: See TracTickets for help on using tickets.