#8466 closed defect (fixed)
av_url_split violates RFC2396, fails to parse URLs
Reported by: | Jacob | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avformat |
Version: | unspecified | Keywords: | http |
Cc: | Marton Balint | Blocked By: | |
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
(Slightly related ticket: #7871)
Summary of the bug:
This took a long time to track down, but the exact file and problem has now been identified.
The bug is in "libavformat/utils.c", function "av_url_split".
When given a URL that conforms to RFC2398, the 1998 URL standard, FFmpeg incorrectly parses the URL and fails to download the file.
This is because FFmpeg's parser looks for the earliest slash after the domain name, by assuming that there must be a slash after the domain name, before the query string, which is not true. The standard says that the slash is totally optional (https://stackoverflow.com/a/42193734/8874388).
How to reproduce:
ffmpeg version 4.2.1 Copyright (c) 2000-2019 the FFmpeg developers built with gcc 9.1.1 (GCC) 20190807 INCORRECTLY PARSED BY FFMPEG: % ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key=" [https @ 000002308f499b00] HTTP error 400 Bad Request CORRECTLY PARSED (VIA USER MANUALLY ADDING A SLASH): % ffmpeg -i "https://key.vscdns.com/?key_id=2&model_id=991011&&access_key=" [https @ 00000262520c9b00] HTTP error 403 Forbidden
It affects any URL where the query string is immediate. Which means that it affects direct media links, M3U8/HLS playlists (where the user has no control over the URLs and no way to fix it themselves), etc.
This is problematic because the ".com?" format is entirely up to the streaming media provider, and cannot be affected by the user. Any streaming playlist that uses a field such as "EXT-X-KEY" with such a URL (as in my example URL) will fail completely in FFmpeg and there's nothing the user can do about it. So it is imperative that FFmpeg understands RFC2396 and allows the slash to be optional.
I looked at the parsing code and it seems quite simple to fix. It currently looks for the first "/" slash and "?" question mark. The fix would probably be to say "If a question mark was found, and it is earlier than the first slash, or no slash was found at all" then set the pointers so that it treats the "?" as the start of the query and the character immediately before that as the end of the domain name.
PS: I have set the same priority as the earlier URL parsing error ticket I referred to above. Feel free to lower the priority.
Change History (11)
comment:1 by , 5 years ago
Keywords: | RFC2396 parsing removed |
---|---|
Priority: | important → normal |
Version: | 4.2 → unspecified |
comment:2 by , 5 years ago
I am looking at the bugged code in question:
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/utils.c
(In function "av_url_split".)
It uses "strchr" to scan for a slash and a question mark. The strchr function returns NULL if the character is not found, else it returns a pointer to where the character was found.
4806 /* separate path from hostname */ 4807 ls = strchr(p, '/'); 4808 ls2 = strchr(p, '?'); 4809 if (!ls) 4810 ls = ls2; 4811 else if (ls && ls2) 4812 ls = FFMIN(ls, ls2);
"ls" is slash and "ls2" is question mark.
As you can see, it works as follows:
- if "ls" not found, set "ls" to "ls2" (query location)
- else if "ls" and "ls2" both found, set "ls" to the lowest of "ls" or "ls2"
This seems to be correct and to properly understand that URLs can start with a query string.
So I assume that the bug is further down in the function, where it splits the URI string based on the pointers it found...
There are many potential reasons for the bug:
- Perhaps "ls" must be set to the character BEFORE "ls2" whenever "ls" was not found. Currently it's setting "ls" to the same position as "ls2" (the question mark).
- Perhaps the splitting code further down needs to understand that the path can be completely empty.
- Perhaps the splitting code needs to generate a "/" path when the path is empty, so that the generated URL in the HTTP GET request that is sent to the server will be "/?key=..." instead of "?key=..."
The latter is the most likely cause. It probably generates absolutely nothing as path when given an URL such as this, and then sends a HTTP/1.1 GET "?key=..." to the server, which the server in turn fails to understand, since the HTTP/1.1 GET protocol probably demands a leading slash, whereas the URL standard does not.
If that's the cause, then there are two potential areas to fix this:
- Either in the function "av_url_split", where we can make it generate a default "/" path whenever no path was given.
- Or in the HTTP connection library (the one generating the HTTP/1.1 GET request), where we can make it request "/..." whenever the given request path "..." doesn't begin with a slash
comment:3 by , 5 years ago
Alright. Here's the latest Git build's output:
INCORRECTLY PARSED BY FFMPEG: $ ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key=" ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers built with gcc 9.2.1 (GCC) 20191125 configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf libavutil 56. 38.100 / 56. 38.100 libavcodec 58. 65.102 / 58. 65.102 libavformat 58. 35.101 / 58. 35.101 libavdevice 58. 9.103 / 58. 9.103 libavfilter 7. 70.101 / 7. 70.101 libswscale 5. 6.100 / 5. 6.100 libswresample 3. 6.100 / 3. 6.100 libpostproc 55. 6.100 / 55. 6.100 [https @ 00000229894b92c0] HTTP error 400 Bad Request https://key.vscdns.com?key_id=2&model_id=991011&&access_key=: Server returned 400 Bad Request MANUALLY REWRITTEN TO MAKE FFMPEG UNDERSTAND: $ ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key=" ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers built with gcc 9.2.1 (GCC) 20191125 configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf libavutil 56. 38.100 / 56. 38.100 libavcodec 58. 65.102 / 58. 65.102 libavformat 58. 35.101 / 58. 35.101 libavdevice 58. 9.103 / 58. 9.103 libavfilter 7. 70.101 / 7. 70.101 libswscale 5. 6.100 / 5. 6.100 libswresample 3. 6.100 / 3. 6.100 libpostproc 55. 6.100 / 55. 6.100 [https @ 00000205c0b592c0] HTTP error 403 Forbidden https://key.vscdns.com/?key_id=2&model_id=991011&&access_key=: Server returned 403 Forbidden (access denied)
My previous reply above has located the bug to both the URL parser (it can be fixed in there), and the HTTP GET connection library (it can be fixed there too). Where it's fixed doesn't matter, and it would be most trivial to fix it via the URL parser (av_url_split).
comment:4 by , 5 years ago
The 2nd (manually rewritten) example above accidentally copied the command line part of the 1st example. That's what happens when you do a quick copy... sigh.
Here's the used commands, yet again:
INCORRECTLY PARSED BY FFMPEG: $ ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key=" ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers built with gcc 9.2.1 (GCC) 20191125 configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf libavutil 56. 38.100 / 56. 38.100 libavcodec 58. 65.102 / 58. 65.102 libavformat 58. 35.101 / 58. 35.101 libavdevice 58. 9.103 / 58. 9.103 libavfilter 7. 70.101 / 7. 70.101 libswscale 5. 6.100 / 5. 6.100 libswresample 3. 6.100 / 3. 6.100 libpostproc 55. 6.100 / 55. 6.100 [https @ 00000229894b92c0] HTTP error 400 Bad Request https://key.vscdns.com?key_id=2&model_id=991011&&access_key=: Server returned 400 Bad Request MANUALLY REWRITTEN TO MAKE FFMPEG UNDERSTAND: $ ffmpeg -i "https://key.vscdns.com/?key_id=2&model_id=991011&&access_key=" ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers built with gcc 9.2.1 (GCC) 20191125 configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf libavutil 56. 38.100 / 56. 38.100 libavcodec 58. 65.102 / 58. 65.102 libavformat 58. 35.101 / 58. 35.101 libavdevice 58. 9.103 / 58. 9.103 libavfilter 7. 70.101 / 7. 70.101 libswscale 5. 6.100 / 5. 6.100 libswresample 3. 6.100 / 3. 6.100 libpostproc 55. 6.100 / 55. 6.100 [https @ 00000205c0b592c0] HTTP error 403 Forbidden https://key.vscdns.com/?key_id=2&model_id=991011&&access_key=: Server returned 403 Forbidden (access denied)
follow-up: 6 comment:5 by , 5 years ago
Cc: | added |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed in a3d8875df1f71e62a36c583cee638ab1d51a71d3.
comment:6 by , 5 years ago
Replying to cus:
Fixed in a3d8875df1f71e62a36c583cee638ab1d51a71d3.
Hi Marton, I saw your patches below:
https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=365b817b51630447305f49a4e2f79ab8ad842473
https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=554576b6cfe79a91d37e14d3617ca417562085db
https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b
https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=a3d8875df1f71e62a36c583cee638ab1d51a71d3
I've also reviewed them now and have to say that you have done a brilliant job! The new URL parser code is so much cleaner and safer, and you noticed that hashmark support was also needed (slash, question mark and hashmark all being supported by RFC3986).
Really great work, thank you so much! :-)
Best Regards,
follow-up: 8 comment:7 by , 5 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Oh and it was great to see that you even made a sanitizer for the connection algorithm ("http_open_cnx_internal") which auto-adds a slash if there was no slash in the input URL given to the function.
I'm actually re-opening this to ask a question, though, for other reviewers to consider:
https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=a3d8875df1f71e62a36c583cee638ab1d51a71d3
Note how the code is adding a "/" slash if the first character of the path is a question mark. Is there any problem if the URL is "https://example.com#foo"?
It looks to me like the hashmark check a few lines earlier ( https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b ) puts a null byte at the hashmark position to "erase the hashmark and everything after it" already, so I guess there is no problem! Just asking to be sure. :-)
follow-up: 9 comment:8 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Replying to aitte:
Note how the code is adding a "/" slash if the first character of the path is a question mark. Is there any problem if the URL is "https://example.com#foo"?
It looks to me like the hashmark check a few lines earlier ( https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b ) puts a null byte at the hashmark position to "erase the hashmark and everything after it" already, so I guess there is no problem! Just asking to be sure. :-)
Yeah, that should be fine as well, empty path (it becomes empty after removing the hashmark) was converted to / even before this patchset.
comment:9 by , 5 years ago
Replying to cus:
Replying to aitte:
Note how the code is adding a "/" slash if the first character of the path is a question mark. Is there any problem if the URL is "https://example.com#foo"?
It looks to me like the hashmark check a few lines earlier ( https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b ) puts a null byte at the hashmark position to "erase the hashmark and everything after it" already, so I guess there is no problem! Just asking to be sure. :-)
Yeah, that should be fine as well, empty path (it becomes empty after removing the hashmark) was converted to / even before this patchset.
Hey, thanks again, I really almost can't believe how much cleaner the new URL code is. The old one you replaced (ls, ls2 system) was a convoluted mess. Hehe. This change is truly brilliant: https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=554576b6cfe79a91d37e14d3617ca417562085db
And ahh yeah I see what you mean now about the empty path converting to "/", at this area of the code:
https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/http.c;h=d0df1b6d585da422aef14eb6128eb319ddd9ac98;hb=a3d8875df1f71e62a36c583cee638ab1d51a71d3#l220
So "http://example.com#tag" would be split via "av_url_split()" into "http", "example.com" and "#tag" (path1). Then path1 variable gets truncated to "\0tag", and then it says "if (path1[0] == '\0') { path = "/"; }" which means it should properly be converting the path to "/" when there was just a hashmark after the domain name.
And if the path is non-empty but starts with "?", your new code formats it as "/?" properly. And lastly, if the path is normal and starts with "/" already then it's used as-is.
Alright this looks perfect.
You've done a fantastic job!
- Cleaned up the buggy, messy av_url_split, wohoo!
- Added support for "example.com?x" and "example.com#x" URLs in the URL splitter.
- Properly removing hashmark fragments from paths when connecting to target URLs. That would have caused problems in the old code which kept them. Great catch!
- Added support for auto-inserting "/" when connecting to target URLs that didn't begin their path with a slash already.
This looks like a good implementation of RFC3986. Thank you so much for the great work!
:-)
comment:10 by , 5 years ago
I see that you did a lot more commits that I didn't notice earlier. You've cleaned up the HTTP and FTP connection library too, so that they properly handle special characters and escaping. Wow. Very nice works, thanks Marton!
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=86b2fe941126fd4fee86f2aca91b43019b7b0c5f
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=6a7b5226e1c868fe6406b114e7303c70d886900b
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=8224f1e0ba65c064cf0fa2e11e4c688501b6eddd
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=3004ef1b1b1bcd6bec4ad3509662ab1a4b644149
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=f204a38e08f937f6204bb21a3d95a23089679fe0
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=04f1d49709dac2d0e35f54bbe49cf00ba632e6dd
https://git.videolan.org/?p=ffmpeg.git;a=commit;h=6f2e38990caff7ae8cb07675417fbeb9eb1a9f0e
comment:11 by , 4 years ago
This is now live in the latest stable release! FFmpeg 4.3 Stable brought this fix to the masses. Thank you so much again, for your great work @cus!
Please test current FFmpeg git head and provide the command line you tested together with the complete, uncut console output to make this a valid ticket.