Opened 11 years ago
Closed 8 years ago
#3651 closed enhancement (wontfix)
UT Video Codec is inefficient compared to libutvideo
Reported by: | Zerowalker | Owned by: | |
---|---|---|---|
Priority: | wish | Component: | avcodec |
Version: | git-master | Keywords: | utvideo |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | yes | |
Analyzed by developer: | no |
Description
Summary of the bug:
Not really sure if I am supposed to write these things here, as it's not really a bug, but here goes.
LAV Filter uses ffmpeg for decoding, hence I direct this here.
The decoding performance for Lagarith and UT Video Codec is extremely bad; most of the time it's over 100% slower.
Originally I thought it was faster; either I was mistaken or something has changed.
Worth noting: Lagarith is limited to 2 threads in its original decoder. Comparing the performance makes this insignificant, though, as ffmpeg will use more threads and still not keep the same pace.
How to reproduce:
Pretty sure you can just use:
ffmpeg -i "lagarith.avi" -c:v rawvideo "Raw.avi"
So just decode a Lagarith file to raw and you will see the performance, then compare it to using the original decoder.
Change History (37)
comment:1 by , 11 years ago
Priority: | normal → wish |
---|
comment:2 by , 11 years ago
What command should i run?
Can't really do "rawvideo" as the HDD will be the bottleneck then.
comment:3 by , 11 years ago
Keywords: | utvideo added |
---|
You reported that the FFmpeg decoders are slower than the original decoders: How do you know and how much slower are they?
comment:4 by , 11 years ago
I don't know the precise number, but i have used LAV Filter which uses ffmpeg to Decode, hence how i could compare the playback performance simply watching CPU Usage.
comment:5 by , 11 years ago
Reproduced by developer: | set |
---|---|
Status: | new → open |
Summary: | Lagarith and UT Video Codec has Bad Performance → UT Video Codec is inefficient compared to libutvideo |
Version: | unspecified → git-master |
While the native utvideo decoder is significantly faster here than libutvideo (because it uses eight threads but libutvideo only four afaict), I can confirm that the libutvideo decoder is much more efficient (it takes fewer CPU cycles for the same input).
comment:6 by , 11 years ago
Hmm, this is very weird.
For example, if I play back a UT Video Codec video of a certain caliber, it takes about 50% CPU usage for me (I have 4 cores, so 2 x 100%), while the original decoder only uses about 1 core.
The same goes for everything i tried, the original decoder won by quite a large difference, same goes for Lagarith.
For me that should mean it's faster at the same threads, in pure efficiency right?
As i can't even use 8 threads in an optimal way as i only got 4 cores with No-HT.
Thanks
comment:7 by , 11 years ago
As said, please post numbers.
If you don't want to post any numbers, please accept my tests.
comment:8 by , 11 years ago
But i don't know what numbers you want, what commands do you want me to run?
If possible, i would like to to try LAV Filter to decode, and also the original decoder, and see the CPU Usage.
Thanks
comment:9 by , 11 years ago
I tested with the following file:
$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
built on May 24 2014 11:12:43 with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32'
libavutil 52. 86.100 / 52. 86.100
libavcodec 55. 64.100 / 55. 64.100
libavformat 55. 40.100 / 55. 40.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.40.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf55.40.100
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
Metadata:
encoder : Lavc55.64.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (utvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xb1fca40] Encoder did not produce proper pts, making some up.
frame= 1500 fps=162 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=69.881s
bench: maxrss=71984kB
real 0m9.276s
user 1m9.883s
sys 0m0.707s

$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
built on May 24 2014 11:17:02 with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
libavutil 52. 86.100 / 52. 86.100
libavcodec 55. 64.100 / 55. 64.100
libavformat 55. 40.100 / 55. 40.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.40.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf55.40.100
Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
Metadata:
encoder : Lavc55.64.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xac54c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=133 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.733s
bench: maxrss=24320kB
real 0m11.317s
user 0m40.739s
sys 0m2.059s
follow-up: 11 comment:10 by , 11 years ago
Is it possible to make a benchmark using the same file?
You see i got a file that's UT Video RGB.
With LAV Filter it uses a bit above 50% at its 30fps speed.
With UT Video Codec it uses 25% all the time.
LAV Filter is possibly faster, but at twice the cost for me.
(25% = 1 core)
If both use the same amount of threads, won't LAV lose by quite the amount for you as well?
comment:11 by , 11 years ago
Replying to Zerowalker:
Is it possible to make a benchmark using the same file?
Isn't this what I have done?
Or do I misunderstand?
You see i got a file that's UT Video RGB.
That is what I tested or am I missing something?
Do you agree that in my comment:5 and my comment:9 I confirm your original post or is there still something unconfirmed about your utvideo issue?
As said the lagarith issue is different and while performance improvements are always possible, the (non-portable and non-future-proof) original decoder will probably always be faster than our (portable and future-proof) lagarith decoder.
follow-up: 13 comment:12 by , 11 years ago
Oh, wait so you are saying.
Your UT Video Codec is faster cause of more threads, but slower on the same, meaning it has less efficiency overall (if i am not mistaken?)
Lagarith is slower because of some issue which I don't understand, which sadly is quite bad as it's a lot slower. But as you understand what I mean and still see the issue, I can only agree that it can't be better in its current state.
If this is correct, then that indeed confirms my issue.
The only thing left would be a suggestion to improve the Lagarith decoder further, if possible, to bring it closer to the original, but that is probably something you will do anyway if you find the time and think it worthwhile :)
Thanks
follow-up: 17 comment:13 by , 11 years ago
Replying to Zerowalker:
Your UT Video Codec is faster cause of more threads, but slower on the same, meaning it has less efficiency overall (if i am not mistaken?)
Isn't that what I wrote in comment:5?
Lagarith is slower cause of some issue, which i don't understand
The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any (non-broken) C compiler and you will always get correct (bitexact) output. Fixed-point arithmetic is often slower than floating-point on typical hardware.
See also (for example):
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2009-September/079680.html
http://mod16.org/hurfdurf/?p=142
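The portability argument above can be illustrated with a small Python sketch. It is not Lagarith's or FFmpeg's actual arithmetic; the values and the 16-bit `SCALE` are assumptions chosen only to show that floating-point results can depend on evaluation order (which a compiler or CPU may change), while integer fixed-point results do not:

```python
# Floating point: the same three numbers summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
float_left = (a + b) + c
float_right = a + (b + c)
print("float sums equal:", float_left == float_right)

# Fixed point: scale to integers first (16 fractional bits, an arbitrary choice).
SCALE = 1 << 16
fa, fb, fc = (round(x * SCALE) for x in (a, b, c))
fixed_left = (fa + fb) + fc
fixed_right = fa + (fb + fc)
print("fixed sums equal:", fixed_left == fixed_right)
```

For a lossless codec a one-bit difference in such a computation is enough to produce different output, which is why a bit-exact integer implementation is the safer choice across compilers and architectures.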
follow-up: 15 comment:14 by , 11 years ago
Actually at comment:5, you say the opposite from what i mean.
I am saying, doesn't the Native decoder use less threads and is much faster, compared to ffmpeg?
I actually heard, or read, about this: people pointed out that it may not always decode correctly, but I could never get a real answer, so it ended up being "a bug that has probably been solved".
But now that you confirm it, this makes things a bit scary, as I use Lagarith a fair amount.
I knew floating point was superfast but lacking in precision in some cases (not a programmer here ;P), but I didn't know the difference in speed could be this big.
But just to confirm this now, my CPU is an i5 760, Intel with x64.
Is this a non-x86 or not (I know it's x64, but I also think it's in the x86 family, so if you could please clarify)?
Because if I use the Native decoder, will I always get the expected results with this CPU, with problems only occurring on other builds and other systems (I am guessing ARM etc.)?
If so, that is "okay" for me, though indeed not alright for a lossless codec. But it's hard to give up on the codec; in some cases it yields unparalleled results (pixelated game capture), though it's pretty much only there.
Sorry for the many extra questions, but you certainly bring it up in a good way. I hope I am not wasting too much of your time!
comment:15 by , 11 years ago
Replying to Zerowalker:
Actually at comment:5, you say the opposite from what i mean.
I am saying, doesn't the Native decoder use less threads and is much faster, compared to ffmpeg?
On the FFmpeg bug tracker the native utvideo decoder is of course the libavcodec utvideo decoder (as opposed to libutvideo).
comment:16 by , 11 years ago
Oh, so I got it backwards, Native = ffmpeg?
If so, then we are in agreement :)
follow-up: 19 comment:17 by , 11 years ago
Replying to cehoyos:
Replying to Zerowalker:
[...]
Lagarith is slower cause of some issue, which i don't understand
The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any
That's not an excuse to be slower on x86 though; also, making softfloat_mul() 30% faster by using floats has no measurable effect on the overall speed (with the file I tested).
Did someone compare the decoders part by part speedwise?
follow-up: 21 comment:18 by , 11 years ago
The only comparison i have done is on Playback, looking at CPU Usage etc.
And i also have a file that's barely playable, with Lagarith Decoder i think it's playable but it may drop frames, not truly sure.
But on LAV Filter, it's so slow on some parts that i would call it unwatchable.
That's where i noticed that there was a huge difference, but i have no numbers on it.
(The slow parts are Noisy stuff at 1920x1080p, so is probably quite possible to make a testbench with just much noise going on).
comment:19 by , 11 years ago
Replying to michael:
did someone compare the decoders part by part speedwise ?
I don't know how to test the original Lagarith decoder / how to do a performance comparison.
comment:20 by , 11 years ago
Lagarith performance can be tested with MPlayer and -vo gl (-vo null crashes currently):
The dll in http://samples.ffmpeg.org/V-codecs/lagarith/ is 10% faster for lagarith.avi (RGB) but much slower than FFmpeg for lagarith422.avi (even if I do the conversion from yuv422p to rgb that lagarith seems to do internally). It is possible that the dll is outdated, I don't know how to find out.
For a better test, a longer and larger RGB sample would be needed.
follow-up: 23 comment:21 by , 11 years ago
Replying to Zerowalker:
The only comparison i have done is on Playback, looking at CPU Usage etc.
And i also have a file that's barely playable, with Lagarith Decoder i think it's playable but it may drop frames, not truly sure.
But on LAV Filter, it's so slow on some parts that i would call it unwatchable.
where can i find this file ?
comment:22 by , 11 years ago
Lagarith version 1323 also works with MPlayer and is >20% faster than FFmpeg's decoder. The performance disadvantage with FFmpeg is apparently not as bad for Lagarith as it is for utvideo.
comment:23 by , 11 years ago
Replying to michael:
where can i find this file ?
It's just a video I have, made myself in After Effects; nothing special and not something that's available.
I am sure any file will show the same results: just make a video with a lot of noise and movement at 1080p or more, then compare Lagarith with ffmpeg.
cehoyos, so utvideo is worse off than the Lagarith decoder; do you include the efficiency in a single thread then, or do you mean overall?
comment:24 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | open → closed |
libutvideo is no more, and I am going to push a patch that makes the utvideo decoder faster.
comment:25 by , 8 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Implemented for median prediction in ea93052db3594f93f2d10be085a770184da0513d and 68e5598e22b6b51cd796b55c4111ccd1638474d9
comment:26 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
There is no way to do the same for left prediction. Left prediction is not SIMDable, and there are no other predictions available.
Please reopen only if you can provide real benchmark numbers proving that the decoder is slower than the reference one, along with steps to reproduce.
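The dependency this comment refers to can be seen in a minimal Python sketch of left prediction. This is an illustration under assumptions, not FFmpeg's implementation: the function names and the `start` predictor value are hypothetical. Each reconstructed sample needs the previous reconstructed sample, so the loop is a running sum modulo 256:

```python
def encode_left(samples, start=0x80):
    """Store each sample as its difference from the sample to its left (mod 256)."""
    residuals = []
    prev = start  # assumed initial predictor, a parameter of this sketch
    for s in samples:
        residuals.append((s - prev) & 0xFF)
        prev = s
    return residuals


def decode_left(residuals, start=0x80):
    """Undo left prediction: every output depends on the previous output."""
    out = []
    prev = start
    for r in residuals:
        prev = (prev + r) & 0xFF  # serial dependency on the previous result
        out.append(prev)
    return out


row = [10, 12, 250, 3, 3, 200]
assert decode_left(encode_left(row)) == row
```

Median prediction additionally reads the row above, which is already fully decoded, so part of its work is independent of the sample currently being reconstructed.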
comment:27 by , 8 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-83022-g0006384 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32'
libavutil 55. 43.100 / 55. 43.100
libavcodec 57. 71.100 / 57. 71.100
libavformat 57. 62.100 / 57. 62.100
libavdevice 57. 2.100 / 57. 2.100
libavfilter 6. 68.100 / 6. 68.100
libswscale 4. 3.101 / 4. 3.101
libswresample 2. 4.100 / 2. 4.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.48.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf57.62.100
Stream #0:0: Video: wrapped_avframe, rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
Metadata:
encoder : Lavc57.71.100 wrapped_avframe
Stream mapping:
Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
frame= 1500 fps=156 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A speed=6.22x
video:545kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=69.007s
bench: maxrss=80832kB
real 0m9.651s
user 1m9.011s
sys 0m0.670s

$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
built on Jan 7 2017 16:23:03 with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
libavutil 52. 86.100 / 52. 86.100
libavcodec 55. 64.100 / 55. 64.100
libavformat 55. 40.100 / 55. 40.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.48.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf55.40.100
Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
Metadata:
encoder : Lavc55.64.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0x9cd2c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=132 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.979s
bench: maxrss=22828kB
real 0m11.342s
user 0m40.982s
sys 0m2.014s
comment:28 by , 8 years ago
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
Are you comparing decoding speed with a crystal ball?
Compare single threaded decoding.
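One way to set up such a single-threaded run is sketched below in Python. The helper name `decode_benchmark_cmd` is hypothetical; `-threads`, `-benchmark` and `-f null -` are existing ffmpeg options, but whether a given external wrapper decoder honours `-threads` is an assumption that would need checking on the build being tested:

```python
import shlex


def decode_benchmark_cmd(input_path, threads=1, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that decodes input_path and discards the frames.

    Placing -threads before -i applies it to the input (decoder) side.
    """
    return [
        ffmpeg,
        "-threads", str(threads),
        "-benchmark",
        "-i", input_path,
        "-f", "null", "-",
    ]


print(shlex.join(decode_benchmark_cmd("out.avi")))
```

The resulting list can be passed to `subprocess.run` once for each build under comparison, so that both decoders are measured with the same thread count.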
follow-up: 31 comment:29 by , 8 years ago
I read in the report:
Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native)) bench: utime=69.007s bench: maxrss=80832kB real 0m9.651s user 1m9.011s sys 0m0.670s
and
Stream #0:0 -> #0:0 (libutvideo -> rawvideo) bench: utime=40.979s bench: maxrss=22828kB real 0m11.342s user 0m40.982s sys 0m2.014s
I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?
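The 68% figure can be checked from the two `bench: utime=` values quoted above. A minimal Python sketch, where `parse_utime` is a hypothetical helper and the strings are copied from the quoted output:

```python
import re


def parse_utime(bench_output: str) -> float:
    """Extract the user CPU time in seconds from ffmpeg -benchmark output."""
    match = re.search(r"utime=([0-9.]+)s", bench_output)
    if match is None:
        raise ValueError("no 'utime=' field found")
    return float(match.group(1))


native = "bench: utime=69.007s bench: maxrss=80832kB"
library = "bench: utime=40.979s bench: maxrss=22828kB"

ratio = parse_utime(native) / parse_utime(library)
print(f"native decoder uses {100 * (ratio - 1):.0f}% more user CPU time")
```

Note that this compares user CPU time, not wall-clock time, so it measures efficiency rather than throughput.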
comment:30 by , 8 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
comment:31 by , 8 years ago
Replying to Cigaes:
I read in the report:
Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native)) bench: utime=69.007s bench: maxrss=80832kB real 0m9.651s user 1m9.011s sys 0m0.670s
and
Stream #0:0 -> #0:0 (libutvideo -> rawvideo) bench: utime=40.979s bench: maxrss=22828kB real 0m11.342s user 0m40.982s sys 0m2.014s
I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?
One gives more FPS than other.
comment:32 by , 8 years ago
So I guess you are looking at the "real" time instead of the "user" time. The "user" time is usually more relevant; differences in "real" time may mean other processes getting scheduled or slow input from disk.
Still, let us assume it is not the case here. We can compute the efficiency. For libutvideo: 40.982 user for 11.342 real means 361% CPU use; for the native decoder, 69.011 user for 9.651 real means 715% CPU use.
361% versus 715% looks like ~90% of respectively 4 and 8 threads. Would this be a quad-core hyper-threaded system?
Anyway, the native decoder is still not on par with the library, this bug cannot be closed.
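The utilization figures in this comment are user CPU time divided by wall-clock time. A short Python sketch of that arithmetic, using the `time` values from the report; the function name `cpu_utilization` is hypothetical, and the thread counts of 4 and 8 are the guess made in this comment, not measured values:

```python
def cpu_utilization(user_s: float, real_s: float) -> float:
    """Average CPU use in percent, where 100 means one fully busy core."""
    return 100.0 * user_s / real_s


# (user seconds, real seconds, assumed thread count)
runs = {
    "libutvideo": (40.982, 11.342, 4),
    "native utvideo": (69.011, 9.651, 8),
}

for name, (user_s, real_s, threads) in runs.items():
    total = cpu_utilization(user_s, real_s)
    per_thread = total / threads
    print(f"{name}: {total:.0f}% total, {per_thread:.0f}% per assumed thread")
```

This reproduces the 361% and 715% figures, and shows that both runs keep their assumed threads roughly 90% busy, so the gap in user time reflects work per frame rather than idle threads.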
comment:34 by , 8 years ago
Please do not. The quoted report already shows more than 60% more user time with default options, which is significant.
comment:37 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
Sounds like two completely independent issues to me or do I miss something?
The original Lagarith decoder is not portable and cannot be compared to FFmpeg's decoder afaik.
Could you add some numbers and post FFmpeg console output to make this a complete ticket?