Opened 12 years ago
Closed 12 years ago
#2540 closed defect (fixed)
-threads with libx264rgb do not work
Reported by: | hirschhornsalz | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avcodec |
Version: | git-master | Keywords: | libx264 |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Summary of the bug:
The -threads option is ignored when used with -c:v libx264rgb.
Details:
I use ffmpeg to grab video from a video game (World of Warcraft). The minimum requirements for a successful grab are 1920x1080 (HD) at 30 fps, with sound. The video data needs to be compressed somewhat, because the raw video stream of about 240 MByte/s isn't exactly easy to manage. I use libx264 with -preset ultrafast, which works reasonably well because it runs on multiple cores (I use -threads 4), which rarely exceed 30% usage.
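The raw data rate quoted above can be checked with a small calculation. This is a minimal sketch: the helper name `raw_video_bytes_per_second` is hypothetical, and 4 bytes per pixel is an assumption (32-bit BGR, which would match the bgr32 conversion routines that appear in the profile later in this ticket).

```c
#include <stdint.h>

/* Raw capture bandwidth in bytes per second.
 * Hypothetical helper for illustration only.
 * bytes_per_pixel = 4 is an assumption (32-bit BGR frames from the X server). */
static uint64_t raw_video_bytes_per_second(uint32_t width, uint32_t height,
                                           uint32_t bytes_per_pixel, uint32_t fps)
{
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

Under that assumption, `raw_video_bytes_per_second(1920, 1080, 4, 30)` gives 248,832,000 bytes/s, roughly 237 MiB/s, which is in line with the figure of about 240 MByte/s.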
After upgrading from 0.11.7 to 1.2 I was no longer able to grab at the required frame rate - the bottleneck seems to be the RGB to YUV conversion, which seems to be a lot slower in newer versions.
So I tried -c:v libx264rgb to avoid the costly rgb to yuv conversion, only to discover that it was even slower. The culprit is that threading is disabled. After enabling threading in the source code libx264rgb runs reasonably, and it outperforms the YUV version as expected.
How to reproduce:
% ffmpeg -f x11grab -r 100 -s 1920x1080 -c:v libx264rgb -preset ultrafast -threads 4 -crf 20 test.avi
frame= 208 fps= 27 q=14.0 Lsize= 1524kB time=00:00:09.09 bitrate=1373.8kbits/s

After changing line 746 in libavcodec/libx264.c from
.capabilities = CODEC_CAP_DELAY,
to
.capabilities = CODEC_CAP_DELAY | CODEC_CAP_AUTO_THREADS,

% ffmpeg -f x11grab -r 100 -s 1920x1080 -c:v libx264rgb -preset ultrafast -threads 4 -crf 20 test.avi
frame= 757 fps= 94 q=-1.0 Lsize= 5238kB time=00:00:09.38 bitrate=4574.7kbits/s
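One plausible reading of why the one-line change helps: an encoder wrapper that does not advertise an automatic-threading capability has the requested thread count clamped to 1 before the setting reaches the external library. The following is a toy model of that clamping, not the actual libavcodec logic. The `SKETCH_*` flag values and the function `effective_thread_count` are hypothetical and exist only for this illustration.

```c
/* Illustrative capability bits. These are assumptions for the sketch,
 * not the definitions from libavcodec/avcodec.h. */
enum {
    SKETCH_CAP_DELAY        = 0x0020,
    SKETCH_CAP_AUTO_THREADS = 0x8000
};

/* Hypothetical helper: returns the thread count the encoder would
 * actually be given for a requested -threads value. */
static int effective_thread_count(int capabilities, int requested_threads)
{
    if (requested_threads <= 1)
        return 1;                     /* nothing to parallelize */
    if (capabilities & SKETCH_CAP_AUTO_THREADS)
        return requested_threads;     /* encoder manages its own threads */
    return 1;                         /* capability missing: request is dropped */
}
```

In this model, `effective_thread_count(SKETCH_CAP_DELAY, 4)` yields 1, while `effective_thread_count(SKETCH_CAP_DELAY | SKETCH_CAP_AUTO_THREADS, 4)` yields 4, mirroring the jump from 27 fps to 94 fps observed above.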
Change History (5)
comment:2 by , 12 years ago
Keywords: | libx264 added |
---|
follow-up: 4 comment:3 by , 12 years ago
Regarding your original problem, maybe look at the pixel format. I believe the default format negotiation changed at some point before 1.2: you were probably encoding to yuv420p before and to yuv444p now, which has better quality but is slower; -pix_fmt yuv420p
should fix it.
I do not know which will be faster: on one hand, yuv420p requires the colorspace conversion; on the other hand, RGB is not subsampled. A benchmark is needed. Also, please remember that H.264 RGB is not standard.
And of course, you should submit your patch to ffmpeg-devel as cehoyos suggested.
comment:4 by , 12 years ago
Replying to Cigaes:
Regarding your original problem, maybe look at the pixel format. I believe the default format negotiation changed at some point before 1.2: you were probably encoding to yuv420p before and to yuv444p now, which has better quality but is slower;
-pix_fmt yuv420p
should fix it.
I do not know which will be faster: on one hand, yuv420p requires the colorspace conversion; on the other hand, RGB is not subsampled. A benchmark is needed.
Thank you for this suggestion, very good idea. I did a short test with 1.2 and yuv420p, and it is indeed faster than yuv444p but OTOH not as fast as 0.11.3.
Maximum frame rate for HD video capturing using x11grab and x264
1.2 with yuv444p 27 fps
1.2 with yuv420p 37 fps
1.2 with rgb 94 fps
0.11.3 with yuv420p 74 fps
The oprofile sample output for 1.2 with yuv420p is interesting:
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name             symbol name
1137688  42.1727  libswscale.so.2.2.100  hScale16To15_c
244454   9.0616   libswscale.so.2.2.100  bgr32ToUV_half_c
227943   8.4496   libswscale.so.2.2.100  bgr32ToY_c
160825   5.9616   libswscale.so.2.2.100  yuv2plane1_8_c
130781   4.8479   libc-2.17.so           __memcpy_ssse3_back
105577   3.9136   libx264.so.125         x264_prefetch_ref_mmx2
......... more libx264 stuff....
The hScale16To15_c function doesn't even show up on 0.11.3, and it seems to be a bottleneck.
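To put a number on how much of the profile belongs to libswscale, the listed percentages can be summed per image. This is a minimal sketch that uses only the six rows shown above; the struct, array, and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One row of the oprofile listing: share of samples, image, symbol. */
struct profile_row {
    double percent;
    const char *image;
    const char *symbol;
};

/* The six rows quoted in the ticket. */
static const struct profile_row oprofile_rows[] = {
    { 42.1727, "libswscale.so.2.2.100", "hScale16To15_c" },
    {  9.0616, "libswscale.so.2.2.100", "bgr32ToUV_half_c" },
    {  8.4496, "libswscale.so.2.2.100", "bgr32ToY_c" },
    {  5.9616, "libswscale.so.2.2.100", "yuv2plane1_8_c" },
    {  4.8479, "libc-2.17.so",          "__memcpy_ssse3_back" },
    {  3.9136, "libx264.so.125",        "x264_prefetch_ref_mmx2" },
};

/* Sum the sample share of every row whose image name starts with prefix. */
static double image_share(const struct profile_row *rows, size_t n,
                          const char *prefix)
{
    double total = 0.0;
    size_t len = strlen(prefix);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(rows[i].image, prefix, len) == 0)
            total += rows[i].percent;
    }
    return total;
}
```

Calling `image_share(oprofile_rows, 6, "libswscale")` adds up the four libswscale rows to about 65.6% of all samples, with hScale16To15_c alone accounting for about 42%.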
And of course, you should submit your patch to ffmpeg-devel as cehoyos suggested.
Will do.
comment:5 by , 12 years ago
Please open a new ticket for the performance regression; mixing different problems in one ticket makes following the tickets impossible. Please don't forget to post command lines and console output of 0.11 and 1.2 to allow us to reproduce the issue.
comment:6 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed by you.
If there is a performance regression, please report it in a new ticket!
Please send a patch fixing the threading issue to ffmpeg-devel.