Opened 12 years ago

Last modified 12 years ago

#1882 open enhancement

Multi-threading wmv encoder

Reported by: txspaderz
Owned by:
Priority: wish
Component: avcodec
Version: git-master
Keywords: wmv2
Cc:
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

I'm having issues using multiple cores when using the wmv encoder. It appears to be locked to a single core only.

Any chance we could get support for multiple threads?

Please refer to:
http://forum.serviio.org/viewtopic.php?f=5&t=7698

Attachments (1)

ffmpeg.log.txt (1.6 MB ) - added by Carl Eugen Hoyos 12 years ago.


Change History (12)

comment:1 by Carl Eugen Hoyos, 12 years ago

Component: FFmpeg → avcodec
Keywords: wmv2 added
Priority: normal → wish
Reproduced by developer: set
Status: new → open
Version: unspecified → git-master

Please post the command line from the forum here together with complete, uncut console output.
(I suspect there is either a work-around for your actual problem or it is actually a different problem than wmv2 encoding speed.)

comment:2 by txspaderz, 12 years ago

Here's a tee'd output at this link (it's quite large, in an ASCII format):
http://houstondad.com/ffmpeg.log.txt

When the process is running, it's actually switching cores (the 39th field of /proc/pid/stat shows the last executed core); here's a breakdown, if it matters:

root@media-center ~ # while true; do cat /proc/12514/stat | awk '{print $39}'; done > /home/perk/ffmpeg_cores.log
root@media-center ~ # sort /home/perk/ffmpeg_cores.log | uniq -c | sort -k2 -n
 136778 0
 409369 1
 405346 2
 406885 3
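
The sampling above can be sketched a little more defensively. This is a minimal sketch, not part of the original report: last_cpu_from_stat is a hypothetical helper name, and it assumes the Linux /proc/PID/stat layout in which field 2 (the command name) is parenthesised and field 39 is the processor the task last ran on. Stripping the command name first keeps the field count right even if the name contains spaces, which a plain awk '{print $39}' does not handle.

```shell
# Hypothetical helper: print field 39 ("processor") of one /proc/PID/stat line.
# Field 2 (comm) is wrapped in parentheses and may contain spaces, so drop
# everything up to and including the last ") " before splitting on whitespace.
last_cpu_from_stat() {
    rest=${1##*') '}   # remaining text now starts at field 3 (state)
    set -- $rest       # split the remaining fields into $1, $2, ...
    shift 36           # field 39 of the full line is now $1
    printf '%s\n' "$1"
}

# Usage sketch (assumes PID 12514 is the running ffmpeg, as in this comment).
# Sampling every 0.1 s avoids the busy loop; fractional sleep assumes GNU sleep.
#   while kill -0 12514 2>/dev/null; do
#       last_cpu_from_stat "$(cat /proc/12514/stat)"
#       sleep 0.1
#   done | sort -n | uniq -c
```

Note that this field only shows where the scheduler last placed the task. A single-threaded process migrating between cores produces the same kind of spread across cores as the counts above.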

However, during encoding the process continually uses only one core's worth of cycles, and nothing more:

12514 root      20   0 91080  65m 4632 R  100  0.4  10:00.98 ffmpeg
12514 root      20   0 91548  65m 4632 R   97  0.4  10:01.96 ffmpeg
12514 root      20   0 90456  65m 4632 R  100  0.4  10:02.98 ffmpeg
12514 root      20   0 90456  65m 4632 R   98  0.4  10:03.97 ffmpeg
12514 root      20   0 90684  65m 4632 R  100  0.4  10:04.98 ffmpeg
12514 root      20   0 90684  65m 4632 R  100  0.4  10:06.00 ffmpeg
12514 root      20   0 90436  65m 4632 R   98  0.4  10:07.00 ffmpeg
12514 root      20   0 90436  65m 4632 R  100  0.4  10:08.01 ffmpeg
12514 root      20   0 90436  65m 4632 R   99  0.4  10:09.02 ffmpeg
12514 root      20   0 90436  65m 4632 R   99  0.4  10:10.03 ffmpeg
12514 root      20   0 90740  65m 4632 R   99  0.4  10:11.03 ffmpeg
12514 root      20   0 91176  65m 4632 R   99  0.4  10:12.03 ffmpeg
12514 root      20   0 91176  65m 4632 R  101  0.4  10:13.05 ffmpeg
12514 root      20   0 91176  65m 4632 R   96  0.4  10:14.03 ffmpeg
12514 root      20   0 91176  65m 4632 R  101  0.4  10:15.06 ffmpeg

Because of this, and because the source file's frame rate is 24 fps, the encoder isn't able to stream in real time.

comment:3 by Carl Eugen Hoyos, 12 years ago

Please always post all necessary information here on the bug tracker; do not use external resources, as they may disappear!

There are two possibilities to fix your original problem (performance on transcoding from vc1 to wmv2 is too low):
The first is to implement wmv2 multi-threaded encoding. Given that the task is not trivial and this codec was deprecated by Microsoft years ago, I am not sure how likely this is to happen.
The second is to implement vc1 multi-threaded decoding. While this is probably not simpler, the vc1 decoder is an important part of libavcodec, so I would suspect that the chances are (very slightly) higher. Consider opening a second enhancement request (or wait for me to do it).

by Carl Eugen Hoyos, 12 years ago

Attachment: ffmpeg.log.txt added

comment:4 by Carl Eugen Hoyos, 12 years ago

Out of curiosity: Could you explain why you are using -copyts ?
(Did you test if constant quantiser is faster?)

in reply to:  4 comment:5 by txspaderz, 12 years ago

Replying to cehoyos:

Out of curiosity: Could you explain why you are using -copyts ?
(Did you test if constant quantiser is faster?)

Thanks, if you could put in that request I'd greatly appreciate it.

I'm not sure why it's adding -copyts, the command line is being generated by a program that acts as a dlna server.

comment:6 by Carl Eugen Hoyos, 12 years ago

Reading your forum post again:
Is this really a regression? Was the performance better for the same input file and an older FFmpeg version? That would be a serious bug; if so, please report the previous FFmpeg version.

comment:7 by xnejp03, 12 years ago

Hi, I'm the author of the software. wmv2 is the only wmv-based encoder that FFmpeg supports, and is needed for on-the-fly transcoding for Xbox (among others). The encoder doesn't support -threads with a value other than 1. It's a major bottleneck, especially for HD videos. I'm not sure a multi-threaded VC1 decoder will help, as the same will happen when transcoding (for example) an MKV/H264 HD file. And no, this is not a regression; it was always the case.

I appreciate wmv2 is deprecated, but this would really help a lot, unless there is a chance of wmv3 encoder.

Regarding -copyts: it's an attempt to keep audio/video in sync, but it might be completely inappropriate, the documentation on this parameter is a bit scarce :-)

in reply to:  7 comment:8 by Carl Eugen Hoyos, 12 years ago

Replying to xnejp03:

I'm not sure multi-threaded VC1 decoder will help, as the same will happen when transcoding (for example) MKV/H264 HD file.

But that is not what the OP reported, so please provide a sample and command line together with complete, uncut console output.

comment:9 by xnejp03, 12 years ago

Actually it looks like the threads parameter is now not breaking the command any more. It also looks like, when it is not supplied, it uses all cores (at least for the mpeg2video encoder) and the threads parameter is now used mostly to limit the usage of CPUs (can you confirm?).

When run with -threads 1 I'm getting about 39 fps on my example file, when run without -threads it does about 80 fps, so it'd seem that there is some parallelism implemented. It doesn't push all the cores to maximum, as mpeg2video does though, so there is possible room for improvement.
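
The comparison above comes down to a ratio of two measured frame rates. As a minimal sketch (fps_speedup is a hypothetical helper name, and the numbers are simply the two figures quoted in this comment), the speedup can be computed like this:

```shell
# Hypothetical helper: ratio of multi-threaded fps to single-threaded fps,
# printed with two decimals.
fps_speedup() {
    awk -v single="$1" -v multi="$2" 'BEGIN { printf "%.2f\n", multi / single }'
}

fps_speedup 39 80   # prints 2.05

# One way such figures might be collected (input.mkv is a placeholder name;
# -benchmark and the null muxer are standard ffmpeg features):
#   ffmpeg -benchmark -threads 1 -i input.mkv -an -c:v wmv2 -f null -
#   ffmpeg -benchmark            -i input.mkv -an -c:v wmv2 -f null -
```

A ratio of about 2 fits the observation that some parallelism is in effect without every core being saturated. The ratio alone does not show whether the gain comes from the decoder or the encoder.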

Also, does it make a difference whether -threads is provided before -i or after it? Would that specify the number of CPUs separately for the decoder and the encoder? Or is just one definition (before -i) enough?

in reply to:  9 comment:10 by Carl Eugen Hoyos, 12 years ago

Replying to xnejp03:

Actually it looks like the threads parameter is now not breaking the command any more.

?
(I am curious: Could you point me to the bug report?)

It also looks like, when it is not supplied, it uses all cores (at least for the mpeg2video encoder) and the threads parameter is now used mostly to limit the usage of CPUs (can you confirm?).

The threads parameter specifies the number of threads to use; the default is "0" (auto).

When run with -threads 1 I'm getting about 39 fps on my example file, when run without -threads it does about 80 fps, so it'd seem that there is some parallelism implemented. It doesn't push all the cores to maximum, as mpeg2video does though, so there is possible room for improvement.

I am not sure if auto is always a good choice: It detects the number of cores, but in nearly all cases, you should specify a higher number for maximum performance.

Also, does it make a difference whether -threads is provided before -i or after it? Would that specify the number of CPUs separately for the decoder and the encoder? Or is just one definition (before -i) enough?

You can specify -threads for the decoder and the encoder.
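
As an illustration of that answer: in ffmpeg, options placed before -i apply to the input (here, the decoder), and options placed after it apply to the next output (here, the encoder). The sketch below only prints a command line and does not run it. build_transcode_cmd is a hypothetical helper, the file names and thread counts are placeholders, and whether the wmv2 encoder actually benefits from the second -threads value is exactly what this ticket is about.

```shell
# Hypothetical helper: print (do not execute) an ffmpeg command that sets
# -threads separately for the decoder (before -i) and the encoder (after -i).
# Arguments: decoder threads, encoder threads, input file, output file.
# File names are printed unquoted, so this sketch assumes no spaces in them.
build_transcode_cmd() {
    dec_threads=$1
    enc_threads=$2
    input=$3
    output=$4
    printf 'ffmpeg -threads %s -i %s -c:v wmv2 -c:a wmav2 -threads %s %s\n' \
        "$dec_threads" "$input" "$enc_threads" "$output"
}

build_transcode_cmd 2 4 in.mkv out.asf
# prints: ffmpeg -threads 2 -i in.mkv -c:v wmv2 -c:a wmav2 -threads 4 out.asf
```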
