Opened 3 days ago
Last modified 2 days ago
#11387 new defect
dnn_detect filter won't work with yolo4-tiny model when both anchors and labels filenames are defined
Reported by: | Leandro Santiago | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avfilter |
Version: | git-master | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
System: Manjaro Linux stable (latest as of Dec 30th 2024).
OpenVino version: 2024.6.0.
FFmpeg version:
ffmpeg version N-118193-g5f38c82536 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 14.2.1 (GCC) 20240910
  configuration: --enable-libopenvino --enable-libharfbuzz --enable-libfribidi --enable-libfreetype --enable-libfontconfig --enable-openssl
  libavutil 59. 53.100 / 59. 53.100
  libavcodec 61. 28.100 / 61. 28.100
  libavformat 61. 9.102 / 61. 9.102
  libavdevice 61. 4.100 / 61. 4.100
  libavfilter 10. 6.101 / 10. 6.101
  libswscale 8. 13.100 / 8. 13.100
  libswresample 5. 4.100 / 5. 4.100
How to reproduce:
Install the openvino-dev Python package to download the models:
pip install openvino-dev tensorflow
Then download and convert the yolo-v4-tiny-tf model, and fetch the labels file:
omz_downloader --name yolo-v4-tiny-tf
omz_converter --name yolo-v4-tiny-tf
wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/refs/heads/master/data/dataset_classes/coco_80cl.txt
Then run ffplay on an arbitrary video containing several objects that this model should detect, drawing rectangles and labels on the detected objects:
ffplay \
  https://videos.pexels.com/video-files/5222540/5222540-uhd_3840_2160_30fps.mp4 \
  -vf 'dnn_detect=dnn_backend=openvino:model=public/yolo-v4-tiny-tf/FP32/yolo-v4-tiny-tf.xml:input=image_input:confidence=0.4:model_type=yolov4:anchors=81&82&135&169&344&319:labels=coco_80cl.txt:async=1:nb_classes=80,drawbox=box_source=side_data_detection_bboxes:color=yellow,drawtext=text_source=side_data_detection_bboxes:fontcolor=yellow:bordercolor=yellow:fontsize=40,showinfo'
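The long -vf argument above is easier to audit when it is assembled from its parts. The following is only a sketch of one way to do that in Python; build_filter is a hypothetical helper (not part of FFmpeg or OpenVINO), and it assumes none of the option values need extra escaping, which holds for the values used in this report.

```python
def build_filter(name, options=None):
    """Render one filter as name=key=value:key=value (hypothetical helper).

    Assumes the values contain no ':' or ',' that would need escaping.
    """
    if not options:
        return name
    return name + "=" + ":".join(f"{key}={value}" for key, value in options.items())


# Options copied from the ffplay command in this report.
detect = build_filter("dnn_detect", {
    "dnn_backend": "openvino",
    "model": "public/yolo-v4-tiny-tf/FP32/yolo-v4-tiny-tf.xml",
    "input": "image_input",
    "confidence": "0.4",
    "model_type": "yolov4",
    "anchors": "81&82&135&169&344&319",
    "labels": "coco_80cl.txt",
    "async": "1",
    "nb_classes": "80",
})
drawbox = build_filter("drawbox", {
    "box_source": "side_data_detection_bboxes",
    "color": "yellow",
})
drawtext = build_filter("drawtext", {
    "text_source": "side_data_detection_bboxes",
    "fontcolor": "yellow",
    "bordercolor": "yellow",
    "fontsize": "40",
})

# Filters in a chain are separated by commas.
filter_chain = ",".join([detect, drawbox, drawtext, build_filter("showinfo")])
print(filter_chain)
```

Printing filter_chain gives the same string that is passed to -vf in the command above, which makes it simple to drop the anchors or labels option and compare runs.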
You'll see many log lines like this:
[Parsed_dnn_detect_0 @ 0x785bd2f21680] anchors is not set
This happens because the anchors= option given to dnn_detect is not passed on to the filter, and anchors are required by the yolov4 model type.
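For reference, the anchors value used in the command is a flat list of integers separated by '&'. A minimal sketch of how such a string maps to anchor boxes is shown below; parse_anchors is a hypothetical helper written for illustration, and it assumes that consecutive values form (width, height) pairs, as is usual for YOLO anchor lists. It does not reproduce FFmpeg's own parsing code.

```python
from typing import List, Tuple


def parse_anchors(spec: str) -> List[Tuple[int, int]]:
    """Split an '&'-separated anchors string into (width, height) pairs.

    Hypothetical helper; the pairing of consecutive values is an assumption.
    """
    values = [int(token) for token in spec.split("&")]
    if len(values) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    # Pair up values at even positions with the values that follow them.
    return list(zip(values[0::2], values[1::2]))


anchors = parse_anchors("81&82&135&169&344&319")
print(anchors)
```

With the value from this report, the string yields three anchor boxes.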
The expected behaviour is that the drawbox and drawtext filters draw on the image, and that information about the detected objects is logged to the terminal:
...
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 0, region: (145, 1042) -> (740, 1495), label: car, confidence: 9918/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 1, region: (551, 893) -> (551, 893), label: person, confidence: 4277/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 2, region: (791, 1012) -> (791, 1012), label: person, confidence: 4069/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 3, region: (1375, 1055) -> (1375, 1055), label: person, confidence: 5944/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 4, region: (1505, 1065) -> (1505, 1065), label: person, confidence: 7363/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 5, region: (794, 1011) -> (794, 1011), label: person, confidence: 8378/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 6, region: (915, 1010) -> (915, 1010), label: person, confidence: 8011/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 7, region: (1088, 1117) -> (1088, 1117), label: person, confidence: 9511/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 8, region: (1385, 1052) -> (1385, 1052), label: person, confidence: 7692/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 9, region: (1644, 1172) -> (1644, 1172), label: person, confidence: 9132/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 10, region: (1801, 1173) -> (1801, 1173), label: person, confidence: 9828/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 11, region: (2480, 1299) -> (2480, 1299), label: person, confidence: 9496/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 12, region: (414, 1239) -> (414, 1239), label: car, confidence: 8610/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 13, region: (422, 1265) -> (422, 1265), label: car, confidence: 9608/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 14, region: (452, 1266) -> (452, 1266), label: car, confidence: 9239/10000.
...
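When checking a fix, it can help to turn these showinfo lines into structured records. The sketch below is one possible approach; the Detection type, the regular expression, and parse_detections are all hypothetical names introduced here, and the pattern is inferred only from the log lines quoted in this report, so it may need adjusting for other FFmpeg versions. It assumes the two coordinate pairs are opposite corners of the box.

```python
import re
from typing import List, NamedTuple


class Detection(NamedTuple):
    index: int
    x0: int
    y0: int
    x1: int
    y1: int
    label: str
    confidence: float


# Pattern inferred from the log lines quoted in this report (an assumption).
DETECTION_RE = re.compile(
    r"index: (?P<index>\d+), "
    r"region: \((?P<x0>\d+), (?P<y0>\d+)\) -> \((?P<x1>\d+), (?P<y1>\d+)\), "
    r"label: (?P<label>[^,]+), "
    r"confidence: (?P<num>\d+)/(?P<den>\d+)"
)


def parse_detections(log_text: str) -> List[Detection]:
    """Extract one Detection per matching showinfo line."""
    detections = []
    for match in DETECTION_RE.finditer(log_text):
        detections.append(Detection(
            index=int(match["index"]),
            x0=int(match["x0"]),
            y0=int(match["y0"]),
            x1=int(match["x1"]),
            y1=int(match["y1"]),
            label=match["label"],
            confidence=int(match["num"]) / int(match["den"]),
        ))
    return detections


sample = (
    "[Parsed_showinfo_3 @ 0x743b02f22b00] index: 0, region: (145, 1042) -> (740, 1495), "
    "label: car, confidence: 9918/10000.\n"
    "[Parsed_showinfo_3 @ 0x743b02f22b00] index: 1, region: (551, 893) -> (551, 893), "
    "label: person, confidence: 4277/10000.\n"
)
for det in parse_detections(sample):
    # Width and height follow from the two corners; a zero-size box has equal corners.
    print(det.index, det.label, det.x1 - det.x0, det.y1 - det.y0, det.confidence)
```

Computing the box size from the two corners also makes it easy to spot entries whose corners coincide.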
Change History (2)
comment:1 by , 3 days ago
comment:2 by , 2 days ago
Version: | unspecified → git-master |
---|
A fix for this issue has been provided in the ffmpeg-devel list: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-December/338019.html