Opened 3 days ago

Last modified 2 days ago

#11387 new defect

dnn_detect filter won't work with yolo4-tiny model when both anchors and labels filenames are defined

Reported by: Leandro Santiago
Owned by:
Priority: normal
Component: avfilter
Version: git-master
Keywords:
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

System: Manjaro Linux stable (latest as of Dec 30th 2024).

OpenVino version: 2024.6.0.

FFmpeg version:

ffmpeg version N-118193-g5f38c82536 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 14.2.1 (GCC) 20240910
configuration: --enable-libopenvino --enable-libharfbuzz --enable-libfribidi --enable-libfreetype --enable-libfontconfig --enable-openssl
libavutil      59. 53.100 / 59. 53.100
libavcodec     61. 28.100 / 61. 28.100
libavformat    61.  9.102 / 61.  9.102
libavdevice    61.  4.100 / 61.  4.100
libavfilter    10.  6.101 / 10.  6.101
libswscale      8. 13.100 /  8. 13.100
libswresample   5.  4.100 /  5.  4.100

How to reproduce:

Install the openvino-dev python package to download the models:

pip install openvino-dev tensorflow

Then download and convert the yolo-v4-tiny-tf model, and fetch the labels file:

omz_downloader --name yolo-v4-tiny-tf
omz_converter --name yolo-v4-tiny-tf
wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/refs/heads/master/data/dataset_classes/coco_80cl.txt

Then run ffplay on any video containing several objects that this model should detect, drawing rectangles and labels on the detected objects:

ffplay \
 https://videos.pexels.com/video-files/5222540/5222540-uhd_3840_2160_30fps.mp4 \
 -vf 'dnn_detect=dnn_backend=openvino:model=public/yolo-v4-tiny-tf/FP32/yolo-v4-tiny-tf.xml:input=image_input:confidence=0.4:model_type=yolov4:anchors=81&82&135&169&344&319:labels=coco_80cl.txt:async=1:nb_classes=80,drawbox=box_source=side_data_detection_bboxes:color=yellow,drawtext=text_source=side_data_detection_bboxes:fontcolor=yellow:bordercolor=yellow:fontsize=40,showinfo'

You'll see many log lines like this:

[Parsed_dnn_detect_0 @ 0x785bd2f21680] anchors is not set

This happens because the anchors= option given to dnn_detect is not passed on to the filter, and anchors are required by the yolov4 model type.
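For reference, the anchors value in the command above is a flat list of integers separated by '&'. The following is a minimal Python sketch of how such a string can be split into pairs. It is not FFmpeg's parsing code; the function name parse_anchors is hypothetical, and reading consecutive values as (width, height) pairs is an assumption based on the usual YOLO convention.

```python
def parse_anchors(spec: str) -> list[tuple[int, int]]:
    """Split an '&'-separated anchors string into (width, height) pairs.

    Assumption: consecutive values form one (width, height) anchor box,
    as is conventional for YOLO models.
    """
    values = [int(v) for v in spec.split("&")]
    if len(values) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    # Pair up even-indexed values (widths) with odd-indexed values (heights).
    return list(zip(values[0::2], values[1::2]))


anchors = parse_anchors("81&82&135&169&344&319")
print(anchors)  # [(81, 82), (135, 169), (344, 319)]
```

With the value used in this report, that gives three anchor boxes. If the filter never receives the option, it has no anchor values at all, which matches the "anchors is not set" message.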

The expected behaviour is that the drawbox and drawtext filters draw on the image, and that information about the detected objects is logged to the terminal:

...
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 0,  region: (145, 1042) -> (740, 1495), label: car, confidence: 9918/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 1,  region: (551, 893) -> (551, 893), label: person, confidence: 4277/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 2,  region: (791, 1012) -> (791, 1012), label: person, confidence: 4069/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 3,  region: (1375, 1055) -> (1375, 1055), label: person, confidence: 5944/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 4,  region: (1505, 1065) -> (1505, 1065), label: person, confidence: 7363/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 5,  region: (794, 1011) -> (794, 1011), label: person, confidence: 8378/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 6,  region: (915, 1010) -> (915, 1010), label: person, confidence: 8011/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 7,  region: (1088, 1117) -> (1088, 1117), label: person, confidence: 9511/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 8,  region: (1385, 1052) -> (1385, 1052), label: person, confidence: 7692/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 9,  region: (1644, 1172) -> (1644, 1172), label: person, confidence: 9132/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 10, region: (1801, 1173) -> (1801, 1173), label: person, confidence: 9828/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 11, region: (2480, 1299) -> (2480, 1299), label: person, confidence: 9496/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 12, region: (414, 1239) -> (414, 1239), label: car, confidence: 8610/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 13, region: (422, 1265) -> (422, 1265), label: car, confidence: 9608/10000.
[Parsed_showinfo_3 @ 0x743b02f22b00] index: 14, region: (452, 1266) -> (452, 1266), label: car, confidence: 9239/10000.
...
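To check whether a given build produces detections, the showinfo lines can be parsed from a captured log. The following is a rough Python sketch, assuming the line format shown in the excerpt above; the regular expression and the helper name parse_detection are illustrative and not part of FFmpeg.

```python
import re
from collections import Counter

# Matches the detection lines shown above, e.g.
# "index: 0,  region: (145, 1042) -> (740, 1495), label: car, confidence: 9918/10000."
LINE_RE = re.compile(
    r"index:\s*(?P<index>\d+),\s*"
    r"region:\s*\((?P<x0>\d+),\s*(?P<y0>\d+)\)\s*->\s*"
    r"\((?P<x1>\d+),\s*(?P<y1>\d+)\),\s*"
    r"label:\s*(?P<label>[^,]+),\s*"
    r"confidence:\s*(?P<num>\d+)/(?P<den>\d+)"
)


def parse_detection(line):
    """Return a dict describing one detection, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "region": (int(m["x0"]), int(m["y0"]), int(m["x1"]), int(m["y1"])),
        "label": m["label"].strip(),
        "confidence": int(m["num"]) / int(m["den"]),
    }


sample = [
    "[Parsed_showinfo_3 @ 0x743b02f22b00] index: 0,  region: (145, 1042) -> (740, 1495), label: car, confidence: 9918/10000.",
    "[Parsed_showinfo_3 @ 0x743b02f22b00] index: 10, region: (1801, 1173) -> (1801, 1173), label: person, confidence: 9828/10000.",
    "[Parsed_dnn_detect_0 @ 0x785bd2f21680] anchors is not set",
]

# Keep only lines that describe a detection, then count them per label.
detections = [d for d in map(parse_detection, sample) if d is not None]
counts = Counter(d["label"] for d in detections)
print(counts)
```

On a log from an affected build, only the "anchors is not set" lines appear, so the list of detections stays empty.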

Change History (2)

comment:1 by Leandro Santiago, 3 days ago

A fix for this issue has been provided in the ffmpeg-devel list: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-December/338019.html

comment:2 by Leandro Santiago, 2 days ago

Version: unspecified → git-master