Opened 3 months ago

Last modified 13 days ago

#11148 open defect

DirectShow (dshow) support lacking beyond device defaults

Reported by: blurbdust Owned by: Diederick Niehorster
Priority: normal Component: avdevice
Version: git-master Keywords:
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Hello,

First and foremost, I would like to thank the dev team as a whole for the amazing software and continued support over the years. ffmpeg really is a fantastic tool that doesn't get as much credit as it should.

I have a few main requested changes surrounding DirectShow. I normally provide PRs for my bug reports but unfortunately I have not been able to determine the root cause of the main issue.

A quick bug is the sample_size argument within libavdevice/dshow.c caps at 16 bits when the max DirectShow supports is 32 bits.
https://learn.microsoft.com/en-us/windows/win32/directshow/bitdepth-attribute#possible-values

When testing the above quick fix, I ended up stumbling into ffmpeg's DirectShow support ignoring the provided -sample_size field (it appears to solely use the device defaults). When digging in with WinDbg, the device capabilities is correctly enumerated to support up to 24 bits when checking acaps on https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/refs/heads/master:/libavdevice/dshow.c#l990. However, for an unknown reason, requested_sample_size is set to an invalid memory region according to WinDbg and the value cannot be read at this line https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/refs/heads/master:/libavdevice/dshow.c#l1008. ctx->sample_size is still valid and set to the expected 24. I traced the write to corrupt requested_sample_size back to immediately following the return of IPin_QueryInterface within dshow_cycle_formats https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/refs/heads/master:/libavdevice/dshow.c#l823.

I did look into the main previous bug reports https://trac.ffmpeg.org/ticket/9420 and as a temporary workaround of reverting 436-440 within commit d9a9b4c877b85fea5a5bad74c3d592a756047f79, revert line 1007 from d2d8b9b972ba2df6b2a2ebe29f5307cbb7a69c33, keep requested_sample_size defined but not used, the capture does work as expected. I initially thought the write of fx->wBitsPerSample = ctx->sample_size would be the answer but after rebuilding ffmpeg with support for more third party libraries, 24 bit capture is no longer working. A sample patch of what I expected to work can be found at https://gist.githubusercontent.com/blurbdust/5fa38b6e854d4cfe251a466131bd9b0c/raw/69287eb13d88b86b1cda06016c6438b86707133d/fix-dshow-requested-variables.patch. I popped open the generated ffmpeg_g.exe in IDA Home (output attached at bottom and search for // start of workaround) and requested_sample_size is in fact missing when doing stack rebuilding of dshow_cycle_formats. It feels like maybe some stack corruption is occurring or maybe WinDbg is leading me astray with broken symbols after the IPin_QueryInterface call?

I (or one other person with the same capture card and issue) would be happy to help debug in any way we can.

Please let me know if you need any more information from me.

Thank you for your time

void __fastcall dshow_cycle_formats(AVFormatContext *avctx, dshowDeviceType devtype, IPin *pin, int *pformat_set)
{
  int *priv_data; // rdi
  bool v5; // zf
  AVPixelFormat v6; // eax
  __int64 v8; // rbx
  void (__stdcall *v9)(LPVOID); // rbp
  unsigned int *v10; // r12
  _DWORD *v11; // rax
  int v12; // esi
  dshow_format_info *v13; // rax
  BYTE *v14; // r8
  AM_MEDIA_TYPE *v15; // rdx
  BYTE *v16; // rcx
  __int64 *v17; // r9
  AM_MEDIA_TYPE *v18; // rcx
  BYTE *v19; // rcx
  dshow_format_info *format_info; // rax
  dshow_format_info *v21; // rbx
  BYTE *pbFormat; // r8
  HRESULT v23; // eax
  AM_MEDIA_TYPE *v24; // rcx
  BYTE *v25; // rcx
  unsigned int v26; // eax
  unsigned int v27; // eax
  unsigned int v28; // eax
  BYTE *v29; // rcx
  AVPixelFormat pix_fmt; // ecx
  const char *pix_fmt_name; // rax
  const char *v32; // rax
  const char *v33; // r8
  const char *v34; // rdx
  const char *v35; // rcx
  const char *v36; // r9
  const char *v37; // r9
  const AVCodec *decoder; // rax
  const char *v39; // r9
  __int64 v40; // [rsp+20h] [rbp-E8h]
  __int64 v41; // [rsp+28h] [rbp-E0h]
  __int64 v42; // [rsp+30h] [rbp-D8h]
  __int64 v43; // [rsp+38h] [rbp-D0h]
  __int64 v44; // [rsp+40h] [rbp-C8h]
  AM_MEDIA_TYPE *previous_match_type; // [rsp+50h] [rbp-B8h]
  __int64 requested_framerate; // [rsp+58h] [rbp-B0h]
  int requested_width; // [rsp+64h] [rbp-A4h]
  int requested_height; // [rsp+68h] [rbp-A0h]
  AVCodecID requested_video_codec_id; // [rsp+6Ch] [rbp-9Ch]
  AVPixelFormat requested_pixel_format; // [rsp+70h] [rbp-98h]
  int wait_for_better; // [rsp+74h] [rbp-94h]
  bool v52; // [rsp+7Ah] [rbp-8Eh]
  int v53; // [rsp+7Ch] [rbp-8Ch]
  const char *prim; // [rsp+80h] [rbp-88h]
  const char *prima; // [rsp+80h] [rbp-88h]
  const char *space; // [rsp+88h] [rbp-80h]
  const char *range; // [rsp+90h] [rbp-78h]
  const char *chroma; // [rsp+98h] [rbp-70h]
  int n; // [rsp+A0h] [rbp-68h] BYREF
  int size; // [rsp+A4h] [rbp-64h] BYREF
  IAMStreamConfig *config; // [rsp+A8h] [rbp-60h] BYREF
  AM_MEDIA_TYPE *previous_match_type_0; // [rsp+B0h] [rbp-58h] BYREF
  dshow_format_info *arg; // [rsp+B8h] [rbp-50h] BYREF

  priv_data = (int *)avctx->priv_data;
  v5 = *((_QWORD *)priv_data + 39) == 0LL;
  requested_video_codec_id = priv_data[77];
  v6 = priv_data[76];
  config = 0LL;
  previous_match_type_0 = 0LL;
  requested_pixel_format = v6;
  requested_framerate = 0LL;
  if ( !v5 )
    requested_framerate = 10000000LL * priv_data[83] / priv_data[82];
  requested_width = priv_data[80];
  requested_height = priv_data[81];
  LODWORD(v8) = pin->lpVtbl->QueryInterface(pin, refptr_IID_IAMStreamConfig, (void **)&config);
  if ( (_DWORD)v8 )
    return;
  if ( config->lpVtbl->GetNumberOfCapabilities(config, &n, &size) || (v10 = (unsigned int *)av_malloc(size)) == 0LL )
  {
    previous_match_type = 0LL;
    v9 = CoTaskMemFree;
    v10 = 0LL;
LABEL_6:
    v9(previous_match_type);
    goto LABEL_7;
  }
  v11 = avctx->priv_data;
  if ( devtype )
  {
    if ( !v11[86] && !*((_QWORD *)v11 + 42) )
      goto LABEL_49;
  }
  else if ( !*((_QWORD *)v11 + 39) && (!v11[80] || !v11[81]) && *((_QWORD *)v11 + 38) == 0xDFFFFFFFFLL )
  {
LABEL_49:
    if ( !pformat_set )
    {
      if ( n <= 0 )
      {
        CoTaskMemFree(0LL);
        config->lpVtbl->Release(config);
        av_free(v10);
        return;
      }
      v53 = 0;
      v9 = CoTaskMemFree;
      v52 = 0;
      goto LABEL_15;
    }
    dshow_get_default_format(pin, config, devtype, &previous_match_type_0);
    if ( previous_match_type_0 )
    {
      format_info = dshow_get_format_info(previous_match_type_0);
      v21 = format_info;
      if ( format_info )
      {
        if ( format_info->devtype == dshowDeviceType::VideoDevice )
        {
          requested_video_codec_id = format_info->codec_id;
          requested_pixel_format = format_info->pix_fmt;
          requested_framerate = format_info->framerate;
          requested_width = format_info->width;
          requested_height = format_info->height;
        }
        av_free(format_info);
        if ( !previous_match_type_0 )
        {
          v9 = CoTaskMemFree;
          CoTaskMemFree(0LL);
          previous_match_type_0 = 0LL;
          goto LABEL_57;
        }
        pbFormat = previous_match_type_0->pbFormat;
        if ( !pbFormat )
        {
          v9 = CoTaskMemFree;
          CoTaskMemFree(previous_match_type_0);
          previous_match_type_0 = 0LL;
          goto LABEL_57;
        }
      }
      else
      {
        if ( !previous_match_type_0 )
        {
          LODWORD(v8) = 0;
          CoTaskMemFree(0LL);
          previous_match_type_0 = 0LL;
          CoTaskMemFree(0LL);
          goto LABEL_63;
        }
        pbFormat = previous_match_type_0->pbFormat;
        if ( !pbFormat )
        {
          LODWORD(v8) = 0;
          CoTaskMemFree(previous_match_type_0);
          previous_match_type_0 = 0LL;
          CoTaskMemFree(0LL);
          goto LABEL_63;
        }
      }
      v9 = CoTaskMemFree;
      CoTaskMemFree(pbFormat);
      CoTaskMemFree(previous_match_type_0);
      previous_match_type_0 = 0LL;
      if ( !v21 )
      {
        LODWORD(v8) = 0;
        goto LABEL_62;
      }
LABEL_57:
      v53 = 0;
      v52 = 1;
      if ( n <= 0 )
      {
LABEL_58:
        dshow_get_default_format(pin, config, devtype, &previous_match_type_0);
        v23 = config->lpVtbl->SetFormat(config, previous_match_type_0);
        v24 = previous_match_type_0;
        v8 = v23 == 0;
        if ( previous_match_type_0 && previous_match_type_0->pbFormat )
        {
          v9(previous_match_type_0->pbFormat);
          v24 = previous_match_type_0;
        }
        v9(v24);
        previous_match_type_0 = 0LL;
        goto LABEL_62;
      }
      goto LABEL_15;
    }
LABEL_93:
    LODWORD(v8) = 0;
    CoTaskMemFree(0LL);
    goto LABEL_63;
  }
  v52 = pformat_set != 0LL;
  if ( n <= 0 )
  {
    if ( pformat_set )
      goto LABEL_93;
    LODWORD(v8) = 0;
    CoTaskMemFree(0LL);
LABEL_7:
    config->lpVtbl->Release(config);
    av_free(v10);
    if ( pformat_set )
      goto LABEL_8;
    return;
  }
  v53 = 1;
  v9 = CoTaskMemFree;
LABEL_15:
  wait_for_better = 0;
  previous_match_type = 0LL;
  v12 = 0;
  while ( 1 )
  {
    arg = 0LL;
    LODWORD(v8) = config->lpVtbl->GetStreamCaps(config, v12, &previous_match_type_0, (BYTE *)v10);
    if ( (_DWORD)v8 )
    {
LABEL_76:
      LODWORD(v8) = 0;
      goto next;
    }
    v13 = dshow_get_format_info(previous_match_type_0);
    arg = v13;
    if ( v13 )
    {
      if ( devtype )
      {
        v15 = previous_match_type_0;
        if ( *(_OWORD *)refptr_FORMAT_WaveFormatEx != *(_OWORD *)&previous_match_type_0->formattype )
          goto next;
        v25 = previous_match_type_0->pbFormat;
        if ( !pformat_set )
        {
          LODWORD(v44) = v10[11];
          LODWORD(v43) = v10[8];
          LODWORD(v42) = v10[5];
          LODWORD(v41) = v10[10];
          LODWORD(v40) = v10[7];
          av_log(
            avctx,
            32,
            "  min ch=%lu bits=%lu rate=%6lu max ch=%lu bits=%lu rate=%6lu\n",
            v10[4],
            v40,
            v41,
            v42,
            v43,
            v44);
          goto next;
        }
        v26 = priv_data[84]; // start of workaround
        if ( v26 )
        {
          if ( v26 > v10[11] || v26 < v10[10] )
            goto next;
          *((_DWORD *)v25 + 1) = v26;
        }
        v27 = priv_data[85];
        if ( v27 )
        {
          if ( v27 > v10[8] || v27 < v10[7] )
            goto LABEL_76;
          *((_WORD *)v25 + 7) = v27;
        }
        v28 = priv_data[86];
        if ( v28 )
        {
          if ( v28 > v10[5] || v28 < v10[4] )
            goto LABEL_76;
          *((_WORD *)v25 + 1) = v28; // end of workaround
        }
LABEL_37:
        if ( wait_for_better )
        {
          if ( !previous_match_type )
          {
            previous_match_type_0 = 0LL;
            previous_match_type = v15;
          }
        }
        else
        {
          v8 = ((unsigned int (__fastcall *)(IAMStreamConfig *, AM_MEDIA_TYPE *, BYTE *))config->lpVtbl->SetFormat)(
                 config,
                 v15,
                 v14) == 0;
        }
        goto next;
      }
      if ( v13->devtype )
        goto next;
      v15 = previous_match_type_0;
      if ( *(_OWORD *)&previous_match_type_0->formattype - *(_OWORD *)refptr_FORMAT_VideoInfo == 0LL )
      {
        v16 = previous_match_type_0->pbFormat;
        wait_for_better = 1;
        v17 = (__int64 *)(v16 + 40);
        v14 = v16 + 48;
      }
      else if ( *(_OWORD *)refptr_FORMAT_VideoInfo2 == *(_OWORD *)&previous_match_type_0->formattype )
      {
        v29 = previous_match_type_0->pbFormat;
        wait_for_better = 0;
        v17 = (__int64 *)(v29 + 40);
        v14 = v29 + 72;
      }
      else
      {
        v17 = 0LL;
        v14 = 0LL;
      }
      if ( pformat_set )
      {
        if ( requested_video_codec_id != AVCodecID::AV_CODEC_ID_RAWVIDEO && requested_video_codec_id != v13->codec_id
          || requested_pixel_format != AVPixelFormat::AV_PIX_FMT_NONE && v13->pix_fmt != requested_pixel_format )
        {
          goto next;
        }
        if ( requested_framerate )
        {
          if ( *((_QWORD *)v10 + 14) < requested_framerate || *((_QWORD *)v10 + 13) > requested_framerate )
            goto next;
          *v17 = requested_framerate;
        }
        if ( requested_height != 0 && requested_width != 0 )
        {
          if ( (int)v10[17] < requested_width
            || (int)v10[15] > requested_width
            || (int)v10[18] < requested_height
            || (int)v10[16] > requested_height )
          {
            goto next;
          }
          *((_DWORD *)v14 + 1) = requested_width;
          *((_DWORD *)v14 + 2) = requested_height;
        }
        goto LABEL_37;
      }
      prim = (const char *)v14;
      chroma = av_chroma_location_name(v13->chroma_loc);
      pix_fmt = arg->pix_fmt;
      if ( pix_fmt == AVPixelFormat::AV_PIX_FMT_NONE )
      {
        decoder = avcodec_find_decoder(arg->codec_id);
        if ( arg->codec_id && decoder )
          av_log(avctx, 32, "  vcodec=%s", decoder->name);
        else
          av_log(avctx, 32, "  unknown compression type 0x%X", *((unsigned int *)prim + 4));
      }
      else
      {
        pix_fmt_name = av_get_pix_fmt_name(pix_fmt);
        av_log(avctx, 32, "  pixel_format=%s", pix_fmt_name);
      }
      LODWORD(v43) = v10[18];
      LODWORD(v42) = v10[17];
      LODWORD(v40) = v10[16];
      av_log(
        avctx,
        32,
        "  min s=%ldx%ld fps=%g max s=%ldx%ld fps=%g",
        v10[15],
        v40,
        10000000.0 / (double)(int)*((_QWORD *)v10 + 14),
        v42,
        v43,
        10000000.0 / (double)(int)*((_QWORD *)v10 + 13));
      if ( *(_QWORD *)&arg->col_range == 0x200000000LL && *(_QWORD *)&arg->col_prim == 0x200000002LL )
      {
        if ( arg->chroma_loc )
        {
          v39 = "unknown";
          if ( chroma )
            v39 = chroma;
          av_log(avctx, 32, "(%s)", v39);
        }
      }
      else
      {
        range = av_color_range_name(arg->col_range);
        space = av_color_space_name(arg->col_space);
        prima = av_color_primaries_name(arg->col_prim);
        v32 = av_color_transfer_name(arg->col_trc);
        v33 = prima;
        v34 = space;
        v35 = v32;
        v36 = range;
        if ( !v32 )
          v35 = "unknown";
        if ( !prima )
          v33 = "unknown";
        if ( !space )
          v34 = "unknown";
        if ( !range )
          v36 = "unknown";
        av_log(avctx, 32, " (%s, %s/%s/%s", v36, v34, v33, v35);
        if ( arg->chroma_loc )
        {
          v37 = chroma;
          if ( !chroma )
            v37 = "unknown";
          av_log(avctx, 32, ", %s", v37);
        }
        av_log(avctx, 32, ")");
      }
      av_log(avctx, 32, "\n");
    }
next:
    av_freep(&arg);
    v18 = previous_match_type_0;
    if ( previous_match_type_0 && previous_match_type_0->pbFormat )
    {
      v9(previous_match_type_0->pbFormat);
      v18 = previous_match_type_0;
    }
    v9(v18);
    ++v12;
    previous_match_type_0 = 0LL;
    if ( n <= v12 )
      break;
    if ( (_DWORD)v8 )
      goto end;
  }
  if ( (v8 & 1) != 0 || !v52 )
  {
end:
    if ( !previous_match_type )
      goto LABEL_6;
    goto LABEL_45;
  }
  if ( previous_match_type )
  {
    v8 = ((unsigned int (__fastcall *)(IAMStreamConfig *))config->lpVtbl->SetFormat)(config) == 0;
LABEL_45:
    v19 = previous_match_type->pbFormat;
    if ( v19 )
      v9(v19);
    goto LABEL_6;
  }
  if ( !v53 )
    goto LABEL_58;
  LODWORD(v8) = 0;
LABEL_62:
  v9(0LL);
LABEL_63:
  config->lpVtbl->Release(config);
  av_free(v10);
LABEL_8:
  *pformat_set = v8;
}

Change History (7)

comment:1 by Balling, 3 months ago

Owner: set to Diederick Niehorster
Status: newopen

A quick bug is the sample_size argument within libavdevice/dshow.c caps at 16 bits when the max DirectShow supports is 32 bits.

Wait a second, 32 bits are for video? It says

The bitdepth attribute specifies the bit depth of the output video, in bits per pixel.

If this is correct then I imagine it should be indeed.

+    { "sample_size", "set audio sample size", OFFSET(sample_size), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 32, DEC },

comment:2 by blurbdust, 2 months ago

I apologize, I grabbed the wrong link from my notes. https://learn.microsoft.com/en-us/windows/win32/api/mmreg/ns-mmreg-waveformatex
shouldn't be too device specific since the code jumps to next if the capture type isn't refptr_FORMAT_WaveFormatEx. This at least shows > 16 is supported by DirectShow and the limit within ffmpeg should be raised.

wBitsPerSample

Specifies the number of bits per sample for the format type specified by wFormatTag. If wFormatTag = WAVE_FORMAT_PCM, then wBitsPerSample should be set to either 8 or 16. If wFormatTag = WAVE_FORMAT_IEEE_FLOAT, wBitsPerSample should be set to 32. For non-PCM formats, set the value of this member according to the manufacturer's specification for the format tag. Some compression schemes cannot define a value for wBitsPerSample. In this case, set wBitsPerSample to zero. 
Version 0, edited 2 months ago by blurbdust (next)

comment:3 by Balling, 2 months ago

ffmpeg should be raised.

Raised to what? 32 bit or 24 bit? Also, why would you need support for non-defaults? Would not it possibly break the device? Is it possible that sample_size only allowed value is 16 since that is used to overwrite default of 24, while overwriting with 24 bit on unsupported devices can lead to defects?

Last edited 2 months ago by Balling (previous) (diff)

comment:4 by blurbdust, 2 months ago

Raised to what? 32 bit or 24 bit?

32 bit as this is the maximum that I can find that is supported by DirectShow for audio devices.

Also, why would you need support for non-defaults? Would not it possibly break the device? Is it possible that sample_size only allowed value is 16 since that is used to overwrite default of 24, while overwriting with 24 bit on unsupported devices can lead to defects?

My particular device is configurable to 16 bit, 24 bit, or 32 bit audio capture. The default for DirectShow for my device is 16 bit 48kHz for audio capture. The vendor created basically a DirectShow shim to their own software for use with third-party software. I would strongly prefer to use ffmpeg to not be locked into the vendor's software thus would like the ability to configure the capture beyond the offered defaults. If I use an older version of ffmpeg from before 2021-11-03, not only does the device report support for > 16 bit within ffmpeg, if I add the small hack of + { "sample_size", "set audio sample size", OFFSET(sample_size), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 32, DEC }, then configuration and capture from the device works from ffmpeg.

comment:5 by Balling, 2 months ago

My particular device is configurable to 16 bit, 24 bit, or 32 bit audio capture

What is 32 bit capture, is that 32 bit float or 32 bit integer? Almost nothing supports 32 bit int, besides recently flac and wavpack

Last edited 2 months ago by Balling (previous) (diff)

comment:6 by blurbdust, 2 weeks ago

After doing some testing, the device actually upscales from 24 bit to 32 bit int when doing a 32 bit capture in the vendor's software. I will be staying away from 32 bit capture from this device to avoid the upscaling. As for the requested updates within ffmpeg, if 32 bit int is too niche, then I suggest upping the limit to 24 bit int instead. That would change the suggested line update within the patch to + { "sample_size", "set audio sample size", OFFSET(sample_size), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 24, DEC },.

comment:7 by Balling, 13 days ago

Okay, so it does support 32 bit integer. That should be supported.

Note: See TracTickets for help on using tickets.