Opened 13 years ago

Closed 12 years ago

#1227 closed defect (fixed)

crash in ff_put_h264_chroma_mc8_neon

Reported by: elioxia Owned by:
Priority: important Component: avcodec
Version: git-master Keywords: arm crash h264
Cc: bruce-wu Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

attached video: 1024x384 h264 annexb byte stream

the video size is important (half of 1024x768) as a similar full sized video works fine

reproducible on master (fc882b6e), 6.5, 10.2

built with gcc4.4.3 (android ndk r7b), gcc4.6.2 (linaro android toolchain)
tested on various android devices

Attachments (3)

out.h264 (28.0 KB ) - added by elioxia 13 years ago.
sample video file
bug-test-video.mp4 (2.0 MB ) - added by bruce-wu 12 years ago.
format: mp4, video: h264, audio: aac, size:672x384
0001-modify-macro-h264_chroma_mc8-and-h264_chroma_mc4-pat.patch (6.7 KB ) - added by bruce-wu 12 years ago.
a patch for bug #1227

Change History (23)

by elioxia, 13 years ago

Attachment: out.h264 added

sample video file

comment:1 by Carl Eugen Hoyos, 13 years ago

Keywords: arm added

Please provide gdb backtrace etc. as described on http://ffmpeg.org/bugreports.html

comment:2 by Michael Niedermayer, 12 years ago

Not reproduceable on OMAP4 Panda board you will have to provide a backtrace or its very unlikely we will be able to fix this.

by bruce-wu, 12 years ago

Attachment: bug-test-video.mp4 added

format: mp4, video: h264, audio: aac, size:672x384

comment:3 by bruce-wu, 12 years ago

I have the same problem. My test android phone is google nexus S, its information is:
CPU FEATURES: swp half thumb fastmult vfp edsp thumbee neon vfpv3
ANDROID VERSION: 4.03

I use android-ndk-r8b to build ffmpeg for Android.

The test video is attached. Here is the detailed information about this video:
format: mp4
video codec: h24
audio codec: aac
size: 672x384

And gdb backtrack extracted by logcat:
I/DEBUG ( 77): * * * * * * * * * * * * * * * *
I/DEBUG ( 77): Build fingerprint: 'unknown'
I/DEBUG ( 77): pid: 1035, tid: 1071 >>> com.abc.ui <<<
I/DEBUG ( 77): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 52f2601d
I/DEBUG ( 77): r0 530dc600 r1 52f2601d r2 00000160 r3 00000000
I/DEBUG ( 77): r4 00000018 r5 00000000 r6 00000000 r7 00000000
I/DEBUG ( 77): r8 5305fd5b r9 00000180 10 0000009d fp 00000001
I/DEBUG ( 77): ip 00000028 sp 5251eae4 lr 50e8e834 pc 511040dc cpsr 60000010
I/DEBUG ( 77): d0 1818181818181818 d1 2828282828282828
I/DEBUG ( 77): d2 0046004f0051004f d3 003e003e003e003f
I/DEBUG ( 77): d4 8080808080808080 d5 8080808080808080
I/DEBUG ( 77): d6 7b7b7b7b7c7d7e7e d7 7b7b7b7b7b7c7d7e
I/DEBUG ( 77): d8 0000000042900000 d9 43f0000000000000
I/DEBUG ( 77): d10 0000000000000000 d11 0000000000000000
I/DEBUG ( 77): d12 0000000000000000 d13 0000000000000000
I/DEBUG ( 77): d14 0000000000000000 d15 0000000000000000
I/DEBUG ( 77): d16 7b7b7b7b7b7c7d7e d17 7b7b7b7b7b7c7d7e
I/DEBUG ( 77): d18 1ed81f181f581f80 d19 1ec01ec01ec01ec0
I/DEBUG ( 77): d20 00480048004d0050 d21 003e003e003f0045
I/DEBUG ( 77): d22 007e007e007e007e d23 007f007f007f007f
I/DEBUG ( 77): d24 0101010101010101 d25 ffffffffffffffff
I/DEBUG ( 77): d26 ffffffffffffffff d27 2c2c2c2c2c2c2c2c
I/DEBUG ( 77): d28 1f1f1f2026292827 d29 1f1f1f2026292827
I/DEBUG ( 77): d30 1f1f1f1f1f1f2026 d31 1f1f1f1f1f1f2026
I/DEBUG ( 77): scr 20000012
I/DEBUG ( 77):
I/DEBUG ( 77): #00 pc 0047b0dc /data/data/com.abc.ui/lib/libffmpeg.so (ff_put_h264_chroma_mc8_neon)
I/DEBUG ( 77): #01 lr 50e8e834 /data/data/com.abc.ui/lib/libffmpeg.so
I/DEBUG ( 77):
I/DEBUG ( 77): code around pc:
I/DEBUG ( 77): 511040bc f3c50801 f4214ac2 f3c62c00 f3c72801 .....J!..,...(..
I/DEBUG ( 77): 511040cc f5d1f000 f2b45105 f2ca0870 f2ca1872 .....Q..p...r...
I/DEBUG ( 77): 511040dc f4216ac2 f2b67107 f44007d2 f44017d2 .j!..q....@...@.
I/DEBUG ( 77): 511040ec caffffef e8bd80f0 e92d40f0 e1cd41d4 .........@-..A..
I/DEBUG ( 77): 511040fc e1a0e000 f5d1f000 f7d1f002 e0170594 ................
I/DEBUG ( 77):
I/DEBUG ( 77): code around lr:
I/DEBUG ( 77): 50e8e814 e1a01006 e7942002 e59d007c e58dc004 ..... ..|.......
I/DEBUG ( 77): 50e8e824 e59d3068 e58de000 e59dc08c e12fff3c h0..........<./.
I/DEBUG ( 77): 50e8e834 e28dd044 e8bd8ff0 e59d0034 e0831000 D.......4.......
I/DEBUG ( 77): 50e8e844 e285300f e1530001 aaffff7d e59d1030 .0....S.}...0...
I/DEBUG ( 77): 50e8e854 e289300f e0822001 e1530002 b3a02000 .0... ....S.. ..
I/DEBUG ( 77):
I/DEBUG ( 77): memory map around addr 52f2601d:
I/DEBUG ( 77): 52f14000-52f26000
I/DEBUG ( 77): (no map for address)
I/DEBUG ( 77): 52f2d000-52f2e000 /dev/pvrsrvkm
I/DEBUG ( 77):
I/DEBUG ( 77): stack:
I/DEBUG ( 77): 5251eaa4 0000046c
I/DEBUG ( 77): 5251eaa8 5251ead0
I/DEBUG ( 77): 5251eaac 5118d618 /data/data/com.abc.ui/lib/libffmpeg.so
I/DEBUG ( 77): 5251eab0 51741c20
I/DEBUG ( 77): 5251eab4 51741aa0
I/DEBUG ( 77): 5251eab8 504c2140
I/DEBUG ( 77): 5251eabc 00000aac
I/DEBUG ( 77): 5251eac0 000019e0
I/DEBUG ( 77): 5251eac4 5118d618 /data/data/com.abc.ui/lib/libffmpeg.so
I/DEBUG ( 77): 5251eac8 5118d618 /data/data/com.abc.ui/lib/libffmpeg.so
I/DEBUG ( 77): 5251eacc 5047e010
I/DEBUG ( 77): 5251ead0 00000000
I/DEBUG ( 77): 5251ead4 00000002
I/DEBUG ( 77): 5251ead8 df0027ad
I/DEBUG ( 77): 5251eadc 00000000
I/DEBUG ( 77): 5251eae0 00000005
I/DEBUG ( 77): #00 5251eae4 5047e010
I/DEBUG ( 77): 5251eae8 000000c0
I/DEBUG ( 77): 5251eaec 52f253bd
I/DEBUG ( 77): 5251eaf0 0000332c
I/DEBUG ( 77): 5251eaf4 50e8e834 /data/data/com.abc.ui/lib/libffmpeg.so
I/DEBUG ( 77): 5251eaf8 00000005
I/DEBUG ( 77): 5251eafc 00000000
I/DEBUG ( 77): 5251eb00 504810de
I/DEBUG ( 77): 5251eb04 504810fe
I/DEBUG ( 77): 5251eb08 5048111e
I/DEBUG ( 77): 5251eb0c 00003000
I/DEBUG ( 77): 5251eb10 00003808
I/DEBUG ( 77): 5251eb14 00003808
I/DEBUG ( 77): 5251eb18 504c3450
I/DEBUG ( 77): 5251eb1c 0004213b
I/DEBUG ( 77): 5251eb20 00000001
I/DEBUG ( 77): 5251eb24 00000000
I/DEBUG ( 77): 5251eb28 00000180

comment:4 by bruce-wu, 12 years ago

By the way, I use ffmpeg 0.8.10 and the bug can be reproduced on ffmpeg 0.11

in reply to:  4 comment:5 by Carl Eugen Hoyos, 12 years ago

Replying to bruce-wu:

By the way, I use ffmpeg 0.8.10 and the bug can be reproduced on ffmpeg 0.11

Please provide a backtrace for the crash with FFmpeg 0.11 (or, even better, current git head) as explained on http://ffmpeg.org/bugreports.html

comment:6 by Carl Eugen Hoyos, 12 years ago

Cc: bruce-wu added

comment:7 by Carl Eugen Hoyos, 12 years ago

Keywords: crash added

comment:8 by bruce-wu, 12 years ago

After investigation, I found what the problem is: read memory outside of the bound of the array pointed to by register R1 in MACRO h264_chroma_mc8 or MACRO h264_chroma_mc4 in libavcodec/arm/h264dsp_neon.S (verion 0.8.10) or libavcodec/arm/h264cmc_neon.S(version 0.11.1). I fixed the bug by modifying those two macros. Here is updated macros in version 0.8.10:

/* chroma_mc8(uint8_t *dst, uint8_t *src, int stride, int h, int x, int y) */
        .macro  h264_chroma_mc8 type
function ff_\type\()_h264_chroma_mc8_neon, export=1
        push            {r4-r7, lr}
        ldrd            r4,  [sp, #20]
.ifc \type,avg
        mov             lr,  r0
.endif
        pld             [r1]
        pld             [r1, r2]

        muls            r7,  r4,  r5
        rsb             r6,  r7,  r5,  lsl #3
        rsb             ip,  r7,  r4,  lsl #3
        sub             r4,  r7,  r4,  lsl #3
        sub             r4,  r4,  r5,  lsl #3
        add             r4,  r4,  #64

        beq             2f

        add             r5,  r1,  r2

        vdup.8          d0,  r4
        lsl             r4,  r2,  #1
        vdup.8          d1,  ip
        vld1.64         {d4, d5}, [r1], r4
        vdup.8          d2,  r6
        vdup.8          d3,  r7

        vext.8          d5,  d4,  d5,  #1

1:
        vld1.64         {d6, d7}, [r5], r4
        pld             [r5]
        vmull.u8        q8,  d4,  d0
        vext.8          d7,  d6,  d7,  #1
        vmlal.u8        q8,  d5,  d1
        vld1.64         {d4, d5}, [r1], r4
        vmlal.u8        q8,  d6,  d2
        vext.8          d5,  d4,  d5,  #1
        vmlal.u8        q8,  d7,  d3
        vmull.u8        q9,  d6,  d0
        subs            r3,  r3,  #2
        vmlal.u8        q9,  d7,  d1
        vmlal.u8        q9,  d4,  d2
        vmlal.u8        q9,  d5,  d3
        vrshrn.u16      d16, q8,  #6
        /*vld1.64         {d6, d7}, [r5], r4*/
        pld             [r1]
        vrshrn.u16      d17, q9,  #6
.ifc \type,avg
        vld1.64         {d20}, [lr,:64], r2
        vld1.64         {d21}, [lr,:64], r2
        vrhadd.u8       q8,  q8,  q10
.endif
        /*vext.8          d7,  d6,  d7,  #1*/
        vst1.64         {d16}, [r0,:64], r2
        vst1.64         {d17}, [r0,:64], r2
        bgt             1b

        pop             {r4-r7, pc}

2:      tst             r6,  r6
        add             ip,  ip,  r6
        vdup.8          d0,  r4
        vdup.8          d1,  ip

        beq             4f

        add             r5,  r1,  r2
        lsl             r4,  r2,  #1

3:
        vld1.64         {d4}, [r1], r4
        vld1.64         {d6}, [r5], r4

        pld             [r5]
        vmull.u8        q8,  d4,  d0
        vmlal.u8        q8,  d6,  d1
        vmull.u8        q9,  d6,  d0
        vmlal.u8        q9,  d4,  d1
        vrshrn.u16      d16, q8,  #6
        vrshrn.u16      d17, q9,  #6
.ifc \type,avg
        vld1.64         {d20}, [lr,:64], r2
        vld1.64         {d21}, [lr,:64], r2
        vrhadd.u8       q8,  q8,  q10
.endif
        subs            r3,  r3,  #2
        pld             [r1]
        vst1.64         {d16}, [r0,:64], r2
        vst1.64         {d17}, [r0,:64], r2
        bgt             3b

        pop             {r4-r7, pc}

4:      vld1.64         {d4, d5}, [r1], r2
        vld1.64         {d6, d7}, [r1], r2
        vext.8          d5,  d4,  d5,  #1
        vext.8          d7,  d6,  d7,  #1

        pld             [r1]
        subs            r3,  r3,  #2
        vmull.u8        q8,  d4,  d0
        vmlal.u8        q8,  d5,  d1
        vmull.u8        q9,  d6,  d0
        vmlal.u8        q9,  d7,  d1
        pld             [r1]
        vrshrn.u16      d16, q8,  #6
        vrshrn.u16      d17, q9,  #6
.ifc \type,avg
        vld1.64         {d20}, [lr,:64], r2
        vld1.64         {d21}, [lr,:64], r2
        vrhadd.u8       q8,  q8,  q10
.endif
        vst1.64         {d16}, [r0,:64], r2
        vst1.64         {d17}, [r0,:64], r2
        bgt             4b

        pop             {r4-r7, pc}
endfunc
        .endm

/* chroma_mc4(uint8_t *dst, uint8_t *src, int stride, int h, int x, int y) */
        .macro  h264_chroma_mc4 type
function ff_\type\()_h264_chroma_mc4_neon, export=1
        push            {r4-r7, lr}
        ldrd            r4,  [sp, #20]
.ifc \type,avg
        mov             lr,  r0
.endif
        pld             [r1]
        pld             [r1, r2]

        muls            r7,  r4,  r5
        rsb             r6,  r7,  r5,  lsl #3
        rsb             ip,  r7,  r4,  lsl #3
        sub             r4,  r7,  r4,  lsl #3
        sub             r4,  r4,  r5,  lsl #3
        add             r4,  r4,  #64

        beq             2f

        add             r5,  r1,  r2

        vdup.8          d0,  r4
        lsl             r4,  r2,  #1
        vdup.8          d1,  ip
        vld1.64         {d4},     [r1], r4
        vdup.8          d2,  r6
        vdup.8          d3,  r7

        vext.8          d5,  d4,  d5,  #1
        vtrn.32         d0,  d1
        vtrn.32         d2,  d3
        vtrn.32         d4,  d5

1:
        vld1.64         {d6},     [r5], r4
        pld             [r5]
        vext.8          d7,  d6,  d7,  #1
        vmull.u8        q8,  d4,  d0
        vtrn.32         d6,  d7

        vld1.64         {d4},     [r1], r4
        vmlal.u8        q8,  d6,  d2

        vext.8          d5,  d4,  d5,  #1
        vmull.u8        q9,  d6,  d0
        vtrn.32         d4,  d5
        vmlal.u8        q9,  d4,  d2

        vadd.i16        d16, d16, d17
        vadd.i16        d17, d18, d19
        vrshrn.u16      d16, q8,  #6
        subs            r3,  r3,  #2
        pld             [r1]
.ifc \type,avg
        vld1.32         {d20[0]}, [lr,:32], r2
        vld1.32         {d20[1]}, [lr,:32], r2
        vrhadd.u8       d16, d16, d20
.endif
        vst1.32         {d16[0]}, [r0,:32], r2
        vst1.32         {d16[1]}, [r0,:32], r2
        bgt             1b

        pop             {r4-r7, pc}

2:      tst             r6,  r6
        add             ip,  ip,  r6
        vdup.8          d0,  r4
        vdup.8          d1,  ip
        vtrn.32         d0,  d1

        beq             4f

        vext.32         d1,  d0,  d1,  #1
        add             r5,  r1,  r2
        lsl             r4,  r2,  #1
        vld1.32         {d4[0]},  [r1], r4

3:
        vld1.32         {d4[1]},  [r5], r4
        pld             [r5]
        vmull.u8        q8,  d4,  d0
        vld1.32         {d4[0]},  [r1], r4
        vmull.u8        q9,  d4,  d1

        vadd.i16        d16, d16, d17
        vadd.i16        d17, d18, d19
        vrshrn.u16      d16, q8,  #6
.ifc \type,avg
        vld1.32         {d20[0]}, [lr,:32], r2
        vld1.32         {d20[1]}, [lr,:32], r2
        vrhadd.u8       d16, d16, d20
.endif
        subs            r3,  r3,  #2
        pld             [r1]
        vst1.32         {d16[0]}, [r0,:32], r2
        vst1.32         {d16[1]}, [r0,:32], r2
        bgt             3b

        pop             {r4-r7, pc}

4:      vld1.64         {d4},     [r1], r2
        vld1.64         {d6},     [r1], r2
        vext.8          d5,  d4,  d5,  #1
        vext.8          d7,  d6,  d7,  #1
        vtrn.32         d4,  d5
        vtrn.32         d6,  d7

        vmull.u8        q8,  d4,  d0
        vmull.u8        q9,  d6,  d0
        subs            r3,  r3,  #2

        vadd.i16        d16, d16, d17
        vadd.i16        d17, d18, d19
        pld             [r1]
        vrshrn.u16      d16, q8,  #6
.ifc \type,avg
        vld1.32         {d20[0]}, [lr,:32], r2
        vld1.32         {d20[1]}, [lr,:32], r2
        vrhadd.u8       d16, d16, d20
.endif
        pld             [r1]
        vst1.32         {d16[0]}, [r0,:32], r2
        vst1.32         {d16[1]}, [r0,:32], r2
        bgt             4b

        pop             {r4-r7, pc}
endfunc
        .endm

As shown in the code, register R1 points to ARRAY src (type is uint_t*). The idea in the modification is to test if register R3 (ARGUMENT h in caller of C program) is less than or equal to zero before reading elements pointed to by registe R1. If it is, then skip reading and jump to the end of function.

I tested the code using several videos, and it works. For version 0.11.1, the modification is the same.

Last edited 12 years ago by Carl Eugen Hoyos (previous) (diff)

comment:9 by Michael Niedermayer, 12 years ago

please submit a "git format-patch" or a normal patch as generated by git diff of the relevant changes. Its not possible nor reliable to extract from the posted code what has been changed

comment:10 by Gregory, 12 years ago

I had the same problem on iOS, and did a quick test porting the changes to the ffmpeg version we use (same neon file as 0.11.2), works fine!

in reply to:  10 comment:11 by Carl Eugen Hoyos, 12 years ago

Replying to Gregory:

I had the same problem on iOS, and did a quick test porting the changes to the ffmpeg version we use (same neon file as 0.11.2), works fine!

Please consider either posting the patch here or sending it to ffmpeg-devel (where it will probably receive more attention) to allow other users to profit from the fix!

comment:12 by bruce-wu, 12 years ago

A patch for bug #1227 is attached. I use git version of ffmpeg, which I guess it's newest.

in reply to:  12 comment:13 by Michael Niedermayer, 12 years ago

Replying to bruce-wu:

A patch for bug #1227 is attached. I use git version of ffmpeg, which I guess it's newest.

Does this slow the code down ?
are all changes really needed ? from a quick look some look unneeded

comment:14 by bruce-wu, 12 years ago

This patch may slow the code down a little bit, but it works. The original code seems too ambitious to be correct. Anyway, it is just my proposal. If someone have more efficient code to fix the bug, I will appreciate it.

in reply to:  14 comment:15 by Carl Eugen Hoyos, 12 years ago

Replying to bruce-wu:

This patch may slow the code down a little bit, but it works.

Are all of your changes necessary to fix the crash?

comment:16 by elioxia, 12 years ago

I suggest 32byte-padded buffer as a workaround, but cannot understand why 16byte padding is not enough in the first place.

comment:17 by Michael Niedermayer, 12 years ago

Keywords: h264 added
Resolution: fixed
Status: newclosed

Ive applied the patch to fix the crash for now, patches that make the code faster without reintroducing the crash are very welcome !

comment:18 by Michael Niedermayer, 12 years ago

Resolution: fixed
Status: closedreopened

Ive made a mistake during testing, the patch is not working, it breaks h264 decoding as can be seen with "make fate" with --samples=...
Thus patch reverted, please submit a working patch and make sure that speed loss is kept to a minimum if it cannot be entirely avoided

in reply to:  16 comment:19 by Michael Niedermayer, 12 years ago

Replying to elioxia:

I suggest 32byte-padded buffer as a workaround, but cannot understand why 16byte padding is not enough in the first place.

The problem the way i understand it (i couldnt reproduce a crash) is that this code reads a line too much not just a few bytes

comment:20 by Michael Niedermayer, 12 years ago

Resolution: fixed
Status: reopenedclosed

This should have been fixed a month ago or so, please reopen if its still crashing with latest ffmpeg

Note: See TracTickets for help on using tickets.