Opened 14 years ago

Closed 14 years ago

#272 closed defect (fixed)

altivec specific code segfaults on ppc when compiled with --enable-pic

Reported by: Kim Nguyen Owned by:
Priority: normal Component: avcodec
Version: git-master Keywords: ppc, PIC
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: no

Description

When compiled with --enable-pic, ffplay segfault when using code in libavcodec/ppc/fft_altivec_s.S on various files. For instance when decoding aac or ac3 audio streams. This is always reproducible and do not happen when disabling pic (the default):

Tested with latest git (47d2ca3205b53665328fe301879c339449db7a1d) but this has been the case for a long time I believe, I started remarking this when upgrading from mythtv 0.23 to 0.24, several months ago.

from master:
./configure --enable-pic
make
gdb ./ffplay_g ~myth/aac.dump

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x493b5460 (LWP 32531)]
0x10361e3c in ff_fft_calc_altivec () at libavcodec/ppc/fft_altivec_s.S:447
447 DECL_FFTS 0

Full gdb output attached (with bt, disass and register dump) attached.
I believe the cause of the crash to be the loading of a global symbol in a non pic compatible way.

Just doing ./configure and make, the file plays normally.

Attachments (2)

gdb.txt (16.1 KB ) - added by Kim Nguyen 14 years ago.
bt, disass, info all-registers of ffplay_g
full_gdb.txt (19.3 KB ) - added by Kim Nguyen 14 years ago.
GDB trace for the file aac2.dump with full disassembly

Download all attachments as: .zip

Change History (17)

by Kim Nguyen, 14 years ago

Attachment: gdb.txt added

bt, disass, info all-registers of ffplay_g

comment:1 by Carl Eugen Hoyos, 14 years ago

Status: newopen
Version: unspecifiedgit-master

If you believe the crash did not happen in the past, please find the version introducing it.

comment:2 by Kim Nguyen, 14 years ago

Hi again,
the offending commit is this one (hurray for git bissect):


commit a46b84d1204d3cd2de14f08de29afee08f8f86d0
Author: Måns Rullgård <mans@mansr.com>
Date: Sun Jul 4 18:33:47 2010 +0000

PPC: convert Altivec FFT to pure assembler


On PPC a leaf function has a 288-byte red zone below the stack pointer,
sparing these functions the chore of setting up a full stack frame.


When a function call is disguised within an inline asm block, the
compiler might not adjust the stack pointer as required before a
function call, resulting in the red zone being clobbered.


Moving the entire function to pure asm avoids this problem and also
results in somewhat better code.


Originally committed as revision 24044 to svn://svn.ffmpeg.org/ffmpeg/trunk


Reverting this commit on top of master (which is f9ecb849ef39bc337d9439b829fe08da5c95cc3d at the moment) fixes the issue for me. I guess it went unnoticed for so long because
1) ppc is pretty rare these days
2) other software using ffmpeg as a shared library (thus using -fPIC) used to embbed it and link statically until fairly recently, or just shipped older versions.

(From the commit message though, reverting might introduce some other bugs that the commit fixed).

comment:3 by Carl Eugen Hoyos, 14 years ago

I compiled current FFmpeg on a G4 with --enable-pic (made sure about #define HAVE_ALTIVEC 1), tested two files containing aac audio, made sure ff_fft_calc_altivec() was called, but cannot reproduce a crash here.
(Your output is incomplete, unfortunately.)

comment:4 by Kim Nguyen, 14 years ago

I have confirmed this with a stream from my internet/tv box. The sample is available here:

http://idea.nguyen.vg/~kim/aac2.dump

(few seconds of a french commercial). Video is H264, Audio is AAC. When compiled with --enable-pic
with commit a46b84d1204 reverted, ffplay plays audio and video fine. With plain master
(f9ecb849ef3) it segfaults as explained above. The stream also plays fine on x86 vlc (just to check that it's not too corrupt).

Here are more infos and I'm attaching a gdb trace with full disassembly of the function (instead of pc-32, pc+32).

Machine is a G4 MacMini :


/proc/cpuinfo:

processor : 0
cpu : 7447A, altivec supported
clock : 1416.666661MHz
revision : 1.2 (pvr 8003 0102)
bogomips : 83.24
timebase : 41620997
platform : PowerMac
model : PowerMac10,1
machine : PowerMac10,1
motherboard : PowerMac10,1 MacRISC3 Power Macintosh
detected as : 287 (Mac mini)
pmac flags : 00000010
L2 cache : 512K unified
pmac-generation : NewWorld
Memory : 1024 MB


gcc version:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/powerpc-linux-gnu/gcc/powerpc-linux-gnu/4.5.2/lto-wrapper
Target: powerpc-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=powerpc-linux-gnu --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/powerpc-linux-gnu --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/powerpc-linux-gnu --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --enable-secureplt --disable-softfloat --enable-targets=powerpc-linux,powerpc64-linux --with-cpu=default32 --disable-werror --with-long-double-128 --enable-checking=release --build=powerpc-linux-gnu --host=powerpc-linux-gnu --target=powerpc-linux-gnu
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4)


as version:

GNU assembler version 2.21.0 (powerpc-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.21.0.20110327


distrib is ubuntu natty, ppc (libc6 version is 2.13 egcs)

by Kim Nguyen, 14 years ago

Attachment: full_gdb.txt added

GDB trace for the file aac2.dump with full disassembly

comment:5 by Carl Eugen Hoyos, 14 years ago

Since I am unable to reproduce the crash with the sample on a G4 after compiling with --enable-pic (and with altivec enabled):
Could you please provide complete, uncut output of ffmpeg when crashing?
(If ffmpeg does not crash, but ffplay does, please mention it and provide complete, uncut output of ffplay.)

$ ffmpeg -i aac2.dump -f null -
ffmpeg version git-N-30662-gf9ecb84, Copyright (c) 2000-2011 the FFmpeg developers
  built on Jun  9 2011 09:58:10 with gcc 4.0.1 (Apple Inc. build 5493)
  configuration: --enable-pic
  libavutil    51.  8. 0 / 51.  8. 0
  libavcodec   53.  6. 1 / 53.  6. 1
  libavformat  53.  2. 0 / 53.  2. 0
  libavdevice  53.  1. 1 / 53.  1. 1
  libavfilter   2. 14. 1 /  2. 14. 1
  libswscale    0. 14. 1 /  0. 14. 1
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[aac @ 0x1012400] channel element 2.10 is not allocated
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[h264 @ 0x1010800] non-existing PPS referenced
[h264 @ 0x1010800] non-existing PPS 0 referenced
[h264 @ 0x1010800] decode_slice_header error
[h264 @ 0x1010800] no frame!
[mpegts @ 0x1005400] max_analyze_duration 5000000 reached at 5034667
Input #0, mpegts, from 'aac2.dump':
  Duration: 00:00:10.24, start: 3033.232422, bitrate: 1791 kb/s
  Program 1352
    Stream #0.0[0x23](fra): Subtitle: [6][0][0][0] / 0x0006
    Stream #0.1[0xa3]: Video: h264 (High), yuv420p, 480x576 [PAR 32:15 DAR 16:9], 29.06 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0.2[0x778](fra): Audio: aac, 48000 Hz, stereo, s16, 73 kb/s
[buffer @ 0xc0e110] w:480 h:576 pixfmt:yuv420p tb:1/1000000 sar:32/15 sws_param:
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf53.2.0
    Stream #0.0: Video: rawvideo, yuv420p, 480x576 [PAR 32:15 DAR 16:9], q=2-31, 200 kb/s, 90k tbn, 25 tbc
    Stream #0.1(fra): Audio: pcm_s16be, 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
  Stream #0.1 -> #0.0
  Stream #0.2 -> #0.1
Press [q] to stop, [?] for help
[aac @ 0x1012400] channel element 2.10 is not allocated
Error while decoding stream #0.2
frame=   85 fps= 39 q=0.0 Lsize=      -0kB time=00:00:02.51 bitrate=  -0.1kbits/s dup=13 drop=0
video:0kB audio:472kB global headers:0kB muxing overhead -100.004552%

comment:6 by Carl Eugen Hoyos, 14 years ago

Reproduced by developer: set

Reproducible on Linux (not on OS X where gas-preprocessor.pl is used).

comment:7 by reimar, 14 years ago

Are you sure this isn't specific to your compiler version?
LLVM and possible also gcc made an ABI change that optimized the TOC register r2 away.
The current code thus can't work anymore.

in reply to:  7 comment:8 by Kim Nguyen, 14 years ago

Replying to reimar:

Are you sure this isn't specific to your compiler version?
LLVM and possible also gcc made an ABI change that optimized the TOC register r2 away.
The current code thus can't work anymore.

I think that's what is happening. However I'm using a fairly standard compiler (which I think is the default in any recent distro: gcc 4.5.x) Also tested with gcc 4.4.5 with the same results so I guess this change in gcc is quite old. However I came up with a proof of concept patch to load the address of local symbols in a pic friendly way. (I use the standard trick of computing the space between a fixed label and the symbol to be loaded and then add this to the adresss of the fixed label which is determined at runtime using a phony branch instruction). I've tested it on linux ppc 32bit and I'm now able to play the above sample without problems.

diff --git a/libavcodec/ppc/asm.S b/libavcodec/ppc/asm.S
index e372d53..30e0953 100644
--- a/libavcodec/ppc/asm.S
+++ b/libavcodec/ppc/asm.S
@@ -67,7 +67,11 @@ X(\name):
 
 .macro movrel rd, sym
 #if CONFIG_PIC
-    lwz     \rd, \sym@got(r2)
+    bcl             20, 31, lab_pic_\@
+lab_pic_\@:
+    mflr    \rd
+    addis   \rd, \rd, (\sym - lab_pic_\@)@ha
+    addi    \rd, \rd, (\sym - lab_pic_\@)@l    
 #else
     lis     \rd, \sym@ha
     la      \rd, \sym@l(\rd)

comment:9 by Carl Eugen Hoyos, 14 years ago

Apple gcc-4.2 with --enable-pic fails compilation:

$ make
AS      libavcodec/ppc/fft_altivec_s.o
libavcodec/ppc/fft_altivec_s.S:747:Parameter syntax error (parameter 2)
libavcodec/ppc/fft_altivec_s.S:747:Invalid mnemonic 'got(r2)'
libavcodec/ppc/fft_altivec_s.S:787:Parameter syntax error (parameter 2)
libavcodec/ppc/fft_altivec_s.S:787:Invalid mnemonic 'got(r2)'
libavcodec/ppc/fft_altivec_s.S:794:Parameter syntax error (parameter 2)
libavcodec/ppc/fft_altivec_s.S:794:Invalid mnemonic 'got(r2)'
libavcodec/ppc/fft_altivec_s.S:1282:Parameter syntax error (parameter 2)
libavcodec/ppc/fft_altivec_s.S:1282:Invalid mnemonic 'got(r2)'
libavcodec/ppc/fft_altivec_s.S:1322:Parameter syntax error (parameter 2)
libavcodec/ppc/fft_altivec_s.S:1322:Invalid mnemonic 'got(r2)'
libavcodec/ppc/fft_altivec_s.S:1329:Parameter syntax error (parameter 2)
libavcodec/ppc/fft_altivec_s.S:1329:Invalid mnemonic 'got(r2)'
make: *** [libavcodec/ppc/fft_altivec_s.o] Error 1

comment:10 by Kim Nguyen, 14 years ago

This is strange, my patch is removing the @got(r2), not adding it. I don't know why it remains
when compiled on osx. Maybe it's due to gas-preprocessor.pl or something.

comment:11 by Carl Eugen Hoyos, 14 years ago

(Sorry, I had originally written my comment before you posted your patch.)

Apple gcc 4.2 fails with current, unpatched FFmpeg git with --enable-pic

comment:12 by Carl Eugen Hoyos, 14 years ago

With the patch, compilation succeeds with Apple gcc 4.2 and decoding also works on Linux.

comment:13 by Michael Niedermayer, 14 years ago

what performance effect does this patch have?
it could slow things down for cases that dont need the extra code

comment:14 by Kim Nguyen, 14 years ago

what performance effect does this patch have?
it could slow things down for cases that dont need the extra code

Depends on what you mean by 'cases that don't need the extra code'. If you compile without PIC, the patch is not used (#ifdef) so no performance loss here, and if you compile with -fPIC and a 'recent' gcc nothing works anyway (gcc 4.2 which triggers the bug was released in 2007). I can also confirm that ASM + patch remains much faster than using the plain C version of the function.

By the way, gcc is using the same trick to load global symbol in -fPIC code nowadays:

static int GLOBAL[3] = {1, 2, 3};
int f(int x) { return GLOBAL[x]; }

Compiling with gcc -O0 -S -c foo.c one can see that the address of GLOBAL is computed as so:

	bcl 20,31,.L2
.L2:
    	mflr 30
	addis 30,30,.LCTOC1-.L2@ha
	addi 30,30,.LCTOC1-.L2@l

where .LCTOC1 is the label marking the begining of the TOC section. This is exactly what the patch does.

Anyway the movrel macro at issue is always followed by *several* memory accesses so one extra unconditional jump can hardly impact performances. I also belive this patch to be backward compatible so the code can still be compiled with older versions of gcc/as (this is untested though).

What the patch does not handle at the moment is the PPC64 architecture (I'm not even sure this arch is affected by this bug).

in reply to:  14 comment:15 by Michael Niedermayer, 14 years ago

Resolution: fixed
Status: openclosed

Patch applied
i assume this fixes this ticket

Thanks for the patch

Note: See TracTickets for help on using tickets.