Strange gdb backtrace

Asked: 2011-03-13 15:17:09

Tags: c gdb coredump backtrace

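My program is statically linked against dietlibc. It was built on Ubuntu x64 (using the -m32 flag to target x86) and runs on x86.

The compiled binary is only about 100 KB. I compile it with -ggdb3 and no optimization flags.

My program uses signal.h to handle the SIGSEGV signal, and the handler then calls abort().

The program runs for days without problems, but occasionally it segfaults. This is the strange backtrace I get, which I don't understand: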

username@ubuntu:~/Desktop$ gdb -c core.28569 program-name
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=i386-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from program-name...done.
[New Thread 28569]
Core was generated by `program-name'.
Program terminated with signal 6, Aborted.
#0  0x00914410 in __kernel_vsyscall ()
Setting up the environment for debugging gdb.
Function "internal_error" not defined.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
Function "info_command" not defined.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
.gdbinit:8: Error in sourced command file:
Argument required (one or more breakpoint numbers).
(gdb) bt
#0  0x00914410 in __kernel_vsyscall ()
During symbol reading, incomplete CFI data; unspecified registers (e.g., eax) at 0x914411.
#1  0x0804d7f4 in __unified_syscall ()
#2  0xbf8966c0 in ?? ()
#3  <signal handler called>
#4  0x2054454e in ?? ()
#5  0x20524c43 in ?? ()
#6  0x2e352e33 in ?? ()
#7  0x32373033 in ?? ()
#8  0x2e203b39 in ?? ()
#9  0x2054454e in ?? ()
#10 0x20524c43 in ?? ()
#11 0x2e302e33 in ?? ()
#12 0x32373033 in ?? ()
#13 0x4d203b39 in ?? ()
#14 0x61696465 in ?? ()
#15 0x6e654320 in ?? ()
#16 0x20726574 in ?? ()
#17 0x36204350 in ?? ()
#18 0x203b302e in ?? ()
#19 0x54454e2e in ?? ()
#20 0x43302e34 in ?? ()
#21 0x00000029 in ?? ()
#22 0xbf8989a8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) bt full
#0  0x00914410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x0804d7f4 in __unified_syscall ()
No symbol table info available.
#2  0xbf8966c0 in ?? ()
No symbol table info available.
#3  <signal handler called>
No symbol table info available.
#4  0x2054454e in ?? ()
No symbol table info available.
#5  0x20524c43 in ?? ()
No symbol table info available.
#6  0x2e352e33 in ?? ()
No symbol table info available.
#7  0x32373033 in ?? ()
No symbol table info available.
#8  0x2e203b39 in ?? ()
No symbol table info available.
#9  0x2054454e in ?? ()
No symbol table info available.
#10 0x20524c43 in ?? ()
No symbol table info available.
#11 0x2e302e33 in ?? ()
No symbol table info available.
#12 0x32373033 in ?? ()
No symbol table info available.
#13 0x4d203b39 in ?? ()
No symbol table info available.
#14 0x61696465 in ?? ()
No symbol table info available.
#15 0x6e654320 in ?? ()
No symbol table info available.
#16 0x20726574 in ?? ()
No symbol table info available.
#17 0x36204350 in ?? ()
No symbol table info available.
#18 0x203b302e in ?? ()
No symbol table info available.
#19 0x54454e2e in ?? ()
No symbol table info available.
#20 0x43302e34 in ?? ()
No symbol table info available.
#21 0x00000029 in ?? ()
No symbol table info available.
#22 0xbf8989a8 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit

2 answers:

Answer 0 (score: 16)

This is a stack overrun: something wrote past the end of a buffer on the stack.

#4  0x2054454e in ?? ()

looks like text: " TEN" if you read the hex digits left to right, or "NET " in little-endian memory order

#5  0x20524c43 in ?? ()

" RLC" or "CLR ", by the same reading

and so on.

Treat the addresses as text and see whether you can identify where this text is overwriting the stack.
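To make "treat the addresses as text" concrete, here is a minimal sketch in C that unpacks each bogus frame address into the four bytes it occupied in memory on little-endian x86. The function names and the `bogus_frames` array are made up for this illustration; the values are frames #4 through #21 copied from the backtrace in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one 32-bit "return address" into the 4 bytes it occupied in memory
 * on a little-endian machine (lowest byte first). Non-printable bytes are
 * shown as '.'. */
void word_to_ascii(uint32_t word, char out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((word >> (8 * i)) & 0xff);
        out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char)byte : '.';
    }
}

/* Decode `count` words into `out`, which must hold 4 * count + 1 bytes. */
void frames_to_ascii(const uint32_t *words, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++)
        word_to_ascii(words[i], out + 4 * i);
    out[4 * count] = '\0';
}

/* Frames #4 through #21 from the backtrace in the question. */
const uint32_t bogus_frames[] = {
    0x2054454e, 0x20524c43, 0x2e352e33, 0x32373033, 0x2e203b39,
    0x2054454e, 0x20524c43, 0x2e302e33, 0x32373033, 0x4d203b39,
    0x61696465, 0x6e654320, 0x20726574, 0x36204350, 0x203b302e,
    0x54454e2e, 0x43302e34, 0x00000029,
};
```

Calling `frames_to_ascii(bogus_frames, 18, buf)` with a 73-byte `buf` yields `NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)...`. That resembles the tail of a browser User-Agent string, so a fixed-size stack buffer that receives such a header would be a reasonable first place to look. This is a guess from the data, not something the backtrace proves.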

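Answer 1 (score: 6)

Your stack trace is actually quite easy to understand:

  • you had a SIGSEGV somewhere,
  • your signal handler did whatever it does, then called abort(),
  • which called raise(3),
  • which issued the system call through __unified_syscall().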
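The chain above also explains why the core file reports "signal 6, Aborted" rather than SIGSEGV. The following is a minimal sketch, not the asker's code: `on_segv` and `terminating_signal_of_crashing_child` are hypothetical names, and `raise(SIGSEGV)` stands in for the real invalid memory access. It forks a child that installs such a handler and reports which signal actually killed it.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler mirroring the question: react to SIGSEGV, then abort(). */
static void on_segv(int signo)
{
    (void)signo;
    abort();             /* raises SIGABRT, so the process dies with signal 6 */
}

/* Fork a child that takes SIGSEGV with the handler installed. Returns the
 * signal that terminated the child, or -1 on error or normal exit. */
int terminating_signal_of_crashing_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: disable core files so the demo leaves nothing behind. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);

        signal(SIGSEGV, on_segv);
        raise(SIGSEGV);  /* stand-in for the real invalid memory access */
        _exit(0);        /* not reached if the handler aborts */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The function returns SIGABRT, not SIGSEGV: by the time the core is written, the faulting frame sits underneath the signal frame, the handler, abort(), and the syscall wrapper, which is why the unwinder has to get through all of those correctly before it reaches the interesting part.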

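The reason you don't get a usable stack trace in GDB is that

  • __unified_syscall is implemented in assembly,
  • it doesn't use a frame pointer, and
  • it has no proper CFI directives describing how to unwind from it.

I would consider this a bug in dietlibc, and one that is actually easy to fix. See whether this (untested) patch fixes it for you: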

--- dietlibc-0.31/i386/unified.S.orig   2011-03-13 10:16:23.000000000 -0700
+++ dietlibc-0.31/i386/unified.S    2011-03-13 10:21:32.000000000 -0700
@@ -31,8 +31,14 @@ __unified_syscall:
    movzbl  %al, %eax
 .L1:
    push    %edi
+        cfi_adjust_cfa_offset (4)
+        cfi_rel_offset (edi, 0)
    push    %esi
+        cfi_adjust_cfa_offset (4)
+        cfi_rel_offset (esi, 0)
    push    %ebx
+        cfi_adjust_cfa_offset (4)
+        cfi_rel_offset (ebx, 0)
    movl    %esp,%edi
    /* we use movl instead of pop because otherwise a signal would
       destroy the stack frame and crash the program, although it
@@ -61,8 +67,11 @@ __unified_syscall:
 #endif
 .Lnoerror:
    pop %ebx
+        cfi_adjust_cfa_offset (-4)
    pop %esi
+        cfi_adjust_cfa_offset (-4)
    pop %edi
+        cfi_adjust_cfa_offset (-4)

 /* here we go and "reuse" the return for weak-void functions */
 #include "dietuglyweaks.h"

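If you can't rebuild dietlibc, or if the patch turns out to be incorrect, you may still be able to analyze the stack trace better. As far as I can tell, __unified_syscall doesn't touch %ebp. So you may be able to get a reasonable stack trace by doing this: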

define xbt
  set $xbp = (void **)$arg0
  while 1
    x/2a $xbp
    set $xbp = (void **)$xbp[0]
  end
end

xbt $ebp
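For readers unfamiliar with what xbt is following, here is a rough C sketch of the same walk over a simulated stack image. It assumes the conventional x86 frame layout, where the word at the frame pointer holds the caller's saved %ebp and the next word holds the return address. `struct stack_image` and `walk_frame_chain` are invented for this illustration. Unlike the gdb macro, which loops until a memory read fails, the sketch stops explicitly when the chain leaves the image or stops moving toward higher addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* A simulated 32-bit stack: words[i] lives at address base + 4 * i. */
struct stack_image {
    uint32_t base;
    const uint32_t *words;
    size_t count;
};

/* Follow the saved-%ebp chain starting at frame pointer `fp`.
 * At each frame, word 0 is the caller's saved %ebp and word 1 is the
 * return address (the same two words that `x/2a $xbp` prints).
 * Stores up to `max` return addresses in `out` and returns how many
 * were found. */
size_t walk_frame_chain(const struct stack_image *img, uint32_t fp,
                        uint32_t *out, size_t max)
{
    size_t found = 0;
    while (found < max) {
        if (fp < img->base || (fp - img->base) % 4 != 0)
            break;                          /* outside image or misaligned */
        size_t idx = (fp - img->base) / 4;
        if (idx + 1 >= img->count)
            break;                          /* frame would run off the end */
        out[found++] = img->words[idx + 1]; /* return address */
        uint32_t next = img->words[idx];    /* caller's saved %ebp */
        if (next <= fp)                     /* callers must live higher up */
            break;
        fp = next;
    }
    return found;
}
```

The `next <= fp` check mirrors GDB's own "previous frame inner to this frame" sanity test: on a downward-growing stack, a caller's frame can never sit at a lower address than its callee's.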

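Note: if xbt works, it will likely go off into the weeds around the SIGSEGV signal frame (that frame doesn't use a frame pointer either). This may produce complete garbage, or it may skip a frame or two (which happen to be exactly the frames where the SIGSEGV occurred).

So you really would be much better off getting correct unwind descriptors into dietlibc.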