GDB: debugging a core dump whose call stack is missing symbol table information

Date: 2017-01-29 13:37:02

Tags: c++ linux gdb coredump

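I've got this weird crash and I have no idea how to debug the core dump, because the call stack is missing symbol information for some reason, except for the last function: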

#0  BIH::intersectRay<VMAP::MapRayCallback> (this=0x7f47b8339608, r=..., intersectCallback=..., maxDist=@0x7f493af8383c: 0, stopAtFirst=true, los=<optimized out>) at ../BIH.h:223
#1  0x000000307ff00000 in ?? ()
#2  0x7ff0000000000000 in ?? ()
#3  0x0000000000000030 in ?? ()
#4  0x000000307ff00000 in ?? ()
#5  0x7ff0000000000000 in ?? ()
#6  0x0000000000000030 in ?? ()
#7  0x000000307ff00000 in ?? ()
#8  0x7ff0000000000000 in ?? ()
#9  0x0000000000000030 in ?? ()
#10 0x000000307ff00000 in ?? ()
#11 0x7ff0000000000000 in ?? ()
#12 0x0000000000000030 in ?? ()
#13 0x000000307ff00000 in ?? ()
#14 0x7ff0000000000000 in ?? ()
#15 0x0000000000000030 in ?? ()
#16 0x000000307ff00000 in ?? ()
#17 0x7ff0000000000000 in ?? ()
#18 0x0000000000000030 in ?? ()
#19 0x000000307ff00000 in ?? ()
#20 0x7ff0000000000000 in ?? ()
#21 0x0000000000000030 in ?? ()
#22 0x000000307ff00000 in ?? ()
....
#749 0x7ff0000000000000 in ?? ()
#750 0x0000000000000030 in ?? ()
#751 0x000000307ff00000 in ?? ()
#752 0x7ff0000000000000 in ?? ()
#753 0x0000000000000030 in ?? ()
#754 0x000000307ff00000 in ?? ()
#755 0x7ff0000000000000 in ?? ()
#756 0x0000000000000030 in ?? ()
#757 0x000000307ff00000 in ?? ()
#758 0x7ff0000000000000 in ?? ()
#759 0x0000000000000030 in ?? ()
#760 0x000000307ff00000 in ?? ()
#761 0x7ff0000000000000 in ?? ()
#762 0x0000000000000030 in ?? ()
#763 0x000000307ff00000 in ?? ()
#764 0x03010102464c457f in ?? ()
#765 0x0000000000000000 in ?? ()


(gdb) info frame 0
Stack frame at 0x7f493af83830:
 rip = 0x930f0b in BIH::intersectRay<VMAP::MapRayCallback> (../BIH.h:223); saved rip = 0x307ff00000
 called by frame at 0x7f493af83838
 source language c++.
 Arglist at 0x7f493af83438, args: this=0x7f47b8339608, r=..., intersectCallback=..., maxDist=@0x7f493af8383c: 0, stopAtFirst=true, los=<optimized out>
 Locals at 0x7f493af83438, Previous frame's sp is 0x7f493af83830
 Saved registers:
  rbx at 0x7f493af837f8, rbp at 0x7f493af83800, r12 at 0x7f493af83808, r13 at 0x7f493af83810, r14 at 0x7f493af83818, r15 at 0x7f493af83820, rip at 0x7f493af83828

#1  0x000000307ff00000 in ?? ()
No symbol table info available.
(gdb) info frame 1
Stack frame at 0x7f493af83838:
 rip = 0x307ff00000; saved rip = 0x7ff0000000000000
 called by frame at 0x7f493af83840, caller of frame at 0x7f493af83830
 Arglist at 0x7f493af83828, args:
 Locals at 0x7f493af83828, Previous frame's sp is 0x7f493af83838
 Saved registers:
  rip at 0x7f493af83830

#2  0x7ff0000000000000 in ?? ()
No symbol table info available.
(gdb) info frame 2
Stack frame at 0x7f493af83840:
 rip = 0x7ff0000000000000; saved rip = 0x30
 called by frame at 0x7f493af83848, caller of frame at 0x7f493af83838
 Arglist at 0x7f493af83830, args:
 Locals at 0x7f493af83830, Previous frame's sp is 0x7f493af83840
 Saved registers:
  rip at 0x7f493af83838

#3  0x0000000000000030 in ?? ()
No symbol table info available.
(gdb) info frame 3
Stack frame at 0x7f493af83848:
 rip = 0x30; saved rip = 0x307ff00000
 called by frame at 0x7f493af83850, caller of frame at 0x7f493af83840
 Arglist at 0x7f493af83838, args:
 Locals at 0x7f493af83838, Previous frame's sp is 0x7f493af83848
 Saved registers:
  rip at 0x7f493af83840

#4  0x000000307ff00000 in ?? ()
No symbol table info available.
(gdb) info frame 4
Stack frame at 0x7f493af83850:
 rip = 0x307ff00000; saved rip = 0x7ff0000000000000
 called by frame at 0x7f493af83858, caller of frame at 0x7f493af83848
 Arglist at 0x7f493af83840, args:
 Locals at 0x7f493af83840, Previous frame's sp is 0x7f493af83850
 Saved registers:
  rip at 0x7f493af83848

The code is compiled with -g -fvar-tracking -O2 -march=native.

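I have dumps from various other crashes, and in all of them the symbol table works and gives a relevant call stack and information, but for some reason this particular crash is a mystery.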

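One thing I noticed is that the same address values repeat over and over. Could this be an infinite loop, or some recursion that corrupted or overflowed the stack? If so, is there any way to get the topmost functions in the call stack (for example, any way to get beyond frame #765, or to find the functions that were called before the overflow was triggered)?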

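I can't set $sp or jump to any address, because I can't debug and step through a live program; I can only analyze the core dump.
I can't reproduce this crash; it happens in production from time to time. Valgrind is not possible either.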

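Are there any g++ compiler options or gdb flags that could help me with this? Any pointers on how to debug this kind of problem are appreciated (if it is possible at all).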

1 answer:

Answer 0 (score: 2)

  

> I have no idea how to debug the core dump, because the call stack is missing symbol information for some reason

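Part 1:

The most common cause of such a nonsensical call stack is a mismatch between the binary that produced the core dump and the binary used to analyze it.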

If you use --build-id at link time, or your GCC is configured to pass that linker flag by default, you can verify whether the binary matches the core using this procedure:

    readelf -n /path/to/binary

This should produce output similar to:

    $ readelf -n /bin/sleep

    Displaying notes found at file offset 0x00000254 with length 0x00000020:
      Owner                 Data size       Description
      GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
        OS: Linux, ABI: 2.6.24

    Displaying notes found at file offset 0x00000274 with length 0x00000024:
      Owner                 Data size       Description
      GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
        Build ID: c266a51e4b85b16ca17bff8328f3abeafb577b29

The build-id string c266a51e4b85b16ca17bff8328f3abeafb577b29 is the output you care about. Assuming your binary has one, install the elfutils package, then use

    eu-unstrip -n --core /path/to/core

to see which binaries were in use when the core dump was produced.

The output should look like this:

    $ eu-unstrip -n --core /tmp/core
    0x400000+0x208000 c266a51e4b85b16ca17bff8328f3abeafb577b29@0x400284 - - [exe]
    0x7ffca5721000+0x1000 9c7cbcf6c957d8fc8e55b45a3c7a1556b38a3097@0x7ffca5721340 . - linux-vdso.so.1
    0x7f491ad5a000+0x2241c8 d0f537904076d73f29e4a37341f8a449e2ef6cd0@0x7f491ad5a1d8 /lib64/ld-linux-x86-64.so.2 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.19.so ld-linux-x86-64.so.2
    0x7f491a995000+0x3c42c0 cf699a15caae64f50311fc4655b86dc39a479789@0x7f491a995280 /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.19.so libc.so.6

Above you can see that this core dump was in fact produced by /bin/sleep: the build-id on the [exe] line is the same one that readelf reported.

If the build-id of the executable recorded in the core does not match your binary, you need to find the binary whose build-id matches your core before you can extract a correct crash stack trace in GDB.
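The comparison described above can be automated. The following is a minimal Python sketch, not part of the original answer. It assumes the output formats shown in the samples: `readelf -n` prints a `Build ID:` line, and `eu-unstrip -n` lists the main executable with the module name `[exe]` in the last column. The function names are hypothetical.

```python
import re
import subprocess


def build_id_from_readelf(notes_text):
    """Return the build ID found in `readelf -n` output, or None if absent."""
    match = re.search(r"Build ID:\s*([0-9a-fA-F]+)", notes_text)
    return match.group(1).lower() if match else None


def exe_build_id_from_unstrip(modules_text):
    """Return the build ID of the module named [exe] in `eu-unstrip -n` output."""
    for line in modules_text.splitlines():
        fields = line.split()
        # Assumed layout: START+SIZE BUILDID@ADDR FILE DEBUGFILE MODULENAME
        if len(fields) >= 5 and fields[-1] == "[exe]":
            return fields[1].split("@", 1)[0].lower()
    return None


def binary_matches_core(notes_text, modules_text):
    """True only if both build IDs are present and identical."""
    binary_id = build_id_from_readelf(notes_text)
    core_id = exe_build_id_from_unstrip(modules_text)
    return binary_id is not None and binary_id == core_id


def check_files(binary_path, core_path):
    """Run both tools on real files (requires binutils and elfutils)."""
    notes = subprocess.run(["readelf", "-n", binary_path],
                           capture_output=True, text=True, check=True).stdout
    modules = subprocess.run(["eu-unstrip", "-n", "--core", core_path],
                             capture_output=True, text=True, check=True).stdout
    return binary_matches_core(notes, modules)
```

Calling `check_files("/path/to/binary", "/path/to/core")` returns `False` when either build ID is missing, so a binary linked without --build-id is reported as "not verified" rather than as a match.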

Part 2:

If the binary does match the core, then the stack is most likely simply corrupted (for example by a stack buffer overflow).
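One way to check whether the repeating "return addresses" in the backtrace look like data rather than code is to reinterpret them. The Python sketch below is an illustration added here, not part of the original answer. It assumes a little-endian x86-64 target (the register names in the `info frame` output suggest this) and takes the values from the backtrace in the question. Read that way, 0x7ff0000000000000 is the IEEE-754 bit pattern of +infinity, and the bytes of 0x03010102464c457f begin with the ELF magic `\x7fELF`. Both are consistent with gdb walking over plain data instead of genuine frames, though this is a hint, not proof.

```python
import math
import struct


def as_double(word):
    """Reinterpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]


def as_bytes(word):
    """Return the eight bytes of the word in little-endian memory order."""
    return struct.pack("<Q", word)


# Values copied from the backtrace in the question.
REPEATING = [0x000000307FF00000, 0x7FF0000000000000, 0x0000000000000030]
LAST_NONZERO = 0x03010102464C457F

for word in REPEATING + [LAST_NONZERO]:
    print(f"{word:#018x}  double={as_double(word)!r}  bytes={as_bytes(word)!r}")
```

If values like these repeat with a fixed period on the stack, a likely candidate is an array of small structs written past the end of a local buffer, which is the kind of bug the tools discussed next are meant to catch.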

  

> Valgrind is not possible either.

Valgrind is exceptionally weak at detecting stack corruption anyway.

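The current state of the art for debugging this kind of problem is Address Sanitizer, which is also much faster than Valgrind and may be fast enough to run in production.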

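If the sanitized binary is not fast enough for production, you may be able to set it up so that it processes some subset of the inputs in "shadow mode" (the binary runs, but its output is discarded). Any effort you put into such a setup will likely uncover around 10 new bugs and save you a lot of future debugging effort.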