Question

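If a core file loaded in gdb has a missing or corrupted library, how do I isolate it?
I have also read that a thread can overwrite its own stack. How do I detect that?

How can I isolate these problems using the following bt?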

/etc/gdb/gdbinit:105: Error in sourced command file:
Error while executing Python code.
Reading symbols from /opt/hsp/bin/addrman...done.

warning: Corrupted shared library list: 0x0 != 0x7c8d48ea8948c089

warning: Corrupted shared library list: 0x0 != 0x4ed700

warning: no loadable sections found in added symbol-file system-supplied DSO at
0x7ffd50ff6000
Core was generated by `addrman --notification-socket
/opt/hsp/sockets/memb_notify.socket'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004759e4 in ps_locktrk_info::lktrk_locker_set (this=0x348,
locker_ip=<optimized out>) at ./ps/ps_lock_track.h:292
292     ./ps/ps_lock_track.h: No such file or directory.
(gdb) bt
#0  0x00000000004759e4 in ps_locktrk_info::lktrk_locker_set (this=0x348,
locker_ip=<optimized out>) at ./ps/ps_lock_track.h:292
#1  0x0000000000000000 in ?? ()

Answer 1

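It looks like the core file is corrupted, probably because of heap or stack corruption. Such corruption is usually the result of a buffer overflow or some other undefined behavior.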

If you are running on Linux, I would try valgrind. It can often find the corruption quickly. Windows has some similar tools.

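Yes, a multithreaded application can overflow a stack. Each thread is allocated only a limited amount of stack space. Overflowing it usually happens only if you have a very deep function call chain or allocate large local objects on the stack.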

There is some interesting information about setting the stack size of a Linux application here and here.
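To make the per-thread stack limit concrete, here is a minimal sketch, assuming Linux with glibc. The helper names `report_stack_size` and `run_with_stack_size` are made up for illustration and are not from the original program; `pthread_getattr_np` is a non-portable GNU extension. It starts a thread with an explicitly requested stack size and reports the size that thread actually sees:

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Thread body (hypothetical helper): stores the stack size of the calling
// thread, as reported by the GNU extension pthread_getattr_np(), in *out.
static void* report_stack_size(void* out) {
    pthread_attr_t attr;
    std::size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        void* addr = nullptr;
        pthread_attr_getstack(&attr, &addr, &size);
        pthread_attr_destroy(&attr);
    }
    *static_cast<std::size_t*>(out) = size;
    return nullptr;
}

// Hypothetical helper: starts a thread with the requested stack size and
// returns the stack size that thread observed, or 0 if setup failed.
std::size_t run_with_stack_size(std::size_t requested_bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;

    std::size_t observed = 0;
    pthread_t tid;
    bool ok = pthread_attr_setstacksize(&attr, requested_bytes) == 0 &&
              pthread_create(&tid, &attr, report_stack_size, &observed) == 0;
    pthread_attr_destroy(&attr);
    if (!ok) return 0;

    int rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return observed;
}
```

Calling `run_with_stack_size(4 * 1024 * 1024)` requests a 4 MiB stack. The exact value reported can differ slightly from the request because of rounding and guard-page accounting, so treat it as approximate.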

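Faced with your problem, I would:

Check all callers of the lktrk_locker_set method. If possible, investigate each one carefully to see whether there is an obvious stack overflow or heap corruption
Try Valgrind or a similar tool to find the problem
Add debug logging to isolate the problem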
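One way to combine the last two suggestions is to surround a suspect buffer with sentinel ("canary") bytes and check them at logging points. The following is only a sketch of that idea, not something from the original program: `GuardedBuffer` and its members are hypothetical names, and the guard width and the 0xA5 pattern are arbitrary choices. The guards and the payload live in one array, so the simulated overrun in the usage note stays well-defined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical debugging aid: a payload of N bytes with Guard sentinel bytes
// on each side. An overrun of the payload damages a sentinel, which
// intact() then reports.
template <std::size_t N, std::size_t Guard = 16>
class GuardedBuffer {
public:
    GuardedBuffer() {
        storage_.fill(canary());        // paint everything with the sentinel
        std::memset(data(), 0, N);      // then clear only the payload
    }

    // Pointer to the usable payload, located between the two guard regions.
    unsigned char* data() { return storage_.data() + Guard; }
    static constexpr std::size_t size() { return N; }

    // True if no sentinel byte before or after the payload was modified.
    bool intact() const {
        for (std::size_t i = 0; i < Guard; ++i) {
            if (storage_[i] != canary()) return false;
            if (storage_[Guard + N + i] != canary()) return false;
        }
        return true;
    }

    // Debug-build check to call from logging points.
    void assert_intact() const { assert(intact()); }

private:
    static constexpr unsigned char canary() { return 0xA5; }
    std::array<unsigned char, N + 2 * Guard> storage_;
};
```

For example, after `GuardedBuffer<32> buf;` a write to `buf.data()[buf.size()]` lands in the trailing guard and makes `buf.intact()` return false. A check like this only catches overruns that hit the guard bytes, and an overwrite that happens to store the sentinel value goes unnoticed, so it complements Valgrind rather than replacing it.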

Answer 2

warning: Corrupted shared library list: 0x0 != 0x7c8d48ea8948c089

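The error above usually means that you have supplied GDB with different system libraries (or a different main binary) from the ones in use when the core dump was generated.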

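Either you are analyzing a "production" core dump on a development machine, or you upgraded the system libraries between the time the core dump was produced and the time you are analyzing it, or you rebuilt the main binary.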

If one of the above is true, see this answer.
