Valgrind stack trace misses a function completely

Time: 2013-05-28 11:08:36

Tags: c++ c memory-leaks valgrind

I have two C files:

a.c

void main(){
    ...
    getvtable()->function();
}

The vtable points to a function that is located in b.c:

void function(){
    malloc(42);
}
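
For readers who want something compilable, the two fragments above can be folded into one translation unit roughly as follows. This is only a sketch of the setup being described: `struct vtable`, `call_count` and `call_through_vtable` are hypothetical names added for illustration, and the parts elided with `...` in a.c are left out rather than guessed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table of function pointers standing in for the vtable. */
struct vtable {
    void (*function)(void);
};

/* Counts calls so that the indirect call is observable (illustration only). */
int call_count;

/* b.c side: allocates 42 bytes and never frees them (the leak under study). */
void function(void)
{
    call_count++;
    malloc(42);   /* result deliberately discarded, as in the question */
}

/* Returns the single table instance. */
const struct vtable *getvtable(void)
{
    static const struct vtable table = { function };
    return &table;
}

/* a.c side: the call that main() makes, minus the elided parts. */
int call_through_vtable(void)
{
    const struct vtable *vt = getvtable();
    assert(vt->function != NULL);
    vt->function();
    return call_count;
}
```

Calling `call_through_vtable()` from `main` and running the result under valgrind gives a small program on which the reported stack can be compared with what GDB shows.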

Now, if I trace the program in valgrind, I get the following:

==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994==    at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994==    by 0x40A24D2: (below main) (libc-start.c:226)

So the call to function is completely omitted from the stack! How is this possible? If I use GDB, the correct stack including "function" is shown.

Debug symbols are included. Linux, 32-bit.

UPD:

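In response to the first answer: I get the following output when debugging through valgrind's GDB server. The breakpoint is never hit, whereas it is hit when I debug with GDB directly.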

stasik@gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
 eip = 0x404efcb; saved eip 0x4f2f544c
 called by frame at 0x42348a4
 Arglist at 0x4234898, args:
 Locals at 0x4234898, Previous frame's sp is 0x42348a0
 Saved registers:
  ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
 eip = 0x4f2f544c; saved eip 0x6e492056
 called by frame at 0x42348a8, caller of frame at 0x42348a0
 Arglist at 0x423489c, args:
 Locals at 0x423489c, Previous frame's sp is 0x42348a4
 Saved registers:
  eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
 eip = 0x6e492056; saved eip 0x205d6f66
 called by frame at 0x42348ac, caller of frame at 0x42348a4
 Arglist at 0x42348a0, args:
 Locals at 0x42348a0, Previous frame's sp is 0x42348a8
 Saved registers:
  eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
 eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
 called by frame at 0x42348b0, caller of frame at 0x42348a8
 Arglist at 0x42348a4, args:
 Locals at 0x42348a4, Previous frame's sp is 0x42348ac
 Saved registers:
  eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
 eip = 0x61746144; saved eip 0x65736162
 called by frame at 0x42348b4, caller of frame at 0x42348ac
 Arglist at 0x42348a8, args:
 Locals at 0x42348a8, Previous frame's sp is 0x42348b0
 Saved registers:
  eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
 eip = 0x65736162; saved eip 0x70616d20
 called by frame at 0x42348b8, caller of frame at 0x42348b0
 Arglist at 0x42348ac, args:
 Locals at 0x42348ac, Previous frame's sp is 0x42348b4
 Saved registers:
  eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
 eip = 0x70616d20; saved eip 0x2e646570
 called by frame at 0x42348bc, caller of frame at 0x42348b4
 Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
 Locals at 0x42348b0, Previous frame's sp is 0x42348b8
 Saved registers:
  eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
 eip = 0x2e646570; saved eip 0x0
 called by frame at 0x42348c0, caller of frame at 0x42348b8
 Arglist at 0x42348b4, args:
 Locals at 0x42348b4, Previous frame's sp is 0x42348bc
 Saved registers:
  eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
 eip = 0x0; saved eip 0x0
 caller of frame at 0x42348bc
 Arglist at 0x42348b8, args:
 Locals at 0x42348b8, Previous frame's sp is 0x42348c0
 Saved registers:
  eip at 0x42348bc
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.

3 answers:

Answer 0 (score: 5)

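I think there are two possible reasons:

  • Valgrind is using a different stack unwinding method than GDB.
  • The address space layout differs when the program runs in the two environments, and you only hit stack corruption under Valgrind.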

We can gain more insight by using Valgrind's built-in gdbserver.

Save this Python snippet to thread-frames.py:

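import gdb

# Walk from the innermost frame outwards and print "info frame" for each one.
f = gdb.newest_frame()
while f is not None:
    f.select()
    gdb.execute('info frame')
    f = f.older()  # None once the outermost frame has been passed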

t.gdb

set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit

v.gdb

set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit

(Change MY-PROGRAM and function in the scripts above, and in the commands below, as needed.)

Get detailed information about the stack frames under GDB:

$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbffff310
 source language c.
 Arglist at 0xbffff2e8, args: 
 Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
 Saved registers:
  ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
 caller of frame at 0xbffff2f0
 source language c.
 Arglist at 0xbffff2f8, args: 
 Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
 Saved registers:
  ebp at 0xbffff2f8, eip at 0xbffff30c

Get the same data under Valgrind:

$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM

In another shell:

$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbe88e2e0
 source language c.
 Arglist at 0xbe88e2b8, args: 
 Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
 Saved registers:
  ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
 caller of frame at 0xbe88e2c0
 source language c.
 Arglist at 0xbe88e2c8, args: 
 Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
 Saved registers:
  ebp at 0xbe88e2c8, eip at 0xbe88e2dc

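If GDB can successfully unwind the stack when connected to "valgrind --vgdb", then the problem lies in Valgrind's stack unwinding algorithm. You can inspect the "info frame" output carefully for inlined and tail-call frames, or anything else that could throw Valgrind off. Otherwise it is probably stack corruption.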

Answer 1 (score: 5)

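OK, compiling all the .so parts and the main program with an explicit -O0 seems to solve the problem. It seems that some of the optimizations in the 'core' program that loads the .so were breaking the stack (the .so itself was always compiled unoptimized).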

Answer 2 (score: 2)

This is tail-call optimization in action.

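The function function calls malloc as the last thing it does. The compiler sees this and discards the stack frame of function before it calls malloc. The advantage is that when malloc returns, it returns directly to whichever function called function. That is, it avoids malloc returning to function only to hit yet another return instruction.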
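
To make the difference concrete, here is a minimal sketch contrasting a call in tail position with one that is not. The names `function_tail`, `function_keeps_frame`, `last_block` and `run_both` are hypothetical and appear nowhere in the question. Whether the compiler actually turns the first call into a jump depends on the compiler and the optimization flags in use.

```c
#include <assert.h>
#include <stdlib.h>

/* Most recent block from function_keeps_frame(); volatile so the store
   below cannot be optimized away. */
static void *volatile last_block;

/* malloc is the final action here. With optimization enabled the compiler
   may jump to malloc instead of calling it, so this function's frame can
   already be gone while malloc runs. */
void *function_tail(void)
{
    return malloc(42);
}

/* Work remains after malloc returns (the volatile store), so the call is
   not in tail position and this frame should still be on the stack while
   malloc runs. */
void *function_keeps_frame(void)
{
    void *p = malloc(42);
    last_block = p;
    return p;
}

/* Calls both variants once, then releases the blocks again. */
int run_both(void)
{
    void *a = function_tail();
    void *b = function_keeps_frame();
    int ok = (a != NULL) && (b != NULL);
    assert(last_block == b);   /* the store after malloc really happened */
    free(a);
    free(b);
    return ok;
}
```

If the two allocations are leaked instead of freed and the optimized build is run under valgrind, one would expect the second variant to be the one whose name shows up in the loss record.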

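In this case the optimization has saved an unnecessary jump and made stack usage slightly more efficient, which is nice. In the case of a recursive tail call, however, this optimization is a huge win, because it turns the recursion into something more like iteration.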
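
As a rough illustration of that last point, the sketch below shows a tail-recursive function next to the loop that an optimizing compiler can effectively reduce it to. The names `sum_to_tail`, `sum_to_loop` and `check_sum_forms` are made up for this example, and the loop is only a model of the transformation, not actual compiler output.

```c
#include <assert.h>

/* Tail-recursive form: the recursive call is the last action, so an
   optimizing compiler may reuse the current frame instead of pushing
   a new one for every step. */
unsigned long sum_to_tail(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to_tail(n - 1, acc + n);   /* call in tail position */
}

/* Roughly how the optimized code behaves: a loop whose stack usage
   does not grow with n. */
unsigned long sum_to_loop(unsigned long n)
{
    unsigned long acc = 0;
    while (n != 0) {
        acc += n;
        n--;
    }
    return acc;
}

/* Sanity check that both forms compute the same value. */
void check_sum_forms(unsigned long n)
{
    assert(sum_to_tail(n, 0) == sum_to_loop(n));
}
```

Without the optimization, `sum_to_tail` needs one stack frame per step, so a large enough `n` can exhaust the stack. With it, a debugger or Valgrind sees a single frame no matter how deep the logical recursion is.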

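As you have already discovered, disabling optimization makes debugging much easier. If you want to debug optimized code (perhaps for performance testing), then, as @Zang MingJie already said, you can disable just this optimization with -fno-optimize-sibling-calls.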