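Tips on debugging a segmentation fault when no leaks are found

Asked: 2011-12-19 22:19:50

Tags: c debugging memory-management segmentation-fault valgrind

I wrote a C-based application that seems to run fine, except when it is given very large datasets as input.

With large inputs, I get a segmentation fault in the final stages of the binary's run.

I ran the binary (with a test input) under valgrind: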

valgrind --tool=memcheck --leak-check=yes /foo/bar/baz inputDataset > outputAnalysis

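This job normally takes a few hours, but under valgrind it took seven days.

Unfortunately, at this point I don't know how to read the results I got from this run.

I get lots of warnings like this one: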

...
==4074== Conditional jump or move depends on uninitialised value(s)                                                                                                                  
==4074==    at 0x435900: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x439CC5: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x400BF2: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x402086: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x402A0F: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x41684F: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x4001B8: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x7FEFFFF57: ???                                                                                                                                                      
==4074==  Uninitialised value was created                                                                                                                                            
==4074==    at 0x461D3A: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x43F926: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x416B9B: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x416725: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x4001B8: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x7FEFFFF57: ???
...

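There is no hint of which part of the code is involved, no variable names, and so on. What can I do with this information?

At the end I get the following error but, as with the smaller datasets that do not crash, valgrind finds no leaks: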

...
==4074== Process terminating with default action of signal 11 (SIGSEGV)                                                                                                              
==4074==  Access not within mapped region at address 0x7158E7F7                                                                                                                      
==4074==    at 0x7158E7F7: ???                                                                                                                                                       
==4074==    by 0x4020B8: ??? (in /foo/bar/baz)                                                                                   
==4074==    by 0x6322203A22656D6E: ???                                                                                                                                               
==4074==    by 0x306C675F6E557267: ???                                                                                                                                               
==4074==    by 0x202C22373232302F: ???                                                                                                                                               
==4074==    by 0x6D616E656C696621: ???                                                                                                                                               
==4074==    by 0x72686322203A2264: ???                                                                                                                                               
==4074==    by 0x3030306C675F6E54: ???                                                                                                                                               
==4074==    by 0x346469702E373231: ???                                                                                                                                               
==4074==    by 0x646469662E34372F: ???                                                                                                                                               
==4074==    by 0x722E64616568656B: ???                                                                                                                                               
==4074==    by 0x63656D6F6C756764: ???                                                                                                                                               
==4074==  If you believe this happened as a result of a stack                                                                                                                        
==4074==  overflow in your program's main thread (unlikely but                                                                                                                       
==4074==  possible), you can try to increase the size of the                                                                                                                         
==4074==  main thread stack using the --main-stacksize= flag.                                                                                                                        
==4074==  The main thread stack size used in this run was 10485760.                                                                                                                  
==4074==                                                                                                                                                                             
==4074== HEAP SUMMARY:                                                                                                                                                               
==4074==     in use at exit: 0 bytes in 0 blocks                                                                                                                                     
==4074==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated                                                                                                                    
==4074==                                                                                                                                                                             
==4074== All heap blocks were freed -- no leaks are possible                                                                                                                         
==4074==                                                                                                                                                                             
==4074== For counts of detected and suppressed errors, rerun with: -v                                                                                                                
==4074== ERROR SUMMARY: 1603141870 errors from 86 contexts (suppressed: 0 from 0)
Segmentation fault

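Everything I allocate space for has an equivalent free statement, after which I set the pointer to NULL.

At this point, how can I best debug this application to determine what else is causing the segmentation fault?


22 December 2011 - Edit

I compiled a debug version of my binary, called debug-binary, using the following compile flags: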

-D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -DUSE_ZLIB -g -O0 -Wformat -Wall -pedantic -std=gnu99

When I run it under valgrind, I don't get much more information:

valgrind -v --tool=memcheck --leak-check=yes --error-limit=no --track-origins=yes debug-binary input > output

Here is a snippet of the output:

==25116== 2 errors in context 14 of 14:                                                                                                                                                                                                      
==25116== Invalid read of size 4                                                                                                                                                                                                             
==25116==    at 0x4045E8: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x40682F: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x404F0C: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x401FA4: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x402016: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x403B27: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x40295E: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)                                                                                                                                                                           
==25116==  Address 0x539f188 is 24 bytes inside a block of size 48 free'd                                                                                                                                                                    
==25116==    at 0x4A05D21: free (vg_replace_malloc.c:325)                                                                                                                                                                                    
==25116==    by 0x401F6B: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x402016: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x403B27: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x40295E: ??? (in /foo/bar/debug-binary)                                                                                                                                 
==25116==    by 0x31A021D993: (below main) (in /lib64/libc-2.5.so) 

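Is this a problem with my binary, or with the system library (libc) that my application depends on?

I also don't know how to interpret the ??? entries. Is there another compile flag I need so that valgrind provides more information?

3 Answers:

Answer 0: (score: 6)

Valgrind is basically saying that it found no noteworthy heap-management problems. The program is segfaulting because of a less sophisticated programming error.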

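If it were me, I would:

  • compile with gcc -g,
  • enable core dump files (ulimit -c unlimited),
  • run the program normally,
  • let it crash,
  • use gdb to inspect the core file and see what it was doing when it faulted:

    gdb (programfile) (corefile)
    bt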
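If changing the shell's limit is awkward (for example, when the job is started by a scheduler), the same limit can be raised from inside the program. The following is only a sketch under that assumption: enable_core_dumps is a hypothetical helper name, and it relies on the POSIX getrlimit/setrlimit calls with RLIMIT_CORE.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit. This matches
 * `ulimit -c unlimited` only when the hard limit is itself unlimited.
 * Returns 0 on success, -1 on failure (a message goes to stderr). */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    lim.rlim_cur = lim.rlim_max;   /* the soft limit may rise up to the hard limit */
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling enable_core_dumps() once at the top of main() should be enough. Whether a core file is actually written, and where, still depends on system settings outside the program.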

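Answer 1: (score: 4)

I don't believe valgrind can find all of the errors where you overrun values on the stack (without running off the stack itself). So you may want to try gcc's -fstack-protector-all option.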
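To illustrate the kind of bug meant here: the frame addresses in the SIGSEGV backtrace from the question (for example 0x6322203A22656D6E) consist entirely of printable ASCII bytes, which is at least consistent with string data having been written over a stack frame. A minimal sketch of a copy that refuses to overrun a fixed-size local buffer might look like this; copy_bounded is a hypothetical helper, not something taken from the asker's code.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, including the terminating '\0'.
 * Returns 0 on success and -1 if src is too long; in that case dst is
 * set to the empty string (when dst_size > 0). An unchecked
 * strcpy(dst, src) would instead write past the end of dst and into
 * whatever lies next to it on the stack. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return -1;
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);   /* len + 1 copies the terminator too */
    return 0;
}
```

With a local array such as char name[8], a call like copy_bounded(name, sizeof name, input) reports an over-long input through its return value instead of silently corrupting neighbouring stack data.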

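You should also try mudflap, with -fmudflap (single-threaded) or -fmudflapth (multi-threaded).

Both mudflap and the stack protector should be faster than valgrind.

Also, it looks like you don't have debug symbols, which makes the backtraces hard to read. Add -ggdb. You probably also want to enable core file generation (try ulimit -c unlimited). That way, you can try to debug the process after the crash using gdb program core.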

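As @wallyk points out, your segfault may actually be easy to find: for example, you may be dereferencing NULL, and gdb can point you to the exact line (or at least close to it, unless you compile with {{1}}). That would make sense if, for example, you simply run out of memory with the larger datasets, so malloc returns NULL, and you forgot to check for it somewhere.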
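A minimal sketch of the kind of check that is easy to forget, assuming allocations are funnelled through one helper (checked_alloc is a hypothetical name, not part of any library):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate count * size bytes. Returns NULL, with a message on stderr,
 * if the byte count would overflow size_t or if malloc() fails.
 * The caller must still test the result before using it. */
void *checked_alloc(size_t count, size_t size)
{
    void *p;

    if (size != 0 && count > SIZE_MAX / size) {
        fprintf(stderr, "checked_alloc: %zu * %zu overflows size_t\n",
                count, size);
        return NULL;
    }
    p = malloc(count * size);
    if (p == NULL && count != 0 && size != 0)
        fprintf(stderr, "checked_alloc: out of memory (%zu bytes)\n",
                count * size);
    return p;
}
```

The overflow test matters with very large inputs: if count * size wraps around, malloc can succeed with a block far smaller than intended, and the crash then shows up much later, at the first write past its end.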

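Finally, if nothing else makes sense, there is always the possibility of a hardware problem. But those are expected to be fairly random; for example, different values get corrupted on different runs. If you try another machine and it happens there too, a hardware problem is extremely unlikely.

Answer 2: (score: 2)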

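"Conditional jump or move depends on uninitialised value(s)" is a serious error that you need to fix. It says that the program's behaviour is affected by the contents of an uninitialised variable (which includes an uninitialised region of memory returned by malloc()).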
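As an illustration only (byte_histogram is a hypothetical function, not taken from the question), here is a sketch in which the choice between malloc() and calloc() decides whether this warning can appear:

```c
#include <stddef.h>
#include <stdlib.h>

/* Count how often each byte value occurs in data[0..n-1].
 * calloc() zero-fills the 256 counters. With malloc() they would start
 * with indeterminate contents, and a later branch on any of them is
 * the kind of use memcheck reports as depending on uninitialised
 * values. The caller frees the result; NULL means allocation failed. */
size_t *byte_histogram(const unsigned char *data, size_t n)
{
    size_t *counts = calloc(256, sizeof *counts);
    size_t i;

    if (counts == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        counts[data[i]]++;
    return counts;
}
```

The same reasoning applies to local variables and struct fields: give each one a value before the first read, rather than relying on whatever happens to be in memory.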

To get a readable backtrace from valgrind, you need to compile with -g.