Double free in Boost.Test

Date: 2018-04-09 12:13:18

Tags: c++ boost valgrind double-free

I'm running into a problem when executing Boost.Test test cases on a cluster. The error is: *** glibc detected *** ...myprogram.test: corrupted double-linked list: 0x000000000096b4d0 ***

Running valgrind on it gives me:

==9687== Invalid free() / delete / delete[] / realloc()
==9687==    at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==9687==    by 0x3A81035D2C: __cxa_finalize (in /lib64/libc-2.12.so) 
==9687==    by 0x721CD05: ??? (in /lib/libboost_unit_test_framework-gcc71-mt-d-1_65_1.so.1.65.1)
==9687==    by 0x72ABF9C: ??? (in /lib/libboost_unit_test_framework-gcc71-mt-d-1_65_1.so.1.65.1)
==9687==    by 0x3A81035991: exit (in /lib64/libc-2.12.so)
==9687==    by 0x3A8101ED23: (below main) (in /lib64/libc-2.12.so)   
==9687==  Address 0x9919d80 is 0 bytes inside a block of size 18 free'd
==9687==    at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==9687==    by 0x3A81035991: exit (in /lib64/libc-2.12.so)
==9687==    by 0x3A8101ED23: (below main) (in /lib64/libc-2.12.so)   

The stack trace from GDB looks like this:

#0  0x0000003a81032495 in raise () from /lib64/libc.so.6
#1  0x0000003a81033c75 in abort () from /lib64/libc.so.6
#2  0x0000003a810703a7 in __libc_message () from /lib64/libc.so.6
#3  0x0000003a81075dee in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003a810761f3 in malloc_consolidate () from /lib64/libc.so.6
#5  0x0000003a81078c18 in _int_free () from /lib64/libc.so.6
#6  0x00000000005feae8 in boost::checked_array_delete<char> (x=0x991a20 "\210\350\070\201:") at /include/boost-1_65_1/boost/core/checked_delete.hpp:41
#7  0x00000000005fbd21 in boost::scoped_array<char>::~scoped_array (this=0x94bd80, __in_chrg=<optimized out>) at /include/boost-1_65_1/boost/smart_ptr/scoped_array.hpp:69
#8  0x00000000005f9d36 in boost::execution_monitor::~execution_monitor (this=0x94bd60, __in_chrg=<optimized out>)
    at /include/boost-1_65_1/boost/test/execution_monitor.hpp:316
#9  0x00000000005fbd3c in boost::unit_test::unit_test_monitor_t::~unit_test_monitor_t (this=0x94bd60, __in_chrg=<optimized out>)
    at /include/boost-1_65_1/boost/test/unit_test_monitor.hpp:33
#10 0x0000003a81035992 in exit () from /lib64/libc.so.6
#11 0x0000003a8101ed24 in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000005f5b59 in _start ()

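This happens whenever an uncaught exception is thrown (including test failures), and on some other (currently unknown) occasions. The crash on exceptions, however, is 100% reproducible.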

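The program itself seems fine, because locally it runs without any such crash. So I assume this is caused by an incompatibility between some modules on the cluster.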

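To rule this out, I recompiled Boost and OpenBLAS, but I'm also using several other libraries that I'd rather not rebuild (it takes a lot of time) just to test each one. These are libSSH2, GPI2, and HDF5. They don't show up in ldd, so I assume they are statically linked (I'm not the author of the tests), and I think they are unlikely to cause the problem: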

    linux-vdso.so.1 =>
    libpthread.so.0 => /lib64/libpthread.so.0
    librt.so.1 => /lib64/librt.so.1
    libboost_filesystem-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_filesystem-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_program_options-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_program_options-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_coroutine-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_coroutine-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_context-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_context-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_iostreams-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_iostreams-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_regex-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_regex-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_thread-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_thread-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_date_time-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_date_time-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_chrono-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_chrono-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_atomic-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_atomic-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_system-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_system-gcc71-mt-d-1_65_1.so.1.65.1
    libboost_serialization-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_serialization-gcc71-mt-d-1_65_1.so.1.65.1
    libdl.so.2 => /lib64/libdl.so.2
    libssl.so.10 => /usr/lib64/libssl.so.10
    libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2
    libkrb5.so.3 => /lib64/libkrb5.so.3
    libcom_err.so.2 => /lib64/libcom_err.so.2
    libk5crypto.so.3 => /lib64/libk5crypto.so.3
    libresolv.so.2 => /lib64/libresolv.so.2
    libcrypto.so.10 => /usr/lib64/libcrypto.so.10
    libz.so.1 => /lib64/libz.so.1
    libstdc++.so.6 => /sw/global/compilers/gcc/7.1.0/lib64/libstdc++.so.6
    libm.so.6 => /lib64/libm.so.6
    libgcc_s.so.1 => /sw/global/compilers/gcc/7.1.0/lib64/libgcc_s.so.1
    libc.so.6 => /lib64/libc.so.6
    /lib64/ld-linux-x86-64.so.2
    libbz2.so.1 => /lib64/libbz2.so.1
    liblzma.so.0 => /usr/lib64/liblzma.so.0
    libicudata.so.42 => /usr/lib64/libicudata.so.42
    libicui18n.so.42 => /usr/lib64/libicui18n.so.42
    libicuuc.so.42 => /usr/lib64/libicuuc.so.42
    libkrb5support.so.0 => /lib64/libkrb5support.so.0
    libkeyutils.so.1 => /lib64/libkeyutils.so.1
    libselinux.so.1 => /lib64/libselinux.so.1

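From what I have found, I think the second free is the "correct" one, because it is the smart pointer releasing its memory. So the first delete is the wrong one, but it comes from inside exit, which doesn't help me.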

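How can I find out why and how the pointer gets freed twice? Note that I don't have root privileges on the cluster, so debug symbols for the GCC libraries are not available.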
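One way to narrow this down without root access or debug symbols might be to record releases yourself and report the second release of an address instead of letting it corrupt the heap. The following is only a minimal sketch of that bookkeeping: `ReleaseLog`, its member functions, and the owner labels are hypothetical names invented for illustration, not part of Boost, glibc, or valgrind. It is not wired into the allocator here; it assumes you would call it from the destructors under suspicion or from a replaced global `operator delete`.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Hypothetical helper (not a Boost or glibc API): remembers which owner
// released each address first, so a second release can be reported.
class ReleaseLog {
public:
    // Returns true on the first release of `p`. On a repeated release it
    // prints both owners to stderr and returns false.
    bool release(const void* p, const std::string& owner) {
        auto inserted = first_owner_.emplace(p, owner);
        if (inserted.second) {
            return true;
        }
        std::cerr << "double release of " << p << ": first by "
                  << inserted.first->second << ", again by " << owner << '\n';
        return false;
    }

    // Forget an address once the allocator has handed it out again.
    void reuse(const void* p) { first_owner_.erase(p); }

    // Number of addresses currently recorded as released.
    std::size_t tracked() const { return first_owner_.size(); }

private:
    std::map<const void*, std::string> first_owner_;
};

// Usage sketch: two owners claim the same 18-byte block, mirroring the two
// frees in the valgrind report. Nothing is actually freed here.
inline bool second_release_is_flagged() {
    ReleaseLog log;
    char block[18] = {};
    bool first = log.release(block, "destructor run from exit");
    bool second = log.release(block, "destructor run from __cxa_finalize");
    assert(first);
    return !second;
}
```

The owner strings are free-form labels, so they can carry whatever identifies the caller in your build, such as the name of the module that runs the destructor.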

The compiler used is GCC 7.1 with Boost 1.65.1, although I have also tried other Boost versions and GCC 5.3.

I reduced a test case to:

  • Linking against the library
  • BOOST_AUTO_TEST_CASE(...)
  • Throwing a std::runtime_error

So the problem lies in the library's static initialization / finalization.

1 answer:

Answer 0 (score: 0)

Are you using datasets (Data Driven Test Cases)?

If so, you may be running into https://svn.boost.org/trac10/ticket/13380

I ran into this before and analyzed it here: Boost's data-driven tests' join operator `+` corrupts first column