I'm writing parallel code in C++ on my OS X (Snow Leopard) laptop, and I'm trying to debug it with memchecker. I have successfully built OpenMPI with Valgrind support:
configure --prefix=/opt/openmpi-1.4.3/ --enable-debug --enable-memchecker --with-valgrind=/opt/valgrind-3.6.0/ FFLAGS=-m64 F90FLAGS=-m64
(Ignore the Fortran flags; they are there because my Fortran compiler comes from GCC.)
When I run my application with
mpirun -np 2 valgrind --suppressions=/opt/openmpi-1.4.3/share/openmpi/openmpi-valgrind.supp --leak-check=yes --dsymutil=yes ./program
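I get a lot of warnings from Valgrind (most of them from the heap summary at the end). I've listed a small selection of them below. What I gather is that Valgrind is detecting memory leaks and uninitialised values inside the MPI library, which I'm not interested in; I only want the warnings from the code I wrote. I've already run Valgrind with the suppression file that OpenMPI provides, but apparently that isn't enough. How can I easily ignore all the other warnings that come from within the OpenMPI distribution? Is there an OpenMPI debug suppression file for Valgrind on OS X somewhere, or do you know any crafty tricks?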
The first warning is:
==1531== Syscall param writev(vector[...]) points to uninitialised byte(s)
==1531== at 0x1014E16E2: writev (in /usr/lib/libSystem.B.dylib)
==1531== by 0x101AEA4C5: mca_oob_tcp_peer_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
==1531== by 0x101AF0B88: mca_oob_tcp_send_nb (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
==1531== by 0x101AC7F48: orte_rml_oob_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
==1531== by 0x101AC8AA1: orte_rml_oob_send_buffer (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
==1531== by 0x101B3489E: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100001AF2: main (main.cpp:34)
==1531== Address 0x101a8911b is 107 bytes inside a block of size 256 alloc'd
==1531== at 0x10002DB2D: realloc (vg_replace_malloc.c:525)
==1531== by 0x1012240B6: opal_dss_buffer_extend (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
==1531== by 0x101225CF7: opal_dss_copy_payload (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
==1531== by 0x101B347CA: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100001AF2: main (main.cpp:34)
After execution, a short excerpt of the heap summary looks like this:
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,950 of 2,194
==1531== at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531== by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100077C96: create_comm (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x10007798A: ompi_attr_create_predefined (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000737CF: ompi_attr_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000A4840: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100001AF2: main (main.cpp:34)
...
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,952 of 2,194
==1531== at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531== by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1065ACFE6: ???
==1531== by 0x10658867B: ???
==1531== by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100179985: mca_io_base_file_select (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100089D55: ompi_file_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000E1ED1: MPI_File_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,953 of 2,194
==1531== at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531== by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1065A6210: ???
==1531== by 0x106597149: ???
==1531== by 0x106596AAB: ???
==1531== by 0x1065AD14C: ???
==1531== by 0x10658867B: ???
==1531== by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
Answer 0 (score: 2)
I can't speak to how Open MPI behaves under Valgrind, but MPICH2 should be better. If you don't specifically need Open MPI as your MPI implementation, you can easily configure MPICH2 to avoid problems with Valgrind.
Answer 1 (score: 1)
You can add extra suppressions for Valgrind yourself. These will handle the first set of warnings you posted:
{
ORTE OOB suppression rule
Memcheck:Param
writev(vector[...])
fun:writev
...
fun:mca_oob_tcp_peer_send
fun:mca_oob_tcp_send_nb
fun:orte_rml_oob_send
fun:orte_rml_oob_send_buffer
...
fun:ompi_mpi_init
}
{
OMPI init leak
Memcheck:Leak
fun:malloc
...
fun:ompi_mpi_init
}
{
OMPI init leak
Memcheck:Leak
fun:realloc
...
fun:ompi_mpi_init
}
{
OMPI init leak
Memcheck:Leak
fun:calloc
...
fun:ompi_mpi_init
}