C++ Boost MPI & threading - serialization error: address not mapped

Asked: 2012-05-17 21:36:51

Tags: c++ serialization boost mpi boost-mpi

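I'm stumped. all_gather works for primitives (e.g. int), but fails even for simple STL containers. valgrind claims the container is not allocated/initialized, but that does not seem right.

Summary:

  • I use OpenMP for multithreading, then rejoin the threads.
  • In the serial region, I try to all_gather a simple std::map across the MPI ranks using `boost::mpi::all_gather`. (There are 2 MPI ranks, each with 4 threads.)
  • Then I intend to do more (isolated) multithreading.

Seems simple enough... what could be going wrong here?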

main.cpp

#include <openmpi/mpi.h>
#include <omp.h>
#include <boost/mpi.hpp>    
#include "globals.h"

int main(int argc, char* argv[])
{        

    int provided_MPI;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided_MPI );

    boost::mpi::environment my_boost_mpi_env(argc, argv);
    boost::mpi::communicator world_MPI_boost;        
    world_MPI_boost_ptr = &world_MPI_boost;
        // ^^^ global variable of type   boost::mpi::communicator *

    perform_complete_variable_elimination_schedule();
    //...

}

Conn_Comp.cpp

#include <boost/mpi.hpp>    
#include <boost/mpi/collectives.hpp>
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/map.hpp>

#include "globals.h"

...

void perform_complete_variable_elimination_schedule()
{

    // isolated work in parallel using OpenMP
    #pragma omp parallel
    { 
    //work
    }    

    // SERIAL REGION (with respect to threading).

    std::map<uint,uint> my_map;
    std::vector< std::map<uint,uint> >   vec_of_my_maps;

    boost::mpi::all_gather<    std::map<uint,uint>    >
                     (*world_MPI_boost_ptr,
                      my_map,
                      vec_of_my_maps);  //  <--- line 293 (referenced by valgrind)


    // more isolated work in parallel using OpenMP
    #pragma omp parallel
    { 
    //work
    }

}

valgrind complains that the all_gather into the vector causes invalid reads. But the map is created right before the all_gather call, so it is clearly in scope and not inside the parallel threaded region. Selected valgrind error output:

==12665== Use of uninitialised value of size 4
==12665==    at 0x41C8D7A: boost::archive::detail::basic_iarchive::get_library_version() const (basic_iarchive.cpp:575)
==12665==    by 0x41C92C6: boost::archive::detail::basic_iarchive::load_object(void*, boost::archive::detail::basic_iserializer const&) (basic_iarchive.cpp:399)
==12665==    by 0x80F5696: void boost::mpi::all_gather<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > >(boost::mpi::communicator const&, std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > const&, std::vector<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >, std::allocator<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > > >&) (iserializer.hpp:387)
==12665==    by 0x80DEC83: Conn_Comp::perform_complete_variable_elimination_schedule() (Conn_Comp.cpp:**293**)
==12665==    by 0x80C840A: main (main.cpp:695)
==12665==
==12665== Invalid read of size 2
==12665==    at 0x41C8D7A: boost::archive::detail::basic_iarchive::get_library_version() const (basic_iarchive.cpp:575)
==12665==    by 0x41C92C6: boost::archive::detail::basic_iarchive::load_object(void*, boost::archive::detail::basic_iserializer const&) (basic_iarchive.cpp:399)
==12665==    by 0x80F5696: void boost::mpi::all_gather<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > >(boost::mpi::communicator const&, std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > const&, std::vector<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >, std::allocator<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > > >&) (iserializer.hpp:387)
==12665==    by 0x80DEC83: Conn_Comp::perform_complete_variable_elimination_schedule() (main.cpp:**293**)
==12665==    by 0x80C840A: main (main.cpp:695)
==12665==  Address 0x3580bece is not stack'd, malloc'd or (recently) free'd
==12665==
[drosphila:12665] *** Process received signal ***
[drosphila:12665] Signal: Segmentation fault (11)
[drosphila:12665] Signal code: Address not mapped (1)
[drosphila:12665] Failing at address: 0x3580bece
[drosphila:12665] [ 0] /lib/i686/cmov/libpthread.so.0(+0xe500) [0x44f8500]
[drosphila:12665] [ 1] /usr/lib/libboost_serialization.so.1.42.0(_ZN5boost7archive6detail14basic_iarchive11load_objectEPvRKNS1_17basic_iserializerE+0x1b7) [0x41c92c7]
[drosphila:12665] [ 2] ./detect_NAHR(_ZN5boost3mpi10all_gatherISt3mapIjjSt4lessIjESaISt4pairIKjjEEEEEvRKNS0_12communicatorERKT_RSt6vectorISD_SaISD_EE+0x587) [0x80f5697]
[drosphila:12665] [ 3] ./detect_NAHR(_ZN9Conn_Comp46perform_complete_variable_elimination_scheduleEv+0x534) [0x80dec84]
[drosphila:12665] [ 4] ./detect_NAHR(main+0xf5b) [0x80c840b]
[drosphila:12665] [ 5] /lib/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x4519ca6]
[drosphila:12665] [ 6] ./detect_NAHR() [0x80c73e1]
[drosphila:12665] *** End of error message ***

I use MPI_Init_thread, as recommended by the Boost help page.

As I said at the top, if I use primitives (i.e. uint) instead of maps, then all_gather works fine. Why would a map fail? Boost already knows how to serialize STL containers, so that should not be the problem...

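Also note that the vector holding all the values is automatically resized inside all_gather to hold everything (I checked the Boost implementation of all_gather). In any case, even if I size and initialize it myself, it still fails.

Finally, even if I use a plain old array (properly allocated) for the output, I run into the same problem.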

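1 answer:

Answer 0 (score: 2)

Well, this is embarrassing. I'll leave the question up in case anyone else runs into the same strange error.

The problem with my code was actually in the makefile. I forgot to link against the Boost libraries for MPI.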

Incorrect makefile flags:

-I$(BOOST_INCLUDE)     -lboost_serialization   -lboost_mpi 

Apparently that line contains just enough information for the program to compile and run, but it leads to runtime errors.

Corrected makefile flags:

-L$(BOOST_LIB) -ldl -Wl,-rpath,$(BOOST_LIB) -lboost_serialization -lboost_mpi

(Note the added library link flags: -L$(BOOST_LIB), -ldl and -Wl,-rpath,$(BOOST_LIB).)
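For anyone debugging something similar: the backtrace in the question shows /usr/lib/libboost_serialization.so.1.42.0 being loaded, so one plausible reading (an assumption on my part, not something the trace proves) is that the headers under $(BOOST_INCLUDE) were paired with a different Boost release found on the default library path. A rough way to check is to compare the version in the loaded library's file name with the BOOST_VERSION value from the headers, which is encoded as major * 100000 + minor * 100 + patch. The helper names below are hypothetical and only illustrate the comparison:

```cpp
#include <cstdio>
#include <string>

// Encode a release the same way <boost/version.hpp> encodes BOOST_VERSION:
// major * 100000 + minor * 100 + patch.
long encode_boost_version(int major, int minor, int patch) {
    return major * 100000L + minor * 100L + patch;
}

// Hypothetical helper: read the "X.Y.Z" that follows ".so." in a shared
// library path such as "/usr/lib/libboost_serialization.so.1.42.0".
// Returns -1 when no three-part version is found.
long boost_version_from_soname(const std::string& path) {
    const std::string marker = ".so.";
    const std::string::size_type pos = path.rfind(marker);
    if (pos == std::string::npos) return -1;

    int major = 0, minor = 0, patch = 0;
    const char* tail = path.c_str() + pos + marker.size();
    if (std::sscanf(tail, "%d.%d.%d", &major, &minor, &patch) != 3) return -1;
    return encode_boost_version(major, minor, patch);
}

// Hypothetical helper: true when the header version (the BOOST_VERSION value
// printed by the compiled program) matches the library that was loaded.
bool headers_match_library(long header_version, const std::string& so_path) {
    return header_version == boost_version_from_soname(so_path);
}
```

To use it, print BOOST_VERSION from the program that crashes, take the library path from the backtrace or from ldd, and pass both to headers_match_library. A mismatch would suggest that the -L and -rpath flags are not pointing at the Boost build the headers came from.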