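I'm stumped. all_gather works for primitives (e.g. int), but fails for even simple STL containers. valgrind claims the container was not allocated/initialized, but that does not appear to be right.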
Summary:
A simple all_gather of a std::map using boost::mpi::all_gather. The gather is across MPI ranks, not threads (there are 2 MPI ranks, each with 4 threads). It seems simple... what could be going wrong?
main.cpp
#include <openmpi/mpi.h>
#include <omp.h>
#include <boost/mpi.hpp>
#include "globals.h"

int main(int argc, char* argv[])
{
    int provided_MPI;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided_MPI);
    boost::mpi::environment my_boost_mpi_env(argc, argv);
    boost::mpi::communicator world_MPI_boost;
    world_MPI_boost_ptr = &world_MPI_boost;
    // ^^^ global variable of type boost::mpi::communicator *
    perform_complete_variable_elimination_schedule();
    //...
}
Conn_Comp.cpp
#include <boost/mpi.hpp>
#include <boost/mpi/collectives.hpp>
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/map.hpp>
#include "globals.h"
...
void perform_complete_variable_elimination_schedule()
{
    // isolated work in parallel using OpenMP
    #pragma omp parallel
    {
        //work
    }

    // SERIAL REGION (with respect to threading).
    std::map<uint,uint> my_map;
    std::vector< std::map<uint,uint> > vec_of_my_maps;
    boost::mpi::all_gather< std::map<uint,uint> >
        (*world_MPI_boost_ptr,
         my_map,
         vec_of_my_maps); // <--- line 293 (referenced by valgrind)

    // more isolated work in parallel using OpenMP
    #pragma omp parallel
    {
        //work
    }
}
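valgrind complains that the vector of maps causes an invalid read. But this vector is created before the all_gather call, so it is clearly in scope and not inside a parallel threaded region.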
Selected valgrind error output:
==12665== Use of uninitialised value of size 4
==12665== at 0x41C8D7A: boost::archive::detail::basic_iarchive::get_library_version() const (basic_iarchive.cpp:575)
==12665== by 0x41C92C6: boost::archive::detail::basic_iarchive::load_object(void*, boost::archive::detail::basic_iserializer const&) (basic_iarchive.cpp:399)
==12665== by 0x80F5696: void boost::mpi::all_gather<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > >(boost::mpi::communicator const&, std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > const&, std::vector<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >, std::allocator<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > > >&) (iserializer.hpp:387)
==12665== by 0x80DEC83: Conn_Comp::perform_complete_variable_elimination_schedule() (Conn_Comp.cpp:**293**)
==12665== by 0x80C840A: main (main.cpp:695)
==12665==
==12665== Invalid read of size 2
==12665== at 0x41C8D7A: boost::archive::detail::basic_iarchive::get_library_version() const (basic_iarchive.cpp:575)
==12665== by 0x41C92C6: boost::archive::detail::basic_iarchive::load_object(void*, boost::archive::detail::basic_iserializer const&) (basic_iarchive.cpp:399)
==12665== by 0x80F5696: void boost::mpi::all_gather<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > >(boost::mpi::communicator const&, std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > const&, std::vector<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >, std::allocator<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > > >&) (iserializer.hpp:387)
==12665== by 0x80DEC83: Conn_Comp::perform_complete_variable_elimination_schedule() (main.cpp:**293**)
==12665== by 0x80C840A: main (main.cpp:695)
==12665== Address 0x3580bece is not stack'd, malloc'd or (recently) free'd
==12665==
[drosphila:12665] *** Process received signal ***
[drosphila:12665] Signal: Segmentation fault (11)
[drosphila:12665] Signal code: Address not mapped (1)
[drosphila:12665] Failing at address: 0x3580bece
[drosphila:12665] [ 0] /lib/i686/cmov/libpthread.so.0(+0xe500) [0x44f8500]
[drosphila:12665] [ 1] /usr/lib/libboost_serialization.so.1.42.0(_ZN5boost7archive6detail14basic_iarchive11load_objectEPvRKNS1_17basic_iserializerE+0x1b7) [0x41c92c7]
[drosphila:12665] [ 2] ./detect_NAHR(_ZN5boost3mpi10all_gatherISt3mapIjjSt4lessIjESaISt4pairIKjjEEEEEvRKNS0_12communicatorERKT_RSt6vectorISD_SaISD_EE+0x587) [0x80f5697]
[drosphila:12665] [ 3] ./detect_NAHR(_ZN9Conn_Comp46perform_complete_variable_elimination_scheduleEv+0x534) [0x80dec84]
[drosphila:12665] [ 4] ./detect_NAHR(main+0xf5b) [0x80c840b]
[drosphila:12665] [ 5] /lib/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x4519ca6]
[drosphila:12665] [ 6] ./detect_NAHR() [0x80c73e1]
[drosphila:12665] *** End of error message ***
I am using MPI_Init_thread, as recommended by the Boost help page.
As I said at the top, if I use primitives (i.e. uint) instead of maps, all_gather works fine. Why would maps fail? Boost serialization already has methods for serializing STL containers, so that is not the issue...
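Also note that the vector holding all the values is automatically resized inside all_gather (I checked the implementation of all_gather) to fit everything. In any case, even if I size it myself beforehand, it still fails.
Finally, even if I use a plain old array (properly allocated) for the output, I run into the same problem.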
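Answer 0 (score: 2)
The problem with my code was actually in the makefile. I forgot to link against the right Boost libraries for MPI (the library search path was missing).
Incorrect makefile flags: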
-I$(BOOST_INCLUDE) -lboost_serialization -lboost_mpi
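Apparently that line contains just enough information for the program to compile and start running, but it leads to errors at run time. With no -L flag, -lboost_serialization and -lboost_mpi are resolved from the default library search path; the backtrace in the question shows /usr/lib/libboost_serialization.so.1.42.0 being loaded, which may not match the headers found via $(BOOST_INCLUDE).
Corrected makefile flags: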
-L$(BOOST_LIB) -ldl -Wl,-rpath,$(BOOST_LIB) -lboost_serialization -lboost_mpi
(Note the added library search path and rpath flags.)
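Since the backtrace in the question shows libboost_serialization.so.1.42.0 being loaded from /usr/lib, one way to check whether the headers and the linked library come from the same Boost release is to print the version the headers report and compare it with the version in the library's file name. The sketch below only decodes an integer in the BOOST_VERSION encoding (major * 100000 + minor * 100 + patch); format_boost_version is a hypothetical helper name used for illustration, not part of Boost.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): converts an integer in the
// BOOST_VERSION encoding (major * 100000 + minor * 100 + patch)
// into a dotted string such as "1.42.0".
std::string format_boost_version(int encoded)
{
    const int major = encoded / 100000;        // leading digits
    const int minor = (encoded / 100) % 1000;  // middle three digits
    const int patch = encoded % 100;           // last two digits
    return std::to_string(major) + "." +
           std::to_string(minor) + "." +
           std::to_string(patch);
}
```

In the real program this would presumably be called as format_boost_version(BOOST_VERSION) after including <boost/version.hpp>, and the result compared with the suffix of the Boost libraries listed by ldd ./detect_NAHR. A mismatch would suggest that the -I and -L paths point at different Boost installations.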