Until now I have been working with openmpi/1.10.2 and gcc/5.3.0, and my code has been running fine. The cluster I work on changed its MPI implementation to cray-mpich/7.5.0 (still with gcc/5.3.0), and I ran into the following error.
The compiler optimizes the local variables (idx, displ, blocks and types) out, so the debugger reports them as <optimized out>. All the arrays are allocated with size == 2.
#include <mpi.h>
#include <vector>
#include <iostream>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  int size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank); // passing the references
  MPI_Comm_size(MPI_COMM_WORLD, &size); // passing the references

  std::vector<int> mIntegers(0);
  std::vector<double> mFloats(2);

  if (rank == 0)
  {
    mFloats[0] = 1.0;
    mFloats[1] = 1.0;
  }

  int ioRank = 0;
  int nBlocks = 0;
  if (mIntegers.size() > 0)
  {
    nBlocks++;
  }
  if (mFloats.size() > 0)
  {
    nBlocks++;
  }

  int idx = 0;
  MPI_Aint displ[nBlocks];
  int blocks[nBlocks];
  MPI_Datatype types[nBlocks];
  MPI_Aint element;

  // Create integer part
  if (mIntegers.size() > 0)
  {
    MPI_Get_address(mIntegers.data(), &element);
    displ[idx] = element;
    blocks[idx] = mIntegers.size();
    types[idx] = MPI_INT;
    idx++;
  }

  // Create floats part
  if (mFloats.size() > 0)
  {
    MPI_Get_address(mFloats.data(), &element);
    displ[idx] = element;
    blocks[idx] = mFloats.size();
    types[idx] = MPI_DOUBLE;
    idx++;
  }

  MPI_Datatype paramType;
  // Create MPI datatype
  MPI_Type_create_struct(nBlocks, blocks, displ, types, &paramType);
  // Commit MPI datatype
  MPI_Type_commit(&paramType);

  // Broadcast the information
  MPI_Bcast(MPI_BOTTOM, 1, paramType, ioRank, MPI_COMM_WORLD);
  MPI_Barrier(MPI_COMM_WORLD);

  std::cout << "Process:" << rank << " of " << size << " F[0] " << mFloats[0]
            << ", F[1] " << mFloats[1] << std::endl;

  // Free the datatype
  MPI_Type_free(&paramType);

  MPI_Finalize();
  return 0;
}
I have tried initializing the arrays with new, setting them to zero, and using std::vector, to avoid over-optimization or a memory leak, without any success.
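For reference, a minimal sketch of what the std::vector variant of the bookkeeping arrays might look like (this is only my reconstruction of the attempt described above, not the exact code that was tried): the variable-length arrays displ, blocks and types become vectors and their .data() pointers are handed to MPI_Type_create_struct, while the rest of the logic stays the same.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  std::vector<double> mFloats(2, 0.0);   // zero-initialized, as described above

  // Bookkeeping arrays as vectors instead of VLAs (assumed variant).
  std::vector<MPI_Aint>     displ;
  std::vector<int>          blocks;
  std::vector<MPI_Datatype> types;

  MPI_Aint element;
  MPI_Get_address(mFloats.data(), &element);
  displ.push_back(element);
  blocks.push_back(static_cast<int>(mFloats.size()));
  types.push_back(MPI_DOUBLE);

  MPI_Datatype paramType;
  MPI_Type_create_struct(static_cast<int>(blocks.size()),
                         blocks.data(), displ.data(), types.data(),
                         &paramType);
  MPI_Type_commit(&paramType);

  // Same absolute-address broadcast as in the original code.
  MPI_Bcast(MPI_BOTTOM, 1, paramType, 0, MPI_COMM_WORLD);

  MPI_Type_free(&paramType);
  MPI_Finalize();
  return 0;
}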
The code is compiled with:
$mpic++ -O2 segFault.cpp -o segFault
and executed with:
$mpirun -n 16 segFault
As a result, MPI_Bcast causes a segmentation fault due to a memory allocation mismatch.
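One thing to try is a minimal sketch of the relative-displacement approach, under the assumption that the problem is tied to combining MPI_BOTTOM with absolute addresses: take the address of the first buffer as a base, store displacements relative to it, and pass that base pointer to MPI_Bcast instead of MPI_BOTTOM. The names base and addr are only illustrative; this shows the idea, it is not a confirmed fix for cray-mpich/7.5.0.

#include <mpi.h>
#include <vector>
#include <iostream>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  std::vector<double> mFloats(2);
  if (rank == 0)
  {
    mFloats[0] = 1.0;
    mFloats[1] = 1.0;
  }

  // Base address: the first (and here only) registered buffer.
  MPI_Aint base, addr;
  MPI_Get_address(mFloats.data(), &base);
  MPI_Get_address(mFloats.data(), &addr);

  MPI_Aint displ[1]     = { addr - base };                       // relative displacement (0 here)
  int blocks[1]         = { static_cast<int>(mFloats.size()) };
  MPI_Datatype types[1] = { MPI_DOUBLE };

  MPI_Datatype paramType;
  MPI_Type_create_struct(1, blocks, displ, types, &paramType);
  MPI_Type_commit(&paramType);

  // Broadcast from the base pointer instead of MPI_BOTTOM.
  MPI_Bcast(mFloats.data(), 1, paramType, 0, MPI_COMM_WORLD);

  std::cout << "Process:" << rank << " F[0] " << mFloats[0]
            << ", F[1] " << mFloats[1] << std::endl;

  MPI_Type_free(&paramType);
  MPI_Finalize();
  return 0;
}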
MPICH defines MPI_BOTTOM and MPIR_F08_MPI_BOTTOM as
#define MPI_BOTTOM (void *)0
extern int MPIR_F08_MPI_BOTTOM;
whereas Open MPI defines MPI_BOTTOM as
#define MPI_BOTTOM ((void *) 0) /* base reference address */
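Since both definitions reduce to a null base pointer, the bytes MPI_Bcast actually touches are MPI_BOTTOM plus the absolute displacements returned by MPI_Get_address. A small diagnostic sketch one could run to sanity-check this on the cray-mpich build (the printed values are implementation- and run-dependent; this is only a debugging aid, not a fix):

#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  std::vector<double> mFloats(2);

  // Absolute address of the buffer, exactly as it would be used as a displacement.
  MPI_Aint addr;
  MPI_Get_address(mFloats.data(), &addr);

  // Shows whether MPI_Aint is wide enough to hold a pointer on this system
  // and what the absolute displacement looks like on each rank.
  std::printf("rank %d: sizeof(MPI_Aint)=%zu sizeof(void*)=%zu addr=%lld ptr=%p\n",
              rank, sizeof(MPI_Aint), sizeof(void*),
              static_cast<long long>(addr),
              static_cast<void*>(mFloats.data()));

  MPI_Finalize();
  return 0;
}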