I finally got my program running on a cluster with MPI, but as soon as I scale it beyond 7 tasks I hit a segmentation fault. I went back to the very basic program below and still get the segfault above 7 cores. For debugging, I allocated 8 cores on the cluster for myself so I can work interactively instead of submitting long-running jobs. I see the problem with both the Intel compilers and GCC. I expected it to break at 8 processors, but it's 7, which strikes me as odd; then again, the whole thing strikes me as odd. Any ideas why the code breaks beyond a certain number of allocated cores (all on the same node, I might add)?
On the cluster I use the following commands:

$ salloc -n 8
$ enable_lmod
$ module load icc/19 impi/19 libstdcxx/4
$ mpiicpc -std=c++11 -o MPItest test.cpp
$ mpiexec.hydra ./MPItest
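To bracket the failure, the task count can also be pinned explicitly and core dumps enabled; a minimal sketch, assuming hydra's standard -n flag and a bash shell:

$ ulimit -c unlimited           # allow a core file to be written on SIGSEGV
$ mpiexec.hydra -n 7 ./MPItest  # vary the count to find where it first breaks
$ mpiexec.hydra -n 8 ./MPItest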
#include "mpi.h"
#include <stdio.h>
int main(int argc, char** argv)
{
int numtasks, rank, dest, source, rc, count, tag=1;
double inmsg, outmsg=20.0;
MPI_Status Stat;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
{
printf("Number of processors: %d \n",numtasks);
for (int i=1; i<numtasks;i++)
{
dest = i;
outmsg+=outmsg;
rc = MPI_Send(&outmsg, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
}
MPI_Barrier(MPI_COMM_WORLD);
for (int i=1; i<numtasks;i++)
{
source = i;
rc = MPI_Recv(&inmsg, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &Stat);
rc = MPI_Get_count(&Stat, MPI_DOUBLE, &count);
printf("Task %d: Received %d double(s), %f, from task %d with tag %d \n",
rank, count,inmsg, Stat.MPI_SOURCE, Stat.MPI_TAG);
}
}
else
{
dest = 0;
source = 0;
rc = MPI_Recv(&inmsg, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &Stat);
rc = MPI_Get_count(&Stat, MPI_DOUBLE, &count);
printf("Task %d: Received %d double(s), %f, from task %d with tag %d \n",
rank, count,inmsg, Stat.MPI_SOURCE, Stat.MPI_TAG);
rc = MPI_Send(&inmsg, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}
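Since the default MPI error handler aborts the whole job on any MPI-reported error, one way to separate an MPI-level failure from a genuine segfault in user code is to switch MPI_COMM_WORLD to MPI_ERRORS_RETURN and actually inspect the rc values. A minimal standalone sketch using the standard MPI error-handling API (this will not catch an actual SIGSEGV, but it makes MPI errors visible instead of letting them abort silently):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    /* Return error codes to the caller instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int rc = MPI_Barrier(MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS)
    {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "Rank %d: MPI_Barrier failed: %s\n", rank, msg);
    }

    MPI_Finalize();
    return 0;
}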