MPI ring leader election returns a segmentation fault

Time: 2017-06-07 11:36:01

Tags: c mpi distributed-computing mpich leader

This is what I am trying to achieve.

Blue is the message.
Yellow is when the specific node changes the leader known to it.
Green is the final election of each node.

[image not included]

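The code looks correct to me, but no matter what I try, it always gets stuck in the while loop. When run with a small number of nodes, it returns a segmentation fault after a while.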

election_status=0;
firstmsg[0]=world_rank;     // self rank
firstmsg[1]=0;              // counter for hops
chief=world_rank;           // each node declares himself as leader
counter=0;                  // message counter for each node

// each node sends the first message to the next one
MPI_Send(&firstmsg, 2, MPI_INT, (world_rank+1)%world_size, 1, MPI_COMM_WORLD);
printf("Sent ID with counter to the right node [%d -> %d]\n",world_rank, (world_rank+1)%world_size);

while (election_status==0){
    // EDIT: Split MPI_Recv for rank 0 and rest
    if (world_rank==0){
        MPI_Recv(&incoming, 2, MPI_INT, world_size-1, 1, MPI_COMM_WORLD, &status);
    }
    else {
        MPI_Recv(&incoming, 2, MPI_INT, (world_rank-1)%world_size, 1, MPI_COMM_WORLD, &status);
    }
    counter=counter+1;
    if (incoming[0]<chief){
        chief=incoming[0];
    }
    incoming[1]=incoming[1]+1;

    // if message is my own and hopped same times as counter
    if (incoming[0]==world_rank && incoming[1]==counter) {
        printf("Node %d declares node %d a leader.\n", world_rank, chief);  
        election_status=1;
    }
    //sends the incremented message to the next node
    MPI_Send(&incoming, 2, MPI_INT, (world_rank+1)%world_size, 1, MPI_COMM_WORLD);  
}

MPI_Finalize();

1 answer:

Answer 0: (score: 1)

To determine, on every rank, the minimum of a value held by multiple ranks, use MPI_Allreduce.
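As a minimal sketch of that suggestion, the program below lets every rank contribute its own rank number and receive the global minimum in one collective call. It assumes the node ID is simply the rank, as in the question, and it replaces the hand-written ring protocol entirely, so it only applies if the ring itself is not a requirement. The variable name my_id is introduced here for illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assumption: the ID each node proposes is its own rank. */
    int my_id = world_rank;
    int chief;

    /* Every rank passes in my_id and gets back the minimum over all ranks. */
    MPI_Allreduce(&my_id, &chief, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

    printf("Node %d declares node %d a leader.\n", world_rank, chief);

    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper such as mpicc and started with several processes, every rank should report the same leader.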

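  • MPI_Send is blocking. It may block until a matching receive has been posted, possibly forever. Your program can deadlock at the first call to MPI_Send, and at any later call should the earlier ones happen to complete. To avoid that, use MPI_Sendrecv.
  • (world_rank-1)%world_size evaluates to -1 for world_rank == 0. Using -1 as a rank is not valid. It may happen to be MPI_ANY_SOURCE.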
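If the ring has to stay, one way to apply the first point is to pair each send with its receive in a single MPI_Sendrecv call. The sketch below is a simplified variant of the election, not the asker's exact protocol: it drops the hop counter and assumes that forwarding the received ID for world_size - 1 steps is enough for every node to see every ID. The names left, right, outgoing and step are introduced here for illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Ring neighbours; adding world_size keeps the left one non-negative. */
    int right = (world_rank + 1) % world_size;
    int left  = (world_rank - 1 + world_size) % world_size;

    int chief = world_rank;     /* each node starts as its own leader */
    int outgoing = world_rank;  /* ID passed to the right neighbour */

    /* After world_size - 1 steps every ID has visited every node. */
    for (int step = 0; step < world_size - 1; step++) {
        int incoming;

        /* Send to the right and receive from the left in one call,
           so no rank waits on a send that nobody is receiving. */
        MPI_Sendrecv(&outgoing, 1, MPI_INT, right, 1,
                     &incoming, 1, MPI_INT, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (incoming < chief) {
            chief = incoming;
        }
        outgoing = incoming;    /* forward what was just received */
    }

    printf("Node %d declares node %d a leader.\n", world_rank, chief);

    MPI_Finalize();
    return 0;
}
```

Because the loop runs a fixed number of steps, every rank leaves it at the same time and no message is left in flight when MPI_Finalize is reached.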
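The second point comes from how C defines % for negative operands: the result takes the sign of the dividend, so (0 - 1) % world_size is -1 rather than world_size - 1. A small pair of helpers, with the hypothetical names ring_next and ring_prev, shows one way to keep both neighbours inside the valid range:

```c
/* Right-hand neighbour of rank in a ring of world_size ranks. */
int ring_next(int rank, int world_size)
{
    return (rank + 1) % world_size;
}

/* Left-hand neighbour of rank in a ring of world_size ranks.
   Adding world_size before taking the remainder keeps the dividend
   non-negative, so rank 0 maps to world_size - 1 instead of -1. */
int ring_prev(int rank, int world_size)
{
    return (rank - 1 + world_size) % world_size;
}
```

With these, the receive in the question could use ring_prev(world_rank, world_size) as its source for every rank, without the special case for rank 0.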