MPI_Waitany causes segmentation fault

Date: 2014-07-20 00:25:00

Tags: c segmentation-fault mpi openmpi

I'm using MPI to distribute images to different processes so that:

  1. Process 0 distributes the images to the different processes.
  2. The other processes process the image and then send the result back to process 0.
Whenever a process finishes working on an image, process 0 tries to keep it busy: as soon as it becomes idle, it is given another image to process. The code is as follows:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mpi.h"

#define MAXPROC 16    /* Max number of processes */
#define TOTAL_FILES 7

int main(int argc, char* argv[]) {
        int i, nprocs, tprocs, me, index;
        const int tag  = 42;    /* Tag value for communication */

        MPI_Request recv_req[MAXPROC];  /* Request objects for non-blocking receive */
        MPI_Request send_req[MAXPROC]; /* Request objects for non-blocking send */     
        MPI_Status status;              /* Status object for non-blocking receive */

        char myname[MPI_MAX_PROCESSOR_NAME];             /* Local host name string */
        char hostname[MAXPROC][MPI_MAX_PROCESSOR_NAME];  /* Received host names */
        int namelen;   

        MPI_Init(&argc, &argv);                /* Initialize MPI */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);    /* Get nr of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &me);    /* Get own identifier */

        MPI_Get_processor_name(myname, &namelen);  /* Get host name */
        myname[namelen++] = (char)0;              /* Terminating null byte */

        /* First check that we have at least 2 and at most MAXPROC processes */
        if (nprocs<2 || nprocs>MAXPROC) {
                if (me == 0) {
                  printf("You have to use at least 2 and at most %d processes\n", MAXPROC);
                }
                MPI_Finalize(); exit(0);
        }

        /* if TOTAL_FILES < nprocs then use only TOTAL_FILES + 1 procs */
        tprocs = (TOTAL_FILES < nprocs) ? TOTAL_FILES + 1 : nprocs;
        int done = -1;

        if (me == 0) {    /* Process 0 does this */

                int send_counter = 0, received_counter;

                for (i=1; i<tprocs; i++) {
                        MPI_Isend(&send_counter, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
                        ++send_counter;
                        /* Receive a message from all other processes */
                        MPI_Irecv (hostname[i], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[i]);
                }      

                for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++){

                        /* Wait until at least one message has been received from any process other than 0*/
                        MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);

                        if (index == MPI_UNDEFINED) perror("Errorrrrrrr");                     
                        printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[index+1]);

                        if (send_counter < TOTAL_FILES){ /* if there are still images left to process */
                                MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
                                ++send_counter;
                                MPI_Irecv (hostname[status.MPI_SOURCE], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[status.MPI_SOURCE]);
                        }      
                }

              for (i=1; i<tprocs; i++) {
                      MPI_Isend(&done, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
              }

        } else if (me < tprocs) { /* all other processes do this */

                int y;         
                MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);

                while (y != -1) {                                      
                        printf("Process %d: Received image %d\n", me, y);
                        sleep(me%3+1);  /* Let the processes sleep for 1-3 seconds */

                        /* Send own identifier back to process 0 */
                        MPI_Send (myname, namelen, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
                        MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);                
                }      
        }

        MPI_Finalize();
        exit(0);
}

Based on this example.

Now I get a segmentation fault and I don't know why. I'm new to MPI, but I can't see the error in the code above. It only happens with certain numbers of processes: for example, with TOTAL_FILES = 7 it crashes when running 5, 6 or 7 processes, but works with 9 or more processes.

The whole code can be found here. Running it with 6 processes causes the error described above.

Compiled and executed with:

mpicc -Wall sscce.c -o sscce -lm 
mpirun -np 6 sscce

1 answer:

Answer 0 (score: 2):

It is not MPI_Waitany that causes the segmentation fault, but the way you handle the case when all requests in recv_req[] have already completed (i.e. index == MPI_UNDEFINED). perror() does not stop the code: it keeps going and then segfaults in the printf statement while trying to access hostname[index+1]. The reason all requests in the array can end up completed is that, because MPI_ANY_SOURCE is used in the receive call, the rank of the sender is not guaranteed to equal the index of the request in recv_req[] - simply compare index and status.MPI_SOURCE after MPI_Waitany returns to see it for yourself. As a result, the subsequent calls to MPI_Irecv are very likely to overwrite requests that are still pending, and the number of requests MPI_Waitany can complete ends up smaller than the number of results actually expected.
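For illustration, here is a minimal sketch (my own suggestion, not part of the original answer) of how the master loop could avoid both issues: stop before touching hostname[index+1] when index is MPI_UNDEFINED, and re-post the receive into the slot that MPI_Waitany just reported as completed so that no pending request is overwritten. It reuses the declarations from the question's code:

/* Sketch only: assumes the variables from the question (recv_req, send_req,
   hostname, namelen, tag, tprocs, send_counter, received_counter, ...). */
for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++) {
        MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);

        if (index == MPI_UNDEFINED) {
                fprintf(stderr, "No active receive requests left\n");
                break;   /* do not access hostname[index+1] */
        }

        printf("Received a message from process %d on %s\n",
               status.MPI_SOURCE, hostname[index+1]);

        if (send_counter < TOTAL_FILES) {
                MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag,
                          MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
                ++send_counter;
                /* Re-post the receive in the slot that just completed and
                   restrict the source to the worker that was just given a
                   new image, so no still-pending request is overwritten. */
                MPI_Irecv(hostname[index+1], namelen, MPI_CHAR, status.MPI_SOURCE,
                          tag, MPI_COMM_WORLD, &recv_req[index+1]);
        }
}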

Also note that you never wait for the send requests to complete. You are lucky that the Open MPI implementation uses an eager protocol to send small messages, so they get delivered even though MPI_Wait(any|all) or MPI_Test(any|all) is never called on the started send requests.
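For completeness, a hedged example of how the started sends could be drained explicitly instead of relying on the eager protocol (assuming the send_req array from the question, with slots 1..tprocs-1 holding started requests):

/* Sketch only: wait for every non-blocking send started by process 0. */
MPI_Waitall(tprocs-1, &send_req[1], MPI_STATUSES_IGNORE);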