What do these exit codes mean for an MPI program?

Time: 2015-10-15 19:53:56

Tags: mpi

When I try to run my MPI program, it fails with the following message:

job aborted:

[ranks] message

[0] process exited without calling finalize

[1-3] terminated

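The error analysis reports exit code 0xc0000005.

I searched online, and someone suggested calling MPI_Init_thread instead, but that gave me exit code 255.

How can I fix this? What is wrong with the rank 0 process?

Here is the code fragment that sends and receives data with MPI: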

    // MPI things
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
    // master
    if (taskid == 0)
    {
        //printf("taskid: %d", taskid);
        average = Nchunk / Nworkers;
        extra = Nchunk % Nworkers;
        mtype = FROM_MASTER;
        offset = 0;

        // store volume[Itemp[n]]
        for (int i = 0; i < Nchunk; i++)
        {
            volumeTemp[i] = volume[Itemp[i]];
        }

        // send to slave
        for (int dest = 1; dest <= Nworkers; dest++)
        {

            Nelements = (dest <= extra) ? average + 1 : average;
            MPI_Send(&Nelements, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
            MPI_Send(&offset, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
            MPI_Send(&Itemp[offset], Nelements, MPI_INT, dest, mtype, MPI_COMM_WORLD);
            MPI_Send(&SMtemp[offset], Nelements, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD);
            MPI_Send(&volumeTemp[offset], Nelements, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD);
            offset = offset + Nelements;
        }


        // receive result from slave
        mtype = FROM_WORKERS;
        for (int source = 1; source <= Nworkers; source++)
        {
            //MPI_Recv(&average, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
            //MPI_Recv(&offset, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
            MPI_Recv(&sinogram[ns], 1, MPI_FLOAT, source, mtype, MPI_COMM_WORLD, &status);
        }


    }
    //printf("taskid: %d", taskid);

    // slave
    if (taskid > 0)
    {
        //printf("taskid: %d", taskid);
        mtype = FROM_MASTER;
        MPI_Recv(&Nelements, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&Itemp[offset], Nelements, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&SMtemp[offset], Nelements, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&volumeTemp, Nelements, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD, &status);

        for (int i = 0; i < average; i++)
        {
            if (fabs(volumeTemp[i]) > 1.0e-14)
                sinogram[ns] = sinogram[ns] + volumeTemp[i] * SMtemp[i];
        }

        //send to master
        mtype = FROM_WORKERS;
        MPI_Send(&sinogram[ns], 1, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD, &status);
    }

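1 Answer:

Answer 0 (score: 1)

Exit codes from MPI programs are rarely meaningful, because there are multiple processes and each one returns its own exit code. It is more helpful to rely on the error message the program prints. Fortunately, yours does print one!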
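
One of the codes in the question does carry a hint, though. On Windows, 0xc0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION: the process read or wrote memory it did not own, which would be consistent with a bad pointer or element count passed to a send or receive call. A minimal sketch of a lookup for a few such codes follows; describe_exit_code is a hypothetical helper written for illustration, not part of MPI or of any launcher, and the table is deliberately incomplete.

```c
#include <stdint.h>

/* Hypothetical helper (not part of MPI): maps a few well-known Windows
   NTSTATUS process exit codes to their symbolic names. Any code that is
   not listed here is reported as "unknown". */
const char *describe_exit_code(uint32_t code)
{
    switch (code) {
    case 0xC0000005u: return "STATUS_ACCESS_VIOLATION";
    case 0xC00000FDu: return "STATUS_STACK_OVERFLOW";
    case 0xC0000094u: return "STATUS_INTEGER_DIVIDE_BY_ZERO";
    case 0xC000013Au: return "STATUS_CONTROL_C_EXIT";
    default:          return "unknown";
    }
}
```

Calling describe_exit_code(0xC0000005u) yields the string "STATUS_ACCESS_VIOLATION". A code such as 255 is not in the table and says little on its own, so the message printed by the launcher remains the better starting point: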

[0] process exited without calling finalize

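This can mean one of two things:

  1. Your program finished but did not call MPI_Finalize. This is easy to fix: make sure that every path by which your program can terminate normally calls MPI_Finalize. This may or may not be your problem...
  2. Your program terminated abnormally. This is generally harder to track down and will probably require some of the usual MPI debugging tricks. We probably cannot solve your problem unless your code is small or you follow the guidelines on creating a good example in which the problem occurs.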
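
For the abnormal-termination case, a useful first check in code shaped like the question's is whether every offset and count stays inside the buffer it indexes, since an out-of-range access is a common cause of this kind of crash. The sketch below restates the master's partition arithmetic (average, extra, offset) as plain C with no MPI calls, so that it can be checked in isolation. The function names chunk_count, chunk_offset and partition_fits are hypothetical, and buf_len stands for however many elements the arrays were actually allocated with, which the question does not show.

```c
/* Number of elements handed to worker `dest` (1-based, as in the question)
   when n_chunk elements are split across n_workers workers.
   Assumes n_workers > 0. */
int chunk_count(int n_chunk, int n_workers, int dest)
{
    int average = n_chunk / n_workers;
    int extra   = n_chunk % n_workers;
    return (dest <= extra) ? average + 1 : average;
}

/* Starting offset of worker `dest`: the sum of the counts of all
   workers that come before it. */
int chunk_offset(int n_chunk, int n_workers, int dest)
{
    int offset = 0;
    for (int d = 1; d < dest; d++)
        offset += chunk_count(n_chunk, n_workers, d);
    return offset;
}

/* Returns 1 if every worker's range [offset, offset + count) fits in a
   buffer of buf_len elements and the ranges add up to n_chunk elements;
   returns 0 otherwise (including when n_workers is not positive). */
int partition_fits(int n_chunk, int n_workers, int buf_len)
{
    if (n_workers <= 0 || n_chunk < 0)
        return 0;

    int total = 0;
    for (int dest = 1; dest <= n_workers; dest++) {
        int off = chunk_offset(n_chunk, n_workers, dest);
        int cnt = chunk_count(n_chunk, n_workers, dest);
        if (off + cnt > buf_len)
            return 0;
        total += cnt;
    }
    return total == n_chunk;
}
```

With 10 elements and 3 workers, the counts are 4, 3, 3 and the offsets are 0, 4, 7, so partition_fits(10, 3, 10) returns 1 while partition_fits(10, 3, 9) returns 0. A few worker-side lines in the posted fragment look worth the same scrutiny, though this is only a reading of the excerpt: the loop bound average is assigned only in the master branch, SMtemp is sent as MPI_FLOAT but received as MPI_INT, and volumeTemp is received at the start of the array while Itemp and SMtemp are received at offset.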