MPI send and receive fail with more than 8182 doubles

Time: 2016-09-17 15:09:24

Tags: c parallel-processing mpi

I'm having some trouble with the following code:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int id, p, n, ln, i, retCode;
        double *buffer;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);

        if (argc < 2) {
            if (id == 0)
                fprintf(stderr, "usage: %s <total number of elements>\n", argv[0]);
            MPI_Finalize();
            return EXIT_FAILURE;
        }

        n = (int)strtol(argv[1], NULL, 10); // total number of elements to be distributed
        ln = n / p;                         // local number of elements

        buffer = (double *)calloc(ln, sizeof(double));

        // By default MPI_COMM_WORLD uses MPI_ERRORS_ARE_FATAL, so a failing call
        // aborts the job; the retCode checks only fire with MPI_ERRORS_RETURN.
        if (id == p - 1)  // process p-1 sends to the other processes
        {
            for (i = 0; i < p - 1; i++)
            {
                fprintf(stdout, "Process %d is sending %d elements to process %d\n", p - 1, ln, i);
                retCode = MPI_Ssend(buffer, ln, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
                if (retCode != MPI_SUCCESS)
                    fprintf(stdout, "MPI_Ssend error at file %s, line %d  code %d\n", __FILE__, __LINE__, retCode);
                fprintf(stdout, "Process %d completed sending to process %d\n", p - 1, i);
            }
        }
        else  // the other processes receive from process p-1
        {
            fprintf(stdout, "Process %d is receiving %d elements from process %d\n", id, ln, p - 1);
            retCode = MPI_Recv(buffer, ln, MPI_DOUBLE, p - 1, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (retCode != MPI_SUCCESS)
                fprintf(stdout, "MPI_Recv error at file %s, line %d  code %d\n", __FILE__, __LINE__, retCode);
            fprintf(stdout, "Process %d received from process %d\n", id, p - 1);
        }

        free(buffer);
        MPI_Finalize();
        return 0;
    }

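The idea is to open a dataset on process p-1 and then distribute it to the remaining processes. This works as long as the variable ln (the local number of elements) is at most 8182. When I increase the number of elements, I get the following error: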

    mpiexec -np 2   ./sendreceive 16366
    Process 0 is receiving 8183 elements from process 1
    Process 1 is sending 8183 elements to process 0
    Fatal error in MPI_Recv: Other MPI error, error stack:
    MPI_Recv(224)...................: MPI_Recv(buf=0x2000590, count=8183,         MPI_DOUBLE, src=1, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0x1) failed
    PMPIDI_CH3I_Progress(623).......: fail failed
    pkt_RTS_handler(317)............: fail failed
    do_cts(662).....................: fail failed
    MPID_nem_lmt_dcp_start_recv(288): fail failed
    dcp_recv(154)...................: Internal MPI error!  cannot read from remote process

What is going wrong?

2 answers:

Answer 0 (score: 0)

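I think the problem is that you don't call MPI_Finalize() before exiting the program. If I run your code on my laptop I get an error (a different one!) even for small values of "n", and it goes away if I call MPI_Finalize() before returning.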

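I'd guess the reason you only get the error above roughly 8192 elements (8192 doubles × 8 bytes = 65,536 bytes = 64 KB) is that, deep inside, MPI uses one protocol for messages of 64 KB or less and a different one for messages larger than 64 KB. For the smaller messages you were just lucky that the send completed without calling Finalize().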

Answer 1 (score: 0)

Does the code work if you use MPI_Send instead of MPI_Ssend? And does it work if you try a different communication device?

If the answer to at least one of these questions is yes, I would check whether this is a known bug in the MPI implementation you are using.