I'm having some trouble with the following code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    int id, p, n, ln, i, j, retCode;
    double *buffer;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    n = strtol(argv[1],NULL,10); // total number of elements to be distributed
    ln = n/p; // local number of elements
    buffer = (double*)calloc(ln, sizeof(double));
    if (id == p-1) // Process p-1 send to other processes
    {
        for (i=0; i< p-1; i++)
        {
            fprintf(stdout, "Process %d is sending %d elements to process %d\n", p-1, ln, i);
            retCode = MPI_Ssend (buffer, ln, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
            if(retCode)
                fprintf(stdout, "MPISend error at file %s, line %d code %d\n", __FILE__, __LINE__, retCode);
            fprintf(stdout, "Process %d completed sending to process %d\n", p-1, i);
        }
    }
    else // other processes receive from process p-1
    {
        fprintf(stdout, "Process %d is receiving %d elements from process %d\n", id, ln,p-1);
        retCode = MPI_Recv (buffer, ln, MPI_DOUBLE, p-1, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if(retCode)
            fprintf(stdout, "MPI_Recv error at file %s, line %d code %d\n", __FILE__, __LINE__, retCode);
        fprintf(stdout, "Process %d received from process %d\n", id, p-1);
    }
    free(buffer);
    MPI_Finalize();
    return 0;
}
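The idea is to open a dataset on process p-1 and then distribute it to the remaining processes. This solution works when the variable ln (the local number of elements) is less than 8182. When I increase the number of elements, I get the following error: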
mpiexec -np 2 ./sendreceive 16366
Process 0 is receiving 8183 elements from process 1
Process 1 is sending 8183 elements to process 0
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x2000590, count=8183, MPI_DOUBLE, src=1, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0x1) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
What is going wrong?
Answer 0: (score: 0)
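I think the problem is that you are not calling MPI_Finalize() before exiting the program. If I run your code on my laptop I get an error (a different one!) even for small values of "n", and it goes away if I call MPI_Finalize() before returning.
I guess the reason you don't get the error for n <= 8192 is that, deep down internally, MPI uses a different protocol for messages of 64K bytes or less than for messages larger than 64K. For the smaller messages, you were just lucky that the send completed without calling Finalize().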
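To make the size argument concrete, here is a minimal sketch of the arithmetic. It assumes a double is 8 bytes and takes the 64K cutoff guessed above at face value; `ASSUMED_THRESHOLD_BYTES` and both helper names are made up for illustration and do not come from any MPI implementation.

```c
#include <stddef.h>

/* Assumed protocol switch point: 64 KiB of payload. This is a guess,
   not a constant documented by any MPI implementation. */
#define ASSUMED_THRESHOLD_BYTES ((size_t)64 * 1024)

/* Payload size in bytes of a message holding `count` doubles. */
size_t payload_bytes(size_t count)
{
    return count * sizeof(double);
}

/* Returns 1 if the payload is larger than the assumed threshold, else 0. */
int exceeds_assumed_threshold(size_t count)
{
    return payload_bytes(count) > ASSUMED_THRESHOLD_BYTES;
}
```

With 8-byte doubles, payload_bytes(8192) is 65536, which is exactly 64 KiB, while the 8183 elements of the failing run come to 65464 bytes. That is slightly below 64 KiB, so the real cutoff is probably not a round 64K of payload, and it may differ between implementations and transports.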
Answer 1: (score: 0)
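I wonder, does the code work if you use MPI_Send instead of MPI_Ssend? Does it work if you try a different communication device?
If the answer to at least one of these questions is yes, then I would check whether this is a known bug in the MPI implementation you are using.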