I am trying to pass a frequency vector between processes and update it along the way. The processes communicate in a tree topology:
0: 1 2
1: 0 3 4 5 6
2: 0 7 8
3: 1
4: 1 9 10
5: 1
6: 1
7: 2
8: 2 11
9: 4
10: 4
11: 8
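Basically, rank 0 can only communicate with 1 and 2, rank 1 only with 0, 3, 4, 5, 6, and so on. At the end, rank 0 should hold a frequency vector that contains the values from all the other ranks.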
if (rank == 0) {
    for (i = 0; i < nr_elements; i++) {
        MPI_Recv(local_frequency, num_alphabets, MPI_INT, neigh[i], 0, MPI_COMM_WORLD, &status);
        printf("[RANK %d]Received from %d\n", rank, neigh[i]);
        for (i = 0; i < num_alphabets; i++) {
            frequency[i] += local_frequency[i];
        }
    }
}
else {
    // leaf
    if (nr_elements == 1) {
        MPI_Send(frequency, num_alphabets, MPI_INT, parent, 0, MPI_COMM_WORLD);
        printf("[RANK %d]Sent to %d\n", rank, parent);
    }
    else {
        // first we receive
        for (i = 0; i < nr_elements; i++) {
            if (neigh[i] != parent) {
                MPI_Recv(local_frequency, num_alphabets, MPI_INT, neigh[i], 0, MPI_COMM_WORLD, &status);
                printf("[RANK %d]Received from %d\n", rank, neigh[i]);
                for (i = 0; i < num_alphabets; i++) {
                    frequency[i] += local_frequency[i];
                }
            }
        }
        MPI_Send(frequency, num_alphabets, MPI_INT, parent, 0, MPI_COMM_WORLD);
        printf("[RANK %d]Sent to %d\n", rank, parent);
    }
}
This is the output of their communication:
- [RANK 2]Received from 7
- [RANK 2]Sent to 0
- [RANK 3]Sent to 1
- [RANK 6]Sent to 1
- [RANK 7]Sent to 2
- [RANK 4]Received from 9
- [RANK 4]Sent to 1
- [RANK 5]Sent to 1
- [RANK 9]Sent to 4
- [RANK 0]Received from 1
- [RANK 1]Received from 3
- [RANK 1]Sent to 0
- [RANK 10]Sent to 4
- [RANK 11]Sent to 8
- [RANK 8]Received from 11
- [RANK 8]Sent to 2
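Every child sends its information to its parent, but apparently not all messages are received. However, if I remove the update operation after each MPI_Recv, everything works fine. Is there a synchronization problem? What should I do?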
Some things you should know:
- num_alphabets = 256
- parent and nr_elements are computed correctly
- neigh is the neighbours vector
Answer 0 (score: 1)
Debugging
Compiling with -g and running in a debugger may help you pinpoint where the problem is. To do so, you can launch your MPI program as follows:
mpirun -n 4 xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]
This opens a terminal window for each process, so you can inspect the memory and stack of each process independently.
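As an illustration of what stepping through a single process can reveal, the sketch below reproduces the shape of the receive loop from the question without any MPI calls. It rests on an observation about the posted code, not on anything the asker confirmed: the inner accumulation loop reuses the outer loop's counter i, so after the first receive i equals num_alphabets and the outer loop ends early. That would match the log, where every non-leaf rank receives from exactly one child, and it would explain why removing the update makes everything work. The function names and the receives counter (which stands in for MPI_Recv) are hypothetical.

```c
#define NUM_ALPHABETS 256

/* Shape of the loop in the question: the inner loop reuses `i`. */
int receives_with_shared_index(int nr_elements, int *frequency,
                               const int *local_frequency)
{
    int i;
    int receives = 0;
    for (i = 0; i < nr_elements; i++) {
        receives++; /* stands in for one MPI_Recv call */
        for (i = 0; i < NUM_ALPHABETS; i++) {
            frequency[i] += local_frequency[i];
        }
        /* i is now NUM_ALPHABETS, so the outer loop stops
           whenever nr_elements is not larger than that. */
    }
    return receives;
}

/* Same loop with a separate counter `j` for the accumulation. */
int receives_with_separate_index(int nr_elements, int *frequency,
                                 const int *local_frequency)
{
    int i, j;
    int receives = 0;
    for (i = 0; i < nr_elements; i++) {
        receives++; /* stands in for one MPI_Recv call */
        for (j = 0; j < NUM_ALPHABETS; j++) {
            frequency[j] += local_frequency[j];
        }
    }
    return receives;
}
```

With nr_elements = 2, as for rank 0 in the question, the first function reports one receive and the second reports two.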
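Blocking send/receive
Because MPI_Recv and MPI_Send are both blocking, you can easily get stuck when two processes are both sending while one of them should be receiving from the other. You can read about the dining philosophers problem and similar ones to get better at handling this kind of situation. I also suggest adding debug output messages that indicate when a process is about to send and when it is about to receive. You may find a pair of processes that are both trying to receive, or both trying to send, when they should be sending to and receiving from each other.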
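Non-blocking communication
One way around the blocking problem described above is to use MPI's non-blocking send/receive calls, MPI_Isend and MPI_Irecv. These avoid the deadlock you can run into with blocking calls, and they are also handy when a process can do useful work while it waits for another process's result. Since you have a tree, you cannot know in advance which child will return its result first.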