I'm completely new to MPI and have no idea what is efficient. I have x (x = 2, 4, 8, 16, ...) nodes, and each node holds a large text file with more than 4 million lines. I want to sort these lines with a bucket sort, and to do that I want to route each line to the node (= bucket) it belongs to, which means every node has to send to (and receive from) every other node.
My current idea is this: on the root I compute the bucket limits that every node has to know so it can send the right lines to the right node, and I broadcast these limits with MPI_Bcast. After that, since the number of lines in each batch differs, I use MPI_Isend, MPI_Irecv and MPI_Waitall to tell every other node how many lines it will receive from this node. Once each node knows how much data to expect from every peer, I do the same for the data itself. Sends and receives alternate on each node (process[i] first sends to process[i + 1], then receives from process[i - 1]).
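The broadcast step itself is straightforward; simplified, it looks roughly like this (limits and computeBucketLimits are placeholder names, and I'm assuming the limits are numtasks - 1 fixed-length keys of LINE_LENGTH bytes each):

// limits: the numtasks - 1 bucket boundaries, as fixed-length keys
char* limits = malloc((numtasks - 1) * LINE_LENGTH);
if (rank == 0)
    computeBucketLimits(limits); // placeholder: root fills in the boundaries
// every rank receives the same boundaries from rank 0
MPI_Bcast(limits, (numtasks - 1) * LINE_LENGTH, MPI_BYTE, 0, MPI_COMM_WORLD);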
The send and receive part of my code looks like this:
// fill the buffer for the current rank (the local bucket never has to be sent)
buffers[rank] = buckets[rank];
int i, sendIndex, receiveIndex;
// one request slot per send AND per receive, so every request can be completed
// (reusing a single dummy request for all sends would leak the send requests)
int reqsIndex = 0;
MPI_Request reqs[2 * (numtasks - 1)];
for (i = 0; i < 2 * (numtasks - 1); i++)
    reqs[i] = MPI_REQUEST_NULL;
// amounts[i]: how many lines the current rank will receive from rank i
// (indexed by rank, so it needs numtasks entries, not numtasks - 1)
int* amounts = malloc(sizeof(int) * numtasks);
sendIndex = (rank + 1) % numtasks; // index of the next process to send to (starting with rank + 1)
receiveIndex = (rank + numtasks - 1) % numtasks; // index of the next process to receive from (starting with rank - 1)
int isSending = 1; // whether this iteration sends or receives (sends first)
// alternating send and receive for each process
for (i = 0; i < (numtasks - 1) * 2; i++)
{
    if (isSending) {
        MPI_Isend(&buckets[sendIndex]->n, 1, MPI_INT, sendIndex, TAG_NUMBER_OF_SENDED_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        sendIndex = (sendIndex + 1) % numtasks;
        if (sendIndex == rank) // skip our own rank
            sendIndex = (sendIndex + 1) % numtasks;
    }
    else {
        MPI_Irecv(&amounts[receiveIndex], 1, MPI_INT, receiveIndex, TAG_NUMBER_OF_SENDED_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        receiveIndex = (receiveIndex + 1) % numtasks;
        if (receiveIndex == rank) // skip our own rank
            receiveIndex = (receiveIndex + 1) % numtasks;
    }
    reqsIndex++;
    isSending = !isSending; // alternate between sending and receiving
}
MPI_Waitall(2 * (numtasks - 1), reqs, MPI_STATUSES_IGNORE); // complete all sends and receives
// reset the requests for the second round (the line data itself)
reqsIndex = 0;
for (i = 0; i < 2 * (numtasks - 1); i++)
    reqs[i] = MPI_REQUEST_NULL;
sendIndex = (rank + 1) % numtasks; // same round-robin order as before
receiveIndex = (rank + numtasks - 1) % numtasks;
isSending = 1;
for (i = 0; i < (numtasks - 1) * 2; i++)
{
    if (isSending) {
        MPI_Isend(buckets[sendIndex]->data, buckets[sendIndex]->n * LINE_LENGTH, MPI_BYTE, sendIndex, TAG_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        sendIndex = (sendIndex + 1) % numtasks;
        if (sendIndex == rank) // skip our own rank
            sendIndex = (sendIndex + 1) % numtasks;
    }
    else {
        // allocate exactly the number of lines announced in round one
        lineBuffer* lines = allocLines(amounts[receiveIndex]);
        MPI_Irecv(lines->data, amounts[receiveIndex] * LINE_LENGTH, MPI_BYTE, receiveIndex, TAG_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        buffers[receiveIndex] = lines;
        receiveIndex = (receiveIndex + 1) % numtasks;
        if (receiveIndex == rank) // skip our own rank
            receiveIndex = (receiveIndex + 1) % numtasks;
    }
    reqsIndex++;
    isSending = !isSending; // alternate between sending and receiving
}
MPI_Waitall(2 * (numtasks - 1), reqs, MPI_STATUSES_IGNORE); // complete all sends and receives
free(amounts);
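While writing this up I wondered whether the two rounds could be collapsed into MPI_Alltoall plus MPI_Alltoallv. This is an untested sketch of what I mean; sendBuf is a hypothetical buffer holding this rank's outgoing lines packed contiguously, ordered by destination rank:

int* sendCounts = malloc(sizeof(int) * numtasks);
int* recvCounts = malloc(sizeof(int) * numtasks);
int* sendDispls = malloc(sizeof(int) * numtasks);
int* recvDispls = malloc(sizeof(int) * numtasks);
int j;
for (j = 0; j < numtasks; j++)
    sendCounts[j] = buckets[j]->n; // lines destined for rank j (self included, for simplicity)
// round one as a single collective: everyone learns how many lines to expect
MPI_Alltoall(sendCounts, 1, MPI_INT, recvCounts, 1, MPI_INT, MPI_COMM_WORLD);
// convert line counts to byte counts and prefix-sum the displacements
int sendBytes = 0, recvBytes = 0;
for (j = 0; j < numtasks; j++) {
    sendDispls[j] = sendBytes;
    recvDispls[j] = recvBytes;
    sendCounts[j] *= LINE_LENGTH; // counts are in units of MPI_BYTE below
    recvCounts[j] *= LINE_LENGTH;
    sendBytes += sendCounts[j];
    recvBytes += recvCounts[j];
}
char* recvBuf = malloc(recvBytes);
// round two as a single collective: exchange the variable-sized line batches
MPI_Alltoallv(sendBuf, sendCounts, sendDispls, MPI_BYTE,
              recvBuf, recvCounts, recvDispls, MPI_BYTE, MPI_COMM_WORLD);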
Right now the per-node send/receive times differ a lot, and some processes take much longer than others to get through this code. Is there something better suited than Isend/Irecv/Waitall (the collective sketch above, maybe?), especially since everything being sent has a different size? Cheers. :)