I'm completely new to MPI and have no idea what is efficient. I have x (x = 2, 4, 8, 16, ...) nodes, and each node holds a large text file with more than 4 million lines. I want to sort these lines with a bucket sort, and to do that I want to route each line to the node (= bucket) it belongs to, which means every node has to send to (and receive from) every other node.
My current idea is this: on the root I compute the bucket limits that every node has to know so it can send the right lines to the right node, and I broadcast these limits with MPI_Bcast. After that, since the number of lines in each batch differs, I use MPI_Isend, MPI_Irecv and MPI_Waitall to tell every other node how many lines it will receive from this node. Once each node knows how much data to expect from every peer, I do the same for the data itself. Sends and receives alternate on each node (process[i] first sends to process[i + 1], then receives from process[i - 1]).
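The broadcast step itself is straightforward; simplified, it looks roughly like this (limits and computeBucketLimits are placeholder names, and I'm assuming the limits are numtasks - 1 fixed-length keys of LINE_LENGTH bytes each):

// limits: the numtasks - 1 bucket boundaries, as fixed-length keys
char* limits = malloc((numtasks - 1) * LINE_LENGTH);
if (rank == 0)
    computeBucketLimits(limits); // placeholder: root fills in the boundaries
// every rank receives the same boundaries from rank 0
MPI_Bcast(limits, (numtasks - 1) * LINE_LENGTH, MPI_BYTE, 0, MPI_COMM_WORLD);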
The send and receive part of my code looks like this:
// fill the buffer for the current rank (the local bucket never has to be sent)
buffers[rank] = buckets[rank];
int i, sendIndex, receiveIndex;
// one request slot per send AND per receive, so every request can be completed
// (reusing a single dummy request for all sends would leak the send requests)
int reqsIndex = 0;
MPI_Request reqs[2 * (numtasks - 1)];
for (i = 0; i < 2 * (numtasks - 1); i++)
    reqs[i] = MPI_REQUEST_NULL;
// amounts[i]: how many lines the current rank will receive from rank i
// (indexed by rank, so it needs numtasks entries, not numtasks - 1)
int* amounts = malloc(sizeof(int) * numtasks);
sendIndex = (rank + 1) % numtasks; // index of the next process to send to (starting with rank + 1)
receiveIndex = (rank + numtasks - 1) % numtasks; // index of the next process to receive from (starting with rank - 1)
int isSending = 1; // whether this iteration sends or receives (sends first)
// alternating send and receive for each process
for (i = 0; i < (numtasks - 1) * 2; i++)
{
    if (isSending) {
        MPI_Isend(&buckets[sendIndex]->n, 1, MPI_INT, sendIndex, TAG_NUMBER_OF_SENDED_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        sendIndex = (sendIndex + 1) % numtasks;
        if (sendIndex == rank) // skip our own rank
            sendIndex = (sendIndex + 1) % numtasks;
    }
    else {
        MPI_Irecv(&amounts[receiveIndex], 1, MPI_INT, receiveIndex, TAG_NUMBER_OF_SENDED_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        receiveIndex = (receiveIndex + 1) % numtasks;
        if (receiveIndex == rank) // skip our own rank
            receiveIndex = (receiveIndex + 1) % numtasks;
    }
    reqsIndex++;
    isSending = !isSending; // alternate between sending and receiving
}
MPI_Waitall(2 * (numtasks - 1), reqs, MPI_STATUSES_IGNORE); // complete all sends and receives
// reset the requests for the second round (the line data itself)
reqsIndex = 0;
for (i = 0; i < 2 * (numtasks - 1); i++)
    reqs[i] = MPI_REQUEST_NULL;
sendIndex = (rank + 1) % numtasks; // same round-robin order as before
receiveIndex = (rank + numtasks - 1) % numtasks;
isSending = 1;
for (i = 0; i < (numtasks - 1) * 2; i++)
{
    if (isSending) {
        MPI_Isend(buckets[sendIndex]->data, buckets[sendIndex]->n * LINE_LENGTH, MPI_BYTE, sendIndex, TAG_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        sendIndex = (sendIndex + 1) % numtasks;
        if (sendIndex == rank) // skip our own rank
            sendIndex = (sendIndex + 1) % numtasks;
    }
    else {
        // allocate exactly the number of lines announced in round one
        lineBuffer* lines = allocLines(amounts[receiveIndex]);
        MPI_Irecv(lines->data, amounts[receiveIndex] * LINE_LENGTH, MPI_BYTE, receiveIndex, TAG_LINES, MPI_COMM_WORLD, &reqs[reqsIndex]);
        buffers[receiveIndex] = lines;
        receiveIndex = (receiveIndex + 1) % numtasks;
        if (receiveIndex == rank) // skip our own rank
            receiveIndex = (receiveIndex + 1) % numtasks;
    }
    reqsIndex++;
    isSending = !isSending; // alternate between sending and receiving
}
MPI_Waitall(2 * (numtasks - 1), reqs, MPI_STATUSES_IGNORE); // complete all sends and receives
free(amounts);
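While writing this up I wondered whether the two rounds could be collapsed into MPI_Alltoall plus MPI_Alltoallv. This is an untested sketch of what I mean; sendBuf is a hypothetical buffer holding this rank's outgoing lines packed contiguously, ordered by destination rank:

int* sendCounts = malloc(sizeof(int) * numtasks);
int* recvCounts = malloc(sizeof(int) * numtasks);
int* sendDispls = malloc(sizeof(int) * numtasks);
int* recvDispls = malloc(sizeof(int) * numtasks);
int j;
for (j = 0; j < numtasks; j++)
    sendCounts[j] = buckets[j]->n; // lines destined for rank j (self included, for simplicity)
// round one as a single collective: everyone learns how many lines to expect
MPI_Alltoall(sendCounts, 1, MPI_INT, recvCounts, 1, MPI_INT, MPI_COMM_WORLD);
// convert line counts to byte counts and prefix-sum the displacements
int sendBytes = 0, recvBytes = 0;
for (j = 0; j < numtasks; j++) {
    sendDispls[j] = sendBytes;
    recvDispls[j] = recvBytes;
    sendCounts[j] *= LINE_LENGTH; // counts are in units of MPI_BYTE below
    recvCounts[j] *= LINE_LENGTH;
    sendBytes += sendCounts[j];
    recvBytes += recvCounts[j];
}
char* recvBuf = malloc(recvBytes);
// round two as a single collective: exchange the variable-sized line batches
MPI_Alltoallv(sendBuf, sendCounts, sendDispls, MPI_BYTE,
              recvBuf, recvCounts, recvDispls, MPI_BYTE, MPI_COMM_WORLD);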
Right now the per-node send/receive times differ a lot, and some processes take much longer than others to get through this code. Is there something better suited than Isend/Irecv/Waitall (the collective sketch above, maybe?), especially since everything being sent has a different size? Cheers. :)