Question

I am having trouble gathering some data from all processors to the root. Here is an example of what I want to do:

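I have a few pairs in each processor (they are actually edges). Ideally I want to send them to the root or, if that is not possible, send their corresponding indices (a single number instead of a pair).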

For example:


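I would like to know the best way to store the pairs or numbers and to send and receive them. Ideally, I would prefer to store them in a 2D vector, since I do not know in advance how much space I will need, and to receive them in a 2D vector again. I know this may be impossible or very complicated.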
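One possible way to make the pairs sendable as a plain MPI_INT buffer is to flatten them into a single contiguous vector, two ints per edge, and rebuild the pairs on the receiving side. The sketch below shows only that packing step, without any MPI calls. The helper names flatten_edges and unflatten_edges are hypothetical, and storing edges as std::pair<int, int> is an assumption; the question itself uses vector<vector<int>>.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: pack edges (u, v) into one contiguous int buffer
// laid out as [u0, v0, u1, v1, ...], suitable for sending as MPI_INT.
std::vector<int> flatten_edges(const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> flat;
    flat.reserve(edges.size() * 2);
    for (const auto& e : edges) {
        flat.push_back(e.first);
        flat.push_back(e.second);
    }
    return flat;
}

// Hypothetical helper: inverse of flatten_edges.
std::vector<std::pair<int, int>> unflatten_edges(const std::vector<int>& flat) {
    assert(flat.size() % 2 == 0);  // every edge contributes exactly two ints
    std::vector<std::pair<int, int>> edges;
    edges.reserve(flat.size() / 2);
    for (std::size_t k = 0; k + 1 < flat.size(); k += 2) {
        edges.emplace_back(flat[k], flat[k + 1]);
    }
    return edges;
}
```

With this layout, a process holding k edges would send 2*k ints, so any per-process counts used for gathering would have to be doubled accordingly.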

Here is pseudocode for the procedure I am looking for, but I do not know how to implement it in MPI.

Processor 0: sends {(0,5), (1,6)} to root, or it should send {5,17}
Processor 1: sends {(2,3)} to root, or it should send {14}
Processor 2: sends {} to root, or it should send {}
Processor 3: sends {(4,0)} to root, or it should send {20}

I have also considered MPI_Gatherv, but it does not seem to help. I got the idea from here.

vector<vector<int>> allSelectedEdges;
vector<vector<int>> selectedEdgesLocal;
int edgeCount = 0;
if (my_rank != 0) {
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < nVertex; ++j) {
            if (some conditions) {
                // Store the edge as {global row index, column index}.
                vector<int> tempEdge;
                tempEdge.push_back(displs[my_rank] + i);
                tempEdge.push_back(j);
                selectedEdgesLocal.push_back(tempEdge);
                edgeCount++;
            }
        }
    }
    // pseudocode: send selectedEdgesLocal to root
} else {
    // pseudocode: root receives selectedEdgesLocal and stores it in allSelectedEdges
}

Answer 1

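You should use Gather for this. The problem is that each process has a different number of values to send to the root. So you can either determine the maximum number of values to be sent and have every process send that many values (with, e.g., NAN for the unused values), or do what Gilles Gouaillardet suggested in a comment and use two steps: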

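1. Have each process compute the number of values it needs to send. Gather these counts to the root as rcounts.
2. Use Gatherv to collect the values. Now that the root process knows rcounts, it can easily compute rdisp as the cumulative sum of rcounts (the displacement for each rank is the sum of the counts of all lower ranks).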
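The displacement computation in the second step can be sketched as an exclusive prefix sum over the counts. This is a minimal illustration without any MPI calls; the function name displacements_from_counts is hypothetical, while rcounts and rdisp follow the names used in the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rdisp[r] is the sum of rcounts[0..r-1], i.e. the
// offset in the root's receive buffer where rank r's values should start.
std::vector<int> displacements_from_counts(const std::vector<int>& rcounts) {
    std::vector<int> rdisp(rcounts.size());
    int running = 0;
    for (std::size_t r = 0; r < rcounts.size(); ++r) {
        assert(rcounts[r] >= 0);  // a count can never be negative
        rdisp[r] = running;
        running += rcounts[r];
    }
    return rdisp;
}
```

For the four processors in the question, the counts {2, 1, 0, 1} give the displacements {0, 2, 3, 3}. The process that sends nothing simply shares its offset with the next rank.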

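The "fixed maximum number of values, with unused slots filled with NAN" approach is simpler and works fine if the total amount of data is small. If the total amount of data is large and the number of values sent by each process varies widely, the two-step solution is probably more efficient.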
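The padding variant could look roughly like the sketch below, which shows only the buffer handling around a fixed-size gather, not the MPI call itself. Since the values in the question are ints, which have no NaN, the sketch assumes that valid values are non-negative and uses -1 as the marker for unused slots. The names kUnused, pad_to_width and strip_padding are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: valid values are non-negative, so -1 can mark an unused slot.
// (NaN exists only for floating-point types; an int buffer needs a sentinel.)
constexpr int kUnused = -1;

// Pad a local buffer to the agreed maximum width before a fixed-size gather.
std::vector<int> pad_to_width(const std::vector<int>& local, std::size_t width) {
    assert(local.size() <= width);  // width must be the global maximum count
    std::vector<int> padded(local);
    padded.resize(width, kUnused);
    return padded;
}

// On the root: drop the sentinel slots from the gathered buffer.
std::vector<int> strip_padding(const std::vector<int>& gathered) {
    std::vector<int> values;
    for (int v : gathered) {
        if (v != kUnused) values.push_back(v);
    }
    return values;
}
```

With the data from the question and a width of 2, the root would receive {5, 17, 14, -1, -1, -1, 20, -1} and recover {5, 17, 14, 20} after stripping.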

Answer 2

I updated my code as follows, and it is working now.

vector<int> selectedEdgesIndicesLocal;

// Collect the indices of the locally selected edges.
int edgeCount = 0;
for (int i = 0; i < rows; ++i)
    for (int j = 0; j < nVertex; ++j)
        if (some conditions)
        {
            int index=...;
            selectedEdgesIndicesLocal.push_back(index);
            edgeCount++;
        }

// Total number of selected edges; root needs it to size the receive buffer.
int NumEdgesToAdd = 0;
MPI_Reduce(&edgeCount, &NumEdgesToAdd, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

// Gather each rank's edge count on root.
int *edgeCountsInRoot = nullptr;
if (my_rank == 0) edgeCountsInRoot = (int *)malloc(comm_size * sizeof(int));
MPI_Gather(&edgeCount, 1, MPI_INT, edgeCountsInRoot, 1, MPI_INT, 0, MPI_COMM_WORLD);

int *allSelectedIndicesEdges = nullptr;
if (my_rank == 0) allSelectedIndicesEdges = (int *)malloc(NumEdgesToAdd * sizeof(int));

// Receive counts and displacements are only significant on root.
int *edgeCounts = nullptr, *edgeDisp = nullptr;

cout << edgeCount << endl;
if (my_rank == 0) {
    edgeCounts = (int *)malloc(comm_size * sizeof(int));
    edgeDisp = (int *)malloc(comm_size * sizeof(int));
    int edgeSum = 0;
    for (int i = 0; i < comm_size; ++i) {
        edgeCounts[i] = edgeCountsInRoot[i];
        edgeDisp[i] = edgeSum;  // offset of rank i = sum of counts of lower ranks
        edgeSum += edgeCountsInRoot[i];
    }
}
// data() is safe even when the local vector is empty, unlike &front().
MPI_Gatherv(selectedEdgesIndicesLocal.data(), edgeCount, MPI_INT, allSelectedIndicesEdges, edgeCounts, edgeDisp, MPI_INT, 0, MPI_COMM_WORLD);
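If the root then needs the values grouped per sending rank, as in the 2D vector the question asked for, the flat buffer filled by MPI_Gatherv can be split using the same counts and displacements. The sketch below is one way to do that and contains no MPI calls. The function name split_by_rank is hypothetical, and it assumes the gathered data has been copied into a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rebuild one vector per rank from the flat receive
// buffer, using the counts and displacements that were passed to MPI_Gatherv.
std::vector<std::vector<int>> split_by_rank(const std::vector<int>& gathered,
                                            const std::vector<int>& counts,
                                            const std::vector<int>& displs) {
    assert(counts.size() == displs.size());
    std::vector<std::vector<int>> perRank;
    perRank.reserve(counts.size());
    for (std::size_t r = 0; r < counts.size(); ++r) {
        auto first = gathered.begin() + displs[r];
        // Copy the slice [displs[r], displs[r] + counts[r]) for rank r.
        perRank.emplace_back(first, first + counts[r]);
    }
    return perRank;
}
```

For the example in the question, the buffer {5, 17, 14, 20} with counts {2, 1, 0, 1} and displacements {0, 2, 3, 3} splits into {5, 17}, {14}, {} and {20}.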

