Question

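This question is about the best strategy for implementing the following simulation in C++.

I am trying to write a simulation as part of a physics research project. It basically tracks the dynamics of a chain of nodes in space. Each node holds a position together with certain parameters (local curvature, velocity, distance to its neighbors, and so on), all of which evolve over time.

Each time step can be broken down into these four parts:

Calculate local parameters. These values depend on the nearest neighbors in the chain.
Calculate global parameters.
Evolve. The position of each node is moved by a small amount, depending on the global and local parameters and on some random force fields.
Fill. A new node is inserted if the distance between two consecutive nodes reaches a critical value.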

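(In addition, nodes can get stuck, which makes them inactive for the rest of the simulation. The local parameters of an inactive node with inactive neighbors will not change and need no further calculation.)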
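As a rough illustration of the four phases above, here is a minimal sketch on a chain-ordered std::vector. The `Node` layout, the function names, and the displacement rule in `evolve` are assumptions made for the example; the real evolution rule would combine the local and global parameters with the random force field.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical node layout; the field names are illustrative only.
struct Node {
    double x, y;        // position
    double distToNext;  // local parameter: distance to the next node
    bool active;        // false once the node is stuck
};

static double gap(const Node& a, const Node& b) {
    return std::hypot(b.x - a.x, b.y - a.y);
}

// Phase 1: local parameters depend only on the nearest neighbors.
void computeLocal(std::vector<Node>& chain) {
    assert(!chain.empty());
    for (std::size_t i = 0; i + 1 < chain.size(); ++i) {
        // Two stuck neighbors never move, so a value computed earlier stays valid.
        if (!chain[i].active && !chain[i + 1].active) continue;
        chain[i].distToNext = gap(chain[i], chain[i + 1]);
    }
    chain.back().distToNext = 0.0;
}

// Phase 2: one global parameter, here the total chain length.
double computeGlobal(const std::vector<Node>& chain) {
    double total = 0.0;
    for (const Node& n : chain) total += n.distToNext;
    return total;
}

// Phase 3: move active nodes by a small amount (placeholder rule).
void evolve(std::vector<Node>& chain, double dt) {
    for (Node& n : chain)
        if (n.active) n.y += dt;
}

// Phase 4: insert a midpoint wherever a gap exceeds maxGap.
void fill(std::vector<Node>& chain, double maxGap) {
    if (chain.size() < 2) return;
    // Walk backwards so an insertion never shifts the gaps still to be checked.
    for (std::size_t i = chain.size() - 1; i-- > 0; ) {
        if (gap(chain[i], chain[i + 1]) > maxGap) {
            const Node mid{0.5 * (chain[i].x + chain[i + 1].x),
                           0.5 * (chain[i].y + chain[i + 1].y), 0.0, true};
            chain.insert(chain.begin() + static_cast<std::ptrdiff_t>(i) + 1, mid);
        }
    }
}

void timeStep(std::vector<Node>& chain, double dt, double maxGap) {
    computeLocal(chain);
    const double length = computeGlobal(chain);
    (void)length;  // a real evolve() would use the global parameters
    evolve(chain, dt);
    fill(chain, maxGap);
}
```

Each `insert` in `fill` shifts everything behind it, so this version is only meant to show the structure of a step, not an efficient fill phase.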

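Each node takes ~60 bytes, the chain has ~100 000 nodes, and I need to evolve the chain for about 1 000 000 time steps. I would like to push these numbers as high as possible, since that improves the accuracy of my simulation, but under the constraint that the simulation finishes in a reasonable time (~hours). (About 30% of the nodes will be inactive.)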
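To put these numbers in perspective, the small helper below turns them into a memory footprint and a required throughput. The function names are made up for this example, and the time budget is an assumption, since the question only says "hours".

```cpp
#include <cassert>

// Total node updates = nodes * steps; dividing by the time budget gives
// the update rate the code has to sustain.
double requiredUpdatesPerSecond(double nodes, double steps, double budgetHours) {
    assert(budgetHours > 0.0);
    return nodes * steps / (budgetHours * 3600.0);
}

// Raw storage needed for the chain, in bytes.
double chainBytes(double nodes, double bytesPerNode) {
    return nodes * bytesPerNode;
}
```

With 100 000 nodes of 60 bytes the chain occupies about 6 MB, and 100 000 nodes times 1 000 000 steps is 10^11 node updates. For an assumed budget of 3 hours that works out to roughly 9.3 million node updates per second.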

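I have started implementing this simulation as a doubly linked list in C++. That seemed natural, since I need to insert new nodes between existing ones and since the local parameters depend on the nearest neighbors. (I added an extra pointer to the next active node, to avoid unnecessary calculations whenever I loop over the whole chain.)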

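I am no expert when it comes to parallelization (or coding, for that matter), but I have played around with OpenMP, and I really like how I can speed up for loops of independent operations with two lines of code. I do not know how to make my linked list do work in parallel, or whether that even works (?). So I had the idea of using an STL vector, where instead of keeping pointers to the nearest neighbors I store the indices of the neighbors and access them by standard lookup. I could also sort the vector by position in the chain (every x-th time step) to get better locality in memory. This approach would allow looping the OpenMP way.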
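A minimal sketch of the index-based idea described above might look like the following. The `Node` layout and the function names are assumptions made for illustration. New nodes are appended at the end of the vector and linked in by index, and the local-parameter loop carries an OpenMP pragma because each iteration writes only its own node.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: neighbors are stored as indices into the same vector.
struct Node {
    double x, y;
    double distToPrev;
    int prev, next;   // -1 marks an end of the chain
};

// Append a midpoint node to the storage and link it between a and b,
// where b must currently be the successor of a.
int insertBetween(std::vector<Node>& nodes, int a, int b) {
    assert(nodes[a].next == b && nodes[b].prev == a);
    const int id = static_cast<int>(nodes.size());
    const Node mid{0.5 * (nodes[a].x + nodes[b].x),
                   0.5 * (nodes[a].y + nodes[b].y), 0.0, a, b};
    nodes.push_back(mid);   // may reallocate; indices stay valid, pointers would not
    nodes[a].next = id;
    nodes[b].prev = id;
    return id;
}

// Each iteration reads neighbor positions and writes only nodes[i],
// so there are no dependencies between iterations.
void computeLocal(std::vector<Node>& nodes) {
    const long n = static_cast<long>(nodes.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const int p = nodes[i].prev;
        nodes[i].distToPrev = (p < 0) ? 0.0
            : std::hypot(nodes[i].x - nodes[p].x, nodes[i].y - nodes[p].y);
    }
}
```

Without OpenMP enabled the pragma is simply ignored and the loop runs serially. Storage order drifts away from chain order as nodes are appended, which is where the periodic sort would come in.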

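I am a bit daunted by this idea, although it would mean I no longer have to deal with memory management myself. I guess the STL vector implementation is far better than my simple "new" and "delete" way of handling the nodes in the list. I know I could have done the same with STL lists, but I do not like having to access the nearest neighbors through iterators.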

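So I ask you, 1337 h4x0rs and skilled programmers: what would be a better design for my simulation? Is the vector approach sketched above a good idea? Or are there tricks for linked lists that make them work with OpenMP? Or should I consider a totally different approach?

The simulation will run on a computer with 8 cores and 48G of RAM, so I guess I can trade a lot of memory for speed.

Thanks in advance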

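Edit: I need to add 1-2% new nodes each time step, so storing them as a vector without indices to the nearest neighbors will not work unless I sort the vector every time step.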

Answer 1

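This is a classic tradeoff question. Using an array or std::vector will make the calculations faster and the insertions slower; using a doubly linked list or std::list will make the insertions faster and the calculations slower.

The only real judge of tradeoff questions is empirical: which will run faster for your particular application? All you can really do is try it both ways and see. The more intense the computation and the shorter the stencil (that is, the higher the computational intensity: how many flops you do per amount of memory you have to bring in), the less a plain array will matter. But basically you should mock up an implementation of your basic computation both ways and see whether it matters. I have hacked together below a very crude go at something with both std::vector and std::list. It is probably wrong in any number of ways, but you can give it a go, play with some of the parameters, and see which wins for you. On my system, for the sizes and amount of computation given, the list is faster, but it could easily go either way.

With regard to OpenMP: yes, if that is the way you are going, your hands are somewhat tied. You will almost certainly have to go with the vector structure, but first you should make sure that the extra cost of the insertions does not wipe out any benefit from the multiple cores.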

#include <iostream>
#include <list>
#include <vector>
#include <cmath>
#include <sys/time.h>
using namespace std;

struct node {
    bool stuck;
    double x[2];
    double loccurve;
    double disttoprev;
};

void tick(struct timeval *t) {
    gettimeofday(t, NULL);
}

/* returns time in seconds from now to time described by t */
double tock(struct timeval *t) {
    struct timeval now;
    gettimeofday(&now, NULL);
    return (double)(now.tv_sec - t->tv_sec) +
        ((double)(now.tv_usec - t->tv_usec)/1000000.);
}

int main()
{
    const int nstart = 100;
    const int niters = 100;
    const int nevery = 30;
    const bool doPrint = false;
    list<struct node>   nodelist;
    vector<struct node> nodevect;

    // Note - vector is *much* faster if you know ahead of time 
    //  maximum size of vector
    nodevect.reserve(nstart*30);

    // Initialize
    for (int i = 0; i < nstart; i++) {
        // Automatic storage: the containers copy the node, so nothing leaks
        struct node mynode;
        mynode.stuck = false;
        mynode.x[0] = i; mynode.x[1] = 2.*i;
        mynode.loccurve = -1;
        mynode.disttoprev = -1;

        nodelist.push_back( mynode );
        nodevect.push_back( mynode );
    }

    const double EPSILON = 1.e-6;
    struct timeval listclock;
    double listtime;

    tick(&listclock);
    for (int i=0; i<niters; i++) {
        // Calculate local curvature, distance

        list<struct node>::iterator prev, next, cur;
        double dx1, dx2, dy1, dy2;

        next = cur = prev = nodelist.begin();
        cur++;
        next++; next++;
        dx1 = prev->x[0]-cur->x[0];
        dy1 = prev->x[1]-cur->x[1];

        while (next != nodelist.end()) {
            dx2 = cur->x[0]-next->x[0];
            dy2 = cur->x[1]-next->x[1];

            double slope1 = (dy1/(dx1+EPSILON));
            double slope2 = (dy2/(dx2+EPSILON));

            // Length of the segment prev->cur (both components of that segment)
            cur->disttoprev = sqrt(dx1*dx1 + dy1*dy1 );

            cur->loccurve = ( slope1*slope2*(dy1+dy2) +
                              slope2*(prev->x[0]+cur->x[0]) -
                              slope1*(cur->x[0] +next->x[0]) ) /
                            (2.*(slope2-slope1) + EPSILON);

            // The current segment becomes the previous one for the next node
            dx1 = dx2;
            dy1 = dy2;

            next++;
            cur++;
            prev++;
        }

        // Insert interpolated pt every neveryth pt
        int count = 1;
        next = cur = nodelist.begin();
        next++;
        while (next != nodelist.end()) {
            if (count % nevery == 0) {
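                struct node mynode;   // copied into the list, so no heap allocation needed
                mynode.x[0] = (cur->x[0]+next->x[0])/2.;
                mynode.x[1] = (cur->x[1]+next->x[1])/2.;
                mynode.stuck = false;
                mynode.loccurve = -1;
                mynode.disttoprev = -1;
                // insert() places the node before next; re-point next at it so
                // cur and next are adjacent again after the increments below
                next = nodelist.insert(next, mynode);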
            }
            next++;
            cur++;
            count++;
        }
    }
    listtime = tock(&listclock);

    struct timeval vectclock;
    double vecttime;

    tick(&vectclock);
    for (int i=0; i<niters; i++) {
        int nelem = nodevect.size();
        double dx1, dy1, dx2, dy2;
        dx1 = nodevect[0].x[0]-nodevect[1].x[0];
        dy1 = nodevect[0].x[1]-nodevect[1].x[1];

        for (int elem=1; elem<nelem-1; elem++) {
            dx2 = nodevect[elem].x[0]-nodevect[elem+1].x[0];
            dy2 = nodevect[elem].x[1]-nodevect[elem+1].x[1];

            double slope1 = (dy1/(dx1+EPSILON));
            double slope2 = (dy2/(dx2+EPSILON));

            // Length of the segment elem-1 -> elem (both components of that segment)
            nodevect[elem].disttoprev = sqrt(dx1*dx1 + dy1*dy1 );

            nodevect[elem].loccurve = ( slope1*slope2*(dy1+dy2) +
                              slope2*(nodevect[elem-1].x[0] +
                                      nodevect[elem].x[0])  -
                              slope1*(nodevect[elem].x[0] +
                                      nodevect[elem+1].x[0]) ) /
                            (2.*(slope2-slope1) + EPSILON);

            // The current segment becomes the previous one for the next node
            dx1 = dx2;
            dy1 = dy2;
        }

        // Insert interpolated pt every neveryth pt
        int count = 1;
        vector<struct node>::iterator next, cur;
        next = cur = nodevect.begin();
        next++;
        while (next != nodevect.end()) {
            if (count % nevery == 0) {
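                struct node mynode;   // copied into the vector, so no heap allocation needed
                mynode.x[0] = (cur->x[0]+next->x[0])/2.;
                mynode.x[1] = (cur->x[1]+next->x[1])/2.;
                mynode.stuck = false;
                mynode.loccurve = -1;
                mynode.disttoprev = -1;
                // vector::insert invalidates iterators at and after the insertion
                // point (all of them on reallocation), so rebuild both from its result
                next = nodevect.insert(next, mynode);
                cur = next - 1;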
            }
            next++;
            cur++;
            count++;
        }
    }
    vecttime = tock(&vectclock);

    cout << "Time for list: " << listtime << endl;
    cout << "Time for vect: " << vecttime << endl;

    vector<struct node>::iterator v;
    list  <struct node>::iterator l;

    if (doPrint) {
        cout << "Vector: " << endl;
        for (v=nodevect.begin(); v!=nodevect.end(); ++v) {
             cout << "[ (" << v->x[0] << "," << v->x[1] << "), " << v->disttoprev << ", " << v->loccurve << "] " << endl;
        }

        cout << endl << "List: " << endl;
        for (l=nodelist.begin(); l!=nodelist.end(); ++l) {
             cout << "[ (" << l->x[0] << "," << l->x[1] << "), " << l->disttoprev << ", " << l->loccurve << "] " << endl;
        }

    }

    cout << "List size is " << nodelist.size() << endl;
}

Answer 2

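Assuming that new elements are created relatively infrequently, I would go with the sorted vector approach, for reasons including:

It takes advantage of spatial locality
It is much easier to parallelize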

Of course, for this to work you would have to make sure the vector is always sorted, not just every k-th time step.
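One possible way to keep the vector in chain order at all times, without paying for a separate sort or for one element shift per insertion, is to apply all of a step's insertions in a single pass into a second buffer. The sketch below assumes a minimal `Node` and a hypothetical `splitAfter` flag array marking the gaps that need a new node; neither comes from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { double x, y; };   // hypothetical minimal node

// splitAfter[i] == true requests a midpoint between node i and node i+1.
// The result is still in chain order, and the whole batch costs one O(n) pass.
std::vector<Node> insertMidpoints(const std::vector<Node>& chain,
                                  const std::vector<bool>& splitAfter) {
    assert(chain.size() == splitAfter.size());
    std::vector<Node> out;
    out.reserve(chain.size() + chain.size() / 16 + 1);  // headroom for new nodes
    for (std::size_t i = 0; i < chain.size(); ++i) {
        out.push_back(chain[i]);
        if (splitAfter[i] && i + 1 < chain.size()) {
            out.push_back(Node{0.5 * (chain[i].x + chain[i + 1].x),
                               0.5 * (chain[i].y + chain[i + 1].y)});
        }
    }
    return out;
}
```

With the chain stored in order, the neighbors of element i are simply i-1 and i+1, so no neighbor indices need to be stored at all. The cost is one full copy of the chain per step.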

Answer 3

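This looks like a nice exercise for parallel programming students.

You seem to have a data structure that lends itself naturally to distribution: a chain. You can do quite a bit of the work on subchains that are (semi-)statically assigned to different threads. You may want to handle the N-1 boundary cases separately, but if the subchain lengths are > 3, the subchains are isolated from each other.

Of course, between steps you have to update the global variables, but variables such as the chain length are simple parallel additions. Just calculate the length of each subchain and then add those up. If your subchains are 100000/8 long, the single-threaded piece of work is the addition of those 8 subchain lengths between steps.

If the growth in nodes is very non-uniform, you may need to rebalance the subchain lengths every so often.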
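The subchain sum described above could be sketched roughly as follows. The `Node` layout, the function name, and the way segments are split among subchains are assumptions for the example; each subchain writes one partial length, and the final serial loop adds only `nSub` numbers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Node { double x, y; };   // hypothetical minimal node

// Split the segments of the chain into nSub contiguous subchains, compute each
// partial length independently, then add the partials serially.
double chainLength(const std::vector<Node>& chain, int nSub) {
    assert(nSub > 0);
    const std::size_t nSeg = chain.empty() ? 0 : chain.size() - 1;
    std::vector<double> partial(static_cast<std::size_t>(nSub), 0.0);

    #pragma omp parallel for
    for (int s = 0; s < nSub; ++s) {
        const std::size_t us = static_cast<std::size_t>(s);
        const std::size_t un = static_cast<std::size_t>(nSub);
        const std::size_t begin = nSeg * us / un;        // first segment of subchain s
        const std::size_t end   = nSeg * (us + 1) / un;  // one past its last segment
        double sum = 0.0;
        for (std::size_t i = begin; i < end; ++i)
            sum += std::hypot(chain[i + 1].x - chain[i].x,
                              chain[i + 1].y - chain[i].y);
        partial[us] = sum;   // each subchain writes only its own slot
    }

    double total = 0.0;      // the single-threaded part: nSub additions
    for (double p : partial) total += p;
    return total;
}
```

For a plain sum like this, OpenMP's `reduction(+:total)` clause on a loop over all segments would do the same job; the explicit partials are shown only because they mirror the subchain picture.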

What is the better implementation strategy?
