Question

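I am trying to code a multicore Markov chain simulation in C++. While I am trying to take advantage of the many CPUs (up to 24) to run a different chain on each one, I am having trouble picking the right container to gather the results of the numerical evaluations from each CPU. What I want to measure is basically the average value of an array of boolean variables. I have tried writing a wrapper around a `std::vector` object, which looks like this: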

#include <fstream>
#include <vector>
using namespace std;

struct densityStack {
    vector<int> density; //will store the sum of boolean variables
    int card; //will store the number of elements we summed over, for normalizing at the end

    densityStack(int size){ //constructor taking as only parameter the size of the array, usually size = 30
        density = vector<int> (size, 0);
        card = 0;
        }

    void push_back(vector<int> & toBeAdded){ //method summing a new array (of measurements) to our stack
        for(auto valStack = density.begin(), newVal = toBeAdded.begin(); valStack != density.end(); ++valStack, ++newVal)
            *valStack += *newVal;
        card++;
        }

    void savef(const char * fname){ //method outputting into a file
        ofstream out(fname);
        out.precision(10);
        out << card << "\n"; //saving the cardinal in first line
        for(auto val = density.begin(); val != density.end(); ++val)
            out << (double) *val / card << "\n"; //normalized average of each entry
        out.close();
        }
};

Then, in my code, I use a single densityStack object, and every time a CPU core has data (which can be 100 times per second) it calls push_back to send the data back to it.

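My issue is that this seems to be slower than my first, raw approach, in which each core stored every array of measurements in a file and I then used a Python script to average and clean up. (I was unhappy with that approach because it stored too much information and put too much useless stress on the hard drives.)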

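Do you see where I could be losing a lot of performance? I mean, is there an obvious source of overhead? Because to me, copying the vector back should not cost much, even at a frequency of 1000 Hz.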

Answer 1

How are you synchronizing your shared densityStack instance?

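From the limited information here, my guess is that the CPUs are blocked waiting to write every time they have a small chunk of data. If that is the problem, a simple way to improve performance is to reduce the number of writes: keep a buffer of data for each CPU and write to the densityStack less frequently.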
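The buffering idea above could look roughly like the sketch below. It is only an illustration under assumptions: the question does not show its threading code, so `std::thread` and `std::mutex` are assumed here, and `DensityAccumulator`, `merge`, and `runChains` are hypothetical names, not part of the original code. Each thread sums into its own private accumulator and takes the lock on the shared one only once, when it folds its totals in.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical accumulator mirroring the question's densityStack.
struct DensityAccumulator {
    std::vector<long long> density; // running sum per array entry
    long long card = 0;             // number of arrays summed so far

    explicit DensityAccumulator(std::size_t size) : density(size, 0) {}

    // Add one array of measurements.
    void add(const std::vector<int>& sample) {
        for (std::size_t i = 0; i < density.size(); ++i) density[i] += sample[i];
        ++card;
    }

    // Fold another accumulator's sums and count into this one.
    void merge(const DensityAccumulator& other) {
        for (std::size_t i = 0; i < density.size(); ++i) density[i] += other.density[i];
        card += other.card;
    }
};

// Runs numChains threads; each sums locally and locks the shared total once.
DensityAccumulator runChains(unsigned numChains, std::size_t size, int samplesPerChain) {
    DensityAccumulator total(size);
    std::mutex totalMutex;
    std::vector<std::thread> workers;
    for (unsigned c = 0; c < numChains; ++c) {
        workers.emplace_back([&total, &totalMutex, size, samplesPerChain, c] {
            DensityAccumulator local(size); // private to this thread, no locking needed
            std::vector<int> sample(size);
            for (int s = 0; s < samplesPerChain; ++s) {
                // Placeholder data standing in for one Markov chain measurement.
                for (std::size_t i = 0; i < size; ++i)
                    sample[i] = static_cast<int>((i + s + c) % 2);
                local.add(sample);
            }
            std::lock_guard<std::mutex> lock(totalMutex);
            total.merge(local); // the only synchronized write for this chain
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

If intermediate results are needed while the chains are still running, the same `merge` step could be done every few thousand samples instead of once at the end, with the local accumulator reset afterwards. The lock is then still taken far less often than once per measurement.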

C++ stack efficient for multicore application
