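TL;DR: I have a class member function that contains parallel code using other private or protected class members.
My class is structured roughly like this: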
class ChildClass : public GeneralClass
{
private:
    std::vector<Eigen::MatrixXd> edgePotentials;
protected:
    // Graph structure
    size_t numberOfNodes;
    size_t numberOfEdges;
    vector< vector<size_t> > edges;
    vector< vector<size_t> > containing_edges; // containing_edges[i] is the list of edges that contain the node i
    // Some intermediate quantities
    std::vector<Eigen::MatrixXd> P;
    std::vector<Eigen::MatrixXd> RC;
    // Caching decision variables for the above quantities
    std::vector< std::vector<bool> > hasChangedP;
    std::vector< std::vector<bool> > hasChangedRC;
    // A component of the main algorithm (that needs to be parallelized)
    virtual void computeP(size_t d, const std::vector<Eigen::MatrixXd> &X);
public:
    ChildClass();
    // Main algorithm
    virtual double MainAlgorithm() override;
};
Now, inside the member function MainAlgorithm, I call some functions that need to be parallelized:
double ChildClass::MainAlgorithm()
{
    /// Initialization
    vector<MatrixXd> X(D);
    ...
    // Main algorithm
    for(size_t d = 0; d < D; d++){
        // Step 1: compute the caching decision variables hasChangedP and hasChangedRC (to see if P and RC need to be re-computed or not)
        // Step 2: Call this function
        computeP(d, X);
        // Step 3: Update X
    }
}
The function in question has the following structure:
void ChildClass::computeP(size_t d, const vector<MatrixXd> &X)
{
    // for each edge
    #pragma omp parallel for
    for(size_t e = 0; e < numberOfEdges; e++){
        size_t i = edges[e][0];
        size_t j = edges[e][1];
        if(hasChangedRC[e][d]){
            RC[e].col(d) = edgePotentials[e]*X[1-d].col(j);
        }
    }
    // now for each node
    #pragma omp parallel for
    for(size_t i = 0; i < numberOfNodes; i++){
        if(hasChangedP[d][i]){
            // ... compute P[d].col(i) based on RC, edgePotentials, containing_edges ...
        }
    }
}
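To make the structure of computeP concrete, here is a dependency-free mock of its two loops. It is a sketch under loud assumptions: the class MockGraph is hypothetical, every matrix column is replaced by a single double, and the node update (summing RC over containing_edges) is a placeholder for the elided computation of P[d].col(i). It shows that each edge iteration writes only its own RC slot and each node iteration writes only its own P slot, so neither loop has a data race. The implicit barrier at the end of the first parallel loop guarantees that RC is complete before the node loop reads it. The loop index is a signed long because OpenMP 2.0 compilers such as MSVC reject unsigned loop variables. Note that the std::vector<bool> flags are only read here: concurrent reads are safe, but concurrent writes to different elements of a std::vector<bool> are not, because neighbouring elements share storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, Eigen-free stand-in for ChildClass: each matrix column is a
// single double, and the node update is a placeholder sum over incident edges.
class MockGraph {
public:
    std::vector<std::vector<std::size_t>> edges;            // edges[e] = {i, j}
    std::vector<std::vector<std::size_t>> containing_edges; // edges touching node i
    std::vector<double> edgePotentials, RC, P, X;
    std::vector<bool> hasChangedRC, hasChangedP;            // read-only inside the loops

    void computeP() {
        const long numberOfEdges = static_cast<long>(edges.size());
        const long numberOfNodes = static_cast<long>(containing_edges.size());

        // Phase 1: iteration e writes only RC[e], so iterations are independent.
        #pragma omp parallel for
        for (long e = 0; e < numberOfEdges; e++) {
            const std::size_t j = edges[e][1];
            if (hasChangedRC[e]) RC[e] = edgePotentials[e] * X[j];
        }

        // Phase 2: starts after the implicit barrier of phase 1; iteration i
        // reads RC and writes only P[i].
        #pragma omp parallel for
        for (long i = 0; i < numberOfNodes; i++) {
            if (!hasChangedP[i]) continue;
            double sum = 0.0;
            for (std::size_t e : containing_edges[i]) sum += RC[e];
            P[i] = sum;
        }
    }
};
```

A triangle graph (3 nodes, 3 edges) is enough to check that the parallel result matches what a serial run would produce.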
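Currently, #pragma omp parallel for does not help at all. I suspect this is because the class members (numberOfNodes, numberOfEdges, RC, edgePotentials, containing_edges, ...) cannot be shared in the parallel region?
Could you help me solve this? Thank you very much!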
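One way to test the suspicion about class members is a minimal, hypothetical check (SharedMembersDemo and maxThreads are invented for illustration). Inside a member function, non-static data members are accessed through the `this` pointer, so every thread in the parallel region operates on the same member objects, which makes them shared. The second function reports whether OpenMP is active at all. If the file is compiled without OpenMP support (for example, without -fopenmp on GCC/Clang or /openmp on MSVC), the pragmas are ignored and the loop runs serially. That situation would also produce "no speedup at all".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical minimal class (not from the question): a member function whose
// parallel loop reads and writes non-static data members via `this`.
class SharedMembersDemo {
public:
    explicit SharedMembersDemo(long n)
        : numberOfItems(n),
          input(static_cast<std::size_t>(n)),
          output(static_cast<std::size_t>(n), 0.0) {
        for (long k = 0; k < n; k++) input[k] = static_cast<double>(k);
    }

    void run() {
        // numberOfItems, input and output are reached through `this`, so all
        // threads work on the same objects: the members are shared.
        #pragma omp parallel for
        for (long k = 0; k < numberOfItems; k++) {
            output[k] = 2.0 * input[k];  // each iteration writes a distinct element
        }
    }

    long numberOfItems;
    std::vector<double> input;
    std::vector<double> output;
};

// Returns how many threads a parallel region may use, or 1 if this file was
// compiled without OpenMP (in which case the pragma above is ignored).
int maxThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

If maxThreads() returns 1 in the real build, the problem lies in the build flags or environment (for example, OMP_NUM_THREADS=1), not in how the members are declared.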
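Update
numberOfNodes can range from a few thousand to a few hundred thousand, and numberOfEdges is several times numberOfNodes. As @zzxyz suggested, I tried splitting the loop into N chunks (where N is the number of threads). Instead of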
#pragma omp parallel for
for(size_t e = 0; e < numberOfEdges; e++){
    // Code for each edge here
}
I used:
size_t threads = 8;
size_t p = floor(numberOfEdges/threads);
#pragma omp parallel for
for(size_t b = 0; b < threads; b++){
    size_t first = b*p;
    size_t last = (b+1)*p - 1;
    if(b >= threads - 1){
        last = numberOfEdges - 1;
    }
    for(size_t e = first; e <= last; e++){
        // Code for each edge here
    }
}
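I did the same for the loop over nodes. However, this did not help either. (As @zzxyz pointed out later, this is something OpenMP already does for us automatically.)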
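The manual split from the update can be expressed as a small helper. This is a sketch: blockRange and coverage are hypothetical names. It uses a half-open range [first, last) instead of the inclusive `last = (b+1)*p - 1`. With size_t arithmetic, the inclusive form wraps around to SIZE_MAX whenever p == 0, which happens when there are fewer edges than threads. That cannot occur at the sizes described above, but the half-open form avoids the edge case entirely. Because integer division already truncates, the floor call is unnecessary.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range [first, last) of indices handled by block b when n items
// are split into `threads` contiguous blocks; the last block takes the
// remainder, mirroring the manual scheme above.
std::pair<std::size_t, std::size_t> blockRange(std::size_t n, std::size_t threads,
                                               std::size_t b) {
    const std::size_t p = n / threads;  // integer division already truncates
    const std::size_t first = b * p;
    const std::size_t last = (b + 1 == threads) ? n : (b + 1) * p;
    return {first, last};
}

// Counts how often each index is visited across all blocks; a correct split
// visits every index exactly once.
std::vector<int> coverage(std::size_t n, std::size_t threads) {
    std::vector<int> hits(n, 0);
    for (std::size_t b = 0; b < threads; b++) {
        const std::pair<std::size_t, std::size_t> r = blockRange(n, threads, b);
        for (std::size_t e = r.first; e < r.second; e++) hits[e]++;
    }
    return hits;
}
```

This contiguous, near-equal split closely resembles what a plain `#pragma omp parallel for` does under a static schedule. The default schedule is implementation-defined, though GCC's libgomp uses static. That would explain why splitting the loop by hand gave no additional speedup.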