Question

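I am currently trying to parallelize a particle swarm algorithm and cannot find an efficient way to handle the assignment (=) inside the loop. I did not take the easiest route of putting #pragma omp parallel for at the start of the loop, because I expect that this would lead to false sharing. The variables are matrices and vectors, which makes the whole thing more complicated.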
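To put a rough number on the false sharing worry, here is a small sketch that counts how many neighbouring columns would end up in the same cache line. It assumes 64-byte cache lines, column-major storage, and a matrix whose memory starts on a cache line boundary; none of these are guaranteed for any particular library or machine. The function name countSharedBoundaries is made up for this illustration.

```cpp
#include <cstddef>

// Counts adjacent column pairs (n, n+1) whose boundary falls inside a cache
// line, i.e. the last byte of column n and the first byte of column n+1 land
// in the same line. Assumes column-major storage that starts on a line
// boundary (an assumption, not something a library promises).
std::size_t countSharedBoundaries(std::size_t rows, std::size_t cols,
                                  std::size_t elemSize, std::size_t lineSize) {
  std::size_t shared = 0;
  for (std::size_t n = 0; n + 1 < cols; ++n) {
    const std::size_t firstByteOfNext = (n + 1) * rows * elemSize;
    const std::size_t lastByteOfThis = firstByteOfNext - 1;
    if (lastByteOfThis / lineSize == firstByteOfNext / lineSize) {
      ++shared;
    }
  }
  return shared;
}
```

With 10 rows of 8-byte doubles and 40 columns, as in the example below, countSharedBoundaries(10, 40, 8, 64) reports that 30 of the 39 column boundaries sit inside a cache line under these assumptions. With schedule(static) and no chunk size each thread gets one contiguous block of columns, so only the few boundaries between blocks would actually be written by two different threads.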

I think this minimal example (using Armadillo, a MATLAB-like linear algebra library) describes it better than I can:

#include <omp.h>
#include <math.h>
#include <cstdlib> //needed for std::rand
#include <armadillo>

using namespace std;

int main(int argc, char** argv) {
  arma::uword dimensions = 10, particleCount = 40;
  //Matrix with dimensions-rows, and particleCount-columns. initialized with 1
  arma::Mat<double> positions = arma::ones(dimensions, particleCount);
  //same as above, but initialized with 2. scalar * matrix scales every element (matrix * matrix would be a matrix product; % is elementwise)
  arma::Mat<double> velocities = arma::ones(dimensions, particleCount) * 2;

  for(arma::uword n = 0; n < particleCount; n++) {
    //.col(n) gets the nth column of the matrix
    arma::Col<double> newVelocity = std::rand() * velocities.col(n);
    //there is a lot more math done here, but all of it is read-only access to other variables
    positions.col(n) += newVelocity; //again elementwise
    velocities.col(n) = newVelocity;
  }

  return 0;
}

My first idea was to do something like this, but it is very inefficient:

#include <omp.h>
#include <math.h>
#include <cstdlib> //needed for std::rand
#include <armadillo>

using namespace std;

int main(int argc, char** argv) {
  arma::uword dimensions = 10, particleCount = 40;
  //these two variables cannot be moved inside the parallel region :-/
  arma::Mat<double> positions = arma::ones(dimensions, particleCount);
  arma::Mat<double> velocities = arma::ones(dimensions, particleCount) * 2;

  #pragma omp parallel
  {
    arma::Mat<double> velocity_private = arma::zeros(dimensions,particleCount);
    #pragma omp for
    for(arma::uword n = 0; n < particleCount; n++) {
      arma::Col<double> newVelocity = std::rand() * velocities.col(n);
      velocity_private.col(n) = newVelocity;
    }
    #pragma omp single
    {
      //first part of workaround for '='
      velocities = arma::zeros(dimensions, particleCount);
    }
    #pragma omp critical
    {
      for(arma::uword n = 0; n < particleCount; n++) {
        positions.col(n) += velocity_private.col(n);
        //second part of workaround for '='
        velocities.col(n) += velocity_private.col(n); 
      }
    }
  }//end omp parallel

  return 0;
}

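I have considered using a user-defined reduction, but I have not found any examples for assignment. All the examples I found use addition or multiplication, which unfortunately do not carry over easily.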
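For comparison, here is a minimal sketch of the plain parallel for variant, in which iteration n reads and writes only column n, so the assignment needs neither a reduction nor a critical section. To keep it self-contained it does not use Armadillo: the Swarm struct, the updateSwarm function and the seed parameter are hypothetical stand-ins, with std::vector in column-major layout replacing arma::Mat. It also swaps std::rand for a per-iteration std::mt19937, since std::rand is not guaranteed to be thread-safe. Whether false sharing at the column boundaries costs anything measurable here would still have to be profiled.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the Armadillo matrices: column-major storage,
// so column n occupies elements [n * rows, (n + 1) * rows).
struct Swarm {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> positions;
  std::vector<double> velocities;
};

// One update step. Iteration n touches only column n of both matrices,
// so no two iterations write to the same element.
void updateSwarm(Swarm& s, unsigned seed) {
  assert(s.positions.size() == s.rows * s.cols);
  assert(s.velocities.size() == s.rows * s.cols);

  // Signed loop index keeps the loop acceptable to older OpenMP versions.
  const long count = static_cast<long>(s.cols);
  #pragma omp parallel for schedule(static)
  for (long n = 0; n < count; ++n) {
    // Per-iteration generator (an assumption replacing std::rand):
    // no state is shared between threads.
    std::mt19937 gen(seed + static_cast<unsigned>(n));
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    const double factor = dist(gen);

    const std::size_t begin = static_cast<std::size_t>(n) * s.rows;
    for (std::size_t i = begin; i < begin + s.rows; ++i) {
      const double newVelocity = factor * s.velocities[i];
      s.positions[i] += newVelocity;  // positions.col(n) += newVelocity
      s.velocities[i] = newVelocity;  // velocities.col(n) = newVelocity
    }
  }
}
```

Calling updateSwarm on a Swarm with 10 rows and 40 columns mirrors the loop in the example above. Compiled without OpenMP support the pragma is ignored and the loop simply runs serially, and because each column's generator is seeded from its index, the result does not depend on the number of threads.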

Any advice or suggestions are appreciated! :)

Parallel assignment to a shared variable

0 answers: