Question

I am using OpenMP on a nested loop, which works as follows:

#pragma omp parallel shared(vector1) private(i,j)
{
#pragma omp for schedule(dynamic)
    for (i = 0; i < vector1.size(); ++i){

       //some code here 

       for (j = 0; j < vector1.size(); ++j){

           //some other code goes here
       #pragma omp critical
       A+=B;
       }
     C +=A;
    }
}

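The problem is that my code does a lot of computation in the A += B part. Making that part critical therefore does not give me the speedup I want. (In fact there seems to be some overhead, because my program takes longer to execute than the sequential version.)

I tried using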

#pragma omp reduction private(B) reduction(+:A)
    A+=B

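This sped up execution, but it does not seem to handle the race condition the way the critical construct does, because I do not get the same result for A.

Is there an alternative I can try?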

Answer 1

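Unless you want to go to the trouble of making your Vector3 class thread-safe, or of rewriting your operations to work with std::atomic<Vector3> (both of which would still carry a performance penalty, though not as severe as a critical section), you can mimic the behaviour of an OpenMP reduction: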

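#pragma omp parallel // no shared/private clauses needed: variables declared outside are shared, those declared inside are private
{

    Vector3 A{}, LocalC{}; // both thread-private

    #pragma omp for schedule(dynamic)
    for (std::size_t i = 0; i < vector1.size(); ++i){

       //some code here 

       // j is declared in the loop so that each thread gets its own copy
       for (std::size_t j = 0; j < vector1.size(); ++j){

           //some other code goes here

           A += B; // thread-private, no synchronization needed
       }
       LocalC += A; // thread-private, no synchronization needed
    }

    // executed once per thread, not once per inner iteration
    #pragma omp critical
    C += LocalC;
}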
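
For reference, here is a minimal self-contained sketch of the same per-thread accumulation pattern. The Vector3 struct, the pair_term helper and the function names are hypothetical stand-ins, since the question does not show the real types. The sketch also assumes that A is meant to be reset for every outer iteration, which the question does not state. Component values are kept to small integers so that the sum is exact regardless of the order in which threads add their partial results.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's Vector3 type.
struct Vector3 {
    double x, y, z;
    Vector3& operator+=(const Vector3& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};

// Hypothetical per-pair contribution, standing in for "some other code".
inline Vector3 pair_term(double a, double b) {
    return Vector3{a + b, a - b, a * b};
}

// Sequential reference: sums pair_term over all (i, j) pairs.
Vector3 sum_serial(const std::vector<double>& v) {
    Vector3 C{};
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j)
            C += pair_term(v[i], v[j]);
    return C;
}

// Manual reduction: each thread accumulates privately and merges once.
Vector3 sum_manual_reduction(const std::vector<double>& v) {
    Vector3 C{};  // shared result
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());

    #pragma omp parallel
    {
        Vector3 localC{};  // declared inside the region, so private to each thread

        #pragma omp for schedule(dynamic)
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            Vector3 A{};  // assumption: A starts from zero for every i
            for (std::ptrdiff_t j = 0; j < n; ++j)
                A += pair_term(v[i], v[j]);
            localC += A;
        }

        // One short critical section per thread instead of one per inner iteration.
        #pragma omp critical
        C += localC;
    }
    return C;
}
```

Calling sum_manual_reduction(v) should give the same components as sum_serial(v) for integer-valued input. With general floating-point data the two can differ in the last bits, because the partial sums are combined in a different order. Built without OpenMP support, the pragmas are ignored and the function simply runs sequentially.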

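Note that this assumes you do not read A anywhere in the parts marked by your "some code" comments, but you should not be doing that anyway if you ever considered using the reduction clause.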
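
On the reduction clause itself: OpenMP 4.0 and later also accept user-defined reductions through `#pragma omp declare reduction`, which is a different technique from the manual pattern in this answer and requires a compiler with OpenMP 4.0 support. The following is only a sketch under assumptions: Vector3 and pair_term are hypothetical stand-ins for the asker's code, vec3_add is an identifier made up here, and A is assumed to start from zero for each outer iteration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's Vector3 type.
struct Vector3 {
    double x, y, z;
    Vector3& operator+=(const Vector3& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};

// User-defined reduction (OpenMP 4.0+). The name vec3_add is made up for this sketch.
// omp_out and omp_in are the partial results being merged; omp_priv is each
// thread's private copy, initialized here to the zero vector.
#pragma omp declare reduction(vec3_add : Vector3 : omp_out += omp_in) \
    initializer(omp_priv = Vector3{})

// Hypothetical per-pair contribution, standing in for "some other code".
inline Vector3 pair_term(double a, double b) {
    return Vector3{a + b, a - b, a * b};
}

Vector3 sum_declared_reduction(const std::vector<double>& v) {
    Vector3 C{};
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());

    // Each thread works on a private copy of C; the copies are merged
    // with vec3_add when the loop ends.
    #pragma omp parallel for schedule(dynamic) reduction(vec3_add : C)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        Vector3 A{};  // assumption: A starts from zero for every i
        for (std::ptrdiff_t j = 0; j < n; ++j)
            A += pair_term(v[i], v[j]);
        C += A;
    }
    return C;
}
```

Compared with the manual version, the merge step is generated by the compiler, so no explicit critical section appears in the code. The restriction about not reading the shared accumulator inside the loop body applies here as well.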

Replacing #pragma omp critical (C++)
