Question

#pragma omp parallel for    // I want a reduction, but overloading doesn't work on the version used
for (int i = 0; i < 500; i++) {
    #pragma omp critical
    for (int j = i; j < 102342; j++)
    {
        Output[j] += staticConstant[i] * data[j-i];
    }
}

Is there any way to get a reduction to work here? Making local private copies did not speed things up.

Answer 1

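In this case, your best option is to swap the loops, so that each iteration of the parallel loop writes to a different element of Output: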

#pragma omp parallel for
for (int j = 0; j < 102342; j++)
    // only i <= j contributes, as in the original loop bounds
    for (int i = 0; i <= (j < 499 ? j : 499); i++)
        Output[j] += staticConstant[i] * data[j-i];
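
To make the swapped loop easier to test on its own, it can be wrapped in a function. This is only a minimal sketch: the name `convolve_swapped`, its parameters, and the `double` element type are assumptions for illustration (the question does not give the types). Each j writes only `output[j]`, so neither `critical` nor `atomic` is needed. Summing into a local variable first may change floating-point rounding slightly compared with adding each term directly to the array.

```c
#include <assert.h>

/* Hypothetical helper: same arithmetic as the question's loop nest,
   with the loops swapped so that j is the parallel loop. */
void convolve_swapped(double *output, const double *coeff, const double *data,
                      int n_taps, int n)
{
    assert(n_taps > 0 && n >= 0);

    #pragma omp parallel for
    for (int j = 0; j < n; j++) {
        /* Only coefficients with i <= j contribute to output[j]. */
        int last = (j < n_taps - 1) ? j : n_taps - 1;
        double sum = 0.0;              /* private to this iteration */
        for (int i = 0; i <= last; i++)
            sum += coeff[i] * data[j - i];
        output[j] += sum;              /* the only write to output[j] */
    }
}
```

With the question's sizes the call would look like `convolve_swapped(Output, staticConstant, data, 500, 102342);`. Without OpenMP enabled the pragma is ignored and the function runs serially with the same result.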

Another (but suboptimal) option is to use atomic:

#pragma omp parallel for
for (int i = 0; i < 500; i++)
    // j must be declared here so each thread gets its own copy
    for (int j = i; j < 102342; j++)
    {
        #pragma omp atomic
        Output[j] += staticConstant[i] * data[j-i];
    }

Answer 2

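Forget about using a reduction in this case: the overhead it would incur (100,000+ elements in the reduction variable) would probably kill most of the gain you get from parallelising. Stick to simple parallelisation constructs.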
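
To see where that overhead comes from, here is a rough sketch of what a hand-written array reduction looks like. The function name `convolve_private_copies`, its parameters, and the `double` type are assumptions for illustration, not the asker's actual code. Every thread has to allocate and zero a full-length buffer and then merge all of it into the shared array, one thread at a time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical manual reduction: one private n-element buffer per thread. */
void convolve_private_copies(double *output, const double *coeff,
                             const double *data, int n_taps, int n)
{
    assert(n_taps > 0 && n > 0);

    #pragma omp parallel
    {
        /* Extra cost 1: each thread allocates and zeroes n elements. */
        double *local = calloc((size_t)n, sizeof *local);
        if (local == NULL)
            abort();

        /* The i loop is shared out; each thread fills its own buffer. */
        #pragma omp for nowait
        for (int i = 0; i < n_taps; i++)
            for (int j = i; j < n; j++)
                local[j] += coeff[i] * data[j - i];

        /* Extra cost 2: the merge touches all n elements per thread
           and is serialised by the critical section. */
        #pragma omp critical
        for (int j = 0; j < n; j++)
            output[j] += local[j];

        free(local);
    }
}
```

Whether this actually ends up slower than the simpler constructs depends on the thread count and memory bandwidth, so it is worth timing rather than assuming.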

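Ideally, what you want is to parallelise the j loop, since for a given i there are no dependencies between its iterations. So you could do this: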

#pragma omp parallel
for ( int i = 0; i < 500; i++ ) {
    #pragma omp for
    for ( int j = i; j < 102342; j++ ) {
        Output[j] += staticConstant[i] * data[j-i];
    }
}

For code as simple as this, that should be enough.

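Now, you might want to go a step further and try swapping the i and j loops to make the parallelisation more effective (I'm not sure it will make much of a difference). For that, you need to remove the dependency on i in the j loop's initialisation. Here is one way: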

// first let's do the dependent iterations in j
for ( int i = 0; i < 500; i++ ) {
    for ( int j = i; j < 500; j++ ) {
        Output[j] += staticConstant[i] * data[j-i];
    }
}
// then all the other iterations, and swap the i and j loops
// now we can parallelize no problem
#pragma omp parallel for
for ( int j = 500; j < 102342; j++ ) {
    for ( int i = 0; i < 500; i++ ) {
        Output[j] += staticConstant[i] * data[j-i];
    }
}
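
If you want to convince yourself that the split version computes the same thing as the original loop nest, a small check like the following can help. It is a sketch under assumptions: the function names, the parameters, and the `double` type are all invented for illustration. Unlike the hard-coded 500 above, the `head` variable also covers arrays shorter than the number of coefficients.

```c
#include <assert.h>

/* The question's loop nest, run serially, used as the reference. */
void convolve_reference(double *output, const double *coeff,
                        const double *data, int n_taps, int n)
{
    for (int i = 0; i < n_taps; i++)
        for (int j = i; j < n; j++)
            output[j] += coeff[i] * data[j - i];
}

/* Hypothetical generalisation of the split version above. */
void convolve_split(double *output, const double *coeff,
                    const double *data, int n_taps, int n)
{
    assert(n_taps > 0 && n >= 0);
    int head = (n_taps < n) ? n_taps : n;

    /* Head: j < n_taps, where the j loop still starts at i. */
    for (int i = 0; i < head; i++)
        for (int j = i; j < head; j++)
            output[j] += coeff[i] * data[j - i];

    /* Tail: every remaining j receives all n_taps contributions. */
    #pragma omp parallel for
    for (int j = head; j < n; j++)
        for (int i = 0; i < n_taps; i++)
            output[j] += coeff[i] * data[j - i];
}

/* Largest absolute difference between two arrays of length n. */
double max_abs_diff(const double *a, const double *b, int n)
{
    double worst = 0.0;
    for (int j = 0; j < n; j++) {
        double d = a[j] - b[j];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Run both functions on zero-initialised output arrays with the same inputs and compare them with `max_abs_diff`. Both versions add the terms for a given `output[j]` in the same order of i, so the difference should be zero or very close to it.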

OpenMP reduction without overloading on an older version
