C++ AMP: computing the maximum of an array

Date: 2017-01-13 14:27:15

Tags: multithreading parallel-processing fold c++-amp

I want to compute the maximum absolute difference between two 3D arrays using C++ AMP.

With OpenMP this is easy. Consider two arrays

float*** u, ***uOld;

The code is:

double residual = 0;
#pragma omp parallel for schedule(static) collapse(3) reduction(max : residual)
for (int i = 0; i < nbX; i++)
    for (int j = 0; j < nbY; j++)
        for (int k = 0; k < nbTheta; k++)
            // std::max needs both arguments to have the same type; residual is a double
            residual = std::max<double>(residual, std::fabs(u[i][j][k] - uOld[i][j][k]));
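
For reference, a self-contained host-side version of the loop above might look like the following sketch. It assumes the data sits in flat `std::vector<float>` buffers rather than `float***`, and the function name `maxAbsDiff` is hypothetical. If OpenMP is not enabled at compile time, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: maximum absolute element-wise difference of two
// flat buffers (assumed layout; the question itself uses float***).
float maxAbsDiff(const std::vector<float>& u, const std::vector<float>& uOld)
{
    float residual = 0.0f;
    // Only compare the range that exists in both buffers.
    const long long n = static_cast<long long>(std::min(u.size(), uOld.size()));
    // reduction(max : ...) requires OpenMP 3.1 or later.
    #pragma omp parallel for schedule(static) reduction(max : residual)
    for (long long i = 0; i < n; i++)
        residual = std::max(residual, std::fabs(u[i] - uOld[i]));
    return residual;
}
```

A value computed this way on the host can serve as a check for whatever the GPU reduction returns on a small test case.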

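Using max_element from the AMP Algorithms library would be easy, but it is not implemented. I thought of something like the following, but a reduction is still needed at the outer-loop level: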

extent<1> extTheta(nbTheta);
parallel_for_each(extTheta, [=, &u_, &uOld_](index<1> iTheta) restrict(amp)
{
    type residual = 0;
    for (int iX = 0; iX < nbX; iX++)
        for (int iY = 0; iY < nbY; iY++)
            residual = fast_math::fmax(residual, fast_math::fabs(u_[iX][iY][iTheta] - uOld_[iX][iY][iTheta]));
    // residual now holds the maximum for this iTheta only;
    // it still has to be reduced across all iTheta
});

The data is on the GPU, and for efficiency reasons I don't want to transfer it back to the CPU. How can this be done efficiently?

1 answer:

Answer 0 (score: 0)

Here is a solution inspired by this MSDN blog post: https://blogs.msdn.microsoft.com/nativeconcurrency/2012/03/08/parallel-reduction-using-c-amp

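// Step 1: overwrite uOld_ with the element-wise absolute difference.
// abs1 is an absolute-value helper that is not shown in this answer.
parallel_for_each(extent<3>(nbTheta, nbX, nbY), [=, &u_, &uOld_](index<3> idx) restrict(amp)
{
    uOld_[idx[0]][idx[1]][idx[2]] = abs1(u_[idx[0]][idx[1]][idx[2]] - uOld_[idx[0]][idx[1]][idx[2]]);
});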


array_view<float, 1> residualReduce_ =  uOld_.view_as<1>(extent<1>(nbTheta*nbX*nbY));
array_view<float, 1> residual_ = residualReduce_.section(index<1>(0), extent<1>(1));
// Step 2: in-place tree reduction; n is the number of live entries at the front of the view.
// Folding around ceil(n/2) handles odd lengths without reading past the end of the view.
for (int n = nbTheta*nbX*nbY; n > 1; n = (n + 1) / 2)
{
    const int half = (n + 1) / 2; // for odd n, entry half-1 has no partner and is carried over
    parallel_for_each(extent<1>(n / 2), [=](index<1> idx) restrict(amp)
    {
        // Writes go to [0, n/2), reads of the partner go to [half, n): no overlap
        residualReduce_[idx[0]] = fast_math::fmax(residualReduce_[idx[0]], residualReduce_[idx[0] + half]);
    });
}
// Step 3: copy the single remaining value to the host (residual must be a float here)
concurrency::copy(residual_, &residual);
parallel_for_each(extent<3>(nbTheta, nbX, nbY), [=, &u_, &uOld_](index<3> idx) restrict(amp)
{
    uOld_[idx[0]][idx[1]][idx[2]] = u_[idx[0]][idx[1]][idx[2]];
});

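Unlike the snippet in the question, this solution also updates uOld to u.

The reduction is not the most efficient one possible, but it is still fast compared with the rest of the code.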
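
The pairwise folding scheme can be checked on the host before running it on the accelerator. Below is a minimal sketch in plain C++ (no AMP), with the hypothetical function name `foldMax`. Each pass of the outer loop stands in for one `parallel_for_each` launch, and the inner loop stands in for its threads. It assumes a variant that folds around `ceil(n/2)`, so odd lengths need no special case.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of an in-place tree reduction for the maximum.
// The buffer is taken by value because the fold overwrites it.
float foldMax(std::vector<float> a)
{
    if (a.empty()) return 0.0f; // assumption: residuals are non-negative
    for (std::size_t n = a.size(); n > 1; n = (n + 1) / 2)
    {
        const std::size_t half = (n + 1) / 2; // ceil(n / 2)
        // One "kernel launch" with n / 2 independent threads. Writes go to
        // [0, n/2) and partner reads go to [half, n), so they never overlap.
        for (std::size_t i = 0; i < n / 2; ++i)
            a[i] = std::max(a[i], a[i + half]);
        // For odd n, a[half - 1] has no partner and is carried over unchanged.
    }
    return a[0];
}
```

With this scheme the number of launches grows with the logarithm of the array size (three passes for five elements, for example), which is why the per-launch overhead, not the arithmetic, tends to dominate for small arrays.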