Question

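I am converting an algorithm to take advantage of the massive speedup that C++ AMP offers. I am at the stage of moving the for loops into the well-known parallel_for_each loop.

Normally this should be a straightforward task, but it is turning out to be more complicated than I expected. It is a nested loop that advances in steps of 4 on each iteration: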

for(int j = 0; j < height; j += 4, data += width * 4 * 4)
{
    for(int i = 0; i < width; i += 4)
    {
        // ... work on the 4x4 block whose top-left corner is (i, j) ...
    }
}

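The trouble I am having is with the index. I cannot find a way to fit this properly into a parallel_for_each loop. Using an index of rank 2 is the way to go, but manipulating it through branching would eat into the performance gain.

I found a similar post: Controlling the index variables in C++ AMP. It also deals with index manipulation, but it does not cover the increment aspect of my problem.

Kind regards,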

Forcecast

Answer 1

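You should think of tiles as a mechanism for partitioning work across the GPU, not as an indexing mechanism. You may well find that limiting yourself to 4x4 tiles leads you into a performance bottleneck.

Can't you just do the following: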

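// One thread per 4x4 block: the extent counts blocks, not pixels.
auto compute_domain = concurrency::extent<2>(height / 4, width / 4);

parallel_for_each(accl_view, compute_domain, [=](index<2> idx) restrict(amp)
{
    // Scale the block index back up to the top-left corner of the block.
    int j = idx[0] * 4;
    int i = idx[1] * 4;

    // Your algorithm here...
});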
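
If you want to convince yourself that the mapping visits the same blocks as the original loops, you can check it on the CPU first. The sketch below is plain C++, not AMP code: `stepped_indices` and `block_indices` are made-up helper names, and the two inner loop counters only stand in for `idx[0]` and `idx[1]`. It also assumes you want partial blocks included when the dimensions are not multiples of 4, so it rounds the block count up, whereas `height / 4` above truncates.

```cpp
#include <utility>
#include <vector>

using IndexList = std::vector<std::pair<int, int>>;

// The original nested loops: collect the (j, i) corner of every 4x4 block.
IndexList stepped_indices(int height, int width) {
    IndexList out;
    for (int j = 0; j < height; j += 4)
        for (int i = 0; i < width; i += 4)
            out.emplace_back(j, i);
    return out;
}

// CPU stand-in for the parallel_for_each body: one iteration per block,
// with idx0/idx1 playing the role of idx[0]/idx[1].
IndexList block_indices(int height, int width) {
    IndexList out;
    const int rows = (height + 3) / 4;  // assumption: round up to keep partial blocks
    const int cols = (width + 3) / 4;
    for (int idx0 = 0; idx0 < rows; ++idx0)
        for (int idx1 = 0; idx1 < cols; ++idx1)
            out.emplace_back(idx0 * 4, idx1 * 4);
    return out;
}
```

Comparing `stepped_indices(h, w) == block_indices(h, w)` for a few sizes should show the two agree. If you do round up in the real kernel, the per-block work would need its own bounds check at the right and bottom edges. The running `data += width * 4 * 4` from the original loop would likewise turn into an offset computed from `idx[0]` rather than a pointer that is bumped each row.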

C++ AMP: for loops to parallel_for_each loop
