Question

This is a short piece of code I got from Udacity's "Intro to Parallel Programming" course. The indexing in this code confuses me.

__global__ void use_shared_memory_GPU(float *array)
{
    int i, index = threadIdx.x;
    float average, sum=0.0f;

    __shared__ float sh_arr[128];

    sh_arr[index] = array[index];

    __syncthreads();

    // Now it begins to confuse me
    for(i=0; i<index; i++) { sum += sh_arr[i]; }   // what is the index here?

    average = sum / (index + 1.0f);               // what is the index here?
                                                  // why add 1.0f?

    if(array[index] > average) {array[index] = average;}

}

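I understand that index is set to each thread's ID. But when the average is computed, index seems to be used as a count of threads. The first use of index is as the per-thread ID for the parallel access to the array, while the second is used like an ordinary C variable. I repeated this approach in my own program, but I could not reproduce the result.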

What is the trick behind index? I printed it in cuda-gdb and it only shows 0. Is there a detailed explanation of this?

One more thing: when computing the average, why is 1.0f added?

Answer 1

This code is computing prefix sums. The prefix sums of an array of values look like this:

array:       1     2     4     3     5     7
prefix-sums: 1     3     7    10    15    22
averages:    1   1.5  2.33   2.5     3  3.67
index:       0     1     2     3     4     5

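Each prefix sum is the sum of the elements of array up to and including that position. The code also computes an "average", which is the prefix sum divided by the number of elements used to compute that sum.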

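In the code you have shown, each thread computes a different element of the prefix-sum array (and its own separate average). Every thread runs the same kernel code, but threadIdx.x, and therefore index, has a different value in each thread.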
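
To make the role of index concrete, here is a minimal serial sketch in plain C (not CUDA) in which each loop iteration plays the part of one thread. The function names thread_average and emulate_kernel are hypothetical, chosen only for this illustration, and a serial loop is only a stand-in for threads that really run concurrently. Note also that the loop in the posted kernel runs while i < index, which stops one element short of the inclusive sums in the table; this sketch assumes i <= index so that it matches the table.

```c
#include <stddef.h>

/* Work done by the single "thread" whose threadIdx.x equals `index`.
 * Assumption: the loop is inclusive (i <= index) to match the table;
 * the kernel as posted uses i < index. */
float thread_average(const float *sh_arr, int index)
{
    float sum = 0.0f;
    for (int i = 0; i <= index; i++) {
        sum += sh_arr[i];              /* prefix sum up to and including index */
    }
    return sum / (index + 1.0f);       /* index + 1 elements were summed */
}

/* Serial stand-in for a launch of n threads: iteration `index`
 * does what the thread with threadIdx.x == index would do. */
void emulate_kernel(const float *array, float *averages, int n)
{
    for (int index = 0; index < n; index++) {
        averages[index] = thread_average(array, index);
    }
}
```

Calling emulate_kernel on the six-element array from the table fills averages with the values in the "averages" row. On the GPU there is no outer loop: the hardware starts one thread per value of index.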

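So, to compute the average in a given thread, we take that thread's prefix sum and divide by the number of elements that went into it. Thread indices start at 0, so that count is not index itself but index + 1, which is why 1 is added to index before dividing.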
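
Written as a formula, with a_i for the array elements and k for the thread's index (both symbols are notation introduced here, not taken from the code), the intended average and one worked case from the table are roughly:

```latex
\[
\text{average}_k \;=\; \frac{1}{k+1}\sum_{i=0}^{k} a_i,
\qquad \text{e.g. } k = 2:\quad \frac{1 + 2 + 4}{3} \;=\; \frac{7}{3} \;\approx\; 2.33 .
\]
```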
