Question

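I want to write some values into an output array in an OpenCL kernel, depending on a condition. So I want to increment the array index after each value is written to the array. Because a condition has to be met, the output array index is not known in advance. I pass the output array index as an argument: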

__kernel void match(__global float* input, __global float* output, int pos)
{ 
     if(condition)
     {
          output[pos] = input[get_global_id(0)];
          atomic_inc(&pos); // this is where the problem lies
     }
}

I also tried passing pos as an array:

__kernel void match(__global float* input, __global float* output, __global int* pos)
{ 
     if(condition)
     {
          output[pos[0]] = input[get_global_id(0)];
          atomic_inc(pos[0]); // this is where the problem lies
     }
}

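In both cases, clBuildProgram returns error code -11. It works when I increment the value with pos++, but then it does not return any final value for the array position.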

Can anyone explain what I am doing wrong?

Answer 1

Not sure I understand the question, but let's give it a shot:

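Is each element of input assigned a pos? If so, input would be indexed in the kernel with index[get_global_id(0)], assuming (big assumption) that you are using a 1D array and calling clEnqueueNDRangeKernel() with a global work size similar to size_t Global_Work_Size[1] = {input_size}.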

Calling a kernel like your first example with int pos places a constant copy of pos in every thread, so, as I interpret your problem, it will not work.

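If the kernel indices do not map in a simple way, you need to either compute the index dynamically or pass in another array that is a lookup table (LUT) of indices mapping {{1}} to input.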

Finally, you can use clGetProgramBuildInfo (querying CL_PROGRAM_BUILD_LOG) to find out exactly what the error is. See the write-up I did in another thread.

Answer 2

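You cannot directly use the value of a variable that is being incremented with atomic_inc, or you will have a race condition. The documentation for atomic_inc mentions that it returns the old value from before the increment, so if every thread uses it this way, each one gets a unique value. So the correct way to use it is: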

int localPos = atomic_inc(&pos);
output[localPos] = input[get_global_id(0)];

"pos" can be global or local, but it seems like it should be global.
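To tie this back to the kernel from the question, here is a minimal sketch under a few assumptions that are not in the original post: pos is passed as a __global int* that the host sets to 0 before launch, the hypothetical threshold argument stands in for the unspecified condition, and the device supports OpenCL 1.1 or later, where atomic_inc on 32-bit integers in global memory is a core built-in. The #ifndef block is only a plain-C stand-in so the kernel body can be stepped through on the host; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C stand-ins for host-side testing only (assumption: sequential
   emulation, one work-item at a time; not atomic, not part of OpenCL). */
#include <stddef.h>
#define __kernel
#define __global
static size_t host_gid;  /* hypothetical: id of the emulated work-item */
static size_t get_global_id(unsigned int dim) { (void)dim; return host_gid; }
static int atomic_inc(volatile int *p) { int old = *p; *p = old + 1; return old; }
#endif

/* Assumes the global work size equals the number of elements in input
   and that output has room for every element that may pass the test. */
__kernel void match(__global const float *input,
                    __global float *output,
                    __global int *pos,     /* shared counter, zeroed by the host */
                    float threshold)       /* hypothetical stand-in for `condition` */
{
    size_t gid = get_global_id(0);
    float value = input[gid];

    if (value > threshold) {
        /* atomic_inc returns the counter value from before the increment,
           so each work-item that gets here receives its own slot. */
        int slot = atomic_inc(pos);
        output[slot] = value;
    }
}
```

After the kernel finishes, reading pos[0] back on the host gives the number of values written. The order of the values in output depends on how the work-items are scheduled, so it is not guaranteed to follow the order of input.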

OpenCL - incrementing the index of an array in a kernel
