Question

I am getting wrong results from get_global_id(). I want to convolve a 3D volume of size {35,35,35} with a 3D kernel of size {5,5,5}, so I call clEnqueueNDRangeKernel with global size = {35,35,35} and local size = {5,5,5}:

std::vector<size_t> local_nd  = { 5, 5, 5 };
std::vector<size_t> global_nd = { 35, 35, 35 };
err = clEnqueueNDRangeKernel( queue, hello_kernel, work_dim, NULL, global_nd.data(), local_nd.data(), 0, NULL, NULL);

When I call get_global_id(), I expect get_global_id(0), get_global_id(1) and get_global_id(2) to each range from 0 to 34.

The results for get_global_id(0) and get_global_id(1) look correct. However, the values of get_global_id(2) range from 30 to 34 instead of the 0 to 34 I expected.

const int  ic0     =  get_global_id(0);  // icol
const int  ic1     =  get_global_id(1);  // irow  
const int  ic2     =  get_global_id(2);  // idep 


printf(" %d %d %d\n", ic0, ic1, ic2 ); 
// value of ic0 = [0  -> 34] correct!
// value of ic1 = [0  -> 34] correct!
// value of ic2 = [30 -> 34]  (shouldn't it be [0 -> 34]?)

My GPU reports maximum work-item sizes per dimension of {1024, 1024, 64}.
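For reference, the launch configuration itself looks consistent with those limits. Below is a minimal host-side sketch of that check, assuming the reported {1024, 1024, 64} are the values of CL_DEVICE_MAX_WORK_ITEM_SIZES and that the OpenCL 1.x rule applies (each local size must divide the matching global size). The names `Dim3`, `nd_range_fits` and `work_group_volume` are hypothetical helpers for illustration, not part of OpenCL.

```cpp
#include <array>
#include <cstddef>

using Dim3 = std::array<std::size_t, 3>;

// Hypothetical helper (not an OpenCL API): true when each local size is
// non-zero, within the per-dimension limit, and divides the global size.
bool nd_range_fits(const Dim3& global_nd, const Dim3& local_nd,
                   const Dim3& max_item_sizes) {
    for (std::size_t d = 0; d < 3; ++d) {
        if (local_nd[d] == 0 || local_nd[d] > max_item_sizes[d]) return false;
        if (global_nd[d] % local_nd[d] != 0) return false;
    }
    return true;
}

// Work-items per work-group. This must also stay within the device's
// CL_DEVICE_MAX_WORK_GROUP_SIZE, which the question does not report.
std::size_t work_group_volume(const Dim3& local_nd) {
    return local_nd[0] * local_nd[1] * local_nd[2];
}
```

With the sizes from the question, `nd_range_fits({35, 35, 35}, {5, 5, 5}, {1024, 1024, 64})` holds and each work-group contains 5 * 5 * 5 = 125 work-items, so the sizes alone do not explain the missing ids.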

Answer 1

I found the problem, as pmdj suggested:

printf in kernels isn't always reliable - there's often a fixed-size buffer, and if you output too much, some messages may be dropped.

After changing the OpenCL code so that it only prints under a condition, for example:

if( ic2< 10 )
    printf("ic2: %d ", ic2 );

the previously missing values show up, and the range of ic2 is [0 -> 34], as I expected.
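The expected ranges can also be checked without relying on kernel printf at all. The sketch below is a host-side simulation only (it does not call OpenCL): it rebuilds each global id as group id * local size + local id, which is how get_global_id() is composed when the global offset is zero, as in the question's NULL offset argument. It assumes the global size is divisible by the local size. The names `IdRange`, `NdRangeSummary` and `simulate_nd_range` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Observed range of global ids in one dimension.
struct IdRange {
    std::size_t min;
    std::size_t max;
};

struct NdRangeSummary {
    std::array<IdRange, 3> range;  // per-dimension min/max of the global id
    std::size_t work_items;        // total number of work-items visited
};

// Host-side simulation: visit every (work-group, local id) pair, rebuild the
// global id per dimension, and track its minimum and maximum.
NdRangeSummary simulate_nd_range(const std::array<std::size_t, 3>& global_nd,
                                 const std::array<std::size_t, 3>& local_nd) {
    NdRangeSummary s{};
    std::array<std::size_t, 3> groups{};
    for (std::size_t d = 0; d < 3; ++d) {
        groups[d] = global_nd[d] / local_nd[d];  // assumes exact division
        s.range[d] = {global_nd[d], 0};          // min starts high, max low
    }
    const std::size_t n_groups = groups[0] * groups[1] * groups[2];
    const std::size_t n_local = local_nd[0] * local_nd[1] * local_nd[2];

    for (std::size_t g = 0; g < n_groups; ++g) {
        for (std::size_t l = 0; l < n_local; ++l) {
            std::size_t g_rem = g, l_rem = l;
            for (std::size_t d = 0; d < 3; ++d) {
                // Decompose the linear indices into per-dimension ids.
                const std::size_t group_id = g_rem % groups[d];
                const std::size_t local_id = l_rem % local_nd[d];
                g_rem /= groups[d];
                l_rem /= local_nd[d];
                const std::size_t global_id = group_id * local_nd[d] + local_id;
                if (global_id < s.range[d].min) s.range[d].min = global_id;
                if (global_id > s.range[d].max) s.range[d].max = global_id;
            }
            ++s.work_items;
        }
    }
    return s;
}
```

For global size {35, 35, 35} and local size {5, 5, 5} this reports a range of 0 to 34 in every dimension and 42875 work-items. One printf line per work-item therefore means 42875 lines, which is a plausible amount to overflow a fixed-size printf buffer. On a real device, the same min/max could probably be collected in a small global buffer with atomic_min and atomic_max instead of printing every id.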
