OpenCL get_global_id returns wrong results

Time: 2018-10-16 09:42:07

Tags: c++ gpu opencl

I am getting wrong results from get_global_id(). I want to convolve a 3D volume of size {35,35,35} with a 3D kernel of size {5,5,5}, so I call clEnqueueNDRangeKernel with global_size = {35,35,35} and local_size = {5,5,5}:

std::vector<size_t> local_nd  = { 5, 5, 5 };
std::vector<size_t> global_nd = { 35, 35, 35 };
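// The two size arguments were missing in the post; passing the vectors declared above is assumed.
err = clEnqueueNDRangeKernel( queue, hello_kernel, work_dim, NULL, global_nd.data(), local_nd.data(), 0, NULL, NULL );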

When I call get_global_id(), I expect get_global_id(0), get_global_id(1), and get_global_id(2) to each range from 0 to 34.

The results for get_global_id(0) and get_global_id(1) look correct, but get_global_id(2) only takes values from 30 to 34 instead of the 0 to 34 I expected.

const int  ic0     =  get_global_id(0);  // icol
const int  ic1     =  get_global_id(1);  // irow  
const int  ic2     =  get_global_id(2);  // idep 

printf(" %d %d %d\n", ic0, ic1, ic2 ); 
// value of ic0 = [0  -> 34] correct!
// value of ic1 = [0  -> 34] correct!
// value of ic2 = [30 -> 34]  (shouldn't it be [0 -> 34]?)

My GPU reports max work-group item sizes (ND) of {1024, 1024, 64}.
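As a sanity check on the launch configuration itself, the sizes from the question can be tested against the reported limits on the host. This is only a sketch: `check_ndrange` is a hypothetical helper, not an OpenCL API call, and the total work-group limit of 1024 used in the usage note is an assumption, since the question only gives the per-dimension sizes. The divisibility rule applies to OpenCL 1.x-style uniform work-groups.

```cpp
#include <array>
#include <cstddef>
#include <string>

using Dim3 = std::array<std::size_t, 3>;

// Hypothetical helper, not part of the OpenCL API.
// Returns an empty string if the launch configuration passes the checks,
// otherwise a short description of the first problem found.
std::string check_ndrange(const Dim3& global, const Dim3& local,
                          const Dim3& max_item_sizes,
                          std::size_t max_work_group_size) {
    std::size_t items_per_group = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        const std::string dim = std::to_string(d);
        if (local[d] == 0)
            return "local size is zero in dimension " + dim;
        if (local[d] > max_item_sizes[d])
            return "local size exceeds the device limit in dimension " + dim;
        if (global[d] % local[d] != 0)
            return "global size is not a multiple of local size in dimension " + dim;
        items_per_group *= local[d];
    }
    if (items_per_group > max_work_group_size)
        return "work-group has too many work-items";
    return "";
}
```

Calling `check_ndrange({35, 35, 35}, {5, 5, 5}, {1024, 1024, 64}, 1024)` returns an empty string: each local size of 5 divides 35, and a work-group holds 5 * 5 * 5 = 125 work-items. Under these assumptions the launch configuration is not what restricts the third index.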

1 answer:

Answer 0 (score: 1)


printf in kernels isn't always reliable - there's often a fixed-size buffer, and if you output too much, some messages may be dropped.


if( ic2< 10 )
    printf("ic2: %d ", ic2 );

The output range is [0 -> 34], which matches what I expected.
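The point about output volume can be made concrete with a small host-side emulation. This sketch does not touch an OpenCL device and does not model the printf buffer, whose size is implementation-defined (OpenCL 1.2 and later expose it through the CL_DEVICE_PRINTF_BUFFER_SIZE device query). It only counts how many work-items would reach the printf call. The function name `count_printf_calls` is made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Host-side emulation only: no OpenCL device is involved.
// Counts how many work-items of a 3D NDRange would reach printf when the
// call is guarded by `ic2 < ic2_limit`. Pass global[2] as the limit to
// emulate the unguarded printf from the question.
std::size_t count_printf_calls(const std::array<std::size_t, 3>& global,
                               std::size_t ic2_limit) {
    std::size_t calls = 0;
    for (std::size_t ic2 = 0; ic2 < global[2]; ++ic2)
        for (std::size_t ic1 = 0; ic1 < global[1]; ++ic1)
            for (std::size_t ic0 = 0; ic0 < global[0]; ++ic0)
                if (ic2 < ic2_limit)
                    ++calls;
    return calls;
}
```

For the {35, 35, 35} range, the unguarded printf runs 35 * 35 * 35 = 42875 times, while the `ic2 < 10` guard reduces that to 35 * 35 * 10 = 12250 calls. Fewer and shorter messages make it less likely that a fixed-size buffer overflows and drops lines.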