Every CUDA block has the same blockIdx.z

Date: 2017-02-26 20:46:29

Tags: cuda

Outside the kernel (host code):

dim3 block(32, 32, 1);
printf("rows = %u\n", rows);
dim3 grid(8, 8, rows);
forward_step1<<<block, grid>>>(weight_D, a_D, res1_D, columns);

Kernel:

unsigned int tid = blockDim.x*threadIdx.y + threadIdx.x;
unsigned int i = blockIdx.z;
unsigned int j = (gridDim.x*blockIdx.y+blockIdx.x)*blockDim.x*blockDim.y + tid;
if (j==0) printf("%u\n", i);

Result:

  

rows = 3
  0
  0
  0

1 Answer:

Answer 0 (score: 2)

The syntax for a kernel launch is:

kernel<<<grid_size, block_size>>>(arguments)

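You appear to have swapped the grid_size and block_size arguments. As launched, your grid size is (32, 32, 1) and your block size is (8, 8, rows). Because the grid's z dimension is 1, blockIdx.z is 0 in every block. The three lines of output come from the three threads of block (0, 0, 0) with threadIdx.z = 0, 1, 2: tid ignores threadIdx.z, so j == 0 holds for each of them.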
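
A minimal sketch of the corrected launch, with the grid passed first and the block second. The kernel name show_block_z is a hypothetical stand-in for forward_step1 that keeps only the index computation from the question. rows = 3 is taken from the printed output, and the error checks are an addition that the original code does not have.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for forward_step1: only the index math from the question.
__global__ void show_block_z()
{
    unsigned int tid = blockDim.x * threadIdx.y + threadIdx.x;
    unsigned int i = blockIdx.z;
    unsigned int j = (gridDim.x * blockIdx.y + blockIdx.x) * blockDim.x * blockDim.y + tid;
    if (j == 0) printf("%u\n", i);
}

int main()
{
    unsigned int rows = 3;   // assumed value, matching the output in the question
    dim3 block(32, 32, 1);   // 1024 threads per block
    dim3 grid(8, 8, rows);   // one z-slice of blocks per row
    printf("rows = %u\n", rows);

    // Grid size comes first, block size second.
    show_block_z<<<grid, block>>>();

    // Report launch-configuration errors and errors raised during execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With this ordering, blockIdx.z ranges over 0 to rows - 1, so the kernel should print each of 0, 1 and 2 once. The order of those lines is not guaranteed, because blocks are not scheduled in any fixed order.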