How many grid dimensions can I use on a CUDA compute capability 2.0 card?

Posted: 2011-07-29 10:27:54

Tags: cuda

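I want to use a 3D grid for CUDA computations. This page [1] and this answer [2] say that I can use three dimensions, but querying my device properties gives me the following: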

   --- General Information for device 0 ---
Name:  Quadro 4000
Compute capability:  2.0
Clock rate:  950000
Device copy overlap:  Enabled
Kernel execution timeout :  Enabled
   --- Memory Information for device 0 ---
Total global mem:  2146631680
Total constant Mem:  65536
Max mem pitch:  2147483647
Texture Alignment:  512
   --- MP Information for device 0 ---
Multiprocessor count:  8
Shared mem per mp:  49152
Registers per mp:  32768
Threads in warp:  32
Max threads per block:  1024
Max thread dimensions:  (1024, 1024, 64)
Max grid dimensions:  (65535, 65535, 1)
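The "Max grid dimensions" line above is the `maxGridSize` field of `cudaDeviceProp`, and its third entry is the z limit. A minimal sketch of reading it with the CUDA runtime API; `printGridLimits` and `supports3DGrid` are hypothetical helper names, and a z limit of 1 is taken to mean that the current driver/toolkit combination only exposes 2D grids.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when the grid's z limit is greater than 1,
// i.e. 3D grids can actually be launched with the current setup.
bool supports3DGrid(const int maxGridSize[3])
{
    return maxGridSize[2] > 1;
}

// Query device `dev` and print the same limits as the output above.
cudaError_t printGridLimits(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                     cudaGetErrorString(err));
        return err;
    }
    std::printf("Name: %s\n", prop.name);
    std::printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    std::printf("Max thread dimensions: (%d, %d, %d)\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("Max grid dimensions: (%d, %d, %d)\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    std::printf("3D grids available: %s\n",
                supports3DGrid(prop.maxGridSize) ? "yes" : "no");
    return cudaSuccess;
}
```

Calling `printGridLimits(0)` from `main()` queries device 0, the device shown in the output above.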

If I try to use a 3D grid in my code, nothing happens:

__global__ void updateBuffer( ... )
{
  int x = blockIdx.x;
  int y = blockIdx.y;
  int z = threadIdx.x;

  int offset =
      x +
      y * width +
      z * width * height;

  buffer[offset] = ...;
}

__global__ void updateBuffer2( ... )
{
  int x = blockIdx.x;
  int y = blockIdx.y;
  int z = blockIdx.z;

  int offset =
      x +
      y * width +
      z * width * height;

  buffer[offset] = ...;
}

void callKerner() {
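  // 2D grid of extW x extH blocks; the depth is covered by extD threads per block
  dim3 blocks(extW,extH,1);
  dim3 threads(extD,1,1);

  // 3D grid of extW x extH x extD blocks with a single thread each
  dim3 blocks2(extW,extH,extD);
  dim3 threads2(1,1,1);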


  updateBuffer<<<blocks,threads>>>( ... ); // works fine
  updateBuffer2<<<blocks2,threads2>>>( ... ); // nothing happens
}
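A kernel launch whose configuration exceeds the device limits does not run at all, and because launches return no status, the failure stays silent unless the program asks for it. That matches the "nothing happens" symptom. A minimal sketch of checking for it; `gridFitsDevice` and `checkLaunch` are hypothetical helper names, and `cudaDeviceSynchronize` requires CUDA 4.0 or later (older toolkits name it `cudaThreadSynchronize`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: does `grid` fit within the per-dimension limits
// reported in cudaDeviceProp::maxGridSize?
bool gridFitsDevice(dim3 grid, const int maxGridSize[3])
{
    return grid.x <= (unsigned)maxGridSize[0] &&
           grid.y <= (unsigned)maxGridSize[1] &&
           grid.z <= (unsigned)maxGridSize[2];
}

// Report the outcome of the most recent kernel launch. A rejected launch
// is only visible through cudaGetLastError() or a later synchronizing call.
bool checkLaunch(const char *what)
{
    cudaError_t err = cudaGetLastError();   // errors from the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel ran
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

For example, calling `checkLaunch("updateBuffer2")` right after the `updateBuffer2<<<blocks2,threads2>>>` launch would print why the launch was rejected, and `gridFitsDevice(blocks2, prop.maxGridSize)`, with `prop` filled in by `cudaGetDeviceProperties`, lets the host reject the configuration before launching.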

So do 3D grids not work on some cards?

[1] http://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
[2] Maximum blocks per grid: CUDA

2 answers:

Answer 0 (score: 2)

I fixed it by installing the latest NVIDIA driver and updating to CUDA 4.0.
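In other words, the limit came from the software stack rather than the card: CUDA 4.0 is the release that added 3D grids for compute capability 2.x devices. A sketch of checking the installed versions at run time; `versionAtLeast` and `cudaSupports3DGrids` are hypothetical helper names, and the check assumes both the driver and the runtime must report 4.0 or newer. Re-reading `maxGridSize` after the upgrade is the more direct confirmation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (4.0 -> 4000, 3.2 -> 3020).
bool versionAtLeast(int version, int major, int minor)
{
    return version >= 1000 * major + 10 * minor;
}

// Print the driver and runtime versions and report whether both are at
// least 4.0 (assumed here to be the requirement for 3D grids on 2.x).
bool cudaSupports3DGrids()
{
    int driver = 0, runtime = 0;
    if (cudaDriverGetVersion(&driver) != cudaSuccess ||
        cudaRuntimeGetVersion(&runtime) != cudaSuccess)
        return false;
    std::printf("Driver version: %d, runtime version: %d\n", driver, runtime);
    return versionAtLeast(driver, 4, 0) && versionAtLeast(runtime, 4, 0);
}
```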

Answer 1 (score: 1)

There is a clue in this line:

Max grid dimensions:  (65535, 65535, 1)
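With a z limit of 1, the `blocks2(extW,extH,extD)` configuration is invalid on this setup. If upgrading the driver and toolkit is not an option, one common workaround (an assumption here, not something this answer proposes) is to fold the depth into the y dimension of a 2D grid and recover y and z inside the kernel. In this sketch `updateBufferFolded`, `unfoldYZ`, `launchFolded`, the `float` buffer and the `value` parameter are hypothetical stand-ins for the elided parts of the original kernel.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: recover (y, z) from a folded block index whose
// y extent is height * depth.
__host__ __device__ inline void unfoldYZ(unsigned folded, unsigned height,
                                         unsigned *y, unsigned *z)
{
    *y = folded % height;
    *z = folded / height;
}

// Same offset computation as updateBuffer2, but driven by a 2D grid of
// width x (height * depth) blocks, so it also works when maxGridSize[2] == 1.
__global__ void updateBufferFolded(float *buffer, int width, int height, float value)
{
    unsigned x = blockIdx.x;
    unsigned y, z;
    unfoldYZ(blockIdx.y, height, &y, &z);

    size_t offset = x + (size_t)y * width + (size_t)z * width * height;
    buffer[offset] = value;
}

// One thread per block, as in the original updateBuffer2 launch.
// height * depth must itself stay within maxGridSize[1] (65535 here).
cudaError_t launchFolded(float *buffer, int width, int height, int depth, float value)
{
    dim3 blocks(width, height * depth, 1);
    dim3 threads(1, 1, 1);
    updateBufferFolded<<<blocks, threads>>>(buffer, width, height, value);
    return cudaGetLastError();
}
```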