Question

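I'd like to know whether it is possible to launch a CUDA kernel so that the grid/block size can be specified at run time, rather than at compile time as usual.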

Any help with this would be greatly appreciated.

Answer 1

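In CUDA applications, specifying a fixed size for the grid is rarely useful. Most of the time the block size is fixed, while the grid size is kept dynamic and changes with the size of the input data. Consider the following vector addition example.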

__global__ void kernel(float* a, float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    //Bound checks inside the kernel
    if(tid<length)
       c[tid] = a[tid] + b[tid];
}

int addVectors(float* a, float* b, float* c, int length)
{
   //a, b, c are allocated on the device

   //Fix the block size to an appropriate value
   dim3 block(128);

   dim3 grid;
   grid.x = (length + block.x - 1)/block.x;

   //The grid size depends on the length of the vector.
   //The total number of threads is rounded up to the next multiple of the block size,
   //so there are always at least as many threads as vector elements.

   kernel<<<grid,block>>>(a,b,c,length);

   return 0;
}
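
The block size does not have to be a compile-time constant either. Here is a minimal sketch of the same idea in which the caller picks the block size at run time. The names addKernel, blocksFor and addVectorsRuntime are made up for illustration, and the pointers are assumed to be device allocations, as in the example above.

```cuda
#include <cuda_runtime.h>

// Same element-wise addition as above, with a bounds check for the last block.
__global__ void addKernel(const float* a, const float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < length)
        c[tid] = a[tid] + b[tid];
}

// Hypothetical helper: number of blocks needed to cover `length` elements
// (ceiling division).
inline unsigned int blocksFor(int length, unsigned int blockSize)
{
    return (length + blockSize - 1) / blockSize;
}

// Hypothetical wrapper: a, b, c are device pointers; blockSize is an ordinary
// run-time value (for example read from the command line or a config file).
cudaError_t addVectorsRuntime(const float* a, const float* b, float* c,
                              int length, unsigned int blockSize)
{
    if (length <= 0 || blockSize == 0)
        return cudaErrorInvalidValue;

    dim3 block(blockSize);                   // chosen at run time
    dim3 grid(blocksFor(length, blockSize)); // derived at run time

    addKernel<<<grid, block>>>(a, b, c, length);

    // Reports launch problems, e.g. a block size above the device limit.
    return cudaGetLastError();
}
```

A call such as addVectorsRuntime(dA, dB, dC, n, userBlockSize) then works for any block size the device accepts. Checking the returned cudaError_t matters here because an invalid run-time configuration only shows up at launch.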

Answer 2

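CUDA kernels and device functions can use blockDim.{x,y,z} to access the block configuration and gridDim.{x,y,z} to access the grid configuration. If you have a kernel/device function that can cope with a variety of configurations, then all you need to do is launch the kernel (myKernel<<<dimGrid,dimBlock>>>) with whatever dimGrid or dimBlock you choose at run time. I don't think this is at all unusual.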
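
One common way to write a kernel that copes with any configuration is a grid-stride loop, sketched below. This is only an illustration of the point above, not code from the answer: scaleKernel and scaleOnDevice are hypothetical names, and dData is assumed to be a device pointer.

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and advances by the
// total number of launched threads, so any grid/block size covers all elements.
__global__ void scaleKernel(float* data, int length, float factor)
{
    int stride = gridDim.x * blockDim.x; // total threads in this launch
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < length; i += stride)
        data[i] *= factor;
}

// Hypothetical host wrapper: both launch dimensions are plain run-time values.
cudaError_t scaleOnDevice(float* dData, int length, float factor,
                          unsigned int numBlocks, unsigned int threadsPerBlock)
{
    if (numBlocks == 0 || threadsPerBlock == 0)
        return cudaErrorInvalidValue;

    dim3 dimGrid(numBlocks);
    dim3 dimBlock(threadsPerBlock);

    scaleKernel<<<dimGrid, dimBlock>>>(dData, length, factor);

    // Surfaces an invalid launch configuration, if any.
    return cudaGetLastError();
}
```

With this shape, scaleOnDevice(dData, n, 2.0f, 1, 32) and scaleOnDevice(dData, n, 2.0f, 64, 256) both process every element; only the amount of work per thread changes.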

Is it possible to launch a CUDA kernel with the grid size / block size defined at run time?
