Question

Here is my code:

int threadNum = BLOCKDIM/8;
dim3 dimBlock(threadNum,threadNum);
int blocks1 = nWidth/threadNum + (nWidth%threadNum == 0 ? 0 : 1);
int blocks2 = nHeight/threadNum + (nHeight%threadNum == 0 ? 0 : 1);
dim3 dimGrid;
dimGrid.x = blocks1;
dimGrid.y = blocks2;

//  dim3 numThreads2(BLOCKDIM);
//  dim3 numBlocks2(numPixels/BLOCKDIM + (numPixels%BLOCKDIM == 0 ? 0 : 1) );
perform_scaling<<<dimGrid,dimBlock>>>(imageDevice,imageDevice_new,min,max,nWidth, nHeight);
cudaError_t err = cudaGetLastError();
cudasafe(err,"Kernel2");

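This is the launch of my second kernel, which is completely independent of the first in terms of the data it uses. BLOCKDIM is 512, nWidth and nHeight are also 512, and cudasafe simply prints the string message corresponding to the error code. This part of the code gives a configuration error after the kernel call.

What could be causing this error? Any ideas?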

Answer 1

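This type of error message usually refers to the launch configuration parameters (here the grid/threadblock dimensions; in other cases it could also be the shared memory size, etc.). When you see a message like this, it's a good idea to print out your actual configuration parameters just before launching the kernel, to check whether you've made a mistake.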
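
As a minimal sketch of that kind of check, the helper below prints a launch configuration and reports whether the block fits under a given per-block thread limit. `LaunchDims`, `threadsPerBlock` and `reportLaunchConfig` are hypothetical names, and `LaunchDims` is only a stand-in for CUDA's `dim3` so the snippet compiles without the toolkit; in real code you would pass the `dim3` values you are about to launch with.

```cpp
#include <cstdio>

// Hypothetical stand-in for CUDA's dim3 (assumption: lets this compile without the toolkit).
struct LaunchDims {
    unsigned x, y, z;
};

// Total threads in one block: the product of the three block dimensions.
unsigned long long threadsPerBlock(const LaunchDims& block) {
    return 1ULL * block.x * block.y * block.z;
}

// Print the configuration and return true if the block fits under the given limit.
bool reportLaunchConfig(const LaunchDims& grid, const LaunchDims& block,
                        unsigned long long maxThreadsPerBlock) {
    const unsigned long long n = threadsPerBlock(block);
    std::printf("grid = (%u, %u, %u), block = (%u, %u, %u), threads/block = %llu\n",
                grid.x, grid.y, grid.z, block.x, block.y, block.z, n);
    return n <= maxThreadsPerBlock;
}
```

Calling `reportLaunchConfig(LaunchDims{8, 8, 1}, LaunchDims{64, 64, 1}, 1024)` with the values from the question shows 4096 threads per block and returns false.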

You said BLOCKDIM = 512. You have threadNum = BLOCKDIM/8, so threadNum = 64. Your threadblock configuration is:

dim3 dimBlock(threadNum,threadNum);

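So you are asking to launch blocks of 64 x 64 threads, that is, 4096 threads per block. That is not valid on any generation of CUDA device: the limit is 1024 threads per block (512 on compute capability 1.x devices).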
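
One possible fix, sketched here under the assumption that a 16 x 16 block (256 threads) suits the kernel, is to pick a block side that stays under the limit and size the grid by ceiling division. `ceilDiv`, `Launch2D` and `makeLaunch2D` are hypothetical helpers, not part of the CUDA API; the resulting values would be passed to `dim3 dimBlock(blockSide, blockSide)` and `dim3 dimGrid(gridX, gridY)`.

```cpp
// Ceiling division: how many blocks of size `block` are needed to cover `n` elements.
int ceilDiv(int n, int block) {
    return (n + block - 1) / block;
}

// Hypothetical container for a square 2D launch configuration.
struct Launch2D {
    int blockSide;  // threads per block along x and along y
    int gridX;      // blocks along the image width
    int gridY;      // blocks along the image height
};

// Build a configuration that covers an nWidth x nHeight image with square blocks.
Launch2D makeLaunch2D(int nWidth, int nHeight, int blockSide) {
    return Launch2D{blockSide, ceilDiv(nWidth, blockSide), ceilDiv(nHeight, blockSide)};
}
```

For the 512 x 512 image in the question, `makeLaunch2D(512, 512, 16)` gives a 32 x 32 grid of 16 x 16 blocks. When the image size is not a multiple of the block side, the last row and column of blocks overhang the image, so the kernel needs a bounds check on its x and y indices.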

Answer 2

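Just to add to the previous answers: you can also query the maximum number of threads allowed from within your code, so that it can run on other devices without hard-coding the number of threads you will use: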

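cudaDeviceProp properties;
cudaGetDeviceProperties(&properties, device);  // device: index of the GPU to query, e.g. 0
cout << "using " << properties.multiProcessorCount << " multiprocessors" << endl;
cout << "max threads per multiprocessor: " << properties.maxThreadsPerMultiProcessor << endl;
// The per-block limit is the value that constrains the launch configuration
cout << "max threads per block: " << properties.maxThreadsPerBlock << endl;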
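
To turn a queried limit into a block size, one option is sketched below. It takes the per-block limit reported in the `maxThreadsPerBlock` field of `cudaDeviceProp` and returns the largest power-of-two side for a square 2D block. `largestSquareBlockSide` is a hypothetical helper, and a power-of-two side is an assumption made for simplicity, not a CUDA requirement.

```cpp
// Largest power-of-two side s such that an s x s block has at most
// maxThreadsPerBlock threads (hypothetical helper, not a CUDA API call).
int largestSquareBlockSide(int maxThreadsPerBlock) {
    int side = 1;
    // Keep doubling while the doubled side still fits under the limit.
    while ((2 * side) * (2 * side) <= maxThreadsPerBlock) {
        side *= 2;
    }
    return side;
}
```

A limit of 1024 gives a side of 32, and a limit of 512 gives 16. The largest legal block is not necessarily the fastest, and each dimension is also capped separately by the `maxThreadsDim` field, so treat the result as an upper bound rather than a recommendation.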

"Invalid configuration argument" error for a CUDA kernel call?