CUDA "invalid argument" error on the second kernel

Date: 2012-12-21 17:45:02

Tags: cuda

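I have a problem with a kernel launch. My program used to run one big kernel. Because of synchronization issues, I now need to split it into two. The first kernel performs some initialization and receives a subset of the arguments passed to the second kernel. Running only the first kernel works fine. Running only the second kernel fails during execution because the initialization is missing, but the kernel itself does launch. Running both in sequence makes the second kernel fail with an "invalid argument" error. I will provide code if necessary, but right now I can't see how it would help. Thanks in advance.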

Edit: here is the requested launch code:

void DeviceManager::integrate(){
  assert(hostArgs->neighborhoodsSize > 0);
  size_t maxBlockSize;
  size_t blocks;
  size_t threadsPerBlock;
  // init patch kernel
  maxBlockSize = 64;
  blocks = (hostArgs->patchesSize /maxBlockSize);
  if(0 != hostArgs->patchesSize % maxBlockSize){
    blocks++;
  }
  threadsPerBlock = maxBlockSize;
  std::cout << "blocks: " << blocks << ", threadsPerBlock: " << threadsPerBlock << std::endl;
  initPatchKernel<CUDA_MAX_SPACE_DIMENSION><<<blocks,threadsPerBlock>>>(devicePatches, hostArgs->patchesSize);
  cudaDeviceSynchronize();

  //calc kernel
  maxBlockSize = 64;
  blocks = (hostArgs->neighborhoodsSize /maxBlockSize);
  if(0 != hostArgs->neighborhoodsSize % maxBlockSize){
    blocks++;
  }
  threadsPerBlock = maxBlockSize;
  size_t maxHeapSize = hostArgs->patchesSize * (sizeof(LegendreSpace) + sizeof(LinearSpline)) + hostArgs->neighborhoodsSize * (sizeof(ReactionDiffusionCCLinearForm) + sizeof(ReactionDiffusionCCBiLinearForm));
  std::cout << "maxHeapSize: " << maxHeapSize << std::endl;
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, maxHeapSize);
  std::cout << "blocks: " << blocks << ", threadsPerBlock: " << threadsPerBlock << std::endl;
  integrateKernel<CUDA_MAX_SPACE_DIMENSION><<<blocks,threadsPerBlock>>>(deviceNeighborhoods, hostArgs->neighborhoodsSize, devicePatches, hostArgs->patchesSize, hostArgs->biLinearForms, hostArgs->linearForms, deviceRes);
  cudaDeviceSynchronize();
}

Memory transfers and allocations should not be the problem, since everything works when only one kernel is used.

Edit 2: In debug builds I check for errors after every kernel call through a wrapper function. So the following runs after each kernel call:

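// Fetch (and clear) the most recent CUDA runtime error on this host thread.
cudaError_t cuda_result_code = cudaGetLastError();
if (cuda_result_code != cudaSuccess) {
    // fprintf takes the output stream as its first argument.
    fprintf(stderr, "message: %s\n", cudaGetErrorString(cuda_result_code));
}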

Sorry for not mentioning this earlier; the wrapper is not mine, so I did not paste it. The output before the failure is as follows:

blocks: 1, threadsPerBlock: 64
maxHeapSize: 4480
blocks: 1, threadsPerBlock: 64
message: invalid argument

1 answer:

Answer 0 (score: 2)

cudaDeviceSetLimit

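cudaLimitMallocHeapSize controls the size in bytes of the heap used by the malloc() and free() device system calls. Setting cudaLimitMallocHeapSize must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.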
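
To connect the quoted documentation to the question: "invalid argument" is the message cudaGetErrorString returns for cudaErrorInvalidValue. In the posted code, cudaDeviceSetLimit is called after initPatchKernel has already run. If that kernel allocates from the device heap (an assumption, since the kernel bodies are not shown), the call would fail, and because its return value is not checked, the error would only show up at the next cudaGetLastError, which happens to come right after the second launch. The following is a minimal sketch of the reordering, not the asker's actual code: initKernel, useKernel, check, and the 8 MiB heap size are hypothetical stand-ins chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical first kernel: each thread allocates from the device heap.
__global__ void initKernel(int **slots, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        slots[i] = static_cast<int *>(malloc(sizeof(int)));
        if (slots[i] != nullptr) {
            *slots[i] = i;
        }
    }
}

// Hypothetical second kernel: releases what the first kernel allocated.
// Device-heap allocations persist across kernel launches.
__global__ void useKernel(int **slots, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && slots[i] != nullptr) {
        free(slots[i]);
        slots[i] = nullptr;
    }
}

// Report a failing runtime call by name so the error is attributed
// to the call that produced it, not to a later kernel launch.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    const int n = 64;
    const size_t heapBytes = 8 * 1024 * 1024;  // assumed size, for illustration

    // Set the heap limit BEFORE the first kernel that uses device malloc/free.
    if (!check(cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes),
               "cudaDeviceSetLimit")) return EXIT_FAILURE;

    int **slots = nullptr;
    if (!check(cudaMalloc(&slots, n * sizeof(int *)), "cudaMalloc"))
        return EXIT_FAILURE;

    initKernel<<<1, n>>>(slots, n);
    if (!check(cudaGetLastError(), "initKernel launch")) return EXIT_FAILURE;
    if (!check(cudaDeviceSynchronize(), "initKernel run")) return EXIT_FAILURE;

    useKernel<<<1, n>>>(slots, n);
    if (!check(cudaGetLastError(), "useKernel launch")) return EXIT_FAILURE;
    if (!check(cudaDeviceSynchronize(), "useKernel run")) return EXIT_FAILURE;

    check(cudaFree(slots), "cudaFree");
    return EXIT_SUCCESS;
}
```

Applied to the code in the question, this would mean computing maxHeapSize and calling cudaDeviceSetLimit at the top of integrate(), before the initPatchKernel launch, and checking the return value of every runtime call rather than only calling cudaGetLastError after kernel launches.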