Question

I am running a loop on the GPU and, after each iteration, I check whether a convergence condition is satisfied. If it is, I exit the while loop.

__device__ int converged = 0; // this line before the kernel

The kernel:

__global__ void convergence_kernel()
{
   if (convergence condition is true)
   {
      atomicAdd(&converged, 1);
   }
}

On the CPU I call the kernel in a loop:

int *convc = (int*) calloc(1,sizeof(int));
//converged = 0; //commenting as this is not correct as per Robert's suggestion
while(convc[0]< 1)
{
    foo_bar1<<<num_blocks, threads>>>(err, count);
    cudaDeviceSynchronize();
    count += 1;

    cudaMemcpyFromSymbol(convc, converged, sizeof(int));
}

So here, if the condition is true, I expect convc[0] = 1. However, when I print this value I always see what looks like a random value, e.g. conv = 3104, conv = 17280, conv = 17408, etc.

Can someone tell me what I am missing in the cudaMemcpyFromSymbol operation? Thanks in advance.

Answer 1

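My best guess as to why you get garbage when you read the value of converged into convc is that you have not initialized converged anywhere. A __device__ variable lives in device memory, so it cannot be initialized from host code with a plain assignment like this: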

converged = 0;

You can change the declaration to:

__device__ int converged = 0; // this line before the kernel

Or you can use the cudaMemcpyToSymbol function, which is effectively the reverse of the cudaMemcpyFromSymbol function that you already seem to be aware of.
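
As a rough sketch of the cudaMemcpyToSymbol approach, the loop could reset converged from the host before every launch and read it back afterwards. The kernel parameters count and limit, the launch configuration <<<4, 64>>>, and the "converge once count reaches limit" test are made up for illustration; they stand in for the real convergence condition, which the question does not show.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int converged = 0;  // device-side counter, initialized at declaration

// Hypothetical stand-in for the real convergence test: every thread
// reports convergence once `count` reaches `limit`.
__global__ void convergence_kernel(int count, int limit)
{
    if (count >= limit)
    {
        atomicAdd(&converged, 1);
    }
}

int main()
{
    int convc = 0;       // host copy of the device counter
    int count = 0;
    const int zero = 0;

    while (convc < 1)
    {
        // Reset the device symbol from the host before each launch.
        if (cudaMemcpyToSymbol(converged, &zero, sizeof(int)) != cudaSuccess)
        {
            fprintf(stderr, "cudaMemcpyToSymbol failed\n");
            return 1;
        }

        convergence_kernel<<<4, 64>>>(count, 3);
        cudaDeviceSynchronize();
        count += 1;

        // Copy the device symbol back into host memory.
        if (cudaMemcpyFromSymbol(&convc, converged, sizeof(int)) != cudaSuccess)
        {
            fprintf(stderr, "cudaMemcpyFromSymbol failed\n");
            return 1;
        }
    }

    printf("stopped after %d launches, converged = %d\n", count, convc);
    return 0;
}
```

Note that atomicAdd increments the counter once per thread that meets the condition, so the value copied back is a count of such threads rather than exactly 1. Testing convc < 1 in the loop, as the question does, still works with that.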
