Question

Referencing an int constant from a CUDA kernel compiles and works correctly, but referencing a float constant produces a compile error.

Here is a minimal example, compiled with CUDA 6.5 and Visual Studio 2013 on Windows 7 x64:

#include "cuda_runtime.h"
#include <stdio.h>

const int foo = 42;
const float bar = 42.0f;

__global__ void kernel(void)
{
  printf("foo = %d\n", foo);  // OK
  printf("bar = %f\n", bar);  // error : identifier "bar" is undefined in device code
}

int main()
{
  kernel <<<1, 1 >>>();
  cudaDeviceSynchronize();
  return 0;
}

Compiler output:

error : identifier "bar" is undefined in device code

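Note: if the second printf line is commented out, the code compiles and prints the expected output of 42. Runtime error checking is omitted for clarity, since this question is about a compile-time error.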

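I know I can use CUDA __constant__ memory to achieve something similar, but I would still like to understand what causes the difference between int and float in this scenario, and whether there is a way to write kernel code that compiles with a float constant.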
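
For reference, here is a minimal sketch of the __constant__ workaround I have in mind. The out parameter and the read_bar_from_device helper are hypothetical names added only for illustration, and this sidesteps the question rather than answering it:

```cuda
#include "cuda_runtime.h"
#include <stdio.h>

// Device-side constant, statically initialized when the module is loaded.
__constant__ float bar = 42.0f;

// Prints the constant and also writes it to out so the host can inspect it.
__global__ void kernel(float *out)
{
  printf("bar = %f\n", bar);
  *out = bar;
}

// Hypothetical helper: launches the kernel and returns the value the device
// saw, or -1.0f if a CUDA call fails.
float read_bar_from_device(void)
{
  float *d_out = NULL;
  float h_out = -1.0f;

  if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess)
    return -1.0f;

  kernel<<<1, 1>>>(d_out);

  // cudaMemcpy waits for the kernel to finish before copying the result back.
  if (cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
    h_out = -1.0f;

  cudaFree(d_out);
  return h_out;
}
```

Calling read_bar_from_device() from main should return the value stored in bar, with the kernel reading it from constant memory rather than from a host-side const variable.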

Using constants in a CUDA kernel: works with int, fails to compile with float

0 answers: