Question

I followed this: Dynamically allocating memory inside __device/global__ CUDA kernel

But it still does not compile.

error : calling a host function("_malloc_dbg") from a __device__/__global__  
function("kernel") is not allowed

error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA  
\v4.1\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\"  
--use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual  
Studio 10.0\VC\bin\x86_amd64" -I"..\..\..\Source\Include" -G0  --keep-dir   
"x64\Debug" -maxrregcount=0  --machine 64 --compile  -g  -Xcompiler "/EHsc /nologo 
/Od /Zi  /MDd " -o "x64\Debug\move.cu.obj"  "C:\Source\scene\move.cu"" exited with  
code 2. C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA  
4.1.targets     361 10

As suggested there, I added #if __CUDA_ARCH__ >= 200, and it evaluates to false.

What else could be wrong? I am using a GTX 480.

Edit: I also get this warning: #warning C4005: '_malloca' : macro redefinition

Answer 1

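I know you have solved your main problem, but there is another issue:

I added #if __CUDA_ARCH__ >= 200, and it returns false.

CUDA code is compiled at least twice: one pass generates the CPU (host) code and another generates the device code. __CUDA_ARCH__ is defined only during device code generation. nvcc can run additional passes to generate GPU code for several architectures; the CPU code stays the same across those passes, but the GPU code changes.

I suspect you are testing #if __CUDA_ARCH__ >= 200 in code that is compiled during the CPU pass. The macro is undefined there, so the preprocessor treats it as 0 and the test is false.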
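A minimal sketch of the point above, not the asker's actual code. The helper arch_seen_by_this_pass and the kernel parameters are hypothetical names chosen for illustration. It assumes a build that targets compute capability 2.0 or later, for example the -gencode settings shown in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: reports what the current compilation pass sees.
// A device pass defines __CUDA_ARCH__ (e.g. 200 for compute_20);
// the host pass leaves it undefined, so this returns 0 there.
__host__ __device__ int arch_seen_by_this_pass()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;
#else
    return 0;
#endif
}

__global__ void kernel(int *arch_out, int *alloc_ok)
{
    *arch_out = arch_seen_by_this_pass();
#if __CUDA_ARCH__ >= 200
    // Device-side malloc/free require compute capability 2.0 or later.
    int *scratch = static_cast<int *>(malloc(4 * sizeof(int)));
    *alloc_ok = (scratch != NULL);
    free(scratch);  // a NULL pointer is ignored by device free()
#else
    *alloc_ok = 0;  // host pass or an older device target
#endif
}

int main()
{
    int *d_out = NULL;
    cudaMalloc(&d_out, 2 * sizeof(int));
    kernel<<<1, 1>>>(d_out, d_out + 1);

    int h_out[2] = {-1, -1};
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The same function, compiled in two passes, gives two answers.
    printf("host pass sees %d\n", arch_seen_by_this_pass());
    printf("device pass sees %d, in-kernel malloc ok: %d\n",
           h_out[0], h_out[1]);
    return 0;
}
```

The guard inside the kernel body is evaluated separately in each pass, so the malloc branch is only compiled into device code for a target that supports it. The same test at file scope in host-only code would always take the false branch.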
