Question

On a machine with a Titan GPU (compute_35, sm_35), I compiled some code using this line in CMakeLists.txt:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_35,code=sm_35)

The code compiled and ran fine.

I wanted to see what compilation problems this code would cause for a friend who uses a GTS 450 (compute_20, sm_21). So I changed the line above to:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=sm_21)

The code compiled without any errors on my Titan machine. But when I ran it (again on the Titan machine), it failed after a thrust::copy call with the following error:

$ ./foobar
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device function 
"foobar" terminated by signal SIGABRT (Abort)

Google suggests that this error is caused by a GPU architecture mismatch.

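The strangest part is that with the line above (arch=compute_20,code=sm_21), the code compiles and runs without errors on my friend's GTS 450 machine! Apart from the GPU, her setup (Ubuntu 12.04, gcc, and CUDA SDK 5.5) is the same as mine.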

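Is an architecture mismatch really the cause of this error? Why can't the Titan run compute_20 code? Aren't CUDA GPUs supposed to be backward compatible with PTX or SASS code? And even if they are not, why can't the driver JIT-compile the compute_20 PTX into SASS for sm_35?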

Answer 1

If you specify:

-gencode arch=compute_20,code=compute_20

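your code should run on any GPU of compute capability 2.0 or higher (via JIT). With code=sm_21, only machine code (SASS) for sm_21 is embedded in the binary and no PTX, so the driver has nothing to JIT-compile for the Titan, and sm_21 SASS cannot run on an sm_35 device.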
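Applied to the CMakeLists.txt line from the question, that could look like the following sketch. It assumes the same FindCUDA-style CUDA_NVCC_FLAGS variable the question already uses and changes nothing else.

```cmake
# Embed compute_20 PTX only; the driver JIT-compiles it for the GPU found at run time.
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=compute_20)
```

The trade-off is a JIT compilation step the first time the program is loaded on a given device.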

According to the nvcc manual, specifying a virtual architecture for the code switch directly enables JIT. You can give multiple targets in one command:

-arch=compute_20 -code=compute_20,sm_21,sm_35

(note that this takes the place of specifying -gencode ...)

This allows JIT from compute_20 PTX, as well as direct non-JIT execution on a cc 2.1 or cc 3.5 device.
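If you would rather keep the -gencode style from the question's CMakeLists.txt, a roughly equivalent setup can be sketched with one -gencode entry per target. This assumes the FindCUDA-style CUDA_NVCC_FLAGS variable from the question; every entry uses arch=compute_20 to mirror the command above.

```cmake
# SASS for the GTS 450 (cc 2.1): runs directly, no JIT.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_21)
# SASS for the Titan (cc 3.5): runs directly, no JIT.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_35)
# compute_20 PTX: JIT fallback for any other device of cc 2.0 or higher.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_20,code=compute_20)
```

Each extra target enlarges the binary, so list only the architectures you expect to run on and keep the PTX entry as the fallback.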
