Question

In the samples shipped with CUDA 6.0, I ran the following compile command, which produced errors:

foo@foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc cdpSimpleQuicksort.cu
nvcc warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
cdpSimpleQuicksort.cu(105): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above

cdpSimpleQuicksort.cu(114): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above

2 errors detected in the compilation of "/tmp/tmpxft_0000241a_00000000-6_cdpSimpleQuicksort.cpp1.ii".

I then changed the command to the following, and got a different failure:

foo@foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc -gencode arch=compute_35,code=sm_35 cdpSimpleQuicksort.cu
cdpSimpleQuicksort.cu(105): error: kernel launch from __device__ or __global__ functions requires separate compilation mode

cdpSimpleQuicksort.cu(114): error: kernel launch from __device__ or __global__ functions requires separate compilation mode

2 errors detected in the compilation of "/tmp/tmpxft_000024f3_00000000-6_cdpSimpleQuicksort.cpp1.ii".

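Is this related to the fact that the machine I'm using only has compute capability 2.1, so the build tools are stopping me? What is the resolution? I haven't found anything in the documentation that explicitly deals with this error.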

I looked at this question, and the documentation link there did not help at all. I need to know how to modify the compile command.

Answer 1

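Take a look at the makefile that ships with the cdpSimpleQuicksort project. It shows some additional switches that are needed to compile the sample because it uses CUDA dynamic parallelism (which is what the second set of errors you are seeing is really about). Go back and study that makefile, and see if you can work out how to combine those switches with your --cubin compile command.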

The short version is that this should compile without error:

nvcc --cubin -rdc=true -I../../common/inc -arch=sm_35 cdpSimpleQuicksort.cu

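Having said all that, you should be able to compile for whatever kind of target you want, but you won't be able to run CDP code on a cc 2.1 architecture.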
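As a rough sketch of how those switches fit together, the script below assembles both the cubin command from above and a command for building a runnable executable. The variable names (ARCH, SRC, INC, CUBIN_CMD, EXE_CMD) are made up for illustration. It assumes it is run from the sample's own directory, and that linking an executable additionally needs the device runtime library (-lcudadevrt), as the sample's makefile does. If nvcc or the source file is not found, it only prints the commands.

```shell
#!/bin/sh
# Sketch only: variable names are hypothetical, paths assume the sample directory.
ARCH="sm_35"                  # dynamic parallelism needs compute capability 3.5 or above
SRC="cdpSimpleQuicksort.cu"
INC="-I../../common/inc"

# -rdc=true turns on separate compilation (relocatable device code),
# which is what the second set of errors asks for.
CUBIN_CMD="nvcc --cubin -rdc=true $INC -arch=$ARCH $SRC"

# An executable also has to link against the CUDA device runtime.
EXE_CMD="nvcc -rdc=true $INC -arch=$ARCH $SRC -o cdpSimpleQuicksort -lcudadevrt"

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Toolchain and source are present: actually run both builds.
    $CUBIN_CMD && $EXE_CMD
else
    # Otherwise just show what would be run.
    echo "$CUBIN_CMD"
    echo "$EXE_CMD"
fi
```

Either build can be produced on a machine with a cc 2.1 GPU, since the target architecture is chosen at compile time, but the resulting executable still needs a cc 3.5 or newer device to run.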

See also the CDP documentation, and here.
