Question

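I have a .h file in which I want real to be defined as double when compiling C/C++, or when compiling CUDA with compute capability >= 1.3. When compiling CUDA with compute capability < 1.3, real should be defined as float.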

After many hours I arrived at this (which does not work):

#   if defined(__CUDACC__)

#       warning * making definitions for cuda

#       if defined(__CUDA_ARCH__)
#           warning __CUDA_ARCH__ is defined
#       else
#           warning __CUDA_ARCH__ is NOT defined
#       endif

#       if (__CUDA_ARCH__ >= 130)
#                       define real double
#                       warning using double in cuda
#       elif (__CUDA_ARCH__ >= 0)
#               define real float
#               warning using float in cuda
#               warning how the hell is this printed when __CUDA_ARCH__ is not defined?
#       else
#               define real 
#               error what the hell is the value of __CUDA_ARCH__ and how can I print it
#       endif

#   else
#       warning * making definitions for c/c++
#       define real double
#       warning using double for c/c++
#   endif

When compiling (note the -arch flag)

nvcc -arch compute_13  -Ilibcutil testFloatDouble.cu

I get

* making definitions for cuda
__CUDA_ARCH__ is defined
using double in cuda

* making definitions for cuda
warning __CUDA_ARCH__ is NOT defined
warning using float in cuda
how the hell is this printed if __CUDA_ARCH__ is not defined now?

Undefined symbols for architecture i386:
  "myKernel(float*, int)", referenced from: ....

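I understand that the file is compiled twice by nvcc. The first pass is OK (__CUDACC__ is defined and __CUDA_ARCH__ >= 130), but what happens the second time? __CUDACC__ is defined, but __CUDA_ARCH__ is undefined or has a value < 130? Why?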

Thanks for your time.

Answer 1

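It seems you might be conflating two things: how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.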

Consider the following code snippet:

#ifdef __CUDACC__
#warning using nvcc

template <typename T>
__global__ void add(T *x, T *y, T *z)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;

    z[idx] = x[idx] + y[idx];
}

#ifdef __CUDA_ARCH__
#warning device code trajectory
#if __CUDA_ARCH__ > 120
#warning compiling with double precision
template void add<double>(double *, double *, double *);
#else
#warning compiling with single precision
template void add<float>(float *, float *, float *);
#endif
#else
#warning nvcc host code trajectory
#endif
#else
#warning non-nvcc code trajectory
#endif

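Here we have a templated CUDA kernel whose instantiation depends on the CUDA architecture, a separate stanza for host code steered by nvcc, and a stanza for host code that is not steered by nvcc. It behaves as follows: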

$ ln -s cudaarch.cu cudaarch.cc
$ gcc -c cudaarch.cc -o cudaarch.o
cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory

$ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:19:2: warning: #warning compiling with single precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info    : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
ptxas info    : Used 4 registers, 12+16 bytes smem

$ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:16:2: warning: #warning compiling with double precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info    : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
ptxas info    : Used 8 registers, 44 bytes cmem[0]

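The take-away points are:

__CUDACC__ defines whether nvcc is steering compilation or not
__CUDA_ARCH__ is always undefined when compiling host code, whether steered by nvcc or not
__CUDA_ARCH__ is only defined for the device code trajectory of a compilation steered by nvcc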

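Those three pieces of information are always enough to conditionally compile device code for different CUDA architectures, host-side CUDA code, and code that is not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in its discussion of compilation trajectories.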
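This also explains why the second pass in the question printed the float warning even though __CUDA_ARCH__ was not defined: inside a #if expression the preprocessor replaces any identifier that is still undefined with 0, so `__CUDA_ARCH__ >= 0` is true on the host pass. The sketch below is only an illustration and not part of the original header; Branch, naive_branch and guarded_branch are made-up names, and the results noted in the comments assume a plain host compiler or nvcc's host pass.

```cpp
// Which branch of the question's #if chain is taken?
// On a plain host compiler (or nvcc's host pass) __CUDA_ARCH__ is not
// defined, and an undefined identifier inside #if evaluates to 0.

enum Branch { HOST_PASS, FLOAT_BRANCH, DOUBLE_BRANCH };  // hypothetical names

// Mirrors the chain from the question: no defined() guard.
inline Branch naive_branch()
{
#if (__CUDA_ARCH__ >= 130)
    return DOUBLE_BRANCH;
#elif (__CUDA_ARCH__ >= 0)
    return FLOAT_BRANCH;   // taken when __CUDA_ARCH__ is undefined: 0 >= 0
#else
    return HOST_PASS;      // unreachable for an undefined or non-negative value
#endif
}

// Guarded version: test defined() first so the host pass gets its own branch.
inline Branch guarded_branch()
{
#if !defined(__CUDA_ARCH__)
    return HOST_PASS;
#elif (__CUDA_ARCH__ >= 130)
    return DOUBLE_BRANCH;
#else
    return FLOAT_BRANCH;
#endif
}
```

Checking defined(__CUDA_ARCH__) before comparing its value keeps the host pass from silently falling into an architecture branch.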

Answer 2

For now, the only practical solution I see is to use a custom define:


#   if (!defined(__CUDACC__) ||  defined(USE_DOUBLE_IN_CUDA)) 
#       define real double
#       warning defining double for cuda or c/c++
#   else
#       define real float
#       warning defining float for cuda
#   endif

Then

nvcc -DUSE_DOUBLE_IN_CUDA -arch compute_13  -Ilibcutil testFloatDouble.cu

outputs, for the two compilation passes:

#warning defining double for cuda or c/c++
#warning defining double for cuda or c/c++

and

nvcc  -Ilibcutil testFloatDouble.cu

outputs

#warning defining float for cuda
#warning defining float for cuda
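The reason this works is that neither __CUDACC__ nor a -D flag changes between nvcc's host and device passes, so both passes agree on what real is. As a variation, the same switch can be sketched with a typedef instead of a macro; this is an assumption about how one might write it, not something from the original header, and same_type and real_is_double are hypothetical helpers added only to make the choice observable.

```cpp
// Answer 2's switch, written with a typedef instead of a macro.
// USE_DOUBLE_IN_CUDA is the custom define from the answer, passed with -D.
#if !defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA)
typedef double real;
#else
typedef float real;
#endif

// Hypothetical helper trait: true only when both types are identical.
template <typename A, typename B>
struct same_type { static const bool value = false; };

template <typename A>
struct same_type<A, A> { static const bool value = true; };

// Hypothetical helper: reports which precision was selected.
inline bool real_is_double()
{
    return same_type<real, double>::value;
}
```

Built with a plain C/C++ compiler, real_is_double() returns true; built with nvcc it returns true only when -DUSE_DOUBLE_IN_CUDA is given.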

CUDA and nvcc: using the preprocessor to choose between float or double
