Question

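I am trying to compile some functions so that they can be used from both host code and CUDA device code, but I get a multiple-definition link error. What I am trying to achieve is the following:

I have a CudaConfig.h file with the following content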

CudaConfig.h

#ifdef __CUDACC__
#define CUDA_CALLABLE_DEVICE __device__
#define CUDA_CALLABLE_HOST __host__
#define CUDA_CALLABLE __host__ __device__
#else
#define CUDA_CALLABLE_DEVICE
#define CUDA_CALLABLE_HOST
#define CUDA_CALLABLE
#endif

In foo.h, I have some functions with the following signature

#include "CudaConfig.h"
struct Bar {Eigen::Vector3d v;};
CUDA_CALLABLE_DEVICE Eigen::Vector3d &foo(Bar &aBar);

I then implement them in both foo.cpp and foo.cu.

foo.cpp

#include "foo.h"

Eigen::Vector3d &foo(Bar &aBar) {aBar.v += {1,1,1}; return aBar.v;}

foo.cu

#include "foo.h"

Eigen::Vector3d &foo(Bar &aBar) {aBar.v += {1,1,1}; return aBar.v;}

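I need to keep the two implementations in separate files because Eigen disables some SIMD operations when it is used from a __device__ function. For performance reasons, I therefore do not want to put both implementations in foo.cu.

Should I implement the function directly in the .h file and mark it inline, so that I no longer get the multiple-definition link error? Given that Eigen disables SIMD for __device__ code, wouldn't that make the __host__ and __device__ versions of the function differ, contrary to what inline expects?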

Answer 1

Here is what is happening

rthoni@rthoni-lt1:~/projects/nvidia/test_device_host$ cat test.cu
extern "C" {
__device__ void test_device_fn()
{
}
}
rthoni@rthoni-lt1:~/projects/nvidia/test_device_host$ nvcc test.cu -c -o test_cu.o
rthoni@rthoni-lt1:~/projects/nvidia/test_device_host$ objdump -t test_cu.o 

test_cu.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 tmpxft_000004d9_00000000-5_test.cudafe1.cpp
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l     O .bss   0000000000000001 _ZL22__nv_inited_managed_rt
0000000000000008 l     O .bss   0000000000000008 _ZL32__nv_fatbinhandle_for_managed_rt
0000000000000000 l     F .text  0000000000000016 _ZL37__nv_save_fatbinhandle_for_managed_rtPPv
0000000000000010 l     O .bss   0000000000000008 _ZZL22____nv_dummy_param_refPvE5__ref
000000000000002f l     F .text  0000000000000016 _ZL22____nv_dummy_param_refPv
0000000000000000 l    d  __nv_module_id 0000000000000000 __nv_module_id
0000000000000000 l     O __nv_module_id 000000000000000f _ZL15__module_id_str
0000000000000018 l     O .bss   0000000000000008 _ZL20__cudaFatCubinHandle
0000000000000045 l     F .text  0000000000000022 _ZL26__cudaUnregisterBinaryUtilv
0000000000000067 l     F .text  000000000000001a _ZL32__nv_init_managed_rt_with_modulePPv
0000000000000000 l    d  .nv_fatbin 0000000000000000 .nv_fatbin
0000000000000000 l       .nv_fatbin 0000000000000000 fatbinData
0000000000000000 l    d  .nvFatBinSegment   0000000000000000 .nvFatBinSegment
0000000000000000 l     O .nvFatBinSegment   0000000000000018 _ZL15__fatDeviceText
0000000000000020 l     O .bss   0000000000000008 _ZZL31__nv_cudaEntityRegisterCallbackPPvE5__ref
0000000000000081 l     F .text  0000000000000026 _ZL31__nv_cudaEntityRegisterCallbackPPv
00000000000000a7 l     F .text  0000000000000045 _ZL24__sti____cudaRegisterAllv
0000000000000000 l    d  .init_array    0000000000000000 .init_array
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000016 g     F .text  0000000000000019 test_device_fn
0000000000000000         *UND*  0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000         *UND*  0000000000000000 exit
0000000000000000         *UND*  0000000000000000 __cudaUnregisterFatBinary
0000000000000000         *UND*  0000000000000000 __cudaInitModule
0000000000000000         *UND*  0000000000000000 __cudaRegisterFatBinary
0000000000000000         *UND*  0000000000000000 atexit

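As you can see, even though the function is marked only as __device__, nvcc still generates a symbol for it in the host object file (test_device_fn in the .text section above).

This behavior is a bug in nvcc (#845649 in our bug tracker).

There are 3 ways to get rid of this error:

Let nvcc generate both the device and the host code
Change the way you compile your .cu files so that only device code is generated
Wrap your __device__ functions in an empty (unnamed) namespace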
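
The third option can be sketched roughly as follows. This is only an illustration under assumptions: Vec3 is a hypothetical stand-in for Eigen::Vector3d so that the file has no dependencies, the macro is a trimmed copy of the one in CudaConfig.h, and check_foo is a hypothetical helper that exists only to exercise the function when the file is built as plain C++.

```cpp
#include <cassert>

// Trimmed copy of the macro from CudaConfig.h.
#ifdef __CUDACC__
#define CUDA_CALLABLE_DEVICE __device__
#else
#define CUDA_CALLABLE_DEVICE
#endif

// Hypothetical stand-in for Eigen::Vector3d (assumption, not the asker's type).
struct Vec3 { double x, y, z; };
struct Bar { Vec3 v; };

namespace {
// A function in an unnamed namespace has internal linkage, so any host-side
// symbol emitted for it stays local to this translation unit.
CUDA_CALLABLE_DEVICE Vec3 &foo(Bar &aBar) {
    aBar.v.x += 1.0;
    aBar.v.y += 1.0;
    aBar.v.z += 1.0;
    return aBar.v;
}
}  // namespace

#ifndef __CUDACC__
// Host-only sanity check; under nvcc foo is device-only and is not called here.
static void check_foo() {
    Bar b{{1.0, 2.0, 3.0}};
    Vec3 &r = foo(b);
    assert(r.x == 2.0 && r.y == 3.0 && r.z == 4.0);
    assert(&r == &b.v);  // the returned reference aliases the member
}
#endif
```

Because the function is local to each translation unit, foo.cpp and foo.cu can each carry their own definition without the linker seeing two global symbols with the same name. The cost is that the function can no longer be declared in one file and defined in another.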

Answer 2

In your particular case, it seems you could make it an undecorated constexpr function:

constexpr Eigen::Vector3d &foo(Bar &aBar) noexcept {aBar.v += {1,1,1}; return aBar.v;}

and invoke nvcc with --expt-relaxed-constexpr:

--expt-relaxed-constexpr                   (-expt-relaxed-constexpr)       
        Experimental flag: Allow host code to invoke __device__ constexpr functions,
        and device code to invoke __host__ constexpr functions.Note that the behavior
        of this flag may change in future compiler releases.

This should work for both device and host code.
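
A minimal sketch of the idea, under assumptions: Vec3 is a hypothetical stand-in for Eigen::Vector3d, and add_ones is a hypothetical by-value variant of foo written as a single return statement so that it is a valid constexpr function even in C++11. Whether Eigen's own operators can be used inside a constexpr function depends on the Eigen version and is not shown here.

```cpp
#include <cassert>

// Hypothetical stand-in for Eigen::Vector3d (assumption).
struct Vec3 { double x, y, z; };

// Undecorated constexpr function: no __host__ or __device__ annotation.
// Assumption: when built with `nvcc --expt-relaxed-constexpr`, device code
// may call it as well; as plain C++ it is an ordinary host function.
constexpr Vec3 add_ones(Vec3 v) noexcept {
    return Vec3{v.x + 1.0, v.y + 1.0, v.z + 1.0};
}

// Compile-time check that the function is usable in a constant expression.
static_assert(add_ones(Vec3{1.0, 2.0, 3.0}).y == 3.0,
              "add_ones is evaluated at compile time");

// Host-side runtime check of the same function.
static void check_add_ones() {
    Vec3 r = add_ones(Vec3{0.0, 0.0, 0.0});
    assert(r.x == 1.0 && r.y == 1.0 && r.z == 1.0);
}
```

Since there is a single definition and constexpr functions are implicitly inline, it can live in the header without causing a multiple-definition error.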

Multiple definitions of CUDA device functions