Question

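This may be a similar problem to "Linker errors 2005 and 1169 (multiply defined symbols) when using CUDA __device__ functions (should be inline by default)", but it is not quite the same. I get several LNK2005 errors when trying to build a project on VS2010, using code that is known to work elsewhere. I'm at my wits' end.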

For instance, I have the following three files: transposeGPU.h, transposeGPU.cu and transposeCUDA.cu. transposeGPU.h can be summarized as follows:

void transposeGPU(float *d_dst, size_t dst_pitch,
    float *d_src, size_t src_pitch,
    unsigned int width, unsigned int height);

i.e. a single declaration with no includes. The definition of that function is in transposeGPU.cu, which can be summarized as follows:

#include <stdio.h>
#include "../transposeGPU.h"
#include "../helper_funcs.h"

#include "transposeCUDA.cu"

void
transposeGPU(float *d_dst, size_t dst_pitch,
    float *d_src, size_t src_pitch,
    unsigned int width, unsigned int height)
{
    // execution configuration parameters
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    size_t shared_mem_size =
        (threads.x * threads.y + (threads.y - 1)) * sizeof(float);

    transposeCUDA<<<grid, threads, shared_mem_size>>>(
        d_dst, dst_pitch / sizeof(float),
        d_src, src_pitch / sizeof(float),
        width, height);
}

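i.e. transposeGPU.cu includes its header file and transposeCUDA.cu, and apart from that it defines transposeGPU() and calls transposeCUDA(), the latter being found in transposeCUDA.cu. Now, transposeCUDA.cu defines the function as expected: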

#include "common_kernel.h"

__global__ void
transposeCUDA(
    float *g_dst, size_t s_dst_pitch,
    const float *g_src, size_t s_src_pitch,
    unsigned int img_width, unsigned int img_height)
{
// several lines of code...
}

All of this seems to be in order, yet for transposeGPU.obj I still get error LNK2005: "void __cdecl __device_stub__Z13transposeCUDAPfjPKfjjj(float *,unsigned int,float const *,unsigned int,unsigned int,unsigned int)" (?__device_stub__Z13transposeCUDAPfjPKfjjj@@YAXPAMIPBMIII@Z) already defined in transposeCUDA.obj.

That, and about twenty other similar linker errors. Why? There is no apparent redefinition going on. Any help would be greatly appreciated.

Answer 1

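If you compile both transposeCUDA.cu and transposeGPU.cu, the redefinition occurs because the definition appears in both translation units: once in transposeCUDA.obj, and once more in transposeGPU.obj through the #include. You should not #include transposeCUDA.cu and also run nvcc on that file.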
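One way to apply this, sketched below, is to keep the kernel in a file that is only ever included and to compile transposeGPU.cu alone. The sketch collapses both files into one translation unit so it can be built with a single nvcc invocation. Several things here are assumptions and not the asker's code: the file name transposeCUDA.cuh is hypothetical, the kernel body is a naive placeholder (the original body is elided, so no shared memory is requested at launch), and iDivUp is assumed to be a ceiling division because helper_funcs.h is not shown.

```cuda
// Sketch: kernel and host wrapper compiled exactly once, in one translation unit.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// --- would live in transposeCUDA.cuh (hypothetical name): included, never compiled alone ---
// Placeholder body: a naive transpose, NOT the asker's shared-memory kernel.
// Pitches are given in elements, as in the original launch.
__global__ void transposeCUDA(float *g_dst, size_t s_dst_pitch,
                              const float *g_src, size_t s_src_pitch,
                              unsigned int img_width, unsigned int img_height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img_width && y < img_height)
        g_dst[x * s_dst_pitch + y] = g_src[y * s_src_pitch + x];
}

// --- would live in transposeGPU.cu, the only file handed to nvcc ---
// Assumed definition of iDivUp: ceiling division.
static unsigned int iDivUp(unsigned int a, unsigned int b) { return (a + b - 1) / b; }

void transposeGPU(float *d_dst, size_t dst_pitch,
                  float *d_src, size_t src_pitch,
                  unsigned int width, unsigned int height)
{
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    transposeCUDA<<<grid, threads>>>(d_dst, dst_pitch / sizeof(float),
                                     d_src, src_pitch / sizeof(float),
                                     width, height);
}

int main()
{
    const unsigned int width = 37, height = 21;  // deliberately not multiples of 16
    std::vector<float> h_src(width * height), h_dst(width * height, 0.0f);
    for (unsigned int i = 0; i < width * height; ++i) h_src[i] = (float)i;

    float *d_src = NULL, *d_dst = NULL;
    size_t src_pitch = 0, dst_pitch = 0;  // pitches in bytes
    if (cudaMallocPitch((void **)&d_src, &src_pitch, width * sizeof(float), height) != cudaSuccess ||
        cudaMallocPitch((void **)&d_dst, &dst_pitch, height * sizeof(float), width) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }
    cudaMemcpy2D(d_src, src_pitch, h_src.data(), width * sizeof(float),
                 width * sizeof(float), height, cudaMemcpyHostToDevice);

    transposeGPU(d_dst, dst_pitch, d_src, src_pitch, width, height);

    // The destination has `width` rows of `height` floats each.
    cudaError_t err = cudaMemcpy2D(h_dst.data(), height * sizeof(float), d_dst, dst_pitch,
                                   height * sizeof(float), width, cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (unsigned int y = 0; y < height; ++y)
        for (unsigned int x = 0; x < width; ++x)
            if (h_dst[x * height + y] != h_src[y * width + x]) {
                printf("mismatch at (%u, %u)\n", x, y);
                return 1;
            }
    printf("transpose ok\n");
    return 0;
}
```

If you would rather keep the file named transposeCUDA.cu, the same effect should be achievable by leaving the #include in place and marking transposeCUDA.cu as excluded from the build in the Visual Studio project, so that only transposeGPU.cu produces an object file containing the kernel.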

Answer 2

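To clarify: __device__ functions are inlined (at least on pre-Fermi hardware), but __global__ functions are not. After all, you can't inline GPU code into your CPU-executable function. Global functions can have their address taken; the only difference is that the address points into GPU memory (much as a pointer to data stored on the GPU looks just like an ordinary pointer).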

As William Pursell said, if you compile your global function twice, you end up with two functions with the same definition, which causes the linker errors.
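The distinction can be illustrated with a small sketch. The names scaleValue and scaleKernel are made up for this example. It assumes that a __device__ helper marked inline (here via __forceinline__) may sit in a header shared by several .cu files, whereas a __global__ kernel gets a host-side launch stub with external linkage (the __device_stub__ symbol in the error message above) and so should be defined in exactly one translation unit.

```cuda
// Sketch with hypothetical names; everything sits in one file so it builds on its own.
#include <cstdio>
#include <cuda_runtime.h>

// Could live in a shared header: `inline` permits the same definition
// to appear in several translation units.
__device__ __forceinline__ float scaleValue(float v, float factor)
{
    return v * factor;
}

// Should be defined in exactly one .cu file: nvcc emits a host-side
// launch stub for every kernel, and two copies of it collide at link time.
__global__ void scaleKernel(float *data, float factor, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scaleValue(data[i], factor);
}

int main()
{
    const unsigned int n = 256;
    float h[n];
    for (unsigned int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleKernel<<<(n + 127) / 128, 128>>>(d, 2.0f, n);

    // The blocking copy also surfaces errors from the kernel launch.
    cudaError_t err = cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (unsigned int i = 0; i < n; ++i)
        if (h[i] != 2.0f * (float)i) {
            printf("mismatch at %u\n", i);
            return 1;
        }
    printf("scale ok\n");
    return 0;
}
```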

CUDA and linker errors
