Question

I'm trying to decouple my code a bit, and something fails. The compilation error is:

error: calling a __host__ function("DecoupledCallGpu") from a __global__ function("kernel") is not allowed

Code excerpts:

main.c (calls the CUDA host function):

#include "cuda_computations.h"
...
ComputeSomething(&var1,&var2);
...

cuda_computations.cu (has the kernel and the main host function, and includes the header with the device functions):

#include "cuda_computations.h"
#include "decoupled_functions.cuh"
...
__global__ void kernel(){
...
DecoupledCallGpu(&var_kernel);
}

void ComputeSomething(int *var1, int *var2){
//allocate memory and etc..
...
kernel<<<20,512>>>();
//cleanup
...
}

decoupled_functions.cuh:

#ifndef _DECOUPLEDFUNCTIONS_H_
#define _DECOUPLEDFUNCTIONS_H_

void DecoupledCallGpu(int *var);

#endif

decoupled_functions.cu:

#include "decoupled_functions.cuh"

__device__ void DecoupledCallGpu(int *var){
  *var=0;
}

Compilation:

nvcc -g --ptxas-options=-v -arch=sm_30 -c cuda_computations.cu -o cuda_computations.o -lcudart

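Question: why is DecoupledCallGpu being called from a host function and not from the kernel, as it was supposed to be?

P.S.: I can share the actual code behind it if you need it.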

Answer 1

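Add the __device__ decorator to the prototype in decoupled_functions.cuh. That should take care of the error message you are seeing.

Then you will need to use separate compilation and linking amongst your modules. So instead of compiling with -c, compile with -dc. Your link command will also need to be modified. A basic example is here.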
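Putting the two steps together, a minimal sketch of the corrected layout might look like the following. The file names, the include guard, and DecoupledCallGpu come from the question; the body of the kernel, the output name app, and the exact link line are assumptions for illustration. The -arch=sm_30 value is copied from the question, and recent toolkits may no longer accept it, so substitute your GPU's architecture. The three files are shown in one listing and need to be split before building.

```cuda
// ---- decoupled_functions.cuh ----
#ifndef _DECOUPLEDFUNCTIONS_H_
#define _DECOUPLEDFUNCTIONS_H_

// The __device__ qualifier must appear on the prototype too, so that
// every file including this header sees a device function.
__device__ void DecoupledCallGpu(int *var);

#endif

// ---- decoupled_functions.cu ----
#include "decoupled_functions.cuh"

__device__ void DecoupledCallGpu(int *var) {
    *var = 0;
}

// ---- cuda_computations.cu (kernel only; host side omitted) ----
#include "decoupled_functions.cuh"

__global__ void kernel() {
    int var_kernel = 1;             // hypothetical local, as in the question
    DecoupledCallGpu(&var_kernel);  // defined in another file, resolved at device link
}

// ---- build (shell commands, shown here as comments) ----
// Compile each .cu file to an object containing relocatable device code:
//   nvcc -arch=sm_30 -dc decoupled_functions.cu -o decoupled_functions.o
//   nvcc -arch=sm_30 -dc cuda_computations.cu   -o cuda_computations.o
// Let nvcc do the final link, which also performs the device link.
// Add whatever host objects the program needs (for example main.o):
//   nvcc -arch=sm_30 decoupled_functions.o cuda_computations.o -o app
```

The -dc flag is shorthand for --relocatable-device-code=true combined with -c. Without it, nvcc expects every device function a kernel calls to be defined in the same file as the kernel.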

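Your question is a bit confusing:

Question: why is DecoupledCallGpu being called from a host function and not from the kernel, as it was supposed to be?

I can't tell whether you are tripping over the English or whether there is a misunderstanding here. The actual error message states: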

error: calling a __host__ function("DecoupledCallGpu") from a __global__ function("kernel") is not allowed

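This arises because, within the compilation unit (i.e. within the module, the file being compiled, which is cuda_computations.cu), the only description of the function DecoupledCallGpu() is the one provided by the prototype in the header: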

void DecoupledCallGpu(int *var);

In CUDA C, this prototype declares an undecorated function, and undecorated functions are equivalent to functions decorated with __host__ (only):

__host__ void DecoupledCallGpu(int *var);

That compilation unit has no knowledge of what is actually in decoupled_functions.cu.

Therefore, when you have kernel code like this:

__global__ void kernel(){       //<- __global__ function
...
DecoupledCallGpu(&var_kernel);  //<- appears as a __host__ function to compiler
}

the compiler thinks you are attempting to call a __host__ function from a __global__ function, which is illegal.
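The qualifier rules described above can be tried in a single file. This is a minimal sketch, not the asker's code: HostOnly, DeviceOnly, Both, and RunDemo are hypothetical names, and the __host__ __device__ combination goes slightly beyond what the answer mentions; it is included only to show a function that both sides may call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// No qualifier: treated exactly like __host__, so only host code may call it.
void HostOnly(int *var) { *var = 1; }

// __device__: callable from kernels and from other device functions.
__device__ void DeviceOnly(int *var) { *var = 2; }

// __host__ __device__: compiled for both sides, callable from either.
__host__ __device__ void Both(int *var) { *var = 3; }

__global__ void kernel(int *out) {
    int v = 0;
    // HostOnly(&v);  // uncommenting this reproduces the error from the question
    DeviceOnly(&v);
    out[0] = v;
    Both(&v);
    out[1] = v;
}

// Launches the kernel with a single thread and copies both results back.
// Returns false if any CUDA call fails.
bool RunDemo(int result[2]) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) return false;
    kernel<<<1, 1>>>(d_out);
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(result, d_out, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main() {
    int result[2] = {-1, -1};
    if (!RunDemo(result)) {
        std::printf("CUDA call failed\n");
        return 1;
    }
    int h = 0;
    HostOnly(&h);  // fine: host code calling a host function
    std::printf("DeviceOnly -> %d, Both -> %d, HostOnly -> %d\n",
                result[0], result[1], h);
    return 0;
}
```

Because every function is defined in the same file as the kernel, this version builds with a plain nvcc invocation and needs no -dc.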

How to properly link a CUDA header file with device functions?
