Question

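I am trying to convert a C++ program I use, which relies on the random library (a C++11 feature). After reading several similar posts, I tried splitting the code into three files. I should say up front that I am not very familiar with C/C++; at work I mostly use R.

The main file looks like this.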

#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_
#include <complex>
#include <random>
#include <iostream>
#include "my_code_header.h"
using namespace std;
std::default_random_engine generator;
std::normal_distribution<double> distribution(0.0,1.0);
const int rand_mat_length = 24561;
double rand_mat[rand_mat_length];// = {0};
void create_std_norm(){
  for(int i = 0 ; i < rand_mat_length ; i++)
    ::rand_mat[i] = distribution(generator);
}
.
.
.
int main(void)
{
  ...
  ...
  call_global();
  return 0;
}
#endif

The header file looks like this.

#ifndef mykernel_h
#define mykernel_h
void call_global();
void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width);
#endif

The .cu file looks like this.

#ifndef _MY_KERNEL_
#define _MY_KERNEL_
#include <iostream>
#include "my_code_header.h"
#define TILE_WIDTH 8
using namespace std;
__global__ void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width)
{
  unsigned int row = blockIdx.y*blockDim.y + threadIdx.y;
  unsigned int col = blockIdx.x*blockDim.x + threadIdx.x;
  // Valid indices run from 0 to length-1 and 0 to width-1, so reject >= as well.
  if ((row >= length) || (col >= width)) {
    return;
  }
  ... 
}
void call_global()
{
  const size_t imageLength = 528;
  const size_t imageWidth = 528;
  const dim3 threadsPerBlock(TILE_WIDTH,TILE_WIDTH);
  const dim3 numBlocks(((imageLength) / threadsPerBlock.x), ((imageWidth) / threadsPerBlock.y));
  double *d_a, *d_b, *mys ;

  ...
  cudaMalloc((void**)&d_a, sizeof(double) * imageLength);
  cudaMalloc((void**)&d_b, sizeof(double) * imageWidth);
  cudaMalloc((void**)&mys, sizeof(double) * imageLength * imageWidth);

  two_d_example<<<numBlocks,threadsPerBlock>>>(d_a, d_b, mys, imageLength, imageWidth);
  ...  
  cudaFree(d_a);
  cudaFree(d_b);
  cudaFree(mys);  // release the result buffer allocated above as well


}

#endif

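Note that __global__ has been removed from the declaration in the .h file, because I was getting the following error when the header was compiled by g++.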

In file included from my_code_main.cpp:12:0:
my_code_header.h:5:1: error: ‘__global__’ does not name a type

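When I compile the .cu file with nvcc, everything works and my_code_kernel.o is generated. But because my .cpp uses C++11, I am trying to compile it with g++, and I get the following error.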

/tmp/ccR2rXzf.o: In function `main':
my_code_main.cpp:(.text+0x1c4): undefined reference to `call_global()'
collect2: ld returned 1 exit status

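I understand this may have nothing to do with CUDA and may simply come from including the header in both places incorrectly. Also, what is the right way to compile and, most importantly, link my_code_kernel.o and my_code_main.o (hopefully)? Sorry if this question is too trivial!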

Answer 1

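It looks like you are not linking with my_code_kernel.o. You used -c in your nvcc command (which makes it compile without linking, i.e. produce the .o file). I am guessing you are not using -c in your g++ command, in which case you need to add my_code_kernel.o to the list of inputs alongside the .cpp file.

The separation you are trying to achieve is entirely possible; it just looks like you are not linking properly. If you still have problems, add the compile commands to your question.

FYI: you do not need to declare two_d_example() in the header file, since it is only used inside your .cu file (from call_global()).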
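
As a rough sketch of the compile and link steps described above, the build might look like the following. The original commands were not posted, so this is an assumption: the output name my_program is a placeholder, and /usr/local/cuda/lib64 is only a common default install location for the CUDA runtime library, which may differ on your system.

```sh
# Compile the CUDA source to an object file (-c: compile only, do not link).
nvcc -c my_code_kernel.cu -o my_code_kernel.o

# Compile the C++11 host code to an object file.
# Older GCC releases spell the flag -std=c++0x instead.
g++ -std=c++11 -c my_code_main.cpp -o my_code_main.o

# Link both objects. When g++ drives the link, the CUDA runtime (cudart)
# has to be named explicitly; the -L path below is an assumed location.
g++ my_code_main.o my_code_kernel.o -o my_program -L/usr/local/cuda/lib64 -lcudart
```

Alternatively, the final step can be done with nvcc (nvcc my_code_main.o my_code_kernel.o -o my_program), which links the CUDA runtime for you.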
