Question

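I am currently working on a project in which I plan to mix "pure" C++ files (.cpp and .hpp) with C++/CUDA files (.cu and .cuh).

The project is mainly about computations on matrices and complex numbers, so the classes are templates meant to be used with either float or double.

I know that CUDA has several libraries, such as cuBLAS and cuFFT, that already implement what I intend to do. My goal, however, is to get familiar with CUDA itself, even while using the thrust library, without relying on implementations that hide the management I want to understand.

Below you will find all the information needed to understand my actual problem.

I developed a small architecture for the program, in which a QGPU::GPU<double> gpu object is declared in the main.cu file. This object calls the add member function, which is defined in the QGPU.cuh file.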

Contents of main.cu

#include "QGPU.cuh"

int main()
{
  std::valarray<std::complex<double>> ma(10);
  std::valarray<std::complex<double>> mb(10);
  QGPU::GPU<double> gpu;

  gpu.add(ma, mb, 2, 5);
  return (0);
}

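Contents of QGPU.cuh

The overall purpose of add is simply to encapsulate the methods related to the CUDA implementation, which I call through the QCUDA::CUDAGPU<T> cgpu_ attribute.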

#pragma once

# include <valarray>
# include <complex>

# include "QCUDA.cuh"

namespace QGPU {

  template<typename T>
  class GPU {
  private:
   QCUDA::CUDAGPU<T>    cgpu_;
  public:
   GPU();
   virtual ~GPU();
   std::valarray<std::complex<T>>* add(const std::valarray<std::complex<T>>&,
                                       const std::valarray<std::complex<T>>&,
                                       const int,
                                       const int);
  };

  template<typename T>
  GPU<T>::GPU()
   : cgpu_()
  {};

  template<typename T>
  GPU<T>::~GPU() = default;

  template<typename T>
  std::valarray<std::complex<T>>* GPU<T>::add(const std::valarray<std::complex<T>>& m1,
                                              const std::valarray<std::complex<T>>& m2,
                                              const int m,
                                              const int n) {
  this->cgpu_.convertSTLToThrust(m1, m2, m, n);
  //tmp
  return (nullptr);
  }; 
};

Contents of QCUDA.cuh

convertSTLToThrust is defined in QCUDA.cu.

#pragma once

# include <thrust/host_vector.h>
# include <thrust/device_vector.h>
# include <thrust/complex.h>

namespace QCUDA {

 template<typename T>
 using thrustHostVector = thrust::host_vector<thrust::complex<T>>;

 template<typename T>
 using thrustDeviceVector = thrust::device_vector<thrust::complex<T>>;

 template<typename T>
 class CUDAGPU {
 private:
  thrustHostVector<T>         hostVecA_;
  thrustHostVector<T>         hostVecB_;
  thrustDeviceVector<T>       deviceVecA_;
  thrustDeviceVector<T>       deviceVecB_;
 public:
  CUDAGPU();
  virtual ~CUDAGPU();

  void convertSTLToThrust(const std::valarray<std::complex<T>>&,
                          const std::valarray<std::complex<T>>&,
                          const int,
                          const int);
 };

 template<typename T>
 CUDAGPU<T>::CUDAGPU() = default;

 template<typename T>
 CUDAGPU<T>::~CUDAGPU() = default;

};

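Contents of QCUDA.cu

Since I defined the convertSTLToThrust member function in a .cu file, I explicitly specialized the template and also specified that the method runs on the host. I believe the body of the member function itself is correct, since I tell nvcc to run this function on the host.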

#include <cuda_runtime_api.h>
#include <cuda.h>

#include "QCUDA.cuh"

template<typename T> __host__
void QCUDA::CUDAGPU<T>::convertSTLToThrust(const std::valarray<std::complex<T>>& m1,
                                           const std::valarray<std::complex<T>>& m2,
                                           const int m,
                                           const int n) {

  std::cout << "Before resizing hostVecA_: " << this->hostVecA_.size() << std::endl;
  this->hostVecA_.resize(m * n, 0);
  std::cout << "After resizing hostVecA_: " << this->hostVecA_.size() << std::endl;
  std::cout << "Before resizing hostVecB_: " << this->hostVecB_.size() << std::endl;
  this->hostVecB_.resize(m * n, 0);
  std::cout << "After resizing hostVecB_: " << this->hostVecB_.size() << std::endl;
}


template<> __host__
void QCUDA::CUDAGPU<double>::convertSTLToThrust(const std::valarray<std::complex<double>>&,
                                                const std::valarray<std::complex<double>>&,
                                                const int,
                                                const int);

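However, with this implementation, I get an undefined reference in the add function of the QGPU.cuh file when I try to call the convertSTLToThrust member function on the cgpu_ attribute.

I methodically went over the way I include the .cuh files and the way I wrote my explicit template specialization, but none of this analysis resolved the undefined reference error.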
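For comparison, GPU<T>::add is defined directly in the header QGPU.cuh, so main.cu can instantiate it on its own, whereas the definition of convertSTLToThrust is only visible inside QCUDA.cu. The sketch below shows the header-style layout in plain C++ (no CUDA), as one possible way to sidestep this kind of link error. HostBuffers, resizeBoth, and resizedCount are hypothetical names, and std::vector is used as a stand-in for thrust::host_vector; none of them belong to this project.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a class such as QCUDA::CUDAGPU<T>; std::vector
// replaces thrust::host_vector so the sketch builds without nvcc.
template <typename T>
class HostBuffers {
public:
    void resizeBoth(int m, int n);
    std::size_t sizeA() const { return a_.size(); }
    std::size_t sizeB() const { return b_.size(); }
private:
    std::vector<std::complex<T>> a_;
    std::vector<std::complex<T>> b_;
};

// The definition sits next to the declaration, as it would in a header, so
// any file that includes it can instantiate HostBuffers<T>::resizeBoth itself.
template <typename T>
void HostBuffers<T>::resizeBoth(int m, int n) {
    const std::size_t count =
        static_cast<std::size_t>(m) * static_cast<std::size_t>(n);
    a_.resize(count, std::complex<T>(0));
    b_.resize(count, std::complex<T>(0));
}

// Mirrors the shape of the gpu.add(ma, mb, 2, 5) call: 2 * 5 elements each.
std::size_t resizedCount() {
    HostBuffers<double> buffers;
    buffers.resizeBoth(2, 5);
    return buffers.sizeA() + buffers.sizeB();
}
```

The trade-off of this layout is that the function body is compiled in every file that uses it, which is the cost the separate .cu definition was trying to avoid.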

Here is my error output:

In function `QGPU::GPU<double>::add(std::valarray<std::complex<double> > const&, std::valarray<std::complex<double> > const&, int, int)':
tmpxft_00000cb9_00000000-5_main.cudafe1.cpp:(.text._ZN4QGPU3GPUIdE3addERKSt8valarrayISt7complexIdEES7_ii[_ZN4QGPU3GPUIdE3addERKSt8valarrayISt7complexIdEES7_ii]+0x38): undefined reference to `QCUDA::CUDAGPU<double>::convertSTLToThrust(std::valarray<std::complex<double> > const&, std::valarray<std::complex<double> > const&, int, int)'
collect2: error: ld returned 1 exit status

I am working on a GTX 1060 with the following CUDA version:

$nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88

EDIT

My post was marked as a "duplicate", so I went through the suggested post looking for key information about my problem. In the end I found a workaround that fixes my undefined reference.

The accepted answer in that post links to the C++ FAQ, which explains exactly what I am trying to do. I found the solution in this part of the FAQ.

Based on what I read in that part, I changed my previous explicit specialization in the QCUDA.cu file:

FROM

#include <cuda_runtime_api.h>
#include <cuda.h>

#include "QCUDA.cuh"

template<typename T> __host__
void QCUDA::CUDAGPU<T>::convertSTLToThrust(const std::valarray<std::complex<T>>& m1,
                                           const std::valarray<std::complex<T>>& m2,
                                           const int m,
                                           const int n) {
  std::cout << "Before resizing hostVecA_: " << this->hostVecA_.size() << std::endl;
  this->hostVecA_.resize(m * n, 0);
  std::cout << "After resizing hostVecA_: " << this->hostVecA_.size() << std::endl;
  std::cout << "Before resizing hostVecB_: " << this->hostVecB_.size() << std::endl;
  this->hostVecB_.resize(m * n, 0);
  std::cout << "After resizing hostVecB_: " << this->hostVecB_.size() << std::endl;
}

template<> __host__
void QCUDA::CUDAGPU<double>::convertSTLToThrust(const std::valarray<std::complex<double>>&,
                                               const std::valarray<std::complex<double>>&,
                                               const int,
                                               const int);

TO
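For reference, the approach the C++ FAQ describes for templates whose member functions are defined in a separate source file is an explicit instantiation definition, written `template` with no `<>`. The `template<>` form, by contrast, only declares an explicit specialization. The following is a minimal single-file sketch in plain C++ rather than CUDA. Box, load, and loadedCount are hypothetical names, std::vector stands in for thrust::host_vector, and whether this matches the exact change made in QCUDA.cu is an assumption.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <valarray>
#include <vector>

// --- would live in a header (hypothetical Box.hpp) ---
template <typename T>
class Box {
public:
    // Declared here; defined in the "source file" part below.
    void load(const std::valarray<std::complex<T>>& values);
    std::size_t size() const { return data_.size(); }
private:
    std::vector<std::complex<T>> data_;
};

// --- would live in a source file (hypothetical Box.cpp) ---
template <typename T>
void Box<T>::load(const std::valarray<std::complex<T>>& values) {
    data_.assign(std::begin(values), std::end(values));
}

// Explicit instantiation definition: "template" with no "<>". It asks the
// compiler to emit Box<double>::load in this translation unit.
template void Box<double>::load(const std::valarray<std::complex<double>>&);

// By contrast, "template<> void Box<double>::load(...);" would only declare
// an explicit specialization and emit no code for the linker to find.

// --- would live in another source file that only sees the header ---
std::size_t loadedCount() {
    std::valarray<std::complex<double>> values(10);
    Box<double> box;
    box.load(values);
    return box.size();
}
```

In a layout like the one in this question, the analogous instantiation line would presumably go at the end of QCUDA.cu, after the definition of convertSTLToThrust, with one such line per type (float, double) that other files use.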

Title: Undefined reference to a template class member function defined in a .cu file

0 answers: