Question

How do I run a constructor on the device? When I write SolidSphere sphere<<<1, 1>>> (1, 12, 24), the compiler gives me this error:

error: no default constructor exists for class "SolidSphere"

class SolidSphere
{
.
.
.
public:
    __device__ __host__ SolidSphere(float radius, unsigned int rings, unsigned int sectors)
.
.
.
};

SolidSphere sphere<<<1, 1>>> (1, 12, 24);

Answer 1

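Your biggest problem is understanding the difference between device code and kernel functions.

A class compiled as device code can be instantiated inside a kernel function. A kernel (__global__) function is the entry point into the CUDA device, and only kernels are launched with the <<<...>>> syntax; a constructor is not a kernel.

This is what you have: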

class SolidSphere
{
public:
    __device__ __host__ SolidSphere(float radius, 
          unsigned int rings, unsigned int sectors);
};

This is what you need:

__global__ void KernelSolidSphere(/** inputs and outputs */) {
     // notice this is how you use __device__ compiled code
     SolidSphere sphere(10.32, 3, 5);
     // use the sphere here
     return;
}

And this is how you call it from the host:

KernelSolidSphere<<<1, 1>>>(/** inputs and outputs */);
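
Putting the pieces together, a minimal end-to-end sketch might look like the following. The members of SolidSphere are elided in the question, so the fields, the vertexCount() accessor, and the runSphereKernel() host helper are hypothetical stand-ins for illustration, not the asker's actual class.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in: only the constructor signature comes from the
// question; the data members and vertexCount() are assumptions.
class SolidSphere
{
public:
    __device__ __host__ SolidSphere(float radius, unsigned int rings, unsigned int sectors)
        : radius_(radius), rings_(rings), sectors_(sectors) {}

    __device__ __host__ float radius() const { return radius_; }
    __device__ __host__ unsigned int vertexCount() const { return rings_ * sectors_; }

private:
    float radius_;
    unsigned int rings_;
    unsigned int sectors_;
};

// The kernel is the entry point. Inside it, the object is constructed
// with ordinary C++ syntax; <<<...>>> is used only on the kernel launch.
__global__ void KernelSolidSphere(unsigned int* out)
{
    SolidSphere sphere(1.0f, 12, 24);
    *out = sphere.vertexCount();
}

// Hypothetical host helper: launches the kernel and copies one value back.
// Returns 0 if a CUDA call reports an error.
unsigned int runSphereKernel()
{
    unsigned int* d_out = nullptr;
    unsigned int h_out = 0;
    if (cudaMalloc(&d_out, sizeof(unsigned int)) != cudaSuccess) return 0;

    KernelSolidSphere<<<1, 1>>>(d_out);
    cudaError_t err = cudaGetLastError();  // catches launch errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel on the default stream to finish.
        err = cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : 0;
}
```

On a machine with a working CUDA device, calling runSphereKernel() from main should yield 288 (12 rings times 24 sectors). Because the constructor is marked both __device__ and __host__, the same class can also be constructed in ordinary host code.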

When I was first learning, I relied heavily on this resource. It should provide everything you need.

Answer 2

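I'm going to take a stab at this, with the caveat that I have never done it myself; this is only my understanding.

__device__ functions can be called from a kernel (__global__). You cannot have a __global__ member function of a class.

You can have a __global__ init call, but you cannot allocate new memory in it.

If you want to use a constructor to initialize a block of memory, the best tool is placement new: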

#include <cstdio>    // fprintf, printf, getchar
#include <iostream>
#include <new>       // placement new

class Point
{
public:
  __host__ __device__ Point() {}

  __host__ __device__ Point(int a, int b) : x(a), y(b)
  {
  }
  int x, y;
};

// Constructs a Point in device memory that the host has already allocated.
__global__ void init_point(void* buffer, int a, int b)
{
  new(buffer) Point(a, b);
}

int main()
{
  int count = 0;
  int i = 0;

  cudaGetDeviceCount(&count);
  if (count == 0) {
    fprintf(stderr, "There is no device.\n");
    return -1;
  }

  // List the devices and count those that support CUDA.
  int cuda_count = 0;
  for (i = 0; i < count; i++) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
      if (prop.major >= 1) { cuda_count++; }
      std::cout << "[" << i << "] --" << prop.name << std::endl;
    }
  }

  if (cuda_count == 0) {
    fprintf(stderr, "There is no device supporting CUDA.\n");
    return -1;
  }

  std::cout << std::endl << "Select device" << std::endl;

  std::cin >> i;

  cudaSetDevice(i);

  printf("CUDA initialized.\n");

  // Allocate raw device memory, then run the constructor in it on the device.
  void* buff;
  cudaMalloc(&buff, sizeof(Point));
  init_point<<<1,1>>>(buff, 10, 20);
  cudaDeviceSynchronize();  // replaces the deprecated cudaThreadSynchronize()

  // Copy the constructed object back and print its fields.
  Point cpu_point;
  cudaMemcpy(&cpu_point, buff, sizeof(Point), cudaMemcpyDeviceToHost);
  std::cout << cpu_point.x << std::endl;
  std::cout << cpu_point.y << std::endl;
  cudaFree(buff);
  getchar();
  getchar();
  return 0;
}

Obviously, this can be extended so that init_point initializes many points in a multithreaded fashion.
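
As a rough sketch of that multithreaded extension, each thread could construct one element of a device array with placement new. The kernel name init_points, the host helper make_points, and the choice of values a + i and b + i are assumptions made for illustration; the sketch also assumes n > 0.

```cuda
#include <new>
#include <vector>
#include <cuda_runtime.h>

class Point
{
public:
    __host__ __device__ Point() : x(0), y(0) {}
    __host__ __device__ Point(int a, int b) : x(a), y(b) {}
    int x, y;
};

// One thread per element: thread i constructs points[i] in place.
__global__ void init_points(Point* points, int n, int a, int b)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        new (&points[i]) Point(a + i, b + i);
    }
}

// Hypothetical host helper (assumes n > 0): allocates n Points on the
// device, constructs them in parallel, and copies them back to the host.
// Returns an empty vector if a CUDA call reports an error.
std::vector<Point> make_points(int n, int a, int b)
{
    std::vector<Point> host(n);
    Point* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(Point)) != cudaSuccess) return {};

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    init_points<<<blocks, threads>>>(dev, n, a, b);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(host.data(), dev, n * sizeof(Point), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    if (err != cudaSuccess) host.clear();
    return host;
}
```

The bounds check i < n matters because the grid is rounded up to a whole number of blocks, so the last block may contain threads with no element to construct.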

Note that on the CUDA architecture, an array-of-structures layout is usually much slower than a structure-of-arrays design.
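
To make the two layouts concrete, here is a small sketch of the same per-element sum written both ways. The type and kernel names (PointAoS, PointsSoA, sum_aos, sum_soa) are hypothetical, and how much the layout matters in practice depends on the struct size and the access pattern.

```cuda
#include <cuda_runtime.h>

// Array of structures: x and y of one point sit next to each other, so
// neighbouring threads reading .x touch addresses sizeof(PointAoS) apart.
struct PointAoS { int x, y; };

__global__ void sum_aos(const PointAoS* points, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = points[i].x + points[i].y;
}

// Structure of arrays: all x values are contiguous, and so are all y
// values, so neighbouring threads read neighbouring ints from each array.
// Both pointers are expected to refer to device memory.
struct PointsSoA { int* x; int* y; };

__global__ void sum_soa(PointsSoA points, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = points.x[i] + points.y[i];
}
```

With the structure-of-arrays layout, the loads issued by consecutive threads fall on consecutive addresses, which is the access pattern the GPU memory system can combine into fewer transactions.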
