Question

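Hey all, I'm using CUDA and the Thrust library. I'm running into a problem when I try to access a double pointer inside a CUDA kernel that is given a thrust::device_vector of type Object * (a vector of pointers) loaded from the host. When I compile with 'nvcc -o thrust main.cpp cukernel.cu' I get the warning 'Warning: Cannot tell what pointer points to, assuming global memory space', and a launch error when I try to run the program.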

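I have read the Nvidia forums, and the solution seems to be "don't use double pointers in a CUDA kernel". I'm not looking to collapse the double pointer into a 1D pointer before sending it to the kernel... Has anyone found a way around this? The relevant code is below, thanks in advance!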

--------------------------
        main.cpp
--------------------------

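#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include "sphere.h" // hypothetical header: struct Sphere { float a, b, c; };,
                    // NUM_OBJECTS, and the declaration of kernelBegin(...)

using std::cout;
using std::endl;

Sphere * parseSphere(int i)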
{
  Sphere * s = new Sphere();
  s->a = 1+i;
  s->b = 2+i;
  s->c = 3+i;
  return s;
}

int main( int argc, char** argv ) {

  int i;
  thrust::host_vector<Sphere *> spheres_h;
  thrust::host_vector<Sphere> spheres_resh(NUM_OBJECTS);

  //initialize spheres_h
  for(i=0;i<NUM_OBJECTS;i++){
    Sphere * sphere = parseSphere(i);
    spheres_h.push_back(sphere);
  }

  //initialize spheres_resh
  for(i=0;i<NUM_OBJECTS;i++){
    spheres_resh[i].a = 1;
    spheres_resh[i].b = 1;
    spheres_resh[i].c = 1;
  }

  thrust::device_vector<Sphere *> spheres_dv = spheres_h;
  thrust::device_vector<Sphere> spheres_resv = spheres_resh;
  Sphere ** spheres_d = thrust::raw_pointer_cast(&spheres_dv[0]);
  Sphere * spheres_res = thrust::raw_pointer_cast(&spheres_resv[0]);

  kernelBegin(spheres_d,spheres_res,NUM_OBJECTS);

  thrust::copy(spheres_dv.begin(),spheres_dv.end(),spheres_h.begin());
  thrust::copy(spheres_resv.begin(),spheres_resv.end(),spheres_resh.begin());

  bool result = true;

  for(i=0;i<NUM_OBJECTS;i++){
    result &= (spheres_resh[i].a == i+1);
    result &= (spheres_resh[i].b == i+2);
    result &= (spheres_resh[i].c == i+3);
  }

  if(result)
  {
    cout << "Data GOOD!" << endl;
  }else{
    cout << "Data BAD!" << endl;
  }

  return 0;
}


--------------------------
        cukernel.cu
--------------------------
#include "sphere.h" // same hypothetical header as main.cpp (defines Sphere)

__global__ void deviceBegin(Sphere ** spheres_d, Sphere * spheres_res, int num_objects)
{
  int index = threadIdx.x + blockIdx.x*blockDim.x;
  if (index >= num_objects) return; // the last block can overshoot the arrays

  spheres_res[index].a = (*(spheres_d+index))->a; //causes warning/launch error
  spheres_res[index].b = (*(spheres_d+index))->b; 
  spheres_res[index].c = (*(spheres_d+index))->c; 
}

void kernelBegin(Sphere ** spheres_d, Sphere * spheres_res, int num_objects)
{

 int threads = 512;//per block
 int grids = (num_objects + threads - 1)/threads;//blocks per grid, rounded up

 deviceBegin<<<grids,threads>>>(spheres_d, spheres_res, num_objects);
}

Answer 1

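The basic problem here is that the device vector spheres_dv contains host pointers. Thrust cannot do "deep copying" or pointer translation between the GPU and host CPU address spaces. So when you copy spheres_h to GPU memory, you end up with a GPU array of host pointers. Dereferencing host pointers on the GPU is illegal: they are pointers in the wrong memory address space, so you are getting the GPU equivalent of a segfault inside the kernel.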
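To see the shallow copy concretely, here is a minimal sketch. It assumes `Sphere` is a plain struct with float members `a`, `b`, `c` (the question never shows its definition), and the helper name `pointers_copied_verbatim` is hypothetical. It copies a vector of host pointers to the device and back, then checks that the addresses return unchanged: Thrust moved only the pointer values, never the objects they point to.

```cuda
#include <cassert>
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

struct Sphere { float a, b, c; };  // assumed layout; not shown in the question

// Returns true if the device copy holds exactly the same (host) addresses.
bool pointers_copied_verbatim()
{
    const int n = 4;
    thrust::host_vector<Sphere*> h(n);
    for (int i = 0; i < n; ++i)
        h[i] = new Sphere{1.0f + i, 2.0f + i, 3.0f + i};  // objects live on the host heap

    // Only the raw addresses are transferred; the Sphere objects stay in host memory.
    thrust::device_vector<Sphere*> d = h;
    thrust::host_vector<Sphere*> back = d;

    bool same = true;
    for (int i = 0; i < n; ++i)
        same = same && (back[i] == h[i]);

    for (int i = 0; i < n; ++i)
        delete h[i];
    return same;
}

int main()
{
    bool same = pointers_copied_verbatim();
    printf("device_vector holds host addresses: %s\n", same ? "yes" : "no");
    return 0;
}
```

A kernel that dereferences any element of `d` would be reading a host address, which is exactly the launch failure in the question.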

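The solution will involve replacing your parseSphere function, which currently allocates each new structure in host memory, with something that performs the allocation on the GPU. If you have a Fermi GPU (which it looks like you don't) and are using CUDA 3.2 or 4.0, one approach would be to turn parseSphere into a kernel. The C++ new operator is supported in device code, so the structures could be created directly in device memory. For this to work, you would need to modify the definition of Sphere so that its constructor is defined as a __device__ function.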
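A rough sketch of the device-side `new` approach, assuming a GPU of compute capability 2.0 or later and a toolkit that supports `new`/`delete` in device code. `parseSpheres`, `freeSpheres` and `run_device_new_demo` are hypothetical names, and the `Sphere` layout is assumed. Objects created with device-side `new` live on the device heap, so they have to be released with `delete` inside a kernel rather than with `cudaFree`.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Assumed definition: three floats, plus the __device__ constructor this approach needs.
struct Sphere {
    float a, b, c;
    __host__ __device__ Sphere() : a(0), b(0), c(0) {}
    __device__ Sphere(float a_, float b_, float c_) : a(a_), b(b_), c(c_) {}
};

// Device-side replacement for parseSphere: one Sphere per thread, on the device heap.
__global__ void parseSpheres(Sphere** spheres_d, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) spheres_d[i] = new Sphere(1.0f + i, 2.0f + i, 3.0f + i);
}

// The question's double-pointer read; every pointer is now a device address.
__global__ void deviceBegin(Sphere** spheres_d, Sphere* spheres_res, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n && spheres_d[i] != nullptr) spheres_res[i] = *spheres_d[i];
}

// Device-heap memory must be freed from device code.
__global__ void freeSpheres(Sphere** spheres_d, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n && spheres_d[i] != nullptr) delete spheres_d[i];
}

bool run_device_new_demo(int n)
{
    Sphere** spheres_d = nullptr;
    Sphere* spheres_res = nullptr;
    cudaMalloc(&spheres_d, n * sizeof(Sphere*));
    cudaMalloc(&spheres_res, n * sizeof(Sphere));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    parseSpheres<<<blocks, threads>>>(spheres_d, n);
    deviceBegin<<<blocks, threads>>>(spheres_d, spheres_res, n);
    freeSpheres<<<blocks, threads>>>(spheres_d, n);

    Sphere* res_h = new Sphere[n];
    bool ok = cudaMemcpy(res_h, spheres_res, n * sizeof(Sphere),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; i < n; ++i)
        ok = ok && res_h[i].a == 1.0f + i && res_h[i].b == 2.0f + i && res_h[i].c == 3.0f + i;

    delete[] res_h;
    cudaFree(spheres_d);
    cudaFree(spheres_res);
    return ok;
}

int main()
{
    bool ok = run_device_new_demo(1000);
    printf("device-side new: %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
```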

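The alternative approach would involve creating a host array holding device pointers, then copying that array to device memory. You can see an example of this in this answer. Note that declaring a thrust::device_vector containing thrust::device_vector probably won't work, so you will likely need to use the underlying CUDA API calls to build this array of device pointers.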
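A sketch of the pointer-table approach using plain CUDA runtime calls: one `cudaMalloc` per object, a host `std::vector` that records the resulting device addresses, and a second `cudaMalloc`/`cudaMemcpy` that ships that table to the GPU. The same host-side table is what makes the reverse copy possible, one `cudaMemcpy` per object. The `Sphere` layout and the name `run_pointer_table_demo` are assumptions, and the kernel adds 10 to `a` purely so the reverse copy has a change to show.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct Sphere { float a, b, c; };  // assumed layout; not shown in the question

// Same double-pointer read as the question, but every pointer refers to device memory.
__global__ void deviceBegin(Sphere** spheres_d, Sphere* spheres_res, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;
    spheres_res[i] = *spheres_d[i];
    spheres_d[i]->a += 10.0f;  // illustrative in-place change, read back below
}

bool run_pointer_table_demo(int n)
{
    // 1. Host-side table of *device* pointers: one cudaMalloc + cudaMemcpy per object.
    std::vector<Sphere*> table_h(n);
    for (int i = 0; i < n; ++i) {
        Sphere s = {1.0f + i, 2.0f + i, 3.0f + i};  // the values parseSphere produced
        cudaMalloc(&table_h[i], sizeof(Sphere));
        cudaMemcpy(table_h[i], &s, sizeof(Sphere), cudaMemcpyHostToDevice);
    }

    // 2. Copy the table itself to the GPU, giving the kernel a valid Sphere**.
    Sphere** spheres_d = nullptr;
    Sphere* spheres_res = nullptr;
    cudaMalloc(&spheres_d, n * sizeof(Sphere*));
    cudaMalloc(&spheres_res, n * sizeof(Sphere));
    cudaMemcpy(spheres_d, table_h.data(), n * sizeof(Sphere*), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    deviceBegin<<<blocks, threads>>>(spheres_d, spheres_res, n);

    std::vector<Sphere> res_h(n);
    bool ok = cudaMemcpy(res_h.data(), spheres_res, n * sizeof(Sphere),
                         cudaMemcpyDeviceToHost) == cudaSuccess;

    // 3. Reverse copy: walk the host-side table, one cudaMemcpy per object.
    for (int i = 0; i < n; ++i) {
        Sphere back = {0.0f, 0.0f, 0.0f};
        if (cudaMemcpy(&back, table_h[i], sizeof(Sphere), cudaMemcpyDeviceToHost) != cudaSuccess)
            ok = false;
        ok = ok && res_h[i].a == 1.0f + i && res_h[i].b == 2.0f + i && res_h[i].c == 3.0f + i
                && back.a == 11.0f + i;
        cudaFree(table_h[i]);
    }
    cudaFree(spheres_d);
    cudaFree(spheres_res);
    return ok;
}

int main()
{
    bool ok = run_pointer_table_demo(100);
    printf("pointer-table round trip: %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
```

The per-object allocations and copies are the price of keeping the double pointer; with many small objects they will dominate the run time.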

You should also note that I haven't touched on the reverse copy operation, which is just as hard to do.

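The bottom line is that Thrust (like the C++ STL containers) really isn't intended to hold pointers. Its containers are designed to hold values, and to abstract away pointer indirection and direct memory access behind iterators and underlying algorithms that the user isn't supposed to see. Furthermore, the "deep copy" problem is the main reason the wise people on the NVIDIA forums advise against multiple levels of pointers in GPU code: it makes the code much more complicated, and it also runs slower on the GPU.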
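If the pointers are not essential, holding values is much simpler. A minimal sketch, assuming the same three-float `Sphere` and a hypothetical helper `run_value_demo`: the `thrust::device_vector<Sphere>` holds the objects themselves, so the Thrust copy really does move the data and the kernel needs only one level of indirection.

```cuda
#include <cassert>
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

struct Sphere { float a, b, c; };  // assumed layout; not shown in the question

// One level of indirection: each thread reads a Sphere value directly.
__global__ void deviceBegin(const Sphere* spheres_d, Sphere* spheres_res, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) spheres_res[i] = spheres_d[i];
}

bool run_value_demo(int n)
{
    thrust::host_vector<Sphere> spheres_h(n);
    for (int i = 0; i < n; ++i) {
        spheres_h[i].a = 1.0f + i;  // same values parseSphere produced
        spheres_h[i].b = 2.0f + i;
        spheres_h[i].c = 3.0f + i;
    }

    thrust::device_vector<Sphere> spheres_dv = spheres_h;  // copies objects, not addresses
    thrust::device_vector<Sphere> spheres_resv(n);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    deviceBegin<<<blocks, threads>>>(thrust::raw_pointer_cast(spheres_dv.data()),
                                     thrust::raw_pointer_cast(spheres_resv.data()), n);

    thrust::host_vector<Sphere> res_h = spheres_resv;  // waits for the kernel to finish
    bool ok = true;
    for (int i = 0; i < n; ++i)
        ok = ok && res_h[i].a == 1.0f + i && res_h[i].b == 2.0f + i && res_h[i].c == 3.0f + i;
    return ok;
}

int main()
{
    bool ok = run_value_demo(1000);
    printf("value layout round trip: %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
```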

CUDA/Thrust double pointer problem (vector of pointers)
