Question

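In a CUDA C project, I would like to try the Thrust library to find the maximum element of an array of floats. The Thrust function thrust::max_element() looks like what I need. The array I want to use it on is the result of a CUDA kernel (which appears to work correctly), so it already resides in device memory when thrust::max_element() is called. I am not very familiar with Thrust, but after looking at the documentation for thrust::max_element() and reading answers to similar questions on this site, I thought I understood how the process works. Unfortunately I get wrong results, so it seems I am not using the library function correctly. Can anybody tell me what is wrong with my code?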

float* deviceArray;
float* max;
int length = 1025;

*max = 0.0f;
size = (int) length*sizeof(float);     

cudaMalloc(&deviceArray, size);
cudaMemset(deviceArray, 0.0f, size);

// here I launch a cuda kernel which modifies deviceArray

thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(deviceArray);
*max = *(thrust::max_element(d_ptr, d_ptr + length));

I use the following headers:

#include <thrust/extrema.h>
#include <thrust/device_ptr.h>

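Even though I am sure that deviceArray contains non-zero values after the kernel has run, I still get zero for *max. I am compiling with nvcc (CUDA 7.0) and running the code on a device with compute capability 3.5.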

Any help is greatly appreciated. Thanks.

Answer 1

This is not valid C code:

float* max;
int length = 1025;

*max = 0.0f;

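You are not allowed to store data through a pointer (max) until you have properly provided an allocation for it (and set the pointer equal to the address of that allocation). Here max is never initialized, so the write *max = 0.0f is undefined behavior.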
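To make the pointer issue concrete, here is a minimal host-only sketch of two ways to hold the result safely. It is an illustration under assumptions, not the asker's code: std::max_element stands in for thrust::max_element so that it compiles without CUDA, the function names max_into_value and max_into_allocation are made up for this example, and length is assumed to be greater than zero.

```cpp
#include <algorithm>  // std::max_element (host stand-in for thrust::max_element)
#include <cassert>    // assert
#include <cstdlib>    // std::malloc, std::free

// Option 1: store the result in a plain float. No pointer is needed,
// because the variable itself provides the storage.
float max_into_value(const float* data, int length) {
    float max = 0.0f;
    max = *std::max_element(data, data + length);
    return max;
}

// Option 2: keep a pointer, but give it an allocation before writing through it.
float max_into_allocation(const float* data, int length) {
    float* max = static_cast<float*>(std::malloc(sizeof(float)));
    assert(max != nullptr);  // the allocation must have succeeded
    *max = *std::max_element(data, data + length);  // valid: max points at real storage
    float result = *max;
    std::free(max);
    return result;
}
```

In the original code the first option is usually the simpler one: declare float max; and assign max = *(thrust::max_element(d_ptr, d_ptr + length));.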

Apart from that, the rest of your code seems to work for me:

$ cat t990.cu
#include <thrust/extrema.h>
#include <thrust/device_ptr.h>
#include <iostream>


int main(){

  float* deviceArray;
  float max, test;
  int length = 1025;

  max = 0.0f;
  test = 2.5f;
  int size = (int) length*sizeof(float);

  cudaMalloc(&deviceArray, size);
  cudaMemset(deviceArray, 0.0f, size);
  cudaMemcpy(deviceArray, &test, sizeof(float),cudaMemcpyHostToDevice);

  thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(deviceArray);
  max = *(thrust::max_element(d_ptr, d_ptr + length));
  std::cout << max << std::endl;
}
$ nvcc -o t990 t990.cu
$ ./t990
2.5
$

Using thrust::max_element in a CUDA C project
