Question

I am running a Thrust parallel binary-search-type routine on an array:

// array and array2 are raw pointers to device memory
thrust::device_ptr<int> array_ptr(array);

// Search for first position where 0 could be inserted in array
// without violating the ordering
// lower_bound returns the same iterator type it was given, here a device_ptr
thrust::device_ptr<int> iter = thrust::lower_bound(array_ptr, array_ptr+length, 0, cmp(array2));

The custom function object cmp defines a custom comparison operator:

struct cmp
{
    cmp(int *array2){ this->array2 = array2; }

    __device__ bool operator()(const int& x, const int& y)
    {
        return device_function(array2,x) <= device_function(array2,y);
    }

    int *array2;
};

The comparison relies on a call to a function compiled for the device:

__device__ int device_function( const int* array2, const int value ){
    int quantity = 0;

    for (int i = 0; i < 50000; ++i){
        if ( array2[i] > value ){ quantity += array2[i]; }
    }

    return quantity;
}

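My question is: what (if any) parallel execution takes place on the device for the sum reduction in device_function? If the function executes serially, how can I introduce parallelism to speed up the function evaluation?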

Answer 1

My question is: what (if any) parallel execution takes place on the device for the sum reduction in device_function?

None. Ordinary C/C++ code in a __device__ function (whether in CUDA or Thrust) executes sequentially, from the context of a single CUDA thread.
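To make the cost of that sequential execution concrete, here is a minimal host-side sketch in plain C++. It uses std::lower_bound as a stand-in for thrust::lower_bound and only counts work; it says nothing about how Thrust schedules threads. The names search_cost, host_function and counted_lower_bound are hypothetical, and the comparator uses a strict '<' where the question's cmp uses '<=' (an assumption of this sketch, since std::lower_bound needs the range to be partitioned with respect to the comparator).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counters for the work done during one search (illustrative names).
struct search_cost {
    std::size_t comparisons = 0;  // calls to the comparator
    std::size_t passes = 0;       // full sequential loops over array2
};

// Host-side copy of the loop in device_function, with the length taken from
// the vector instead of the hard-coded 50000. Each call is one full pass.
int host_function(const std::vector<int>& array2, int value, search_cost& cost) {
    ++cost.passes;
    int quantity = 0;
    for (int x : array2) {
        if (x > value) { quantity += x; }
    }
    return quantity;
}

// Runs std::lower_bound with a cmp-style comparator and returns the insertion
// index. Assumes cost starts with both counters at zero.
std::size_t counted_lower_bound(const std::vector<int>& array,
                                const std::vector<int>& array2,
                                int value, search_cost& cost) {
    auto comp = [&](const int& x, const int& y) {
        ++cost.comparisons;
        // Strict '<' here (assumption); the question's cmp uses '<='.
        return host_function(array2, x, cost) < host_function(array2, y, cost);
    };
    auto it = std::lower_bound(array.begin(), array.end(), value, comp);
    // Every comparison evaluates the loop twice, once per argument.
    assert(cost.passes == 2 * cost.comparisons);
    return static_cast<std::size_t>(it - array.begin());
}
```

In this sketch every comparison made by the binary search triggers two complete loops over array2, one after the other, which is the work the question is asking to speed up.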

If the function executes serially, how can I introduce parallelism to speed up the function evaluation?

One possible approach is to use Thrust v1.8 (available from GitHub or in CUDA 7 RC) and place an appropriate thrust function inside the functor (cmp) that you pass to thrust::lower_bound.

Here is a worked example that uses thrust::sort from within a custom functor passed to another thrust function.

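Parallelization using this method requires compiling for, and executing on, a device that supports CUDA Dynamic Parallelism. There is also no guarantee of an overall speed-up, just as with any CUDA Dynamic Parallelism code. Whether this level of parallelism brings any benefit will depend on a number of factors, such as whether the previous level of parallelism was already making maximal use of the device.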

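For example purposes, the functionality contained in your device_function appears to be replaceable by a single call to thrust::transform_reduce. Your cmp functor could then be rewritten as something like this (coded in browser, not tested):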

struct cmp
{
    cmp(int *array2){ this->array2 = array2; }

    __device__ bool operator()(const int& x, const int& y)
    {
        // Each sum is evaluated by a thrust call issued from device code
        int sum_x = thrust::transform_reduce(thrust::device, array2, array2+50000, my_greater_op(x), 0, thrust::plus<int>());
        int sum_y = thrust::transform_reduce(thrust::device, array2, array2+50000, my_greater_op(y), 0, thrust::plus<int>());
        return sum_x <= sum_y;
    }

    int *array2;

};

And you would have to provide an appropriate my_greater_op functor:

struct my_greater_op
{
  int val;
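  // Constructed inside cmp::operator(), so it must be callable from device code
  __host__ __device__ my_greater_op(int _val) {val = _val;}
  __host__ __device__ int operator()(const int& x) const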
  {
     return (x>val)?x:0;
  }
};
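As a sanity check on the rewrite, the following host-only C++ sketch compares the original loop with a transform step followed by a plus-reduction, which is the decomposition that thrust::transform_reduce performs in a single fused call. It uses only the standard library, not Thrust, and the names greater_or_zero, loop_sum, transform_then_reduce and example are hypothetical stand-ins; the data in example is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Host-side stand-in for my_greater_op: keep x if it exceeds val, else 0.
struct greater_or_zero {
    int val;
    explicit greater_or_zero(int v) : val(v) {}
    int operator()(const int& x) const { return (x > val) ? x : 0; }
};

// The loop from device_function, with the length taken from the vector
// instead of the hard-coded 50000.
int loop_sum(const std::vector<int>& a, int value) {
    int quantity = 0;
    for (int x : a) {
        if (x > value) { quantity += x; }
    }
    return quantity;
}

// The same result as two explicit steps: map each element through the
// functor, then add up the mapped values starting from 0.
int transform_then_reduce(const std::vector<int>& a, int value) {
    std::vector<int> mapped(a.size());
    std::transform(a.begin(), a.end(), mapped.begin(), greater_or_zero(value));
    return std::accumulate(mapped.begin(), mapped.end(), 0, std::plus<int>());
}

// Small self-check on made-up data.
inline void example() {
    const std::vector<int> a = {5, -2, 7, 0, 3};
    assert(loop_sum(a, 2) == 15);               // 5 + 7 + 3
    assert(transform_then_reduce(a, 2) == 15);  // same sum via map + reduce
}
```

Elements that fail the test contribute 0 to the sum, which is why a plain plus-reduction over the mapped values matches the conditional accumulation in the loop.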

Speeding up a __device__ function in a Thrust comparison operator