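When I pass an algorithm iterators from a thrust::device_vector, Thrust automatically selects the GPU backend, because the vector's data lives on the GPU. But when I give an algorithm only thrust::counting_iterator arguments, how do I choose which backend it executes on?
In the thrust::find call below there are no device_vector iterator arguments, so how does Thrust choose which backend (CPU, OMP, TBB, CUDA) to use?
How can I control which backend this algorithm executes on without using thrust::device_vector<> in this code?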
thrust::counting_iterator<uint64_t> first(i);
thrust::counting_iterator<uint64_t> last = first + step_size;
auto iter = thrust::find(
    thrust::make_transform_iterator(first, functor),
    thrust::make_transform_iterator(last, functor),
    true);
Update 23.01.14 (MSVS2012, CUDA 5.5, Thrust 1.7): compiled successfully!
#include <cstdint>
#include <iostream>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/find.h>
#include <thrust/functional.h>
#include <thrust/execution_policy.h>
// Predicate callable on both host and device: true for odd values.
struct is_odd : public thrust::unary_function<uint64_t, bool> {
    __host__ __device__ bool operator()(uint64_t const& x) const {
        return x & 1;
    }
};
int main() {
    thrust::counting_iterator<uint64_t> first(0);
    thrust::counting_iterator<uint64_t> last = first + 100;
    // thrust::device as the first argument selects the device backend explicitly.
    auto iter = thrust::find(thrust::device,
        thrust::make_transform_iterator(first, is_odd()),
        thrust::make_transform_iterator(last, is_odd()),
        true);
    int bbb; std::cin >> bbb;  // wait for input so the console window stays open
    return 0;
}
Answer 0 (score: 3)
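Where a Thrust algorithm executes can sometimes be ambiguous, as in your counting_iterator example, because the iterator's associated "backend system" is thrust::any_system_tag (a counting_iterator can be dereferenced anywhere because it is not backed by data). In that situation Thrust uses the device backend, which is CUDA by default. You can, however, control where execution happens explicitly in a couple of ways.
You can specify the system explicitly through the iterator's template parameter, as in ngimel's answer, or you can pass the thrust::device execution policy as the first argument to thrust::find in your example: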
#include <thrust/execution_policy.h>
...
thrust::counting_iterator<uint64_t> first(i);
thrust::counting_iterator<uint64_t> last = first + step_size;
auto iter = thrust::find(thrust::device,
    thrust::make_transform_iterator(first, functor),
    thrust::make_transform_iterator(last, functor),
    true);
This technique requires Thrust 1.7 or later.
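To make the fallback rule and the two override routes concrete, here is a small model in plain C++ of how a tag can decide where an algorithm runs. It is only a sketch under stated assumptions: `host_tag`, `device_tag`, `any_tag`, `toy_counting_iterator`, `resolve`, `backend_name`, and `where_would_find_run` are hypothetical names invented for illustration. They are not Thrust's actual types, and Thrust's real dispatch machinery is more involved.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for system tags; NOT Thrust's real types.
struct host_tag {};
struct device_tag {};
struct any_tag {};  // "dereferenceable anywhere", like a counting iterator

// Resolve the tag an algorithm would run under: an unconstrained tag
// falls back to the device backend, every other tag is kept as it is.
inline host_tag   resolve(host_tag)   { return {}; }
inline device_tag resolve(device_tag) { return {}; }
inline device_tag resolve(any_tag)    { return {}; }

inline std::string backend_name(host_tag)   { return "host"; }
inline std::string backend_name(device_tag) { return "device"; }

// Toy iterator that carries its system as a template parameter
// (assumed default: any_tag, i.e. no preference).
template <typename T, typename System = any_tag>
struct toy_counting_iterator {
    using system = System;
    T value;
};

// No policy given: the backend is deduced from the iterator's tag.
template <typename Iterator>
std::string where_would_find_run(Iterator, Iterator) {
    return backend_name(resolve(typename Iterator::system{}));
}

// Explicit policy as the first argument: it takes precedence over the iterator's tag.
template <typename Policy, typename Iterator>
std::string where_would_find_run(Policy policy, Iterator, Iterator) {
    return backend_name(resolve(policy));
}
```

In this model, calling `where_would_find_run(first, last)` with untagged toy iterators reports "device", passing `host_tag{}` as the first argument reports "host", and iterators declared as `toy_counting_iterator<std::uint64_t, host_tag>` report "host" without any policy argument.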
Answer 1 (score: 2)
You must specify the System template parameter when instantiating the counting_iterator:
typedef thrust::device_system_tag System;
thrust::counting_iterator<uint64_t, System> first(i);
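Answer 2 (score: 1)
If you are using a current version of Thrust, do it the way Jared Hoberock describes. If you might be on an older version (the system you work on may have an older CUDA release), the example below may help.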
#include <thrust/version.h>
#if THRUST_MINOR_VERSION > 6
#include <thrust/execution_policy.h>
#elif THRUST_MINOR_VERSION == 6
#include <thrust/iterator/retag.h>
#else
#endif
...
#if THRUST_MINOR_VERSION > 6
    // Thrust 1.7+: pass an execution policy as the first argument.
    total =
        thrust::transform_reduce(
            thrust::host
          , thrust::counting_iterator<unsigned int>(0)
          , thrust::counting_iterator<unsigned int>(N)
          , AFunctor(), 0, thrust::plus<unsigned int>());
#elif THRUST_MINOR_VERSION == 6
    // Thrust 1.6: retag the iterators with the host system.
    total =
        thrust::transform_reduce(
            thrust::retag<thrust::host_system_tag>(thrust::counting_iterator<unsigned int>(0))
          , thrust::retag<thrust::host_system_tag>(thrust::counting_iterator<unsigned int>(N))
          , AFunctor(), 0, thrust::plus<unsigned int>());
#else
    // Older Thrust: select the host through the iterator's space tag parameter.
    total =
        thrust::transform_reduce(
            thrust::counting_iterator<unsigned int, thrust::host_space_tag>(0)
          , thrust::counting_iterator<unsigned int, thrust::host_space_tag>(N)
          , AFunctor(), 0, thrust::plus<unsigned int>());
#endif
@see Thrust: How to directly control where an algorithm invocation executes?
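The preprocessor checks above look only at the minor version, so a release with a higher major version and a small minor number would fall into the wrong branch. A minimal sketch of the same three-way decision that also considers the major version follows. It assumes the integer layout documented for THRUST_VERSION (major * 100000 + minor * 100 + subminor); `version_major`, `version_minor`, and `selection_technique` are hypothetical helper names, not part of Thrust.

```cpp
#include <string>

// Decode an integer laid out like THRUST_VERSION:
// major * 100000 + minor * 100 + subminor.
constexpr int version_major(int v) { return v / 100000; }
constexpr int version_minor(int v) { return v / 100 % 1000; }

// Hypothetical helper: name the backend-selection technique that the
// answer's three branches would use for a given version number.
inline std::string selection_technique(int v) {
    const int major = version_major(v);
    const int minor = version_minor(v);
    if (major > 1 || minor > 6) return "execution policy";  // thrust::host / thrust::device
    if (minor == 6)             return "retag";             // thrust::retag<...>
    return "space tag";                                     // iterator template parameter
}
```

The equivalent preprocessor condition for the first branch would be `THRUST_MAJOR_VERSION > 1 || THRUST_MINOR_VERSION > 6`.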