Question

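Thanks for replying to my question, @Eric Shiyin Kang, but the missing `__host__` or `__device__` qualifier was not what caused my problem. After some trial and error, I found that the real issue is that the member data always behaves like a constant. For example: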

struct OP {
    int N;
    __host__ __device__
    OP(const int n): N(n){};

    __host__ __device__
    UI operator()(const UI a) {
        int b = a * N;
        N++;
        return b;
    }
};
thrust::transform(A.begin(), A.end(), B.begin(), OP(2) );

In this case, if A is {0, 1, 2, 3, ...}, then B comes out as {0, 2, 4, 6, 8}, but B should actually be {0, 3 (1 * (2 + 1)), 8 (2 * (3 + 1)), 15 (3 * (4 + 1)), ...}.

I don't know what causes this. Is it because of how Thrust is designed? Can anyone tell me?

Answer 1

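For the updated Q: the host variable N cannot be updated from device code. Thrust takes the functor by value, so `N++` modifies a copy rather than one counter shared by all elements, and the order in which elements are processed is not defined. Updating a shared variable many times in a parallel algorithm is generally unsafe.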
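
One way to get the {0, 3, 8, 15, ...} result without mutating the functor is to derive the multiplier from the element's index. The sketch below is only an illustration under some assumptions: `IndexedOP`, `HOST_DEVICE` and `apply_indexed` are names I made up, `UI` is assumed to be `unsigned int` (the question does not show its definition), and the loop is a host-side stand-in so the functor can be checked without a GPU.

```cpp
#include <cstddef>
#include <vector>

// Expands to the CUDA qualifiers under nvcc and to nothing elsewhere,
// so the functor below also compiles as plain host C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

typedef unsigned int UI;  // assumption: UI is not defined in the question

// Stateless variant of OP: the multiplier is N + index instead of a
// member that is incremented on every call.
struct IndexedOP {
    int N;
    HOST_DEVICE explicit IndexedOP(int n) : N(n) {}

    HOST_DEVICE UI operator()(UI a, UI index) const {
        return a * (static_cast<UI>(N) + index);
    }
};

// Host-side reference loop (hypothetical helper) that applies the functor
// to each element together with its index.
std::vector<UI> apply_indexed(const std::vector<UI>& A, int n) {
    IndexedOP op(n);
    std::vector<UI> B(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        B[i] = op(A[i], static_cast<UI>(i));
    }
    return B;
}

// With Thrust, the same functor would go to the two-input overload:
//   thrust::transform(A.begin(), A.end(),
//                     thrust::counting_iterator<UI>(0),
//                     B.begin(), IndexedOP(2));
```

Because the functor is `const` and reads only its arguments and `N`, it no longer matters how many copies Thrust makes or in what order the elements are visited.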

In fact, the fastest way to initialize a device vector should be to use fancy iterators at construction time, for example,

// v[]={0,2,4,6,8...}
thrust::device_vector<float> v(
        thrust::make_transform_iterator(
                thrust::counting_iterator<float>(0.0),
                _1 * 2.0),
        thrust::make_transform_iterator(
                thrust::counting_iterator<float>(0.0),
                _1 * 2.0) + SIZE);

// u[]={0,3,8,15...}
thrust::device_vector<float> u(
        thrust::make_transform_iterator(
                thrust::counting_iterator<float>(0.0),
                _1 * (_1 + 2.0)),
        thrust::make_transform_iterator(
                thrust::counting_iterator<float>(0.0),
                _1 * (_1 + 2.0)) + SIZE);

It will be several times faster than the define-sequence-and-transform approach, because the latter reads/writes the whole device memory of v more than once.

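Note that the code above only works with Thrust 1.6.0+, because lambda-expression (placeholder) functors are used with fancy iterators. For Thrust 1.5.3 in CUDA 5.0, you should write a functor explicitly.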
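
For the older-Thrust case, an explicit functor equivalent to `_1 * (_1 + 2.0)` might look like the sketch below. The names `TimesSelfPlusTwo` and `HOST_DEVICE` are mine, not from Thrust. The `result_type` typedef is there on the assumption that older Thrust releases need it to deduce the value type of a transform iterator; I have not checked that against 1.5.3 specifically.

```cpp
// Expands to the CUDA qualifiers under nvcc and to nothing elsewhere,
// so the functor can also be checked as plain host C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Explicit replacement for the placeholder expression _1 * (_1 + 2.0).
struct TimesSelfPlusTwo {
    typedef float result_type;  // assumption: helps older Thrust deduce the type

    HOST_DEVICE float operator()(float x) const {
        return x * (x + 2.0f);
    }
};

// Intended Thrust usage (needs nvcc; SIZE as in the answer's code):
//   thrust::device_vector<float> u(
//       thrust::make_transform_iterator(
//           thrust::counting_iterator<float>(0.0f), TimesSelfPlusTwo()),
//       thrust::make_transform_iterator(
//           thrust::counting_iterator<float>(0.0f), TimesSelfPlusTwo()) + SIZE);
```

Applied to 0, 1, 2, 3 the functor gives 0, 3, 8, 15, matching the `u[]` comment in the answer's code.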

A for the original Q, which has since been deleted.

You could add the `__host__` and `__device__` qualifiers before `operator()()`,

struct OP {
    __host__ __device__ void operator()(int &a) {
        a *=2;
    }
};

and

struct OP {
    __host__ __device__ int operator()(int a) {
        return a*2;
    }
};

Otherwise the compiler will not generate proper device code for the GPU.
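
The two functor shapes suit different algorithms: the `void operator()(int&)` form modifies elements in place (a `for_each`-style call), while the form that returns a value writes to an output range (a `transform`-style call). Here is a rough sketch of that difference, using the standard-library algorithms on the host so it can be tried without a GPU. `DoubleInPlace`, `DoubleValue`, `HOST_DEVICE` and the two helper functions are names invented for this example.

```cpp
#include <algorithm>
#include <vector>

// Expands to the CUDA qualifiers under nvcc and to nothing elsewhere.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// In-place form: takes the element by reference and overwrites it.
struct DoubleInPlace {
    HOST_DEVICE void operator()(int& a) const { a *= 2; }
};

// Value-returning form: leaves the input alone and returns the result.
struct DoubleValue {
    HOST_DEVICE int operator()(int a) const { return a * 2; }
};

// Doubles a copy of v in place and returns it.
std::vector<int> double_in_place(std::vector<int> v) {
    std::for_each(v.begin(), v.end(), DoubleInPlace());
    return v;
}

// Writes the doubled values into a separate output vector.
std::vector<int> double_copy(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(), DoubleValue());
    return out;
}

// On a thrust::device_vector<int> d, the corresponding calls would be:
//   thrust::for_each(d.begin(), d.end(), DoubleInPlace());
//   thrust::transform(d.begin(), d.end(), out.begin(), DoubleValue());
```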

Thrust - Initial device_vector
