Question

I am trying to implement the following C++ function with CUDA Thrust:

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    Fragment *frag;

    int n = vStruct.size();

    for( int i = 0 ; i < n-2 ; i++ ){
        frag = new Fragment();
        frag->index[0] = i;
        frag->index[1] = i+1;
        frag->index[2] = i+2;

        vFragment.push_back( frag );
    }
}

To do this, I created a functor that sets the indices of each Fragment vector, but the transformation I apply with it produces an error.

I am new to CUDA. I would appreciate it if someone could help me implement this C++ function in CUDA.

Answer 1

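To be blunt, the code you have written has several glaring problems and could never work the way you imagine. Beyond that, I am guessing that the reason for wanting to run a function like this on a GPU in the first place is that profiling shows it is very slow. It is slow because it is poorly designed: for a decent-sized input array it may call new and push_back millions of times. There is no way to accelerate those calls on a GPU. They are slower there, not faster. And the idea of using the GPU to build an array of structures of this type, only to copy it back to the host, is as illogical as trying to use Thrust to accelerate file I/O. There is no hardware or problem size for which doing this would be faster than running the original host code. The latency of the GPU and the bandwidth of the interconnect between the GPU and the host guarantee it.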
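The cost described above comes from one heap allocation per element plus repeated growth of the pointer vector. A minimal host-only sketch of one way to avoid both, assuming Fragment can be stored by value and that only the number of atoms matters for generating the indices; `makeFragments` is a hypothetical name, not part of the original code:

```cpp
#include <cstddef>
#include <vector>

// Same layout as the Fragment used in the Thrust example.
struct Fragment {
    int index[3];
};

// Hypothetical helper (an assumption, not from the original post): builds the
// same (i, i+1, i+2) index triples as setFragment, but stores each Fragment
// by value instead of allocating it with new.
std::vector<Fragment> makeFragments(std::size_t nAtoms) {
    std::vector<Fragment> fragments;
    if (nAtoms < 3) {
        return fragments;  // fewer than three atoms: no complete triple
    }
    const std::size_t count = nAtoms - 2;
    fragments.reserve(count);  // a single allocation up front
    for (std::size_t i = 0; i < count; ++i) {
        const int base = static_cast<int>(i);
        fragments.push_back(Fragment{{base, base + 1, base + 2}});
    }
    return fragments;
}
```

With this layout a caller would write something like `auto vFragment = makeFragments(vStruct.size());`. Whether it is fast enough for a given input is something only profiling can show.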

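Initializing the elements of an array of structures in GPU memory with Thrust is trivial. The tabulate transformation can be used with a functor like this: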

#include <thrust/device_vector.h>
#include <thrust/tabulate.h>
#include <iostream>

struct Fragment
{
   int index[3];
   Fragment() = default;
};

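// Functor for thrust::tabulate: maps element position i to a Fragment
// holding the consecutive indices (i, i+1, i+2).
struct functor
{
    __device__ __host__
    Fragment operator() (const int &i) const {
        Fragment f;
        f.index[0] = i; f.index[1] = i+1; f.index[2] = i+2;
        return f;
    }
};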


int main()
{
    const int N = 10;
    thrust::device_vector<Fragment> dvFragment(N);
    thrust::tabulate(dvFragment.begin(), dvFragment.end(), functor());

    // Converting each device reference to a Fragment copies that element to the host.
    for(auto p : dvFragment) {
        Fragment f = p;
        std::cout << f.index[0] << " " << f.index[1] << " " << f.index[2] << std::endl;
    }

    return 0;
}

Compiling and running it gives:

$ nvcc -arch=sm_52 -std=c++14 -ccbin=g++-7 -o mobasher Mobasher.cu 
$ cuda-memcheck ./mobasher 
========= CUDA-MEMCHECK
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
9 10 11
========= ERROR SUMMARY: 0 errors

But this is not a direct translation of the original host code in your question.

Setting the data elements of each host vector of type int array on a device vector
