Question

I am trying to transfer some data stored as a linked list to my GPGPU. Do I need to do as many transfers as there are nodes, or is there a better, faster way?

Answer 1

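With the Thrust library, you can construct a device_vector directly from an iterator range. The Thrust Quick Start Guide (linked below) has an example of exactly this case: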

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <list>
#include <vector>

int main(void)
{
    // create an STL list with 4 values
    std::list<int> stl_list;

    stl_list.push_back(10);
    stl_list.push_back(20);
    stl_list.push_back(30);
    stl_list.push_back(40);

    // initialize a device_vector with the list
    thrust::device_vector<int> D(stl_list.begin(), stl_list.end());

    // copy a device_vector into an STL vector
    std::vector<int> stl_vector(D.size());
    thrust::copy(D.begin(), D.end(), stl_vector.begin());

    return 0;
}

https://github.com/thrust/thrust/wiki/Quick-Start-Guide

See the section titled "Iterators and Static Dispatching".

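You can do something similar with the STL algorithms library: std::copy the list into a contiguous host array, then transfer that array with a single cudaMemcpy.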

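#include <algorithm>
#include <list>
#include <cuda_runtime.h>

std::list<int> stl_list;
stl_list.push_back(10);
...
// stage the values in one contiguous host buffer of the same type as the list
int *myarray = new int[stl_list.size()];
int *mydevicearray;
// CUDA_SAFE_CALL: error-checking macro from the old CUDA SDK (cutil)
CUDA_SAFE_CALL(cudaMalloc(&mydevicearray, sizeof(int)*stl_list.size()));
std::copy(stl_list.begin(), stl_list.end(), myarray);
// cudaMemcpy(dst, src, count, kind): the device buffer is the destination
CUDA_SAFE_CALL(cudaMemcpy(mydevicearray, myarray, sizeof(int)*stl_list.size(), cudaMemcpyHostToDevice));
delete[] myarray;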

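Both examples should perform only a single memory copy to the device. Every host-to-device transfer carries a fixed per-call overhead, so copying each list element separately would be far slower than copying one contiguous buffer.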

Answer 2

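If you want to transfer the data from a linked list into an array (on the GPU), just copy the values stored in the nodes into an array and send that array to the GPU. This is simple: call cudaMalloc() with a size of (number of nodes) × (size of one element), then copy the whole array at once.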
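For a hand-rolled singly linked list (rather than std::list), the same idea is to walk the nodes once and pack their values into a contiguous buffer, which then needs only one cudaMalloc() and one cudaMemcpy(). The following is a minimal host-side sketch, assuming a hypothetical `Node` struct with `value` and `next` fields and a hypothetical helper `flatten`; the CUDA calls are shown only as a comment so the snippet builds with an ordinary C++ compiler.

```cpp
#include <vector>

// Hypothetical node type: one int payload plus a pointer to the next node.
struct Node {
    int value;
    Node *next;
};

// Walk the list once and copy every payload into a contiguous vector.
std::vector<int> flatten(const Node *head) {
    std::vector<int> values;
    for (const Node *n = head; n != nullptr; n = n->next) {
        values.push_back(n->value);
    }
    return values;
}

// With the CUDA runtime, the packed buffer then needs a single transfer:
//   int *d_values;
//   cudaMalloc(&d_values, values.size() * sizeof(int));
//   cudaMemcpy(d_values, values.data(), values.size() * sizeof(int),
//              cudaMemcpyHostToDevice);
```

The list is traversed on the host, where pointer chasing is cheap, so the GPU receives plain array data it can index in parallel.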

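If you are trying to transfer the data from a linked list into a linked list (on the GPU), creating the nodes and transferring the data is a tedious process: you would need separate functions to create the nodes, link them together, and so on. (This is not preferable, since a linked list is inherently serial to traverse.)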
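If the link structure itself has to exist on the GPU, one common alternative to allocating and linking device nodes one by one (a swapped-in technique, not something this answer describes) is to store the nodes in arrays and replace each `next` pointer with an array index, using -1 to mark the end. Host pointers are meaningless on the device, but indices remain valid after copying, so each array still needs only one transfer. This is a host-side sketch; `Node`, `IndexedList`, `to_indexed`, and the -1 sentinel are illustrative assumptions.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical pointer-based node, as it exists on the host.
struct Node {
    int value;
    Node *next;
};

// The same list stored as parallel arrays: next[i] is the slot of the
// following node (or -1 for the last node), so links survive a memcpy.
struct IndexedList {
    std::vector<int> value;
    std::vector<int> next;
    int head = -1;
};

// Convert nodes held in a pool (in any order) into index-linked arrays.
// pool[i] becomes slot i; every node reachable from head must be in pool
// (unordered_map::at throws std::out_of_range otherwise).
IndexedList to_indexed(const std::vector<const Node *> &pool, const Node *head) {
    std::unordered_map<const Node *, int> slot;
    for (int i = 0; i < static_cast<int>(pool.size()); ++i) slot[pool[i]] = i;

    IndexedList out;
    out.head = head ? slot.at(head) : -1;
    for (const Node *n : pool) {
        out.value.push_back(n->value);
        out.next.push_back(n->next ? slot.at(n->next) : -1);
    }
    return out;
}

// value.data() and next.data() can each be sent with one cudaMemcpy; a kernel
// walks the list with: for (int i = head; i != -1; i = next[i]) { ... }
```

Traversal is still serial, which is why copying plain values into an array remains the simpler choice when the links are not needed on the device.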

I recommend the first approach. It is simple, and all you need is to transfer the data.

Also consider using the Thrust library for your data structures.

Passing a linked list to CUDA as an array
