Question

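I want to copy as little as possible. At the moment I am using num_t* array = new num_t[..] and then copying each value of the multidimensional vector into array in a for loop.

I would like to find a better way to do this.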

Answer 1

For arithmetic types you can use the function std::memcpy to copy each inner vector in one call. For example:

#include <iostream>
#include <vector>
#include <cstring>

int main()
{
    std::vector<std::vector<int>> v =
    {
        { 1 },
        { 1, 2 },
        { 1, 2, 3 },
        { 1, 2, 3, 4 }
    };

    for ( const auto &row : v )
    {
        for ( int x : row ) std::cout << x << ' ';
        std::cout << std::endl;
    }
    std::cout << std::endl;

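    // Count the total number of elements across all rows.
    std::size_t n = 0;
    for ( const auto &row : v ) n += row.size();

    // Allocate one contiguous block and copy each row into it in turn.
    int *a = new int[n];
    int *p = a;

    for ( const auto &row : v )
    {
        std::memcpy( p, row.data(), row.size() * sizeof( int ) );
        p += row.size();
    }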

    for ( p = a; p != a + n; ++p ) std::cout << *p << ' ';
    std::cout << std::endl;

    delete []a;
}

The program output is

1 
1 2 
1 2 3 
1 2 3 4 

1 1 2 1 2 3 1 2 3 4
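
As a variation on the program above, the same flattening can be written without a raw new[]/delete[] pair by collecting the elements into a single std::vector. This is only a sketch: the helper name flatten is hypothetical and not part of the original answer. It still copies every element once, but it reserves the full size up front so only one allocation takes place.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer): flattens a possibly
// jagged vector of vectors into one contiguous std::vector.
template <typename T>
std::vector<T> flatten(const std::vector<std::vector<T>>& rows)
{
    // First pass: total number of elements, so we allocate only once.
    std::size_t total = 0;
    for (const auto& row : rows) total += row.size();

    std::vector<T> flat;
    flat.reserve(total);

    // Second pass: append each row to the end of the flat buffer.
    for (const auto& row : rows)
        flat.insert(flat.end(), row.begin(), row.end());

    return flat;
}
```

Calling flatten(v) on the vector from the program above should give the same ten values in the same order, and flat.data() then yields a pointer to contiguous storage that can be passed wherever a plain array is expected.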

Answer 2

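As you stated in the comments, the inner vectors of your vector<vector<T>> structure all have the same size. So what you are actually trying to do is store an m x n matrix.

Such matrices are usually not stored in multidimensional structures but in linear memory. The position of a given element (row, column) is then derived from an indexing scheme, the most commonly used being row-major and column-major order.

Since you have already stated that you want to copy this data to a GPU, the copy then amounts to copying the whole linear vector. You would use the same indexing scheme on the GPU and on the host.

If you are using CUDA, have a look at Thrust. It provides thrust::host_vector<T> and thrust::device_vector<T> and simplifies the copy even further: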
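
To make the linear storage idea concrete, here is a minimal sketch of a matrix kept in one contiguous buffer with row-major indexing. The class name Matrix and its members are assumptions made for illustration; they do not come from the original answer or from any particular library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type: rows x cols elements stored in a single
// contiguous buffer in row-major order.
template <typename T>
class Matrix
{
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Row-major indexing: element (r, c) lives at offset r * cols + c.
    T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

    // Pointer to the whole buffer, suitable for a single bulk copy.
    T* data() { return data_.data(); }
    const T* data() const { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

With this layout, Matrix<int> m(10, 10) holds 100 contiguous ints, m(r, c) reads or writes one element, and m.data() together with m.size() describes the entire block, so no per-row copying is needed. For column-major order the offset would be c * rows + r instead.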

thrust::host_vector<int> hostVec(100); // 10 x 10 matrix
thrust::device_vector<int> deviceVec = hostVec; // copies hostVec to GPU

Converting a multidimensional std::vector to a single array
