Question

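I want to permute the rows of a matrix stored as a strided array (i.e. backed by a vector in row-major, C-style layout) and apply the same permutation to the elements of a corresponding vector.

Assume the matrix has dimensions RxC and the corresponding vector has R elements.

My current idea is to generate a permutation of the R indices and then use thrust::stable_sort_by_key to permute the vector, as shown here.

I could then create a second permutation vector that repeats each element of the one I created earlier C times.

So if R = 4, C = 3 and the original permutation index vector is [4,2,3,1], the permutation vector for the matrix would be [4,4,4,2,2,2,3,3,3,1,1,1]. Because the sort is stable, the elements within a row of the matrix should not be permuted.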
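The key-expansion idea above can be sketched on the host in plain C++. This is only an illustration, not Thrust code: `expand_keys` and `stable_sort_by_key_host` are hypothetical helper names, and `std::stable_sort` stands in for thrust::stable_sort_by_key. One thing the sketch makes visible is that sorting by key sends row i to position key[i], which is the inverse of reading the key vector as "output row i comes from input row key[i]". For a random shuffle either reading is fine, but the two only coincide when the permutation is its own inverse, as [4,2,3,1] happens to be.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: repeat each 1-based row key `cols` times,
// e.g. {4,2,3,1} with cols = 3 -> {4,4,4,2,2,2,3,3,3,1,1,1}.
std::vector<unsigned> expand_keys(const std::vector<unsigned>& row_keys,
                                  std::size_t cols) {
    std::vector<unsigned> keys;
    keys.reserve(row_keys.size() * cols);
    for (unsigned k : row_keys)
        keys.insert(keys.end(), cols, k);
    return keys;
}

// Hypothetical helper: host-side stand-in for a stable sort-by-key.
// Returns `values` reordered by ascending key; elements with equal keys
// keep their original relative order, so columns inside a row stay put.
template <typename T>
std::vector<T> stable_sort_by_key_host(const std::vector<unsigned>& keys,
                                       const std::vector<T>& values) {
    assert(keys.size() == values.size());
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<T> out;
    out.reserve(values.size());
    for (std::size_t i : order)
        out.push_back(values[i]);
    return out;
}
```

Calling `stable_sort_by_key_host(expand_keys(row_keys, C), matrix)` reorders whole rows, at the cost of a sort over all R*C elements plus the storage for R*C keys.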

My question is whether there is a better / more efficient way to do this, using Thrust or plain CUDA.

Example:

Original matrix:

[ 1 1 1 1 ]
[ 2 2 2 2 ]
[ 3 3 3 3 ]
[ 4 4 4 4 ]
[ 5 5 5 5 ]

Original vector:

[1 2 3 4 5]

Permutation order:

[5 3 1 2 4]

Permuted matrix:

[ 5 5 5 5 ]
[ 3 3 3 3 ]
[ 1 1 1 1 ]
[ 2 2 2 2 ]
[ 4 4 4 4 ]

Permuted vector:

[5 3 1 2 4]

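My use case is that I have a feature matrix and a vector holding the corresponding label for each example. I want to permute the matrix and apply the same permutation to the vector as a shuffling step before an SGD iteration. The reason I want contiguous rows that I can iterate over is that I plan to use cuBLAS gemv for the matrix-vector operations, which assumes the matrix is laid out in memory in a similar way (albeit in column-major format, meaning I would need to call it like this).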
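The row-major versus column-major remark can be illustrated with a small host-side sketch. A row-major RxC buffer, read as column-major, is the CxR transpose with leading dimension C, so the row-major product A*x equals a transposed column-major gemv on the same buffer. `gemv_colmajor` below is a hypothetical reference routine written for this illustration (alpha = 1, beta = 0, unit strides); it is not the cuBLAS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference gemv following the BLAS column-major convention.
// `a` is an m x n column-major matrix with leading dimension lda.
//   trans == false: returns y = A * x    (x has n entries, y has m)
//   trans == true : returns y = A^T * x  (x has m entries, y has n)
std::vector<float> gemv_colmajor(bool trans, std::size_t m, std::size_t n,
                                 const std::vector<float>& a, std::size_t lda,
                                 const std::vector<float>& x) {
    assert(lda >= m && a.size() >= lda * n);
    assert(x.size() == (trans ? m : n));
    std::vector<float> y(trans ? n : m, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float aij = a[i + j * lda];  // column-major element (i, j)
            if (trans)
                y[j] += aij * x[i];
            else
                y[i] += aij * x[j];
        }
    }
    return y;
}
```

For a row-major RxC matrix stored in `a`, `gemv_colmajor(true, C, R, a, C, x)` yields the row-major A*x. With cuBLAS the analogous call would presumably pass CUBLAS_OP_T with m = C, n = R and lda = C to cublasSgemv; treat that mapping as an assumption to check against the cuBLAS documentation.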

Answer 1

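My question is whether there is a better / more efficient way to do this, using Thrust

I believe there is. The permutation vector gives you all the information you need to copy the contents of the input matrix directly into the permuted matrix, without any sorting.

The useful Thrust feature here is permutation_iterator. A permutation iterator lets us reorder the input elements of our choosing on the fly, for use in any operation. If we provide a suitable index-computation functor, we can pass a linear index (via counting_iterator) to that functor to produce (via transform_iterator) the appropriate permuted input index for the copy operation.

Here is a worked example: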

$ cat t1061.cu
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>
#include <assert.h>

typedef int mytype;

struct copy_idx_func : public thrust::unary_function<unsigned, unsigned>
{
  size_t c;
  unsigned *p;
  copy_idx_func(const size_t _c, unsigned *_p) : c(_c),p(_p) {};
  __host__ __device__
  unsigned operator()(unsigned idx){
    unsigned myrow = idx/c;
    unsigned newrow = p[myrow]-1;
    unsigned mycol = idx%c;
    return newrow*c+mycol;
  }
};


int main(){

  const mytype mat[]   = {1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5};
  const mytype vec[]   = {1,2,3,4,5};
  const unsigned per[] = {5,3,1,2,4};

  const size_t msize = sizeof(mat)/sizeof(mytype);
  const size_t vsize = sizeof(vec)/sizeof(mytype);
  const size_t psize = sizeof(per)/sizeof(unsigned);
  const size_t cols  = msize/vsize;
  // const size_t rows  = vsize;
  assert(msize%vsize == 0);
  assert(vsize == psize);

  thrust::device_vector<mytype>   d_m(mat, mat+msize);
  thrust::device_vector<mytype>   d_v(vec, vec+vsize);
  thrust::device_vector<unsigned> d_p(per, per+psize);
  thrust::device_vector<mytype>   d_rm(msize);
  thrust::device_vector<mytype>   d_rv(vsize);
  std::cout << "Initial Matrix:" << std::endl;
  thrust::copy_n(d_m.begin(), msize, std::ostream_iterator<mytype>(std::cout, ","));

  // permute the matrix
  thrust::copy_n(thrust::make_permutation_iterator(d_m.begin(), thrust::make_transform_iterator(thrust::counting_iterator<unsigned>(0), copy_idx_func(cols,thrust::raw_pointer_cast(d_p.data())))), msize, d_rm.begin());

  std::cout << std::endl << "Permuted Matrix:" << std::endl;
  thrust::copy_n(d_rm.begin(), msize, std::ostream_iterator<mytype>(std::cout, ","));
  std::cout << std::endl << "Initial Vector:" << std::endl;
  thrust::copy_n(d_v.begin(), vsize, std::ostream_iterator<mytype>(std::cout, ","));

  // permute the vector
  thrust::copy_n(thrust::make_permutation_iterator(d_v.begin(), thrust::make_transform_iterator(thrust::counting_iterator<unsigned>(0),  copy_idx_func(1,thrust::raw_pointer_cast(d_p.data())))), vsize, d_rv.begin());

  std::cout << std::endl << "Permuted Vector:" << std::endl;
  thrust::copy_n(d_rv.begin(), vsize, std::ostream_iterator<mytype>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -o t1061 t1061.cu
$ ./t1061
Initial Matrix:
1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,
Permuted Matrix:
5,5,5,5,5,3,3,3,3,3,1,1,1,1,1,2,2,2,2,2,4,4,4,4,4,
Initial Vector:
1,2,3,4,5,
Permuted Vector:
5,3,1,2,4,
$
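The index arithmetic in copy_idx_func can also be checked on the host without Thrust. The sketch below mirrors that functor in plain C++; `source_index` and `permute_rows` are hypothetical names introduced here, and the permutation is assumed to hold 1-based source rows as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same arithmetic as copy_idx_func: map an output linear index to the
// input linear index it should read from. `perm` holds 1-based source rows.
std::size_t source_index(std::size_t idx, std::size_t cols,
                         const std::vector<unsigned>& perm) {
    std::size_t out_row = idx / cols;         // row being written
    std::size_t src_row = perm[out_row] - 1;  // row being read (0-based)
    std::size_t col = idx % cols;             // column is unchanged
    return src_row * cols + col;
}

// Hypothetical host-side gather: out[i] = in[source_index(i)].
// With cols == 1 this permutes a plain vector.
template <typename T>
std::vector<T> permute_rows(const std::vector<T>& in, std::size_t cols,
                            const std::vector<unsigned>& perm) {
    assert(cols > 0 && in.size() == perm.size() * cols);
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[source_index(i, cols, perm)];
    return out;
}
```

Each output element is read exactly once and written exactly once, so the work is linear in R*C and no key array or sort is needed.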

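Notes:

Operationally, permuting the vector is identical to permuting the matrix. We simply treat the vector as a matrix with one column.
As discussed in the comments, if the use case stays entirely within Thrust, it may not be necessary to copy the elements at all. permutation_iterator lets us select elements from the original matrix in any permuted order, and we can simply pass that construct to any Thrust operation that needs the original matrix in permuted order.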
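The "no copy" idea can be sketched on the host as reading through the permutation instead of materializing a permuted matrix. The function below is a plain C++ analogue, not Thrust code: `permuted_matvec` is a hypothetical name, and the index indirection plays the role that permutation_iterator would play inside a Thrust algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes y[i] = dot(row (perm[i] - 1) of A, x)
// for a row-major matrix `a` with `cols` columns, reading the rows in
// permuted order without building a permuted copy of `a`.
std::vector<double> permuted_matvec(const std::vector<double>& a,
                                    std::size_t cols,
                                    const std::vector<unsigned>& perm,
                                    const std::vector<double>& x) {
    assert(cols > 0 && x.size() == cols);
    assert(a.size() == perm.size() * cols);
    std::vector<double> y(perm.size(), 0.0);
    for (std::size_t i = 0; i < perm.size(); ++i) {
        std::size_t src_row = perm[i] - 1;  // 1-based permutation entry
        for (std::size_t j = 0; j < cols; ++j)
            y[i] += a[src_row * cols + j] * x[j];
    }
    return y;
}
```

The trade-off is that the original matrix is never rearranged in memory, so a library routine that expects contiguous rows in shuffled order, such as cuBLAS gemv, would still need the explicit copy shown in the example.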
