Question

I am trying to take an Rcpp::CharacterMatrix and convert each row into its own element of an Rcpp::List.

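However, the function I wrote behaves strangely: every entry of the list corresponds to the last row of the matrix. Why does this happen? Is it related to pointers in some way? Please explain.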

Function:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List char_expand_list(CharacterMatrix A) {
  CharacterVector B(A.ncol());

  List output;

  for(int i=0;i<A.nrow();i++) {
    for(int j=0;j<A.ncol();j++) {
      B[j] = A(i,j);
    }

    output.push_back(B);
  }

  return output;
}

Test matrix:

This is the matrix A passed to the function above.

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
mat
#     [,1] [,2] [,3]
# [1,] "a"  "a"  "a" 
# [2,] "b"  "b"  "b" 
# [3,] "c"  "c"  "c"

Output:

The function above should take this matrix as input and return a list of the matrix rows, like this:

char_expand_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"

But I get something different:

char_expand_list(mat)
# [[1]]
# [1] "c" "c" "c"
#
# [[2]]
# [1] "c" "c" "c"
#
# [[3]]
# [1] "c" "c" "c"

As you can see, the last row of the matrix, "c", is repeated in the first and second list elements as well. Why does this happen?

Answer 1

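What is happening here is largely a consequence of how Rcpp objects work. In particular, a CharacterVector acts as a pointer to a memory location. Because that memory location is defined outside of the for loop, the result is a "global" pointer. That is, when B is updated inside the loop, the update is visible through every entry of the Rcpp::List, because each entry refers to that same B rather than to a copy of it. Hence the repeated row of "c" throughout the list: after the last iteration, B holds the last row.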

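That said, using .push_back() on any Rcpp data type is a very, very bad idea, because you end up copying an ever-expanding object back and forth. The copying happens because an Rcpp data type wraps the underlying SEXP that controls the R object, and that object cannot grow in place, so it must be recreated at the new length on every push. Instead, you should try one of the following approaches: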

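1. Move the creation of the Rcpp::CharacterVector inside the first for loop and preallocate the Rcpp::List.
2. Switch to using only C++ standard library objects and convert them to the appropriate type at the end:
   - use a std::list of std::vector<T> with T = std::string;
   - either return the correct object with Rcpp::wrap(x), or change the function's return type from Rcpp::List to std::list<std::vector<T> >.
3. Preallocate the Rcpp::List and use std::vector<T> elements with T = std::string.
4. Preallocate the Rcpp::List and clone() the Rcpp object before storing it in the list.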

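Option 1

Here we rearrange the function by moving the declaration of B inside the first loop, preallocate space for the list, and assign to the output list by index as usual.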

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
  Rcpp::List output(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {
    Rcpp::CharacterVector B(A.ncol());

    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    output[i] = B;
  }

  return output;
}

Option 2

Here we drop Rcpp::CharacterVector in favor of std::vector<std::string>, and replace Rcpp::List with std::list<std::vector<std::string> >. At the end, we convert the standard library object to an Rcpp::List with Rcpp::wrap().

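#include <Rcpp.h>
#include <list>
#include <string>
#include <vector>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
  std::vector<std::string> B(A.ncol());

  std::list<std::vector<std::string> > o;

  for(int i = 0; i < A.nrow(); i++) {
    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    // std::list::push_back() copies B's current contents, so later rows do not overwrite them
    o.push_back(B);
  }

  // Convert the standard library container into an R list of character vectors
  return Rcpp::wrap(o);
}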

which gives:

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
char_expand_std_to_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"

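Option 3

Alternatively, you can keep the Rcpp::List but declare its expected size up front, and still use std::vector<T> elements. Assigning the std::vector<std::string> into a list slot converts it into a new R character vector, so each slot gets its own copy.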

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
  std::vector<std::string> B(A.ncol());

  Rcpp::List o(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {
    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    o[i] = B;
  }

  return o;
}

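Option 4

Finally, keep the list preallocated and explicitly clone the data on each iteration, so that each list slot receives its own copy of B rather than another reference to it.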

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
  Rcpp::CharacterVector B(A.ncol());
  Rcpp::List output(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {

    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    output[i] = clone(B);
  }

  return output;
}

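Benchmark

The benchmark results show that Option 1, with the rearranged declaration and preallocated space, performs best. A close second is Option 4, which clones each vector before saving it into the Rcpp::List.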

library("microbenchmark")
library("ggplot2")

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))

micro_mat_to_list = 
  microbenchmark(char_expand_list_rearrange(mat),
                 char_expand_std_to_list(mat),
                 char_expand_list_vec(mat),
                 char_expand_list_clone(mat))
micro_mat_to_list
# Unit: microseconds
#                             expr   min     lq    mean median     uq    max neval
#  char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445  6.797   100
#     char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627   100
#        char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814   100
#      char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965   100

Assigning an Rcpp object into an Rcpp List yields duplicates of the last element
