Subsetting an atomic vector in place

Time: 2019-08-05 13:05:26

Tags: r subset rcpp

Following on from "Subsetting a large vector uses unnecessarily large amounts of memory":

For example, given the atomic vector

x <- rep_len(1:10, 1e7)

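How can I use Rcpp to modify x in place so that elements are dropped by numeric index? In R this can be done, but not in place (that is, not without copying x):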

idrops <- c(5, 4, 9)
x <- x[-idrops]

A reasonably efficient way to do this is as follows:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector dropElements(IntegerVector x, IntegerVector inds) {
  R_xlen_t n = x.length();
  R_xlen_t ndrops = inds.length();
  // Assumes inds holds distinct, in-range, 1-based positions (as in R).
  IntegerVector out = no_init(n - ndrops);
  R_xlen_t k = 0; // index of out
  for (R_xlen_t i = 0; i < n; ++i) {
    bool drop = false;
    for (R_xlen_t j = 0; j < ndrops; ++j) {
      if (i + 1 == inds[j]) { // i is 0-based, inds is 1-based
        drop = true;
        break;
      }
    }
    if (drop) {
      continue;
    }
    out[k] = x[i];
    ++k;
  }
  return out;
}

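However, this is hardly in place (it is also not very safe, but that is beside the point). I am aware of the STL's .erase(), though it appears that Rcpp, by design, copies the data when converting to an STL container.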
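For reference, here is roughly what in-place removal looks like on a plain STL container, which is the behaviour the .erase() remark is reaching for. This is only a sketch on std::vector<int>, not on an R vector: the helper name drop_in_place and its 1-based position convention are assumptions made for illustration, and it says nothing about whether Rcpp can hand over an R vector without copying it first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: removes the elements of `x` at the given 1-based
// positions, compacting `x` in place instead of building a second vector.
// `drops` is taken by value so the caller's copy is left untouched.
void drop_in_place(std::vector<int>& x, std::vector<std::size_t> drops) {
  // Sort and de-duplicate so the drop positions can be walked in one pass.
  std::sort(drops.begin(), drops.end());
  drops.erase(std::unique(drops.begin(), drops.end()), drops.end());
  // Assumption: every position lies in 1..x.size().
  assert(drops.empty() || (drops.front() >= 1 && drops.back() <= x.size()));

  std::size_t write = 0;  // next slot to fill with a kept element
  std::size_t d = 0;      // next drop position to match
  for (std::size_t read = 0; read < x.size(); ++read) {
    if (d < drops.size() && drops[d] == read + 1) {
      ++d;  // this element is dropped: skip it
      continue;
    }
    x[write++] = x[read];  // shift the kept element left
  }
  x.resize(write);  // shrink the length; the existing buffer is kept
}
```

Calling drop_in_place(v, {5, 4, 9}) on a vector holding 1 to 10 leaves 1, 2, 3, 6, 7, 8, 10 in v. A single left-to-right pass like this avoids the repeated shifting that calling .erase() once per dropped element would cause.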

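1 answer:

Answer 0: (score: 0)

The question you linked to is a bit simpler and is a one-liner in Rcpp, but you can implement efficient negative indexing by iterating over the vector of negative indices and copying the ranges of the data that lie between them. For example: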

#include <Rcpp.h>
using namespace Rcpp;

// solution for the original question
// [[Rcpp::export]]
IntegerVector popBeginningOfVector(IntegerVector x, int npop) {
  return IntegerVector(x.begin() + npop, x.end());
}

// [[Rcpp::export]]
IntegerVector efficientNegativeIndexing(IntegerVector x, IntegerVector neg_idx) {
  // Assumes neg_idx holds distinct, in-range, 1-based positions (as in R).
  neg_idx = clone(neg_idx); // sort a copy so the caller's vector is not reordered
  std::sort(neg_idx.begin(), neg_idx.end());
  size_t ni_size = neg_idx.size();
  size_t xsize = x.size();
  int * xptr = INTEGER(x);
  size_t xtposition = 0;
  IntegerVector xt(xsize - ni_size); // allocate new vector of the correct size
  int * xtptr = INTEGER(xt);
  // range_end starts at -1 so the final copy is still valid when neg_idx is empty
  int range_begin = 0, range_end = -1;
  for(size_t i=0; i < ni_size; ++i) {
    // kept elements lie between the previous dropped position and the current one
    range_begin = (i == 0) ? 0 : neg_idx[i-1];
    range_end = neg_idx[i] - 1;
    std::copy(xptr+range_begin, xptr+range_end, xtptr+xtposition);
    xtposition += range_end - range_begin;
  }
  // copy whatever follows the last dropped position
  std::copy(xptr + (range_end + 1), xptr + xsize, xtptr+xtposition);
  return xt;
}

Usage:

library(Rcpp)
sourceCpp("~/Desktop/temp.cpp")

x <- rep_len(1:10, 1e7)
idrops <- c(5, 4, 9)
outputR <- x[-idrops]
outputRcpp <- efficientNegativeIndexing(x, idrops)
identical(outputRcpp, outputR)

library(microbenchmark)
microbenchmark(efficientNegativeIndexing(x, idrops), x[-idrops], times=10)
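
The range-copy idea in the answer relies on the drop positions being distinct and in range. As a minimal sketch of how those assumptions could be checked, here is the same idea on a plain std::vector<int>, outside Rcpp. The helper name drop_by_ranges is hypothetical, and the handling of duplicates (silently merged) and out-of-range positions (an exception) is an illustrative choice, not something taken from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper mirroring the range-copy idea on a plain std::vector.
// `drops` holds 1-based positions, as in R; it is taken by value so that
// sorting does not modify the caller's data.
std::vector<int> drop_by_ranges(const std::vector<int>& x,
                                std::vector<std::size_t> drops) {
  std::sort(drops.begin(), drops.end());
  drops.erase(std::unique(drops.begin(), drops.end()), drops.end());
  if (!drops.empty() && (drops.front() < 1 || drops.back() > x.size())) {
    throw std::out_of_range("drop position outside 1..x.size()");
  }

  std::vector<int> out;
  out.reserve(x.size() - drops.size());
  std::size_t begin = 0;  // 0-based start of the next kept range
  for (std::size_t pos : drops) {
    // Copy the kept elements that precede the dropped one (0-based pos - 1).
    out.insert(out.end(), x.begin() + begin, x.begin() + (pos - 1));
    begin = pos;  // resume just after the dropped element
  }
  out.insert(out.end(), x.begin() + begin, x.end());  // tail after last drop
  assert(out.size() == x.size() - drops.size());
  return out;
}
```

With x holding 1 to 10, drop_by_ranges(x, {5, 4, 9}) returns 1, 2, 3, 6, 7, 8, 10, which matches x[-idrops] in the R usage above. Like the Rcpp version, this allocates a new vector, so it is a faster copy rather than a true in-place operation.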