Question

I am new to C++ and Rcpp. Suppose I have a vector

t1<-c(1,2,NA,NA,3,4,1,NA,5)

I want to get the indices of the elements of t1 that are NA. I can write:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector retIdxNA(NumericVector x) {

    // Step 1: get the positions of NA in the vector
    LogicalVector y=is_na(x);

    // Step 2: count the number of NA
    int Cnt=0;
    for (int i=0;i<x.size();i++) {
       if (y[i]) {
         Cnt++;
       }
    }

    // Step 3: create an output vector whose length equals the number of NAs,
    // fill it with the 1-based positions, and return it
    NumericVector retIdx(Cnt);
    int Cnt1=0;
    for (int i=0;i<x.size();i++) {
       if (y[i]) {
          retIdx[Cnt1]=i+1;
          Cnt1++;
       }
    }
    return retIdx;
}

Then I get

retIdxNA(t1)
[1] 3 4 8

I am wondering:

(i) Is there an equivalent of which in Rcpp?

(ii) Is there a way to make the function above shorter/crisper? In particular, is there a simple way to combine steps 1, 2 and 3 above?
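Both questions come down to collecting, in a single pass, the positions where a condition holds. As a plain standard-C++ illustration (no Rcpp; the name which_mask is hypothetical), an R-style which over a logical mask might look like the sketch below. Because std::vector grows as indices are appended, no separate counting step is needed, which merges steps 2 and 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the 1-based positions of the true entries
// in `mask`, mirroring what R's which() returns for a logical vector.
std::vector<int> which_mask(const std::vector<bool>& mask) {
    std::vector<int> idx;
    idx.reserve(mask.size());  // upper bound, so push_back never reallocates
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            idx.push_back(static_cast<int>(i) + 1);  // R indices start at 1
        }
    }
    return idx;
}
```

For the mask corresponding to is.na(t1), this returns 3, 4, 8.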

Answer 1

Recent versions of RcppArmadillo have functions that identify the indices of finite and non-finite values.

So this code

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::uvec whichNA(arma::vec x) {
  return arma::find_nonfinite(x);
}

/*** R
t1 <- c(1,2,NA,NA,3,4,1,NA,5)
whichNA(t1)
*/

produces the answer you want (modulo the off-by-one shift, since C/C++ indices start at zero):

R> sourceCpp("/tmp/uday.cpp")

R> t1 <- c(1,2,NA,NA,3,4,1,NA,5)

R> whichNA(t1)
     [,1]
[1,]    2
[2,]    3
[3,]    7
R>
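Two properties of this output can be illustrated with a standard-C++ analogue: the indices are 0-based, and "non-finite" covers +Inf and -Inf as well as NA/NaN, so unlike is.na this would also report infinite values. This is a sketch assuming "non-finite" means !std::isfinite(v); find_nonfinite_like and to_r_index are hypothetical names, not Armadillo's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical analogue of arma::find_nonfinite(): 0-based positions of
// every element that is not finite (NaN, R's NA, +Inf, -Inf).
std::vector<std::size_t> find_nonfinite_like(const std::vector<double>& x) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!std::isfinite(x[i])) idx.push_back(i);
    }
    return idx;
}

// Shift 0-based C++ positions to R's 1-based convention.
std::vector<std::size_t> to_r_index(std::vector<std::size_t> idx) {
    for (auto& i : idx) ++i;
    return idx;
}
```

In the Armadillo version itself, returning find_nonfinite(x) + 1 should presumably give 1-based indices directly.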

Rcpp can do this too if you first create the sequence to subset from:

// [[Rcpp::export]]
Rcpp::IntegerVector which2(Rcpp::NumericVector x) {
  Rcpp::IntegerVector v = Rcpp::seq(0, x.size()-1);
  return v[Rcpp::is_na(x)];
}

Added to the code above, this yields:

R> which2(t1)
[1] 2 3 7
R>

Logical subsetting is also somewhat new in Rcpp.
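The idea behind which2 (build the full index sequence, then keep the entries selected by a logical mask) can be sketched in standard C++ as follows; which_via_seq is a hypothetical name. Note that it materializes an n-element sequence before subsetting, an extra allocation that the single-pass versions avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mirrors which2(): build the index sequence 0, 1, ..., n-1 (like
// Rcpp::seq(0, n - 1)), then keep only the entries where the mask is true
// (like v[is_na(x)]). Indices are 0-based, as in which2.
std::vector<int> which_via_seq(const std::vector<bool>& mask) {
    std::vector<int> seq(mask.size());
    std::iota(seq.begin(), seq.end(), 0);  // 0, 1, 2, ...

    std::vector<int> out;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        if (mask[i]) out.push_back(seq[i]);
    }
    return out;
}
```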

Answer 2

Try this:

#include <Rcpp.h> 
using namespace Rcpp; 

// [[Rcpp::export]]
IntegerVector which4( NumericVector x) {

    int nx = x.size();
    std::vector<int> y;
    y.reserve(nx);

    for(int i = 0; i < nx; i++) {
        if (R_IsNA(x[i])) y.push_back(i+1);
    }

    return wrap(y);
}
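One detail about R_IsNA: R represents a numeric NA as one particular NaN bit pattern (1954 in the low 32-bit word), and R_IsNA is true only for that pattern, whereas is_na / is.na also treat an ordinary NaN as missing. So which4 and retIdxNA can disagree on vectors containing NaN. The sketch below mimics the check in standard C++; it assumes IEEE 754 doubles and R's internal NA payload, and make_r_na / is_r_na are hypothetical names, not part of R's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumption: R's NA_real_ is the NaN whose bits are 0x7FF00000000007A2,
// i.e. an all-ones exponent with 1954 (0x7A2) in the low 32-bit word.
double make_r_na() {
    const std::uint64_t bits = 0x7FF00000000007A2ULL;
    double d;
    std::memcpy(&d, &bits, sizeof d);  // reinterpret the bits without UB
    return d;
}

// Sketch of the R_IsNA() test: a NaN whose low 32-bit word is 1954.
bool is_r_na(double x) {
    if (!std::isnan(x)) return false;
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return (bits & 0xFFFFFFFFULL) == 1954;
}
```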

We can run it in R like this:

> which4(t1)
[1] 3 4 8

Performance

Note that we have changed the solution above to reserve space for the output vector. It replaces which3, which was:

// [[Rcpp::export]]
IntegerVector which3( NumericVector x) {
    int nx = x.size();
    IntegerVector y;
    for(int i = 0; i < nx; i++) {
        // if (internal::Rcpp_IsNA(x[i])) y.push_back(i+1);
        if (R_IsNA(x[i])) y.push_back(i+1);
    }
    return y;
}

For a vector of 9 elements, the timings are as follows, with which4 the fastest:

> library(rbenchmark)
> benchmark(retIdxNA(t1), whichNA(t1), which2(t1), which3(t1), which4(t1), 
+    replications = 10000, order = "relative")[1:4]
          test replications elapsed relative
5   which4(t1)        10000    0.14    1.000
4   which3(t1)        10000    0.16    1.143
1 retIdxNA(t1)        10000    0.17    1.214
2  whichNA(t1)        10000    0.17    1.214
3   which2(t1)        10000    0.25    1.786

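Repeating this for a vector of 9000 elements, the Armadillo solution is much faster than the others. Here which3 (identical to which4 except that it does not reserve space for the output vector) is the slowest, while which4 comes second.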
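A likely reason for which3's large slowdown is that, as far as I understand, IntegerVector::push_back allocates a new vector one element longer and copies all existing elements on every call, so k appends cost about k(k-1)/2 element copies. The hypothetical sketch below simulates that growth pattern and contrasts it with which4's reserve-once strategy; it models the cost rather than calling Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates a container that reallocates to size+1 and copies everything
// on every append. Returns the number of element copies performed.
std::size_t copies_grow_by_one(std::size_t n_appends) {
    std::vector<int> buf;
    std::size_t copies = 0;
    for (std::size_t k = 0; k < n_appends; ++k) {
        std::vector<int> bigger(buf.size() + 1);
        for (std::size_t i = 0; i < buf.size(); ++i) {
            bigger[i] = buf[i];
            ++copies;
        }
        bigger.back() = static_cast<int>(k);
        buf.swap(bigger);
    }
    return copies;  // 0 + 1 + ... + (n-1) = n(n-1)/2
}

// which4's strategy: reserve an upper bound once, so appends never
// reallocate. Returns how many times the buffer's storage moved.
std::size_t reallocs_with_reserve(std::size_t n_appends) {
    std::vector<int> buf;
    buf.reserve(n_appends);
    const int* p = buf.data();
    std::size_t reallocs = 0;
    for (std::size_t k = 0; k < n_appends; ++k) {
        buf.push_back(static_cast<int>(k));
        if (buf.data() != p) { ++reallocs; p = buf.data(); }
    }
    return reallocs;
}
```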

> tt <- rep(t1, 1000)
> benchmark(retIdxNA(tt), whichNA(tt), which2(tt), which3(tt), which4(tt), 
+   replications = 1000, order = "relative")[1:4]
          test replications elapsed relative
2  whichNA(tt)         1000    0.09    1.000
5   which4(tt)         1000    0.79    8.778
3   which2(tt)         1000    1.03   11.444
1 retIdxNA(tt)         1000    1.19   13.222
4   which3(tt)         1000   23.58  262.000

Answer 3

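All the solutions above are serial. Although it is not trivial, it is quite possible to use threads to implement which; see this write-up for details. For sizes this small, though, it would do more harm than good: like taking a plane for a short trip, you lose too much time at airport security.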
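A minimal sketch of one way to parallelize which with std::thread (not necessarily the approach of the write-up mentioned above; which_parallel is a hypothetical name): each thread scans a contiguous chunk into its own result vector, and the per-thread results are concatenated in chunk order so the output stays sorted. Since R's API is not thread-safe, in an Rcpp setting the worker threads should touch only plain C++ data like this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Multi-threaded which(): each thread records the 1-based positions of the
// true entries in its own chunk; results are joined in chunk order.
std::vector<int> which_parallel(const std::vector<bool>& mask, unsigned n_threads) {
    assert(n_threads > 0);
    const std::size_t n = mask.size();
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::vector<int>> parts(n_threads);  // one result per thread
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&mask, &parts, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                if (mask[i]) parts[t].push_back(static_cast<int>(i) + 1);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::vector<int> out;
    for (const auto& p : parts) out.insert(out.end(), p.begin(), p.end());
    return out;
}
```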

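R implements which by allocating an integer buffer as large as the input, making a single pass that stores the matching indices in this buffer, and finally copying them into an integer vector of exactly the right length.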

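Intuitively, this seems less efficient than a two-pass loop, but it is not necessarily so, because copying a contiguous range of data is cheap. See more details here.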
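The two strategies being compared can be sketched side by side in standard C++ (hypothetical names, not R's actual source): the buffer-then-copy approach described above, and the count-then-fill approach from the question. Both return the same 1-based indices; they differ only in whether the input is scanned once plus a block copy, or scanned twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Buffer-then-copy: one pass into a scratch buffer sized for the worst case
// (every element true), then copy the used prefix into an exact-size result.
std::vector<int> which_buffer_copy(const std::vector<bool>& mask) {
    std::vector<int> buf(mask.size());
    std::size_t j = 0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) buf[j++] = static_cast<int>(i) + 1;
    }
    return std::vector<int>(buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(j));
}

// Count-then-fill: pass 1 counts, pass 2 fills an exact-size result.
std::vector<int> which_two_pass(const std::vector<bool>& mask) {
    const auto cnt = std::count(mask.begin(), mask.end(), true);
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(cnt));
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) out.push_back(static_cast<int>(i) + 1);
    }
    return out;
}
```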

Answer 4

Just write yourself a function:

which_1<-function(a,b){
return(which(a>b))
}

Then pass this function to Rcpp.
