Subsetting DataFrame rows in Rcpp

Time: 2016-11-19 10:32:34

Tags: r rcpp

I would like to produce this subset of the built-in iris dataset in Rcpp:

head(subset(iris, Species == "versicolor"))

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
54          5.5         2.3          4.0         1.3 versicolor
55          6.5         2.8          4.6         1.5 versicolor
56          5.7         2.8          4.5         1.3 versicolor

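I know how to subset the columns of an Rcpp::DataFrame: there is an overloaded operator [ that works as in R: x["var"]. However, I cannot find any way to subset the rows of a DataFrame that has a non-fixed number of columns.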

I would like to write a function subset_rows_rcpp_iris that takes an Rcpp::DataFrame (always iris) and a CharacterVector level_of_species as input. It would return a DataFrame object.

DataFrame subset_rows_rcpp_iris(DataFrame x, CharacterVector level_of_species) {
    ...
}

First, I want to find the indices of the rows that satisfy the logical query. My problem is that when I access the Species vector in the test function, store it as a CharacterVector, and compare it with level_of_species, I always get only one TRUE value in the setosa case and FALSE values otherwise.

cppFunction('
LogicalVector test(DataFrame x, CharacterVector level_of_species) {
  CharacterVector sub = x["Species"];
  LogicalVector ind = sub == level_of_species;
  return(ind);
}
')

head(test(iris, "setosa"))
[1]  TRUE FALSE FALSE FALSE FALSE FALSE

If this worked, I could rewrite the test function and use the vector of TRUE / FALSE values to subset each column of the data frame separately, then combine them again with Rcpp::DataFrame::create.

1 answer:

Answer 0 (score: 1)

cppFunction('LogicalVector test(DataFrame x, StringVector level_of_species) {
  StringVector sub = x["Species"];
  // Take the first element of level_of_species as a scalar std::string.
  std::string level = Rcpp::as<std::string>(level_of_species[0]);
  Rcpp::LogicalVector ind(sub.size());
  // Compare each element of Species with that scalar.
  for (int i = 0; i < sub.size(); i++){
      ind[i] = (sub[i] == level);
  }

  return(ind);
}')

xx=test(iris, "setosa")
> table(xx)
 xx
 FALSE  TRUE 
   100    50 

The complete subset! (I learned a lot from this question myself... thanks!)

cppFunction('Rcpp::DataFrame test(DataFrame x, StringVector level_of_species) {
  // Build the logical row index exactly as in the first version.
  StringVector sub = x["Species"];
  std::string level = Rcpp::as<std::string>(level_of_species[0]);
  Rcpp::LogicalVector ind(sub.size());
  for (int i = 0; i < sub.size(); i++){
    ind[i] = (sub[i] == level);
  }

  // Extract each numeric column into a vector.
  Rcpp::NumericVector SepalLength = x["Sepal.Length"];
  Rcpp::NumericVector SepalWidth  = x["Sepal.Width"];
  Rcpp::NumericVector PetalLength = x["Petal.Length"];
  Rcpp::NumericVector PetalWidth  = x["Petal.Width"];

  // Subset every column with the same index and reassemble the data frame.
  // Species is not carried over, so the result has four columns.
  return Rcpp::DataFrame::create(Rcpp::Named("Sepal.Length") = SepalLength[ind],
                                 Rcpp::Named("Sepal.Width")  = SepalWidth[ind],
                                 Rcpp::Named("Petal.Length") = PetalLength[ind],
                                 Rcpp::Named("Petal.Width")  = PetalWidth[ind]
  );
}')

yy=test(iris, "setosa")
> str(yy)
 'data.frame':  50 obs. of  4 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
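
The function above names the four numeric iris columns explicitly, while the question asked about a non-fixed number of columns. The underlying idea (build one logical mask, then apply it to every column in a loop) can be sketched in plain standard C++ without Rcpp. This is only an illustration under stated assumptions: rows_matching, filter_column and filter_columns are hypothetical helper names, columns are modelled as std::vector rather than Rcpp vectors, and only numeric columns are looped over. In Rcpp the same loop would run over the columns of the DataFrame, and a factor column such as Species would presumably need type-specific handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: mark the rows whose label equals one scalar level.
// Each element is compared against a single string, as in the answer's loop.
std::vector<bool> rows_matching(const std::vector<std::string>& labels,
                                const std::string& level) {
    std::vector<bool> keep(labels.size());
    for (std::size_t i = 0; i < labels.size(); ++i) {
        keep[i] = (labels[i] == level);
    }
    return keep;
}

// Hypothetical helper: keep the elements of one column where the mask is true.
template <typename T>
std::vector<T> filter_column(const std::vector<T>& column,
                             const std::vector<bool>& keep) {
    assert(column.size() == keep.size());  // mask must cover every row
    std::vector<T> out;
    for (std::size_t i = 0; i < column.size(); ++i) {
        if (keep[i]) {
            out.push_back(column[i]);
        }
    }
    return out;
}

// Apply the same mask to every numeric column, however many there are.
std::vector<std::vector<double> > filter_columns(
        const std::vector<std::vector<double> >& columns,
        const std::vector<bool>& keep) {
    std::vector<std::vector<double> > out;
    out.reserve(columns.size());
    for (std::size_t j = 0; j < columns.size(); ++j) {
        out.push_back(filter_column(columns[j], keep));
    }
    return out;
}
```

Calling rows_matching on the label column and passing the result to filter_columns keeps the same rows in every column, so the column count never has to be known in advance.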