In Rcpp

Time: 2016-06-12 00:11:59

Tags: r rcpp

I extracted a data.frame from a list (i.e. a list of data.frames) and want to read one of its columns as a vector in Rcpp for further manipulation. Since all elements are numbers, I first tried to read it as a NumericVector. However, the indices changed. Then I tried to read it as a CharacterVector, which preserved the original order.

The original data.frame looks like this:

   0 1 18 19 31 Freq Prob
1  1 3 10 10  1    6 0.12
2  1 5  1  1  1    1 0.02
3 10 3 10  8 10    2 0.04
4 10 7 10  9 10    1 0.02
5 10 9 10 10 10    2 0.04
6  2 3  2  6  2    1 0.02
7  3 3  2  2  3    1 0.02

Given:

structure(list(`0` = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L), .Label = c("1",
"10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
    `1` = structure(c(4L, 6L, 4L, 8L, 10L, 4L, 4L), .Label = c("1",
    "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
    `18` = structure(c(2L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("1",
    "10", "2", "4", "5", "6", "7", "8", "9"), class = "factor"),
    `19` = structure(c(2L, 1L, 9L, 10L, 2L, 7L, 3L), .Label = c("1",
    "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
    `31` = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L), .Label = c("1",
    "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
    Freq = c(6L, 1L, 2L, 1L, 2L, 1L, 1L), Prob = c(0.12, 0.02,
    0.04, 0.02, 0.04, 0.02, 0.02)), .Names = c("0", "1", "18",
"19", "31", "Freq", "Prob"), row.names = c(NA, 7L), class = "data.frame")

The mode and class of each column are as follows:

> sapply(Model[[1]], mode)
        0         1        18        19        31      Freq      Prob
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
> sapply(Model[[1]], class)
        0         1        18        19        31      Freq      Prob
 "factor"  "factor"  "factor"  "factor"  "factor" "integer" "numeric"

Note: the first row is the column names as listed in the data.frame, and the second row is the result of the apply function.

I read it into Rcpp as follows:

#include <Rcpp.h>

// x is the data frame, idx is the (0-based) column to read
// [[Rcpp::export]]
int dataframe1(Rcpp::DataFrame& x, int idx) {
  Rcpp::CharacterVector columnChar = x[idx];  // factor labels
  Rcpp::NumericVector columnNum = x[idx];     // underlying integer codes
  Rcpp::Rcout << columnChar << std::endl;
  Rcpp::Rcout << columnNum << std::endl;
  return 0;
}

The output is as follows, for example when the index in R is 1, i.e. 0 in Rcpp:

> dataframe1(Model[[1]],0)
"1" "1" "10" "10" "10" "2" "3" "3" "3" "4" "4" "5" "5" "5" "6" "6" "6" "6" "6" "7" "7" "7" "8" "8" "9"
1 1 2 2 2 3 4 4 4 5 5 6 6 6 7 7 7 7 7 8 8 8 9 9 10

As you can see, the two vectors differ in order: the values in the NumericVector have been sorted. This only happens for factor columns; integer and numeric columns are fine.

So the question is: how do I preserve the order when reading a factor into a NumericVector in Rcpp?

Thanks

1 answer:

Answer 0: (score: 1)

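Rcpp has only a limited internal representation of a factor. Thus, you must pass in the integer values associated with each factor level ahead of time.

This is the reason for the difference in: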

Rcpp::Rcout << columnChar << std::endl; // reading from factor label
Rcpp::Rcout << columnNum << std::endl; // reading from id associated with factor label

Edit

To understand what is happening, consider:

set.seed(133)
x = sample(1:10, 10, replace = F)
x

This gives:

 [1]  6  8 10  3  2  4  7  9  5  1

This is purely numeric.

Now, consider a factor:

xf = factor(x, labels = 11:20)

xf

which gives:

[1] 16 18 20 13 12 14 17 19 15 11
Levels: 11 12 13 14 15 16 17 18 19 20

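Note: the values of x are no longer there. Instead, they are masked by a mapping to character values between 11 and 20. This is why you see repeated 1s and 2s in the numeric output but 1s and 10s in the character output.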
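To make the mapping concrete, here is a minimal standalone C++ sketch of how such codes could arise. It is not R's or Rcpp's actual implementation: the `Factor` struct and `encode_factor` are hypothetical names, and the sketch assumes the levels are the unique labels sorted as strings, which is what the `dput` output in the question shows ("1", "10", "2", ...).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an R factor: 1-based integer codes plus labels.
struct Factor {
    std::vector<int> codes;
    std::vector<std::string> levels;
};

// Assumption: levels are the unique labels sorted as strings, and each value
// stores the 1-based position of its label among those levels.
Factor encode_factor(const std::vector<std::string>& values) {
    Factor f;
    f.levels = values;
    std::sort(f.levels.begin(), f.levels.end());
    f.levels.erase(std::unique(f.levels.begin(), f.levels.end()),
                   f.levels.end());
    f.codes.reserve(values.size());
    for (const std::string& v : values) {
        // Binary search is valid because the levels are sorted.
        auto pos = std::lower_bound(f.levels.begin(), f.levels.end(), v);
        f.codes.push_back(static_cast<int>(pos - f.levels.begin()) + 1);
    }
    return f;
}
```

Calling `encode_factor` on the labels "1", "1", "10", "10", "10", "2", "3" yields the codes 1 1 2 2 2 3 4, because "10" sorts before "2" as a string. Reading the column as numeric returns these codes, not the labels.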

Next, if we convert to numeric, we have:

as.numeric(xf)

which gives:

[1]  6  8 10  3  2  4  7  9  5  1

that is, the original values before "factoring".

To obtain the actual levels:

as.numeric(as.character(xf))

which returns:

[1] 16 18 20 13 12 14 17 19 15 11
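The same decoding can be sketched in plain C++, as a rough analogue of `as.numeric(as.character(xf))`. The function name `factor_to_numeric` is hypothetical, and the sketch assumes every label parses as a number. In Rcpp, the codes could presumably be read as an `Rcpp::IntegerVector` and the labels taken from its `"levels"` attribute via `attr("levels")`; here both are passed in as ordinary vectors so the example stays self-contained.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Map 1-based factor codes back to the numeric value of their labels.
// Assumption: every label is parseable by std::stod (it throws otherwise).
std::vector<double> factor_to_numeric(const std::vector<int>& codes,
                                      const std::vector<std::string>& levels) {
    std::vector<double> out;
    out.reserve(codes.size());
    for (int code : codes) {
        if (code < 1 || static_cast<std::size_t>(code) > levels.size()) {
            // Out-of-range codes (R's integer NA is a large negative value)
            // become NaN instead of indexing past the labels.
            out.push_back(std::numeric_limits<double>::quiet_NaN());
        } else {
            out.push_back(std::stod(levels[code - 1]));
        }
    }
    return out;
}
```

With the codes 6 8 10 3 2 4 7 9 5 1 and the labels "11" through "20", this returns 16 18 20 13 12 14 17 19 15 11, so the original row order is preserved and only the values are decoded.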

Edit 2:

To see this in action, let's modify the original function:

#include <Rcpp.h>

// [[Rcpp::export]]
void dataframe_factors(Rcpp::DataFrame& x) { 
  Rcpp::CharacterVector factor_name = x[0];
  Rcpp::NumericVector factor_id = x[0];
  Rcpp::NumericVector numeric_val = x[1];
  Rcpp::Rcout << "FN: " << factor_name << std::endl;
  Rcpp::Rcout << "FID: " << factor_id << std::endl;

  // Numeric
  Rcpp::Rcout << "ORG: " << numeric_val << std::endl;

}


/*** R
set.seed(133)
x = sample(1:10, 10, replace = F)

xf = factor(x, labels = 11:20)

d = data.frame(xf, x)

dataframe_factors(d)
*/

This gives:

FN: "16" "18" "20" "13" "12" "14" "17" "19" "15" "11"
FID: 6 8 10 3 2 4 7 9 5 1
ORG: 6 8 10 3 2 4 7 9 5 1