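I extracted a data.frame from a list (that is, a list of data.frames) and want to read it in Rcpp for further manipulation. Since all the elements are numbers, I first tried to read a column as a NumericVector. However, the numbers came out different from the original. I then tried reading it as a CharacterVector, which preserves the original values and their order.
The original data.frame looks like this:
   0 1 18 19 31 Freq Prob
1  1 3 10 10  1    6 0.12
2  1 5  1  1  1    1 0.02
3 10 3 10  8 10    2 0.04
4 10 7 10  9 10    1 0.02
5 10 9 10 10 10    2 0.04
6  2 3  2  6  2    1 0.02
7  3 3  2  2  3    1 0.02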
The mode and class of each column are as follows:
> sapply(Model[[1]], mode)
0 1 18 19 31 Freq Prob
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
> sapply(Model[[1]], class)
0 1 18 19 31 Freq Prob
"factor" "factor" "factor" "factor" "factor" "integer" "numeric"
Note: in each result, the first line lists the column names of the data.frame and the second line is the value returned by sapply.
The data.frame in dput() form:
structure(list(`0` = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L), .Label = c("1", "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
`1` = structure(c(4L, 6L, 4L, 8L, 10L, 4L, 4L), .Label = c("1", "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
`18` = structure(c(2L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("1", "10", "2", "4", "5", "6", "7", "8", "9"), class = "factor"),
`19` = structure(c(2L, 1L, 9L, 10L, 2L, 7L, 3L), .Label = c("1", "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
`31` = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 4L), .Label = c("1", "10", "2", "3", "4", "5", "6", "7", "8", "9"), class = "factor"),
Freq = c(6L, 1L, 2L, 1L, 2L, 1L, 1L), Prob = c(0.12, 0.02, 0.04, 0.02, 0.04, 0.02, 0.02)),
.Names = c("0", "1", "18", "19", "31", "Freq", "Prob"), row.names = c(NA, 7L), class = "data.frame")
The Rcpp code that reads one column into both a CharacterVector and a NumericVector is as follows:
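#include <Rcpp.h>
// x is the data.frame, idx is the (0-based) column to read
// [[Rcpp::export]]
int dataframe1(Rcpp::DataFrame& x, int idx) {
  // Reading a factor column as character yields its labels
  Rcpp::CharacterVector columnChar = x[idx];
  // Reading the same column as numeric yields its integer codes
  Rcpp::NumericVector columnNum = x[idx];
  Rcpp::Rcout << columnChar << std::endl;
  Rcpp::Rcout << columnNum << std::endl;
  return 0;
}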
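The output is as follows, for example for column index 1 in R, that is, index 0 in Rcpp:
> dataframe1(Model[[1]], 0)
"1" "1" "10" "10" "10" "2" "3" "3" "3" "4" "4" "5" "5" "5" "6" "6" "6" "6" "6" "7" "7" "7" "8" "8" "9"
1 1 2 2 2 3 4 4 4 5 5 6 6 6 7 7 7 7 7 8 8 8 9 9 10
As you can see, the two vectors differ: the values in the NumericVector look as if they had been sorted and renumbered. This only happens with factor columns; integer and numeric columns are fine.
So the question is: how can I preserve the original values when reading a factor into a NumericVector in Rcpp?
Thanks.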
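Answer 0 (score: 1)
Rcpp has only limited support for the internal representation of a factor. A factor is stored as integer codes with the labels kept in a separate levels attribute, so you have to pass in the values you actually want associated with each factor ahead of time.
This is the reason for the difference: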
Rcpp::Rcout << columnChar << std::endl; // reading from factor label
Rcpp::Rcout << columnNum << std::endl; // reading from id associated with factor label
To understand what is happening, consider:
set.seed(133)
x = sample(1:10, 10, replace = FALSE)
x
which gives:
[1] 6 8 10 3 2 4 7 9 5 1
This is purely numeric.
Now consider a factor:
xf = factor(x, labels = 11:20)
xf
which gives:
[1] 16 18 20 13 12 14 17 19 15 11
Levels: 11 12 13 14 15 16 17 18 19 20
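Note: the values of x are no longer visible. They have been masked by a mapping to character labels between 11 and 20. The same thing happens in your data: the levels are sorted as strings, so "10" comes right after "1" and receives code 2. That is why you see repeated 1s and 2s in the numeric output but "1" and "10" in the character output.
Next, if we convert the factor to numeric, we get: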
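As a rough illustration of that mapping, here is a minimal standard C++ sketch (no Rcpp) of how a factor's integer codes could be derived from string labels. `FactorSketch` and `encode_factor` are hypothetical names made up for this example, and the sketch assumes plain byte-wise string comparison, whereas R's own sort is locale-aware.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an R factor: 1-based integer codes plus string levels.
struct FactorSketch {
    std::vector<int> codes;
    std::vector<std::string> levels;
};

// Mimics the idea behind factor(): the levels are the unique labels sorted
// as strings, and each value is replaced by the 1-based position of its label.
FactorSketch encode_factor(const std::vector<std::string>& values) {
    FactorSketch f;
    f.levels = values;
    std::sort(f.levels.begin(), f.levels.end());  // "10" sorts before "2"
    f.levels.erase(std::unique(f.levels.begin(), f.levels.end()),
                   f.levels.end());
    for (const std::string& v : values) {
        auto it = std::lower_bound(f.levels.begin(), f.levels.end(), v);
        assert(it != f.levels.end() && *it == v);  // every value is a level
        f.codes.push_back(static_cast<int>(it - f.levels.begin()) + 1);
    }
    return f;
}
```

Feeding it the first seven labels of column `0` from the question ("1", "1", "10", "10", "10", "2", "3") should produce the codes 1 1 2 2 2 3 4, which is the pattern seen in the NumericVector output.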
as.numeric(xf)
which gives:
[1] 6 8 10 3 2 4 7 9 5 1
that is, the original values from before the conversion to a factor.
To get the actual level values instead:
as.numeric(as.character(xf))
which returns:
[1] 16 18 20 13 12 14 17 19 15 11
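The same lookup can be done on the C++ side. The following is a minimal standard C++ sketch of the idea behind `as.numeric(as.character(xf))`: each 1-based code indexes into the levels and the label is parsed as a number. `factor_to_numeric` is a hypothetical helper name, and NA handling is deliberately left out. In Rcpp, the codes would presumably come from reading the column as an `IntegerVector` and the labels from its `"levels"` attribute; that wiring is an assumption and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn factor codes back into the numbers spelled out
// by the level labels.
std::vector<double> factor_to_numeric(const std::vector<int>& codes,
                                      const std::vector<std::string>& levels) {
    std::vector<double> out;
    out.reserve(codes.size());
    for (int code : codes) {
        // Codes are 1-based, as in R; missing values are not handled here.
        assert(code >= 1 && static_cast<std::size_t>(code) <= levels.size());
        // std::stod throws if a label is not numeric.
        out.push_back(std::stod(levels[code - 1]));
    }
    return out;
}
```

With the codes 6 8 10 3 2 4 7 9 5 1 and the levels "11" through "20" from the example above, this yields 16 18 20 13 12 14 17 19 15 11. Doing the conversion in R before calling the C++ function is the simpler route when it is an option.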
To see this in action, let's modify the original function:
#include <Rcpp.h>
// [[Rcpp::export]]
void dataframe_factors(Rcpp::DataFrame& x) {
Rcpp::CharacterVector factor_name = x[0];
Rcpp::NumericVector factor_id = x[0];
Rcpp::NumericVector numeric_val = x[1];
Rcpp::Rcout << "FN: " << factor_name << std::endl;
Rcpp::Rcout << "FID: " << factor_id << std::endl;
// Numeric
Rcpp::Rcout << "ORG: " << numeric_val << std::endl;
}
/*** R
set.seed(133)
x = sample(1:10, 10, replace = F)
xf = factor(x, labels = 11:20)
d = data.frame(xf, x)
dataframe_factors(d)
*/
which gives:
FN: "16" "18" "20" "13" "12" "14" "17" "19" "15" "11"
FID: 6 8 10 3 2 4 7 9 5 1
ORG: 6 8 10 3 2 4 7 9 5 1