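In R, I have 3 data frames, similar to the sample versions I provide below. The first, Data, is the main data set. The TW and UW data frames share a key variable with Data (MN-mapping_for_N) and then hold 1000 different draw values for each value of that key (N48, etc.). I have provided 3 draws here for my purposes.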
# Sample data. Note: matrix() stores every value as character, and data.frame()
# renames "MN-mapping_for_N" to "MN.mapping_for_N" (check.names = TRUE).
Data<-matrix(c(4720,44.29,"Work or Private Clinic","N48",2659,55.05,"Hospital","N1",1612,59.99,"No Care","N48"),ncol = 4,byrow=TRUE)
colnames(Data)<-c("studyid", "Pred_ex", "wherecare", "MN-mapping_for_N")
Data<-data.frame(Data)
TW<-matrix(c("N48",0.07,0.08,0.09,"N1",0.10,0.11,0.12,"N2",0.02,0.03,0.04,"N3",0.04,0.05,0.06),ncol = 4, byrow = TRUE)
colnames(TW)<-c("MN-mapping_for_N","draw1","draw2","draw3")
TW<-data.frame(TW)
UW<-matrix(c("N48",0.71,0.81,0.91,"N1",0.11,0.111,0.131,"N2",0.021,0.031,0.041,"N3",0.041,0.051,0.061),ncol = 4, byrow = TRUE)
colnames(UW)<-c("MN-mapping_for_N","draw1","draw2","draw3")
UW<-data.frame(UW)
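My goal is to create a new column using a randomly selected draw column from the UW and TW data, where the data frame the value is drawn from depends on the value in Data$wherecare.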
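I have been using a combination of dplyr and the match function, along with a couple of functions I wrote myself. Currently it looks like this: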
library(dplyr)

# Keep the key column plus one randomly chosen draw column
drawselect<-function(x) {
  samplepick<-sample(2:1001,1)
  select(x,1,num_range("draw",samplepick))
}
DALY_FX_LT_NR<-function(x){
  draw_T_DW<-drawselect(TW)
  draw_UT_DW<-drawselect(UW)
  drawnames.TW<-colnames(draw_T_DW)
  drawnames.UT<-colnames(draw_UT_DW)
  UT.draw<-drawnames.UT[2]
  T.draw<-drawnames.TW[2]
  print(UT.draw)
  print(T.draw)
  newdf<-x %>% mutate(DW=NA)
  for(i in 1:nrow(newdf)){
    if(newdf$wherecare[i]!= "No Care"){
      newdf$DW=draw_T_DW[,2][match(newdf$`MN-mapping_for_N`,draw_T_DW$`MN-mapping_for_N`)]
      next
    }else if(newdf$wherecare[i]=="No Care"){
      newdf$DW=draw_UT_DW[,2][match(newdf$`MN-mapping_for_N`,draw_UT_DW$`MN-mapping_for_N`[i])]
    }
  }
  newdf
}
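The code runs, but I can't seem to get it to actually iterate row by row so that the draw value is pulled from the correct data frame (i.e. UW or TW, as returned by the drawselect function).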
So what I get looks like this:
-------------------------------------------------------------
studyid Pred_ex wherecare MN-mapping_for_N DW
--------- --------- ---------------------- ------------------ ------
4720 44.29 Work or Private Clinic N48 0.08
2659 55.05 Hospital N1 0.11
1612 59.99 No Care N48 0.08
--------------------------------------------------------------------
when I should be getting:
studyid Pred_ex wherecare MN-mapping_for_N DW
--------- --------- ---------------------- ------------------ ------
4720 44.29 Work or Private Clinic N48 0.08
2659 55.05 Hospital N1 0.11
1612 59.99 No Care N48 0.81
--------------------------------------------------------------------
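The key difference is the 0.81 in the bottom-right corner. The sample data is small, but the real data is several hundred rows long, so I want the function to "decide correctly" which data set to pull from. This value could be 0.71, 0.81 or 0.91; any of the N48 values from UW would be acceptable.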
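The end goal is to use this value in a calculation, multiplying it by the Pred_ex column, which I can do, and then to rerun this function many times to bootstrap the data. Until I can get these if statements to work properly, I can't do that. I have also tried using dplyr::left_join to match these and ran into similar problems with the conditional statements not working. I think this would work better written as a function, but I am certainly open to anything.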
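Any help is greatly appreciated.
Also, thanks to everyone on Stack Overflow in general; reading the answers to other questions is the main reason I got this far.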
Answer 0 (score: 0)
You don't need a new function (I kept drawselect); you can do the following:
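draw_TW<-drawselect(TW) # pick the random draw column once, not once per row
draw_UW<-drawselect(UW)
Data$DW<-NA # the column must exist before row-wise assignment
for (i in 1:nrow(Data)){
  if (Data$wherecare[i] != "No Care"){
    # Treated rows read their value from TW
    Data$DW[i]<- draw_TW[which(draw_TW$MN.mapping_for_N == as.character(Data$MN.mapping_for_N[i])), 2]
  } else {
    # "No Care" rows read their value from UW
    Data$DW[i]<- draw_UW[which(draw_UW$MN.mapping_for_N == as.character(Data$MN.mapping_for_N[i])), 2]
  }
}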
> Data
studyid Pred_ex wherecare MN.mapping_for_N DW
1 4720 44.29 Work or Private Clinic N48 0.08
2 2659 55.05 Hospital N1 0.11
3 1612 59.99 No Care N48 0.81
If you want to wrap everything in a single function (including drawselect), try something along these lines:
DALY_FX_LT_NR<-function(x, y, z){ # x would be Data, y would be TW, z would be UW
  # Column 1 is the key, so there are ncol(y)-1 draws; pick one at random
  samplepick<-sample(seq_len(ncol(y)-1),1)
  x$DW<-NA # create the column before filling it row by row
  for (i in 1:nrow(x)){
    if (x$wherecare[i] != "No Care"){
      x$DW[i]<- y[which(y$MN.mapping_for_N==as.character(x$MN.mapping_for_N[i])), paste0("draw", samplepick)]
    } else {
      x$DW[i]<- z[which(z$MN.mapping_for_N==as.character(x$MN.mapping_for_N[i])), paste0("draw", samplepick)]
    }
  }
  return(x)
}
> DALY_FX_LT_NR(x = Data, y = TW, z = UW)
studyid Pred_ex wherecare MN.mapping_for_N DW
1 4720 44.29 Work or Private Clinic N48 0.09
2 2659 55.05 Hospital N1 0.12
3 1612 59.99 No Care N48 0.91
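
The row-by-row loop can also be avoided altogether. What follows is only a minimal sketch of a vectorized alternative in base R, not part of the original answer: it looks up the chosen draw in both tables with match and then picks between the two results with ifelse. The helper name add_dw and its draw argument are hypothetical, and the sample data is rebuilt directly with data.frame() (an assumption made here so that the draw values stay numeric instead of becoming character, as they do when going through matrix()).

```r
# Sample data from the question, built directly as data frames so that
# numeric columns stay numeric (assumption: same values as the original)
Data <- data.frame(
  studyid = c(4720, 2659, 1612),
  Pred_ex = c(44.29, 55.05, 59.99),
  wherecare = c("Work or Private Clinic", "Hospital", "No Care"),
  MN.mapping_for_N = c("N48", "N1", "N48"),
  stringsAsFactors = FALSE
)
TW <- data.frame(
  MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
  draw1 = c(0.07, 0.10, 0.02, 0.04),
  draw2 = c(0.08, 0.11, 0.03, 0.05),
  draw3 = c(0.09, 0.12, 0.04, 0.06),
  stringsAsFactors = FALSE
)
UW <- data.frame(
  MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
  draw1 = c(0.71, 0.11, 0.021, 0.041),
  draw2 = c(0.81, 0.111, 0.031, 0.051),
  draw3 = c(0.91, 0.131, 0.041, 0.061),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds DW from one draw column, taking the value
# from `uw` for "No Care" rows and from `tw` for all other rows.
add_dw <- function(x, tw, uw, draw = NULL) {
  draw_cols <- grep("^draw", names(tw), value = TRUE)
  # If no draw is named, pick one draw column at random
  if (is.null(draw)) draw <- sample(draw_cols, 1)
  # Look up every row's key in both tables at once
  tw_val <- tw[[draw]][match(x$MN.mapping_for_N, tw$MN.mapping_for_N)]
  uw_val <- uw[[draw]][match(x$MN.mapping_for_N, uw$MN.mapping_for_N)]
  # Choose the source per row based on wherecare
  x$DW <- ifelse(x$wherecare == "No Care", uw_val, tw_val)
  x
}

# Fixed draw, to compare against the expected table in the question
result <- add_dw(Data, TW, UW, draw = "draw2")
print(result)

# Repeated random draws, multiplied by Pred_ex (one column per replicate)
set.seed(1)
boot <- replicate(5, {
  d <- add_dw(Data, TW, UW)
  d$Pred_ex * d$DW
})
```

With draw = "draw2" the DW column should come out as 0.08, 0.11 and 0.81, matching the expected table in the question. Because the draw is chosen once per call, every row in a replicate uses the same draw index in both tables; if TW and UW should be sampled independently, the helper would need two separate sample calls.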