To illustrate my problem, here is a demo of what I am trying to do. My question is how to set up a loop in fun2.
#generate demo data
set.seed(123)
n<-200
data<-data.frame(time=sample(0:5,n,T),
out=sample(c(rbinom(n,1,0.3), rep(NA,40)),n),
pred1=rnorm(n),
pred2=rbinom(n,10,0.7),
pred3=sample(1:2,n,T))
#function1 to impute the out with time<=i.
#You need to install 'mice' package to run that.
fun1<-function(data, time, out, i, m){
library(mice)
if (sum(is.na(data[data[, time]<=i, out]))>=1) {compl<-complete(mice(data[data[, time]<=i, ], m=m))}
else {compl=data[data[, time]<=i, ]}
return(compl)
}
test<-fun1(data,"time","out",1,5)
test #out was imputed for time<=1
fun2<-function(data, time, out, m){
C1<-fun1(data, time, out, 1, m)#impute the out for time<=1
R1<-rbind(C1, data[data[, time]==(1+1),])#rbind the imputed out with the unimputed
C2<-fun1(R1, time, out, 2, m)#impute the out for time<=2...
R2<-rbind(C2, data[data[, time]==(1+2),])
C3<-fun1(R2, time, out, 3, m)
R3<-rbind(C3, data[data[, time]==(1+3),])
C4<-fun1(R3, time, out, 4, m)
R4<-rbind(C4, data[data[, time]==(1+4),])
C5<-fun1(R4, time, out, 5, m)
return(C5)
}
fun2(data,"time","out",5)
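My question is how to write the loop in fun2 so that it works for any data set shaped like the demo (with no limit on the "time" values; here the maximum is 5).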
Answer 0 (score: 1)
You need to give fun2 a maximum loop value, for example iloop, and then:
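fun2<-function (data,time,out,iloop,m) {
r <- list()
prev <- data #input to fun1: the raw data first, then the previous step's result
for (j in 1:iloop) {
#impute out for time<=j, then append the unimputed rows with time==j+1
r[[j]]<- rbind(fun1(prev,time,out,j,m),data[data[,time]==(j+1),])
prev <- r[[j]] #carry the result forward so earlier imputations are reused
}
return(r[[iloop]])
}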
I may have missed a detail there, but the general idea should be clear. I have stored R1, R2, R3 as elements of a list rather than as separate objects.
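Since the question asks for something that works with any range of "time" values, one option is to derive the loop values from the data instead of passing a maximum. The following is only a sketch under stated assumptions: run_steps and fill_stub are hypothetical names, and fill_stub is a stand-in that replaces NA with 0 so the example runs without mice; the real fun1 could be passed in its place through a small wrapper that supplies m. Unlike the original fun2, this variant starts at the smallest time value present rather than at 1.

```r
# Hypothetical stand-in for fun1: keeps rows with time <= i and fills NA in
# `out` with 0. It is not an imputation method, just a runnable placeholder.
fill_stub <- function(d, time, out, i) {
  keep <- d[d[, time] <= i, ]
  keep[is.na(keep[, out]), out] <- 0
  keep
}

# Hypothetical driver: walks over the time values found in the data, so no
# maximum has to be supplied by the caller.
run_steps <- function(data, time, out, step) {
  times <- sort(unique(data[, time]))
  acc <- data[0, ]                                  # empty frame, same columns
  for (tt in times) {
    acc <- rbind(acc, data[data[, time] == tt, ])   # add the raw rows for this time
    acc <- step(acc, time, out, tt)                 # process all rows with time <= tt
  }
  acc
}

demo <- data.frame(time = c(0, 1, 1, 2, 3), out = c(NA, 1, NA, 0, NA))
res <- run_steps(demo, "time", "out", fill_stub)
```

With the real function, the call might look like run_steps(data, "time", "out", function(d, time, out, i) fun1(d, time, out, i, m = 5)).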
Answer 1 (score: 1)
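The key here is the <<- assignment operator, which lets you modify a value that lives outside a function from inside that function. I would normally avoid it, but in this case, where you need to reuse the result of the previous computation, I don't really see another option.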
fun2<-function(index, data, time, out, m){
R <- rbind(C, data[data[, time]==(index),])#rbind the imputed out with the unimputed
C <<- fun1(R, time, out, index, m)#impute the out for time<=index and store it in the outer C
}
C <- fun1(data, "time", "out", 1, 5) # Need to initialize C
for(i in 2:5) fun2(i, data, "time", "out", 5)
C # This now contains your result
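To make the behaviour of <<- concrete, here is a small base-R sketch that does not use mice. Variant A updates an outer variable from inside a function, as the code above does with C. Variant B is an alternative worth considering rather than something this answer proposes: the state is passed in and returned explicitly through Reduce. The names total, add_to_total, add and total2 are made up for the illustration.

```r
# Variant A: `<<-` assigns to a variable in an enclosing environment.
total <- 0
add_to_total <- function(x) {
  total <<- total + x   # updates the outer `total`, not a local copy
  invisible(NULL)
}
for (x in 1:4) add_to_total(x)

# Variant B: the same accumulation with the state passed in and returned.
add <- function(acc, x) acc + x
total2 <- Reduce(add, 1:4, 0)   # 0 is the initial value of the accumulator
```

Both variants end with the same value; the second keeps the function free of side effects, which tends to be easier to test and to wrap in another function.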
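Note that I cannot verify 100% that this does what you want, because mice appears to behave non-deterministically (that is, if I run your code twice, even leaving out the data-generation part, I get different answers).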
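The differing results are expected for a procedure that draws random numbers, and fixing the seed before each run is the usual way to make two runs comparable. The sketch below shows the idea in base R only; noisy_step is a hypothetical stand-in for a stochastic step, not part of mice. For the real code, mice() accepts a seed argument, so passing one through fun1 (an assumption about how you would wire it) or calling set.seed() before each run should give repeatable imputations.

```r
# Hypothetical stochastic step: adds random noise, optionally with a fixed seed.
noisy_step <- function(x, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # fix the random number stream
  x + rnorm(length(x))
}

first  <- noisy_step(1:3, seed = 2024)
second <- noisy_step(1:3, seed = 2024)
same <- identical(first, second)       # same seed, same draws
```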