Calculated values on imputed data

Asked: 2014-08-23 01:08:54

Tags: r r-mice

I want to do something like the following (myData is a data.table):

library(data.table)
library(mice)

#create some data
myData = data.table(invisible.covariate=rnorm(50),
         visible.covariate=rnorm(50),
         category=factor(sample(1:3,50, replace=TRUE)),
         treatment=sample(0:1,50, replace=TRUE))
myData[,outcome:=invisible.covariate+visible.covariate+treatment*as.integer(category)]
myData[,invisible.covariate:=NULL]    

#process it
myData[treatment == 0,untreated.outcome:=outcome]
myData[treatment == 1,treated.outcome:=outcome]
myPredictors = matrix(0,ncol(myData),ncol(myData))
myPredictors[5,] = c(1,1,0,0,0,0)
myPredictors[6,] = c(1,1,0,0,0,0)
myImp = mice(myData,predictorMatrix=myPredictors)
fit1 = with(myImp, lm(treated.outcome ~ category)) #this works fine

for_each_imputed_dataset(myImp,  #THIS IS NOT A REAL FUNCTION but I hope you get the idea
     function(imputed_data_table) {
        imputed_data_table[,treatment.effect:=treated.outcome-untreated.outcome]
     })

fit2 = with(myImp, lm(treatment.effect ~ category)) 
#I want fit2 to be an object similar to fit1
...

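I'd like to add a calculated value to each imputed data set and then run statistics using that calculated value. Obviously, the structure above is probably not how you would actually do it. I'd be happy with any solution, whether it involves preparing the data table somehow before calling mice, adding a step before the "fit =" line as sketched above, or using some more complex function inside the "with" call.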

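1 Answer:

Answer 0: (score: 1)

The complete() function generates the "complete" imputed data set for each requested imputation. Note, however, that mice expects to work with data.frames, so it returns data.frames rather than data.tables. (You can of course convert them if you like.) Here is one way to fit all the models: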

imp = mice(myData,predictorMatrix=myPredictors)
fits<-lapply(seq.int(imp$m), function(i) {
   lm(I(treated.outcome-untreated.outcome)~category, complete(imp, i))
})
fits

The results are returned as a list, and you can extract particular lm objects with fits[[1]], fits[[2]], and so on.
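
Since the question asks for fit2 to be an object like fit1, another option may be to put the calculation directly into the formula passed to with(), which evaluates the expression inside each completed data set. The following is only a minimal sketch under some assumptions: it rebuilds a simplified version of the question's data as a plain data.frame, and the names df and pred, the seed, and the reduced column set are illustrative choices, not part of the original post.

```r
library(mice)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 50
invisible <- rnorm(n)
visible   <- rnorm(n)
category  <- factor(sample(1:3, n, replace = TRUE))
treatment <- sample(0:1, n, replace = TRUE)
outcome   <- invisible + visible + treatment * as.integer(category)

# Simplified stand-in for myData: each outcome column is observed
# only for the rows that actually received that treatment
df <- data.frame(
  visible.covariate = visible,
  category          = category,
  untreated.outcome = ifelse(treatment == 0, outcome, NA),
  treated.outcome   = ifelse(treatment == 1, outcome, NA)
)

# Impute both outcome columns from visible.covariate and category only
pred <- matrix(0, ncol(df), ncol(df),
               dimnames = list(names(df), names(df)))
pred["untreated.outcome", c("visible.covariate", "category")] <- 1
pred["treated.outcome",   c("visible.covariate", "category")] <- 1

imp <- mice(df, predictorMatrix = pred, printFlag = FALSE, seed = 1)

# The difference is computed inside each completed data set,
# so no extra column has to be added beforehand
fit2 <- with(imp, lm(I(treated.outcome - untreated.outcome) ~ category))

# fit2 holds one lm fit per imputation and can be pooled like fit1
pooled <- pool(fit2)
summary(pooled)
```

With this approach the per-imputation fits are stored in fit2$analyses, and pool() combines them in the usual way, so the manual lapply() over complete() is only needed when you want the individual lm objects themselves.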