How do I combine multiply imputed data sets created with mice?

Time: 2015-05-03 17:19:07

Tags: r missing-data r-mice

I split my data set into males and females and then imputed each subset separately with the mice package.

#Generate predictormatrix
pred_gender_0<-quickpred(data_gender_0, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)
pred_gender_1<-quickpred(data_gender_1, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)

#impute the data with mice 
imp_pred_gen0 <- mice(data_gender_0,
                 pred=pred_gender_0,
                 m=10,
                 maxit=5,            
                 diagnostics=TRUE,
                 MaxNWts=3000) # I had to set this to 3000 because of a problematic unordered categorical variable

imp_pred_gen1 <- mice(data_gender_1,
                 pred=pred_gender_1,
                 m=10,
                 maxit=5,            
                 diagnostics=TRUE,
                 MaxNWts=3000)

I now have two objects, each holding 10 imputed data sets: one for males and one for females. My question is: how do I combine them? Normally I would just use:

  

comp_imp <- complete(imp, "long")

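Should I:

  1. Use rbind.mids() to combine the male and female data and then convert the result to long format?
  2. Convert to long format first and then use rbind.mids() or rbind()?

Thanks for any hints! =)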

    ----------------------------------------------- ----------------------------

    Update - reproducible example

    library("dplyr")
    library("mice")
    
    # We use nhanes-dataset from the mice-package as example
    
    # first: combine age-category 2 and 3 to get two groups (as example)
    nhanes$age[nhanes$age == 3] <- "2"
    nhanes$age<-as.numeric(nhanes$age)
    nhanes$hyp<-as.factor(nhanes$hyp)
    
    #split data into two groups
    nhanes_age_1<-nhanes %>% filter(age==1)
    nhanes_age_2<-nhanes %>% filter(age==2)
    
    #generate predictormatrix
    pred1<-quickpred(nhanes_age_1, mincor=0.1, inc=c('age','bmi'), exc='chl')
    pred2<-quickpred(nhanes_age_2, mincor=0.1, inc=c('age','bmi'), exc='chl')
    
    # separately impute data
    set.seed(121012)
    imp_gen1 <- mice(nhanes_age_1,
                     pred=pred1,
                     m=10,
                     maxit=5,            
                     diagnostics=TRUE,
                     MaxNWts=3000)
    
    imp_gen2 <- mice(nhanes_age_2,
                     pred=pred2,
                     m=10,
                     maxit=5,            
                     diagnostics=TRUE,
                     MaxNWts=3000)
    
    
    #------ ALTERNATIVE 1:
    
    #combine imputed data
    combined_imp<-rbind.mids(imp_gen1,imp_gen2)
    complete_imp<-complete(combined_imp,"long")
    
    #output
       > combined_imp<-rbind.mids(imp_gen1,imp_gen2)
    Warning messages:
    1: In rbind.mids(imp_gen1, imp_gen2) :
      Predictormatrix is not equal in x and y; y$predictorMatrix is ignored
    .
    2: In x$visitSequence == y$visitSequence :
      longer object length is not a multiple of shorter object length
    3: In rbind.mids(imp_gen1, imp_gen2) :
      Visitsequence is not equal in x and y; y$visitSequence is ignored
    .
    
    > complete_imp<-complete(combined_imp,"long")
    Error in inherits(x, "mids") : object 'combined_imp' not found
    
    
    #------ ALTERNATIVE 2:
    
    complete_imp1<-complete(imp_gen1,"long")
    complete_imp2<-complete(imp_gen2,"long")
    combined_imp<-rbind.mids(complete_imp1,complete_imp2)
    
    #Output
    > complete_imp1<-complete(imp_gen1,"long")
    > complete_imp2<-complete(imp_gen2,"long")
    > combined_imp<-rbind.mids(complete_imp1,complete_imp2)
    Error in if (ncol(y) != ncol(x$data)) stop("The two datasets do not have the same number of columns\n") : 
      argument is of length zero
    

3 Answers:

Answer 0 (score: 0)

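To be honest, I don't know the mice package and have only a vague idea of the concept of imputation.

I don't know what kind of analysis you want to run, but you say you would normally do comp_imp<-complete(imp,"long"), so I will try to answer accordingly.

For me, the first approach returns a data.frame, but one with no missing values. That is odd, because complete(imp_gen1,"long") still has missing values in hyp. I don't know what rbind.mids did there.

So I would go with your second approach.

The result of complete(., "long") is an ordinary data.frame, so there is no need to bind it with rbind.mids.

I would change your second approach to: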

library(dplyr)
complete_imp1 <- complete(imp_gen1, "long")
complete_imp2 <- complete(imp_gen2, "long")
combined_imp <- bind_rows(complete_imp1, complete_imp2)
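Once the long-format data are stacked, each value of the `.imp` column identifies one completed data set, so an analysis can be run per imputation. The following is only a rough sketch on the nhanes data shipped with mice; the names `age_1`, `age_2`, `stacked` and `fits`, the split at age 1, and the model `bmi ~ chl` are illustrative assumptions, not part of the question.

```r
library(mice)

# Two illustrative groups (assumption: split on age, as in the question's update)
age_1 <- nhanes[nhanes$age == 1, ]
age_2 <- nhanes[nhanes$age > 1, ]

# Impute each group separately with the same number of imputations
imp_1 <- mice(age_1, m = 5, maxit = 5, seed = 1, printFlag = FALSE)
imp_2 <- mice(age_2, m = 5, maxit = 5, seed = 1, printFlag = FALSE)

# complete(., "long") returns plain data frames, so an ordinary row bind is enough
stacked <- base::rbind(complete(imp_1, "long"), complete(imp_2, "long"))

# Fit one model per completed data set, identified by the .imp column
fits <- lapply(split(stacked, stacked$.imp), function(d) lm(bmi ~ chl, data = d))
sapply(fits, coef)
```

Note that fitting models this way gives one set of estimates per imputation; it does not pool them.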

Answer 1 (score: 0)

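complete_imp1 <- complete(imp_gen1, "long") already returns all 10 (the m argument) imputed data sets, stacked in a single data frame. Its total row count is the number of rows in the original data multiplied by m.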
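The row count can be checked directly. This is a small sketch on the full nhanes data set rather than the asker's subsets; `imp` and `long` are illustrative names.

```r
library(mice)

imp <- mice(nhanes, m = 10, maxit = 5, seed = 121012, printFlag = FALSE)
long <- complete(imp, "long")

# Rows in long format = rows in the original data times m
nrow(long) == nrow(nhanes) * imp$m

# Each imputation number appears once per original row
table(long$.imp)
```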

Answer 2 (score: 0)

You can create a new mids object that contains the 10 imputed data sets for both males and females by calling rbind() directly on the two mids objects, for example rbind(imp_pred_gen0, imp_pred_gen1).

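Doing this calls rbind.mids rather than R's regular rbind function. The returned object can be analysed in the usual way, for example by using with.mids to fit the desired model to each imputed data set.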
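As a hedged sketch of this workflow on the nhanes data, assuming a recent mice version in which rbind() dispatches to the mids method when its first argument is a mids object: the names `age_1`, `age_2`, `imp_all`, the split at age 1, and the model `bmi ~ chl` are illustrative. Base subsetting is used here so the original row names stay unique after binding, and both imputations use the same m and the same columns.

```r
library(mice)

# Base subsetting keeps the original row names
age_1 <- nhanes[nhanes$age == 1, ]
age_2 <- nhanes[nhanes$age > 1, ]

# Impute each group separately, with the same m in both calls
imp_1 <- mice(age_1, m = 5, maxit = 5, seed = 1, printFlag = FALSE)
imp_2 <- mice(age_2, m = 5, maxit = 5, seed = 1, printFlag = FALSE)

# Row-bind the two mids objects into one mids object
imp_all <- rbind(imp_1, imp_2)

# Fit the same model on every completed data set and pool the estimates
fit <- with(imp_all, lm(bmi ~ chl))
summary(pool(fit))
```

Settings such as the predictor matrix are taken from the first object, which is what the warnings in the question's output refer to.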