broom::augment ignores data columns

Time: 2019-08-26 23:34:32

Tags: r broom

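broom::augment only outputs the columns of the data that are used in the formula. This is problematic behavior, because it can sometimes be very helpful to be able to look up something like a respondent ID. Using the newdata argument might be a workaround, but it still does not solve the problem when working with nested data.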

Further explanation inline:

#packages used below
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

#simulated glm data
glmdata = data.frame(ID=1:100, A=rnorm(100), B=rnorm(100)) %>% mutate(response=rbinom(length(ID),1,1/(1+exp(-2*A-3*B))  ))

#fit model, not including the ID variable
glmfit = glm(response~A+B, glmdata,family='binomial')

#ID variable is contained in glm$data
str(glmfit$data)

#works!
head(glmfit$data$ID)


#use broom::augment
augmented = glmfit %>% augment

#does not work, wth broom?!
augmented$ID


#ok ... I could use the newdata argument
augmented = glmfit %>% augment(newdata=glmdata)
augmented$ID


#however, that is a hacky workaround ....

#... and it does not fix the following scenario:

#Let's say I want to use nest


#simulated glm data
glmdata1 = data.frame(segm=1,ID=1:100, A=rnorm(100), B=rnorm(100)) %>% mutate(response=rbinom(length(ID),1,1/(1+exp(-2*A-3*B))  ))
glmdata2 = data.frame(segm=2,ID=1:100, A=rnorm(100), B=rnorm(100)) %>% mutate(response=rbinom(length(ID),1,1/(1+exp(-3*A-2*B))  ))

glmdata_nest = rbind(glmdata1,glmdata2) %>% group_by(segm) %>% nest


#fit the two models via map
glmfit_nest= glmdata_nest %>% mutate(model=map(data, glm, formula=response~A+B, family='binomial') )

#run augment via map
glmfit_nest_augmented = glmfit_nest %>% mutate(augmented = map(model,augment))

#ID is not here ...
glmfit_nest_augmented$augmented$ID


#ok, so then we have to use map2 ....
glmfit_nest_augmented = glmfit_nest %>% mutate(augmented = map2(model,data,augment,newdata=.y))

#but even this doesn't work

#also, trying to recycle glm$data does not work
glmfit_nest_augmented = glmfit_nest %>% mutate(augmented = map(model,augment,newdata=.$data))

Update: the broom developers chose this inconsistent behavior deliberately: https://github.com/tidymodels/broom/issues/753

1 Answer:

Answer 0 (score: 2)

Here, .x and .y only work inside an anonymous function call written with ~

glmfit_nest_augmented <- glmfit_nest %>%
         mutate(augmented = map2(model, data, ~ augment(.x, newdata = .y)))
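
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the nested data from the question and then unnests the augmented column to confirm that ID survives. The helper sim(), the set.seed() call and the ungroup() after nest() are my own additions for illustration and are not part of the original post.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: added only so the sketch is reproducible

# hypothetical helper that simulates one segment like the question does
sim <- function(segm, a, b) {
  data.frame(segm = segm, ID = 1:100, A = rnorm(100), B = rnorm(100)) %>%
    mutate(response = rbinom(n(), 1, 1 / (1 + exp(-a * A - b * B))))
}

# nest by segment; ungroup so the later mutate works on plain list columns
glmdata_nest <- bind_rows(sim(1, 2, 3), sim(2, 3, 2)) %>%
  group_by(segm) %>%
  nest() %>%
  ungroup()

result <- glmdata_nest %>%
  mutate(
    # fit one model per segment
    model = map(data, ~ glm(response ~ A + B, data = .x, family = "binomial")),
    # .x is the model, .y is that segment's full data frame (including ID)
    augmented = map2(model, data, ~ augment(.x, newdata = .y))
  ) %>%
  select(segm, augmented) %>%
  unnest(augmented)

# ID is carried through because newdata contained it
head(result$ID)
```

The version in the question, map2(model, data, augment, newdata = .y), fails because .y is only defined inside a ~ lambda; passed as a plain extra argument it is just an undefined name.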