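When running log-linear GLMs in R, I ran into a problem with purrr::map and broom::tidy. For some reason, the model p-values are not printed when I run multiple models, but they are printed for a single model. In the end, I want the multiple-model output to show p-values for each model, just as the single-model output does. The example uses the built-in "Titanic" dataset (see William King's website).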
data(Titanic)
#convert to data frame
T.df <- as.data.frame(Titanic)
head(T.df)
#run glm as loglinear model
model1 <- glm(Freq ~ Sex * Survived, family = poisson, data = T.df)
#print model with tidy--p-values print here
broom::tidy(anova(model1, test = "Chisq"))
#Now run multiple models by class
#Note the models print just fine but without p values
library(dplyr)  # provides %>%
T.df %>%
  tidyr::nest(-Class) %>%
  dplyr::mutate(
    fit = purrr::map(data, ~ anova(glm(Freq ~ Sex * Survived, family = poisson, data = .x)), test="Chisq"),
    tidied = purrr::map(fit, broom::tidy)
  ) %>%
  tidyr::unnest(tidied)
I am also wondering: how can I stop broom::tidy from printing warning messages about unrecognized columns?
Thanks in advance.
Answer 0 (score: 1)
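The problem is where the closing parenthesis of anova sits: test = "Chisq" ends up outside the anova call, so it is passed to purrr::map as an extra argument rather than to anova, i.e.
anova(glm(Freq ~ Sex * Survived, family = poisson, data = .x)), test="Chisq")
                                                            ^^^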
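To see why the misplaced argument goes unnoticed, here is a minimal sketch. It assumes purrr is installed, and the names doubled, m, and with_test are made up for illustration. Extra arguments given to purrr::map are forwarded to the mapped function, and a formula-style lambda accepts and ignores them, so no error is raised. Whether anova shows p-values when no test is given may depend on the R version, so the sketch only checks the explicit test = "Chisq" case.

```r
library(purrr)

# The stray argument is forwarded to the lambda, which silently ignores it.
doubled <- map(1:3, ~ .x * 2, test = "Chisq")

data(Titanic)
T.df <- as.data.frame(Titanic)
m <- glm(Freq ~ Sex * Survived, family = poisson, data = T.df)

# Passing test = "Chisq" to anova itself adds the p-value column.
with_test <- anova(m, test = "Chisq")
names(with_test)
```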
With the closing parenthesis in the right place:
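library(dplyr)
library(tidyr)
library(purrr)
T.df %>%
  nest(-Class) %>%
  # fit one model per class, then run the test inside the anova call
  mutate(tidied = map(data, ~
    glm(Freq ~ Sex * Survived, family = poisson, data = .x) %>%
      anova(., test = "Chisq") %>%
      broom::tidy(.))) %>%
  unnest(tidied)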
# A tibble: 16 x 7
# Class term df Deviance Resid..Df Resid..Dev p.value
# <fct> <chr> <int> <dbl> <int> <dbl> <dbl>
# 1 1st NULL NA NA 7 590. NA
# 2 1st Sex 1 3.78 6 586. 5.20e- 2
# 3 1st Survived 1 20.4 5 566. 6.28e- 6
# 4 1st Sex:Survived 1 162. 4 404. 4.78e- 37
# 5 2nd NULL NA NA 7 476. NA
# 6 2nd Sex 1 18.9 6 457. 1.37e- 5
# 7 2nd Survived 1 8.47 5 449. 3.62e- 3
# 8 2nd Sex:Survived 1 163. 4 286. 2.54e- 37
# 9 3rd NULL NA NA 7 876. NA
#10 3rd Sex 1 145. 6 732. 2.54e- 33
#11 3rd Survived 1 181. 5 550. 2.36e- 41
#12 3rd Sex:Survived 1 57.8 4 493. 2.92e- 14
#13 Crew NULL NA NA 7 2535. NA
#14 Crew Sex 1 1014. 6 1522. 2.02e-222
#15 Crew Survived 1 252. 5 1269. 7.85e- 57
#16 Crew Sex:Survived 1 42.4 4 1227. 7.63e- 11
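As for the warnings about unrecognized columns, one blunt option is to wrap the tidy call in base R's suppressWarnings. The sketch below is only an illustration under assumptions: tidy_quietly is a hypothetical helper, not part of broom, and it hides every warning raised by the call, not just the column-name ones. It uses split and lapply instead of nest and unnest so that it does not depend on a particular tidyr version.

```r
library(dplyr)

data(Titanic)
T.df <- as.data.frame(Titanic)

# Hypothetical helper: tidy an object while discarding all warnings.
tidy_quietly <- function(x) suppressWarnings(broom::tidy(x))

# Fit one model per class and tidy each anova table.
per_class <- lapply(split(T.df, T.df$Class), function(d) {
  fit <- glm(Freq ~ Sex * Survived, family = poisson, data = d)
  tidy_quietly(anova(fit, test = "Chisq"))
})

# Stack the tables; the list names become the Class column.
result <- bind_rows(per_class, .id = "Class")
result
```

If only specific warnings should be silenced, withCallingHandlers with a check on the warning message would be a more selective alternative.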