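As a first step toward building a multivariable logistic regression, I am running univariable regressions and want to keep the variables with p < 0.20 for the multivariable model. I can map the variables I need to glm and get the model output, but I am struggling to rank the results by p-value.
Here is what I have so far: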
predictor1 <- c(0,1.1,2.4,3.1,4.0,5.9,4.2,3.3,2.2,1.1)
predictor2 <- as.factor(c("yes","no","no","yes","yes","no","no","yes","no","no"))
predictor3 <- as.factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c"))
outcome <- as.factor(c("alive","dead","alive","dead","alive","dead","alive","dead","alive","dead"))
df <- data.frame(pred1 = predictor1, pred2 = predictor2, pred3 = predictor3, outcome = outcome)
predictors <- c("pred1", "pred2", "pred3")
library(dplyr)
library(purrr)

df %>%
  select(all_of(predictors)) %>%
  map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
  # Extract odds ratio, confidence interval lower and upper bounds, and p value
  map(function(x) data.frame(OR = exp(coef(x)),
                             lower = exp(confint(x)[, 1]),
                             upper = exp(confint(x)[, 2]),
                             Pval = coef(summary(x))[, 4]))
This code prints a summary for each model:
$pred1
                  OR      lower    upper      Pval
(Intercept) 0.711082 0.04841674 8.521697 0.7818212
.x          1.133085 0.52179227 2.653040 0.7465663

$pred2
            OR      lower    upper Pval
(Intercept)  1 0.18507173  5.40331    1
.xyes        1 0.07220425 13.84960    1

$pred3
                      OR     lower      upper      Pval
(Intercept)         0.25 0.0127798   1.689944 0.2149978
.xb         170179249.43 0.0000000         NA 0.9961777
.xc                12.00 0.6908931 542.678010 0.1220957
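In my real dataset, however, there are many predictors, so I need a way to sort the output, ideally by the smallest (non-intercept) p-value in each model. Perhaps the data structure I chose for each model's summary is not the best, so any suggestions on getting the same information in a more flexible structure would also be welcome.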
Answer 0 (score: 1)
Use map_dfr instead of map, filter out the intercept rows, and then sort with arrange. Use tidy from broom instead of a custom function.
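library(dplyr)
library(purrr)
library(broom)
df %>%
  select(all_of(predictors)) %>%
  map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
  # Stack the per-model coefficient tables; the predictor name goes into `Model`
  map_dfr(tidy, .id = "Model") %>%
  # Drop the intercept rows and sort by p-value
  filter(term != "(Intercept)") %>%
  arrange(p.value)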
# A tibble: 4 x 6
  Model term   estimate std.error statistic p.value
  <chr> <chr>     <dbl>     <dbl>     <dbl>   <dbl>
1 pred3 .xc    2.48e+ 0     1.61   1.55e+ 0   0.122
2 pred1 .x     1.25e- 1     0.387  3.23e- 1   0.747
3 pred3 .xb    1.90e+ 1  3956.     4.79e- 3   0.996
4 pred2 .xyes -5.73e-16     1.29  -4.44e-16   1.000
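The question asked for the smallest non-intercept p-value per model and a p < 0.20 cut-off. The pipeline above can be extended in that direction. What follows is only a sketch under a few assumptions: the names `ranked` and `selected` are made up for illustration, the formula is built with `reformulate()` so the terms read `pred3c` rather than `.xc`, and it needs dplyr 1.0 or later for `slice_min()`.

```r
library(dplyr)
library(purrr)
library(broom)

# Toy data from the question
df <- data.frame(
  pred1 = c(0, 1.1, 2.4, 3.1, 4.0, 5.9, 4.2, 3.3, 2.2, 1.1),
  pred2 = factor(c("yes", "no", "no", "yes", "yes", "no", "no", "yes", "no", "no")),
  pred3 = factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c")),
  outcome = factor(c("alive", "dead", "alive", "dead", "alive",
                     "dead", "alive", "dead", "alive", "dead"))
)
predictors <- c("pred1", "pred2", "pred3")

# `ranked` (illustrative name): one row per model, holding the coefficient
# with the smallest non-intercept p-value, sorted by that p-value
ranked <- predictors %>%
  set_names() %>%                                   # name each element after itself
  map(~ glm(reformulate(.x, response = "outcome"),  # outcome ~ <predictor>
            data = df, family = binomial)) %>%
  map_dfr(tidy, .id = "Model") %>%
  filter(term != "(Intercept)") %>%
  group_by(Model) %>%
  slice_min(p.value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(p.value)

# `selected` (illustrative name): predictors passing the p < 0.20 screen
selected <- ranked %>%
  filter(p.value < 0.20) %>%
  pull(Model)

ranked
selected
```

If odds ratios and confidence intervals are wanted in the same table, `tidy()` for glm objects also accepts `exponentiate = TRUE` and `conf.int = TRUE`. On data with separation, such as the `b` level of `pred3` here, expect warnings and possibly missing interval bounds.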
Answer 1 (score: 0)
You can simply use a do.call(rbind) approach and then order by p-value. The [-1, ] drops the intercept.
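# Fit one model per predictor and keep its coefficient table without the intercept.
# simplify = FALSE makes sapply() return a named list even when every
# predictor contributes a single row, so rbind always gets a list.
pl <- do.call(rbind, sapply(predictors, function(x) {
  fo <- reformulate(x, response = "outcome")
  summary(glm(fo, data = df, family = "binomial"))$coef[-1, ]
}, simplify = FALSE))
# Sort by the p-value column
pl[order(pl[, 4]), ]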
#             Estimate   Std. Error       z value  Pr(>|z|)
# pred3c  2.484907e+00    1.6072751  1.546037e+00 0.1220957
# pred1   1.249440e-01    0.3866195  3.231703e-01 0.7465663
# pred3b  1.895236e+01 3956.1804861  4.790571e-03 0.9961777
# pred2  -5.733167e-16    1.2909944 -4.440892e-16 1.0000000
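A factor with more than two levels, such as pred3, contributes several rows with separate Wald p-values. If a single p-value per predictor is preferred for the p < 0.20 screen, one option is a likelihood-ratio test of each one-predictor model against the intercept-only model. This is a different technique from the one shown in this answer. It is a base R sketch, and `lrt_pvalue`, `pvals` and `selected` are hypothetical names introduced for illustration.

```r
# Toy data from the question
df <- data.frame(
  pred1 = c(0, 1.1, 2.4, 3.1, 4.0, 5.9, 4.2, 3.3, 2.2, 1.1),
  pred2 = factor(c("yes", "no", "no", "yes", "yes", "no", "no", "yes", "no", "no")),
  pred3 = factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c")),
  outcome = factor(c("alive", "dead", "alive", "dead", "alive",
                     "dead", "alive", "dead", "alive", "dead"))
)
predictors <- c("pred1", "pred2", "pred3")

# Hypothetical helper: likelihood-ratio p-value for one predictor.
# The drop in deviance relative to the intercept-only model is compared
# with a chi-squared distribution whose degrees of freedom equal the
# number of coefficients the predictor adds.
lrt_pvalue <- function(x) {
  fit <- glm(reformulate(x, response = "outcome"), data = df, family = binomial)
  pchisq(fit$null.deviance - fit$deviance,
         df = fit$df.null - fit$df.residual,
         lower.tail = FALSE)
}

# One p-value per predictor, smallest first
pvals <- sort(vapply(predictors, lrt_pvalue, numeric(1)))
pvals

# Predictors passing the p < 0.20 screen
selected <- names(pvals)[pvals < 0.20]
selected
```

Because `vapply()` keeps the predictor names, `pvals` is a named numeric vector that can be sorted and filtered directly, with no need to strip intercept rows.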
Data
df <- structure(list(pred1 = c(0, 1.1, 2.4, 3.1, 4, 5.9, 4.2, 3.3,
2.2, 1.1), pred2 = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("no", "yes"), class = "factor"), pred3 = structure(c(1L,
2L, 3L, 3L, 1L, 3L, 1L, 1L, 1L, 3L), .Label = c("a", "b", "c"
), class = "factor"), outcome = structure(c(1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), .Label = c("alive", "dead"), class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
predictors <- c("pred1", "pred2", "pred3")