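I want to extract the coefficients of a glm, including not only the rows whose p-values could be computed but also the rows that could not be computed and are reported as NA. How can I extract the coefficients, NA rows included, as a matrix or data.frame?
This is what I need to extract: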
   Estimate Std. Error z value Pr(>|z|)
x1  0.10909    0.05552   1.965   0.0494
x2       NA         NA      NA       NA
x3       NA         NA      NA       NA
x4  0.05472    0.12871   0.425   0.6707
x5 -0.07880    0.17616  -0.447   0.6547
I do not need the following, which omits the NA rows:
coef(outSummary)
               Estimate  Std. Error    z value   Pr(>|z|)
(Intercept) -8.38909359 26.07327652 -0.3217506 0.74764161
x1           0.10908801  0.05551894  1.9648793 0.04942821
x4           0.05471872  0.12871334  0.4251208 0.67074860
x5          -0.07879775  0.17616064 -0.4473062 0.65465396
Here is example code that reproduces the problem.
maxRow = 12
maxX = 5
dfA = data.frame(matrix(data = 0, nrow = maxRow, ncol = (maxX+1)) )
colnames(dfA) = c("y", paste0("x", 1:maxX) )
dfA$y = c( rep(0, maxRow*0.5), rep(1, maxRow*0.5))
xWithData = paste0("x", c(1, 4:maxX) )
ctSeed = 384
set.seed(ctSeed)
dfA[, xWithData] = apply(dfA[ , xWithData ], MARGIN = 2, FUN = function(x) ( 1 * seq_len(maxRow) + round(rnorm(n = maxRow, mean = 100, sd = 10) ) ) )
dfA
outGlm = glm( y ~ ., family = binomial(link='logit'), data=dfA )
(outSummary = summary(outGlm) )
(outCoef = outSummary$coefficients )
Answer 0 (score: 1)
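coef(outSummary) seems to always drop the predictors whose estimates are NA.
So one way to get a complete table of estimates for all predictors is to take the predictor names from attr(outSummary$terms, "term.labels") and merge them with the entries of coef(outSummary) using dplyr::full_join. Predictors that have no row in coef(outSummary) then come through as NA. Here is a tidyverse approach: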
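library(tidyverse)
# Turn the coefficient matrix into a data.frame keyed by predictor name,
# then join against the full list of model terms so dropped ones appear as NA.
data.frame(coef(outSummary)) %>%
  rownames_to_column("variable") %>%
  full_join(data.frame(variable = attr(outSummary$terms, "term.labels")), by = "variable") %>%
  arrange(variable)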
#     variable    Estimate  Std..Error    z.value   Pr...z..
#1 (Intercept) -8.38909359 26.07327652 -0.3217506 0.74764161
#2          x1  0.10908801  0.05551894  1.9648793 0.04942821
#3          x2          NA          NA         NA         NA
#4          x3          NA          NA         NA         NA
#5          x4  0.05471872  0.12871334  0.4251208 0.67074860
#6          x5 -0.07879775  0.17616064 -0.4473062 0.65465396
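If you would rather not depend on the tidyverse, the same idea can be sketched in base R. This relies on coef() of the fitted glm object keeping the NA entries, while coef() of its summary drops them. The data below is rebuilt inline so the snippet runs on its own, and the names allCoef, estTable and fullTable are only illustrative, not from the question.

```r
# Rebuild data shaped like the question's: x2 and x3 are constant,
# so their coefficients cannot be estimated and come back as NA.
set.seed(384)
n <- 12
dfA <- data.frame(
  y  = rep(c(0, 1), each = n / 2),
  x1 = seq_len(n) + round(rnorm(n, mean = 100, sd = 10)),
  x2 = 0,
  x3 = 0,
  x4 = seq_len(n) + round(rnorm(n, mean = 100, sd = 10)),
  x5 = seq_len(n) + round(rnorm(n, mean = 100, sd = 10))
)
outGlm <- glm(y ~ ., family = binomial(link = "logit"), data = dfA)

allCoef  <- coef(outGlm)           # named vector, NA entries kept
estTable <- coef(summary(outGlm))  # matrix, NA rows dropped

# Start from an all-NA matrix with one row per coefficient,
# then fill in the rows that the summary actually reports.
fullTable <- matrix(
  NA_real_,
  nrow = length(allCoef),
  ncol = ncol(estTable),
  dimnames = list(names(allCoef), colnames(estTable))
)
fullTable[rownames(estTable), ] <- estTable

fullTable

# Drop the intercept row if only the predictors are wanted.
fullTable[rownames(fullTable) != "(Intercept)", , drop = FALSE]
```

The result stays a numeric matrix with the original column names such as Pr(>|z|); wrap it in as.data.frame() if a data.frame is needed. summary(outGlm)$aliased gives a named logical vector marking which coefficients were dropped, which can serve as a cross-check.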