Extracting reference levels from glm coefficients

Time: 2018-06-10 06:27:49

Tags: r regression logistic-regression

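I know the reference level is not included among the coefficients, but I would like to be able to take a fitted glm object and work out what the reference levels are (i.e. without using any knowledge of the original dataset). Is this stored anywhere in the glm fit object?

Example data below: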

> btest <- data.frame(var1 = sample(c(1,2,3), 100, replace = T),
+                     var2 = sample(c('a','b','c'), 100, replace = T),
+                     var3 = sample(c('e','f','g'), 100, replace = T),
+                     var4 = rnorm(100, mean = 3, 2),
+                     var5 = sample(c('yes','no'), 100, replace = T))
> summary(glm(var5 ~ var1 + var2 + var3 + var4, data = btest, family = 'binomial'))

Call:
glm(formula = var5 ~ var1 + var2 + var3 + var4, family = "binomial", 
    data = btest)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6988  -1.0457  -0.6213   1.1224   1.8904  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -0.81827    0.73173  -1.118   0.2635  
var1         0.55923    0.27279   2.050   0.0404 *
var2b       -0.60998    0.53435  -1.142   0.2536  
var2c       -0.60250    0.51706  -1.165   0.2439  
var3f       -0.81899    0.53345  -1.535   0.1247  
var3g        0.21215    0.51907   0.409   0.6828  
var4         0.04429    0.12650   0.350   0.7263  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 137.99  on 99  degrees of freedom
Residual deviance: 128.35  on 93  degrees of freedom
AIC: 142.35

Number of Fisher Scoring iterations: 4

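I would like to be able to tell that var1 and var4 have no reference level, but that the reference levels of var2 and var3 are 'a' and 'e', respectively, because I am ultimately outputting a table in which the Estimate for these variables is NA at those reference levels.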

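Edit: for those who come along later, I am also wondering how helpful the terms element of the glm fit object would be when used in combination with the answers below...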
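For readers exploring the terms element mentioned in the edit, the following is a small sketch (not from the original post) of what it holds. It assumes the default treatment contrasts; the names tt, data_classes and categorical are illustrative only. The dataClasses attribute records the class of each model variable, and the separate contrasts element records which contrast coding was applied, which matters because the first level is the reference only under treatment contrasts.

```r
set.seed(1)
btest <- data.frame(
  var1 = sample(c(1, 2, 3), 100, replace = TRUE),
  var2 = sample(c("a", "b", "c"), 100, replace = TRUE),
  var3 = sample(c("e", "f", "g"), 100, replace = TRUE),
  var4 = rnorm(100, mean = 3, sd = 2),
  # the response is made a factor explicitly so the binomial fit accepts it
  var5 = factor(sample(c("yes", "no"), 100, replace = TRUE)),
  stringsAsFactors = FALSE
)
my_fit <- glm(var5 ~ var1 + var2 + var3 + var4, data = btest, family = "binomial")

tt <- my_fit$terms
data_classes <- attr(tt, "dataClasses")          # class of every model variable
pred_classes <- data_classes[-attr(tt, "response")]  # drop the response

# predictors that are categorical and therefore have a reference level
categorical <- names(pred_classes)[pred_classes %in% c("factor", "character", "ordered")]
print(categorical)

# contrast coding used for each categorical predictor
print(my_fit$contrasts)
```

The terms element therefore tells you which predictors are categorical, but the levels themselves still have to come from elsewhere in the fit object.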

2 Answers:

Answer 0: (score: 2)

If I save the fit from the question to a variable my_fit, I can then run my_fit$xlevels. For every categorical variable, this shows all of its levels.

You can then relate this back to the model. For example, var1 is not in xlevels, so it is continuous. var2 has 3 levels (a, b, c), and you have estimates for b and c, which means a is the reference. var3 has the categories e, f, g, and you have estimates for f and g, so e must be the reference.
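The comparison described above can be automated. The following is a minimal sketch in base R, not part of the original answer: the helper name find_reference_levels is hypothetical, and it assumes default treatment contrasts and no interaction terms, so that every non-reference level shows up as a coefficient named by the variable followed by the level.

```r
set.seed(1)
btest <- data.frame(
  var1 = sample(c(1, 2, 3), 100, replace = TRUE),
  var2 = sample(c("a", "b", "c"), 100, replace = TRUE),
  var3 = sample(c("e", "f", "g"), 100, replace = TRUE),
  var4 = rnorm(100, mean = 3, sd = 2),
  # the response is made a factor explicitly so the binomial fit accepts it
  var5 = factor(sample(c("yes", "no"), 100, replace = TRUE)),
  stringsAsFactors = FALSE
)
my_fit <- glm(var5 ~ var1 + var2 + var3 + var4, data = btest, family = "binomial")

# Hypothetical helper: for each categorical predictor, return the levels
# that have no matching coefficient name (assumes treatment contrasts).
find_reference_levels <- function(fit) {
  coef_names <- names(coef(fit))
  Map(
    function(v, lv) lv[!(paste0(v, lv) %in% coef_names)],
    names(fit$xlevels),
    fit$xlevels
  )
}

refs <- find_reference_levels(my_fit)
print(refs)
```

Variables that are absent from the result, such as var1 and var4 here, are continuous and have no reference level.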

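Answer 1: (score: 1)

Here is a function that extracts xlevels and uses broom::tidy (with some other manipulation) so that the reference levels appear in a data frame alongside all the other terms: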

library(tidyverse)
library(broom)

tidy_coefs_with_ref <- function(mod_obj, sep = "_"){

  tidy_coefs <- tidy(mod_obj) %>% 
    separate(term, c("variable", "level"), sep, remove = FALSE) %>% 
    mutate(variable = paste0(variable, sep))

  xlevels <- mod_obj$xlevels  

  missing_levels <- xlevels %>% 
    enframe() %>% 
    unnest() %>% 
    set_names(c("variable", "level"))

  missing_levels %>% 
    anti_join(tidy_coefs) %>% 
    bind_rows(tidy_coefs) %>% 
    arrange(variable, level)

}

btest <- tibble(var1 = sample(c(1,2,3), 100, replace = T),
                var2 = sample(c('a','b','c'), 100, replace = T),
                var3 = sample(c('e','f','g'), 100, replace = T),
                var4 = rnorm(100, mean = 3, 2),
                var5 = sample(c(TRUE, FALSE), 100, replace = T)) %>% 
  rename_if(is.character, funs(paste0(., "_")))

btest2 <- glm(var5 ~ ., data = btest, family = 'binomial')

tidy_coefs_with_ref(btest2)
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3 rows [1,
#> 2, 7].
#> Joining, by = c("variable", "level")
#> # A tibble: 9 x 7
#>   variable     level term         estimate std.error statistic p.value
#>   <chr>        <chr> <chr>           <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)_ <NA>  (Intercept)   0.904       0.835    1.08     0.279
#> 2 var1_        <NA>  var1         -0.126       0.269   -0.468    0.640
#> 3 var2_        a     <NA>         NA          NA       NA       NA    
#> 4 var2_        b     var2_b       -0.719       0.515   -1.40     0.162
#> 5 var2_        c     var2_c       -0.632       0.525   -1.21     0.228
#> 6 var3_        e     <NA>         NA          NA       NA       NA    
#> 7 var3_        f     var3_f       -0.379       0.496   -0.764    0.445
#> 8 var3_        g     var3_g        0.429       0.517    0.829    0.407
#> 9 var4_        <NA>  var4         -0.00833     0.111   -0.0749   0.940

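Created on 2019-02-28 by the reprex package (v0.2.1)

(The step that uses separate could be cleaned up.)

Possibly also relevant: here is a link to a gist in which I use effect coding with (an extension of) the above function to also extract the effect magnitude of the dropped level: https://gist.github.com/brshallo/f923b9b5c6360ce09beda35c3d1d55e9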
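One possible way to avoid the separate step altogether is to build the expected term names from xlevels and join on them, so variable names need no special separator. The following base R sketch is an alternative, not the author's function: the name coef_table_with_ref is hypothetical, and it assumes default treatment contrasts, no interaction terms, and at least one categorical predictor.

```r
set.seed(1)
btest <- data.frame(
  var1 = sample(c(1, 2, 3), 100, replace = TRUE),
  var2 = sample(c("a", "b", "c"), 100, replace = TRUE),
  var3 = sample(c("e", "f", "g"), 100, replace = TRUE),
  var4 = rnorm(100, mean = 3, sd = 2),
  var5 = sample(c(TRUE, FALSE), 100, replace = TRUE),
  stringsAsFactors = FALSE
)
my_fit <- glm(var5 ~ ., data = btest, family = "binomial")

# Hypothetical helper: coefficient table with one NA row per reference level.
coef_table_with_ref <- function(fit) {
  est <- coef(summary(fit))  # matrix of estimates, std. errors, z and p values
  coefs <- data.frame(term = rownames(est), est,
                      row.names = NULL, check.names = FALSE)

  # One row per (variable, level) pair, with the term name glm would use
  lev <- do.call(rbind, lapply(names(fit$xlevels), function(v) {
    data.frame(variable = v,
               level = fit$xlevels[[v]],
               term = paste0(v, fit$xlevels[[v]]),
               stringsAsFactors = FALSE)
  }))

  # Full outer join: reference levels get NA estimates,
  # non-categorical terms get NA variable and level
  out <- merge(lev, coefs, by = "term", all = TRUE)
  no_var <- is.na(out$variable)
  out$variable[no_var] <- out$term[no_var]
  out
}

tab <- coef_table_with_ref(my_fit)
print(tab)
```

Because the join key is the full term name, this version does not depend on renaming columns with a trailing underscore.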