Splitting GLM terms into variable name and value

Time: 2018-11-03 00:30:01

Tags: r dplyr glm broom

I am trying to split the term column into two columns: the variable used in the regression and the value of the category.

  library(MASS)
#> Warning: package 'MASS' was built under R version 3.5.1
  library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#> 
#>     select
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  library(broom)
#> Warning: package 'broom' was built under R version 3.5.1
as_tibble(Titanic) %>%
  dplyr::mutate(y_n = if_else(Survived == "Yes", 1, 0)) %>%
  glm(y_n ~ Class + n + Age + Sex, data = .) %>%
  broom::tidy() %>%
  print(n = 10)
#> # A tibble: 7 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)  0.567    0.245       2.31    0.0294
#> 2 Class2nd    -0.00528  0.276      -0.0192  0.985 
#> 3 Class3rd     0.0503   0.279       0.180   0.858 
#> 4 ClassCrew    0.0740   0.283       0.262   0.796 
#> 5 n           -0.00106  0.000907   -1.16    0.255 
#> 6 AgeChild    -0.131    0.225      -0.582   0.566 
#> 7 SexMale      0.0833   0.208       0.401   0.692

Created on 2018-11-02 by the reprex package (v0.2.1)

I need something like this:

[image in the original post showing the desired output]

1 Answer:

Answer 0: (score: 2)

Perhaps the following will do:

df <- as_tibble(Titanic) %>%  dplyr::mutate(y_n = if_else(Survived == "Yes", 1, 0))
m <- glm(y_n ~ Class + n + Age + Sex, data = df)

(trm <- attr(m$terms, "term.labels")) # Getting original variables
# [1] "Class" "n"     "Age"   "Sex"  
(asgn <- attr(model.matrix(m$formula, data = df), "assign")) # See ?model.matrix
# [1] 0 1 1 1 2 3 4 

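# str_replace() comes from stringr, which is not attached above, so call it with its namespace
cbind(Term = trm[asgn[-1]], 
      Category = stringr::str_replace(names(coef(m)[-1]), trm[asgn[-1]], ""))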
#      Term    Category
# [1,] "Class" "2nd"   
# [2,] "Class" "3rd"   
# [3,] "Class" "Crew"  
# [4,] "n"     ""      
# [5,] "Age"   "Child" 
# [6,] "Sex"   "Male" 

The intercept row is missing, but if needed it can be added by handling the case where asgn[1] == 0.
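One possible way to keep the intercept and attach the two new columns directly to the broom::tidy() output is sketched below. It is only a sketch under a few assumptions: the column names variable and category are illustrative choices, the intercept gets NA as its variable and an empty category, and no coefficient is dropped from the fit, so the rows of tidy(m) line up one-to-one with the columns of the model matrix.

```r
library(dplyr)
library(broom)

df <- as_tibble(Titanic) %>%
  mutate(y_n = if_else(Survived == "Yes", 1, 0))
m <- glm(y_n ~ Class + n + Age + Sex, data = df)

trm  <- attr(terms(m), "term.labels")     # original variables
asgn <- attr(model.matrix(m), "assign")   # 0 marks the intercept column

# Map every coefficient to its source variable; position 0 (intercept) becomes NA
variable <- c(NA_character_, trm)[asgn + 1]

result <- tidy(m) %>%
  mutate(
    variable = variable,
    # Drop the variable name from the front of the term to leave the level
    category = if_else(is.na(variable), "",
                       substring(term, nchar(variable) + 1))
  ) %>%
  dplyr::select(term, variable, category, estimate,
                std.error, statistic, p.value)

print(result)
```

Using substring() on the length of the variable name, rather than a regular-expression replace, avoids surprises when a variable name contains regex metacharacters or also appears inside a level name.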