I am trying to split the term column into two columns: the variable used in the regression and the category value.
library(MASS)
#> Warning: package 'MASS' was built under R version 3.5.1
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#>
#> select
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(broom)
#> Warning: package 'broom' was built under R version 3.5.1
as_tibble(Titanic) %>%
  dplyr::mutate(y_n = if_else(Survived == "Yes", 1, 0)) %>%
  glm(y_n ~ Class + n + Age + Sex, data = .) %>%
  broom::tidy() %>%
  print(n = 10)
#> # A tibble: 7 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)  0.567    0.245       2.31    0.0294
#> 2 Class2nd    -0.00528  0.276      -0.0192  0.985
#> 3 Class3rd     0.0503   0.279       0.180   0.858
#> 4 ClassCrew    0.0740   0.283       0.262   0.796
#> 5 n           -0.00106  0.000907   -1.16    0.255
#> 6 AgeChild    -0.131    0.225      -0.582   0.566
#> 7 SexMale      0.0833   0.208       0.401   0.692
Created on 2018-11-02 by the reprex package (v0.2.1)
I need something like this
Answer 0 (score: 2)
Perhaps the following will do:
library(stringr) # provides str_replace(), which is not loaded in the question
df <- as_tibble(Titanic) %>% dplyr::mutate(y_n = if_else(Survived == "Yes", 1, 0))
m <- glm(y_n ~ Class + n + Age + Sex, data = df)
(trm <- attr(m$terms, "term.labels")) # Getting original variables
# [1] "Class" "n" "Age" "Sex"
(asgn <- attr(model.matrix(m$formula, data = df), "assign")) # See ?model.matrix
# [1] 0 1 1 1 2 3 4
# asgn maps each coefficient to its term (0 = intercept); strip that term's name from the coefficient name
cbind(Term = trm[asgn[-1]],
      Category = str_replace(names(coef(m)[-1]), trm[asgn[-1]], ""))
#      Term    Category
# [1,] "Class" "2nd"
# [2,] "Class" "3rd"
# [3,] "Class" "Crew"
# [4,] "n"     ""
# [5,] "Age"   "Child"
# [6,] "Sex"   "Male"
The intercept row is missing, but if you need it, you can add it back whenever asgn[1] == 0.
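If it helps, here is a minimal sketch of one way to keep the intercept row and attach both columns to the broom::tidy() output. The column names variable and category and the object res are my own choices, not anything from broom. It assumes the rows of tidy(m) come in the same order as the model-matrix columns (no dropped or aliased coefficients) and that the model has no interaction terms, since an interaction's coefficient name is not a simple "variable + level" prefix.

```r
library(dplyr)
library(broom)

# Same data and model as in the question
df <- as_tibble(Titanic) %>%
  mutate(y_n = if_else(Survived == "Yes", 1, 0))
m <- glm(y_n ~ Class + n + Age + Sex, data = df)

trm  <- attr(terms(m), "term.labels")       # "Class" "n" "Age" "Sex"
asgn <- attr(model.matrix(m), "assign")     # 0 marks the intercept column

# One variable name per coefficient; position 0 maps to "(Intercept)"
variable <- c("(Intercept)", trm)[asgn + 1]

res <- tidy(m) %>%
  mutate(
    variable = variable,
    # Drop the variable-name prefix by length, so no regex is involved;
    # the intercept and numeric predictors end up with an empty category
    category = substring(term, nchar(variable) + 1)
  ) %>%
  select(variable, category, everything(), -term)

print(res, n = 10)
```

Stripping the prefix by character count rather than by pattern avoids surprises when a variable name contains regex metacharacters such as a dot.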