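How to get the mean of categorical data with tbl_summary

Time: 2020-10-08 18:50:27

Tags: r dplyr gtsummary

I want to produce both means and frequencies for a subset of my categorical variables.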

mtcars2 <- mtcars %>% mutate(across(matches('cyl|gear|carb'), as.factor))

I know I can use the following to get separate output for categorical and continuous variables.

mtcars_out <- tbl_summary(mtcars2, 
                          statistic = list(all_numeric() ~ "{mean} ({sd})",
                                           all_categorical() ~ "{n} / {N} ({p}%)")) %>% as_tibble()

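Since mtcars2$cyl already has levels associated with it, I would like to use mtcars2 as is and also produce the mean of that variable. Something like this... but tbl_summary does not like it, because cyl is a categorical variable.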

mtcars_out <- tbl_summary(mtcars2, 
                          statistic = list(all_numeric() ~ "{mean} ({sd})",
                                           "cyl"~"{mean} ({sd})")) %>% as_tibble()

Error: Problem with `mutate()` input `tbl_stats`.
x There was an error assembling the summary statistics for 'cyl'
  with summary type 'categorical'.

There are 2 common sources for this error.
1. You have requested summary statistics meant for continuous
   variables for a variable being as summarized as categorical.
   To change the summary type to continuous, add the argument
  `type = list(cyl ~ 'continuous')`
2. One of the functions or statistics from the `statistic=` argument is not valid.
i Input `tbl_stats` is `pmap(...)`.

I tried specifying the type in the call, but that did not work either.

mtcars_out <- tbl_summary(mtcars2, 
                          type = list("cyl"~"continuous"),
                          statistic = list(all_numeric() ~ "{mean} ({sd})",
                                           all_categorical() ~ "{n} / {N} ({p}%)")) %>% as_tibble()



Error: Problem with `mutate()` input `summary_type`.
x Column 'cyl' is class "factor" and cannot be summarized as a continuous variable.
i Input `summary_type` is `assign_summary_type(...)`.

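My actual dataset has 500 variables, and a class has already been assigned to each one, so I do not want to change the class types in the original dataset. I would like to specify this in the tbl_summary call instead.

Any help would be greatly appreciated!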

1 Answer:

Answer 0: (score: 0)

You have made cyl a factor, and R does not allow you to take the mean of a factor variable.
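To illustrate the point about factors, here is a minimal base-R sketch. It is not part of the original answer, and the names cyl_fct and cyl_num are made up for illustration. It shows that mean() returns NA (with a warning) for a factor, and that converting through as.character() recovers the original numbers, whereas calling as.numeric() directly on a factor returns the internal level codes.

```r
# cyl stored as a factor, as in the question's mtcars2 (illustrative name)
cyl_fct <- factor(mtcars$cyl)

# mean() on a factor returns NA and emits a warning
m_fct <- suppressWarnings(mean(cyl_fct))
is.na(m_fct)               # TRUE

# as.numeric() on a factor gives the level codes 1, 2, 3, not 4, 6, 8
mean(as.numeric(cyl_fct))  # 2.09375

# go through as.character() to recover the original values
cyl_num <- as.numeric(as.character(cyl_fct))
mean(cyl_num)              # 6.1875
```

The detour through as.character() matters whenever the factor labels are themselves numbers, as they are for cyl.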

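I think the simplest approach for you is to keep both a numeric version and a factor version of the variable. You can then summarize both variables, and afterwards remove the extra header row (the one for the factor version of the variable).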

library(gtsummary)
library(tidyverse)

tbl <- 
  mtcars %>%
  select(cyl) %>%
  mutate(fct_cyl = factor(cyl)) %>%
  tbl_summary(
    type = where(is.numeric) ~ "continuous",
    statistic = where(is.numeric) ~ "{mean} ({sd})",
    label = cyl ~ "No. Cylinders"
  ) 

# remove extra header row for factor variables
tbl$table_body <-
  tbl$table_body %>%
  filter(!(startsWith(variable, "fct_") & row_type == "label"))

# print table
tbl
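The example above starts from numeric data and adds a factor copy. If the data already hold factors, as in the question, the same idea can be turned around by adding numeric copies and leaving the factor columns untouched. The following is only a sketch under assumptions: the num_ prefix and the names vars, num_copies and mtcars_both are hypothetical, the data preparation uses base R, and the tbl_summary call simply reuses the type and statistic arguments from the answer's code. It is guarded so that the data preparation still runs when gtsummary is not installed.

```r
# Factor columns as in the question
vars <- c("cyl", "gear", "carb")
mtcars2 <- mtcars
mtcars2[vars] <- lapply(mtcars2[vars], as.factor)

# Numeric copies with a hypothetical "num_" prefix; the factors stay as they are
num_copies <- lapply(mtcars2[vars], function(x) as.numeric(as.character(x)))
names(num_copies) <- paste0("num_", vars)
mtcars_both <- cbind(mtcars2[vars], as.data.frame(num_copies))

# Summarize both versions, reusing the arguments from the code above
if (requireNamespace("gtsummary", quietly = TRUE)) {
  library(gtsummary)
  tbl_both <- tbl_summary(
    mtcars_both,
    type = where(is.numeric) ~ "continuous",
    statistic = where(is.numeric) ~ "{mean} ({sd})"
  )
  print(tbl_both)
}
```

With 500 variables, vars could be built programmatically, for example from the names of all factor columns, rather than typed by hand.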
