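I am working with a grouped dataset and want to add four summary statistics as four new columns: count, mean, lower confidence limit and upper confidence limit.
I can get the mean and the lower and upper confidence limits like this: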
library(Hmisc)
library(dplyr)
# summarize the mean and confidence limits per group as new columns (no count yet)
mtcars %>% group_by(vs, am) %>%
do(
as.data.frame(as.list(smean.cl.normal(.$mpg)))
)
# vs am Mean Lower Upper
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 0 15.05000 13.28723 16.81277
# 2 0 1 19.75000 15.54295 23.95705
# 3 1 0 20.74286 18.45750 23.02822
# 4 1 1 28.37143 23.97129 32.77157
However, when I add the count, the new columns become two list-columns:
df <- mtcars %>% group_by(vs, am) %>%
do(
n = length(.$mpg),
stats = smean.cl.normal(.$mpg)
)
# # A tibble: 4 × 4
# vs am n stats
# * <dbl> <dbl> <list> <list>
# 1 0 0 <int [1]> <dbl [3]>
# 2 0 1 <int [1]> <dbl [3]>
# 3 1 0 <int [1]> <dbl [3]>
# 4 1 1 <int [1]> <dbl [3]>
The output I want is:
# vs am n Mean Lower Upper
# <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 0 0 12 15.05000 13.28723 16.81277
# 2 0 1 6 19.75000 15.54295 23.95705
# 3 1 0 7 20.74286 18.45750 23.02822
# 4 1 1 7 28.37143 23.97129 32.77157
How can I achieve this conveniently?
Thanks in advance.
I have also tried:
mtcars %>% group_by(vs, am) %>%
do(
as.data.frame(as.list(c(length(.$mpg), smean.cl.normal(.$mpg))))
)
# Source: local data frame [4 x 8]
# Groups: vs, am [4]
#
# vs am X12 Mean Lower Upper X6 X7
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 0 12 15.05000 13.28723 16.81277 NA NA
# 2 0 1 NA 19.75000 15.54295 23.95705 6 NA
# 3 1 0 NA 20.74286 18.45750 23.02822 NA 7
# 4 1 1 NA 28.37143 23.97129 32.77157 NA 7
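This produces a strange result: the count ends up in a different column (X12, X6, X7) for each group size, with NA everywhere else.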
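The odd X12, X6 and X7 columns appear to come from the count being the only unnamed element of the vector, so data.frame() builds a column name from its value and each group size gets its own column. A minimal base R sketch of that effect follows; the `stats` vector is a hypothetical stand-in for the output of smean.cl.normal, with values rounded from the first group above.

```r
# Hypothetical stand-in for the named vector returned by smean.cl.normal()
stats <- c(Mean = 15.05, Lower = 13.29, Upper = 16.81)

# Unnamed count: data.frame() invents a column name from the value itself
unnamed <- as.data.frame(as.list(c(12, stats)))
names(unnamed)
# "X12" "Mean" "Lower" "Upper"

# Named count: every group would now produce the same column names
named <- as.data.frame(as.list(c(n = 12, stats)))
names(named)
# "n" "Mean" "Lower" "Upper"
```

In the pipeline above this would mean writing c(n = length(.$mpg), smean.cl.normal(.$mpg)) inside do(). Note that the count is then stored as a double rather than an integer.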
Answer 0 (score: 1)
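You can do this without do by using several tidyverse packages, namely tidyr, dplyr, purrr and broom.
The reasoning behind this is that do will eventually be replaced by purrr.
Here is what the code does: (1) group by the variables of interest, (2) nest the remaining values into a list-column, (3) map the summary functions over each nested data frame, and (4) unnest the results.
You need a little manipulation in step 3 (the map call) to get the output of smean.cl.normal into the right shape. My approach is to convert the output into a tidy data frame with broom::tidy and then use tidyr::spread to turn the rows into columns, so that each vs/am group ends up in a suitable one-row form. This approach can probably be improved, and suggestions in the comments are welcome.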
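To illustrate that reshaping step on its own, here is a minimal sketch for a single group. It swaps in tibble::enframe() and tidyr::pivot_wider() for broom::tidy and tidyr::spread, because broom's vector tidiers have been deprecated in favour of enframe(). The `ci` vector is a hypothetical stand-in for one smean.cl.normal result, with values rounded from the first group above.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for smean.cl.normal(mpg) on one group
ci <- c(Mean = 15.05, Lower = 13.29, Upper = 16.81)

# enframe() turns the named vector into long form: columns `name` and `value`
long <- enframe(ci)

# pivot_wider() spreads the three rows into one row with three columns
wide <- pivot_wider(long, names_from = name, values_from = value)
wide
```

The one-row result has the columns Mean, Lower and Upper, which is the shape that can be unnested next to the group keys.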
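library(Hmisc)
library(tidyverse)
library(broom)  # not attached by library(tidyverse); provides tidy()
mtcars %>%
  select(vs, am, mpg) %>%
  group_by(vs, am) %>%
  # nest the only non-grouping column (mpg) into a list-column named `data`
  nest() %>%
  mutate(
    # tidy() returns long `names`/`x` columns; spread() makes them Mean, Lower, Upper
    stats = map(data, ~ spread(tidy(smean.cl.normal(.x$mpg)), names, x)),
    # group size as a plain integer column, so it needs no unnesting
    n = map_int(data, nrow)
  ) %>%
  unnest(stats) %>%
  select(vs, am, n, Mean, Lower, Upper)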
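As a possible simplification, the same table can be sketched with a plain summarise() call. This is an alternative to the nest/map approach above rather than part of it. It assumes dplyr 1.0 or later for the .groups argument, and `res` is just an illustrative name.

```r
library(Hmisc)
library(dplyr)

res <- mtcars %>%
  group_by(vs, am) %>%
  summarise(
    n     = n(),                                # group size as an integer
    Mean  = smean.cl.normal(mpg)[["Mean"]],     # group mean
    Lower = smean.cl.normal(mpg)[["Lower"]],    # lower confidence limit
    Upper = smean.cl.normal(mpg)[["Upper"]],    # upper confidence limit
    .groups = "drop"
  )
res
```

Calling smean.cl.normal three times per group is redundant but keeps each column explicit. For larger data it may be worth computing the vector once per group instead.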