I'm trying to summarise this dataset as an example, using multiple functions: n() and mean(). How can I combine both in the same workflow?
Here is a toy dataset that mirrors my larger data:
library(tidyverse)
df <- structure(list(group_var = c(70, 72, 73, 70, 70, 71, 70, 71,
71, 70), var1_scr = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25,
75.25, 50.5, 75.25), var2_scr = c(50.5, 50.5, NA, 75.25, 50.5,
50.5, 75.25, 75.25, 100, 75.25), var3_scr = c(NA, NA, 75.25,
NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
df
#> # A tibble: 10 x 4
#> group_var var1_scr var2_scr var3_scr
#> <dbl> <dbl> <dbl> <dbl>
#> 1 70 50.5 50.5 NA
#> 2 72 25.8 50.5 NA
#> 3 73 50.5 NA 75.2
#> 4 70 50.5 75.2 NA
#> 5 70 50.5 50.5 NA
#> 6 71 50.5 50.5 NA
#> 7 70 75.2 75.2 NA
#> 8 71 75.2 75.2 NA
#> 9 71 50.5 100 NA
#> 10 70 75.2 75.2 NA
# summarize the scores
df %>% group_by(group_var) %>%
summarise_at(vars(ends_with("_scr")), funs(mean(., na.rm = TRUE)))
#> # A tibble: 4 x 4
#> group_var var1_scr var2_scr var3_scr
#> <dbl> <dbl> <dbl> <dbl>
#> 1 70 60.4 65.4 NaN
#> 2 71 58.8 75.2 NaN
#> 3 72 25.8 50.5 NaN
#> 4 73 50.5 NaN 75.2
# count all the observations
df %>% group_by(group_var) %>%
summarise(obs = n())
#> # A tibble: 4 x 2
#> group_var obs
#> <dbl> <int>
#> 1 70 5
#> 2 71 3
#> 3 72 1
#> 4 73 1
# my goal is to produce this dataset but using the mutate_at function
df %>% group_by(group_var) %>%
summarise(var1_scr = mean(var1_scr),
var2_scr = mean(var2_scr),
var3_scr = mean(var3_scr),
obs = n())
#> # A tibble: 4 x 5
#> group_var var1_scr var2_scr var3_scr obs
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 70 60.4 65.4 NA 5
#> 2 71 58.8 75.2 NA 3
#> 3 72 25.8 50.5 NA 1
#> 4 73 50.5 NA 75.2 1
Created on 2019-08-15 by the reprex package (v0.3.0)
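For comparison, here is a minimal sketch of the same result in a single summarise() call. It assumes dplyr 1.0.0 or later, where the _at/_if/_all variants are superseded by across(). across() computes the means and n() is simply added as one more summary. The data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
# Assumes dplyr >= 1.0.0 (for across() and the .groups argument)
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  group_var = c(70, 72, 73, 70, 70, 71, 70, 71, 71, 70),
  var1_scr  = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25, 75.25, 50.5, 75.25),
  var2_scr  = c(50.5, 50.5, NA, 75.25, 50.5, 50.5, 75.25, 75.25, 100, 75.25),
  var3_scr  = c(NA, NA, 75.25, NA, NA, NA, NA, NA, NA, NA)
)

res <- df %>%
  group_by(group_var) %>%
  summarise(
    # mean of every *_scr column, ignoring missing values
    across(ends_with("_scr"), ~ mean(.x, na.rm = TRUE)),
    # number of rows in each group
    obs = n(),
    .groups = "drop"
  )
res
```

The columns come out as group_var, var1_scr, var2_scr, var3_scr, obs, which matches the layout of the target output. Groups whose values are all missing give NaN rather than NA, because na.rm = TRUE leaves mean() with an empty vector.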
Answer 0 (score: 4)
One option is to group by group_var, then add the group size n() as an extra grouping variable (obs), and then use summarise_at:
library(dplyr)
df %>%
group_by(group_var) %>%
group_by(obs = n(), add = TRUE) %>%
summarise_at(vars(ends_with("_scr")), list(~mean(., na.rm = TRUE)))
# A tibble: 4 x 5
# Groups: group_var [4]
# group_var obs var1_scr var2_scr var3_scr
# <dbl> <int> <dbl> <dbl> <dbl>
#1 70 5 60.4 65.4 NaN
#2 71 3 58.8 75.2 NaN
#3 72 1 25.8 50.5 NaN
#4 73 1 50.5 NaN 75.2
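In dplyr 1.0.0 and later, the add argument of group_by() is deprecated in favour of .add, and summarise_at() is superseded by across(). The following sketch shows this answer under that version assumption. The toy data is redefined so the snippet is self-contained.

```r
# Assumes dplyr >= 1.0.0 (.add in group_by(), across(), .groups)
library(dplyr)

df <- data.frame(
  group_var = c(70, 72, 73, 70, 70, 71, 70, 71, 71, 70),
  var1_scr  = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25, 75.25, 50.5, 75.25),
  var2_scr  = c(50.5, 50.5, NA, 75.25, 50.5, 50.5, 75.25, 75.25, 100, 75.25),
  var3_scr  = c(NA, NA, 75.25, NA, NA, NA, NA, NA, NA, NA)
)

res <- df %>%
  group_by(group_var) %>%
  # n() is evaluated per group_var group, then kept as a second grouping column
  group_by(obs = n(), .add = TRUE) %>%
  summarise(across(ends_with("_scr"), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
res
```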
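Another option is to create the frequency column with mutate and include it in summarise_at as well, so that mean is also applied to it. Since obs is constant within each group, its mean equals that constant (e.g. mean(rep(3, 5)) -> 3):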
df %>%
group_by(group_var) %>%
mutate(obs = n()) %>%
summarise_at(vars(ends_with("_scr"), obs), list(~mean(., na.rm = TRUE)))
# A tibble: 4 x 5
# group_var var1_scr var2_scr var3_scr obs
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 70 60.4 65.4 NaN 5
#2 71 58.8 75.2 NaN 3
#3 72 25.8 50.5 NaN 1
#4 73 50.5 NaN 75.2 1
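The mean-of-a-constant trick can be cross-checked against count(), which counts rows per group directly. This sketch assumes dplyr 1.0.0 or later for the .groups argument. It also explains why obs is <dbl> in that output: mean() returns a double even when its input is an integer.

```r
# Assumes dplyr >= 1.0.0 (for the .groups argument)
library(dplyr)

df <- data.frame(
  group_var = c(70, 72, 73, 70, 70, 71, 70, 71, 71, 70),
  var1_scr  = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25, 75.25, 50.5, 75.25)
)

# obs is constant within each group, so its mean is that constant
obs_mean <- df %>%
  group_by(group_var) %>%
  mutate(obs = n()) %>%
  summarise(obs = mean(obs), .groups = "drop")

# direct row count per group (integer column)
obs_count <- df %>% count(group_var, name = "obs")
```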
Note: both approaches produce a single obs column.
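Here, the OP's expected output is a summarised output, which is what summarise/summarise_at/summarise_all/summarise_if are designed for. However, if we need to use mutate_at (for demonstration only), we can compute the summaries as columns and then keep one distinct row per group: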
df %>%
group_by(group_var) %>%
mutate(obs = n()) %>%
mutate_at(vars(ends_with("_scr"), obs), list(~mean(., na.rm = TRUE))) %>%
distinct_at(vars(group_var, ends_with("_scr"), obs))
# A tibble: 4 x 5
# Groups: group_var [4]
# group_var var1_scr var2_scr var3_scr obs
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 70 60.4 65.4 NaN 5
#2 72 25.8 50.5 NaN 1
#3 73 50.5 NaN 75.2 1
#4 71 58.8 75.2 NaN 3
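For the mutate route, here is a sketch that assumes dplyr 1.0.0 or later, with across() replacing mutate_at(). A plain distinct() is enough because, after the grouped mutate, every row within a group holds identical values. One row per group therefore remains, in order of first appearance (70, 72, 73, 71), just as in the output above.

```r
# Assumes dplyr >= 1.0.0 (for across())
library(dplyr)

df <- data.frame(
  group_var = c(70, 72, 73, 70, 70, 71, 70, 71, 71, 70),
  var1_scr  = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25, 75.25, 50.5, 75.25),
  var2_scr  = c(50.5, 50.5, NA, 75.25, 50.5, 50.5, 75.25, 75.25, 100, 75.25),
  var3_scr  = c(NA, NA, 75.25, NA, NA, NA, NA, NA, NA, NA)
)

res <- df %>%
  group_by(group_var) %>%
  # overwrite each score with its group mean and add the group size
  mutate(across(ends_with("_scr"), ~ mean(.x, na.rm = TRUE)), obs = n()) %>%
  ungroup() %>%
  # rows within a group are now identical, so this keeps one per group
  distinct()
res
```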
Answer 1 (score: 4)
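If you need both functions in the same call, we can pass a named list of functions; the names (m, n) become suffixes of the output columns: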
library(dplyr)
df %>% group_by(group_var) %>%
summarise_at(vars(ends_with("_scr")), list(m=~mean(., na.rm = TRUE), n=~n()))
# A tibble: 4 x 7
# group_var var1_scr_m var2_scr_m var3_scr_m var1_scr_n var2_scr_n var3_scr_n
# <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
#1 70 60.4 65.4 NaN 5 5 5
#2 71 58.8 75.2 NaN 3 3 3
#3 72 25.8 50.5 NaN 1 1 1
#4 73 50.5 NaN 75.2 1 1 1
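Because n() counts rows, the three *_n columns are identical and include rows where the score is NA. For example, var3_scr_n is 5 for group 70 even though all of its var3_scr values are missing. If per-column non-missing counts are wanted instead, sum(!is.na(.x)) gives them. The sketch below assumes dplyr 1.0.0 or later. In that version, across() names the outputs "{.col}_{.fn}" by default (var1_scr_m, var1_scr_n, and so on), ordered column by column.

```r
# Assumes dplyr >= 1.0.0 (for across() and .groups)
library(dplyr)

df <- data.frame(
  group_var = c(70, 72, 73, 70, 70, 71, 70, 71, 71, 70),
  var1_scr  = c(50.5, 25.75, 50.5, 50.5, 50.5, 50.5, 75.25, 75.25, 50.5, 75.25),
  var2_scr  = c(50.5, 50.5, NA, 75.25, 50.5, 50.5, 75.25, 75.25, 100, 75.25),
  var3_scr  = c(NA, NA, 75.25, NA, NA, NA, NA, NA, NA, NA)
)

res <- df %>%
  group_by(group_var) %>%
  summarise(
    across(ends_with("_scr"),
           list(m = ~ mean(.x, na.rm = TRUE),  # group mean, NAs dropped
                n = ~ sum(!is.na(.x)))),       # non-missing values per column
    .groups = "drop"
  )
res
```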
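Regarding the OP's comment, "my goal is to produce this dataset but using the mutate_at function", we can apply the same named list in mutate_at and then keep the first row of each group with slice(1):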
df %>% group_by(group_var) %>%
mutate_at(vars(ends_with("_scr")), list(m=~mean(., na.rm = TRUE), n=~n())) %>%
slice(1)