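I'm trying to use mutate_if to perform calculations based on variable names. For example, if the variable name contains "demo", compute the mean; if the name contains "meas", compute the median: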
library(tidyverse)
library(stringr)
exm_data <- tibble( # tibble() replaces the deprecated data_frame()
group = sample(letters[1:5], size = 50, replace = TRUE),
demo_age = rnorm(50),
demo_height = runif(50, min = 48, max = 80),
meas_score1 = rnorm(50),
meas_score2 = rnorm(50)
)
exm_data
#> # A tibble: 50 x 5
#> group demo_age demo_height meas_score1 meas_score2
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a -1.46539563 58.22435 -0.760692567 0.1077901
#> 2 b 1.90983770 56.57976 0.262933462 -1.0186600
#> 3 c 0.58502114 66.26322 2.283491647 0.3215542
#> 4 b -0.97228337 74.82932 2.447551824 -0.4763201
#> 5 a 0.65814161 72.19627 -0.592671739 -0.0521247
#> 6 c -0.62133706 75.49976 0.005813255 -0.4195284
#> 7 b 0.40650836 60.99083 0.809183477 -0.1127530
#> 8 c -0.48251421 50.94077 -1.171749420 1.7268231
#> 9 b 1.24476630 71.39803 1.786950340 0.7980217
#> 10 c -0.09704469 69.52001 -0.511872217 -1.1465523
#> # ... with 40 more rows
exm_data %>%
mutate_if(str_detect(colnames(.), "demo"), mean) %>%
mutate_if(str_detect(colnames(.), "meas"), median)
#> # A tibble: 50 x 5
#> group demo_age demo_height meas_score1 meas_score2
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a -0.03250753 64.31412 -0.09909911 0.1307904
#> 2 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 3 c -0.03250753 64.31412 -0.09909911 0.1307904
#> 4 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 5 a -0.03250753 64.31412 -0.09909911 0.1307904
#> 6 c -0.03250753 64.31412 -0.09909911 0.1307904
#> 7 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 8 c -0.03250753 64.31412 -0.09909911 0.1307904
#> 9 b -0.03250753 64.31412 -0.09909911 0.1307904
#> 10 c -0.03250753 64.31412 -0.09909911 0.1307904
#> # ... with 40 more rows
As you can see, this works as expected. However, I want to do these calculations by group, and when I add a group_by statement it breaks:
exm_data %>%
group_by(group) %>%
mutate_if(str_detect(colnames(.), "demo"), mean) %>%
mutate_if(str_detect(colnames(.), "meas"), median)
#> Error: length(.p) == length(vars) is not TRUE
Is there a way to use mutate_if with column names on a grouped tibble?
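A likely cause of the error, assuming the dplyr 0.7-era scoped verbs: on a grouped tibble, mutate_if() only iterates over the non-grouping columns. The logical vector built from colnames(.) still has one entry per column, including group, so it has 5 entries while the selection it is checked against has 4. The sketch below only measures those two lengths; the seed and the use of base grepl() in place of str_detect() are illustrative assumptions.

```r
library(dplyr)

# Reproducible stand-in for exm_data (the seed is an illustrative assumption)
set.seed(42)
exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)

grouped <- group_by(exm_data, group)

# Logical predicate built from all column names, grouping column included
p <- grepl("demo", colnames(grouped))

# Columns a scoped verb iterates over on a grouped tibble (grouping vars dropped)
nongroup <- setdiff(colnames(grouped), group_vars(grouped))

length(p)        # 5
length(nongroup) # 4
```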
Answer (score: 6)
You can use mutate_at together with contains from dplyr, as follows:
library(dplyr)
# Select columns by name with vars(contains(...)); funs() is deprecated, so pass the function directly
exm_data %>%
group_by(group) %>%
mutate_at(vars(contains("demo")), mean) %>%
mutate_at(vars(contains("meas")), median)
which gives:
# A tibble: 50 x 5
# Groups:   group [5]
   group    demo_age demo_height meas_score1 meas_score2
   <chr>       <dbl>       <dbl>       <dbl>       <dbl>
 1     d  0.12916082    60.26550   0.1932882  -0.5356818
 2     b -0.31142894    64.50839   0.3219514  -0.4777860
 3     b -0.31142894    64.50839   0.3219514  -0.4777860
 4     a -0.34373403    64.84180   0.1929516  -0.3821047
 5     a -0.34373403    64.84180   0.1929516  -0.3821047
 6     b -0.31142894    64.50839   0.3219514  -0.4777860
 7     d  0.12916082    60.26550   0.1932882  -0.5356818
 8     a -0.34373403    64.84180   0.1929516  -0.3821047
 9     d  0.12916082    60.26550   0.1932882  -0.5356818
10     c -0.05963747    59.07845  -0.2395409  -0.4484245
BONUS: you don't need to load stringr, because contains() is available directly from dplyr.
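On dplyr 1.0.0 or later, scoped verbs such as mutate_at() are superseded by across(), which selects columns by name inside an ordinary mutate(). This is a minimal sketch of the same grouped calculation, assuming dplyr >= 1.0.0 and a seeded copy of the example data. It then cross-checks the result against per-group values computed with summarise().

```r
library(dplyr)

# Reproducible stand-in for exm_data (the seed is an illustrative assumption)
set.seed(42)
exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)

# Requires dplyr >= 1.0.0: across() picks columns by name within each group
result <- exm_data %>%
  group_by(group) %>%
  mutate(
    across(contains("demo"), mean),
    across(contains("meas"), median)
  ) %>%
  ungroup()

# Cross-check: one row per group holding the per-group mean and median
check <- exm_data %>%
  group_by(group) %>%
  summarise(demo_age = mean(demo_age), meas_score1 = median(meas_score1))

# Each group's constant value in `result` should match the summary
joined <- inner_join(
  distinct(result, group, demo_age, meas_score1),
  check,
  by = "group",
  suffix = c("", ".chk")
)
```

Because across() is used inside a plain mutate(), the grouping column is never part of a positional predicate, so the length mismatch from mutate_if() cannot occur.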