Using group_by and mutate_if by column name

Date: 2017-10-06 13:51:36

Tags: r dplyr tidyverse

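I'm trying to use mutate_if to perform calculations based on variable names. For example, if a variable name contains "demo", compute the mean; if it contains "meas", compute the median: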

library(tidyverse)
library(stringr)

exm_data <- tibble(  # tibble() replaces the deprecated data_frame()
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)
exm_data
#> # A tibble: 50 x 5
#>    group    demo_age demo_height  meas_score1 meas_score2
#>    <chr>       <dbl>       <dbl>        <dbl>       <dbl>
#>  1     a -1.46539563    58.22435 -0.760692567   0.1077901
#>  2     b  1.90983770    56.57976  0.262933462  -1.0186600
#>  3     c  0.58502114    66.26322  2.283491647   0.3215542
#>  4     b -0.97228337    74.82932  2.447551824  -0.4763201
#>  5     a  0.65814161    72.19627 -0.592671739  -0.0521247
#>  6     c -0.62133706    75.49976  0.005813255  -0.4195284
#>  7     b  0.40650836    60.99083  0.809183477  -0.1127530
#>  8     c -0.48251421    50.94077 -1.171749420   1.7268231
#>  9     b  1.24476630    71.39803  1.786950340   0.7980217
#> 10     c -0.09704469    69.52001 -0.511872217  -1.1465523
#> # ... with 40 more rows


exm_data %>%
  mutate_if(str_detect(colnames(.), "demo"), mean) %>%
  mutate_if(str_detect(colnames(.), "meas"), median)
#> # A tibble: 50 x 5
#>    group    demo_age demo_height meas_score1 meas_score2
#>    <chr>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1     a -0.03250753    64.31412 -0.09909911   0.1307904
#>  2     b -0.03250753    64.31412 -0.09909911   0.1307904
#>  3     c -0.03250753    64.31412 -0.09909911   0.1307904
#>  4     b -0.03250753    64.31412 -0.09909911   0.1307904
#>  5     a -0.03250753    64.31412 -0.09909911   0.1307904
#>  6     c -0.03250753    64.31412 -0.09909911   0.1307904
#>  7     b -0.03250753    64.31412 -0.09909911   0.1307904
#>  8     c -0.03250753    64.31412 -0.09909911   0.1307904
#>  9     b -0.03250753    64.31412 -0.09909911   0.1307904
#> 10     c -0.03250753    64.31412 -0.09909911   0.1307904
#> # ... with 40 more rows

As you can see, this works as expected. However, I want to do these calculations by group, and when I add a group_by step, it breaks:

exm_data %>%
  group_by(group) %>%
  mutate_if(str_detect(colnames(.), "demo"), mean) %>%
  mutate_if(str_detect(colnames(.), "meas"), median)
#> Error: length(.p) == length(vars) is not TRUE
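The error comes from how mutate_if() handles a logical predicate on a grouped tibble: it only iterates over the non-grouping columns, so the logical vector needs one entry per non-grouping column. colnames(.) still returns every column, including group, so the vector is one element too long and the internal length check fails (the exact message differs between dplyr versions). The sketch below uses a small hypothetical tibble (df) to show the mismatch, plus one possible workaround that builds the vector from the non-grouping columns only. It assumes dplyr 0.7 or later (for group_vars()) and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical small tibble with the same kind of column names as exm_data
df <- tibble(
  group       = c("a", "b", "a", "b"),
  demo_age    = c(1, 2, 3, 4),
  meas_score1 = c(10, 20, 30, 40)
)
grouped <- group_by(df, group)

# colnames() includes the grouping column: 3 entries
length(str_detect(colnames(grouped), "demo"))

# mutate_if() only iterates over the non-grouping columns: 2 of them
nongroup <- setdiff(colnames(grouped), group_vars(grouped))
length(nongroup)

# Workaround sketch: build the logical vector from the non-grouping
# columns only, so its length matches what mutate_if() expects
res <- grouped %>%
  mutate_if(str_detect(nongroup, "demo"), mean)
res
```

Here demo_age becomes the per-group mean (2 for group a, 3 for group b) and meas_score1 is left unchanged.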

Is there a way to use mutate_if with column names on a grouped tibble?

1 answer:

Answer (score: 6)

You can use mutate_at together with the contains() helper from dplyr, as follows:

library(dplyr)

exm_data %>% 
  group_by(group) %>% 
  # a bare function works for .funs; funs(mean) also works but funs() is deprecated since dplyr 0.8.0
  mutate_at(vars(contains('demo')), mean) %>% 
  mutate_at(vars(contains('meas')), median)

which gives:

# A tibble: 50 x 5
# Groups:   group [5]
   group    demo_age demo_height meas_score1 meas_score2
   <chr>       <dbl>       <dbl>       <dbl>       <dbl>
 1     d  0.12916082    60.26550   0.1932882  -0.5356818
 2     b -0.31142894    64.50839   0.3219514  -0.4777860
 3     b -0.31142894    64.50839   0.3219514  -0.4777860
 4     a -0.34373403    64.84180   0.1929516  -0.3821047
 5     a -0.34373403    64.84180   0.1929516  -0.3821047
 6     b -0.31142894    64.50839   0.3219514  -0.4777860
 7     d  0.12916082    60.26550   0.1932882  -0.5356818
 8     a -0.34373403    64.84180   0.1929516  -0.3821047
 9     d  0.12916082    60.26550   0.1932882  -0.5356818
10     c -0.05963747    59.07845  -0.2395409  -0.4484245

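BONUS: You don't need to load stringr, because contains() is a select helper provided by dplyr that matches on column names.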
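The answer uses the scoped verbs from the dplyr 0.7 era. Below is a sketch of the same grouped calculation for dplyr 1.0.0 or later, where mutate_at() is superseded by across() inside mutate(). Like the answer, it needs no stringr. The set.seed() call and the regenerated exm_data are assumptions added only so the block runs on its own, so the numbers will not match the output shown earlier.

```r
library(dplyr)

# Assumption: regenerate sample data with the same layout as exm_data
set.seed(1)
exm_data <- tibble(
  group       = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age    = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)

# across() accepts the same select helpers as vars(), and grouping
# columns are always left out of its selection, so no length check can fail
result <- exm_data %>%
  group_by(group) %>%
  mutate(
    across(contains("demo"), mean),
    across(contains("meas"), median)
  ) %>%
  ungroup()
result
```

Each demo_ column is replaced by its per-group mean and each meas_ column by its per-group median; ungroup() at the end just drops the grouping once the calculation is done.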