如何在dplyr管道中传递变量名称以有条件地求和?

时间:2019-02-25 15:49:56

标签: r dplyr rlang nse

问题的症结在于如何将列变量传递到分组的df中以有条件地对数据求和。该示例的数据如下:

library(dplyr)
library(rlang)
set.seed(1)

# dummy dates
date_vars <- purrr::map(c('2018-01-31', '2018-02-28', '2018-03-31', 
                         '2018-04-30', '2018-05-31', '2018-06-30', 
                         '2018-07-31', '2018-08-31', '2018-09-30', 
                         '2018-10-31', '2018-11-30', '2018-12-31'), as.Date) %>% 
  purrr::reduce(c)

dummy_df <- tibble(

  id = rep(c("a", "b", "c"), each =  12),
  date = rep(date_vars, 3),
  value = runif(36, 1, 10)

)

下面的函数将获取一个数据帧,并按变量分组(使用rlang的sym函数),然后通过添加日期大于或等于某个日期周期的所有值来创建新的摘要列。在这里,我总结了3个月的“价值”。

agg_by_period <- function(df, date_period, period, grouping, new_col_prefix){

  grouping_vars <- syms(grouping)

  new_sum_column <- quo_name(paste0(new_col_prefix, "sum_", period, 'm'))

  df %>% 
    group_by(!!!grouping_vars) %>% 
    summarize(!!new_sum_column := sum(value[date >= date_period], na.rm = T)) %>% 
    select(!!!grouping_vars, !!sym(new_sum_column))

}


agg_by_period(df = dummy_df, 
              date_period = as.Date('2018-10-31'), 
              grouping = 'id',
              period = 3,
              new_col_prefix = 'new_'
)


# A tibble: 3 x 2
  id    new_sum_3m
  <chr>      <dbl>
1 a           7.00
2 b          11.9 
3 c          18.1 


太好了!我的问题特定于当此列的名称不是“ value”时,使函数中的“ value”动态化。我天真的尝试使用sym()传递了此列,并且它的错误如下:



agg_by_period2 <- function(df, date_period, period, grouping, new_col_prefix, 
                          value_var){

  grouping_vars <- syms(grouping)

  new_sum_column = quo_name(paste0(new_col_prefix, "sum_", period, 'm'))

  value_var_col <- sym(value_var)

  df %>% 
    group_by(!!!grouping_vars) %>% 
    summarize(!!new_sum_column := sum(!!value_var_col[date >= date_period], na.rm = T)) %>% 
    select(!!!grouping_vars, !!sym(new_sum_column))

}


agg_by_period2(df = dummy_df, 
              date_period = as.Date('2018-10-31'), 
              grouping = 'id',
              period = 3,
              new_col_prefix = 'new_',
              value_var = 'value'
)

 Error in `>=.default`(date, date_period) : 
  comparison (5) is possible only for atomic and list types 

当删除日期标准([date> = date_period])时,上述功能将起作用。任何帮助将不胜感激。

1 个答案:

答案 0 :(得分:3)

这似乎是!![的操作顺序问题。看起来您只需要将拼接处括在括号中即可

  df %>% 
    group_by(!!!grouping_vars) %>% 
    summarize(!!new_sum_column := sum((!!value_var_col)[date >= date_period], na.rm = T)) %>% 
    select(!!!grouping_vars, !!sym(new_sum_column))

注意(!!value_var_col)而不是!!value_var_col。这样,拼接将发生在子集之前。