R function that accepts multiple group_by and summarise arguments

Date: 2017-06-14 16:13:22

Tags: r function dplyr

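I have a tibble containing details of work activities across various projects. I am trying to write a generic function that runs fairly simple queries on the tibble using dplyr verbs, of the form: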

df %>%
  group_by(user_id) %>%
  summarise(total_time = sum(duration))
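For reference, the single-variable query can be run on a small made-up tibble. Only the column names come from the query above; the rows are invented for illustration:

```r
library(dplyr)

# Made-up activity log: one row per logged activity, duration in minutes
df <- tibble(
  user_id  = c(1, 1, 2, 3, 3, 3),
  duration = c(30, 45, 60, 15, 15, 90)
)

# One grouping variable, one summary variable
totals <- df %>%
  group_by(user_id) %>%
  summarise(total_time = sum(duration))

print(totals)
```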

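This is straightforward when there is only one grouping variable and one summary variable. My problem arises when I try to generalize the function to accept multiple grouping and/or summary variables. I tried to do this with the three functions below (so there is quite a lot of code here).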

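library(dplyr)
library(readr)      # read_rds()
library(lubridate)  # date(); assumed here, since base R's date() takes no arguments

proj_activity_report <- function(query_id) {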

    projects <- read_rds('~/work_tracker/projects.rds')
    users <- read_rds('~/work_tracker/users.rds')
    activity <- read_rds('~/work_tracker/activity.rds')

    activity %>% 
        filter(project_id %in% query_id) %>% 
        left_join(projects, by = 'project_id') %>% 
        left_join(users, by = c('user_id.x' = 'user_id')) %>% 
        mutate(full_name = paste(forename, surname),
               start_date = format(date(time_started), '%d %b %Y'),
               logged_date = format(date(time_logged), '%d %b %Y')) %>% 
        arrange(project_id, time_started) %>% 
        select(Activity_ID = activity_id,
               Project_ID = project_id,
               Activity_Type = activity_title,
               Project_Title = project_title,
               Worker_Name = full_name,
               Date_Started = start_date,
               Duration_mins = duration,
               Date_Logged = logged_date,
               Comments = comments,
               Project_Status = project_status)
}

proj_activity_grouped <- function(query_id, ...) {
    grouping_vars <- quos(...)

    proj_activity_report(query_id) %>% 
        group_by(!!!grouping_vars)
}

proj_activity_summ <- function(query_id, grouping_vars, summ_var) {
    query_id <- enquo(query_id)
    summ_var <- enquo(summ_var)
    grouping_vars <- quos(grouping_vars)

    proj_activity_grouped(query_id, !!!grouping_vars) %>% 
        summarise(total = sum(!!summ_var))
}
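The stated goal of accepting several grouping variables and several summary variables can also be approached with a different design from the three functions above: take the grouping variables as a single quos() list and forward the remaining `...` straight to summarise(). The helper summarise_by() and the data below are hypothetical; this is a sketch of an alternative, not the author's method:

```r
library(dplyr)

# Hypothetical generic helper: `groups` is a list of quosures built with
# quos() at the call site; everything in `...` is handed to summarise()
summarise_by <- function(data, groups, ...) {
  data %>%
    group_by(!!!groups) %>%
    summarise(...)
}

# Made-up data for illustration only
df <- tibble(
  user_id  = c(1, 1, 2, 2),
  project  = c("p1", "p2", "p1", "p1"),
  duration = c(30, 45, 60, 15)
)

out <- summarise_by(
  df,
  quos(user_id, project),      # several grouping variables
  total_time = sum(duration),  # several summary expressions
  n_entries  = n()
)
print(out)
```

Calling summarise_by(df, quos(user_id), total_time = sum(duration)) reduces to the single-variable query at the top of the question.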

The function proj_activity_report() works fine, and proj_activity_grouped() also appears to work, because I tested it by calling
proj_activity_grouped(102, Worker_Name) %>% 
  summarise(total_duration = sum(Duration_mins))

which gives the output I expect:

# A tibble: 12 x 2
       Worker_Name total_duration
             <chr>          <dbl>
 1      Ahmed Khan            690
 2   Craig Stanton           1245
 3   Darnell Lewis           1395
 4 David Silverman            960
 5  Frankie Benton           1275
 6     Jane Benton            855
 7          Li Fan           1275
 8     Maria Gomes           1200
 9    Sunil Khanna           1080
10  Suzanne Watson           1380
11  Theresa Briers           1395
12   Valerie Jones           1500

(This is dummy data; all the names are fictitious.)

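The problem comes with proj_activity_summ(). With the code above I get the error: Error in filter_impl(.data, quo) : 'match' requires vector arguments. I expect it has something to do with the way I am handling the variables, but I can't work out which part is wrong.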
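One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis, is that the first lines of proj_activity_summ() do not capture what they are meant to: enquo(query_id) wraps the plain project number in a quosure, and quos(grouping_vars) quotes the symbol `grouping_vars` itself rather than the column the caller passed. The hypothetical helper inspect_captures() below repeats those two captures so their contents can be printed:

```r
library(dplyr)

# Hypothetical helper that copies the capture lines of proj_activity_summ()
inspect_captures <- function(query_id, grouping_vars) {
  list(
    query_id      = enquo(query_id),     # wraps the value 102 in a quosure
    grouping_vars = quos(grouping_vars)  # quotes the literal symbol `grouping_vars`
  )
}

captured <- inspect_captures(102, Worker_Name)
print(captured$query_id)
print(captured$grouping_vars)
```

If this reading is right, the quosure travels unchanged into proj_activity_report(), where filter(project_id %in% query_id) hands it to match() as the lookup table. A quosure is not a vector, which is consistent with the reported "'match' requires vector arguments" message.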

NB: I am using dplyr version 0.7.0.
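A possible revision, sketched under stated assumptions, keeps query_id as an ordinary value (no enquo()), captures only the summary column with enquo(), and lets any number of grouping columns pass through `...` to proj_activity_grouped(), which already handles them with quos(...). Because the RDS files are not available, proj_activity_report() is replaced here by a hypothetical stand-in over invented rows:

```r
library(dplyr)

# Hypothetical stand-in for the tibble proj_activity_report() builds
# from the RDS files; the rows are invented
toy_report <- tibble(
  Project_ID    = c(102, 102, 102, 103),
  Worker_Name   = c("A", "A", "B", "A"),
  Duration_mins = c(30, 45, 60, 15)
)

proj_activity_report <- function(query_id) {
  toy_report %>% filter(Project_ID %in% query_id)
}

# Unchanged from the question
proj_activity_grouped <- function(query_id, ...) {
  grouping_vars <- quos(...)
  proj_activity_report(query_id) %>%
    group_by(!!!grouping_vars)
}

# Revised: query_id stays a plain value, the summary column is captured
# with enquo(), and the grouping columns are forwarded untouched via `...`
proj_activity_summ <- function(query_id, summ_var, ...) {
  summ_var <- enquo(summ_var)
  proj_activity_grouped(query_id, ...) %>%
    summarise(total = sum(!!summ_var))
}

by_worker <- proj_activity_summ(102, Duration_mins, Worker_Name)
by_both   <- proj_activity_summ(c(102, 103), Duration_mins, Project_ID, Worker_Name)
print(by_worker)
print(by_both)
```

In this sketch summ_var is moved before the grouping columns so that `...` can absorb any number of them; that argument order is a design choice here, not something the question specifies.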
