Formatting mutated variable names when looping through dplyr

Time: 2016-10-10 21:20:30

Tags: r dplyr

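I have a dataset from which I create new features over different time periods. Since I will be reusing the dplyr block below, I'd like to wrap it in a function, but I don't know how to name the newly mutated predictors so that each name reflects the time-period interval it refers to in my list of periods.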

library(dplyr)
library(lubridate)

data <- data.frame(custid = c(1,1,1,2,2,2,3,3,3,4),
                   total = c(1,2,3,4,5,6,7,8,9,10),
                   date = as.Date(c("2015-01-01", "2015-01-02", 
                                    "2015-01-10", "2015-01-11", 
                                    "2015-01-21", "2015-01-22", 
                                    "2015-01-24", "2015-01-25", 
                                    "2015-01-27", "2015-01-28")))

period_intervals <- list(period_one = interval(as.Date("2015-01-01"), as.Date("2015-01-20")),
                         period_two = interval(as.Date("2015-01-21"), as.Date("2015-01-30")))


compute_period_predictors <- function(data, time_periods){
  ### Takes a data set and a list of lubridate intervals (time_periods),
  ### and adds aggregated predictors for each time period.

  for(i in seq_along(time_periods)){
    df <- data %>%
      filter(date %within% time_periods[[i]]) %>%   # keep rows inside period i
      group_by(custid) %>%
      # These column names are literal: the "i" is not replaced by the period,
      # and df is overwritten on every pass, so only the last period survives.
      mutate(period_i_total_mean = mean(total),
             period_i_total_sum = sum(total))
  }

  return(df)

}
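One possible way to get period-specific names is to build each name as a string with `paste0()` and inject it on the left of `:=` with `!!`, which dplyr's `mutate()` supports (dplyr >= 0.7 assumed). The sketch below is not a definitive implementation. It assumes `time_periods` is a *named* list, so the list names (`period_one`, `period_two`) become the column prefixes. Instead of `filter()`, which would drop rows, it subsets `total` inside `mutate()`, so the columns from successive periods accumulate in `data`.

```r
library(dplyr)
library(lubridate)

# Sample data and periods from the question.
data <- data.frame(custid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                   total  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   date   = as.Date(c("2015-01-01", "2015-01-02", "2015-01-10",
                                      "2015-01-11", "2015-01-21", "2015-01-22",
                                      "2015-01-24", "2015-01-25", "2015-01-27",
                                      "2015-01-28")))

period_intervals <- list(
  period_one = interval(as.Date("2015-01-01"), as.Date("2015-01-20")),
  period_two = interval(as.Date("2015-01-21"), as.Date("2015-01-30"))
)

compute_period_predictors <- function(data, time_periods) {
  # Assumption: time_periods is a named list; each name becomes a column prefix.
  for (period_name in names(time_periods)) {
    iv       <- time_periods[[period_name]]
    mean_col <- paste0(period_name, "_total_mean")  # e.g. "period_one_total_mean"
    sum_col  <- paste0(period_name, "_total_sum")   # e.g. "period_one_total_sum"
    data <- data %>%
      group_by(custid) %>%
      # !!name := value creates a column whose name is the string in `name`.
      # Only this customer's rows that fall inside the interval are aggregated.
      mutate(!!mean_col := mean(total[date %within% iv]),
             !!sum_col  := sum(total[date %within% iv])) %>%
      ungroup()
  }
  data
}

predictors <- compute_period_predictors(data, period_intervals)
```

Every input row is kept. A customer with no rows in a period gets `NaN` for the mean (the mean of an empty vector) and `0` for the sum, so you may want to recode those afterwards.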

Example:

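Suppose I want to create these two new predictors for the periods period_45, period_50 and period_60. How can I give the mutated variables concatenated names such as period_45_total_mean, period_50_total_mean, and so on?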
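The question does not say what period_45, period_50 and period_60 cover. The sketch below assumes, purely for illustration, that a label N means "the N days up to a reference date". `ref_date`, `window_days` and `summarise_window()` are hypothetical names. Each column name is built with `paste0()` and injected with `!!name :=`. This variant swaps `mutate()` for `summarise()`, producing one row per customer per window, and then combines the windows with `full_join()` on `custid`.

```r
library(dplyr)
library(lubridate)

# Sample data from the question.
data <- data.frame(custid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                   total  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   date   = as.Date(c("2015-01-01", "2015-01-02", "2015-01-10",
                                      "2015-01-11", "2015-01-21", "2015-01-22",
                                      "2015-01-24", "2015-01-25", "2015-01-27",
                                      "2015-01-28")))

# Hypothetical: label N means "the N days up to ref_date".
ref_date    <- as.Date("2015-01-28")
window_days <- c(45, 50, 60)

# Hypothetical helper: one summary row per customer for a single window.
summarise_window <- function(data, n_days, ref_date) {
  prefix   <- paste0("period_", n_days)        # e.g. "period_45"
  mean_col <- paste0(prefix, "_total_mean")    # e.g. "period_45_total_mean"
  sum_col  <- paste0(prefix, "_total_sum")     # e.g. "period_45_total_sum"
  iv       <- interval(ref_date - n_days, ref_date)
  data %>%
    filter(date %within% iv) %>%
    group_by(custid) %>%
    summarise(!!mean_col := mean(total),
              !!sum_col  := sum(total))
}

# One table per window, then join them side by side on custid.
per_window <- lapply(window_days, function(n) summarise_window(data, n, ref_date))
predictors <- Reduce(function(x, y) full_join(x, y, by = "custid"), per_window)
```

With the full join, a customer who has no rows in a given window gets `NA` in that window's columns.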

0 answers:

No answers