dplyr: How do I pass a column name as an argument to group_by?

Date: 2017-12-01 10:11:51

Tags: r dplyr

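I am trying to write a function that computes half-hour averages (always aligned to the full hour and to 30 minutes past the hour), and I am using dplyr.

I pass the date column name as an argument and use "group_by_" to group the data before summarising it. However, I keep getting this error: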

Error in cut.default(colName, cuts) : 'x' must be numeric

The code I am using is below. My data frame is simply called "data".

dateColumn = "date"
measurevar = "temperature"

cuts <- seq(round(min(data[,dateColumn]), "hours")-30*60,
                  max(data[,dateColumn])+30*60, "30 min")

data_avg = data %>%
  group_by_(dateColumn = cut(dateColumn, cuts)) %>%
  summarise_at(.vars=vars(measurevar),
               funs(mean = mean   (., na.rm=T),
                    sd   = sd     (., na.rm=T) ))

Can you help me solve this?

Note that the date column is POSIXct. Here is a sample of the data:

data <- structure(list(date = structure(c(1508258822, 1508258827, 
1508258832, 1508258837, 1508258842, 1508258847, 1508258852, 1508258857, 
1508258862, 1508258867, 1508258877, 1508259298, 1508259303, 1508259308, 
1508259313, 1508259318, 1508259323, 1508259328, 1508259333, 1508259338, 
1508259343, 1508259348, 1508259353, 1508259778, 1508259783, 1508259788, 
1508259793, 1508259798, 1508259803, 1508259813, 1508259818, 1508259823, 
1508259828, 1508259833, 1508260259, 1508260264, 1508260269, 1508260274, 
1508260279, 1508260284, 1508260289, 1508260294, 1508260299, 1508260304, 
1508260309, 1508260314, 1508260739, 1508260744, 1508260749, 1508260754
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), temperature = c(295.49, 
295.49, 295.48, 295.47, 295.46, 295.45, 295.45, 295.45, 295.45, 
295.45, 295.44, 295.24, 294.98, 295.24, 295.24, 295.24, 295.24, 
295.23, 295.23, 295.21, 295.2, 295.2, 295.19, 294.93, 294.93, 
294.88, 294.93, 294.93, 294.93, 294.92, 294.92, 294.91, 294.9, 
294.9, 294.73, 294.72, 294.72, 294.71, 294.71, 294.71, 294.71, 
294.71, 294.72, 294.71, 294.71, 294.7, 294.55, 294.55, 294.55, 
294.54)), .Names = c("date", "temperature"), row.names = c(NA, 
50L), class = "data.frame")

The resulting "data_avg" should look something like this:

        date           mean    sd
1 2017-10-17 16:30:00 295.46 0.1305597
2 2017-10-17 17:00:00 295.55 0.1137462

1 answer:

Answer 0 (score: 1)

Have you tried something like this?

packageVersion("dplyr")
# [1] ‘0.7.4.9000’
data %>%
  group_by(dateColumn = cut(!!sym(dateColumn), cuts)) %>% 
  summarise_at(.vars=vars(measurevar),
               funs(mean = mean   (., na.rm=T),
                    sd   = sd     (., na.rm=T) ))
# # A tibble: 2 x 3
#   dateColumn           mean    sd
#   <fctr>              <dbl> <dbl>
# 1 2017-10-17 16:30:00   295 0.142
# 2 2017-10-17 17:00:00   295 0.135

See "Programming with dplyr".
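
For context, the original error most likely arises because `cut(dateColumn, cuts)` is evaluated on the character string "date" itself rather than on the column, so `cut.default` complains that 'x' is not numeric. Since the question asks for a function, here is a minimal sketch of how the pieces might be wrapped up. The helper name `half_hour_avg` and the variable `example_data` are hypothetical, and the `.data[[...]]` pronoun is used in place of `!!sym()` as one alternative way to look up a column by its name; treat this as an illustration under those assumptions, not as the only way to do it. The four sample rows are the first two and last two observations from the data in the question.

```r
library(dplyr)

# Hypothetical helper (name not from the original post): averages the column
# named by `measurevar` over half-hour bins of the POSIXct column named by
# `dateColumn`. Both arguments are plain character strings.
half_hour_avg <- function(data, dateColumn, measurevar) {
  times <- data[[dateColumn]]

  # Same break construction as in the question: half-hour steps starting
  # 30 minutes before the rounded first timestamp.
  cuts <- seq(round(min(times), "hours") - 30 * 60,
              max(times) + 30 * 60, by = "30 min")

  data %>%
    # `.data[[dateColumn]]` fetches the column whose name is stored in the
    # string; `!!dateColumn :=` names the grouping column after it.
    group_by(!!dateColumn := cut(.data[[dateColumn]], cuts)) %>%
    summarise(mean = mean(.data[[measurevar]], na.rm = TRUE),
              sd   = sd(.data[[measurevar]], na.rm = TRUE))
}

# First two and last two observations from the sample data in the question.
example_data <- data.frame(
  date = as.POSIXct(c(1508258822, 1508258827, 1508260749, 1508260754),
                    origin = "1970-01-01", tz = "UTC"),
  temperature = c(295.49, 295.49, 294.55, 294.54)
)

result <- half_hour_avg(example_data, "date", "temperature")
print(result)
```

Calling `half_hour_avg(data, "date", "temperature")` on the full data frame from the question should group it into the same two half-hour bins as in the answer above. Note that `summarise()` with explicit column lookups is used here instead of `summarise_at()` with `funs()`, which newer dplyr releases have deprecated.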