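I am trying to write a function that computes half-hourly averages (always aligned to the full hour and to 30 minutes past the hour), and I am using dplyr.
I pass the name of the date column as an argument and use "group_by_" to group the data before summarising it. However, I keep getting this error: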
Error in cut.default(colName, cuts) : 'x' must be numeric
The code I am using is below. My data frame is simply called "data".
dateColumn = "date"
measurevar = "temperature"
cuts <- seq(round(min(data[,dateColumn]), "hours")-30*60,
max(data[,dateColumn])+30*60, "30 min")
data_avg = data %>%
group_by_(dateColumn = cut(dateColumn, cuts)) %>%
summarise_at(.vars=vars(measurevar),
funs(mean = mean (., na.rm=T),
sd = sd (., na.rm=T) ))
Can you help me solve this problem?
Note that the date column is POSIXct. Here is a sample of the data:
data <- structure(list(date = structure(c(1508258822, 1508258827,
1508258832, 1508258837, 1508258842, 1508258847, 1508258852, 1508258857,
1508258862, 1508258867, 1508258877, 1508259298, 1508259303, 1508259308,
1508259313, 1508259318, 1508259323, 1508259328, 1508259333, 1508259338,
1508259343, 1508259348, 1508259353, 1508259778, 1508259783, 1508259788,
1508259793, 1508259798, 1508259803, 1508259813, 1508259818, 1508259823,
1508259828, 1508259833, 1508260259, 1508260264, 1508260269, 1508260274,
1508260279, 1508260284, 1508260289, 1508260294, 1508260299, 1508260304,
1508260309, 1508260314, 1508260739, 1508260744, 1508260749, 1508260754
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), temperature = c(295.49,
295.49, 295.48, 295.47, 295.46, 295.45, 295.45, 295.45, 295.45,
295.45, 295.44, 295.24, 294.98, 295.24, 295.24, 295.24, 295.24,
295.23, 295.23, 295.21, 295.2, 295.2, 295.19, 294.93, 294.93,
294.88, 294.93, 294.93, 294.93, 294.92, 294.92, 294.91, 294.9,
294.9, 294.73, 294.72, 294.72, 294.71, 294.71, 294.71, 294.71,
294.71, 294.72, 294.71, 294.71, 294.7, 294.55, 294.55, 294.55,
294.54)), .Names = c("date", "temperature"), row.names = c(NA,
50L), class = "data.frame")
The resulting "data_avg" should look something like this:
date mean sd
1 2017-10-17 16:30:00 295.46 0.1305597
2 2017-10-17 17:00:00 295.55 0.1137462
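Answer 0 (score: 1)
Have you tried something like this? In your code, dateColumn holds the character string "date", so cut(dateColumn, cuts) dispatches to cut.default, which only accepts numeric input. Converting the string to a symbol with sym() and unquoting it with !! makes group_by() evaluate it as the actual column: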
packageVersion("dplyr")
# [1] ‘0.7.4.9000’
data %>%
group_by(dateColumn = cut(!!sym(dateColumn), cuts)) %>%
summarise_at(.vars=vars(measurevar),
funs(mean = mean (., na.rm=T),
sd = sd (., na.rm=T) ))
# # A tibble: 2 x 3
# dateColumn mean sd
# <fctr> <dbl> <dbl>
# 1 2017-10-17 16:30:00 295 0.142
# 2 2017-10-17 17:00:00 295 0.135
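Since the question asks for a reusable function, here is a minimal sketch of how the same idea could be wrapped up. It is not part of the original answer: the function name half_hour_avg, the grouping column name bin, and the toy data frame are all made up for illustration. It also assumes dplyr 1.0.0 or later, where across() and the .data pronoun are available and can stand in for funs() and group_by_(), both of which have since been deprecated.

```r
library(dplyr)

# Hypothetical helper (name and signature are illustrative assumptions):
# averages the column named by `measurevar` over 30-minute bins of the
# POSIXct column named by `dateColumn`.
half_hour_avg <- function(data, dateColumn, measurevar) {
  times <- data[[dateColumn]]
  # Breaks every 30 minutes, padded by half an hour on both sides,
  # built the same way as `cuts` in the question.
  cuts <- seq(round(min(times), "hours") - 30 * 60,
              max(times) + 30 * 60,
              by = "30 min")
  data %>%
    # .data[[...]] looks the column up by its string name, so cut()
    # receives the POSIXct values rather than the string itself.
    group_by(bin = cut(.data[[dateColumn]], cuts)) %>%
    summarise(across(all_of(measurevar),
                     list(mean = ~ mean(.x, na.rm = TRUE),
                          sd   = ~ sd(.x, na.rm = TRUE))),
              .groups = "drop")
}

# Made-up toy data, only to show the call: four readings ten minutes apart.
example <- data.frame(
  date = as.POSIXct("2017-10-17 16:47:00", tz = "UTC") + seq(0, 1800, by = 600),
  temperature = c(295.4, 295.2, 295.0, 294.8)
)
result <- half_hour_avg(example, "date", "temperature")
```

With the default across() naming, the summary columns come out as temperature_mean and temperature_sd, which keeps the names unique if more than one measurement column is passed. The same call should work on the question's data as half_hour_avg(data, dateColumn, measurevar).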