I'm trying to average a dataset over 5-minute groups. I'm using dplyr, as shown in this example:
library(zoo)
library(xts)
library(dplyr)
t1 <- as.POSIXct("2012-1-1 0:0:0")
t2 <- as.POSIXct("2012-1-1 1:0:0")
d <- seq(t1, t2, by = "1 min")
x <- rnorm(length(d))
z <- cbind.data.frame(d,x)
z %>%
group_by(d = cut(d, breaks="5 min")) %>%
summarize(x = mean(x))
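The mean of the values from 0:0:0 to 0:4:0 is stored under the timestamp 0:0:0. However, I need the timestamps to be 0:5:0, 0:10:0, 0:15:0, etc., with the corresponding means taken over 0:1:0 - 0:5:0, 0:6:0 - 0:10:0, 0:11:0 - 0:15:0.
Is there a simple tweak to get this?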
Answer 0 (score: 3)
One way is to specify the breakpoints and labels explicitly. For example:
# Create 5-minute breakpoints at 1,6,11,... minutes past the hour.
breaks=seq(as.POSIXct("2011-12-31 23:56:00"),
as.POSIXct("2012-01-01 01:05:00"), by="5 min")
> breaks
[1] "2011-12-31 23:56:00 PST" "2012-01-01 00:01:00 PST" "2012-01-01 00:06:00 PST" "2012-01-01 00:11:00 PST"
[5] "2012-01-01 00:16:00 PST" "2012-01-01 00:21:00 PST" "2012-01-01 00:26:00 PST" "2012-01-01 00:31:00 PST"
[9] "2012-01-01 00:36:00 PST" "2012-01-01 00:41:00 PST" "2012-01-01 00:46:00 PST" "2012-01-01 00:51:00 PST"
[13] "2012-01-01 00:56:00 PST" "2012-01-01 01:01:00 PST"
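To see why breakpoints at 1, 6, 11, ... minutes past the hour give the desired groups, the small check below may help. It is only a sketch: the `probe` vector and the fixed `"UTC"` time zone are assumptions made for illustration, not part of the original answer. It relies on `cut()` for date-times using `right = FALSE` by default, so each bin includes its start and excludes its end.

```r
# Illustrative check; `probe` and tz = "UTC" are assumptions, not from the answer.
tz <- "UTC"
breaks <- seq(as.POSIXct("2011-12-31 23:56:00", tz = tz),
              as.POSIXct("2012-01-01 00:11:00", tz = tz), by = "5 min")
probe <- as.POSIXct(c("2012-01-01 00:00:00", "2012-01-01 00:01:00",
                      "2012-01-01 00:05:00", "2012-01-01 00:06:00"), tz = tz)

# Each bin is [start, end): 00:00 falls in bin 1, 00:01 and 00:05 in bin 2,
# and 00:06 starts bin 3.
bin <- as.integer(cut(probe, breaks = breaks))
bin
```

So 00:01 through 00:05 share one bin, which is the grouping the question asks for.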
# Setting the labels to (breaks - 60)[-1] subtracts 1 minute from each value in breaks
# and drops the first one, so the labels will be 0,5,10,15... minutes past the hour
z %>%
group_by(d = cut(d, breaks=breaks, labels=(breaks - 60)[-1])) %>%
summarize(x = mean(x))
d x
1 2012-01-01 00:00:00 -1.14713698
2 2012-01-01 00:05:00 -0.17172950
3 2012-01-01 00:10:00 0.19049591
4 2012-01-01 00:15:00 0.15619679
5 2012-01-01 00:20:00 0.18397502
6 2012-01-01 00:25:00 0.33750870
7 2012-01-01 00:30:00 -0.22182889
8 2012-01-01 00:35:00 -0.01832799
9 2012-01-01 00:40:00 1.08747482
10 2012-01-01 00:45:00 0.36870290
11 2012-01-01 00:50:00 0.75684684
12 2012-01-01 00:55:00 0.14584254
13 2012-01-01 01:00:00 0.34766052
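As an alternative sketch (not part of the original answer), the right-endpoint label can also be computed arithmetically, by rounding each timestamp up to the next multiple of 300 seconds, instead of building breaks and labels for `cut()`. This assumes the time zone's UTC offset is a whole multiple of 5 minutes; the column name `bucket`, the fixed `"UTC"` time zone, and the `set.seed()` call are illustrative choices.

```r
library(dplyr)

set.seed(1)  # only to make the illustration reproducible
t1 <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
t2 <- as.POSIXct("2012-01-01 01:00:00", tz = "UTC")
d <- seq(t1, t2, by = "1 min")
z <- data.frame(d = d, x = rnorm(length(d)))

# Round each timestamp up to the next 5-minute boundary (300 seconds).
# Timestamps already on a boundary are left unchanged, so 00:01-00:05
# are all labelled 00:05.
res <- z %>%
  group_by(bucket = as.POSIXct(ceiling(as.numeric(d) / 300) * 300,
                               origin = "1970-01-01", tz = "UTC")) %>%
  summarize(x = mean(x))
res
```

As in the output above, the single observation at 00:00 forms its own group, and `bucket` stays a date-time column rather than becoming a factor.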