I have a two-column dataset sampled every 5 minutes:
Dataset
Time Power
2015-04-01 04:05:00 1
2015-04-01 04:10:00 2
2015-04-01 04:15:00 3
2015-04-01 04:20:00 4
2015-04-01 04:25:00 5
2015-04-01 04:30:00 6
......
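How can I aggregate it into a dataset with a 15-minute frequency? The new dataset should use every third timestamp as the new timestamp, and the new power should be the sum of each group of three power values.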
New dataset
Time Power
2015-04-01 04:15:00 1+2+3
2015-04-01 04:30:00 4+5+6
......
Answer 0: (score: 1)
Try:
data.frame(T=df$Time[c(F,F,T)], P=rowSums(matrix(df$Power,,3,T)))
# T P
#1 2015-04-01 04:15:00 6
#2 2015-04-01 04:30:00 15
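We build the 15-minute data frame by recycling a logical index of two FALSE values followed by one TRUE, which keeps every third timestamp. The Power column is then reshaped into a three-column matrix filled row by row, and the row sums give the total for each group of three values.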
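To make the two steps visible, here is a minimal sketch that runs them separately on the six example rows. The construction of `df` (including the UTC time zone) and the helper names `keep` and `m` are assumptions for illustration; the question does not show how its data frame was created.

```r
# Rebuild the six-row example from the question (time zone assumed to be UTC).
df <- data.frame(
  Time  = as.POSIXct("2015-04-01 04:05:00", tz = "UTC") + 300 * (0:5),
  Power = 1:6
)

# The logical pattern FALSE, FALSE, TRUE is repeated along the rows,
# so only every third timestamp is kept.
keep <- rep_len(c(FALSE, FALSE, TRUE), nrow(df))
df$Time[keep]

# Fill a 3-column matrix row by row: each row holds one 15-minute group.
m <- matrix(df$Power, ncol = 3, byrow = TRUE)
m

# Row sums are the aggregated power values, one per group.
rowSums(m)
```

This approach relies on the rows being sorted, gap-free, and a multiple of three in number; otherwise the matrix rows no longer line up with the 15-minute windows.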
Benchmark
library(microbenchmark)
microbenchmark(
plafort = data.frame(T=big.df$Time[c(F,F,T)], P=rowSums(matrix(big.df$Power,,3,T))),
josilber = data.frame(Time=big.df$Time[seq(3, nrow(big.df), by=3)],
Power=tapply(big.df$Power, floor((seq(nrow(big.df))-1)/3), sum))
)
#Unit: milliseconds
#expr min lq mean median uq max neval
#plafort 1.250796 1.345753 1.451546 1.46044 1.527486 2.045416 100
#josilber 176.438850 180.862507 187.434138 186.37592 189.628021 340.325792 100
Data
big.df <- data.frame(Time = rep(df$Time, 1e4), Power = rep(df$Power, 1e4))
Answer 1: (score: 1)
Create a column that identifies the window each observation belongs to (using vector recycling):
> library(lubridate)  # provides minutes()
> df$window <- df$Time + minutes(5*c(2,1,0))
> print(df)
Time power window
1 2015-04-01 00:05:00 1 2015-04-01 00:15:00
2 2015-04-01 00:10:00 2 2015-04-01 00:15:00
3 2015-04-01 00:15:00 3 2015-04-01 00:15:00
4 2015-04-01 00:20:00 4 2015-04-01 00:30:00
5 2015-04-01 00:25:00 5 2015-04-01 00:30:00
6 2015-04-01 00:30:00 6 2015-04-01 00:30:00
Then group by window and summarize:
> library(dplyr)
> df %>% group_by(window) %>% summarize(power=sum(power)) -> newdf
> print(newdf)
Source: local data frame [2 x 2]
window power
1 2015-04-01 00:15:00 6
2 2015-04-01 00:30:00 15
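The same "label each row with its window, then sum per window" idea can also be sketched in base R. This is a swapped-in variant, not the code from the answer above: it derives the window from the timestamp itself by rounding up to the next multiple of 900 seconds instead of relying on row position, and it uses `aggregate()` in place of dplyr. The example data, the UTC time zone, and the names `secs` and `newdf` are assumptions.

```r
# Rebuild the six-row example (time zone assumed to be UTC).
df <- data.frame(
  Time  = as.POSIXct("2015-04-01 04:05:00", tz = "UTC") + 300 * (0:5),
  Power = 1:6
)

# Round each timestamp up to the next multiple of 900 seconds (15 minutes).
# Timestamps that already sit on a boundary are left unchanged.
secs <- ceiling(as.numeric(df$Time) / 900) * 900
df$window <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Sum Power within each window; the result has one row per window.
newdf <- aggregate(Power ~ window, data = df, FUN = sum)
newdf
```

Because the window comes from the clock time rather than the row index, this variant should also tolerate missing rows, at the cost of assuming that windows are aligned to quarter hours.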
Answer 2: (score: 0)
data.frame(Time=dat$Time[seq(3, nrow(dat), by=3)],
Power=tapply(dat$Power, floor((seq(nrow(dat))-1)/3), sum))
# Time Power
# 0 2015-04-01 04:15:00 6
# 1 2015-04-01 04:30:00 15
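The answer above comes without explanation, so here is a short sketch of what its grouping index does, again on the six example rows. The construction of `dat` (with an assumed UTC time zone) and the intermediate names `grp`, `sums`, and `ends` are illustrative and not part of the original answer.

```r
# Rebuild the six-row example (time zone assumed to be UTC).
dat <- data.frame(
  Time  = as.POSIXct("2015-04-01 04:05:00", tz = "UTC") + 300 * (0:5),
  Power = 1:6
)

# Zero-based row position divided by 3 and floored:
# rows 1 to 3 get group 0, rows 4 to 6 get group 1.
grp <- floor((seq_len(nrow(dat)) - 1) / 3)
grp

# tapply() sums Power within each group; the result is named by group id.
sums <- tapply(dat$Power, grp, sum)
sums

# Every third timestamp marks the end of its group.
ends <- dat$Time[seq(3, nrow(dat), by = 3)]
ends
```

The group ids become the names of the `tapply()` result, which is why the printed data frame shows row names 0 and 1 rather than 1 and 2.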