Aggregating certain rows in R

Time: 2015-06-18 00:17:35

Tags: r

I have a two-column dataset sampled every 5 minutes:

Dataset
          Time       Power
 2015-04-01 04:05:00        1
 2015-04-01 04:10:00        2
 2015-04-01 04:15:00        3
 2015-04-01 04:20:00        4
 2015-04-01 04:25:00        5
 2015-04-01 04:30:00        6
  ......

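How can I aggregate it into a dataset with a 15-minute frequency? The new dataset should use every third timestamp as its timestamp, and the new power should be the sum of each three consecutive power values.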

New dataset
          Time       Power
 2015-04-01 04:15:00        1+2+3
 2015-04-01 04:30:00        4+5+6
  ......

3 answers:

Answer 0 (score: 1)

Try:

data.frame(T=df$Time[c(FALSE,FALSE,TRUE)], P=rowSums(matrix(df$Power, ncol=3, byrow=TRUE)))
#                    T     P
#1 2015-04-01 04:15:00     6
#2 2015-04-01 04:30:00    15

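The logical index (two false values followed by a true value) is recycled along the Time column, so it keeps every third timestamp, which gives the 15-minute intervals. The Power column is then laid out as a three-column matrix filled row by row, and the row sums give the total of each group of three values.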
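
To make the two steps above easier to follow, here is a minimal sketch that runs them one at a time on the six sample rows. The sample data frame is rebuilt here as an assumption (Time as POSIXct in UTC, rows already sorted), and the names `keep`, `m`, and `result` are only illustrative.

```r
# Sample data matching the question (assumed: Time is POSIXct, rows are sorted).
df <- data.frame(
  Time  = seq(as.POSIXct("2015-04-01 04:05:00", tz = "UTC"),
              by = "5 min", length.out = 6),
  Power = 1:6
)

# The length-3 logical pattern is recycled over all rows, keeping rows 3, 6, ...
keep <- rep(c(FALSE, FALSE, TRUE), length.out = nrow(df))
which(keep)
# [1] 3 6

# Filling a 3-column matrix by row puts each consecutive triple on one row.
m <- matrix(df$Power, ncol = 3, byrow = TRUE)
m
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# The row sums are the 15-minute totals: 6 and 15.
result <- data.frame(Time = df$Time[keep], Power = rowSums(m))
result
```

Note that this relies on the number of rows being a multiple of three. If it is not, `matrix()` recycles values to fill the last row and issues a warning, so the final sum would not be meaningful.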

Benchmark

library(microbenchmark)
microbenchmark(
plafort = data.frame(T=big.df$Time[c(FALSE,FALSE,TRUE)], P=rowSums(matrix(big.df$Power, ncol=3, byrow=TRUE))),
josilber = data.frame(Time=big.df$Time[seq(3, nrow(big.df), by=3)],
             Power=tapply(big.df$Power, floor((seq(nrow(big.df))-1)/3), sum))
)
#Unit: milliseconds
#expr        min         lq       mean    median         uq        max neval
#plafort   1.250796   1.345753   1.451546   1.46044   1.527486   2.045416   100
#josilber 176.438850 180.862507 187.434138 186.37592 189.628021 340.325792   100

Data

big.df <- data.frame(Time = rep(df$Time, 1e4), Power = rep(df$Power, 1e4))

Answer 1 (score: 1)

Create a column that identifies the window each observation belongs to (using vector recycling):

> library(lubridate)  # provides minutes()
> df$window <- df$Time + minutes(5*c(2,1,0))
> print(df)
                 Time power              window
1 2015-04-01 00:05:00     1 2015-04-01 00:15:00
2 2015-04-01 00:10:00     2 2015-04-01 00:15:00
3 2015-04-01 00:15:00     3 2015-04-01 00:15:00
4 2015-04-01 00:20:00     4 2015-04-01 00:30:00
5 2015-04-01 00:25:00     5 2015-04-01 00:30:00
6 2015-04-01 00:30:00     6 2015-04-01 00:30:00

Then group by window and summarize:

> library(dplyr)
> df %>% group_by(window) %>% summarize(power=sum(power)) -> newdf
> print(newdf)
Source: local data frame [2 x 2]

               window power
1 2015-04-01 00:15:00     6
2 2015-04-01 00:30:00    15
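
For readers without lubridate or dplyr installed, the same window idea can be sketched in base R. This is a substitute for the code above, not the same calls: plain second offsets replace `minutes()`, and `aggregate()` replaces `group_by()`/`summarize()`. The sample data is rebuilt here as an assumption (Time as POSIXct in UTC).

```r
# Sample data as in this answer (assumed: Time is POSIXct and rows are sorted).
df <- data.frame(
  Time  = seq(as.POSIXct("2015-04-01 00:05:00", tz = "UTC"),
              by = "5 min", length.out = 6),
  power = 1:6
)

# Offsets in seconds (10, 5, and 0 minutes), recycled over the rows, push each
# timestamp forward to the end of its 15-minute window.
df$window <- df$Time + 60 * 5 * c(2, 1, 0)

# Sum power within each window; the sums are 6 and 15.
newdf <- aggregate(power ~ window, data = df, FUN = sum)
newdf
```

The recycled offsets only line up if the first row is the first observation of a window and no 5-minute readings are missing, so this is worth checking on real data before relying on it.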

Answer 2 (score: 0)

data.frame(Time=dat$Time[seq(3, nrow(dat), by=3)],
           Power=tapply(dat$Power, floor((seq(nrow(dat))-1)/3), sum))
#                  Time Power
# 0 2015-04-01 04:15:00     6
# 1 2015-04-01 04:30:00    15
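
The grouping expression in the `tapply()` call may be easier to read when evaluated on its own. The following is a small sketch on the six sample values; `n`, `grp`, `power`, and `sums` are illustrative names, not part of the answer.

```r
n <- 6        # number of rows in the sample data
power <- 1:6  # the Power column from the question

# Row numbers 1..n are shifted to 0..n-1 and divided by 3, so rows 1-3 fall
# in group 0, rows 4-6 in group 1, and so on.
grp <- floor((seq(n) - 1) / 3)
grp
# [1] 0 0 0 1 1 1

# tapply() sums power within each group; the result is named by group label.
sums <- tapply(power, grp, sum)
sums
#  0  1
#  6 15
```

The group labels 0 and 1 are where the row names in the output above come from. If the number of rows is not a multiple of three, `tapply()` returns one extra, partial group while `seq(3, nrow(dat), by=3)` does not, so the two columns would differ in length and `data.frame()` would stop with an error.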