Aggregating a data.frame by time series with different functions

Date: 2013-07-25 13:09:06

Tags: r aggregate

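I have a large number of measurements, recorded once per minute. Some of the columns hold the mean, minimum, and maximum for the given minute. I want to summarize/aggregate the whole data.frame so that there is one entry per 30 minutes. The data look like this: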

str(wgData)
'data.frame':   115200 obs. of  7 variables:
 $ TIMESTAMP          : POSIXct, format: "2012-11-24 00:00:00" "2012-11-24 00:01:00" "2012-11-24 00:02:00" "2012-11-24 00:03:00" ...
 $ RECORD             : int  11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 ...
 $ TPanel             : num  -0.075 -0.075 -0.075 -0.095 -0.095 -0.095 -0.095 -0.118 -0.118 -0.118 ...
 $ VBattery           : num  13.8 13.8 13.8 13.8 13.8 ...
 $ VBatteryHeating_Avg: num  12.2 12.2 12.2 12.2 12.2 ...
 $ VBatteryHeating_Min: num  12.2 12.2 12.2 12.2 12.2 ...
 $ VBatteryHeating_Max: num  12.2 12.2 12.2 12.2 12.2 ...

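So for every 30-minute interval I want to compute: the TIMESTAMP, the mean of TPanel (the panel temperature), the mean of VBattery, the mean of VBatteryHeating_Avg, the minimum of VBatteryHeating_Min, and the maximum of VBatteryHeating_Max.

I had some success with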

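# Note: VB_MAX is built from VBatteryHeating_Min here, which is why the VB_MIN and VB_MAX columns below are identical
wgData30min <- aggregate(list(TP = wgData$TPanel, VB=wgData$VBatteryHeating_Avg, VB_MIN=wgData$VBatteryHeating_Min, VB_MAX=wgData$VBatteryHeating_Min),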
               list(Timestamp = cut(wgData$TIMESTAMP, "30 min")),
               mean)
head(wgData30min)
            Timestamp         TP       VB   VB_MIN   VB_MAX
1 2012-11-24 00:00:00 -0.1621667 12.15467 12.15333 12.15333
2 2012-11-24 00:30:00 -0.4751667 12.13333 12.13133 12.13133
3 2012-11-24 01:00:00 -0.5647333 12.11167 12.11067 12.11067
4 2012-11-24 01:30:00 -0.4573667 12.09133 12.08967 12.08967
5 2012-11-24 02:00:00 -0.4923667 12.07100 12.07000 12.07000
6 2012-11-24 02:30:00 -0.6469000 12.04933 12.04733 12.04733

...but I have not managed to pass a set of different functions to apply to the individual columns. Any help is appreciated.

1 answer:

Answer 0 (score: 3)

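I assume your data look something like this (simulated with random values and no fixed seed, so the numbers will differ on each run):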

# Simulated stand-in for the real data: one reading every 10 minutes (600 s)
seconds <- seq(0, 100000, by = 600)
n <- length(seconds)  # 167 timestamps
dates <- as.POSIXlt(seconds, origin = "2012-11-24", tz = "UTC")
TPanel <- rnorm(n)
VBatteryHeating_Avg <- rcauchy(n)
VBatteryHeating_Min <- runif(n)
VBatteryHeating_Max <- rexp(n)

wgData <- data.frame(TIMESTAMP = dates, 
                     TPanel = TPanel, 
                     VBatteryHeating_Avg = VBatteryHeating_Avg, 
                     VBatteryHeating_Min = VBatteryHeating_Min, 
                     VBatteryHeating_Max = VBatteryHeating_Max)

head(wgData)
##             TIMESTAMP     TPanel VBatteryHeating_Avg VBatteryHeating_Min VBatteryHeating_Max
## 1 2012-11-24 00:00:00  0.4770116          10.2937806          0.80151633           0.8722767
## 2 2012-11-24 00:10:00  0.0304906         -20.7057773          0.32311092           0.7172383
## 3 2012-11-24 00:20:00  1.4875903           0.5749393          0.74020471           0.5857239
## 4 2012-11-24 00:30:00  0.4933884           6.6567398          0.73824231           0.3691020
## 5 2012-11-24 00:40:00 -0.0369843           3.4332840          0.06552402           0.2455765
## 6 2012-11-24 00:50:00  0.7339858          -3.3787044          0.06451802           0.5952835

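Probably the best solution is to use plyr. First, as before, use cut to create an indicator variable for the 30-minute blocks. Then use ddply to split the data frame by that variable and apply a different summary function to each column.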

wgData$Timestamp30min <- cut(wgData$TIMESTAMP,"30 min")

library(plyr)

out <- ddply(wgData, .(Timestamp30min), summarize,
             TP = mean(TPanel),
             VB = mean(VBatteryHeating_Avg),
             VB_min = min(VBatteryHeating_Min),
             VB_max = max(VBatteryHeating_Max))

head(out)
##        Timestamp30min         TP          VB     VB_min    VB_max
## 1 2012-11-24 00:00:00  0.6650308 -3.27901911 0.32311092 0.8722767
## 2 2012-11-24 00:30:00  0.3967966  2.23710649 0.06451802 0.5952835
## 3 2012-11-24 01:00:00 -0.1326459 -1.20082543 0.50358789 1.0569388
## 4 2012-11-24 01:30:00  0.7845420 -0.07520645 0.14500901 0.9656004
## 5 2012-11-24 02:00:00 -0.4523882  0.40472169 0.24997021 1.4056166
## 6 2012-11-24 02:30:00 -0.2317818  0.61860868 0.64909054 0.2338781

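Alternatively, you can call aggregate once for each of the functions mean, min, and max, and then combine the results with merge, two data frames at a time.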
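A minimal sketch of that aggregate-plus-merge alternative, in base R only. The small wgData below is a hypothetical stand-in with made-up values (one row every 10 minutes), and the names bins, means, mins, and maxs are introduced only for illustration; the column names follow the question.

```r
# Hypothetical stand-in for wgData: six readings, 10 minutes apart (made-up values)
wgData <- data.frame(
  TIMESTAMP = as.POSIXct("2012-11-24 00:00:00", tz = "UTC") + 600 * (0:5),
  TPanel = c(1, 2, 3, 4, 5, 6),
  VBatteryHeating_Avg = c(12.0, 12.2, 12.4, 12.1, 12.3, 12.5),
  VBatteryHeating_Min = c(11.9, 12.1, 12.3, 12.0, 12.2, 12.4),
  VBatteryHeating_Max = c(12.1, 12.3, 12.5, 12.2, 12.4, 12.6)
)

# 30-minute bins, built with cut() as in the question
bins <- list(Timestamp = cut(wgData$TIMESTAMP, "30 min"))

# One aggregate() call per summary function
means <- aggregate(list(TP = wgData$TPanel, VB = wgData$VBatteryHeating_Avg),
                   bins, mean)
mins  <- aggregate(list(VB_MIN = wgData$VBatteryHeating_Min), bins, min)
maxs  <- aggregate(list(VB_MAX = wgData$VBatteryHeating_Max), bins, max)

# merge() joins two data frames at a time, so the calls are chained
wgData30min <- merge(merge(means, mins, by = "Timestamp"), maxs, by = "Timestamp")
wgData30min
```

With this stand-in data the result has one row per 30-minute bin and the columns Timestamp, TP, VB, VB_MIN, and VB_MAX. For more than three intermediate data frames, Reduce with merge would avoid the nested calls.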