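I have a lot of measurements, recorded once per minute. Some of the variables contain the mean, minimum and maximum for the given minute. I want to summarize/aggregate the whole data.frame so that there is one entry per 30 minutes. The data looks like this: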
str(wgData)
'data.frame': 115200 obs. of 7 variables:
$ TIMESTAMP : POSIXct, format: "2012-11-24 00:00:00" "2012-11-24 00:01:00" "2012-11-24 00:02:00" "2012-11-24 00:03:00" ...
$ RECORD : int 11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 ...
$ TPanel : num -0.075 -0.075 -0.075 -0.095 -0.095 -0.095 -0.095 -0.118 -0.118 -0.118 ...
$ VBattery : num 13.8 13.8 13.8 13.8 13.8 ...
$ VBatteryHeating_Avg: num 12.2 12.2 12.2 12.2 12.2 ...
$ VBatteryHeating_Min: num 12.2 12.2 12.2 12.2 12.2 ...
$ VBatteryHeating_Max: num 12.2 12.2 12.2 12.2 12.2 ...
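So for every 30 minutes I want to compute:
- the TIMESTAMP of the 30-minute block
- the mean of TPanel (the panel temperature)
- the mean of VBattery
- the mean of VBatteryHeating_Avg
- the minimum of VBatteryHeating_Min
- the maximum of VBatteryHeating_Max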
I had some success with this:
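# mean is applied to every column here; the output below was produced with
# VB_MAX = wgData$VBatteryHeating_Min, which is why VB_MIN and VB_MAX are identical
wgData30min <- aggregate(list(TP = wgData$TPanel, VB = wgData$VBatteryHeating_Avg,
                              VB_MIN = wgData$VBatteryHeating_Min, VB_MAX = wgData$VBatteryHeating_Max),
                         list(Timestamp = cut(wgData$TIMESTAMP, "30 min")),
                         mean)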
head(wgData30min)
Timestamp TP VB VB_MIN VB_MAX
1 2012-11-24 00:00:00 -0.1621667 12.15467 12.15333 12.15333
2 2012-11-24 00:30:00 -0.4751667 12.13333 12.13133 12.13133
3 2012-11-24 01:00:00 -0.5647333 12.11167 12.11067 12.11067
4 2012-11-24 01:30:00 -0.4573667 12.09133 12.08967 12.08967
5 2012-11-24 02:00:00 -0.4923667 12.07100 12.07000 12.07000
6 2012-11-24 02:30:00 -0.6469000 12.04933 12.04733 12.04733
...but I haven't managed to pass a set of different functions to be applied to the individual columns. Any help is appreciated.
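One base-R way to get a different function per column is to split the rows by 30-minute block and evaluate a mapping from column name to summary function on each piece. The sketch below assumes a small hypothetical data frame toy (one reading per minute for 90 minutes) and a hypothetical list summary_spec; neither comes from the question, and only split, lapply, vapply and cut are standard R.

```r
# Hypothetical toy data standing in for wgData: one reading per minute for 90 minutes
stamps <- seq(as.POSIXct("2012-11-24 00:00:00", tz = "UTC"), by = "1 min", length.out = 90)
toy <- data.frame(TIMESTAMP = stamps,
                  TPanel = seq_len(90) / 10,
                  VBatteryHeating_Min = 12 + seq_len(90) / 100,
                  VBatteryHeating_Max = 13 + seq_len(90) / 100)

# Column -> summary function; summary_spec is an illustrative name, not an R API
summary_spec <- list(TPanel = mean,
                     VBatteryHeating_Min = min,
                     VBatteryHeating_Max = max)

# 30-minute block index for each row
block <- cut(toy$TIMESTAMP, "30 min")

# For each block, apply each column's own function, giving one named numeric vector
per_block <- lapply(split(toy, block), function(d) {
  vapply(names(summary_spec), function(col) summary_spec[[col]](d[[col]]), numeric(1))
})

# Stack the per-block vectors into rows, keeping the block labels as a column
result <- data.frame(Timestamp = names(per_block), do.call(rbind, per_block),
                     row.names = NULL)
print(result)
```

Adding entries such as VBattery = mean or VBatteryHeating_Avg = mean to summary_spec would cover the remaining columns from the question, assuming those columns exist in the data frame.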
Answer (score: 3)
I assume your data looks something like this:
# Simulated stand-in for wgData: one row every 600 s (10 min); the values are random,
# so your numbers will differ from the output below
seconds <- seq(0, 100000, by = 600)
dates <- as.POSIXlt(seconds, origin = "2012-11-24", tz = "UTC")
n <- length(seconds)  # 167 time points
TPanel <- rnorm(n)
VBatteryHeating_Avg <- rcauchy(n)
VBatteryHeating_Min <- runif(n)
VBatteryHeating_Max <- rexp(n)
wgData <- data.frame(TIMESTAMP = dates,
                     TPanel = TPanel,
                     VBatteryHeating_Avg = VBatteryHeating_Avg,
                     VBatteryHeating_Min = VBatteryHeating_Min,
                     VBatteryHeating_Max = VBatteryHeating_Max)
head(wgData)
## TIMESTAMP TPanel VBatteryHeating_Avg VBatteryHeating_Min VBatteryHeating_Max
## 1 2012-11-24 00:00:00 0.4770116 10.2937806 0.80151633 0.8722767
## 2 2012-11-24 00:10:00 0.0304906 -20.7057773 0.32311092 0.7172383
## 3 2012-11-24 00:20:00 1.4875903 0.5749393 0.74020471 0.5857239
## 4 2012-11-24 00:30:00 0.4933884 6.6567398 0.73824231 0.3691020
## 5 2012-11-24 00:40:00 -0.0369843 3.4332840 0.06552402 0.2455765
## 6 2012-11-24 00:50:00 0.7339858 -3.3787044 0.06451802 0.5952835
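Probably the best solution is to use plyr. First, as before, use cut to create an indicator for the 30-minute blocks. Then use ddply to split the data frame by that variable and apply summarize to each piece, with a different function for each column.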
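library(plyr)
# Factor marking which 30-minute block each row falls into
wgData$Timestamp30min <- cut(wgData$TIMESTAMP, "30 min")
# Split by block; summarize evaluates each named expression within each block
out <- ddply(wgData, .(Timestamp30min), summarize,
             TP = mean(TPanel),
             VB = mean(VBatteryHeating_Avg),
             VB_min = min(VBatteryHeating_Min),
             VB_max = max(VBatteryHeating_Max))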
head(out)
## Timestamp30min TP VB VB_min VB_max
## 1 2012-11-24 00:00:00 0.6650308 -3.27901911 0.32311092 0.8722767
## 2 2012-11-24 00:30:00 0.3967966 2.23710649 0.06451802 0.5952835
## 3 2012-11-24 01:00:00 -0.1326459 -1.20082543 0.50358789 1.0569388
## 4 2012-11-24 01:30:00 0.7845420 -0.07520645 0.14500901 0.9656004
## 5 2012-11-24 02:00:00 -0.4523882 0.40472169 0.24997021 1.4056166
## 6 2012-11-24 02:30:00 -0.2317818 0.61860868 0.64909054 0.2338781
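One detail of the cut step worth checking: in base R's cut.POSIXt, a "30 min" break starts the first block at the earliest timestamp (with seconds dropped), not at a :00 or :30 boundary. With the data above this makes no difference because it starts at midnight, but a series starting at 00:07 would get blocks beginning at 00:07 and 00:37. The sketch below demonstrates this on three made-up timestamps and shows one possible workaround (an assumption, not part of the answer): flooring each time to a multiple of 1800 seconds, which aligns blocks to :00 and :30 in UTC.

```r
# cut() with "30 min" starts its first bin at the earliest timestamp (seconds dropped),
# not at the nearest :00 or :30 boundary
x <- as.POSIXct(c("2012-11-24 00:07:00", "2012-11-24 00:31:00", "2012-11-24 00:40:00"),
                tz = "UTC")
bins_cut <- cut(x, "30 min")
print(as.integer(bins_cut))  # 1 1 2: 00:07 and 00:31 end up in the same block

# Workaround (assumption): floor each time to a 30-minute multiple on the Unix clock,
# so blocks always start at :00 or :30 UTC
block_start <- as.POSIXct(floor(as.numeric(x) / 1800) * 1800,
                          origin = "1970-01-01", tz = "UTC")
print(format(block_start, "%H:%M"))  # "00:00" "00:30" "00:30"
```

The floored block_start values could then serve as the grouping variable in place of the cut() factor, for example in ddply. In a time zone whose UTC offset is not a multiple of 30 minutes, this flooring would not line up with local :00 and :30.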
Alternatively, you could call aggregate once for each of the functions mean, min and max, and then merge the results, two data frames at a time.
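A minimal sketch of that aggregate-and-merge approach, assuming a hypothetical one-hour data frame wg with the question's column names. Each aggregate() call handles the columns that share a summary function, and Reduce() chains merge() so that the three results are joined two at a time.

```r
# Hypothetical toy data: one reading per minute for one hour
stamps <- seq(as.POSIXct("2012-11-24 00:00:00", tz = "UTC"), by = "1 min", length.out = 60)
wg <- data.frame(TIMESTAMP = stamps,
                 TPanel = seq_len(60) / 10,
                 VBatteryHeating_Min = 12 + seq_len(60) / 100,
                 VBatteryHeating_Max = 13 + seq_len(60) / 100)
block <- cut(wg$TIMESTAMP, "30 min")

# One aggregate() call per summary function, each over the columns that need it
means <- aggregate(list(TP = wg$TPanel), list(Timestamp = block), mean)
mins  <- aggregate(list(VB_min = wg$VBatteryHeating_Min), list(Timestamp = block), min)
maxs  <- aggregate(list(VB_max = wg$VBatteryHeating_Max), list(Timestamp = block), max)

# merge() joins two data frames at a time; Reduce() chains the pairwise merges
out <- Reduce(function(a, b) merge(a, b, by = "Timestamp"), list(means, mins, maxs))
print(out)
```

Compared with ddply, this needs one aggregate() call per distinct function, but it uses only base R.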