R: Aggregate by date (mean every 30 minutes)

Time: 2016-10-11 22:39:54

Tags: r aggregate

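I have been struggling with this problem: I have a data frame containing 5-minute measurements of different parameters (about 6 months' worth). I want to aggregate them and get the mean of each parameter for every 30 minutes. Here is a short example: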

TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35","2015-12-31 0:40", "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55", "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10", "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25", "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
mymet <- data.frame(TIMESTAMP, value1)  # data.frame(), not as.data.frame(), so that value1 becomes a column
mymet$TIMESTAMP <- as.POSIXct(mymet$TIMESTAMP, format = "%Y-%m-%d %H:%M")

halfhour <- aggregate(mymet, list(TIME = cut(mymet$TIMESTAMP, breaks = "30 mins")), 
  mean, na.rm = TRUE)

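What I want is the mean of the values from 00:35 to 1:00, labelled DATE 1:00 AM. What I get instead is the mean of the values from 00:30 to 00:55, labelled DATE 12:30 AM.

How can I change the function to get the values I want?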

2 Answers:

Answer 0: (score: 1)

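The trick (I think) is to consider when your first observation starts. If the first observation is at 00:35 and you cut into 30-minute intervals, the intervals follow the logic you want. As for the names of the breaks, just add 25 minutes to each name and you get what you want. Here is an example with 6 months of 2015: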

library(lubridate)
library(dplyr)

# Five-minute timestamps from 2015-01-01 00:00 to 2015-06-01 23:55
TIMESTAMP <- seq(ymd_hm('2015-01-01 00:00'), ymd_hm('2015-06-01 23:55'), by = '5 min')
TIMESTAMP <- data.frame(obs = seq_along(TIMESTAMP), TS = TIMESTAMP)
TIMESTAMP <- TIMESTAMP[-(1:7), ]  # drop the first 7 rows so the series starts at 00:35
TIMESTAMP$Breaks <- cut(TIMESTAMP$TS, breaks = "30 mins")
# cut() labels each bin by its start; add 25 minutes to label it by its last reading
TIMESTAMP$Breaks <- ymd_hms(as.character(TIMESTAMP$Breaks)) + (25 * 60)
Averages <- TIMESTAMP %>%
  group_by(Breaks) %>%
  summarise(MeanObs = mean(obs, na.rm = TRUE))
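As a hedged illustration, the same idea can be applied to the question's 13-row sample using base R only. The `tz = "UTC"` argument and the `TIME` column name are assumptions added here for reproducibility; they are not part of the original post.

```r
# Rebuild the question's sample data.
# tz = "UTC" is an assumption so the result does not depend on the local time zone.
TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35", "2015-12-31 0:40",
               "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55",
               "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10",
               "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25",
               "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
mymet <- data.frame(
  TIMESTAMP = as.POSIXct(TIMESTAMP, format = "%Y-%m-%d %H:%M", tz = "UTC"),
  value1 = value1
)

# Drop the leading 00:30 reading so that the first bin starts at 00:35.
mymet <- mymet[mymet$TIMESTAMP > min(mymet$TIMESTAMP), ]

# cut() labels each bin by its left edge (00:35, 01:05, ...).
bins <- cut(mymet$TIMESTAMP, breaks = "30 mins")

# Shift the labels by 25 minutes so each bin is named after its last reading.
mymet$TIME <- as.POSIXct(as.character(bins), tz = "UTC") + 25 * 60

halfhour <- aggregate(value1 ~ TIME, data = mymet, FUN = mean)
print(halfhour)
```

With the sample values this should give two rows: a mean of about 66.67 labelled 01:00 and a mean of about 84.83 labelled 01:30. The single 00:30 reading is discarded in this sketch; whether that is acceptable depends on the real data.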

Answer 1: (score: 1)

If you construct mymet correctly, you can cut TIMESTAMP into bins (cut.POSIXt handles this), which lets you aggregate:

mymet$half_hour <- cut(mymet$TIMESTAMP, breaks = "30 min")

aggregate(value1 ~ half_hour, mymet, mean)

##             half_hour   value1
## 1 2015-12-31 00:30:00 73.33333
## 2 2015-12-31 01:00:00 80.16667
## 3 2015-12-31 01:30:00 33.00000
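Note that the bins in this output are labelled by their left edge, so the first row covers 00:30 to 00:55 rather than the 00:35 to 01:00 window described in the question. One possible alternative, sketched here as an assumption and not as part of the original answer, is to round each timestamp up to the next half-hour boundary and group on that value. The `interval_end` column name and `tz = "UTC"` are hypothetical choices, and the arithmetic assumes a time zone whose UTC offset is a multiple of 30 minutes.

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility.
ts <- seq(as.POSIXct("2015-12-31 00:30", tz = "UTC"), by = "5 min", length.out = 13)
mymet <- data.frame(
  TIMESTAMP = ts,
  value1 = c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
)

# Round each timestamp up to the next half-hour boundary (1800 seconds).
# A reading that falls exactly on a boundary, such as 01:00, keeps that boundary,
# so each group covers the interval (end - 30 min, end].
secs <- as.numeric(mymet$TIMESTAMP)
mymet$interval_end <- as.POSIXct(ceiling(secs / 1800) * 1800,
                                 origin = "1970-01-01", tz = "UTC")

by_end <- aggregate(value1 ~ interval_end, data = mymet, FUN = mean)
print(by_end)
```

Here the 00:35 to 01:00 readings are grouped under 01:00 and the 01:05 to 01:30 readings under 01:30, while the lone 00:30 reading forms its own group labelled 00:30.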

Data

mymet <- structure(list(TIMESTAMP = structure(c(1451539800, 1451540100, 
    1451540400, 1451540700, 1451541000, 1451541300, 1451541600, 1451541900, 
    1451542200, 1451542500, 1451542800, 1451543100, 1451543400), class = c("POSIXct", 
    "POSIXt"), tzone = ""), value1 = c(45, 50, 68, 78, 99, 100, 5, 
    9, 344, 10, 45, 68, 33)), .Names = c("TIMESTAMP", "value1"), row.names = c(NA, 
    -13L), class = "data.frame")