How do I aggregate data by time level?

Time: 2016-04-02 07:37:58

Tags: r average

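I have been trying to average data by month and hour. The data covers 6 months (say, January to June), with timestamps at 15-minute intervals in one column and the corresponding values in a second column. I used the code below to average the data from 15-minute intervals to hourly intervals: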

library(xts)

data<-read.csv("C:/Users/naman.nagar/Downloads/JAVA &R/15_Minute_Site_ Avg.csv",header=TRUE,stringsAsFactors = FALSE)
data$Timestamp<-as.POSIXct(strptime(data$Timestamp,format="%Y-%m-%d %H:%M"))
data.xts<-xts(x=data$Wanamaker,order.by=data$Timestamp)
ep<-endpoints(data.xts,"hours")
period.apply(data.xts,ep,mean)

The data obtained with the above code is:

    2015-12-19 10:15:00 1602
    2015-12-19 11:15:00 1608
    2015-12-19 12:15:00 1590
    2015-12-19 13:15:00 1590
    2015-12-19 14:15:00 1344
    2015-12-19 15:15:00 1338
    2015-12-19 16:15:00 1338
    2015-12-19 17:15:00 1338
    2015-12-19 18:15:00 1338
    2015-12-19 19:15:00 1392
    2015-12-19 20:15:00 1368
    2015-12-19 21:15:00 1302
    2015-12-19 22:15:00 1302
    2015-12-19 23:15:00 1266
    2015-12-20 00:15:00 1248
    2015-12-20 01:15:00 1254
    2015-12-20 02:15:00 1218
    2015-12-20 03:15:00 1188

Now, from this data, I want monthly averages like:

    2015-12 10:00:00 1389
    2015-12 11:00:00 1390
    2015-12 12:00:00 1400
    2015-12 13:00:00 1396

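That is, I want the average for the whole of December broken down by hour; for example, the value at 12:00:00 should be the average of that particular hour across the entire month.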

Please help. Thanks in advance!

2 answers:

Answer 0: (score: 1)

Here is a solution using the dplyr and lubridate packages. Let's assume you have the following data:

library(dplyr)
library(lubridate)

# just to make it reproducible
# also added a row at 10:00:00 so that at least one hour (hour 10) has more than one value

    data <- structure(list(timestamp = c("2015-12-19 10:00:00", "2015-12-19 10:15:00", 
"2015-12-19 11:15:00", "2015-12-19 12:15:00", "2015-12-19 13:15:00", 
"2015-12-19 14:15:00", "2015-12-19 15:15:00", "2015-12-19 16:15:00", 
"2015-12-19 17:15:00", "2015-12-19 18:15:00", "2015-12-19 19:15:00", 
"2015-12-19 20:15:00", "2015-12-19 21:15:00", "2015-12-19 22:15:00", 
"2015-12-19 23:15:00", "2015-12-20 00:15:00", "2015-12-20 01:15:00", 
"2015-12-20 02:15:00", "2015-12-20 03:15:00"), x = c(400, 1602, 
1608, 1590, 1590, 1344, 1338, 1338, 1338, 1338, 1392, 1368, 1302, 
1302, 1266, 1248, 1254, 1218, 1188)), .Names = c("timestamp", 
"x"), row.names = c(NA, 19L), class = "data.frame")

# let's have a look at it
head(data)
#             timestamp    x
# 1 2015-12-19 10:00:00  400
# 2 2015-12-19 10:15:00 1602
# 3 2015-12-19 11:15:00 1608
# 4 2015-12-19 12:15:00 1590
# 5 2015-12-19 13:15:00 1590
# 6 2015-12-19 14:15:00 1344
# etc.

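Then we use the pipeline below to: i) create the new columns year_month (I assume you have more than one year) and hour, ii) group by year_month and hour, and iii) summarize each group by its mean (that is, each hour within a given month):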

data %>% 
   mutate(year_month=paste(year(timestamp), month(timestamp), sep="-"),
          hour=hour(timestamp)) %>% 
   group_by(year_month, hour) %>% summarize(mean_x=mean(x))

#    year_month  hour mean_x
#         (chr) (int)  (dbl)
# 1     2015-12     0   1248
# 2     2015-12     1   1254
# 3     2015-12     2   1218
# 4     2015-12     3   1188
# 5     2015-12    10   1001
# 6     2015-12    11   1608

Note the value for hour 10: it is the mean of the two values 400 and 1602 that fall in that hour.

Is this what you are after?
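If the goal is the exact label format from the question (for example "2015-12 10:00:00"), the same grouping can be sketched in base R without extra packages. This is only an illustration under stated assumptions: it uses the first four rows of the sample data above, assumes the timestamps are in UTC, and the names `ts`, `slot` and `monthly_hourly` are made up for this example.

```r
# First four rows of the sample data from this answer
data <- data.frame(
  timestamp = c("2015-12-19 10:00:00", "2015-12-19 10:15:00",
                "2015-12-19 11:15:00", "2015-12-19 12:15:00"),
  x = c(400, 1602, 1608, 1590),
  stringsAsFactors = FALSE
)

# Parse the strings as date-times (the UTC time zone is an assumption)
ts <- as.POSIXct(data$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Build one label per (year-month, hour) combination, e.g. "2015-12 10:00:00"
data$slot <- format(ts, "%Y-%m %H:00:00")

# Average x within each label
monthly_hourly <- aggregate(x ~ slot, data = data, FUN = mean)
print(monthly_hourly)
```

Because the label drops the day, every reading that falls in the same hour of the same month ends up in one group, which is what the question asks for.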

Answer 1: (score: 1)

We can do something similar with data.table:

library(data.table)
setDT(df)[, .(mean = mean(value)), by = .(year = format(Timestamp, "%Y"),
                                            month = format(Timestamp, "%m"), 
                                            hour = format(Timestamp, "%H"))]
#   year month hour  mean
#1: 2015    12   10  1602
#2: 2015    12   11  1608
#3: 2015    12   12  1590
#4: 2015    12   13  1590
#5: 2015    12   14  1344
#6: 2015    12   15  1338
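The snippet above does not show how `df` is built, so here is a self-contained sketch of the same grouping. It assumes `df` has a POSIXct column `Timestamp` and a numeric column `value` (the names used in the answer); the three rows are borrowed from the sample data in the other answer, and the UTC time zone and the name `res` are assumptions for illustration. `format()` only extracts year, month and hour this way when `Timestamp` is a date-time, so a character column would need converting with `as.POSIXct()` first.

```r
library(data.table)

# Hypothetical input with the column names the answer uses
df <- data.frame(
  Timestamp = as.POSIXct(c("2015-12-19 10:00:00", "2015-12-19 10:15:00",
                           "2015-12-19 11:15:00"), tz = "UTC"),
  value = c(400, 1602, 1608)
)

# Convert to data.table by reference, then average value per year/month/hour
res <- setDT(df)[, .(mean = mean(value)),
                 by = .(year  = format(Timestamp, "%Y"),
                        month = format(Timestamp, "%m"),
                        hour  = format(Timestamp, "%H"))]
print(res)
```

Note that the grouping columns come back as character strings (for example "10" for the hour), since they are produced by `format()`.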