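I have been trying to average data by month and hour. My data covers 6 months (say, January to June), with timestamps at 15-minute intervals in one column and the value for each interval in the second column. I use the code below to average the data from 15-minute intervals to hourly intervals: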
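library(xts)
# read the 15-minute data (one timestamp column, one value column)
data <- read.csv("C:/Users/naman.nagar/Downloads/JAVA &R/15_Minute_Site_ Avg.csv", header = TRUE, stringsAsFactors = FALSE)
# convert the timestamp strings to POSIXct
data$Timestamp <- as.POSIXct(strptime(data$Timestamp, format = "%Y-%m-%d %H:%M"))
# build an xts series of the Wanamaker column indexed by time
data.xts <- xts(x = data$Wanamaker, order.by = data$Timestamp)
# average the values within each clock hour
ep <- endpoints(data.xts, "hours")
period.apply(data.xts, ep, mean)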
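If xts is not at hand, the same hourly averaging step can be sketched in base R. This is a minimal example on made-up 15-minute readings (the data frame `readings` and its columns are hypothetical, not the CSV above). One difference: `period.apply()` labels each hour with the last timestamp that falls in it, while this sketch labels each hour with its start.

```r
# Hypothetical 15-minute readings: eight values covering two clock hours.
stamps <- as.POSIXct("2015-12-19 10:00:00", tz = "UTC") + seq(0, by = 15 * 60, length.out = 8)
readings <- data.frame(Timestamp = stamps,
                       value = c(1600, 1602, 1604, 1602, 1608, 1606, 1610, 1608))

# Label every reading with the start of its clock hour ...
readings$hour_start <- format(readings$Timestamp, "%Y-%m-%d %H:00:00", tz = "UTC")
# ... and average all readings that share the same label.
hourly <- aggregate(value ~ hour_start, data = readings, FUN = mean)
print(hourly)
```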
The output of the code above is:
2015-12-19 10:15:00 1602
2015-12-19 11:15:00 1608
2015-12-19 12:15:00 1590
2015-12-19 13:15:00 1590
2015-12-19 14:15:00 1344
2015-12-19 15:15:00 1338
2015-12-19 16:15:00 1338
2015-12-19 17:15:00 1338
2015-12-19 18:15:00 1338
2015-12-19 19:15:00 1392
2015-12-19 20:15:00 1368
2015-12-19 21:15:00 1302
2015-12-19 22:15:00 1302
2015-12-19 23:15:00 1266
2015-12-20 00:15:00 1248
2015-12-20 01:15:00 1254
2015-12-20 02:15:00 1218
2015-12-20 03:15:00 1188
Now, from this data, I want monthly averages like this:
2015-12 10:00:00 1389
2015-12 11:00:00 1390
2015-12 12:00:00 1400
2015-12 13:00:00 1396
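In other words, I want averages over the whole of December at hourly resolution: for example, the value at 12:00:00 should be the average of that particular hour across the entire month.
Please help. Thanks in advance!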
Answer 0 (score: 1)
Here is a solution using the dplyr and lubridate packages.
Suppose you have the following data:
library(dplyr)
library(lubridate)
# just to make it reproducible
# an extra row at 10:00:00 is added so that hour 10 has more than one value
data <- structure(list(timestamp = c("2015-12-19 10:00:00", "2015-12-19 10:15:00",
"2015-12-19 11:15:00", "2015-12-19 12:15:00", "2015-12-19 13:15:00",
"2015-12-19 14:15:00", "2015-12-19 15:15:00", "2015-12-19 16:15:00",
"2015-12-19 17:15:00", "2015-12-19 18:15:00", "2015-12-19 19:15:00",
"2015-12-19 20:15:00", "2015-12-19 21:15:00", "2015-12-19 22:15:00",
"2015-12-19 23:15:00", "2015-12-20 00:15:00", "2015-12-20 01:15:00",
"2015-12-20 02:15:00", "2015-12-20 03:15:00"), x = c(400, 1602,
1608, 1590, 1590, 1344, 1338, 1338, 1338, 1338, 1392, 1368, 1302,
1302, 1266, 1248, 1254, 1218, 1188)), .Names = c("timestamp",
"x"), row.names = c(NA, 19L), class = "data.frame")
# let's have a look at it
head(data)
# timestamp x
# 1 2015-12-19 10:00:00 400
# 2 2015-12-19 10:15:00 1602
# 3 2015-12-19 11:15:00 1608
# 4 2015-12-19 12:15:00 1590
# 5 2015-12-19 13:15:00 1590
# 6 2015-12-19 14:15:00 1344
# etc.
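Then we use the following pipeline to i) create the new columns year_month (I assume you have more than one month) and hour, ii) group by year-month and hour, and iii) compute the mean of each group (i.e., of each hour within a given month):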
data %>%
  mutate(timestamp = ymd_hms(timestamp),                 # parse the strings to POSIXct
         year_month = format(timestamp, "%Y-%m"),       # zero-padded month, sorts correctly
         hour = hour(timestamp)) %>%
  group_by(year_month, hour) %>%
  summarize(mean_x = mean(x))
# year_month hour mean_x
# (chr) (int) (dbl)
# 1 2015-12 0 1248
# 2 2015-12 1 1254
# 3 2015-12 2 1218
# 4 2015-12 3 1188
# 5 2015-12 10 1001
# 6 2015-12 11 1608
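Note the value for hour 10: it is the mean of the two readings at that hour, (400 + 1602) / 2 = 1001.
Is this what you want?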
Answer 1 (score: 1)
We can do something similar in data.table:
library(data.table)
# assumes df has a POSIXct column Timestamp and a numeric column value
setDT(df)[, .(mean = mean(value)), by = .(year = format(Timestamp, "%Y"),
                                          month = format(Timestamp, "%m"),
                                          hour = format(Timestamp, "%H"))]
# year month hour mean
#1: 2015 12 10 1602
#2: 2015 12 11 1608
#3: 2015 12 12 1590
#4: 2015 12 13 1590
#5: 2015 12 14 1344
#6: 2015 12 15 1338
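For comparison, the month-by-hour average asked for in the question can also be sketched in base R with `aggregate()`, without extra packages. The hourly values below are hypothetical, chosen only to illustrate the grouping; `hourly`, `year_month` and `hour` are illustrative names, not taken from the question's data.

```r
# Hypothetical hourly averages from two December days.
hourly <- data.frame(
  Timestamp = as.POSIXct(c("2015-12-19 10:15:00", "2015-12-19 11:15:00",
                           "2015-12-20 10:15:00", "2015-12-20 11:15:00"), tz = "UTC"),
  value = c(1602, 1608, 1176, 1172)
)

# Grouping keys: calendar month and clock hour.
hourly$year_month <- format(hourly$Timestamp, "%Y-%m", tz = "UTC")
hourly$hour <- format(hourly$Timestamp, "%H:00:00", tz = "UTC")

# Mean of each clock hour across all days of each month.
monthly_by_hour <- aggregate(value ~ year_month + hour, data = hourly, FUN = mean)
print(monthly_by_hour)
```

Each output row is the mean of every reading that falls in the same calendar month and clock hour, e.g. (1602 + 1176) / 2 = 1389 for 10:00:00 in 2015-12.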