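Hourly averages of minute data without dates

Time: 2015-07-22 00:07:32

Tags: r date time average

I want to compute hourly averages of my data. However, the data contain no dates, only times. The questions and solutions I have read all seem to rely on having a date.

Data excerpt: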

     Time photon activity food light
 11:51:39  18077       46    1     0
 11:52:39  22938       37    1     0
 11:53:39  24895       15    1     0
 11:54:39  24311        2    1     0
 11:55:39  21018        3    1     0
 11:56:39  21143       12    1     0

Some observations are also missing, so simply averaging every 60 observations does not work.
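To make the problem concrete, here is a minimal sketch with made-up timestamps (not the real data). It assumes zero-padded "HH:MM:SS" strings and drops five minutes from hour 11, then compares grouping by position with grouping by the recorded hour. The names `all_times`, `times`, `block`, `hour` and `mix` are purely illustrative.

```r
# Made-up minute stamps for 11:00 to 12:59 (an assumption, not the real data)
all_times <- sprintf("%02d:%02d:39", rep(11:12, each = 60), rep(0:59, times = 2))

# Simulate gaps by dropping five observations from hour 11
times <- all_times[-c(10, 20, 30, 40, 50)]

# Group by position: 60 rows per block
block <- ceiling(seq_along(times) / 60)

# Group by the hour actually recorded (first two characters of the time)
hour <- as.numeric(substr(times, 1, 2))

# Cross-tabulate the two groupings; rows of hour 12 that fall in block 1
# would be averaged into the wrong hour
mix <- table(block, hour)
mix
```

With the gaps, the first block of 60 rows contains 55 rows from hour 11 and 5 rows from hour 12, so a positional average mixes the two hours.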

I tried adding an artificial date to the data but, as you can imagine, this produces only 24 averages, each of which spans the entire dataset.

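# Prepend an artificial date so the times parse as date-times
tt <- strptime(paste("2015-07-21", data$Time), format="%Y-%m-%d %H:%M:%S")
# Store the parsed times as column x, which cut() uses below
data$x <- as.POSIXct(tt)

hr.means <- aggregate(data["activity"], 
                  list(hour = cut(data$x, breaks="hour")), 
                  mean, na.rm = TRUE)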
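The collapse can be reproduced on a toy vector. This is only a sketch with invented values: the last two rows are meant to stand for hour 10 on the following day, and `Time`, `activity`, `x` and `agg` are illustrative names. The time zone is fixed to UTC here just to keep the example deterministic.

```r
# Toy data (invented): rows 1-2 and rows 5-6 are both in hour 10,
# but rows 5-6 are meant to belong to the next day
Time <- c("10:58:39", "10:59:39", "11:00:39", "11:01:39", "10:00:39", "10:01:39")
activity <- c(1, 3, 5, 7, 10, 20)

# The same artificial date for every row
x <- as.POSIXct(paste("2015-07-21", Time),
                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Average within each clock hour of that single artificial day
agg <- aggregate(data.frame(activity = activity),
                 list(hour = cut(x, breaks = "hour")),
                 mean, na.rm = TRUE)
agg
```

Only two groups come out, and the hour-10 average (8.5) pools all four hour-10 rows even though they are supposed to come from two different days.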

I'm stuck on the best way to approach this. Thanks.

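1 answer:

Answer 0: (score: 1)

Here is an attempt at a reproducible example. I created a data.frame with the time in the first column and, in the second column, whatever quantity you want to average.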

   Time Whatever
1 10:00       17
2 10:02      119
3 10:04       98
4 10:06       94
5 10:08      219
6 10:10       71

Using stringr, we can extract the hour as a number; the rest is just arithmetic.

library(stringr)
data = data.frame(Time=c("10:00", "10:02", "10:04", "10:06", "10:08", "10:10", "10:12", "10:14", 
"10:16", "10:18", "10:20", "10:22", "10:24", "10:26", "10:28", 
"10:30", "10:32", "10:34", "10:36", "10:38", "10:40", "10:42", 
"10:44", "10:46", "10:48", "10:50", "10:52", "10:54", "10:56", 
"10:58", "11:00", "11:01", "11:02", "11:03", "11:04", "11:05", "11:06", 
"11:07", "11:08", "11:09", "11:10", "11:11", "11:12", "11:13", "11:14", 
"11:15", "11:16", "11:17", "11:18", "11:19", "11:20", "11:21", 
"11:22", "11:23", "11:24", "11:25", "11:26", "11:27", "11:28", 
"11:29", "11:30", "11:31", "11:32", "11:33", "11:34", "11:35", 
"11:36", "11:37", "11:38", "11:39", "11:40", "11:41", "11:42", 
"11:43", "11:44", "11:45", "11:46", "11:47", "11:48", "11:49", 
"11:50", "11:51", "11:52", "11:53", "11:54", "11:55", "11:56", 
"11:57", "11:58", "11:59", "15:00", "15:10", "15:20", "15:30", 
"15:40", "15:50", "16:00", "16:20", "16:40", "16:50")
,Whatever=c(17, 119, 98, 94, 219, 71, 38, 31, 8, 48, 139, 48, 90, 2, 40, 
130, 164, 66, 14, 218, 13, 31, 177, 55, 74, 75, 17, 167, 0, 21, 
56, 132, 138, 183, 94, 81, 1, 85, 25, 148, NA, 129, 25, 139, 
84, 15, 41, 226, 79, 215, 26, 218, 23, 119, 102, 31, 195, 73, 
50, 148, 29, 21, 154, 73, 114, 44, 80, 80, 86, 48, 52, 44, 106, 
124, 43, 43, 174, 47, 214, 202, 111, 13, 96, 153, 59, 83, 20, 
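134, 163, 4, 59, 147, 71, 119, 113, 188, 19, 195, NA, 101), stringsAsFactors=FALSE)
# Extract the hour (the two digits before the first colon) as a number
thour = as.numeric(str_extract(data$Time,'\\d{2}(?=:)'))
# Row indices where the hour changes, padded with 0 and the last row
x = c(0,which(diff(thour) != 0),length(thour))
n = length(x)-1
# For each run of consecutive rows sharing an hour: first row, last row, hour
interval = list()
for (i in 1:n) interval[[i]] = c(x[i]+1,x[i+1],thour[x[i+1]])
u1 = sapply(interval,function(j) j[3])
# Mean of each run, ignoring missing values
u2 = sapply(interval,function(j) mean(data$Whatever[j[1]:j[2]],na.rm=TRUE))
data.frame(hour=u1,average=u2)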

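In the end you get something like the following. Because each average is taken over a run of consecutive rows that share the same hour, this also avoids averaging together data from the same hour on different days.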

  hour   average
1   10  76.13333
2   11  93.13559
3   15 116.16667
4   16 105.00000
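The same run-based grouping can also be sketched in base R without stringr. This is an alternative formulation, not the code used above: it assumes zero-padded "HH:MM" strings (so `substr` can pick out the hour) and uses a small invented data frame, `toy`, in which the last two rows stand for hour 10 on the following day. The names `hr`, `run` and `result` are illustrative.

```r
# Invented data: the last two rows stand for hour 10 on the next day
toy <- data.frame(Time = c("10:58", "10:59", "11:00", "11:01", "10:00", "10:01"),
                  Whatever = c(1, 3, 5, 7, 10, 20),
                  stringsAsFactors = FALSE)

# Hour as a number, taken from the first two characters of "HH:MM"
hr <- as.numeric(substr(toy$Time, 1, 2))

# Run id: increases by one each time the hour changes between adjacent rows
run <- cumsum(c(TRUE, diff(hr) != 0))

# One row per run: the hour of the run and the mean of its values
result <- data.frame(
  hour    = as.vector(tapply(hr, run, function(h) h[1])),
  average = as.vector(tapply(toy$Whatever, run, mean, na.rm = TRUE))
)
result
```

Here hour 10 appears twice in the result (averages 2 and 15), once for each day, with hour 11 (average 6) in between. Grouping on the run id rather than on the hour value is what keeps the two days apart.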