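I want to compute hourly averages of my data. However, the data contain no date, only a time. The questions and solutions I have read all seem to rely on having a date.
Data excerpt: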
Time photon activity food light
11:51:39 18077 46 1 0
11:52:39 22938 37 1 0
11:53:39 24895 15 1 0
11:54:39 24311 2 1 0
11:55:39 21018 3 1 0
11:56:39 21143 12 1 0
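Some observations are also missing, so averaging every 60 observations does not work.
I tried adding an artificial date to the data, but as you can imagine this produced only 24 averages, each spanning the entire data set.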
tt <- strptime(paste("2015-07-21", data$Time), format="%Y-%m-%d %H:%M")
data <- cbind2(tt, data[,3:6])
hr.means <- aggregate(data["activity"],
list(hour = cut(data$x, breaks="hour")),
mean, na.rm = TRUE)
I'm stuck on the best way to attack this. Thanks.
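Answer 0 (score: 1)
Here is an attempt at a reproducible example. I created a data.frame whose first column is the time and whose second column is whatever quantity you want to average.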
Time Whatever
1 10:00 17
2 10:02 119
3 10:04 98
4 10:06 94
5 10:08 219
6 10:10 71
Using stringr, we can extract the hour as a number; the rest is just arithmetic.
library(stringr)
data = data.frame(Time=c("10:00", "10:02", "10:04", "10:06", "10:08", "10:10", "10:12", "10:14",
"10:16", "10:18", "10:20", "10:22", "10:24", "10:26", "10:28",
"10:30", "10:32", "10:34", "10:36", "10:38", "10:40", "10:42",
"10:44", "10:46", "10:48", "10:50", "10:52", "10:54", "10:56",
"10:58", "11:00", "11:01", "11:02", "11:03", "11:04", "11:05", "11:06",
"11:07", "11:08", "11:09", "11:10", "11:11", "11:12", "11:13", "11:14",
"11:15", "11:16", "11:17", "11:18", "11:19", "11:20", "11:21",
"11:22", "11:23", "11:24", "11:25", "11:26", "11:27", "11:28",
"11:29", "11:30", "11:31", "11:32", "11:33", "11:34", "11:35",
"11:36", "11:37", "11:38", "11:39", "11:40", "11:41", "11:42",
"11:43", "11:44", "11:45", "11:46", "11:47", "11:48", "11:49",
"11:50", "11:51", "11:52", "11:53", "11:54", "11:55", "11:56",
"11:57", "11:58", "11:59", "15:00", "15:10", "15:20", "15:30",
"15:40", "15:50", "16:00", "16:20", "16:40", "16:50")
,Whatever=c(17, 119, 98, 94, 219, 71, 38, 31, 8, 48, 139, 48, 90, 2, 40,
130, 164, 66, 14, 218, 13, 31, 177, 55, 74, 75, 17, 167, 0, 21,
56, 132, 138, 183, 94, 81, 1, 85, 25, 148, NA, 129, 25, 139,
84, 15, 41, 226, 79, 215, 26, 218, 23, 119, 102, 31, 195, 73,
50, 148, 29, 21, 154, 73, 114, 44, 80, 80, 86, 48, 52, 44, 106,
124, 43, 43, 174, 47, 214, 202, 111, 13, 96, 153, 59, 83, 20,
134, 163, 4, 59, 147, 71, 119, 113, 188, 19, 195, NA, 101), stringsAsFactors=FALSE)
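# Extract the hour (the two digits before the first colon) as a number
thour = as.numeric(str_extract(data$Time,'\\d{2}(?=:)'))
# Row indices where the hour changes, padded with 0 and the last row
x = c(0,which(diff(thour) != 0),length(thour))
n = length(x)-1
# For each run of equal hours: first row, last row, hour value
interval = list()
for (i in 1:n) interval[[i]] = c(x[i]+1,x[i+1],thour[x[i+1]])
u1 = sapply(interval,function(j) j[3])
# Mean over each run, ignoring missing values
u2 = sapply(interval,function(j) mean(data$Whatever[j[1]:j[2]],na.rm=TRUE))
data.frame(hour=u1,average=u2)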
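In the end you get something like the output below. Because rows are grouped into consecutive runs of the same hour, this also avoids averaging together data from the same hour on different days.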
hour average
1 10 76.13333
2 11 93.13559
3 15 116.16667
4 16 105.00000
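The same grouping idea can be sketched in base R without stringr. This is only an illustration under some assumptions: the data frame `obs`, its values, and the names `hr`, `run`, and `hourly` are made up for the example, and the times are assumed to be HH:MM:SS strings in chronological order, as in the question. The sample deliberately contains hour 23 on two different days to show that the two runs are averaged separately.

```r
# Hypothetical sample: hour 23 occurs on two different days
obs <- data.frame(
  Time = c("23:58:39", "23:59:39", "00:00:39", "00:01:39",
           "23:58:39", "23:59:39"),
  activity = c(10, 20, 30, NA, 50, 70),
  stringsAsFactors = FALSE
)

# Hour of day as a number, taken from the first two characters
hr <- as.numeric(substr(obs$Time, 1, 2))

# Run id: goes up by one whenever the hour differs from the previous row
run <- cumsum(c(TRUE, diff(hr) != 0))

# One row per run: the hour it belongs to and the mean, ignoring NA
hourly <- data.frame(
  hour    = as.vector(tapply(hr, run, function(h) h[1])),
  average = as.vector(tapply(obs$activity, run, mean, na.rm = TRUE))
)
print(hourly)
```

Like the loop above, this treats a change in the hour value as the start of a new group, so a gap that skips straight from one hour to the same hour on a later day would still be merged into one run.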