I have a dataset that looks like this:
2016-04-14 23:13:33
2016-04-14 23:18:37
2016-04-15 00:32:24
2016-04-15 00:33:11
2016-04-15 00:33:20
What I want to do is group the data into 15-minute intervals, per day, so that it looks like this:
Date                      Count
2016-04-14 23:00 - 23:15  27
...                       ...
2016-04-15 00:00 - 00:15  41
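So the Count column simply counts how many observations fall into each interval.
Update
I removed my code because I felt it made the answers confusing. So, how would you group this data into 15-minute intervals? Here is an example of what I mean: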
Date                      Count
2016-05-01 23:45 - 23:59  19
2016-05-02 00:00 - 00:14  276
2016-05-02 00:15 - 00:29  328
2016-05-02 00:30 - 00:44  244
Any suggestions on how to "count" the data, broken down by day?
Thanks!
Answer (score: 4)
The POSIXct variable hourmessages$date contains both the date and the time, so you only need to group by date instead of by time. Here is the modified code:
messages <- data.frame(created_at = c('2016-04-14 23:13:33', '2016-04-14 23:18:37',
                                      '2016-04-15 00:32:24', '2016-04-15 00:33:11',
                                      '2016-04-15 00:33:20'))
# Parse the strings into date-times; as.POSIXct keeps the column atomic
# (strptime() alone returns a POSIXlt list, which is awkward inside a data frame)
messages$created_at <- as.POSIXct(messages$created_at, format = "%Y-%m-%d %H:%M:%S")
hourmessages <- data.frame(
  date = messages$created_at,
  time = format(messages$created_at, "%H:%M")
)
# Tabulate by 'date' (the full date-time) instead of 'time', so different days stay separate
denshours <- as.data.frame(table(hourmessages$date))
# Turn the factor levels back into date-times (the seconds are dropped)
denshours$Var1 <- as.POSIXct(strptime(denshours$Var1, "%Y-%m-%d %H:%M"))
# Bin into 15-minute intervals, then sum the counts within each interval
denshours$Var1 <- cut(denshours$Var1, breaks = "15 min")
dat.summary <- aggregate(Freq ~ Var1, data = denshours, FUN = sum)
colnames(dat.summary) <- c("time", "count")
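With breaks = "15 min", cut() starts its first interval at the earliest timestamp with the seconds set to zero, not on a quarter hour. A quick check on the sample timestamps from the question (parsed in UTC here, an assumption made only so the result does not depend on the local time zone):

```r
# Sample timestamps from the question, parsed as UTC (assumption)
x <- as.POSIXct(c("2016-04-14 23:13:33", "2016-04-14 23:18:37",
                  "2016-04-15 00:32:24", "2016-04-15 00:33:11",
                  "2016-04-15 00:33:20"), tz = "UTC")
bins <- cut(x, breaks = "15 min")

# The first level is the earliest timestamp with its seconds set to 0
first_level <- as.POSIXct(levels(bins)[1], tz = "UTC")
format(first_level, "%H:%M")  # "23:13", not "23:00"
```

So with this approach the first interval runs from 23:13 to 23:28 rather than from 23:00 to 23:15.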
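Update: Based on the update to your question, it seems you want the intervals to start at "nice" break points, such as 23:00 and 23:15, rather than at 23:13. R uses the first data point to determine the breaks, which is where the complication comes from. You can take advantage of the fact that POSIXct objects are really just numbers (seconds since 1970-01-01) and get a summary table like this: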
messages <- data.frame(created_at = c('2016-04-14 23:13:33', '2016-04-14 23:18:37',
                                      '2016-04-15 00:32:24', '2016-04-15 00:33:11',
                                      '2016-04-15 00:33:20'))
messages$created_at <- as.POSIXct(messages$created_at, format = "%Y-%m-%d %H:%M:%S")
# Define 15-minute breaks: floor the seconds since 1970-01-01 to a multiple of 60*15.
# For a different interval, replace 60*15 with the interval length in seconds.
messages$created_at_breaks <- as.POSIXct(floor(as.numeric(messages$created_at) / (60*15)) * 60*15,
                                         origin = '1970-01-01')
# Count the observations in each 15-minute interval
dat.summary <- data.frame(table(messages$created_at_breaks))
colnames(dat.summary) <- c("time", "count")
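table() on the floored timestamps only lists intervals that actually contain observations. If you also want empty intervals (count 0) and labels in the "HH:MM - HH:MM" form from the question, one option is to build quarter-hour breaks yourself with seq() and pass them to cut(). This is a sketch, not part of the original answer; it assumes the timestamps are in UTC, and the names step_sec, first_break and interval_labels are just illustrative:

```r
# Sample timestamps from the question, parsed as UTC (assumption)
x <- as.POSIXct(c("2016-04-14 23:13:33", "2016-04-14 23:18:37",
                  "2016-04-15 00:32:24", "2016-04-15 00:33:11",
                  "2016-04-15 00:33:20"), tz = "UTC")
step_sec <- 15 * 60  # interval length in seconds

# Floor the earliest timestamp to a quarter hour, then build aligned breaks
first_break <- as.POSIXct(floor(as.numeric(min(x)) / step_sec) * step_sec,
                          origin = "1970-01-01", tz = "UTC")
breaks <- seq(first_break, max(x) + step_sec, by = step_sec)

# Label each interval as "YYYY-MM-DD HH:MM - HH:MM" (last minute inclusive)
lower <- breaks[-length(breaks)]
interval_labels <- paste(format(lower, "%Y-%m-%d %H:%M"), "-",
                         format(lower + step_sec - 60, "%H:%M"))

# right = FALSE gives half-open intervals [lower, lower + 15 min)
bins <- cut(x, breaks = breaks, labels = interval_labels, right = FALSE)
counts <- as.data.frame(table(Date = bins), responseName = "Count")
counts
```

Each label includes its date, so rows from different days stay separate, and the intervals with no observations (23:30 through 00:29 in this sample) appear with a count of 0.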