I have a dataset with a column of timestamps.
I ran strptime on that column:
timeStrip <- strptime(try$Created.Date, "%m/%d/%Y %I:%M:%S %p")
Large POSIXlt (114349 elements, 5.7 Mb)
Next, I applied the table and cut functions to group the values by hour:
mytimeStrip <- table(cut(timeStrip, breaks="hour"))
table int[ 1:486(1d)] 212 200 168....
I only get 486 values, and a large number of the dates in my data are missing.
Answer 0 (score: 0)
This may help:
# example data frame
df = data.frame(x = c("10/29/2015 02:13:06 AM",
"10/29/2015 02:33:46 AM",
"10/29/2015 04:13:06 PM"))
df
# x
# 1 10/29/2015 02:13:06 AM
# 2 10/29/2015 02:33:46 AM
# 3 10/29/2015 04:13:06 PM
# parse the strings, then label each timestamp with the start of its hour
df$x = as.POSIXct(strptime(df$x, "%m/%d/%Y %I:%M:%S %p"))
df$x2 = format(df$x, "%Y-%m-%d %H:00:00")
df
# x x2
# 1 2015-10-29 02:13:06 2015-10-29 02:00:00
# 2 2015-10-29 02:33:46 2015-10-29 02:00:00
# 3 2015-10-29 16:13:06 2015-10-29 16:00:00
# count
df2 = data.frame(table(df$x2))
names(df2) = c("dates","Freq")
df2
# dates Freq
# 1 2015-10-29 02:00:00 2
# 2 2015-10-29 16:00:00 1
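# create all possible hours in that time frame
# (truncate both endpoints to the hour so the last hour is never dropped)
dates = seq(trunc(min(df$x), "hours"), trunc(max(df$x), "hours"), by="hour")
dates = format(dates, "%Y-%m-%d %H:00:00")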
df3 = data.frame(dates)
df3
# dates
# 1 2015-10-29 02:00:00
# 2 2015-10-29 03:00:00
# 3 2015-10-29 04:00:00
# 4 2015-10-29 05:00:00
# 5 2015-10-29 06:00:00
# 6 2015-10-29 07:00:00
# 7 2015-10-29 08:00:00
# 8 2015-10-29 09:00:00
# 9 2015-10-29 10:00:00
# 10 2015-10-29 11:00:00
# 11 2015-10-29 12:00:00
# 12 2015-10-29 13:00:00
# 13 2015-10-29 14:00:00
# 14 2015-10-29 15:00:00
# 15 2015-10-29 16:00:00
# left join so every hour is kept; hours with no events get NA, set to 0 below
df4 = merge(df3, df2, by="dates", all.x = TRUE)
df4$Freq[is.na(df4$Freq)] = 0
df4
# dates Freq
# 1 2015-10-29 02:00:00 2
# 2 2015-10-29 03:00:00 0
# 3 2015-10-29 04:00:00 0
# 4 2015-10-29 05:00:00 0
# 5 2015-10-29 06:00:00 0
# 6 2015-10-29 07:00:00 0
# 7 2015-10-29 08:00:00 0
# 8 2015-10-29 09:00:00 0
# 9 2015-10-29 10:00:00 0
# 10 2015-10-29 11:00:00 0
# 11 2015-10-29 12:00:00 0
# 12 2015-10-29 13:00:00 0
# 13 2015-10-29 14:00:00 0
# 14 2015-10-29 15:00:00 0
# 15 2015-10-29 16:00:00 1
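
A more compact variant of the same idea is sketched below. It counts events per hour through a factor whose levels cover every hour in the range, so empty hours show up as 0 without a merge. It also reports how many timestamps failed to parse, because strptime returns NA for strings that do not match the format, and those rows would silently drop out of any table. The names `stamps`, `parsed`, `n_failed`, `ok`, `hour_label`, `all_hours` and `counts` are made up for illustration, the malformed fourth entry is invented to show the NA case, and the time zone is fixed to UTC only to keep the example reproducible.

```r
# hypothetical sample data; the last entry is deliberately malformed
stamps <- c("10/29/2015 02:13:06 AM",
            "10/29/2015 02:33:46 AM",
            "10/29/2015 04:13:06 PM",
            "29/10/2015 16:13")

# parse; strings that do not match the format become NA
parsed <- as.POSIXct(strptime(stamps, "%m/%d/%Y %I:%M:%S %p", tz = "UTC"))

# how many rows could not be parsed (these would vanish from the counts)
n_failed <- sum(is.na(parsed))

# keep only the successfully parsed timestamps
ok <- parsed[!is.na(parsed)]

# label each timestamp with the start of its hour
hour_label <- format(ok, "%Y-%m-%d %H:00:00")

# every hour from the first observed hour to the last observed hour
all_hours <- format(
  seq(trunc(min(ok), "hours"), trunc(max(ok), "hours"), by = "hour"),
  "%Y-%m-%d %H:00:00"
)

# counting a factor with explicit levels keeps the empty hours as zeros
counts <- table(factor(hour_label, levels = all_hours))

n_failed
counts
```

If `n_failed` is large on the real data, the format string probably does not match every row of Created.Date, which would be one possible reason for dates going missing.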