My data frame looks like this:
start_timestamp          end_timestamp
2012-11-18 05:53:36.0    2012-11-18 07:46:40.0
2012-11-18 06:34:23.0    2012-11-18 09:21:57.0
I would like the output to look like this:
hour                     movies_being_played
2012-11-18 05:00:00.0    1
2012-11-18 06:00:00.0    2
2012-11-18 07:00:00.0    2
2012-11-18 08:00:00.0    1
2012-11-18 09:00:00.0    1
The only approach I can think of is to create a table that looks like this:
hour                     movies_being_played
2012-11-18 05:00:00.0    NA
2012-11-18 06:00:00.0    NA
2012-11-18 07:00:00.0    NA
2012-11-18 08:00:00.0    NA
2012-11-18 09:00:00.0    NA
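and then use a for loop to iterate over every hour in the given period, counting how many rows have a start_timestamp below that hour paired with an end_timestamp above it. That seems very inefficient, though.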
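For reference, the loop-based idea described above might be sketched roughly as follows in base R. The names `hours` and `naive_counts` are hypothetical, the timestamps are parsed as UTC purely for reproducibility, and the overlap rule (a movie counts for an hour bucket if it starts before the bucket ends and ends at or after the bucket starts) is an assumption chosen to reproduce the desired output.

```r
# Sample data (both rows on 2012-11-18, as in the expected output).
# tz = "UTC" is an assumption made only to keep the example reproducible.
df <- data.frame(
  start_timestamp = as.POSIXct(c("2012-11-18 05:53:36", "2012-11-18 06:34:23"), tz = "UTC"),
  end_timestamp   = as.POSIXct(c("2012-11-18 07:46:40", "2012-11-18 09:21:57"), tz = "UTC")
)

# Hourly grid from the hour of the earliest start to the hour of the latest end.
first_hour <- as.POSIXct(trunc(min(df$start_timestamp), units = "hours"))
last_hour  <- as.POSIXct(trunc(max(df$end_timestamp), units = "hours"))
hours <- seq(from = first_hour, to = last_hour, by = "1 hour")

# Pre-filled table with NA counts, as described in the question.
naive_counts <- data.frame(hour = hours, movies_being_played = NA_integer_)

# For each hour bucket, count the rows whose interval overlaps the bucket.
for (i in seq_along(hours)) {
  bucket_start <- hours[i]
  bucket_end   <- hours[i] + 3600  # one hour later, in seconds
  naive_counts$movies_being_played[i] <-
    sum(df$start_timestamp < bucket_end & df$end_timestamp >= bucket_start)
}

print(naive_counts)
```

The loop does one full pass over the data per hour bucket, which is the cost that makes this approach slow on long time ranges.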
Answer 0 (score: 1)
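@alistaire's comment is a concise, efficient solution and probably should be a real answer; if it is posted as one, it should definitely be the accepted answer.
I'm putting this one out there to show a general idiom for similar but more complex situations (IMO there aren't enough do() examples):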
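library(dplyr)

# Sample data from the question (tibble() replaces the deprecated data_frame())
df <- tibble(
  start_timestamp = as.POSIXct(c("2012-11-18 05:53:36.0", "2012-11-18 06:34:23.0")),
  end_timestamp = as.POSIXct(c("2012-11-18 07:46:40.0", "2012-11-18 09:21:57.0"))
)

# For a single row, return one record per hour between the start and end hour
hourly_count <- function(x) {
  # Truncate both timestamps to the top of the hour
  rng <- range(x$start_timestamp, x$end_timestamp) %>%
    format("%Y-%m-%d %H:00:00") %>%
    as.POSIXct()
  hrs <- seq(from = rng[1], to = rng[2], by = "1 hour")
  tibble(hour = hrs, is_playing = TRUE)
}

# Expand every row into its hours, then count the records per hour
rowwise(df) %>%
  do(hourly_count(.)) %>%
  count(hour, is_playing) %>%
  select(-is_playing, movies_being_played = n)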
## Source: local data frame [5 x 2]
## Groups: hour [5]
##
##                  hour movies_being_played
##                <dttm>               <int>
## 1 2012-11-18 05:00:00                   1
## 2 2012-11-18 06:00:00                   2
## 3 2012-11-18 07:00:00                   2
## 4 2012-11-18 08:00:00                   1
## 5 2012-11-18 09:00:00                   1
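The same expand-then-count idea can also be sketched without do(), using only base R. This is an illustrative variant under stated assumptions, not necessarily what the comment mentioned above does: the helper `floor_hour` and the names `per_row` and `result` are hypothetical, and the timestamps are parsed as UTC so that the example does not depend on the local time zone.

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility.
df <- data.frame(
  start_timestamp = as.POSIXct(c("2012-11-18 05:53:36", "2012-11-18 06:34:23"), tz = "UTC"),
  end_timestamp   = as.POSIXct(c("2012-11-18 07:46:40", "2012-11-18 09:21:57"), tz = "UTC")
)

# Hypothetical helper: truncate a POSIXct value down to the start of its hour.
floor_hour <- function(x) as.POSIXct(trunc(x, units = "hours"))

# One hourly sequence per row, covering every hour the interval touches.
per_row <- lapply(seq_len(nrow(df)), function(i) {
  seq(from = floor_hour(df$start_timestamp[i]),
      to   = floor_hour(df$end_timestamp[i]),
      by   = "1 hour")
})

# Pool all hours as epoch seconds and tabulate how often each one occurs.
all_hours <- unlist(lapply(per_row, as.numeric))
counts <- table(all_hours)

# Convert the tabulated epoch seconds back to timestamps.
result <- data.frame(
  hour = as.POSIXct(as.numeric(names(counts)), origin = "1970-01-01", tz = "UTC"),
  movies_being_played = as.integer(counts)
)

print(result)
```

Tabulating epoch seconds rather than formatted strings keeps the counting step independent of how timestamps are printed in the current locale.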