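I have over 5 million rows of appointment data (start/stop times) that I would like to convert into 15-minute blocks for demand forecasting and scheduling.
Example:
Start time: 9:30
Stop time: 10:10
I would like the 9:30-9:44, 9:45-9:59, and 10:00-10:14 columns to be populated with a one, and the other 93 columns in that row to be filled with zeros.
Thanks.
Answer 0 (score: 0)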
> dput <- structure(
+ list(
+ start = structure(c(1539764520, 1539763920, 1539765180, 1539765180, 1539764400, 1539764400), class = c("POSIXct", "POSIXt" ), tzone = ""),
+ stop = structure(c(1539769320, 1539777420, 1539803940, 1539803940, 1539770700, 1539770700), class = c("POSIXct", "POSIXt" ), tzone = "")),
+ row.names = c(NA, 6L), class = "data.frame")
> dput
                start                stop
1 2018-10-17 17:22:00 2018-10-17 18:42:00
2 2018-10-17 17:12:00 2018-10-17 20:57:00
3 2018-10-17 17:33:00 2018-10-18 04:19:00
4 2018-10-17 17:33:00 2018-10-18 04:19:00
5 2018-10-17 17:20:00 2018-10-17 19:05:00
6 2018-10-17 17:20:00 2018-10-17 19:05:00
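See below. This rounds each time to the nearest 15 minutes; you can also switch to ceiling_date or floor_date to always round up or down:
> library(dplyr)
> library(lubridate)
> dput %>% mutate_all(round_date, '15 mins')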
                start                stop
1 2018-10-17 17:15:00 2018-10-17 18:45:00
2 2018-10-17 17:15:00 2018-10-17 21:00:00
3 2018-10-17 17:30:00 2018-10-18 04:15:00
4 2018-10-17 17:30:00 2018-10-18 04:15:00
5 2018-10-17 17:15:00 2018-10-17 19:00:00
6 2018-10-17 17:15:00 2018-10-17 19:00:00
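Rounding alone does not yet list which 15-minute blocks an appointment touches. As a rough sketch (not part of the answer above), floor_date can give the first and last block, and seq fills in the ones between. The variable names and the single appointment, taken from the question's example, are assumptions made for illustration, as is the choice to treat a stop time that falls exactly on a boundary as not occupying the next block.

```r
library(lubridate)

# Hypothetical single appointment, using the times from the question.
start_time <- ymd_hm("2018-10-17 09:30", tz = "UTC")
stop_time  <- ymd_hm("2018-10-17 10:10", tz = "UTC")

# The first block is the one containing the start time.
first_block <- floor_date(start_time, "15 mins")
# The last block is the one containing the final second before the stop
# time, so a stop at exactly 10:00 would not spill into the 10:00 block.
last_block <- floor_date(stop_time - 1, "15 mins")

# Start times of every block the appointment occupies.
block_starts <- seq(first_block, last_block, by = "15 min")
format(block_starts, "%H:%M")
# "09:30" "09:45" "10:00"
```

These three block starts correspond to the 9:30-9:44, 9:45-9:59, and 10:00-10:14 columns in the question.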
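Answer 1 (score: 0)
OK, this might work. Your data is called df here. This approach relies on lubridate's int_overlaps function, which detects whether an appointment overlaps each of the intervals (blocks) you specify.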
library(tidyverse)
library(lubridate)

no_intervals <- 96  # number of intervals: 96 blocks of 15 minutes cover 24 hours
intervals_start <- ymd_hms("2018-10-17 10:00:00")  # start of the first block
intervals_width <- 15  # in minutes

# define intervals for the blocks you want to populate
blocks <- lapply(1:no_intervals, function(shift) {
  interval(intervals_start + (shift - 1) * minutes(intervals_width),
           intervals_start + shift * minutes(intervals_width))
}) %>%
  `names<-`(paste0("int", 1:no_intervals))

# add the overlaps of the appointments with the blocks to the df
res <- df %>%
  mutate(appointment = interval(ymd_hms(start), ymd_hms(stop))) %>%
  cbind(as.data.frame(lapply(blocks, int_overlaps, .$appointment))) %>%
  mutate_at(vars(matches("^int")), as.numeric)  # convert booleans to 0/1
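With 5 million rows, comparing every appointment against every block may be slow, and as far as I can tell int_overlaps treats interval endpoints as inclusive, so an appointment stopping exactly at 10:00 would also flag the block starting at 10:00. One possible alternative, sketched here in base R, computes the first and last block index arithmetically. The helper name block_matrix, its arguments, and the sample data are all hypothetical, and the sketch assumes POSIXct inputs, stop later than start, and no appointment crossing midnight.

```r
# Hypothetical helper: one row per appointment, one 0/1 column per block of
# the day. Assumes stop_time > start and that no appointment crosses midnight.
block_matrix <- function(start, stop_time, width_min = 15) {
  n_blocks <- 24 * 60 / width_min
  # Minutes since midnight in the time zone carried by the input.
  mins <- function(x) {
    lt <- as.POSIXlt(x)
    lt$hour * 60 + lt$min + lt$sec / 60
  }
  first <- floor(mins(start) / width_min) + 1   # 1-based index of first block
  last  <- ceiling(mins(stop_time) / width_min) # a stop on a boundary does not open a new block
  out <- matrix(0L, nrow = length(start), ncol = n_blocks)
  # Label each column with the start time of its block, e.g. "09:30".
  origin <- as.POSIXct("2000-01-01", tz = "UTC")
  colnames(out) <- format(origin + (seq_len(n_blocks) - 1) * width_min * 60, "%H:%M")
  # Fill all occupied cells at once through (row, column) index pairs.
  rows <- rep(seq_along(start), times = last - first + 1)
  cols <- unlist(Map(seq, first, last))
  out[cbind(rows, cols)] <- 1L
  out
}

# Made-up sample data: the question's example plus one more appointment.
appts <- data.frame(
  start = as.POSIXct(c("2018-10-17 09:30:00", "2018-10-17 17:22:00"), tz = "UTC"),
  stop  = as.POSIXct(c("2018-10-17 10:10:00", "2018-10-17 18:42:00"), tz = "UTC")
)
m <- block_matrix(appts$start, appts$stop)
colnames(m)[m[1, ] == 1L]
# "09:30" "09:45" "10:00"
```

Note that a dense 5 million by 96 integer matrix needs roughly 2 GB of memory, so in practice it may be worth processing the data in chunks or storing only the (row, column) pairs.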