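R and lubridate: do the intervals in x fall within any of the intervals in y?

Asked: 2014-12-31 18:38:11

Tags: r intervals lubridate

I want to select only those intervals in x (for me, taskInterval) that also fall within any of the intervals in y (campusInterval). I am trying to use lubridate to get a vector of booleans, which I can then use to subset my data frame.

Ideally, I would do this: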

taskInterval %within% campusInterval

and it would produce a vector of FALSE, FALSE, FALSE, TRUE, TRUE ...

I can get it to work by picking a single interval from campusInterval:

taskInterval %within% campusInterval[1]

Finally, I could use a for loop to generate one vector per interval in campusInterval, but I think there is a more elegant way.
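A minimal sketch of that loop-based fallback, written with vapply instead of an explicit for loop. The vectors x and y are small hypothetical stand-ins for taskInterval and campusInterval, not the real data. It relies on `%within%` recycling a single interval from x against every interval in y.

```r
library(lubridate)

# Hypothetical stand-ins for taskInterval (x) and campusInterval (y)
x <- interval(
  ymd_hm(c("2014-09-01 19:19", "2014-09-02 07:43", "2014-09-03 08:03")),
  ymd_hm(c("2014-09-01 19:41", "2014-09-02 07:44", "2014-09-03 08:28"))
)
y <- interval(
  ymd_hm(c("2014-09-02 07:37", "2014-09-03 07:40")),
  ymd_hm(c("2014-09-02 15:18", "2014-09-03 18:00"))
)

# For each interval in x, compare it with every interval in y and
# keep TRUE if at least one interval in y fully contains it
in_any <- vapply(
  seq_len(length(x)),
  function(i) any(x[i] %within% y),
  logical(1)
)
in_any
```

The result has one element per interval in x, so it can be used directly to subset the data frame the intervals were built from. This does length(x) * length(y) comparisons, which is fine for small inputs but may be slow for large ones.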

Here is my data, and how I build my intervals. Many thanks in advance.

library(lubridate)

task.df <- structure(list(Start.Date = c("2014-09-01", "2014-09-01", "2014-09-01", 
"2014-09-02", "2014-09-02", "2014-09-02", "2014-09-02", "2014-09-03", 
"2014-09-03", "2014-09-03", "2014-09-03", "2014-09-03"), Start.Time = c("19:19", 
"19:41", "20:02", "07:43", "07:51", "08:03", "20:15", "07:40", 
"07:47", "08:03", "08:34", "09:30"), End.Date = c("2014-09-01", 
"2014-09-01", "2014-09-01", "2014-09-02", "2014-09-02", "2014-09-02", 
"2014-09-02", "2014-09-03", "2014-09-03", "2014-09-03", "2014-09-03", 
"2014-09-03"), End.Time = c("19:41", "20:02", "20:05", "07:44", 
"08:02", "08:19", "21:04", "18:00", "07:49", "08:28", "09:00", 
"09:38")), .Names = c("Start.Date", "Start.Time", "End.Date", 
"End.Time"), row.names = c(1L, 2L, 3L, 6L, 7L, 8L, 9L, 25L, 26L, 
27L, 28L, 29L), class = "data.frame")

campus.df <- structure(list(Start.Date = c("2014-09-02", "2014-09-03", "2014-09-04"
), Start.Time = c("07:37", "07:40", "07:40"), End.Date = c("2014-09-02", 
"2014-09-03", "2014-09-04"), End.Time = c("15:18", "18:00", "16:42"
)), .Names = c("Start.Date", "Start.Time", "End.Date", "End.Time"
), row.names = c(NA, 3L), class = "data.frame")


taskInterval <- interval(
    ymd_hm(paste(task.df$Start.Date, task.df$Start.Time)),
    ymd_hm(paste(task.df$End.Date, task.df$End.Time))
    )

campusInterval <- interval(
    ymd_hm(paste(campus.df$Start.Date, campus.df$Start.Time)),
    ymd_hm(paste(campus.df$End.Date, campus.df$End.Time))
)

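1 Answer:

Answer 0 (score: 2)

I would use data.table's foverlaps function to solve this.

First, we convert both data frames to data.table objects, create start and end columns, and key campus.df by those columns: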

library(data.table)
setDT(task.df)[, `:=`(start = as.POSIXct(paste(Start.Date, Start.Time)),
                      end = as.POSIXct(paste(End.Date, End.Time)))]

setkey(setDT(campus.df)[, `:=`(start = as.POSIXct(paste(Start.Date, Start.Time)),
                               end = as.POSIXct(paste(End.Date, End.Time)))], start, end)

Then we can simply do

foverlaps(task.df, campus.df, type = "any", which = TRUE) # You can also try `type = "within"`
#     xid yid
# 1:    1  NA
# 2:    2  NA
# 3:    3  NA
# 4:    4   1
# 5:    5   1
# 6:    6   1
# 7:    7  NA
# 8:    8   2
# 9:    9   2
# 10:  10   2
# 11:  11   2
# 12:  12   2

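The output tells you that rows 4:6 of task.df fall within the interval in the first row of campus.df, while rows 8:12 fall within the interval in the second row of campus.df.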
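Note that type = "any" reports any overlap, while type = "within" requires the x interval to lie entirely inside the y interval. For the data in the question both give the same matches, but they can differ. The sketch below uses hypothetical toy data (one window and three tasks, all in UTC) to show the difference.

```r
library(data.table)

# Hypothetical toy data: a single window (y) from 08:00 to 12:00
y <- data.table(
  start = as.POSIXct("2014-09-02 08:00", tz = "UTC"),
  end   = as.POSIXct("2014-09-02 12:00", tz = "UTC")
)
setkey(y, start, end)  # foverlaps needs y keyed by its interval columns

# Three tasks: fully inside, partially overlapping, fully outside
x <- data.table(
  start = as.POSIXct(c("2014-09-02 09:00", "2014-09-02 11:00", "2014-09-02 13:00"), tz = "UTC"),
  end   = as.POSIXct(c("2014-09-02 10:00", "2014-09-02 12:30", "2014-09-02 14:00"), tz = "UTC")
)

# "any": the task only has to overlap the window
any_hit <- !is.na(foverlaps(x, y, type = "any", which = TRUE)$yid)

# "within": the task must be completely contained in the window
within_hit <- !is.na(foverlaps(x, y, type = "within", which = TRUE)$yid)

any_hit
within_hit
```

The second task runs past the end of the window, so it counts as a match under "any" but not under "within". If the goal is containment, as with `%within%` in lubridate, type = "within" is probably the closer fit.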


If all you want is a logical vector indicating whether each row of task.df falls within any interval in campus.df, just do

!is.na(foverlaps(task.df, campus.df, type = "any", which = TRUE)$yid)
## [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
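
One caveat: the one-liner above assumes each row of task.df matches at most one row of campus.df. If the intervals in campus.df overlap each other, foverlaps returns one row per match, and the logical vector no longer lines up with task.df. The following is a sketch of one way to guard against that, using hypothetical toy data with two overlapping windows.

```r
library(data.table)

# Hypothetical toy data: two y windows that overlap each other
y <- data.table(
  start = as.POSIXct(c("2014-09-02 08:00", "2014-09-02 09:00"), tz = "UTC"),
  end   = as.POSIXct(c("2014-09-02 12:00", "2014-09-02 13:00"), tz = "UTC")
)
setkey(y, start, end)

# First task sits inside both windows, second task is outside both
x <- data.table(
  start = as.POSIXct(c("2014-09-02 10:00", "2014-09-02 14:00"), tz = "UTC"),
  end   = as.POSIXct(c("2014-09-02 11:00", "2014-09-02 15:00"), tz = "UTC")
)

# One row per (x, y) match, plus one NA row for the unmatched task
hits <- foverlaps(x, y, type = "within", which = TRUE)
n_rows <- nrow(hits)

# Collapse back to one logical value per row of x
in_any <- seq_len(nrow(x)) %in% hits$xid[!is.na(hits$yid)]
in_any
```

Here hits has three rows for two tasks, so `!is.na(hits$yid)` alone would be too long, while in_any always has one element per row of x. foverlaps also has a mult argument ("all", "first", "last") that may be a shorter way to get at most one match per row.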