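I am interested in selecting only those intervals in x (for me, taskInterval) that fall within any of the intervals in y (campusInterval). I am trying to use lubridate to get a logical vector that I can then use to subset my data frame.
Ideally, I would do this: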
taskInterval %within% campusInterval
and it would yield FALSE, FALSE, FALSE, TRUE, TRUE, ...
I can make it work by selecting a single interval from campusInterval:
taskInterval %within% campusInterval[1]
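Finally, I could use a for loop to generate one vector per interval in campusInterval, but I figure there is a more elegant way.
Here is my data, and how I build my intervals. Thanks very much in advance.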
library(lubridate)
task.df <- structure(list(Start.Date = c("2014-09-01", "2014-09-01", "2014-09-01",
"2014-09-02", "2014-09-02", "2014-09-02", "2014-09-02", "2014-09-03",
"2014-09-03", "2014-09-03", "2014-09-03", "2014-09-03"), Start.Time = c("19:19",
"19:41", "20:02", "07:43", "07:51", "08:03", "20:15", "07:40",
"07:47", "08:03", "08:34", "09:30"), End.Date = c("2014-09-01",
"2014-09-01", "2014-09-01", "2014-09-02", "2014-09-02", "2014-09-02",
"2014-09-02", "2014-09-03", "2014-09-03", "2014-09-03", "2014-09-03",
"2014-09-03"), End.Time = c("19:41", "20:02", "20:05", "07:44",
"08:02", "08:19", "21:04", "18:00", "07:49", "08:28", "09:00",
"09:38")), .Names = c("Start.Date", "Start.Time", "End.Date",
"End.Time"), row.names = c(1L, 2L, 3L, 6L, 7L, 8L, 9L, 25L, 26L,
27L, 28L, 29L), class = "data.frame")
campus.df <- structure(list(Start.Date = c("2014-09-02", "2014-09-03", "2014-09-04"
), Start.Time = c("07:37", "07:40", "07:40"), End.Date = c("2014-09-02",
"2014-09-03", "2014-09-04"), End.Time = c("15:18", "18:00", "16:42"
)), .Names = c("Start.Date", "Start.Time", "End.Date", "End.Time"
), row.names = c(NA, 3L), class = "data.frame")
taskInterval <- interval(
  ymd_hm(paste(task.df$Start.Date, task.df$Start.Time)),
  ymd_hm(paste(task.df$End.Date, task.df$End.Time))
)
campusInterval <- interval(
  ymd_hm(paste(campus.df$Start.Date, campus.df$Start.Time)),
  ymd_hm(paste(campus.df$End.Date, campus.df$End.Time))
)
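Answer 0 (score: 2)
I would use data.table's foverlaps function for this.
First, we convert both data frames to data.table objects, create start and end columns, and key campus.df by those two columns (foverlaps requires its second argument to be keyed):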
library(data.table)
setDT(task.df)[, `:=`(start = as.POSIXct(paste(Start.Date, Start.Time)),
                      end   = as.POSIXct(paste(End.Date, End.Time)))]
setkey(setDT(campus.df)[, `:=`(start = as.POSIXct(paste(Start.Date, Start.Time)),
                               end   = as.POSIXct(paste(End.Date, End.Time)))], start, end)
Then we can simply do:
foverlaps(task.df, campus.df, type = "any", which = TRUE) # You can also try `type = "within"`
#     xid yid
#  1:   1  NA
#  2:   2  NA
#  3:   3  NA
#  4:   4   1
#  5:   5   1
#  6:   6   1
#  7:   7  NA
#  8:   8   2
#  9:   9   2
# 10:  10   2
# 11:  11   2
# 12:  12   2
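The output tells you that rows 4:6 of task.df overlap the interval in the first row of campus.df, while rows 8:12 overlap the interval in the second row of campus.df. Rows with NA in yid match no campus interval.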
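Because the question asks for containment rather than mere overlap, it may help to see how type = "any" and type = "within" differ. The following is only a minimal sketch on invented numeric intervals; the tables x and y and their values are hypothetical and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy intervals, stored as plain numbers for brevity.
x <- data.table(start = c(1, 5, 9), end = c(2, 12, 10))
y <- data.table(start = c(0, 8), end = c(3, 11))
setkey(y, start, end)  # y must be keyed on its interval columns

# type = "any": x row 2 (5 to 12) overlaps y row 2 (8 to 11), so it matches.
any_idx <- foverlaps(x, y, type = "any", which = TRUE)
any_idx$yid     # 1 2 2

# type = "within": x row 2 is not fully contained in any y interval.
within_idx <- foverlaps(x, y, type = "within", which = TRUE)
within_idx$yid  # 1 NA 2
```

For the data in the question, every overlapping task interval also happens to lie entirely inside its campus interval, so both types give the same result there.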
If all you want is a logical vector indicating whether each row of task.df falls in any of the intervals in campus.df, just do:
!is.na(foverlaps(task.df, campus.df, type = "any", which = TRUE)$yid)
## [1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
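If you would rather not depend on data.table, the same containment test can be sketched in base R by comparing every task interval against every campus interval with outer(). This is only a rough sketch on a three-row subset of the data above; the names to_time, task_start, task_end, campus_start, campus_end, inside, and in_any are made up for illustration, and UTC is assumed as the time zone.

```r
# Parse "YYYY-MM-DD HH:MM" strings to seconds since the epoch (UTC assumed).
to_time <- function(x) {
  as.numeric(as.POSIXct(x, tz = "UTC", format = "%Y-%m-%d %H:%M"))
}

# Small subset of the question's task and campus intervals.
task_start   <- to_time(c("2014-09-01 19:19", "2014-09-02 07:43", "2014-09-02 20:15"))
task_end     <- to_time(c("2014-09-01 19:41", "2014-09-02 07:44", "2014-09-02 21:04"))
campus_start <- to_time(c("2014-09-02 07:37", "2014-09-03 07:40"))
campus_end   <- to_time(c("2014-09-02 15:18", "2014-09-03 18:00"))

# inside[i, j] is TRUE when task i lies entirely within campus interval j.
inside <- outer(task_start, campus_start, ">=") &
  outer(task_end, campus_end, "<=")

# A task qualifies if it lies within at least one campus interval.
in_any <- rowSums(inside) > 0
in_any  # FALSE TRUE FALSE
```

With the lubridate objects from the question, the start and end vectors could instead be taken from int_start() and int_end() and converted with as.numeric(). The comparison matrix has one cell per task/campus pair, so this suits small inputs; foverlaps is likely the better choice for large ones.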