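I have two data frames. aa holds events with a start and a stop time, many per day. prbb holds the time windows I am interested in for those events, with one start and one stop per event per day. I want to extract from aa the events that fall within the matching windows in prbb.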
aa <- data.frame(aaletters = c(rep("a",7), rep("b", 3),rep("c", 1)),
aastart = as.POSIXct(c("2019-05-02 05:06:35","2019-05-02 12:06:35", "2019-05-03 08:15:52", "2019-05-03 09:15:52", "2019-05-06 05:51:37",
"2019-05-06 07:01:37","2019-05-06 09:51:37","2019-05-02 07:15:32", "2019-05-03 12:14:04", "2019-05-06 12:24:37",
"2019-05-02 03:15:32"
)),
aastop = as.POSIXct(c("2019-05-02 05:15:30", "2019-05-02 12:15:30", "2019-05-03 08:44:08","2019-05-03 09:44:08", "2019-05-06 06:51:37",
"2019-05-06 07:02:37","2019-05-06 10:02:37","2019-05-02 08:15:32", "2019-05-03 13:41:16", "2019-05-06 13:24:43",
"2019-05-02 03:35:32"
)))
prbb <- data.frame(prbbletters = c(rep("a", 3), rep("b", 3), rep("c",3)),
prstart = as.POSIXct(c("2019-05-02 06:06:35", "2019-05-03 06:15:52", "2019-05-06 07:51:37", "2019-05-02 06:15:32", "2019-05-03 08:14:04",
"2019-05-06 06:24:37","2019-05-02 06:14:19", "2019-05-03 06:41:35", "2019-05-06 06:17:50"
)),
prstop = as.POSIXct(c("2019-05-02 23:18:30", "2019-05-03 20:44:08", "2019-05-06 22:37:20", "2019-05-02 23:24:27", "2019-05-03 19:41:16",
"2019-05-06 23:24:43","2019-05-02 19:50:52", "2019-05-03 23:57:47", "2019-05-06 23:56:39"
)))
I tried the following without success; it ignores the by= argument:
setDT(aa)
setDT(prbb)
aa[inrange(aa$aastart, prbb$prstart, prbb$prstop, incbounds = FALSE) & inrange(aa$aastop, prbb$prstart, prbb$prstop, incbounds = FALSE), by = prbletters]
setDT(aa)
setkey(aa, aastart, aastop)
setDT(prbb)
setkey(prbb, prstart, prstop)
foverlaps(aa, prbb, nomatch = NULL, mult = "first")[ , by = prbbletters]
I also tried fuzzy_joins, but could not get the grouping to work correctly.
# expected result: 7 rows
# 1: a 2019-05-02 12:06:35 2019-05-02 12:15:30
# 2: a 2019-05-03 08:15:52 2019-05-03 08:44:08
# 3: a 2019-05-03 09:15:52 2019-05-03 09:44:08
# 4: a 2019-05-06 09:51:37 2019-05-06 10:02:37
# 5: b 2019-05-02 07:15:32 2019-05-02 08:15:32
# 6: b 2019-05-03 12:14:04 2019-05-03 13:41:16
# 7: b 2019-05-06 12:24:37 2019-05-06 13:24:43
Thanks!
Answer 0 (score: 2)
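It looks like you were close. You just need to add the letters column as a key before calling foverlaps.
From the foverlaps help:
The last two columns in by.x and by.y should each correspond to the start and end interval columns in x and y respectively.
So set every column you want to join on as a key, and make sure the last two keys are the start and end columns.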
setDT(aa)
setDT(prbb)
setkey(aa, aaletters, aastart, aastop) # <-- added aaletters as the first key
setkey(prbb, prbbletters, prstart, prstop) # <-- added prbbletters as the first key
foverlaps(aa, prbb, mult = "first", nomatch = 0L)
# aaletters prstart prstop aastart aastop
# 1: a 2019-05-02 06:06:35 2019-05-02 23:18:30 2019-05-02 12:06:35 2019-05-02 12:15:30
# 2: a 2019-05-03 06:15:52 2019-05-03 20:44:08 2019-05-03 08:15:52 2019-05-03 08:44:08
# 3: a 2019-05-03 06:15:52 2019-05-03 20:44:08 2019-05-03 09:15:52 2019-05-03 09:44:08
# 4: a 2019-05-06 07:51:37 2019-05-06 22:37:20 2019-05-06 09:51:37 2019-05-06 10:02:37
# 5: b 2019-05-02 06:15:32 2019-05-02 23:24:27 2019-05-02 07:15:32 2019-05-02 08:15:32
# 6: b 2019-05-03 08:14:04 2019-05-03 19:41:16 2019-05-03 12:14:04 2019-05-03 13:41:16
# 7: b 2019-05-06 06:24:37 2019-05-06 23:24:43 2019-05-06 12:24:37 2019-05-06 13:24:43
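One caveat on the call above: foverlaps defaults to type = "any", which also returns events that only partly overlap a window. That happens to give the expected 7 rows for this data, but if events must lie entirely inside a window, type = "within" is probably closer to the intent. The sketch below uses small made-up tables (events, windows, and all of their column names are hypothetical, not the question's data) to show the difference, and cross-checks the result with a non-equi join. The non-equi join treats the window bounds as inclusive, which is an assumption; the question's inrange attempt used incbounds = FALSE.

```r
library(data.table)

# Hypothetical toy data: two events in group "a", one window for group "a".
# The first event starts before the window opens and only partly overlaps it.
events <- data.table(
  grp      = c("a", "a"),
  ev_start = as.POSIXct(c("2019-05-02 05:00:00", "2019-05-02 12:00:00"), tz = "UTC"),
  ev_stop  = as.POSIXct(c("2019-05-02 06:30:00", "2019-05-02 12:10:00"), tz = "UTC")
)
windows <- data.table(
  grp     = "a",
  w_start = as.POSIXct("2019-05-02 06:00:00", tz = "UTC"),
  w_stop  = as.POSIXct("2019-05-02 23:00:00", tz = "UTC")
)

# Grouping column first, interval start and end as the last two key columns.
setkey(events, grp, ev_start, ev_stop)
setkey(windows, grp, w_start, w_stop)

# Default behaviour: any overlap counts, so the partly overlapping event matches too.
any_hit <- foverlaps(events, windows, type = "any", nomatch = 0L)

# Stricter: the event interval must lie entirely inside the window.
within_hit <- foverlaps(events, windows, type = "within", nomatch = 0L)

# Cross-check with a non-equi join; x.ev_start / x.ev_stop recover the
# event's own times, since joined-on columns otherwise show the window bounds.
nonequi_hit <- events[windows,
                      on = .(grp, ev_start >= w_start, ev_stop <= w_stop),
                      nomatch = 0L,
                      .(grp, ev_start = x.ev_start, ev_stop = x.ev_stop)]

nrow(any_hit)     # both events
nrow(within_hit)  # only the fully contained event
```

With the question's tables, the same idea would amount to adding type = "within" to the foverlaps call after setting the three-column keys as shown in the answer.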