I have
household  person  start time  end time
1          1       07:45:00    21:45:00
1          2       09:45:00    17:45:00
1          3       22:45:00    23:45:00
1          4       08:45:00    01:45:00
1          1       06:45:00    19:45:00
2          1       07:45:00    21:45:00
2          2       16:45:00    22:45:00
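I want to add a column that identifies overlapping times between members of the same household.
For each person, the column should hold the index of every other person whose time intersects with theirs.
In the example above, in the first household the times of the first, second and fourth persons intersect.
Output: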
household  person  start time  end time  overlap
1          1       07:45:00    21:45:00  2,4
1          2       09:45:00    17:45:00  1,4
1          3       22:45:00    23:45:00  NA
1          4       08:45:00    01:45:00  1,2
1          1       18:45:00    19:45:00  NA
2          1       07:45:00    21:45:00  2
2          2       16:45:00    22:45:00  1
NA means no intersection with any other household member; it could be 0 or any other value instead.
Answer 0 (score: 1)
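Left join the input DF to itself, matching each row to the other persons in the same household under an overlap condition. Then, row by row, concatenate the matched persons into a comma-separated string.
Since the question does not explain what constitutes an overlap, we try three different definitions of overlap. The third comes closest to the output shown in the question.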
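The join-then-concatenate idea can also be sketched in base R without SQL. This is only an illustration under assumptions: the data frame `toy`, its `id` column and the simple overlap test (which ignores intervals whose end is earlier than their start) are hypothetical and not part of the answer's queries.

```r
# Toy data (hypothetical, not the question's data): one household, three persons.
toy <- data.frame(
  household  = c(1L, 1L, 1L),
  person     = c(1L, 2L, 3L),
  start_time = c("07:45:00", "09:45:00", "22:45:00"),
  end_time   = c("21:45:00", "17:45:00", "23:45:00"),
  stringsAsFactors = FALSE
)
toy$id <- seq_len(nrow(toy))  # stable row identifier, plays the role of rowid

# Self join on household; suffixes distinguish the left (.a) and right (.b) copies.
pairs <- merge(toy, toy, by = "household", suffixes = c(".a", ".b"))

# Keep pairs of different persons whose intervals overlap.
# Zero-padded HH:MM:SS strings can be compared as text.
hit <- pairs$person.a != pairs$person.b &
  pairs$start_time.a <= pairs$end_time.b &
  pairs$start_time.b <= pairs$end_time.a
pairs <- pairs[hit, ]

# Collapse the matched persons into one comma-separated string per left row.
ov <- vapply(split(pairs$person.b, pairs$id.a),
             function(p) paste(sort(unique(p)), collapse = ","),
             character(1))

# Rows without any match get NA, mimicking the left join.
toy$overlap <- unname(ov[as.character(toy$id)])
toy$overlap
# "2" "1" NA
```

Persons 1 and 2 overlap each other, while person 3 has no match and therefore keeps NA.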
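1) If end_time < start_time, then everything before end_time and everything after start_time should be checked for overlap with the other interval. The overlap condition is then broken into 4 cases, according to whether the left and right sides of the join satisfy this condition.
2) If start_time > end_time on the left or the right side, we regard the two as not overlapping.
3) If end_time < start_time, reverse the two times and then test for overlap as before.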
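To make the first definition concrete, here is a small base R sketch of an overlap test for intervals that may run past midnight. It is an alternative formulation, not a translation of the SQL: it splits a wrapping interval into two pieces instead of using a case expression, and it treats touching endpoints as overlapping. The helper names `to_sec`, `pieces` and `overlaps_wrap` are hypothetical.

```r
# Convert "HH:MM:SS" strings to seconds since midnight.
to_sec <- function(x) {
  p <- strsplit(x, ":", fixed = TRUE)
  vapply(p, function(v) sum(as.numeric(v) * c(3600, 60, 1)), numeric(1))
}

# Split one interval into pieces that do not cross midnight.
# Assumption: start > end means the interval wraps past midnight.
pieces <- function(s, e) {
  if (s <= e) list(c(s, e)) else list(c(s, 86400), c(0, e))
}

# Two intervals overlap if any pair of their pieces overlaps.
overlaps_wrap <- function(s1, e1, s2, e2) {
  a <- pieces(to_sec(s1), to_sec(e1))
  b <- pieces(to_sec(s2), to_sec(e2))
  for (x in a) for (y in b) {
    if (x[1] <= y[2] && y[1] <= x[2]) return(TRUE)
  }
  FALSE
}

overlaps_wrap("07:45:00", "21:45:00", "08:45:00", "01:45:00")  # TRUE
overlaps_wrap("22:45:00", "23:45:00", "09:45:00", "17:45:00")  # FALSE
```

Working in seconds avoids relying on text comparison, at the cost of an extra conversion step.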
library(sqldf)

# An interval with start_time > end_time is treated as covering everything
# after start_time and everything before end_time.
sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b
    on a.household = b.household and
       a.person != b.person and
       (case
          when a.start_time <= a.end_time and b.start_time <= b.end_time then
            (a.start_time between b.start_time and b.end_time or
             b.start_time between a.start_time and a.end_time)
          when a.start_time <= a.end_time and b.start_time > b.end_time then
            not (a.start_time between b.end_time and b.start_time and
                 a.end_time between b.end_time and b.start_time)
          when a.start_time > a.end_time and b.start_time <= b.end_time then
            not (b.start_time between a.end_time and a.start_time and
                 b.end_time between a.end_time and a.start_time)
          else 1 end)
  group by a.rowid")
giving:
  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00     1,4
3         1      3   22:45:00 23:45:00       4
4         1      4   08:45:00 01:45:00     2,3
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1
library(sqldf)

# Any pair in which either interval has start_time > end_time is treated
# as not overlapping (the else branch returns 0).
sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b
    on a.household = b.household and
       a.person != b.person and
       (case
          when a.start_time <= a.end_time and b.start_time <= b.end_time then
            (a.start_time between b.start_time and b.end_time or
             b.start_time between a.start_time and a.end_time)
          else 0 end)
  group by a.rowid")
giving:
  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00    <NA>
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1
# st and en are the earlier and later of the two times, so a reversed
# interval is swapped before the usual overlap test is applied.
sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
    select rowid, *,
           min(start_time, end_time) as st,
           max(start_time, end_time) as en
    from DF)
  select a.household, a.person, a.start_time, a.end_time,
         group_concat(distinct b.person) as overlap
  from DF2 a
  left join DF2 b
    on a.household = b.household and
       a.person != b.person and
       (a.st between b.st and b.en or
        b.st between a.st and a.en)
  group by a.rowid")
giving:
  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00     2,4
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00       1
5         1      1   06:45:00 19:45:00     2,4
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1
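The swap performed by min(start_time, end_time) and max(start_time, end_time) in the last query can be mirrored in base R. This is a rough sketch with hypothetical example values. Note that it reads 08:45:00 to 01:45:00 as the interval from 01:45:00 to 08:45:00, which is a different interpretation from an interval that runs past midnight.

```r
# Hypothetical example values, zero-padded HH:MM:SS strings.
start_time <- c("08:45:00", "07:45:00")
end_time   <- c("01:45:00", "21:45:00")

# Swap the two times wherever the interval is reversed.
reversed <- start_time > end_time
st <- ifelse(reversed, end_time, start_time)
en <- ifelse(reversed, start_time, end_time)

data.frame(start_time, end_time, st, en)
```

Only the first row is reversed, so only its two times are exchanged; the second row is left unchanged.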
We assume the input DF in reproducible form is:
DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L,
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00",
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"),
end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00",
"19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA,
-7L))
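Because start_time and end_time are character columns, the queries compare them as text, which orders times correctly only when every value is zero-padded HH:MM:SS. A minimal check in base R, offered as a sketch; `is_hms` is a hypothetical helper name and not part of sqldf:

```r
# TRUE for strings of the exact form HH:MM:SS with a valid 24-hour time.
is_hms <- function(x) {
  grepl("^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$", x)
}

is_hms(c("16:45:00", "016:45:00", "7:45:00"))
# TRUE FALSE FALSE
```

Applied to a data frame, something like all(is_hms(DF$start_time)) and all(is_hms(DF$end_time)) would flag a malformed value such as 016:45:00 before the queries are run.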