I have a table named events that looks like this:
id: int
source_id: int
start_datetime: timestamp
end_datetime: timestamp
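These events may overlap, and I want to find the maximum number of events that overlap at any one time within a given period. For example, in this case: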
id | source_id | start_datetime | end_datetime
----------------------------------------------------------
1 | 23 | 2017-1-1T10:20:00 | 2017-1-1T10:40:00
1 | 42 | 2017-1-1T10:30:00 | 2017-1-1T10:35:00
1 | 11 | 2017-1-1T10:37:00 | 2017-1-1T10:50:00
The answer is 2, because at most 2 events overlap at once (between 10:30 and 10:35).
I am using Postgres 9.6.
Answer 0 (score: 1)
I am not entirely sure how the id and source_id columns should be handled, but based on your description, something like this might work:
select e1.source_id,
count(distinct e2.source_id) as overlap_count,
array_agg(e2.source_id) as overlap_events
from events e1
join events e2
on e1.source_id <> e2.source_id
and (e1.start_datetime, e1.end_datetime) overlaps (e2.start_datetime, e2.end_datetime)
group by e1.source_id
order by overlap_count desc;
With your sample data, this returns the following:
source_id | overlap_count | overlap_events
----------+---------------+---------------
23 | 2 | {42,11}
11 | 1 | {23}
42 | 1 | {23}
To get only the row with the highest count, you can add limit 1 to the query.
Another (probably slower) option, if you need the full rows from the events table:
select e1.id, e1.source_id, e1.start_datetime, e1.end_datetime,
(select count(*)
from events e2
where e2.source_id <> e1.source_id
and (e1.start_datetime, e1.end_datetime) overlaps (e2.start_datetime, e2.end_datetime)
) as overlap_count
from events e1
order by overlap_count desc;
Another option is to use range types and the && operator instead of overlaps:
select e1.source_id,
count(distinct e2.source_id) as overlap_count,
array_agg(e2.source_id) as overlap_events
from events e1
join events e2 on e1.source_id <> e2.source_id
and tsrange(e1.start_datetime, e1.end_datetime,'[]') && tsrange(e2.start_datetime, e2.end_datetime, '[]')
group by e1.source_id
order by overlap_count desc;
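The two variants can disagree when one event ends at exactly the moment another starts. As far as I know, overlaps treats each non-empty period as half-open (start <= time < end), while tsrange(..., '[]') includes both bounds. The Python sketch below only models those two predicates to show the difference; the function names and the event b are hypothetical and not part of the question's data.

```python
from datetime import datetime


def overlaps_half_open(s1, e1, s2, e2):
    # Simplified model of OVERLAPS for non-empty periods: start <= t < end.
    # (The special case of zero-length periods is deliberately ignored here.)
    return s1 < e2 and s2 < e1


def overlaps_closed(s1, e1, s2, e2):
    # Model of tsrange(s1, e1, '[]') && tsrange(s2, e2, '[]'):
    # both bounds are inclusive, so touching endpoints count as overlap.
    return s1 <= e2 and s2 <= e1


a = (datetime(2017, 1, 1, 10, 20), datetime(2017, 1, 1, 10, 40))
# Hypothetical event that starts exactly when a ends.
b = (datetime(2017, 1, 1, 10, 40), datetime(2017, 1, 1, 10, 50))

print(overlaps_half_open(*a, *b))  # False
print(overlaps_closed(*a, *b))     # True
```

If back-to-back events should not count as overlapping, passing '[)' as the bounds argument to tsrange is probably the closer match to overlaps.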
Answer 1 (score: 1)
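Here is one idea: count the starts and subtract the stops. The running total gives the net number of active events at each point in time. The rest is just aggregation: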
with e as (
select start_datetime as dte, 1 as inc
from events
union all
select end_datetime as dte, -1 as inc
from events
)
select max(concurrent)
from (select dte, sum(sum(inc)) over (order by dte) as concurrent
from e
group by dte
) e;
The subquery shows the number of overlapping events at each point in time.
You can get the time range of the peak like this:
-- uses the same CTE e (with e as (...)) as the query above
select dte, next_dte, concurrent
from (select dte, sum(sum(inc)) over (order by dte) as concurrent,
lead(dte) over (order by dte) as next_dte
from e
group by dte
) e
order by concurrent desc
fetch first 1 row only;
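For readers who find the nested window aggregate hard to follow, the same start/stop counting can be sketched outside the database. This is only an illustration in Python of the idea above, not a replacement for the query; max_concurrent is a hypothetical helper name, and the input is assumed to be a list of (start, end) pairs taken from the question's sample rows.

```python
from collections import defaultdict
from datetime import datetime


def max_concurrent(events):
    """Return (peak, range_start, range_end) for a list of (start, end) pairs."""
    # Net change in the number of active events at each distinct timestamp,
    # mirroring "group by dte" with sum(inc).
    delta = defaultdict(int)
    for start, end in events:
        delta[start] += 1
        delta[end] -= 1

    times = sorted(delta)
    best = (0, None, None)
    running = 0
    for i, t in enumerate(times):
        running += delta[t]  # running total, like sum(...) over (order by dte)
        nxt = times[i + 1] if i + 1 < len(times) else None  # like lead(dte)
        if running > best[0]:  # strict comparison keeps the earliest peak
            best = (running, t, nxt)
    return best


events = [
    (datetime(2017, 1, 1, 10, 20), datetime(2017, 1, 1, 10, 40)),
    (datetime(2017, 1, 1, 10, 30), datetime(2017, 1, 1, 10, 35)),
    (datetime(2017, 1, 1, 10, 37), datetime(2017, 1, 1, 10, 50)),
]
peak, start, end = max_concurrent(events)
print(peak, start.time(), end.time())  # 2 10:30:00 10:35:00
```

Because changes are summed per timestamp before the running total is taken, an event that ends at the same instant another starts does not raise the count. When several ranges share the peak (here 10:37 to 10:40 also has 2 events), the sketch reports the earliest one, whereas the SQL version with fetch first 1 row only returns an arbitrary one of the tied rows.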