Question

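I have a table with multiple entries. Each entry has a start datetime and an end datetime.

I want to find clusters of entries in the following way:

If an entry starts before the previous entry ends, both belong to the same cluster. It is a kind of overlap problem.

Example: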

id      start                    end
1       2007-04-11 15:34:02      2007-05-11 13:09:01
2       2007-06-13 15:42:39      2009-07-21 11:30:00
3       2007-11-26 14:30:02      2007-12-11 14:09:07
4       2008-02-14 08:52:11      2010-02-23 16:00:00

I want the output:

id      start                    end
1       2007-04-11 15:34:02      2007-05-11 13:09:01
2-4     2007-06-13 15:42:39      2010-02-23 16:00:00

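I have a solution that starts by sorting and then does some calculations with row_number and lag/lead, etc. The problem is the special case: row 4 overlaps row 2 directly rather than the row just before it (row 3), so my approach does not recognize it...

Is there a good solution for this in SQL? Maybe I am missing something?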
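The special case (row 4 overlaps row 2 but not row 3) is why comparing each row only with the previous one via lag/lead falls short: a row has to be compared with the latest end seen so far in the current cluster, not just with the previous row's end. A minimal Python sketch of that sort-and-sweep idea, assuming the rows fit in memory as (id, start, end) tuples; find_clusters and parse are hypothetical helper names, not from the question:

```python
from datetime import datetime


def parse(ts):
    """Parse a timestamp in the question's 'YYYY-MM-DD HH:MM:SS' format."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def find_clusters(rows):
    """Group (id, start, end) rows into clusters of overlapping intervals.

    Hypothetical helper: rows are sorted by start, and a row joins the
    current cluster if it starts before the latest end seen so far in that
    cluster (not merely before the previous row's end).
    """
    clusters = []  # each item: [list_of_ids, cluster_start, cluster_end]
    for rid, start, end in sorted(rows, key=lambda r: r[1]):
        if clusters and start < clusters[-1][2]:
            current = clusters[-1]
            current[0].append(rid)
            current[2] = max(current[2], end)  # running maximum of end times
        else:
            clusters.append([[rid], start, end])
    # Label clusters as "1" or "2-4", like the desired output.
    result = []
    for ids, start, end in clusters:
        label = str(ids[0]) if len(ids) == 1 else f"{min(ids)}-{max(ids)}"
        result.append((label, start, end))
    return result


rows = [
    (1, parse("2007-04-11 15:34:02"), parse("2007-05-11 13:09:01")),
    (2, parse("2007-06-13 15:42:39"), parse("2009-07-21 11:30:00")),
    (3, parse("2007-11-26 14:30:02"), parse("2007-12-11 14:09:07")),
    (4, parse("2008-02-14 08:52:11"), parse("2010-02-23 16:00:00")),
]

for label, start, end in find_clusters(rows):
    print(label, start, end)
```

In SQL Server 2012+ the same running maximum can be computed with a window such as max([end]) over (order by [start] rows between unbounded preceding and 1 preceding); a row then starts a new cluster when that value is NULL or not later than its own start.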

Answer 1

OK, here is a solution with a recursive CTE:

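Create table t(id int, s date, e date)

Insert into t values
(1, '20070411', '20070511'),
(2, '20070613', '20090721'),
(3, '20071126', '20071211'),
(4, '20080214', '20100223')

;with cte as(
-- Anchor: rows whose start does not fall inside any other row's interval;
-- each one starts a cluster, identified by rid
select id, s, e, id as rid, s as rs, e as re from t
Where not exists(select * from t ti where t.s > ti.s and t.s < ti.e)

Union all

-- Recursive step: rows that start inside an interval already in the cluster;
-- re carries the latest end date seen so far
Select t.*, 
  c.rid,
  c.rs,
  case when t.e > c.re then t.e else c.re end from t 
Join cte c on t.s > c.s and t.s < c.e

)

Select min(id) minid, max(id) maxid, min(rs) startdate, max(re) enddate from cte
group by rid
option (maxrecursion 0) -- the default limit of 100 recursion levels is too low for long chains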

Output:

minid   maxid   startdate   enddate
1       1       2007-04-11  2007-05-11
2       4       2007-06-13  2010-02-23

Fiddle: http://sqlfiddle.com/#!6/2d6d3/10
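To check the recursive CTE against the full datetimes from the question without SQL Server, here is a sketch that runs it on SQLite through Python's built-in sqlite3 module. The adaptations are assumptions of this sketch, not part of the answer: SQLite needs WITH RECURSIVE, the two-argument scalar MAX(a, b) replaces the case expression, and timestamps are stored as ISO-8601 text so that string comparison matches time order. The clusters function name is hypothetical.

```python
import sqlite3

# Sample data from the question, stored as ISO-8601 text.
ROWS = [
    (1, "2007-04-11 15:34:02", "2007-05-11 13:09:01"),
    (2, "2007-06-13 15:42:39", "2009-07-21 11:30:00"),
    (3, "2007-11-26 14:30:02", "2007-12-11 14:09:07"),
    (4, "2008-02-14 08:52:11", "2010-02-23 16:00:00"),
]

QUERY = """
WITH RECURSIVE cte(id, s, e, rid, rs, re) AS (
    -- Anchor: rows whose start does not fall inside any other interval.
    SELECT id, s, e, id, s, e FROM t
    WHERE NOT EXISTS (SELECT 1 FROM t ti WHERE t.s > ti.s AND t.s < ti.e)
    UNION ALL
    -- Recursive step: rows starting inside an interval already in the cluster.
    SELECT t.id, t.s, t.e, c.rid, c.rs, MAX(t.e, c.re)
    FROM t JOIN cte c ON t.s > c.s AND t.s < c.e
)
SELECT MIN(id), MAX(id), MIN(rs), MAX(re) FROM cte GROUP BY rid ORDER BY rid
"""


def clusters(rows):
    """Load rows into an in-memory table t and run the recursive CTE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, s TEXT, e TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for minid, maxid, startdate, enddate in clusters(ROWS):
    print(minid, maxid, startdate, enddate)
```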

Answer 2

Try this...

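-- [start] and [end] are bracketed because END is a reserved word in T-SQL
select a.id, a.[start], a.[end], b.id as b_id, b.[start] as b_start, b.[end] as b_end
from   tab   a
cross join tab b
where  a.[start] between b.[start] and b.[end]
order by a.[start], a.[end]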

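We have to check each row against all the other rows, like a loop with an inner loop. To do that, we use a cross join.

Then we use the BETWEEN ... AND operator to check whether a row's start falls within another row's interval, i.e. whether they overlap.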
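The cross join yields overlapping pairs (including each row paired with itself), not clusters: rows 3 and 4 both pair with row 2 but not with each other, so the pairs still have to be chained together. A minimal Python sketch of both steps, assuming rows as (id, start, end) tuples with ISO-8601 strings; the grouping step uses union-find, which is an addition of this sketch and not part of the answer, and overlapping_pairs and group_pairs are hypothetical names.

```python
# Sample rows from the question: (id, start, end) as ISO-8601 strings,
# which compare in chronological order.
rows = [
    (1, "2007-04-11 15:34:02", "2007-05-11 13:09:01"),
    (2, "2007-06-13 15:42:39", "2009-07-21 11:30:00"),
    (3, "2007-11-26 14:30:02", "2007-12-11 14:09:07"),
    (4, "2008-02-14 08:52:11", "2010-02-23 16:00:00"),
]


def overlapping_pairs(rows):
    """Nested-loop equivalent of the CROSS JOIN with BETWEEN.

    Returns (a_id, b_id) when a's start lies within b's interval, inclusive
    at both ends like BETWEEN; every row is also paired with itself.
    """
    return [
        (a_id, b_id)
        for a_id, a_start, _ in rows
        for b_id, b_start, b_end in rows
        if b_start <= a_start <= b_end
    ]


def group_pairs(ids, pairs):
    """Hypothetical follow-up step: merge chained pairs with union-find."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


pairs = overlapping_pairs(rows)
print(pairs)
print(group_pairs([r[0] for r in rows], pairs))
```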

Answer 3

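To answer this question, you need to determine which times start a new group. Then, for each start time, take a cumulative count of those group starts up to that time to define a group, and aggregate by that value.

Assuming you don't have duplicate times, this should work to set the flag: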

-- BeginsIsland = 1 when no earlier-starting entry is still running at e's start
select e.*,
       (case when not exists (select 1
                              from entries e2
                              where e2.[start] < e.[start] and e2.[end] > e.[start]
                             )
             then 1 else 0
        end) as BeginsIsland
from entries e;
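The flag says: an entry begins an island when no entry that started earlier is still running at its start. A Python sketch of the same test on the question's sample data (ISO-8601 strings compare correctly as text); begins_island_flags is a hypothetical name.

```python
rows = [
    (1, "2007-04-11 15:34:02", "2007-05-11 13:09:01"),
    (2, "2007-06-13 15:42:39", "2009-07-21 11:30:00"),
    (3, "2007-11-26 14:30:02", "2007-12-11 14:09:07"),
    (4, "2008-02-14 08:52:11", "2010-02-23 16:00:00"),
]


def begins_island_flags(rows):
    """Return {id: 1 or 0}, mirroring the NOT EXISTS test in the query.

    An entry gets 0 if some entry e2 satisfies
    e2.start < e.start and e2.end > e.start; otherwise it gets 1.
    """
    flags = {}
    for rid, start, _ in rows:
        covered = any(s2 < start < e2 for _, s2, e2 in rows)
        flags[rid] = 0 if covered else 1
    return flags


print(begins_island_flags(rows))
```

Rows 3 and 4 both get 0 because row 2 is still running when they start, even though row 4 does not overlap row 3.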

Here is the cumulative sum and aggregation, assuming SQL Server 2012+ (this can easily be adapted to earlier versions, but this is easier to code):

with e as (
      select e.*,
             (case when not exists (select 1
                                    from entries e2
                                    where e2.[start] < e.[start] and e2.[end] > e.[start]
                                   )
                       then 1 else 0
              end) as BeginIslandFlag
      from entries e
     )
select (case when min(id) = max(id) then cast(max(id) as varchar(255))
             else cast(min(id) as varchar(255)) + '-' + cast(max(id) as varchar(255))
        end) as ids,
       min([start]) as [start], max([end]) as [end]
from (select e.*,
             -- running count of island starts = group number
             sum(BeginIslandFlag) over (order by [start]) as grp
      from e
     ) e
group by grp;
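A sketch of the full query adapted to SQLite via Python's sqlite3 module, assuming an SQLite build with window-function support (3.25 or later). The adaptations are assumptions of this sketch: || replaces the T-SQL + for string concatenation, double quotes replace the square brackets around start and end, and the CTE is renamed flagged for readability. The island_clusters function name is hypothetical.

```python
import sqlite3

ROWS = [
    (1, "2007-04-11 15:34:02", "2007-05-11 13:09:01"),
    (2, "2007-06-13 15:42:39", "2009-07-21 11:30:00"),
    (3, "2007-11-26 14:30:02", "2007-12-11 14:09:07"),
    (4, "2008-02-14 08:52:11", "2010-02-23 16:00:00"),
]

QUERY = """
WITH flagged AS (
    -- 1 when no earlier-starting entry is still running at this start
    SELECT e.*,
           CASE WHEN NOT EXISTS (SELECT 1 FROM entries e2
                                 WHERE e2."start" < e."start"
                                   AND e2."end" > e."start")
                THEN 1 ELSE 0 END AS BeginIslandFlag
    FROM entries e
)
SELECT CASE WHEN MIN(id) = MAX(id) THEN CAST(MAX(id) AS TEXT)
            ELSE MIN(id) || '-' || MAX(id) END AS ids,
       MIN("start") AS "start", MAX("end") AS "end"
FROM (SELECT f.*,
             -- running count of island starts = group number
             SUM(BeginIslandFlag) OVER (ORDER BY "start") AS grp
      FROM flagged f) AS g
GROUP BY grp
ORDER BY grp
"""


def island_clusters(rows):
    """Load rows into an in-memory entries table and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE entries (id INTEGER, "start" TEXT, "end" TEXT)')
    con.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for ids, start, end in island_clusters(ROWS):
    print(ids, start, end)
```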

Finding clusters of time intervals
