SQL - Merging overlapping data

Date: 2014-04-18 00:48:02

Tags: sql sql-server aggregate overlap

I have a simple data set in SQL Server that looks like this:

**ROW    Start    End**
  0     1        2
  1     3        5
  2     4        6
  3     8        9

Graphically, the data looks like this: (image)

What I want is to collapse the overlapping rows so that my query returns:

**ROW    Start    End**
  0     1        2
  1     3        6
  2     8        9

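Can this be written in SQL Server without resorting to a complex procedure or statement?

3 answers:

Answer 0 (score: 2)

Here is a SQL Fiddle with another alternative.

First, all limits (start and end values) are sorted into a single ordered list. Then the "duplicate" limits inside overlapping ranges are removed: these are the limits where a "start" is followed by another "start", or an "end" is followed by another "end". With the ranges collapsed, each remaining Start value is written out on the same row as the End value that follows it.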

with temp_positions as  --Select all limits as a single column along with the start / end flag (s / e)
(
    select startx limit, 's' as pos from t
    union
    select endx, 'e' as pos from t
)
, ordered_positions as --Rank all limits
(
    select limit, pos, RANK() OVER (ORDER BY limit) AS Rank
    from temp_positions
)
, collapsed_positions as --Collapse ranges (select the first limit, if s is preceded or followed by e, and the last limit) and rank limits again
(
    select op1.*, RANK() OVER (ORDER BY op1.Rank) AS New_Rank
    from ordered_positions op1
    inner join ordered_positions op2
    on (op1.Rank = op2.Rank and op1.Rank = 1 and op1.pos = 's')
    or (op2.Rank = op1.Rank-1 and op2.pos = 'e' and op1.pos = 's') 
    or (op2.Rank = op1.Rank+1 and op2.pos = 's' and op1.pos = 'e')
    or (op2.Rank = op1.Rank and op1.pos = 'e' and op1.Rank = (select max(Rank) from ordered_positions))
)
, final_positions as --Now each s is followed by e. So, select s limits and corresponding e limits. Rank ranges
(
    select cp1.limit as cp1_limit, cp2.limit as cp2_limit,  RANK() OVER (ORDER BY cp1.limit) AS Final_Rank
    from collapsed_positions cp1
    inner join collapsed_positions cp2
    on cp1.pos = 's' and cp2.New_Rank = cp1.New_Rank+1
)
--Finally, subtract 1 from Rank to start Range #'s from 0
select fp.Final_Rank-1 seq_no, fp.cp1_limit as starty, fp.cp2_limit as endy
from final_positions fp;

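You can inspect the result of each CTE to follow the progress. To do so, remove the CTEs that come after it and select from the one you are interested in, for example: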

with temp_positions as  --Select all limits as a single column along with the start / end flag (s / e)
(
    select startx limit, 's' as pos from t
    union
    select endx, 'e' as pos from t
)
, ordered_positions as --Rank all limits
(
    select limit, pos, RANK() OVER (ORDER BY limit) AS Rank
    from temp_positions
)
select *
from ordered_positions;
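
The queries above read from a table `t` with columns `startx` and `endx`, which are not defined in the answer text. A minimal setup to run them against the question's data might look like the following. The table and column names come from the queries; the `int` types and the sample rows are assumptions.

```sql
-- Assumed schema: the names t, startx and endx are taken from the queries above;
-- the int column types are an assumption.
CREATE TABLE t (startx int, endx int);

-- Sample rows copied from the question.
INSERT INTO t (startx, endx)
VALUES (1, 2), (3, 5), (4, 6), (8, 9);
```

With these rows all eight limits are distinct, so `ordered_positions` ranks them 1 through 8 with no ties. If an end value equals another row's start value, `RANK()` assigns both the same rank and the `Rank-1` / `Rank+1` joins no longer line up, so that case is worth testing separately.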

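Answer 1 (score: 1)

The key is to assign a "grouping" value to overlapping segments. You can then aggregate by that column to get the information you want. A segment starts a new group when it does not overlap any segment that starts before it.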

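-- [table] stands for your table name; TABLE and END are reserved words in T-SQL, so the identifiers are bracket-quoted
with starts as (
      select t.*,
             -- 0 if an earlier-starting segment reaches this one, 1 if this segment opens a new group
             (case when exists (select 1 from [table] t2 where t2.[start] < t.[start] and t2.[end] >= t.[start])
                   then 0
                   else 1
              end) as isstart
      from [table] t
     ),
     groups as (
      select s.*,
             -- running count of group starts, used as the group id
             (select sum(isstart)
              from starts s2
              where s2.[start] <= s.[start]
             ) as grp
      from starts s
     )
select row_number() over (order by min([start])) as [row],
       min([start]) as [start], max([end]) as [end]
from groups
group by grp;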
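
On SQL Server 2012 or later, the same grouping idea can also be sketched with window functions instead of correlated subqueries. This is an alternative sketch, not the answer's original query: the table variable `@segments` and the column alias `grp` are hypothetical names, and the sample rows are copied from the question.

```sql
-- Hypothetical table variable holding the question's sample rows.
DECLARE @segments TABLE ([start] int, [end] int);

INSERT INTO @segments ([start], [end])
VALUES (1, 2), (3, 5), (4, 6), (8, 9);

WITH flagged AS (
    SELECT s.[start], s.[end],
           -- Compare each start with the largest end seen on all earlier rows.
           -- For the first row the frame is empty, MAX is NULL, and the ELSE branch applies.
           CASE WHEN s.[start] <= MAX(s.[end]) OVER (
                         ORDER BY s.[start], s.[end]
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0   -- overlaps (or touches) an earlier segment
                ELSE 1   -- opens a new group
           END AS isstart
    FROM @segments s
),
grouped AS (
    SELECT f.[start], f.[end],
           -- Running total of the flags gives every row its group number.
           SUM(f.isstart) OVER (ORDER BY f.[start], f.[end]
                                ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged f
)
SELECT ROW_NUMBER() OVER (ORDER BY MIN([start])) - 1 AS [row],
       MIN([start]) AS [start],
       MAX([end])   AS [end]
FROM grouped
GROUP BY grp
ORDER BY [row];
```

On the question's four rows this should return (0, 1, 2), (1, 3, 6) and (2, 8, 9). Comparing against the running maximum of earlier end values, rather than only the previous row's end, also keeps a short segment nested inside a longer one in the same group.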

Answer 2 (score: 0)

I would create a table-valued function that returns the segments. You would then call it like this:

select *
from dbo.getCollapsedSegments(2, 9)

Here is an example (I replaced END with FIN because END is a reserved word).

CREATE FUNCTION dbo.getCollapsedSegments(@Start int, @Fin int)
RETURNS @CollapsedSegments TABLE 
(
    -- Columns returned by the function
    start int,
    fin int
)
AS 
BEGIN

    SELECT @Start = (SELECT MIN(Start) FROM data WHERE @Start <= Start)

    WHILE (@Start IS NOT NULL AND @Start < @Fin)
    BEGIN
        INSERT INTO @CollapsedSegments
        SELECT MIN(s1.Start), MAX(ISNULL(s2.Fin, s1.Fin))
        FROM data s1
        LEFT JOIN data s2
        ON s1.Start < s2.Fin
        AND s2.Start <= s1.Fin
        AND @Fin > s2.start
        WHERE s1.Start <= @Start
        AND @Start < s1.Fin

        SELECT @Start = (SELECT MAX(Fin) FROM @CollapsedSegments)

        SELECT @Start = MIN(Start)
        FROM data
        WHERE Start > @Start
    END

    RETURN;
END

My test data:

create table data
(start int,
fin int)

insert into data
select 1, 2
union all
select 3, 5
union all
select 4, 6
union all
select 8, 9
union all
select 10, 11