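SQL partition by overlapping dates

Time: 2015-12-01 07:51:25

Tags: sql sql-server tsql

I have the table below, and I want to partition it so that rows with overlapping dates for the same MemberID fall into the same partition. I am using the code below, but it partitions only by MemberID and does not take the overlapping dates into account. How can I make the partitioning consider overlapping dates?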

ID MemberID StartDate   EndDate      
1  2        2015-01-01  2015-02-28
2  2        2015-02-02  2015-02-03 
3  2        2015-05-01  2015-05-20 
4  1        2015-02-01  2015-02-28 
5  2        2015-02-01  2015-03-01 


SELECT *
,ROW_NUMBER() OVER(PARTITION BY MEMBERID ORDER BY ID) AS GROUPID
FROM TABLENAME AS A



Current output:

ID MemberID StartDate   EndDate      GROUPID
4  1        2015-02-01  2015-02-28   1
1  2        2015-01-01  2015-02-28   1
2  2        2015-02-02  2015-02-03   2
3  2        2015-05-01  2015-05-20   3
5  2        2015-02-01  2015-03-01   4

Expected output:

ID MemberID StartDate   EndDate      GROUPID
4  1        2015-02-01  2015-02-28   1
1  2        2015-01-01  2015-02-28   1
2  2        2015-02-02  2015-02-03   2
5  2        2015-02-01  2015-03-01   3
3  2        2015-05-01  2015-05-20   1

2 answers:

Answer 0: (score: 1)

You have to use a combination of window functions to get what you want. Here is one way to do it:

SELECT ID, MemberID, StartDate, EndDate,
       1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp 
                                ORDER BY EndDate) AS GroupID
FROM (                            
  SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
         ROW_NUMBER() OVER (PARTITION BY MemberID 
                            ORDER BY EndDate)
         -  SUM(bOverlaps) OVER (PARTITION BY MemberID 
                                 ORDER BY EndDate) AS grp                           
  FROM (
    SELECT ID, MemberID, StartDate, EndDate,
           CASE 
              WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID 
                                                   ORDER BY EndDate) 
              THEN 1
              ELSE 0 
           END AS bOverlaps
    FROM mytable) AS t ) AS u 

Explanation

Consider the innermost subquery first:

SELECT ID, MemberID, StartDate, EndDate,
           CASE 
              WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID 
                                                   ORDER BY EndDate) 
              THEN 1
              ELSE 0 
           END AS bOverlaps
FROM mytable

Output:

ID  MemberID    StartDate   EndDate    bOverlaps
4   1           2015-02-01  2015-02-28 0
2   2           2015-02-02  2015-02-03 0
1   2           2015-01-01  2015-02-28 1
5   2           2015-02-01  2015-03-01 1
3   2           2015-05-01  2015-05-20 0

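The computed field bOverlaps is 1 (true) if the current row overlaps the previous row of the same MemberID partition, with rows ordered by EndDate, and 0 otherwise. The first row of each partition has no previous row, so LAG returns NULL and the field is 0.

The next-level subquery uses the derived table above to compute islands of consecutive overlapping records within the same MemberID partition.

This query: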
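
To make the LAG comparison concrete, here is a minimal Python sketch (not part of the original answer) that mimics the innermost subquery on the sample rows from the question. The names `ROWS` and `overlap_flags` are hypothetical and chosen only for illustration.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (ID, MemberID, StartDate, EndDate)
ROWS = [
    (1, 2, date(2015, 1, 1), date(2015, 2, 28)),
    (2, 2, date(2015, 2, 2), date(2015, 2, 3)),
    (3, 2, date(2015, 5, 1), date(2015, 5, 20)),
    (4, 1, date(2015, 2, 1), date(2015, 2, 28)),
    (5, 2, date(2015, 2, 1), date(2015, 3, 1)),
]

def overlap_flags(rows):
    """Return {ID: bOverlaps}, mimicking
    StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID ORDER BY EndDate)."""
    flags = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))  # MemberID, then EndDate
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        prev_end = None  # LAG yields NULL for the first row of a partition
        for row_id, _, start, end in partition:
            flags[row_id] = 1 if prev_end is not None and start <= prev_end else 0
            prev_end = end
    return flags

print(overlap_flags(ROWS))  # {4: 0, 2: 0, 1: 1, 5: 1, 3: 0}
```

The flags agree with the bOverlaps column in the output table above.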

SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
       SUM(bOverlaps) OVER (PARTITION BY MemberID 
                            ORDER BY EndDate) AS GroupSeq,
       ROW_NUMBER() OVER (PARTITION BY MemberID 
                          ORDER BY EndDate)
       -  SUM(bOverlaps) OVER (PARTITION BY MemberID 
                               ORDER BY EndDate) AS grp      
FROM ( ... innermost derived table here ... )

produces the following output:

ID  MemberID StartDate  EndDate    bOverlaps GroupSeq grp
4   1        2015-02-01 2015-02-28 0         0        1
2   2        2015-02-02 2015-02-03 0         0        1
1   2        2015-01-01 2015-02-28 1         1        1
5   2        2015-02-01 2015-03-01 1         2        1
3   2        2015-05-01 2015-05-20 0         2        2

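GroupSeq is simply a running total of bOverlaps and is used to compute grp. Because ROW_NUMBER and the running total both increase by 1 on an overlapping row, grp stays constant inside an island and increases by 1 whenever a non-overlapping row starts a new one. Combined with MemberID, the grp values in the output above identify 3 separate islands: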

Island no. IDs    grp value
1          4      1
2          2,1,5  1
3          3      2
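
The row-number-minus-running-total trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `island_ids` is a hypothetical helper, the flags are assumed to belong to one MemberID partition already ordered by EndDate, and ties in EndDate are ignored.

```python
from itertools import accumulate

def island_ids(flags):
    """Given the bOverlaps flags of one MemberID partition, ordered by EndDate,
    return grp for each row: ROW_NUMBER() minus the running SUM(bOverlaps)."""
    running_totals = accumulate(flags)  # plays the role of GroupSeq
    return [row_number - total
            for row_number, total in enumerate(running_totals, start=1)]

print(island_ids([0, 1, 1, 0]))  # MemberID 2 in the sample: [1, 1, 1, 2]
print(island_ids([0]))           # MemberID 1 in the sample: [1]
```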

Finally, the outermost query uses the following expression:

1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp 
                         ORDER BY EndDate) AS GroupID

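to compute GroupID: by taking a running total once more, this time within each (MemberID, grp) island, we can enumerate the rows that belong to the same island.

We could also use ROW_NUMBER for this purpose: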

ROW_NUMBER() OVER (PARTITION BY MemberID, grp 
                   ORDER BY EndDate) AS GroupID
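
As a rough illustration of this last step, the Python sketch below numbers rows inside each (MemberID, grp) island. The helper name `group_ids` and the input layout are assumptions made for this example. The rows are expected to arrive ordered by MemberID and then EndDate, as in the output table above.

```python
from itertools import groupby

def group_ids(rows):
    """rows: (ID, MemberID, grp) tuples ordered by MemberID, then EndDate.
    Mimics ROW_NUMBER() OVER (PARTITION BY MemberID, grp ORDER BY EndDate)."""
    result = {}
    # grp never decreases within a member, so each island is a consecutive run
    for _, island in groupby(rows, key=lambda r: (r[1], r[2])):
        for position, (row_id, _, _) in enumerate(island, start=1):
            result[row_id] = position
    return result

rows = [(4, 1, 1), (2, 2, 1), (1, 2, 1), (5, 2, 1), (3, 2, 2)]
print(group_ids(rows))  # {4: 1, 2: 1, 1: 2, 5: 3, 3: 1}
```

Within an island the numbering follows EndDate order, so ID 2 gets GroupID 1 and ID 1 gets GroupID 2, which differs from the ID-based numbering in the question's expected output.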


Answer 1: (score: 1)

This query gives the correct output:

WITH ord as (
    SELECT ID, MemberID
        , StartDate, EndDate
        , n = ROW_NUMBER() over(partition by [MemberID] order by [StartDate], [EndDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.MemberID
        , o1.n
    FROM ord o1
    INNER JOIN ord o2 ON o1.MemberID = o2.MemberID AND o2.n+1 = o1.n AND o1.StartDate > o2.EndDate
), groups as (
    SELECT o.ID, o.MemberID
        , p = ROW_NUMBER() over(partition by o.MemberID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
    FROM ord o
    LEFT JOIN first f ON o.MemberID = f.MemberID AND o.n < f.n
    GROUP BY o.ID, o.MemberID
)
SELECT g.ID, g.MemberID, d.StartDate, d.EndDate, GROUPID = g.p
FROM groups g
INNER JOIN @data d ON g.ID = d.ID

Note that it still needs to be tested with more data.

Output:

ID  MemberID    StartDate   EndDate     GROUPID
4   1           2015-02-01  2015-02-28  1
3   2           2015-05-01  2015-05-20  1
1   2           2015-01-01  2015-02-28  1
2   2           2015-02-02  2015-02-03  2
5   2           2015-02-01  2015-03-01  3
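
For testing against more data, it can help to restate the query's logic outside SQL. The following Python sketch is one possible reading of the CTEs for a single member, not the answerer's code. The function name `number_within_groups` is hypothetical. A new group starts when a row's StartDate is later than the previous row's EndDate (ordered by StartDate, EndDate), and rows are then numbered by ID inside each group.

```python
from datetime import date

def number_within_groups(rows):
    """rows: (ID, StartDate, EndDate) tuples for ONE MemberID.
    Returns {ID: GROUPID} following the logic of the ord/first/groups CTEs."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    groups, prev_end = [], None
    for row_id, start, end in ordered:
        # Like the join in `first`: compare only with the immediately preceding row
        if prev_end is None or start > prev_end:
            groups.append([])
        groups[-1].append(row_id)
        prev_end = end
    # Like `groups`: ROW_NUMBER() ordered by ID inside each group
    return {row_id: position
            for group in groups
            for position, row_id in enumerate(sorted(group), start=1)}

member2 = [
    (1, date(2015, 1, 1), date(2015, 2, 28)),
    (2, date(2015, 2, 2), date(2015, 2, 3)),
    (3, date(2015, 5, 1), date(2015, 5, 20)),
    (5, date(2015, 2, 1), date(2015, 3, 1)),
]
print(number_within_groups(member2))  # {1: 1, 2: 2, 5: 3, 3: 1}
```

Feeding additional intervals to a function like this and comparing the result with the query's output is one way to look for edge cases.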

Your data:

declare @data table([ID] int, [MemberID] int, [StartDate] date, [EndDate] date);
Insert into @data([ID], [MemberID], [StartDate], [EndDate])
VALUES
    (1, 2, '2015-01-01', '2015-02-28'),
    (2, 2, '2015-02-02', '2015-02-03'),
    (3, 2, '2015-05-01', '2015-05-20'),
    (4, 1, '2015-02-01', '2015-02-28'),
    (5, 2, '2015-02-01', '2015-03-01')
;