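I have the table below, and I want to partition it whenever date ranges overlap for the same member ID. I am using the following code, but it partitions by member ID only and ignores overlapping dates. How can I make the partitioning take overlapping dates into account?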
ID MemberID StartDate EndDate
1 2 2015-01-01 2015-02-28
2 2 2015-02-02 2015-02-03
3 2 2015-05-01 2015-05-20
4 1 2015-02-01 2015-02-28
5 2 2015-02-01 2015-03-01
SELECT *
,ROW_NUMBER() OVER(PARTITION BY MEMBERID ORDER BY ID) AS GROUPID
FROM TABLENAME AS A
Current output:
ID MemberID StartDate EndDate GROUPID
4 1 2015-02-01 2015-02-28 1
1 2 2015-01-01 2015-02-28 1
2 2 2015-02-02 2015-02-03 2
3 2 2015-05-01 2015-05-20 3
5 2 2015-02-01 2015-03-01 4
Expected output:
ID MemberID StartDate EndDate GROUPID
4 1 2015-02-01 2015-02-28 1
1 2 2015-01-01 2015-02-28 1
2 2 2015-02-02 2015-02-03 2
5 2 2015-02-01 2015-03-01 3
3 2 2015-05-01 2015-05-20 1
Answer 0: (score: 1)
You have to use a combination of window functions to get what you want. Here is one way to do it:
SELECT ID, MemberID, StartDate, EndDate,
       1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp
                                ORDER BY EndDate) AS GroupID
FROM (
   SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
          ROW_NUMBER() OVER (PARTITION BY MemberID
                             ORDER BY EndDate)
          - SUM(bOverlaps) OVER (PARTITION BY MemberID
                                 ORDER BY EndDate) AS grp
   FROM (
      SELECT ID, MemberID, StartDate, EndDate,
             CASE
                WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID
                                                     ORDER BY EndDate)
                   THEN 1
                ELSE 0
             END AS bOverlaps
      FROM mytable) AS t ) AS u
Explanation:
Consider the innermost subquery first:
SELECT ID, MemberID, StartDate, EndDate,
       CASE
          WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID
                                               ORDER BY EndDate)
             THEN 1
          ELSE 0
       END AS bOverlaps
FROM mytable
Output:
ID MemberID StartDate EndDate bOverlaps
4 1 2015-02-01 2015-02-28 0
2 2 2015-02-02 2015-02-03 0
1 2 2015-01-01 2015-02-28 1
5 2 2015-02-01 2015-03-01 1
3 2 2015-05-01 2015-05-20 0
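The computed field bOverlaps is 1 (true) if the current row overlaps the previous row of the same MemberID partition, where rows are ordered by EndDate. For the first row of a partition, LAG(EndDate) is NULL, so the comparison fails and bOverlaps is 0.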
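As a rough illustration of what the LAG comparison does, here is a minimal Python sketch that computes the same flag for the question's sample rows. The helper name overlap_flags is made up for this example and is not part of the answer's SQL.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (ID, MemberID, StartDate, EndDate)
rows = [
    (1, 2, date(2015, 1, 1), date(2015, 2, 28)),
    (2, 2, date(2015, 2, 2), date(2015, 2, 3)),
    (3, 2, date(2015, 5, 1), date(2015, 5, 20)),
    (4, 1, date(2015, 2, 1), date(2015, 2, 28)),
    (5, 2, date(2015, 2, 1), date(2015, 3, 1)),
]

def overlap_flags(rows):
    """Return {ID: bOverlaps}. Each row is compared with the previous row
    of the same MemberID when rows are ordered by EndDate (the LAG step)."""
    flags = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))  # MemberID, EndDate
    for _, group in groupby(ordered, key=lambda r: r[1]):
        prev_end = None  # plays the role of LAG(EndDate) being NULL on the first row
        for row_id, _, start, end in group:
            flags[row_id] = 1 if prev_end is not None and start <= prev_end else 0
            prev_end = end
    return flags

print(overlap_flags(rows))
```

The flags agree with the bOverlaps column shown above: IDs 1 and 5 are marked 1, the others 0.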
The next-level subquery uses the derived table above to identify islands of consecutive overlapping records within the same MemberID partition.
This query:
SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
       SUM(bOverlaps) OVER (PARTITION BY MemberID
                            ORDER BY EndDate) AS GroupSeq,
       ROW_NUMBER() OVER (PARTITION BY MemberID
                          ORDER BY EndDate)
       - SUM(bOverlaps) OVER (PARTITION BY MemberID
                              ORDER BY EndDate) AS grp
FROM ( ... innermost derived table here ... )
produces the following output:
ID MemberID StartDate EndDate bOverlaps GroupSeq grp
4 1 2015-02-01 2015-02-28 0 0 1
2 2 2015-02-02 2015-02-03 0 0 1
1 2 2015-01-01 2015-02-28 1 1 1
5 2 2015-02-01 2015-03-01 1 2 1
3 2 2015-05-01 2015-05-20 0 2 2
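GroupSeq is simply a running total of bOverlaps and is used to compute grp. Together with MemberID, the grp values in the output above identify 3 separate islands:
Island no.  MemberID  IDs      grp value
1           1         4        1
2           2         2, 1, 5  1
3           2         3        2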
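The reason grp stays constant inside an island is that ROW_NUMBER minus the running count of overlapping rows equals the number of island starts seen so far. The following Python sketch reproduces that arithmetic on the bOverlaps values from the output above; island_keys is a hypothetical helper name used only here.

```python
from itertools import groupby

# (ID, MemberID, bOverlaps), already ordered by MemberID and EndDate,
# copied from the innermost subquery's output.
flagged = [(4, 1, 0), (2, 2, 0), (1, 2, 1), (5, 2, 1), (3, 2, 0)]

def island_keys(flagged):
    """Return {ID: (MemberID, grp)} with grp = ROW_NUMBER - running SUM(bOverlaps)."""
    keys = {}
    for member, group in groupby(flagged, key=lambda r: r[1]):
        running = 0  # running total of bOverlaps (GroupSeq)
        for row_number, (row_id, _, flag) in enumerate(group, start=1):
            running += flag
            keys[row_id] = (member, row_number - running)
    return keys

print(island_keys(flagged))
```

IDs 2, 1 and 5 all map to (2, 1), ID 3 maps to (2, 2), and ID 4 maps to (1, 1), matching the three islands listed above.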
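Finally, the outermost query uses the following expression:
1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp
                         ORDER BY EndDate) AS GroupID
to compute GroupID: by taking a running total once more, we can enumerate the rows that belong to the same island.
We could also use ROW_NUMBER for this purpose:
ROW_NUMBER() OVER (PARTITION BY MemberID, grp
                   ORDER BY EndDate) AS GroupID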
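To check the whole query end to end, one option is to run it against the sample data in an in-memory SQLite database from Python. This is only a sketch under assumptions: it assumes a SQLite build with window-function support (3.25 or later), stores the dates as ISO text, and adds a GroupID_rn column (a name invented here) so the two numbering variants can be compared side by side. The original answer targets SQL Server, not SQLite.

```python
import sqlite3

# Sample rows from the question: (ID, MemberID, StartDate, EndDate).
# ISO-formatted date strings compare correctly as text in SQLite.
rows = [
    (1, 2, "2015-01-01", "2015-02-28"),
    (2, 2, "2015-02-02", "2015-02-03"),
    (3, 2, "2015-05-01", "2015-05-20"),
    (4, 1, "2015-02-01", "2015-02-28"),
    (5, 2, "2015-02-01", "2015-03-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (ID INTEGER, MemberID INTEGER, StartDate TEXT, EndDate TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
SELECT ID, MemberID,
       1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp ORDER BY EndDate) AS GroupID,
       ROW_NUMBER() OVER (PARTITION BY MemberID, grp ORDER BY EndDate) AS GroupID_rn
FROM (
    SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
           ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY EndDate)
           - SUM(bOverlaps) OVER (PARTITION BY MemberID ORDER BY EndDate) AS grp
    FROM (
        SELECT ID, MemberID, StartDate, EndDate,
               CASE
                   WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID ORDER BY EndDate)
                   THEN 1
                   ELSE 0
               END AS bOverlaps
        FROM mytable) AS t) AS u
ORDER BY MemberID, EndDate
"""

# Each result row is (ID, MemberID, GroupID, GroupID_rn).
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On this data both columns agree for every row. Note that rows inside an island are numbered in EndDate order, so ID 2 receives 1 and ID 1 receives 2, whereas the expected output in the question appears to number them in ID order.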
Answer 1: (score: 1)
This query gives the correct output:
WITH ord AS (
    SELECT ID, MemberID
         , StartDate, EndDate
         , n = ROW_NUMBER() OVER (PARTITION BY [MemberID] ORDER BY [StartDate], [EndDate])
    FROM @data d1
), first AS (
    SELECT o1.ID, o1.MemberID
         , o1.n
    FROM ord o1
    INNER JOIN ord o2 ON o1.MemberID = o2.MemberID AND o2.n + 1 = o1.n AND o1.StartDate > o2.EndDate
), groups AS (
    SELECT o.ID, o.MemberID
         , p = ROW_NUMBER() OVER (PARTITION BY o.MemberID, MIN(COALESCE(f.n, 1)) ORDER BY o.ID)
    FROM ord o
    LEFT JOIN first f ON o.MemberID = f.MemberID AND o.n < f.n
    GROUP BY o.ID, o.MemberID
)
SELECT g.ID, g.MemberID, d.StartDate, d.EndDate, GROUPID = g.p
FROM groups g
INNER JOIN @data d ON g.ID = d.ID
Note that it should be tested against more data.
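When testing with more data, it can help to have an independent reference to compare island membership against. The Python sketch below is not either answer's method: it uses a different, commonly used technique (sort by StartDate and track the latest EndDate seen so far in the current island). The function name reference_islands and the extra rows for member 9 are invented for illustration. A range nested inside an earlier, longer range is one case that may be worth including, because a comparison against only the immediately preceding row does not see the longer range.

```python
from datetime import date
from itertools import groupby

def reference_islands(rows):
    """Group (ID, MemberID, StartDate, EndDate) rows into islands of
    overlapping ranges per member. Returns a list of (MemberID, [IDs])."""
    islands = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3]))  # MemberID, StartDate, EndDate
    for member, group in groupby(ordered, key=lambda r: r[1]):
        current, latest_end = [], None
        for row_id, _, start, end in group:
            # A row starting after everything seen so far opens a new island.
            if latest_end is not None and start > latest_end:
                islands.append((member, sorted(current)))
                current, latest_end = [], None
            current.append(row_id)
            latest_end = end if latest_end is None else max(latest_end, end)
        islands.append((member, sorted(current)))
    return islands

# Sample rows from the question.
sample = [
    (1, 2, date(2015, 1, 1), date(2015, 2, 28)),
    (2, 2, date(2015, 2, 2), date(2015, 2, 3)),
    (3, 2, date(2015, 5, 1), date(2015, 5, 20)),
    (4, 1, date(2015, 2, 1), date(2015, 2, 28)),
    (5, 2, date(2015, 2, 1), date(2015, 3, 1)),
]

# Hypothetical extra rows: two short ranges nested inside a longer one.
nested = [
    (10, 9, date(2015, 1, 1), date(2015, 1, 31)),
    (11, 9, date(2015, 1, 5), date(2015, 1, 6)),
    (12, 9, date(2015, 1, 10), date(2015, 1, 12)),
]

print(reference_islands(sample))
print(reference_islands(nested))
```

For the question's data this yields the same three islands as above. For the nested rows it keeps IDs 10, 11 and 12 in a single island, which gives a baseline to compare a query's grouping against.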
Output:
ID MemberID StartDate EndDate GROUPID
4 1 2015-02-01 2015-02-28 1
3 2 2015-05-01 2015-05-20 1
1 2 2015-01-01 2015-02-28 1
2 2 2015-02-02 2015-02-03 2
5 2 2015-02-01 2015-03-01 3
Your data:
declare @data table([ID] int, [MemberID] int, [StartDate] date, [EndDate] date);
Insert into @data([ID], [MemberID], [StartDate], [EndDate])
VALUES
(1, 2, '2015-01-01', '2015-02-28'),
(2, 2, '2015-02-02', '2015-02-03'),
(3, 2, '2015-05-01', '2015-05-20'),
(4, 1, '2015-02-01', '2015-02-28'),
(5, 2, '2015-02-01', '2015-03-01')
;