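I have a table that contains a person ID and a date range (a start date and a stop date). Each person may have multiple rows with multiple start and end dates.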
create table #DateRanges (
tableID int not null,
personID int not null,
startDate date,
endDate date
);
insert #DateRanges (tableID, personID, startDate, endDate)
values (1, 100, '2011-01-01', '2011-01-31') -- Just January
, (2, 100, '2011-02-01', '2011-02-28') -- Just February
, (3, 100, '2011-04-01', '2011-04-30') -- April - Skipped March
, (4, 100, '2011-05-01', '2011-05-31') -- May
, (5, 100, '2011-06-01', '2011-12-31') -- June through December
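I need a way to collapse adjacent date ranges (where the previous row's end date is exactly one day before the next row's start date). It must include all contiguous ranges, splitting only when the end-to-start gap is greater than one day. The above data needs to be compressed to: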
+-----------+----------+--------------+------------+
| SomeNewID | PersonID | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
| 1 | 100 | 2011-01-01 | 2011-02-28 |
+-----------+----------+--------------+------------+
| 2 | 100 | 2011-04-01 | 2011-12-31 |
+-----------+----------+--------------+------------+
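Only two rows, because the only missing range is March. If all of March were present, whether as one row or many, the compression would produce just one row. But if only two days in the middle of March were present, we would get a third row showing the March dates.
I have been working with the LEAD and LAG functions in SQL 2016 to try to do this as a recordset operation, but so far I have come up empty. I would like to do it without a loop and RBAR, but I am not seeing a solution.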
Answer 0 (score: 0)
You can use LAG to find the right buckets and then group by them, as below:
;with cte1 as (
select *, dtdiff = datediff(day, lag(endDate) over (partition by personID order by startDate), startDate) -- Days from the previous range's end to this range's start (null on each person's first row)
from #DateRanges
),
cte2 as (
select *, grp = sum(case when dtdiff is null or dtdiff > 1 then 1 else 0 end) over (partition by personID order by startDate) -- Start a new bucket whenever the gap exceeds one day
from cte1
)
select SomeNewId = Row_Number() over (order by Personid, min(startDate)), Personid, NewStartDate = min(startDate), NewEndDate = max(endDate) -- Min/max per bucket
from cte2 group by Personid, grp
Output for your data:
+-----------+----------+--------------+------------+
| SomeNewId | Personid | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
| 1 | 100 | 2011-01-01 | 2011-02-28 |
| 2 | 100 | 2011-04-01 | 2011-12-31 |
+-----------+----------+--------------+------------+
My test input:
insert #DateRanges (tableID, personID, startDate, endDate)
values (1, 100, '2011-01-01', '2011-01-31') -- Just January
, (2, 100, '2011-02-01', '2011-02-28') -- Just February
, (3, 100, '2011-04-01', '2011-04-30') -- April - Skipped March
, (4, 100, '2011-05-01', '2011-05-31') -- May
, (5, 100, '2011-06-01', '2011-06-30') -- June
, (6, 100, '2011-07-01', '2011-07-31') -- July
, (7, 100, '2011-08-01', '2011-08-31') -- August
, (8, 100, '2011-10-01', '2011-10-31') -- October - Skipped September
, (9, 100, '2011-11-01', '2011-11-30') -- November
Output for the test data:
+-----------+----------+--------------+------------+
| SomeNewId | Personid | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
| 1 | 100 | 2011-01-01 | 2011-02-28 |
| 2 | 100 | 2011-04-01 | 2011-08-31 |
| 3 | 100 | 2011-10-01 | 2011-11-30 |
+-----------+----------+--------------+------------+
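The question also raises the case where only two days in mid-March are present, and the table can hold more than one person. A minimal sketch for checking both cases follows. It assumes a hypothetical temp table #MarchTest (not part of the original post), ranges that do not overlap within a person, and a gap measured from the previous row's endDate to the current row's startDate:

```sql
-- Hypothetical test table; the name and the rows are illustrative only.
drop table if exists #MarchTest;
create table #MarchTest (
    personID int not null,
    startDate date not null,
    endDate date not null
);

insert #MarchTest (personID, startDate, endDate)
values (100, '2011-01-01', '2011-01-31')
     , (100, '2011-02-01', '2011-02-28')
     , (100, '2011-03-15', '2011-03-16')  -- only two days in mid-March
     , (100, '2011-04-01', '2011-04-30')
     , (200, '2011-01-15', '2011-02-14')  -- second person, two adjacent ranges
     , (200, '2011-02-15', '2011-03-31');

with gaps as (
    select personID, startDate, endDate,
           -- Days from the previous range's end to this range's start;
           -- NULL on each person's first row.
           gapDays = datediff(day,
                              lag(endDate) over (partition by personID order by startDate),
                              startDate)
    from #MarchTest
),
islands as (
    select personID, startDate, endDate,
           -- Running count of "island starts" per person gives the group number.
           grp = sum(case when gapDays is null or gapDays > 1 then 1 else 0 end)
                 over (partition by personID order by startDate rows unbounded preceding)
    from gaps
)
select SomeNewID = row_number() over (order by personID, min(startDate)),
       personID,
       NewStartDate = min(startDate),
       NewEndDate = max(endDate)
from islands
group by personID, grp
order by personID, NewStartDate;
```

With these rows, person 100 should come back as three ranges (January through February, the two March days, and April) and person 200 as a single range, because the running sum is partitioned by personID.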
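Answer 1 (score: 0)
After a few days of working on this, I think I have a solution that I want to share in case anyone else needs something similar. I used several CTEs to find the lead, lag, and gap times, reduced the rows to only the significant start and stop dates, and then used more lead and lag to find the collapsed start and stop dates. There may be a simpler way, but I think this handles the day-level resolution nicely.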
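The approach described here (flag the rows that begin or end a contiguous run, keep only those rows, then pair each start with the following end) could be sketched roughly as below. This is not the answerer's original query. The CTE names marked and boundaries and the columns isStart, isEnd, and nextEndDate are hypothetical. The sketch also assumes that the #DateRanges table from the question is already populated and that a person's ranges never overlap.

```sql
;with marked as (
    select personID, startDate, endDate,
           -- 1 when this row begins a run: no previous row, or a gap of more than one day.
           isStart = case when datediff(day,
                                        lag(endDate) over (partition by personID order by startDate),
                                        startDate) <= 1
                          then 0 else 1 end,
           -- 1 when this row ends a run: no next row, or a gap of more than one day.
           isEnd = case when datediff(day,
                                      endDate,
                                      lead(startDate) over (partition by personID order by startDate)) <= 1
                        then 0 else 1 end
    from #DateRanges
),
boundaries as (
    -- Keep only rows that start or end a run; interior rows drop out, so the
    -- row after a start-only row is the row that ends the same run.
    select personID, startDate, endDate, isStart, isEnd,
           nextEndDate = lead(endDate) over (partition by personID order by startDate)
    from marked
    where isStart = 1 or isEnd = 1
)
select SomeNewID = row_number() over (order by personID, startDate),
       personID,
       NewStartDate = startDate,
       -- A single-row run ends on its own endDate; otherwise take the next boundary row's endDate.
       NewEndDate = case when isEnd = 1 then endDate else nextEndDate end
from boundaries
where isStart = 1;
```

For the question's sample data this should return the same two rows as the expected output. The comparison `<= 1` evaluates to unknown when LAG or LEAD returns NULL, so the first and last rows of each person fall through to the else branch and are flagged as boundaries.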