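I have a problem where I need to compare dates across several rows. The requirement is that the data be grouped by Region/Area, combining the lowest StartDate with the highest EndDate, unless there is a gap of more than 1 day between the previous EndDate and the next StartDate.
StartDate will always be the first day of a month, and EndDate will always be the last day of a month.
Given a simplified table: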
Region | Area | StartDate | EndDate
-------|------|---------------|-------------
A | 1 | 01/01/2016 | 03/31/2016
A | 1 | 04/01/2016 | 05/31/2016
A | 1 | 07/01/2016 | 09/30/2016
A | 1 | 10/01/2016 | 01/31/2017
A | 1 | 02/01/2017 | 12/31/2017
B | 2 | 01/01/2016 | 04/30/2016
B | 2 | 05/01/2016 | 09/30/2016
A | 4 | 01/01/2016 | 05/31/2016
A | 4 | 06/01/2016 | 12/31/2016
I need the result to look like this:
Region | Area | StartDate | EndDate
-------|------|--------------|-----------
A | 1 | 01/01/2016 | 05/31/2016
A | 1 | 07/01/2016 | 12/31/2017
B | 2 | 01/01/2016 | 09/30/2016
A | 4 | 01/01/2016 | 12/31/2016
I have tried using GROUP BY with MIN and MAX on the dates, but I can't seem to figure out the logic.
Any ideas or suggestions are greatly appreciated.
Answer 0 (score: 2)
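This looks like a data islands (gaps and islands) problem. You can solve it with the window functions introduced in SQL Server 2012. With the LAG window function, you can determine whether the gap between the previous record's end date and the current record's start date is greater than one day. Next, you can use a SUM OVER clause to turn those flags into a running total, which gives each data island its own grouping ID.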
DECLARE @SourceData TABLE
(
Region NVARCHAR(10)
,Area INT
,StartDate DATETIME
,EndDate DATETIME
);
INSERT INTO @SourceData
VALUES
('A', 1, '01/01/2016', '03/31/2016'),
('A', 1, '04/01/2016', '05/31/2016'),
('A', 1, '07/01/2016', '09/30/2016'),
('A', 1, '10/01/2016', '01/31/2017'),
('A', 1, '02/01/2017', '12/31/2017'),
('B', 2, '01/01/2016', '04/30/2016'),
('B', 2, '05/01/2016', '09/30/2016'),
('A', 4, '01/01/2016', '05/31/2016'),
('A', 4, '06/01/2016', '12/31/2016');
;WITH CTE_DataIslands -- First CTE: flag the row where each new data island starts
AS
(
SELECT Region
,Area
,StartDate
,EndDate
,(
CASE
WHEN DATEADD(DAY, 1, LAG(EndDate, 1) OVER (PARTITION BY Region, Area ORDER BY StartDate ASC)) < (StartDate) THEN 1 -- If the previous record's end date + 1 day is earlier than the current record's start date, there is a gap, so this row starts a new data island. LAG returns NULL for the first row of a partition, so that row falls through to ELSE 0.
ELSE 0
END
) AS [IsNewDataIsland]
FROM @SourceData
)
, CTE_GenerateGroupingID
AS
(
SELECT Region
,Area
,StartDate
,EndDate
,SUM([IsNewDataIsland]) OVER (PARTITION BY Region, Area ORDER BY StartDate ASC ROWS UNBOUNDED PRECEDING) AS GroupingID -- Running total of IsNewDataIsland; rows in the same island share the same value, so we can group on it
FROM CTE_DataIslands
)
SELECT Region
,Area
,MIN(StartDate) AS StartDate
,MAX(EndDate) AS EndDate
FROM CTE_GenerateGroupingID
GROUP BY Region, Area, GroupingID
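To see how the flag and the running total combine into a grouping ID, it can help to look at the intermediate values for a single Region/Area before aggregating. The following is a minimal sketch, not part of the original answer: the @Demo table variable, the PrevEndDate column, and the Flagged alias are names made up for illustration. It uses only the Region A / Area 1 rows from the question, with the DATE type and 'YYYYMMDD' literals so that parsing does not depend on session date settings.

```sql
-- Hypothetical demo table holding only the Region A / Area 1 rows from the question.
DECLARE @Demo TABLE
(
    StartDate DATE
   ,EndDate   DATE
);

INSERT INTO @Demo
VALUES
    ('20160101', '20160331'),
    ('20160401', '20160531'),
    ('20160701', '20160930'),
    ('20161001', '20170131'),
    ('20170201', '20171231');

SELECT StartDate
      ,EndDate
      ,PrevEndDate
      ,IsNewDataIsland
      -- Running total of the flags: every row of one island gets the same value.
      ,SUM(IsNewDataIsland) OVER (ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS GroupingID
FROM
(
    -- A window function cannot be nested inside another one,
    -- so the flag is computed in a derived table first.
    SELECT StartDate
          ,EndDate
          ,LAG(EndDate) OVER (ORDER BY StartDate) AS PrevEndDate
          ,CASE
               WHEN DATEADD(DAY, 1, LAG(EndDate) OVER (ORDER BY StartDate)) < StartDate THEN 1
               ELSE 0
           END AS IsNewDataIsland
    FROM @Demo
) AS Flagged
ORDER BY StartDate;

-- Expected rows:
-- StartDate   EndDate     PrevEndDate  IsNewDataIsland  GroupingID
-- 2016-01-01  2016-03-31  NULL         0                0
-- 2016-04-01  2016-05-31  2016-03-31   0                0
-- 2016-07-01  2016-09-30  2016-05-31   1                1
-- 2016-10-01  2017-01-31  2016-09-30   0                1
-- 2017-02-01  2017-12-31  2017-01-31   0                1
```

Grouping these rows by GroupingID and taking MIN(StartDate) and MAX(EndDate) gives the two A / 1 rows in the desired result: 01/01/2016 to 05/31/2016 and 07/01/2016 to 12/31/2017.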