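Is there a better way to merge overlapping date intervals?
The solution I came up with is dead simple, and now I'm wondering whether anyone else has a better idea of how to do this.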
/***** DATA EXAMPLE *****/
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME)
INSERT INTO @T (d1, d2)
SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13'
/***** INTERVAL ANALYSIS *****/
WHILE (1=1) BEGIN
UPDATE t1 SET t1.d2 = t2.d2
FROM @T AS t1 INNER JOIN @T AS t2 ON
DATEADD(day, 1, t1.d2) BETWEEN t2.d1 AND t2.d2
IF @@ROWCOUNT = 0 BREAK
END
/***** RESULT *****/
SELECT StartDate = MIN(d1) , EndDate = d2
FROM @T
GROUP BY d2
ORDER BY StartDate, EndDate
/***** OUTPUT *****/
/*****
StartDate EndDate
2010-01-01 2010-06-13
2010-06-15 2010-08-16
2010-11-01 2010-12-31
*****/
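Answer 0 (score: 18)
I was looking for the same solution and came across this post on Combine overlapping datetime to return single overlapping range record.
There is another thread on Packing Date Intervals.
I tested this with various date ranges, including the ones listed here, and it works correctly every time.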
-- @T is the table variable from the question; its d1/d2 columns are aliased to StartDate/EndDate.
SELECT
s1.d1 AS StartDate,
MIN(t1.d2) AS EndDate
FROM @T s1
INNER JOIN @T t1 ON s1.d1 <= t1.d2
-- keep only end dates that no other interval runs past
AND NOT EXISTS(SELECT * FROM @T t2
WHERE t1.d2 >= t2.d1 AND t1.d2 < t2.d2)
-- keep only start dates that do not fall inside another interval
WHERE NOT EXISTS(SELECT * FROM @T s2
WHERE s1.d1 > s2.d1 AND s1.d1 <= s2.d2)
GROUP BY s1.d1
ORDER BY s1.d1
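The results are shown below. Note that the adjacent ranges ending 2010-06-25 and starting 2010-06-26 are not merged here, unlike in the question's expected output: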
StartDate | EndDate
2010-01-01 | 2010-06-13
2010-06-15 | 2010-06-25
2010-06-26 | 2010-08-16
2010-11-01 | 2010-12-31
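If ranges on adjacent days should also be merged, as in the question's expected output, one possible adjustment is to widen both NOT EXISTS checks by one day. The following is a minimal sketch of that idea, assuming whole-day DATETIME values and the question's @T with columns d1 and d2. It is a variation on the query above, not something taken from the linked posts.

```sql
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME);
INSERT INTO @T (d1, d2)
SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13';

SELECT s1.d1 AS StartDate,
       MIN(t1.d2) AS EndDate
FROM @T s1
INNER JOIN @T t1
    ON s1.d1 <= t1.d2
   -- drop an end date if another interval covers it or starts on the next day
   AND NOT EXISTS (SELECT * FROM @T t2
                   WHERE t1.d2 >= DATEADD(DAY, -1, t2.d1) AND t1.d2 < t2.d2)
-- drop a start date if another interval covers it or ends on the previous day
WHERE NOT EXISTS (SELECT * FROM @T s2
                  WHERE s1.d1 > s2.d1 AND s1.d1 <= DATEADD(DAY, 1, s2.d2))
GROUP BY s1.d1
ORDER BY s1.d1;
```

On the question's sample data this should give the three ranges from the question (2010-01-01 to 2010-06-13, 2010-06-15 to 2010-08-16, 2010-11-01 to 2010-12-31).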
Answer 1 (score: 6)
You asked this in 2010 but didn't specify any particular version.
An answer for people on SQL Server 2012+:
WITH T1
AS (SELECT *,
MAX(d2) OVER (ORDER BY d1) AS max_d2_so_far
FROM @T),
T2
AS (SELECT *,
CASE
WHEN d1 <= DATEADD(DAY, 1, LAG(max_d2_so_far) OVER (ORDER BY d1))
THEN 0
ELSE 1
END AS range_start
FROM T1),
T3
AS (SELECT *,
SUM(range_start) OVER (ORDER BY d1) AS range_group
FROM T2)
SELECT range_group,
MIN(d1) AS d1,
MAX(d2) AS d2
FROM T3
GROUP BY range_group
Returns
+-------------+------------+------------+
| range_group | d1 | d2 |
+-------------+------------+------------+
| 1 | 2010-01-01 | 2010-06-13 |
| 2 | 2010-06-15 | 2010-08-16 |
| 3 | 2010-11-01 | 2010-12-31 |
+-------------+------------+------------+
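DATEADD(DAY, 1, ...) is used because your desired results show that you want a period ending on 2010-06-25 to be collapsed into one starting on 2010-06-26. For other use cases this may need adjusting.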
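As one example of such an adjustment: if ranges that merely touch on consecutive days should stay separate, the one-day allowance can be dropped so that only true overlaps are merged. This is a sketch under that assumption, reusing the question's @T. Only the comparison inside the CASE differs from the query above.

```sql
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME);
INSERT INTO @T (d1, d2)
SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13';

WITH T1 AS (
    -- latest end date seen so far, in start-date order
    SELECT *, MAX(d2) OVER (ORDER BY d1) AS max_d2_so_far
    FROM @T
),
T2 AS (
    -- a row opens a new range unless it starts on or before the previous running end
    SELECT *,
           CASE WHEN d1 <= LAG(max_d2_so_far) OVER (ORDER BY d1) THEN 0 ELSE 1 END AS range_start
    FROM T1
),
T3 AS (
    -- running count of range starts gives each row its group number
    SELECT *, SUM(range_start) OVER (ORDER BY d1) AS range_group
    FROM T2
)
SELECT range_group, MIN(d1) AS d1, MAX(d2) AS d2
FROM T3
GROUP BY range_group
ORDER BY range_group;
```

With the sample data this variant should return four groups instead of three, because 2010-06-15 to 2010-06-25 and 2010-06-26 to 2010-08-16 no longer collapse into one.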
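Answer 2 (score: 1)
Here is a solution that needs just three simple scans. No CTEs, no recursion, no joins, no table updates in a loop, no GROUP BY. As a result, this solution should scale best (I think). I believe the number of scans can be reduced to two if the minimum and maximum dates are known in advance. The logic itself needs only two scans: find the gaps, applied twice.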
declare @datefrom datetime, @datethru datetime
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME)
INSERT INTO @T (d1, d2)
SELECT '2010-01-01','2010-03-31'
UNION SELECT '2010-03-01','2010-06-13'
UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25'
UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05'
UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07'
UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12'
UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31'
select @datefrom = min(d1) - 1, @datethru = max(d2) + 1 from @t
SELECT
StartDate, EndDate
FROM
(
SELECT
MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
LEAD(StartDate ) OVER (ORDER BY StartDate) - 1 EndDate
FROM
(
SELECT
StartDate, EndDate
FROM
(
SELECT
MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
LEAD(StartDate) OVER (ORDER BY StartDate) - 1 EndDate
FROM
(
SELECT d1 StartDate, d2 EndDate from @T
UNION ALL
SELECT @datefrom StartDate, @datefrom EndDate
UNION ALL
SELECT @datethru StartDate, @datethru EndDate
) T
) T
WHERE StartDate <= EndDate
UNION ALL
SELECT @datefrom StartDate, @datefrom EndDate
UNION ALL
SELECT @datethru StartDate, @datethru EndDate
) T
) T
WHERE StartDate <= EndDate
The result is:
StartDate EndDate
2010-01-01 2010-06-13
2010-06-15 2010-08-16
2010-11-01 2010-12-31
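To see why applying the gap-finding step twice works, it can help to run the first pass on its own. It returns the gaps between the merged ranges; the gaps between those gaps, bounded by the two sentinel rows, are then the merged ranges themselves. The sketch below isolates that first pass on the same data. The derived-table aliases padded and gaps are names chosen here for illustration, and LEAD requires SQL Server 2012 or later.

```sql
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME);
INSERT INTO @T (d1, d2)
SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13';

-- sentinel dates one day outside the data on either side
DECLARE @datefrom DATETIME, @datethru DATETIME;
SELECT @datefrom = MIN(d1) - 1, @datethru = MAX(d2) + 1 FROM @T;

SELECT StartDate, EndDate
FROM (
    -- candidate gap: day after the running maximum end, up to the day before the next start
    SELECT MAX(EndDate) OVER (ORDER BY StartDate) + 1 AS StartDate,
           LEAD(StartDate) OVER (ORDER BY StartDate) - 1 AS EndDate
    FROM (
        SELECT d1 AS StartDate, d2 AS EndDate FROM @T
        UNION ALL SELECT @datefrom, @datefrom
        UNION ALL SELECT @datethru, @datethru
    ) AS padded
) AS gaps
-- empty candidates (start after end, or NULL end on the last row) are discarded
WHERE StartDate <= EndDate
ORDER BY StartDate;
```

For the sample data the first pass should yield two gaps, 2010-06-14 to 2010-06-14 and 2010-08-17 to 2010-10-31, which sit exactly between the three ranges in the final result.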
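Answer 3 (score: 0)
In this solution, I created a temporary Calendar table that stores a value for every day across a range. This type of table can be made static. In addition, I'm only storing some 400-odd dates starting with 2009-12-31. Obviously, if your dates span a larger range, you would need more values.
In addition, this solution only works on SQL Server 2005+ because I'm using a CTE.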
With Calendar As
(
Select DateAdd(d, ROW_NUMBER() OVER ( ORDER BY s1.object_id ), '1900-01-01') As [Date]
From sys.columns as s1
Cross Join sys.columns as s2
)
, StopDates As
(
Select C.[Date]
From Calendar As C
Left Join @T As T
On C.[Date] Between T.d1 And T.d2
Where C.[Date] >= ( Select Min(T2.d1) From @T As T2 )
And C.[Date] <= ( Select Max(T2.d2) From @T As T2 )
And T.d1 Is Null
)
, StopDatesInUse As
(
Select D1.[Date]
From StopDates As D1
Left Join StopDates As D2
On D1.[Date] = DateAdd(d,1,D2.Date)
Where D2.[Date] Is Null
)
, DataWithEariestStopDate As
(
Select *
, (Select Min(SD2.[Date])
From StopDatesInUse As SD2
Where T.d2 < SD2.[Date] ) As StopDate
From @T As T
)
Select Min(d1), Max(d2)
From DataWithEariestStopDate
Group By StopDate
Order By Min(d1)
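EDIT: The problem with using dates in 2009 has nothing to do with the final query. The problem is that the Calendar table was not big enough. I was starting the Calendar table at 2009-12-31; I have revised it to start at 1900-01-01.
Answer 4 (score: 0)
Try this: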
;WITH T1 AS
(
SELECT d1, d2, ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
FROM @T
), NUMS AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
FROM T1 A
CROSS JOIN T1 B
CROSS JOIN T1 C
), ONERANGE AS
(
SELECT DISTINCT DATEADD(DAY, ROW_NUMBER() OVER(PARTITION BY T1.R ORDER BY (SELECT 0)) - 1, T1.D1) AS ELEMENT
FROM T1
CROSS JOIN NUMS
WHERE NUMS.R <= DATEDIFF(DAY, d1, d2) + 1
), SEQUENCE AS
(
SELECT ELEMENT, DATEDIFF(DAY, '19000101', ELEMENT) - ROW_NUMBER() OVER(ORDER BY ELEMENT) AS rownum
FROM ONERANGE
)
SELECT MIN(ELEMENT) AS StartDate, MAX(ELEMENT) as EndDate
FROM SEQUENCE
GROUP BY rownum
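The basic idea is to first unroll the existing data so that you get a separate row for each day. This is done in ONERANGE. Then, look at the relationship between how the dates increment and how the row numbers increment. The difference stays constant within an existing range/island. As soon as you reach a new island of data, the difference between them increases, because the date jumps by more than 1 while the row number increases by only 1.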
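The grouping trick in SEQUENCE can be seen in isolation on a handful of dates. This is a small sketch with made-up values; the table variable @Days and the column grp are hypothetical names used only here, and the VALUES list and DATE type need SQL Server 2008 or later.

```sql
DECLARE @Days TABLE (d DATE PRIMARY KEY);
INSERT INTO @Days (d)
VALUES ('2010-01-01'), ('2010-01-02'), ('2010-01-03'),  -- first island
       ('2010-01-07'), ('2010-01-08');                  -- second island

SELECT MIN(d) AS StartDate, MAX(d) AS EndDate
FROM (
    -- day number minus row number is constant while the dates are consecutive
    SELECT d,
           DATEDIFF(DAY, '19000101', d) - ROW_NUMBER() OVER (ORDER BY d) AS grp
    FROM @Days
) AS x
GROUP BY grp
ORDER BY StartDate;
```

This should return two rows, 2010-01-01 to 2010-01-03 and 2010-01-07 to 2010-01-08: the first three dates share one grp value, and the jump from 01-03 to 01-07 shifts the difference by three for the remaining rows.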
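Answer 5 (score: 0)
The idea is to simulate the sweep-line algorithm for merging intervals. My solution is written so that it works across a wide range of SQL implementations. I've tested it on MySQL, Postgres, SQL Server 2017, SQLite and even Hive.
Assume the table schema is as follows.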
CREATE TABLE t (
a DATETIME,
b DATETIME
);
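We also assume the intervals are half-open, like [a,b).
When (a, i, j) is in the view, it means that j intervals cover the point a, and i intervals cover the previous point.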
CREATE VIEW r AS
SELECT a,
Sum(d) OVER (ORDER BY a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i,
Sum(d) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) AS j
FROM (SELECT a, Sum(d) AS d
FROM (SELECT a, 1 AS d FROM t
UNION ALL
SELECT b, -1 AS d FROM t) e
GROUP BY a) f;
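We produce all the endpoints of the union of the intervals and pair up adjacent ones. Finally, we produce the set of merged intervals by selecting only the odd-numbered rows.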
SELECT a, b
FROM (SELECT a,
Lead(a) OVER (ORDER BY a) AS b,
Row_number() OVER (ORDER BY a) AS n
FROM r
WHERE j=0 OR i=0 OR i is null) e
WHERE n%2 = 1;
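To run the same logic against the question's data in SQL Server, one option is to express the view and the final query as CTEs, since a view cannot reference a table variable. The sketch below makes two assumptions that are not part of the answer: the question's closed day ranges [d1, d2] are treated as half-open [d1, d2 + 1 day), and one day is subtracted from the resulting end again. The CTE names e, f, r and p are chosen here; ROWS frames need SQL Server 2012 or later.

```sql
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME);
INSERT INTO @T (d1, d2)
SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13';

WITH e AS (
    -- +1 where an interval opens, -1 on the day after it closes (half-open end)
    SELECT d1 AS a, 1 AS d FROM @T
    UNION ALL
    SELECT DATEADD(DAY, 1, d2), -1 FROM @T
),
f AS (
    -- net change in the number of open intervals at each distinct point
    SELECT a, SUM(d) AS d FROM e GROUP BY a
),
r AS (
    -- i: intervals open just before point a; j: intervals open at point a
    SELECT a,
           SUM(d) OVER (ORDER BY a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i,
           SUM(d) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) AS j
    FROM f
),
p AS (
    -- keep only points where the union starts (i = 0 or NULL) or ends (j = 0)
    SELECT a,
           LEAD(a) OVER (ORDER BY a) AS b,
           ROW_NUMBER() OVER (ORDER BY a) AS n
    FROM r
    WHERE j = 0 OR i = 0 OR i IS NULL
)
SELECT a AS StartDate,
       DATEADD(DAY, -1, b) AS EndDate   -- convert the half-open end back to a closed end
FROM p
WHERE n % 2 = 1
ORDER BY a;
```

On the sample data the surviving points should be 2010-01-01, 2010-06-14, 2010-06-15, 2010-08-17, 2010-11-01 and 2011-01-01, which pair up into the three ranges from the question once the extra day is removed.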
I have created a sample DB-fiddle and SQL-fiddle. I also wrote a blog post on union intervals in SQL.
Answer 6 (score: 0)
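Both here and elsewhere, I've noticed that date-packing questions don't offer a geometric approach to this problem. After all, any range, date ranges included, can be interpreted as a line. So why not convert them to a SQL geometry type and use geometry::UnionAggregate to merge the ranges?
This has the advantage of handling all types of overlaps, including fully nested ranges. It also works like any other aggregate query, so it's a little more intuitive in that respect. If you care to use it, you also get the bonus of a visual representation of the results. Finally, it is the approach I used for simultaneous range packing (in that case you work with rectangles instead of lines, and there are many more considerations). I just couldn't get the existing approaches to work in that scenario.
This has the disadvantage of requiring a more recent version of SQL Server. It also requires a numbers table, and it's annoying to extract the individually produced lines from the aggregated shape. But hopefully Microsoft will add a TVF in the future that lets you do this easily without a numbers table (or you can build one yourself). Also, geometry objects work with floats, so you have conversion annoyances and precision concerns to keep in mind.
As for performance, I don't know how it compares, but I've done some things (not shown here) that make it work for me even with large data sets.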
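In "numbers": a sequence 1..N is generated from the rows of @t. A union can never produce more lines than there are input rows, so this is enough to index the merged result.
In "mergeLines": each row's d1 and d2 + 1 are converted to floats and turned into two points at y = 0, whose envelope spans the range; geometry::UnionAggregate then merges all of these. The + 1 makes ranges on adjacent days touch.
In the outer query: each merged piece is extracted with STGeometryN, and the x-coordinates of its envelope's corner points are converted back to datetimes, subtracting the 1 that was added to the end.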
with
numbers as (
select row_number() over (order by (select null)) i
from @t
),
mergeLines as (
select lines = geometry::UnionAggregate(line)
from @t
cross apply (select line =
geometry::Point(convert(float, d1), 0, 0).STUnion(
geometry::Point(convert(float, d2) + 1, 0, 0)
).STEnvelope()
) l
)
select ap.StartDate,
ap.EndDate
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
StartDate = convert(datetime,l.line.STPointN(1).STX),
EndDate = convert(datetime,l.line.STPointN(3).STX) - 1
) ap
order by ap.StartDate;