对于一个scd类型2维度,我有一个问题是识别和修复一些具有重叠时间间隔的记录。 我所拥有的是:
Bkey Uid startDate endDate
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp)
'John' 2 2016-11=03 (some time stamp) 2016-11-14 (some time stamp)
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp)
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp)
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp)
......
我想找到(第一个)所有约翰都有重叠时间段的表,对于一个有很多很多约翰的表,然后找出纠正那些重叠时间段的方法。对于最新的我知道有一些功能LAGG,LEAD,它可以处理,但它让我不知道如何找到那些过度。 任何提示? 的问候,
答案 0 :(得分:1)
[1]以下查询将返回重叠的时间范围:
SELECT *,
(
SELECT *
FROM @Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM @Dimension1 x
完整脚本:
DECLARE @Dimension1 TABLE (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDate DATE NOT NULL,
endDate DATE NOT NULL,
CHECK(startDate < endDate)
);
INSERT @Dimension1
SELECT 'John', 1, '1990-01-01', '2017-01-10' UNION ALL
SELECT 'John', 2, '2016-11-03', '2016-11-14' UNION ALL
SELECT 'John', 3, '2016-11-14', '2016-12-29' UNION ALL
SELECT 'John', 4, '2016-12-29', '2017-01-10' UNION ALL
SELECT 'John', 5, '2017-01-11', '2017-04-22';
SELECT *,
(
SELECT *
FROM @Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM @Dimension1 x
<强> Demo here 强>
[2]为了找到具有重叠原始行的不同时间范围组,我将使用以下方法:
-- Edit 1
DECLARE @Groups TABLE (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDateNew DATE NOT NULL,
endDateNew DATE NOT NULL,
CHECK(startDateNew < endDateNew)
);
INSERT @Groups
SELECT x.Bkey, x.Uid, z.startDateNew, z.endDateNew
FROM @Dimension1 x
OUTER APPLY (
SELECT MIN(y.startDate) AS startDateNew, MAX(y.endDate) AS endDateNew
FROM @Dimension1 y
WHERE x.Bkey = y.Bkey
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
) z
-- End of Edit 1
-- This returns distinct groups identified by DistinctGroupId together with all overlapping Uid(s) from current group
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM @Groups a
) b
) c
OUTER APPLY (
SELECT d.Uid AS Overlapping_Uid
FROM @Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
) e
-- This returns distinct groups identified by DistinctGroupId together with an XML (XmlCol) which includes overlapping Uid(s)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM @Groups a
) b
) c
OUTER APPLY (
SELECT (
SELECT d.Uid AS Overlapping_Uid
FROM @Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
FOR XML RAW, TYPE
) AS XmlCol
) e
注意:我的示例中使用的最后一个范围是'John', 5, '2017-01-11', '2017-04-22';
而不是'John', 5, '2017-01-10', '2017-04-22';
。此外,使用的数据类型为DATE
而非DATETIME[2][OFFSET]
。
答案 1 :(得分:0)
我认为您的查询中棘手的部分是能够清楚地表达重叠范围的逻辑。我们可以自己加入,条件是左边的一行与右边的任何一行重叠。所有匹配的行都是重叠的行。
我们可以考虑四种可能的重叠场景:
|---------| |---------| no overlap
|---------|
|---------| 1st end and 2nd start overlap
|---------|
|---------| 1st start and 2nd end overlap
|---------|
|---| 2nd completely contained inside 1st
(could be 1st inside 2nd also)
SELECT DISTINCT
t.Uid
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.startDate <= t2.endDate AND
t2.startDate <= t1.endDate
WHERE
t1.Bkey = 'John' AND t2.Bkey = 'John'
这至少可以让您识别重叠记录。以有意义的方式更新和分离它们可能最终会成为一个丑陋的差距和岛屿问题,或许会引起另一个问题。