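The environment is SQL Server 2014.
I am reducing many insurance enrollment details (small ranges given by a first and a last value) to much larger, mutually exclusive (ME), contiguous enrollment ranges.
For clarity, the problem is reduced to sample data sorted by id, first, last. F(n) and L(n) are the first and last values of record n within an id.
Most detail ranges are typical, but the devil is in the details: welcome to real-world data.
This picture shows most of the cases.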
Have
    1       30       60       90       120
    +-------+--------+--------+--------+
1   +-------+                             (1:30)
2            +-------+                    (31:60) adjacent
3              +--+                       (40:50) embedded
4                     +                   (61:61) adjacent some earlier
5                     +-+                 (61:65) adjacent some earlier
6                     +--+                (61:75) adjacent some earlier
7                       +--+              (65:80) overlap
8                            +---------+  (85:120) gap, boundaries of ME ranges located
9                              +-------+  (91:120)
10                                  +--+  (110:120)
Want
    1       30       60       90       120
    +-------+--------+--------+--------+
1   +----------------------+              (1:80)
2                            +---------+  (85:120)
There are other unusual cases, such as embed followed by gap
.....
 ..
       ....
AAAAA  BBBB
IF OBJECT_ID('tempdb..#Details') IS NOT NULL DROP TABLE #Details;
CREATE TABLE #Details (id int, first int, last int);
insert into #Details values (1, 1, 30);
insert into #Details values (1, 31, 60);
insert into #Details values (1, 40, 50);
insert into #Details values (1, 61, 75);
insert into #Details values (1, 65, 80);
insert into #Details values (1, 85, 120);
insert into #Details values (1, 91, 120);
insert into #Details values (1, 110, 120);
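I have read some answers on Stack Overflow and on Refactoring Ranges, but could not adapt them to the way my data is arranged.
- For jpw -
A typical analysis might involve 20,000 IDs with 200 detail records. Those cases have been handled by downloading the data to a local machine and processing it, in a cursor-like fashion, in a SAS DATA step. The worst case is on the order of 650K IDs and 150M details, which is too much data to download and causes other resource problems. I believe the full set of details may be in the range of 1.2B rows. In any case, if it can all be done inside SQL Server, the whole process becomes simpler.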
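Answer 0 (score: 1)
OK, this answer will get you close. It feels a bit over-baked to me, but it is definitely on the right track, and I am sure you can tailor it to your needs. The crux of the problem is establishing families of overlapping ranges. Once the list of parents was built, I used a recursive CTE. See the explanation below for more details.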
Initial data
USERID      RangeStart  RangeEnd
----------- ----------- -----------
1           1           2
1           2           4
1           3           5
1           6           7
2           1           3
2           5           9
2           11          14
2           14          15
Query
-- Sample data held in a table variable.
DECLARE @USERID TABLE (USERID INT, RangeStart INT, RangeEnd INT)
INSERT INTO @USERID (USERID, RangeStart, RangeEnd) VALUES
    (1,1,2),(1,2,4),(1,3,5),(1,6,7),
    (2,1,3),(2,5,9),(2,11,14),(2,14,15)

-- Data: number every row and fetch the previous row's range within each user.
;WITH Data AS (
    SELECT ROW_NUMBER() OVER (ORDER BY USERID, RangeStart) AS MasterOrdering,
           USERID,
           RangeStart,
           RangeEnd,
           LAG(RangeStart) OVER (PARTITION BY USERID ORDER BY RangeStart ASC) AS PreviousStart,
           LAG(RangeEnd)   OVER (PARTITION BY USERID ORDER BY RangeStart ASC) AS PreviousEnd
    FROM @USERID
), ParentChild AS (
    -- A row is a parent (starts a new family) when it is the first row of a user
    -- or the previous row's end does not fall inside this row's range.
    -- Child rows are marked with 0.
    SELECT *,
           Parent = CASE
                        WHEN PreviousStart IS NULL AND PreviousEnd IS NULL THEN MasterOrdering
                        WHEN PreviousEnd NOT BETWEEN RangeStart AND RangeEnd THEN MasterOrdering
                        ELSE 0
                    END
    FROM Data
), Family AS (
    -- Anchor: the parent rows.
    SELECT MasterOrdering,
           USERID,
           RangeStart,
           RangeEnd,
           PreviousStart,
           PreviousEnd,
           Parent
    FROM ParentChild
    WHERE Parent > 0
    UNION ALL
    -- Recursive step: the next row joins the family if it is a child.
    SELECT A.MasterOrdering,
           A.USERID,
           A.RangeStart,
           A.RangeEnd,
           A.PreviousStart,
           A.PreviousEnd,
           F.Parent
    FROM ParentChild AS A
    INNER JOIN Family AS F ON (    A.MasterOrdering = F.MasterOrdering + 1
                               AND A.parent = 0)
)
-- Collapse each family to its overall start and end.
SELECT USERID,
       MIN(RangeStart) AS RangeStart,
       MAX(RangeEnd) AS RangeEnd,
       MIN(MasterOrdering) AS MasterOrdering
FROM Family
GROUP BY UserID, Parent
ORDER BY MIN(MasterOrdering)
Result
USERID      RangeStart  RangeEnd    MasterOrdering
----------- ----------- ----------- --------------------
1           1           5           1
1           6           7           4
2           1           3           5
2           5           9           6
2           11          15          7
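The query above compares each row only with the row immediately before it, and it does not treat touching integer ranges such as 1:5 and 6:7 as one range. As a possible tailoring for the question's #Details data, here is a minimal sketch of a different technique: a running maximum of the range end followed by a running sum of "start" flags (the usual gaps-and-islands pattern). It is not the recursive approach used above. The CTE names (Dedup, Ordered, Flagged, Grouped) and the columns PrevMaxLast, IsStart and Grp are made up for illustration, and the sketch assumes integer endpoints where "adjacent" means first = previous last + 1.

```sql
-- Sketch only: merge overlapping, embedded and adjacent ranges per id.
-- Assumes the question's #Details (id, first, last) table already exists.
WITH Dedup AS (
    -- Remove exact duplicates so the window ordering below has no ties.
    SELECT DISTINCT id, [first], [last]
    FROM #Details
), Ordered AS (
    SELECT id, [first], [last],
           -- Highest end value among all earlier rows of the same id.
           MAX([last]) OVER (PARTITION BY id
                             ORDER BY [first], [last]
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxLast
    FROM Dedup
), Flagged AS (
    SELECT id, [first], [last],
           -- 1 when the row opens a new merged range: the first row of an id,
           -- or a true gap (not merely adjacent) after everything seen so far.
           CASE WHEN PrevMaxLast IS NULL OR [first] > PrevMaxLast + 1
                THEN 1 ELSE 0 END AS IsStart
    FROM Ordered
), Grouped AS (
    SELECT id, [first], [last],
           -- Running count of start flags numbers the merged ranges within an id.
           SUM(IsStart) OVER (PARTITION BY id
                              ORDER BY [first], [last]
                              ROWS UNBOUNDED PRECEDING) AS Grp
    FROM Flagged
)
SELECT id, MIN([first]) AS [first], MAX([last]) AS [last]
FROM Grouped
GROUP BY id, Grp
ORDER BY id, MIN([first]);
```

Run against the eight sample rows inserted into #Details in the question, this returns (1, 1, 80) and (1, 85, 120), which matches the "Want" picture. Because it uses only window functions, the default limit of 100 recursion levels for recursive CTEs does not come into play; how it performs at the row counts mentioned in the question would still need to be measured.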
Query walkthrough
Assumptions
Step by step
Take a look. Great question, and a fun brain teaser.
Cheers
Matt