Given data in a table with arbitrary intervals (not date/time!!), defined as follows:
START float
END float
VALUE varchar(40)
E.g.
START END VALUE
----- --- ------
0 1 Banana
1 3 Banana
3 4 Orange
4 7 Orange
7 8 Apple
8 9 Apple
9 10 Apple
10 15 Apple
20 22 Apple
22 23 Apple
23 28 Banana
28 30 Banana
etc..
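How can I summarise the data so that, for contiguous intervals, each value is listed only once? That is, the result of the query should look like this: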
START END VALUE
----- --- ------
0 3 Banana
3 7 Orange
7 15 Apple
20 23 Apple
23 30 Banana
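Note the gap between 15 and 20 above. I am dealing with a lot of data (~500k rows) but will not run the query often, so efficiency would be nice but is not critical. Can this be done without using cursors?
(Note: I am using SQL2008R2, so I cannot take advantage of newer features, if any exist.)
Thanks!
Answer 0 (score: 3)
This should work for you: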
DECLARE @T TABLE (Start INT, [End] INT, Value VARCHAR(100));
INSERT @T (Start, [End], Value)
VALUES
(0, 1, 'Banana'), (1, 3, 'Banana'), (3, 4, 'Orange'), (4, 7, 'Orange'),
(7, 8, 'Apple'), (8, 9, 'Apple'), (9, 10, 'Apple'), (10, 15, 'Apple'),
(20, 22, 'Apple'), (22, 23, 'Apple'), (23, 28, 'Banana'), (28, 30, 'Banana');
WITH CTE AS
( SELECT t.[Start],
t.[End],
t.[value],
IsStart = ISNULL(c.IsStart, 1)
FROM @T AS T
OUTER APPLY
( SELECT TOP 1 IsStart = 0
FROM @T AS T2
WHERE T2.Value = T.Value
AND T2.[End] = T.Start
) AS c
)
SELECT Value, Start = MIN(Start), [End] = MAX([End])
FROM CTE AS T
OUTER APPLY
( SELECT SUM(IsStart)
FROM CTE AS T2
WHERE T2.Value = T.Value
AND T2.Start <= T.Start
) g (GroupingSet)
GROUP BY Value, GroupingSet
ORDER BY Start;
The first step is to identify every record that is the start of a new range. This part:
SELECT t.[Start],
t.[End],
t.[value],
IsStart = ISNULL(c.IsStart, 1)
FROM @T AS T
OUTER APPLY
( SELECT TOP 1 IsStart = 0
FROM @T AS T2
WHERE T2.Value = T.Value
AND T2.[End] = T.Start
) AS c
gives:
Start End value IsStart
0 1 Banana 1
1 3 Banana 0
3 4 Orange 1
4 7 Orange 0
7 8 Apple 1
8 9 Apple 0
9 10 Apple 0
10 15 Apple 0
20 22 Apple 1
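To make the flagging rule concrete outside the database, here is a minimal Python sketch of the same idea (not T-SQL, and not part of the original answer). The names `ROWS` and `flag_starts` are made up for illustration. A row counts as a start unless some row with the same value ends exactly where it begins.

```python
# Sample rows from the question: (start, end, value).
ROWS = [
    (0, 1, "Banana"), (1, 3, "Banana"), (3, 4, "Orange"), (4, 7, "Orange"),
    (7, 8, "Apple"), (8, 9, "Apple"), (9, 10, "Apple"), (10, 15, "Apple"),
    (20, 22, "Apple"), (22, 23, "Apple"), (23, 28, "Banana"), (28, 30, "Banana"),
]


def flag_starts(rows):
    """Return (start, end, value, is_start) for every row.

    is_start is 0 when another row with the same value ends exactly
    where this row starts (the OUTER APPLY lookup), otherwise 1.
    """
    ends = {(value, end) for _, end, value in rows}
    return [
        (start, end, value, 0 if (value, start) in ends else 1)
        for start, end, value in rows
    ]


flagged = flag_starts(ROWS)
for row in flagged:
    print(row)
```

The set lookup plays the role of the correlated `TOP 1` subquery; in SQL Server an index on (Value, [End]) would presumably serve the same purpose.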
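You can then create unique groups by counting how many ranges have started at or before the current record, which is effectively a running total of the IsStart column partitioned by Value. That is done here: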
SELECT *
FROM CTE AS T
OUTER APPLY
( SELECT SUM(IsStart)
FROM CTE AS T2
WHERE T2.Value = T.Value
AND T2.Start <= T.Start
) g (GroupingSet);
which gives:
Start End value IsStart GroupingSet
0 1 Banana 1 1
1 3 Banana 0 1
3 4 Orange 1 1
4 7 Orange 0 1
7 8 Apple 1 1
8 9 Apple 0 1
9 10 Apple 0 1
10 15 Apple 0 1
20 22 Apple 1 2 -- SECOND NON CONTINUOUS RANGE FOR APPLES
22 23 Apple 0 2
23 28 Banana 1 2 -- SECOND NON CONTINUOUS RANGE FOR BANANAS
28 30 Banana 0 2
Finally, you can aggregate, grouping by Value and using this identifier column to tell the unique groups apart.
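The three steps (flag the starts, take a running total per value, aggregate) can be sketched end to end in Python. This is only an illustration of the logic, under the assumption that rows with the same value never share a Start. `ROWS` and `merge_contiguous` are hypothetical names.

```python
from collections import defaultdict

# Sample rows from the question: (start, end, value).
ROWS = [
    (0, 1, "Banana"), (1, 3, "Banana"), (3, 4, "Orange"), (4, 7, "Orange"),
    (7, 8, "Apple"), (8, 9, "Apple"), (9, 10, "Apple"), (10, 15, "Apple"),
    (20, 22, "Apple"), (22, 23, "Apple"), (23, 28, "Banana"), (28, 30, "Banana"),
]


def merge_contiguous(rows):
    """Collapse contiguous same-value ranges into (start, end, value) tuples."""
    ends = {(value, end) for _, end, value in rows}
    running = defaultdict(int)  # running total of is_start, kept per value
    groups = {}                 # (value, grouping_set) -> [min start, max end]
    for start, end, value in sorted(rows):  # visit rows in Start order
        is_start = 0 if (value, start) in ends else 1
        running[value] += is_start
        key = (value, running[value])
        if key in groups:
            groups[key][0] = min(groups[key][0], start)
            groups[key][1] = max(groups[key][1], end)
        else:
            groups[key] = [start, end]
    return sorted((lo, hi, value) for (value, _), (lo, hi) in groups.items())


merged = merge_contiguous(ROWS)
for row in merged:
    print(row)
```

Because the rows are visited in Start order, the running total can be updated in a single pass, whereas the `SUM` inside `OUTER APPLY` recomputes it for every row.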
You can also do this by joining to a numbers table to expand each range into rows (for brevity I have used master..spt_values):
WITH CTE AS
( SELECT t.[value],
Number = t.Start + v.Number,
GroupingSet = t.Start + v.Number - ROW_NUMBER() OVER(PARTITION BY t.[value] ORDER BY t.Start + v.Number)
FROM @T AS T
INNER JOIN Master..spt_values v
ON v.[Type] = 'P'
AND v.Number < (t.[End] - t.[Start])
)
-- Number only runs up to [End] - 1 for each range, so add 1 to recover the range end
SELECT Value, [Start] = MIN(Number), [End] = MAX(Number) + 1
FROM CTE
GROUP BY GroupingSet, Value;
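The grouping key in the query above rests on one observation: within a run of consecutive integers, the number minus its row number is constant. A small Python sketch of just that trick, with the hypothetical helper name `islands`, assuming distinct integers:

```python
def islands(numbers):
    """Group distinct integers into (first, last) runs of consecutive values."""
    runs = {}
    for row_number, n in enumerate(sorted(numbers), start=1):
        key = n - row_number  # constant inside a run of consecutive integers
        first, _ = runs.get(key, (n, n))
        runs[key] = (first, n)
    return sorted(runs.values())


# The Apple ranges from the question, expanded to one integer per unit.
apple_ranges = [(7, 8), (8, 9), (9, 10), (10, 15), (20, 22), (22, 23)]
apple = [n for start, end in apple_ranges for n in range(start, end)]

apple_islands = islands(apple)
print(apple_islands)
```

Each run's last number is one less than the range end (14 for the range ending at 15), because a range from Start to End expands to the integers Start through End - 1.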
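The downside of the numbers-table approach is that it is memory intensive if you have many rows or large ranges. It also relies on the boundaries being whole numbers, as they are in the sample data. Once the ranges are expanded, it simply uses the ranking-function method described in Itzik Ben-Gan's Gaps and Islands Solutions.
Answer 1 (score: 1)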
With SQL Server 2008, one way is to use a triangular join, with a slight twist:
WITH I AS (
SELECT ID = Row_Number() OVER (ORDER BY Start)
, _Start = [Start]
, _End = [End]
, Value
FROM Data
), D AS (
SELECT i.ID, i._Start, i._End, i.Value
, m.id _id, m.value _value
, R = CASE WHEN i.Value <> m.Value THEN 1
WHEN m._End <> i._Start THEN 1
ELSE 0
END
FROM I
CROSS APPLY (SELECT TOP 1
id, _Start, _End, value
FROM I m
WHERE m.ID IN (i.ID, i.ID - 1)
ORDER BY ID) m
), B AS (
SELECT i.ID, i._Start, i._End, i.Value
, R = SUM(l.R)
FROM D i
LEFT JOIN D l ON i.id >= l.id
GROUP BY i.ID, i._Start, i._End, i.Value
)
SELECT [START] = MIN(_Start)
, [END] = MAX(_End)
, Value
FROM B
GROUP BY R, Value
ORDER BY 1
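The CTE I (ID) gives every row an ID in Start order, so that consecutive rows have consecutive IDs with no gaps between them (the ID is used to pick the right row in the JOIN).
The CTE D (Data) uses CROSS APPLY to fetch the previous row (or the same row, for the first one), which is what LAG does in later versions of SQL Server. It checks the previous row to see whether Value has changed or whether there is a gap between the current [START] and the previous [END].
The CTE B (Block) uses a triangular JOIN of D with itself to create a field that stores the number of changes and gaps from the beginning up to the current row. That field holds the same number for every row in the same group of data.
The main query uses that new column to aggregate the data.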
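As a rough illustration of those CTEs, the same logic can be written procedurally in Python. This is a sketch, not the answer's code: `ROWS` and `merge_by_previous_row` are made-up names, and like the query it assumes that ordering by Start alone puts each row directly after its predecessor.

```python
from itertools import accumulate

# Sample rows from the question: (start, end, value).
ROWS = [
    (0, 1, "Banana"), (1, 3, "Banana"), (3, 4, "Orange"), (4, 7, "Orange"),
    (7, 8, "Apple"), (8, 9, "Apple"), (9, 10, "Apple"), (10, 15, "Apple"),
    (20, 22, "Apple"), (22, 23, "Apple"), (23, 28, "Banana"), (28, 30, "Banana"),
]


def merge_by_previous_row(rows):
    """Merge ranges by counting value changes and gaps in Start order."""
    if not rows:
        return []
    ordered = sorted(rows)  # stands in for ROW_NUMBER() OVER (ORDER BY Start)
    # Pair every row with the one before it; the first row is paired with itself.
    previous = [ordered[0]] + ordered[:-1]
    breaks = [
        1 if cur[2] != prev[2] or prev[1] != cur[0] else 0
        for cur, prev in zip(ordered, previous)
    ]
    blocks = list(accumulate(breaks))  # running count of changes and gaps
    merged = {}
    for (start, end, value), block in zip(ordered, blocks):
        lo, hi = merged.get((block, value), (start, end))
        merged[(block, value)] = (min(lo, start), max(hi, end))
    return sorted((lo, hi, value) for (_, value), (lo, hi) in merged.items())


result = merge_by_previous_row(ROWS)
for row in result:
    print(row)
```

`accumulate` computes the running count in one pass. The triangular join reaches the same numbers by summing all earlier rows for each row, which grows quadratically with the row count.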
Answer 2 (score: 1)
WITH TableWithPreviousAndNext AS (
SELECT CA1.[Previous]
,Table1.[Start]
,Table1.[End]
,CA2.[Next]
,Table1.[Value]
,(1 + ROW_NUMBER() OVER (PARTITION BY [Value] ORDER BY Table1.[Start])) / 2 AS [Group]
FROM Table1
CROSS APPLY (
SELECT MAX([End]) AS [Previous]
FROM Table1 AS InnerTable1
WHERE InnerTable1.[Value] = Table1.[Value]
AND InnerTable1.[Start] < Table1.[Start]
) AS CA1
CROSS APPLY (
SELECT MIN([Start]) AS [Next]
FROM Table1 AS InnerTable1
WHERE InnerTable1.[Value] = Table1.[Value]
AND InnerTable1.[Start] > Table1.[Start]
) AS CA2
CROSS APPLY ( -- A little trick to create a 2 row group for isolated rows
SELECT 1 AS Dummy
UNION ALL
SELECT 1
WHERE ([Previous] IS NULL OR [Previous] <> [Start])
AND ([Next] IS NULL OR [Next] <> [End])
) AS CA3
WHERE [Previous] IS NULL -- Remove all but first and last in sequence
OR [Next] IS NULL
OR [Previous] <> [Start]
OR [End] <> [Next]
)
SELECT MIN([Start]) AS [Start]
,MAX([End]) AS [End]
,[Value]
FROM TableWithPreviousAndNext
GROUP BY [Value]
,[Group]
ORDER BY MIN(Start)
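This answer comes without an explanation. As far as I can tell from reading the query, it keeps only the first and last row of each contiguous sequence, keeps an isolated row twice, and then pairs the surviving rows two at a time per value. Below is a Python sketch of that reading, with the hypothetical names `ROWS` and `merge_by_boundaries`; treat it as an interpretation rather than the author's own description.

```python
# Sample rows from the question: (start, end, value).
ROWS = [
    (0, 1, "Banana"), (1, 3, "Banana"), (3, 4, "Orange"), (4, 7, "Orange"),
    (7, 8, "Apple"), (8, 9, "Apple"), (9, 10, "Apple"), (10, 15, "Apple"),
    (20, 22, "Apple"), (22, 23, "Apple"), (23, 28, "Banana"), (28, 30, "Banana"),
]


def merge_by_boundaries(rows):
    """Keep only the boundary rows of each sequence, then pair them up."""
    kept = []
    for start, end, value in rows:
        same = [(s, e) for s, e, v in rows if v == value]
        previous = max((e for s, e in same if s < start), default=None)
        following = min((s for s, e in same if s > start), default=None)
        opens = previous is None or previous != start
        closes = following is None or following != end
        if opens or closes:
            kept.append((value, start, end))
        if opens and closes:  # isolated row: keep it twice so it pairs with itself
            kept.append((value, start, end))
    kept.sort()  # by value, then start

    merged = {}
    position = {}  # per-value row number among the kept rows
    for value, start, end in kept:
        rn = position[value] = position.get(value, 0) + 1
        key = (value, (rn + 1) // 2)  # rows 1,2 -> group 1; rows 3,4 -> group 2
        lo, hi = merged.get(key, (start, end))
        merged[key] = (min(lo, start), max(hi, end))
    return sorted((lo, hi, value) for (value, _), (lo, hi) in merged.items())


result = merge_by_boundaries(ROWS)
for row in result:
    print(row)
```

Duplicating the isolated rows is what keeps the pairing aligned: every sequence then contributes exactly two rows, so integer division of the row number by two yields one group per sequence.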