I have a huge dataset on MS SQL 2012 on which I have to perform a special kind of aggregation. Here is a sample of the dataset.
Key PartitionID StartTime Duration Name
1 1 23/05/2019 18:18:28.125 1 X
2 1 23/05/2019 18:18:28.480 2 Y
3 1 23/05/2019 18:18:29.622 1 X
4 1 23/05/2019 18:18:32.513 2 X
5 2 23/05/2019 18:21:13.973 3 X
6 2 23/05/2019 18:21:14.945 4 X
7 2 23/05/2019 18:21:21.949 5 X
8 2 23/05/2019 18:21:30.871 2 X
9 2 23/05/2019 18:21:35.710 4 X
10 2 23/05/2019 18:21:48.550 1 X
11 2 23/05/2019 18:22:00.144 3 X
12 2 23/05/2019 18:22:01.094 6 X
13 2 23/05/2019 18:22:03.354 1 X
14 3 23/05/2019 18:24:44.219 6 X
15 3 23/05/2019 18:24:46.076 1 Y
16 3 23/05/2019 18:24:52.399 4 X
17 3 23/05/2019 18:25:03.620 6 X
18 3 23/05/2019 18:25:11.208 1 X
19 3 23/05/2019 18:25:12.616 4 X
20 3 23/05/2019 18:25:28.019 6 X
21 3 23/05/2019 18:25:31.384 2 Y
21 3 23/05/2019 18:25:32.334 2 Y
21 3 23/05/2019 18:25:33.344 2 X
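I have to create a new column that splits the data into sets based on Name: within a partition, runs of the same Name that are separated by a different Name must get different CalculatedID values. In other words, adjacent rows that have the same Name also share the same CalculatedID.
The result should look like this: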
Key PartitionID StartTime Duration Name CalculatedID
1 1 23/05/2019 18:18:28.125 1 X 1
2 1 23/05/2019 18:18:28.480 2 Y 2
3 1 23/05/2019 18:18:29.622 1 X 3
4 1 23/05/2019 18:18:32.513 2 X 3
5 2 23/05/2019 18:21:13.973 3 X 1
6 2 23/05/2019 18:21:14.945 4 X 1
7 2 23/05/2019 18:21:21.949 5 X 1
8 2 23/05/2019 18:21:30.871 2 X 1
9 2 23/05/2019 18:21:35.710 4 X 1
10 2 23/05/2019 18:21:48.550 1 X 1
11 2 23/05/2019 18:22:00.144 3 X 1
12 2 23/05/2019 18:22:01.094 6 X 1
13 2 23/05/2019 18:22:03.354 1 X 1
14 3 23/05/2019 18:24:44.219 6 X 1
15 3 23/05/2019 18:24:46.076 1 Y 2
16 3 23/05/2019 18:24:52.399 4 X 3
17 3 23/05/2019 18:25:03.620 6 X 3
18 3 23/05/2019 18:25:11.208 1 X 3
19 3 23/05/2019 18:25:12.616 4 X 3
20 3 23/05/2019 18:25:28.019 6 X 3
21 3 23/05/2019 18:25:31.384 2 Y 4
21 3 23/05/2019 18:25:32.334 2 Y 4
21 3 23/05/2019 18:25:33.344 2 X 5
I really want to avoid looping over the data, because the dataset can easily exceed 10M rows.
Answer 0 (score: 3)
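This can be done with a common table expression that uses LAG to get, for each row, the previous value of Name (ordered by StartTime within each PartitionID), and then uses SUM as a window function to get a running count of the rows whose Name differs from the previous row's Name. Because the count only increases when the Name changes, every run of adjacent rows with the same Name ends up with the same value.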
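As a rough illustration of the idea before the full solution, the sketch below applies both steps to the four rows of PartitionID 1 from the question and exposes the intermediate values. The CTE names Sample and Flagged and the column IsNewGroup are made up for this illustration and do not appear in the original answer.

```sql
-- Illustrative sketch only: the four rows of PartitionID 1, inlined so the
-- snippet runs on its own. Sample, Flagged and IsNewGroup are hypothetical names.
;WITH Sample AS
(
    SELECT *
    FROM (VALUES
        (1, CAST('2019-05-23T18:18:28.125' AS datetime), 'X'),
        (2, '2019-05-23T18:18:28.480', 'Y'),
        (3, '2019-05-23T18:18:29.622', 'X'),
        (4, '2019-05-23T18:18:32.513', 'X')
    ) AS v([Key], StartTime, Name)
),
Flagged AS
(
    -- Step 1: fetch the Name of the previous row (NULL for the first row).
    SELECT [Key], StartTime, Name,
           LAG(Name) OVER(ORDER BY StartTime) AS PrevName
    FROM Sample
)
SELECT [Key], Name, PrevName,
       -- 1 when the Name changed (or there is no previous row), otherwise 0.
       IIF(Name = PrevName, 0, 1) AS IsNewGroup,
       -- Step 2: the running total of those flags is the group number.
       SUM(IIF(Name = PrevName, 0, 1)) OVER(ORDER BY StartTime) AS CalculatedId
FROM Flagged
ORDER BY StartTime
-- Key  Name  PrevName  IsNewGroup  CalculatedId
-- 1    X     NULL      1           1
-- 2    Y     X         1           2
-- 3    X     Y         1           3
-- 4    X     X         0           3
```

The first row always counts as a new group because comparing Name with a NULL PrevName is not true, so IIF returns 1.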
First, create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
[Key] int,
PartitionID int,
StartTime datetime,
Duration int,
Name char(1)
)
INSERT INTO @T ([Key] ,PartitionID, StartTime, Duration, Name) VALUES
(1 , 1, '2019-05-23T18:18:28.125', 1, 'X'),
(2 , 1, '2019-05-23T18:18:28.480', 2, 'Y'),
(3 , 1, '2019-05-23T18:18:29.622', 1, 'X'),
(4 , 1, '2019-05-23T18:18:32.513', 2, 'X'),
(5 , 2, '2019-05-23T18:21:13.973', 3, 'X'),
(6 , 2, '2019-05-23T18:21:14.945', 4, 'X'),
(7 , 2, '2019-05-23T18:21:21.949', 5, 'X'),
(8 , 2, '2019-05-23T18:21:30.871', 2, 'X'),
(9 , 2, '2019-05-23T18:21:35.710', 4, 'X'),
(10, 2, '2019-05-23T18:21:48.550', 1, 'X'),
(11, 2, '2019-05-23T18:22:00.144', 3, 'X'),
(12, 2, '2019-05-23T18:22:01.094', 6, 'X'),
(13, 2, '2019-05-23T18:22:03.354', 1, 'X'),
(14, 3, '2019-05-23T18:24:44.219', 6, 'X'),
(15, 3, '2019-05-23T18:24:46.076', 1, 'Y'),
(16, 3, '2019-05-23T18:24:52.399', 4, 'X'),
(17, 3, '2019-05-23T18:25:03.620', 6, 'X'),
(18, 3, '2019-05-23T18:25:11.208', 1, 'X'),
(19, 3, '2019-05-23T18:25:12.616', 4, 'X'),
(20, 3, '2019-05-23T18:25:28.019', 6, 'X'),
(21, 3, '2019-05-23T18:25:31.384', 2, 'Y'),
(21, 3, '2019-05-23T18:25:32.334', 2, 'Y'),
(21, 3, '2019-05-23T18:25:33.344', 2, 'X')
The common table expression:
;WITH CTE AS
(
SELECT [Key] ,PartitionID, StartTime, Duration, Name,
LAG(Name) OVER(PARTITION BY PartitionID ORDER BY StartTime) As PrevName
FROM @T
)
The query (it must be run as a single statement together with the common table expression above):
SELECT [Key] ,PartitionID, StartTime, Duration, Name,
SUM(IIF(Name = PrevName, 0, 1)) OVER(PARTITION BY PartitionID ORDER BY StartTime) As CalculatedId
FROM CTE
ORDER BY [Key]
Result:
Key PartitionID StartTime Duration Name CalculatedId
1 1 23.05.2019 18:18:28 1 X 1
2 1 23.05.2019 18:18:28 2 Y 2
3 1 23.05.2019 18:18:29 1 X 3
4 1 23.05.2019 18:18:32 2 X 3
5 2 23.05.2019 18:21:13 3 X 1
6 2 23.05.2019 18:21:14 4 X 1
7 2 23.05.2019 18:21:21 5 X 1
8 2 23.05.2019 18:21:30 2 X 1
9 2 23.05.2019 18:21:35 4 X 1
10 2 23.05.2019 18:21:48 1 X 1
11 2 23.05.2019 18:22:00 3 X 1
12 2 23.05.2019 18:22:01 6 X 1
13 2 23.05.2019 18:22:03 1 X 1
14 3 23.05.2019 18:24:44 6 X 1
15 3 23.05.2019 18:24:46 1 Y 2
16 3 23.05.2019 18:24:52 4 X 3
17 3 23.05.2019 18:25:03 6 X 3
18 3 23.05.2019 18:25:11 1 X 3
19 3 23.05.2019 18:25:12 4 X 3
20 3 23.05.2019 18:25:28 6 X 3
21 3 23.05.2019 18:25:31 2 Y 4
21 3 23.05.2019 18:25:32 2 Y 4
21 3 23.05.2019 18:25:33 2 X 5
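One thing to watch on real data is rows that share the same StartTime within a partition: ordering by StartTime alone leaves their relative order undefined, and the default window frame treats them as peers. A possible variation, sketched below under the assumption that a unique column is available as a tie-breaker, adds that column to both ORDER BY clauses and uses an explicit ROWS frame, which is also generally reported to be cheaper than the default RANGE frame on large inputs. RowId and the inline sample rows are hypothetical and not part of the original question.

```sql
-- Illustrative sketch only: RowId is a hypothetical unique tie-breaker column,
-- and the sample rows are invented to show two rows with the same StartTime.
;WITH Sample AS
(
    SELECT *
    FROM (VALUES
        (1, CAST('2019-05-23T18:18:28.125' AS datetime), 'X'),
        (2, '2019-05-23T18:18:28.480', 'Y'),
        (3, '2019-05-23T18:18:28.480', 'Y'),  -- same StartTime as RowId 2
        (4, '2019-05-23T18:18:29.622', 'X')
    ) AS v(RowId, StartTime, Name)
),
Flagged AS
(
    -- The tie-breaker makes "previous row" well defined for equal StartTime values.
    SELECT RowId, StartTime, Name,
           LAG(Name) OVER(ORDER BY StartTime, RowId) AS PrevName
    FROM Sample
)
SELECT RowId, Name,
       -- ROWS accumulates row by row instead of treating equal sort keys as peers.
       SUM(IIF(Name = PrevName, 0, 1))
           OVER(ORDER BY StartTime, RowId ROWS UNBOUNDED PRECEDING) AS CalculatedId
FROM Flagged
ORDER BY StartTime, RowId
-- RowId  Name  CalculatedId
-- 1      X     1
-- 2      Y     2
-- 3      Y     2
-- 4      X     3
```

For the query in the answer, the same change would mean adding PARTITION BY PartitionID back in front of both ORDER BY clauses. Note that Key in the sample data repeats the value 21, so it would not work as the tie-breaker there as it stands.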