我正在对包含22 227行的表变量运行查询。该查询过去需要2-3秒才能完成(我仍然认为这个问题太慢了)但是由于我在ORDER BY
中的DENSE_RANK()
子句中添加了另一个字段,现在它在4.5分钟内完成了!
如果我在[t2].[aisdt]
中包含[t2].[aiID]
,则执行计划显示它正在扫描494 039 529行,即22 227平方。以下查询生成正确的结果,速度太慢而无法使用。
SELECT MAX([t].[SetNum]) OVER (PARTITION BY NULL) AS [MaxSet]
,*
FROM (
SELECT DENSE_RANK() OVER (ORDER BY [t2].[aisdt], [t2].[aiID]) AS [SetNum]
,[t2].*
FROM (
SELECT [aiID]
,COUNT(DISTINCT [acID]) AS [noac]
FROM @Temp
GROUP BY [aiID]
) [t1]
JOIN @Temp [t2]
ON [t2].[aiID] = [t1].[aiID]
WHERE [t1].[noac] < [t2].[asm]
) [t]
为了清楚起见,罪魁祸首是“DENSE_RANK()OVER(ORDER BY [t2]。[aisdt] ,[t2]。[aiID])”的大胆部分。删除此字段(需要保留)会将执行时间缩短至2-3秒。我认为这可能与JOIN
上的[aiID]
表格有关,但与[aisdt]
无关。
如何将此查询加速到与之前相同或更短的时间内完成?
表格定义:
DECLARE @Temp TABLE (
[aiID] INT NOT NULL INDEX [IX_Temp_aiID] -- not unique
,[aisdt] DATETIME NOT NULL INDEX [IX_Temp_aisdt] -- not unique
,[asm] INT NOT NULL
,[cpcID] INT NULL
,[cpce] VARCHAR(10) NULL
,[acID] INT NULL
,[ctvID] INT NULL
,[ct] VARCHAR(100) NULL
,[_36_other_non_matched_fields_] VARCHAR(MAX)
,UNIQUE ([aiID], [cpcID], [cpce], [acID], [ctvID], [ct])
)
[aisdt]
对于[aiID]
是唯一的,但可以有多个[aiID]
具有相同的[aisdt]
。
INSERT INTO @TEMP
VALUES (64, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah')
,(64, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah')
,(99, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 32, '', 32, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 32, '', 47, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 47, '', 32, NULL, NULL, 'blah')
,(42, '2017-04-14 16:00:00', 2, 47, '', 47, NULL, NULL, 'blah')
,(54, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah')
,(54, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah')
,(89, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 32, '', 32, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 32, '', 47, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 47, '', 32, NULL, NULL, 'blah')
,(32, '2017-04-14 16:00:00', 3, 47, '', 47, NULL, NULL, 'blah')
必须先按[aisdt]
(日期时间)排序,然后按[aiID]
排序,然后根据[aiID]
进行编号。
我想看看:
5, 1, 54, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah'
5, 1, 54, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah'
5, 2, 64, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah'
5, 2, 64, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 32, '', 32, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 32, '', 47, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 47, '', 32, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 47, '', 47, NULL, NULL, 'blah'
答案 0 :(得分:2)
主要想法取自Partition Function COUNT() OVER possible using DISTINCT @Jayvee指出的一个小小的补充,当acID
具有NULL
值时,它会起作用。
最有可能你可以删除@Temp
表中的所有索引,服务器必须以不同的方式对不同的窗口函数进行排序,但是没有自连接,所以它应该更快。
该计划将有很多种类,它们也可能很慢,尤其是当引擎低估了表中的行数时。表变量就是这种情况。 Optimiser认为表变量只有1行。所以,我建议在这里使用经典的#Temp
表,即使没有索引。
(aiID, acID)
上的索引应该会有所帮助,但会有其他任何方式。
WITH
CTE_Counts
AS
(
SELECT
*
-- use DENSE_RANK() to calculate COUNT(DISTINCT)
, DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID])
+ DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID] DESC)
-- subtract extra 1 if acID has NULL values within the partition
- MAX(CASE WHEN [acID] IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY [aiID])
- 1 AS [noac]
FROM @Temp
)
,CTE_SetNum
AS
(
SELECT
*
, DENSE_RANK() OVER (ORDER BY [aisdt], [aiID]) AS [SetNum]
FROM CTE_Counts
WHERE [noac] < [asm]
)
SELECT
*
, MAX([SetNum]) OVER () AS [MaxSet]
FROM CTE_SetNum
ORDER BY
[aisdt]
,[aiID]
,[SetNum]
;
答案 1 :(得分:1)
评论中建议的索引肯定会起主要作用,但我认为你可以用这种方式重新编写查询而无需自我加入:
SELECT MAX([t].[SetNum]) OVER (PARTITION BY NULL) AS [MaxSet]
,*
FROM (
SELECT DENSE_RANK() OVER (ORDER BY [t1].[aisdt], [t1].[aiID]) AS [SetNum]
,[t1].*
FROM (
SELECT * ,dense_rank() over(partition by aiID order by [acID]) -
dense_rank() over(partition by aiID order by [acID]) - 1 AS [noac]
FROM @Temp
) [t1]
WHERE [t1].[noac] < [t1].[asm]
) [t]