How do I prevent SQL Server from squaring the number of rows scanned?

Time: 2017-06-22 07:46:58

Tags: sql sql-server sql-server-2008

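I am running a query against a table variable containing 22 227 rows. The query used to take 2-3 seconds to complete (which I still consider too slow), but since I added another field to the ORDER BY of the DENSE_RANK() clause, it now takes 4.5 minutes!

If I include [t2].[aiID] alongside [t2].[aisdt], the execution plan shows it scanning 494 039 529 rows, which is 22 227 squared. The following query produces the correct result but is too slow to use.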

SELECT MAX([t].[SetNum]) OVER (PARTITION BY NULL) AS [MaxSet]
      ,*
FROM (
    SELECT DENSE_RANK() OVER (ORDER BY [t2].[aisdt], [t2].[aiID]) AS [SetNum]
          ,[t2].*
    FROM (
        SELECT [aiID]
              ,COUNT(DISTINCT [acID]) AS [noac]
        FROM @Temp
        GROUP BY [aiID]
    ) [t1]
    JOIN @Temp [t2]
      ON [t2].[aiID] = [t1].[aiID]
    WHERE [t1].[noac] < [t2].[asm]
) [t]

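For clarity, the culprit is the ", [t2].[aiID]" part of "DENSE_RANK() OVER (ORDER BY [t2].[aisdt], [t2].[aiID])". Removing this field (which I need to keep) brings the execution time back down to 2-3 seconds. I suspect it has something to do with the table being joined to itself on [aiID] but not on [aisdt].

How can I speed this query up so that it completes in the same time as before, or less?

Edit

Table definition: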

DECLARE @Temp TABLE (
    [aiID] INT NOT NULL INDEX [IX_Temp_aiID] -- not unique
    ,[aisdt] DATETIME NOT NULL INDEX [IX_Temp_aisdt] -- not unique
    ,[asm] INT NOT NULL
    ,[cpcID] INT NULL
    ,[cpce] VARCHAR(10) NULL
    ,[acID] INT NULL
    ,[ctvID] INT NULL
    ,[ct] VARCHAR(100) NULL
    ,[_36_other_non_matched_fields_] VARCHAR(MAX)

    ,UNIQUE ([aiID], [cpcID], [cpce], [acID], [ctvID], [ct])
)

[aisdt] is unique per [aiID], but several [aiID] values can share the same [aisdt].

INSERT INTO @Temp
VALUES (64, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah')
      ,(64, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah')
      ,(99, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah')
      ,(99, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah')
      ,(99, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah')
      ,(99, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah')
      ,(42, '2017-04-14 16:00:00', 2, 32, '', 32, NULL, NULL, 'blah')
      ,(42, '2017-04-14 16:00:00', 2, 32, '', 47, NULL, NULL, 'blah')
      ,(42, '2017-04-14 16:00:00', 2, 47, '', 32, NULL, NULL, 'blah')
      ,(42, '2017-04-14 16:00:00', 2, 47, '', 47, NULL, NULL, 'blah')
      ,(54, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah')
      ,(54, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah')
      ,(89, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah')
      ,(89, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah')
      ,(89, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah')
      ,(89, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah')
      ,(32, '2017-04-14 16:00:00', 3, 32, '', 32, NULL, NULL, 'blah')
      ,(32, '2017-04-14 16:00:00', 3, 32, '', 47, NULL, NULL, 'blah')
      ,(32, '2017-04-14 16:00:00', 3, 47, '', 32, NULL, NULL, 'blah')
      ,(32, '2017-04-14 16:00:00', 3, 47, '', 47, NULL, NULL, 'blah')

The rows must be ordered first by [aisdt] (a datetime), then by [aiID], and then numbered per [aiID].

I would like to see:

5, 1, 54, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah'
5, 1, 54, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah'
5, 2, 64, '2017-03-23 10:00:00', 1, 17, '', NULL, NULL, NULL, 'blah'
5, 2, 64, '2017-03-23 10:00:00', 1, 34, '', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah'
5, 3, 89, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 25, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 16, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 76, 'Y', NULL, NULL, NULL, 'blah'
5, 4, 99, '2017-04-08 09:00:00', 1, 82, 'Y', NULL, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 32, '', 32, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 32, '', 47, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 47, '', 32, NULL, NULL, 'blah'
5, 5, 32, '2017-04-14 16:00:00', 3, 47, '', 47, NULL, NULL, 'blah'

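2 answers:

Answer 0 (score: 2)

The main idea is taken from Partition Function COUNT() OVER possible using DISTINCT, as pointed out by @Jayvee, with one small addition so that it also works when acID has NULL values.

Most likely you can drop all the indexes on the @Temp table. The server will have to sort differently for the different window functions, but there is no self-join, so it should be faster.

The plan will contain many sorts, and they can be slow too, especially when the engine underestimates the number of rows in the table. That is the case with table variables: the optimizer assumes a table variable has only 1 row. So I would suggest using a classic #Temp table here, even without indexes.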

An index on (aiID, acID) should help, but there will be sorts either way.
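As a rough sketch of the #Temp suggestion, the table variable from the question could be declared as a temp table like this. The column list is copied from the question; the index name is made up for illustration, and the index itself is optional.

```sql
-- Sketch only: a #Temp table with the same columns as @Temp in the question.
-- Temp tables have statistics, so the optimizer can estimate row counts.
CREATE TABLE #Temp (
    [aiID]   INT          NOT NULL
    ,[aisdt] DATETIME     NOT NULL
    ,[asm]   INT          NOT NULL
    ,[cpcID] INT          NULL
    ,[cpce]  VARCHAR(10)  NULL
    ,[acID]  INT          NULL
    ,[ctvID] INT          NULL
    ,[ct]    VARCHAR(100) NULL
    ,[_36_other_non_matched_fields_] VARCHAR(MAX)
);

-- Populate #Temp here with whatever statement currently fills @Temp.

-- Optional, hypothetical index name: matches PARTITION BY [aiID] ORDER BY [acID].
CREATE INDEX [IX_Temp_aiID_acID] ON #Temp ([aiID], [acID]);
```

The query would then read FROM #Temp instead of FROM @Temp. Whether the index pays off depends on the actual plan, so it is worth comparing execution plans with and without it.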

WITH
CTE_Counts
AS
(
    SELECT
        *
        -- use DENSE_RANK() to calculate COUNT(DISTINCT)
        , DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID])
        + DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID] DESC)
        -- subtract extra 1 if acID has NULL values within the partition
        - MAX(CASE WHEN [acID] IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY [aiID])
        - 1 AS [noac]
    FROM @Temp
)
,CTE_SetNum
AS
(
    SELECT
        *
        , DENSE_RANK() OVER (ORDER BY [aisdt], [aiID]) AS [SetNum]
    FROM CTE_Counts
    WHERE [noac] < [asm]
)
SELECT
    *
    , MAX([SetNum]) OVER () AS [MaxSet]
FROM CTE_SetNum
ORDER BY
    [aisdt]
    ,[aiID]
    ,[SetNum]
;
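To see why the DENSE_RANK() arithmetic reproduces COUNT(DISTINCT), here is a small standalone check. The table @Demo and its columns [grp] and [val] are invented for this illustration. Within a partition, a row's ascending rank plus its descending rank always equals the number of distinct values plus one, and a NULL is ranked as one more value, hence the extra subtraction.

```sql
-- Illustration only: @Demo, [grp] and [val] are hypothetical names.
DECLARE @Demo TABLE ([grp] INT NOT NULL, [val] INT NULL);

INSERT INTO @Demo ([grp], [val])
VALUES (1, 32), (1, 47), (1, 32)   -- two distinct values
      ,(2, NULL), (2, NULL)        -- only NULLs
      ,(3, 10), (3, NULL);         -- one value plus a NULL

SELECT
    [d].[grp]
    ,[d].[val]
    -- ascending rank + descending rank - 1 = distinct values, NULL included
    ,DENSE_RANK() OVER (PARTITION BY [d].[grp] ORDER BY [d].[val])
     + DENSE_RANK() OVER (PARTITION BY [d].[grp] ORDER BY [d].[val] DESC)
     -- remove the NULL "value" again, because COUNT(DISTINCT) ignores NULLs
     - MAX(CASE WHEN [d].[val] IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY [d].[grp])
     - 1 AS [ViaDenseRank]
    ,[c].[ViaCountDistinct]
FROM @Demo [d]
JOIN (
    -- reference result computed the conventional way
    SELECT [grp], COUNT(DISTINCT [val]) AS [ViaCountDistinct]
    FROM @Demo
    GROUP BY [grp]
) [c]
  ON [c].[grp] = [d].[grp]
ORDER BY [d].[grp], [d].[val];
```

Both computed columns should agree on every row: 2 for group 1, 0 for group 2 and 1 for group 3. The join in this check exists only to show the reference value next to the window-function result; the answer's query does not need it.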

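Answer 1 (score: 1)

The index suggested in the comments will certainly play a major part, but I think you can rewrite the query this way, without the self-join: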

SELECT MAX([t].[SetNum]) OVER (PARTITION BY NULL) AS [MaxSet]
      ,*
FROM (
    SELECT DENSE_RANK() OVER (ORDER BY [t1].[aisdt], [t1].[aiID]) AS [SetNum]
          ,[t1].*
    FROM (
        -- ascending rank + descending rank - 1 = number of distinct [acID] per [aiID]
        -- (a NULL [acID] is counted as a value here, unlike COUNT(DISTINCT))
        SELECT *
              ,DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID])
              + DENSE_RANK() OVER (PARTITION BY [aiID] ORDER BY [acID] DESC)
              - 1 AS [noac]
        FROM @Temp
    ) [t1]
    WHERE [t1].[noac] < [t1].[asm]
) [t]