SELECT query times out on a table with only 17,000 rows

Asked: 2019-07-11 12:30:07

Tags: sql sql-server

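I need to identify the rows that never reach a count greater than or equal to a threshold, even when grouped at the highest (coarsest) level. If a row meets the threshold at a lower (finer) grouping level, it is removed and is not counted in the checks at the higher levels.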

For example:

I have values like these, and the threshold is 5.

COL_1   COL_2   COL_3
CH  ZZZZZZ  T77613
CH  ZZZZZZ  R537973
**CH    181600  19M8323**
**CH    HYC440  RE575008**
**CH    211000  AE74215**
CH  ZZZZZZ  T77858
CH  ZZZZZZ  T76938
CH  ZZZZZZ  T77932
CH  ZZZZZZ  T76938
CH  ZZZZZZ  14M7396
CH  ZZZZZZ  RE593267
CH  ZZZZZZ  RE593267
CH  ZZZZZZ  RE579130
CH  ZZZZZZ  14M7296
CH  ZZZZZZ  RE580337
CH  ZZZZZZ  RE580337

Only the bold rows should be selected.

I am using the following query:

WITH Step1 AS (
    SELECT x1.*
    FROM mytable AS x1
    LEFT JOIN (
        SELECT col_1
            ,col_2
            ,col_3
        FROM mytable
        GROUP BY col_1
            ,col_2
            ,col_3
        HAVING COUNT(*) >= 5
        ) y1 ON x1.col_1 = y1.col_1
        AND x1.col_2 = y1.col_2
        AND x1.col_3 = y1.col_3
    WHERE y1.col_1 IS NULL
        AND y1.col_2 IS NULL
        AND y1.col_3 IS NULL
    )
,Step2 AS (
    SELECT x1.*
    FROM Step1 x1
    LEFT JOIN (
        SELECT col_1
            ,col_2
        FROM Step1
        GROUP BY col_1
            ,col_2
        HAVING COUNT(*) >= 5
        ) y1 ON x1.col_1 = y1.col_1
        AND x1.col_2 = y1.col_2
    WHERE y1.col_1 IS NULL
        AND y1.col_2 IS NULL
    )
,Step3 AS (
    SELECT x1.*
    FROM Step2 x1
    LEFT JOIN (
        SELECT col_1
        FROM Step2
        GROUP BY col_1
        HAVING COUNT(*) >= 5
        ) y1 ON x1.col_1 = y1.col_1
    WHERE y1.col_1 IS NULL
    )
SELECT *
FROM Step3

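This query returns the correct result. However, once the table grows beyond roughly 17,000 rows, the SQL query hangs and times out.

Does anyone know what is going wrong, and can anyone suggest a better solution?

Update:

I found some answers at https://www.sqlshack.com/why-is-my-cte-so-slow/. After storing the result of the first two CTEs in a temp table, I was able to run the query and get results in 45 seconds.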

WITH Step1 AS (
        SELECT x1.*
        FROM mytable AS x1
        LEFT JOIN (
            SELECT col_1
                ,col_2
                ,col_3
            FROM mytable
            GROUP BY col_1
                ,col_2
                ,col_3
            HAVING COUNT(*) >= 5
            ) y1 ON x1.col_1 = y1.col_1
            AND x1.col_2 = y1.col_2
            AND x1.col_3 = y1.col_3
        WHERE y1.col_1 IS NULL
            AND y1.col_2 IS NULL
            AND y1.col_3 IS NULL
        )
    ,Step2 AS (
        SELECT x1.*
        FROM Step1 x1
        LEFT JOIN (
            SELECT col_1
                ,col_2
            FROM Step1
            GROUP BY col_1
                ,col_2
            HAVING COUNT(*) >= 5
            ) y1 ON x1.col_1 = y1.col_1
            AND x1.col_2 = y1.col_2
        WHERE y1.col_1 IS NULL
            AND y1.col_2 IS NULL
        )

select * into #CTE2 from step2 ;

WITH Step3 AS (
        SELECT x1.*
        FROM #CTE2 x1
        LEFT JOIN (
            SELECT col_1
            FROM #CTE2
            GROUP BY col_1
            HAVING COUNT(*) >= 5
            ) y1 ON x1.col_1 = y1.col_1
        WHERE y1.col_1 IS NULL
        )
SELECT *
    FROM Step3 ;

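But this does mean it is no longer a single SQL query.

1 Answer:

Answer 0 (score: 0)

Your requirements are not clear at all. But since you say your query gives the correct results and your actual problem is only performance, I would start by replacing those LEFT JOINs with EXISTS plus HAVING, so that each step selects the rows you want to return rather than the rows you want to discard...

The next step is to check whether the table is indexed properly.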
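
As a rough illustration of the indexing check, one option is a composite index whose key order matches the grouping levels. This is only a sketch: the index name is hypothetical, it assumes the three columns have indexable types (not `varchar(max)` or similar), and whether it helps depends on the actual execution plan.

```sql
-- Hypothetical index name; mytable, col_1, col_2, col_3 come from the question.
-- Key order (col_1, col_2, col_3) lets the same index support grouping
-- by (col_1), (col_1, col_2) and (col_1, col_2, col_3).
CREATE NONCLUSTERED INDEX IX_mytable_col1_col2_col3
    ON mytable (col_1, col_2, col_3);
```

The EXISTS rewrite described above follows.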

 ;WITH 
    Step1 AS ( 
        SELECT * 
        FROM MyTable S1
        WHERE EXISTS (
        SELECT 1
        FROM MyTable 
        WHERE COL_1 = S1.COL_1 AND COL_2 = S1.COL_2 AND COL_3 = S1.COL_3 
        GROUP BY COL_1, COL_2, COL_3
        HAVING COUNT(*) < 5 )  
    ) , 
    Step2 AS 
    ( 
        SELECT * 
        FROM Step1 S1
        WHERE EXISTS (
        SELECT 1
        FROM Step1 
        WHERE COL_1 = S1.COL_1 AND COL_2 = S1.COL_2
        GROUP BY COL_1,COL_2 
        HAVING COUNT(*) < 5 )
    ) , 
    Step3 AS 
    ( 
        SELECT * 
        FROM Step2 S2
        WHERE EXISTS (
        SELECT 1
        FROM Step2 
        WHERE COL_1 = S2.COL_1
        GROUP BY COL_1 
        HAVING COUNT(*) < 5 )
    ) 
    SELECT * FROM Step3
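
A different technique from the one above, offered only as a sketch and not measured against the asker's data: each step can be written with a window count instead of a self-join, so every CTE is referenced once. It assumes the table has just the three columns shown (add any others to the select lists) and that none of them is NULL. `PARTITION BY` groups NULLs together, whereas the `=` comparisons in the original join never match them, so results could differ on NULL keys.

```sql
WITH Step1 AS (
    -- Count rows per (col_1, col_2, col_3) without a self-join.
    SELECT col_1, col_2, col_3,
           COUNT(*) OVER (PARTITION BY col_1, col_2, col_3) AS cnt3
    FROM mytable
)
,Step2 AS (
    -- WHERE is applied before the window function, so cnt2 counts
    -- only the rows that survived the finest-level check.
    SELECT col_1, col_2, col_3,
           COUNT(*) OVER (PARTITION BY col_1, col_2) AS cnt2
    FROM Step1
    WHERE cnt3 < 5
)
,Step3 AS (
    SELECT col_1, col_2, col_3,
           COUNT(*) OVER (PARTITION BY col_1) AS cnt1
    FROM Step2
    WHERE cnt2 < 5
)
SELECT col_1, col_2, col_3
FROM Step3
WHERE cnt1 < 5;
```

On the sample data in the question, the 13 `ZZZZZZ` rows are dropped at the (col_1, col_2) level and the three bold rows remain, because `CH` then has only 3 surviving rows.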