Converting a CTE into a better query

Time: 2014-07-01 10:16:37

Tags: sql sql-server sql-server-2005 common-table-expression database-performance

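I have a CTE query that updates a single column in a table of about 2.5 million rows. I let the query run, and it took about 16 hours! How can I change this so that it runs faster? I read that using SELECT INTO to create a new table should be a better approach, but I don't know how to convert this CTE into a SELECT INTO.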

WITH CubeWithRowNumber
AS (
    SELECT rownum = ROW_NUMBER() OVER (
            ORDER BY CustomerId,
                Period
            ),
        c.Period,
        c.CustomerId,
        c.PayDate,
        NS_Regular,
        NS_Single,
        NySales
    FROM Cube2 c
    )
UPDATE Cube2
SET MonthlySales = (
    SELECT 
       CASE 
         WHEN YEAR(cu.Period) = YEAR(cu.PayDate)
             THEN cu.NySales
         ELSE 
           CASE 
             WHEN prev.Period IS NULL 
               OR YEAR(cu.Period) <> YEAR(prev.Period)
                 THEN cu.NS_Regular + cu.NS_Single
             ELSE cu.NS_Regular + cu.NS_Single - prev.NS_Regular - prev.NS_Single
           END
         END AS Result
     FROM CubeWithRowNumber cu
     LEFT JOIN CubeWithRowNumber prev
         ON prev.rownum = cu.rownum - 1
             AND cu.CustomerId = prev.CustomerId
     WHERE cu.CustomerId = Cube2.CustomerId
         AND cu.Period = Cube2.Period)

3 answers:

Answer 0 (score: 2)

I don't think you need to reference the table three times. Your CTE is updatable, so I think the following is equivalent:

WITH CubeWithRowNumber AS (
      SELECT c.*,
             rownum = ROW_NUMBER() OVER (ORDER BY CustomerId, Period)
      FROM Cube2 c
     )
-- Update through the CTE. The LEFT JOIN keeps each customer's first period,
-- which has no previous row (a correlated subquery would set it to NULL).
UPDATE crn
     SET MonthlySales = CASE WHEN YEAR(crn.Period) = YEAR(crn.PayDate)
                             THEN crn.NySales
                             WHEN prev.Period IS NULL OR YEAR(crn.Period) <> YEAR(prev.Period)
                             THEN crn.NS_Regular + crn.NS_Single
                             ELSE crn.NS_Regular + crn.NS_Single - prev.NS_Regular - prev.NS_Single
                        END
FROM CubeWithRowNumber crn
LEFT JOIN CubeWithRowNumber prev
       ON prev.rownum = crn.rownum - 1
      AND prev.CustomerId = crn.CustomerId;

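There may be further optimizations along these lines, but if you are using a more recent version of SQL Server (2012 or later), the LAG() function would be a better choice.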
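On SQL Server 2012 or later, a LAG-based version might look like the sketch below. It is not usable on SQL Server 2005, which the question is tagged with. PARTITION BY CustomerId replaces the "rownum - 1 and same customer" self-join, so the table is read once. The sketch makes some assumptions:
- `#Cube2` and its rows are hypothetical sample data standing in for the real `Cube2`.
- The column types are guesses.
- `(CustomerId, Period)` is assumed to be unique.

Subtracting the lagged `NS_Regular + NS_Single` is the same arithmetic as subtracting `prev.NS_Regular` and `prev.NS_Single` separately.

```sql
-- Hypothetical stand-in for Cube2; column types are assumptions.
IF OBJECT_ID('tempdb..#Cube2') IS NOT NULL DROP TABLE #Cube2;
CREATE TABLE #Cube2 (
    CustomerId   INT           NOT NULL,
    Period       DATETIME      NOT NULL,
    PayDate      DATETIME      NOT NULL,
    NySales      DECIMAL(18,2) NULL,
    NS_Regular   DECIMAL(18,2) NULL,
    NS_Single    DECIMAL(18,2) NULL,
    MonthlySales DECIMAL(18,2) NULL,
    PRIMARY KEY (CustomerId, Period)  -- assumes one row per customer and period
);

INSERT INTO #Cube2 (CustomerId, Period, PayDate, NySales, NS_Regular, NS_Single)
VALUES (1, '20131101', '20120501', 999,  10,  5),
       (1, '20131201', '20120501', 999,  25, 10),
       (1, '20140101', '20120501', 999,   7,  3),
       (1, '20140201', '20140115',  42,   1,  1),
       (2, '20140301', '20130101', 999, 100,  0);

-- LAG (SQL Server 2012+) reads the previous period of the same customer
-- directly, so no self-join on a row number is needed.
WITH Ordered AS (
    SELECT MonthlySales, Period, PayDate, NySales, NS_Regular, NS_Single,
           LAG(Period) OVER (PARTITION BY CustomerId ORDER BY Period) AS PrevPeriod,
           LAG(NS_Regular + NS_Single) OVER (PARTITION BY CustomerId ORDER BY Period) AS PrevTotal
    FROM #Cube2
)
UPDATE Ordered
SET MonthlySales = CASE
        WHEN YEAR(Period) = YEAR(PayDate) THEN NySales
        WHEN PrevPeriod IS NULL OR YEAR(Period) <> YEAR(PrevPeriod) THEN NS_Regular + NS_Single
        ELSE NS_Regular + NS_Single - PrevTotal
    END;

SELECT CustomerId, Period, MonthlySales FROM #Cube2 ORDER BY CustomerId, Period;
```

For the five sample rows, the final SELECT should show MonthlySales values of 15, 20, 10, 42 and 100, in (CustomerId, Period) order. To run it against the real table, replace `#Cube2` with `Cube2` in the CTE and drop the setup.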

Answer 1 (score: 0)

Try inserting the data into a #temp table:

SELECT ROW_NUMBER() OVER (ORDER BY  CustomerId, Period) as rownum,    
       c.Period, c.CustomerId, c.PayDate, NS_Regular, NS_Single, NySales
INTO #tmp_Cube
FROM Cube2 c

Then use it in the update:

UPDATE Cube2
SET MonthlySales=
  ( SELECT CASE
               WHEN YEAR(cu.Period)=YEAR(cu.PayDate) 
                    THEN cu.NySales
               ELSE CASE
                        WHEN prev.Period IS NULL
                             OR YEAR(cu.Period)<>YEAR(prev.Period) 
                        THEN cu.NS_Regular + cu.NS_Single
                        ELSE cu.NS_Regular + cu.NS_Single - prev.NS_Regular - prev.NS_Single
                    END
           END AS RESULT
   FROM #tmp_Cube cu
   LEFT JOIN #tmp_Cube prev ON prev.rownum = cu.rownum - 1
   AND cu.CustomerId = prev.CustomerId
   WHERE cu.CustomerId=Cube2.CustomerId
     AND cu.Period=Cube2.Period)

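Some of the delay may come from using a CTE over such a large amount of data. You may still see some delay with my solution, though hopefully less. It still joins the #temp table to itself with FROM #tmp_Cube cu LEFT JOIN #tmp_Cube prev ON prev.rownum = cu.rownum - 1, which costs performance, and that cost depends on the number of rows you are working with.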
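If you take the temp-table route, indexing #tmp_Cube after the SELECT ... INTO is one way to make the self-join and the (CustomerId, Period) lookup cheaper. This sketch is an assumption, not part of the original answer:
- The index names are made up.
- It presumes (CustomerId, Period) is unique. The UPDATE's correlated subquery already relies on this, since it would otherwise return more than one value.

```sql
-- Run after the SELECT ... INTO #tmp_Cube shown above.
-- Index names are hypothetical.

-- Supports the lookup WHERE cu.CustomerId = Cube2.CustomerId AND cu.Period = Cube2.Period.
CREATE UNIQUE CLUSTERED INDEX IX_tmp_Cube_Customer_Period
    ON #tmp_Cube (CustomerId, Period);

-- Supports the self-join ON prev.rownum = cu.rownum - 1 (ROW_NUMBER values are unique).
CREATE UNIQUE NONCLUSTERED INDEX IX_tmp_Cube_rownum
    ON #tmp_Cube (rownum);
```

Building the indexes costs time too, so it is worth comparing both variants on a subset of the data.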

Read this answer:

What's the difference between a CTE and a Temp Table?

Quoting that answer:

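  As far as when to use each, they have very different use cases. If you will have a very large result set, or need to refer to it more than once, put it in a #temp table. If it needs to be recursive, is disposable, or is just to simplify something logically, a CTE is preferred.

  Also, a CTE should never be used for performance. You will almost never speed things up by using a CTE, because, again, it's just a disposable view. You can do some neat things with them, but speeding up a query isn't really one of them.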

Answer 2 (score: 0)

You can avoid referencing the CTE twice by restructuring the query:

UPDATE cu SET
  MonthlySales = CASE WHEN YEAR(cu.Period) = YEAR(cu.PayDate) 
                           THEN cu.NySales
                      WHEN YEAR(cu.Period) <> YEAR(COALESCE(prev.Period, 0)) 
                           THEN cu.NS_Regular + cu.NS_Single
                      ELSE cu.NS_Regular + cu.NS_Single
                         - prev.NS_Regular - prev.NS_Single
                 END
FROM Cube2 cu
     CROSS APPLY (SELECT TOP 1 Period, NS_Regular, NS_Single 
                  FROM   cube2
                  WHERE  cu.CustomerId = cube2.CustomerId
                    AND  cu.Period > cube2.Period
                  ORDER BY Period Desc) prev;

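This can improve performance if it is supported by an index on CustomerId and Period. However, the ORDER BY it introduces is somewhat expensive, so you may want to test it on a reduced data set first.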
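If Cube2 does not already have such an index (for example, a primary key on (CustomerId, Period)), a covering index along these lines is one way to provide it. The index name is made up. With CustomerId and Period as the key, the TOP 1 ... ORDER BY Period DESC lookup can typically be answered by a seek rather than a sort. INCLUDE covers the other columns the APPLY subquery returns.

```sql
-- Hypothetical index name; check existing indexes on Cube2 first.
CREATE NONCLUSTERED INDEX IX_Cube2_CustomerId_Period
    ON Cube2 (CustomerId, Period)      -- seek + ordered read for the previous period
    INCLUDE (NS_Regular, NS_Single);   -- covers the columns returned by the APPLY
```

INCLUDE columns are available from SQL Server 2005, so this matches the version in the question's tags.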

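Another small problem is that CROSS APPLY behaves like an INNER JOIN. The first period of each customer has no previous period, so those rows are not updated at all. There are two ways to fix this:
- Change CROSS APPLY to OUTER APPLY (the APPLY counterpart of a LEFT JOIN), but that would hurt performance.
- Create the missing values out of nothing by wrapping the columns in an aggregate function. If the row has values, they stay unchanged. If the subquery is empty, MAX (or MIN, or AVG, your choice) still produces a row, because the MAX of a table with no rows is NULL, which COALESCE then turns into 0.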
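The effect of the aggregate can be seen on a tiny sample. #ApplyDemo, #PlainApply and #AggApply are hypothetical demo tables. The plain CROSS APPLY keeps only the period that has a predecessor. The MAX-wrapped version keeps both periods and substitutes 0 for the missing predecessor.

```sql
-- Hypothetical demo tables; drop leftovers from a previous run.
IF OBJECT_ID('tempdb..#ApplyDemo') IS NOT NULL DROP TABLE #ApplyDemo;
IF OBJECT_ID('tempdb..#PlainApply') IS NOT NULL DROP TABLE #PlainApply;
IF OBJECT_ID('tempdb..#AggApply') IS NOT NULL DROP TABLE #AggApply;

-- One customer with two consecutive periods.
CREATE TABLE #ApplyDemo (CustomerId INT, Period DATETIME, NS_Regular DECIMAL(18,2));
INSERT INTO #ApplyDemo (CustomerId, Period, NS_Regular)
SELECT 1, '20140101', 10 UNION ALL
SELECT 1, '20140201', 30;

-- Plain CROSS APPLY: the first period has no predecessor, so its row is dropped.
SELECT cu.Period, prev.NS_Regular AS PrevNS_Regular
INTO #PlainApply
FROM #ApplyDemo cu
CROSS APPLY (SELECT TOP 1 p.NS_Regular
             FROM #ApplyDemo p
             WHERE p.CustomerId = cu.CustomerId AND p.Period < cu.Period
             ORDER BY p.Period DESC) prev;

-- MAX over the (possibly empty) TOP 1 always yields exactly one row;
-- COALESCE turns the NULL produced by an empty set into 0.
SELECT cu.Period, prev.NS_Regular AS PrevNS_Regular
INTO #AggApply
FROM #ApplyDemo cu
CROSS APPLY (SELECT COALESCE(MAX(a.NS_Regular), 0) AS NS_Regular
             FROM (SELECT TOP 1 p.NS_Regular
                   FROM #ApplyDemo p
                   WHERE p.CustomerId = cu.CustomerId AND p.Period < cu.Period
                   ORDER BY p.Period DESC) a) prev;

SELECT * FROM #PlainApply;                -- only the 2014-02-01 row
SELECT * FROM #AggApply ORDER BY Period;  -- 2014-01-01 gets 0, 2014-02-01 gets 10
```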

The revised UPDATE is:

UPDATE cu SET
  MonthlySales = CASE WHEN YEAR(cu.Period) = YEAR(cu.PayDate) 
                           THEN cu.NySales
                      WHEN YEAR(cu.Period) <> YEAR(COALESCE(prev.Period, 0)) 
                           THEN cu.NS_Regular + cu.NS_Single
                      ELSE cu.NS_Regular + cu.NS_Single
                         - prev.NS_Regular - prev.NS_Single
                 END
FROM Cube2 cu
     CROSS APPLY (SELECT COALESCE(MAX(Period), 0) Period
                       , COALESCE(MAX(NS_Regular), 0) NS_Regular
                       , COALESCE(MAX(NS_Single), 0) NS_Single
                  FROM   (SELECT TOP 1 Period, NS_Regular, NS_Single 
                          FROM   cube2
                          WHERE  cu.CustomerId = cube2.CustomerId
                            AND  cu.Period > cube2.Period
                          ORDER BY Period Desc) a
                  ) prev;

The grouping adds some extra work, but hopefully not much.

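Sometimes converting the CASE logic into arithmetic operators improves performance further. Besides not always working, though, it makes the query less readable.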

If you want to try it, here is the converted version:

UPDATE cu SET
  MonthlySales 
  = cu.NySales * (1 - CAST((YEAR(cu.Period) - YEAR(cu.PayDate)) as BIT))
  + (cu.NS_Regular + cu.NS_Single)
  * (0 + CAST(YEAR(cu.Period) - YEAR(COALESCE(prev.Period, 0)) as BIT))
  * (0 + CAST((YEAR(cu.Period) - YEAR(cu.PayDate)) as BIT))
  + (cu.NS_Regular + cu.NS_Single - prev.NS_Regular - prev.NS_Single)
  * (1 - CAST(YEAR(cu.Period) - YEAR(COALESCE(prev.Period, 0)) as BIT))
  * (0 + CAST((YEAR(cu.Period) - YEAR(cu.PayDate)) as BIT))
FROM Cube2 cu
     CROSS APPLY (SELECT COALESCE(MAX(Period), 0) Period
                       , COALESCE(MAX(NS_Regular), 0) NS_Regular
                       , COALESCE(MAX(NS_Single), 0) NS_Single
                  FROM   (SELECT TOP 1 Period, NS_Regular, NS_Single 
                          FROM   cube2
                          WHERE  cu.CustomerId = cube2.CustomerId
                            AND  cu.Period > cube2.Period
                          ORDER BY Period Desc) a
                  ) prev;
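The converted version relies on CAST(<integer> AS BIT) turning any non-zero value into 1 and zero into 0. So (0 + flag) acts as a "years differ" switch and (1 - flag) as a "years match" switch. The quick check below isolates those switches; #BitDemo is a throwaway name.

```sql
IF OBJECT_ID('tempdb..#BitDemo') IS NOT NULL DROP TABLE #BitDemo;

-- Converting to BIT maps any non-zero value (including negatives) to 1.
SELECT 0 + CAST(2014 - 2013 AS BIT) AS LaterYearDiffers,    -- 1
       0 + CAST(2013 - 2014 AS BIT) AS EarlierYearDiffers,  -- 1
       1 - CAST(2014 - 2014 AS BIT) AS SameYearMatches,     -- 1
       0 + CAST(2014 - 2014 AS BIT) AS SameYearDiffers      -- 0
INTO #BitDemo;

SELECT * FROM #BitDemo;
```

One difference from the CASE form is worth checking before relying on the arithmetic. NULL * 0 is still NULL, so a NULL in NySales or the NS_ columns makes the whole sum NULL even when its switch is 0. CASE returns only the chosen branch's value.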