SQL performance with ROW_NUMBER and a dynamic ORDER BY

Date: 2013-04-04 15:36:18

Tags: sql-server

I realise this is a common question that has been discussed on SO previously, but I thought I would raise it again in the hope of finding some viable alternatives.

Take the following SQL, which combines paging with dynamic ordering:

WITH CTE AS (
  SELECT 
      OrderID,
      ROW_NUMBER() OVER (ORDER BY
        CASE WHEN @SortCol='OrderID' THEN OrderID END ASC,
        CASE WHEN @SortCol='CustomerName' THEN Surname END ASC 
      ) AS ROW_ID
  FROM Orders WHERE X
)

SELECT Orders.* FROM CTE
INNER JOIN Orders ON CTE.OrderID = Orders.OrderID
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1;

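It is well known that the ROW_NUMBER() approach does not perform well on large tables, because the table's indexes cannot be used properly when the ORDER BY clause contains multiple CASE expressions (see link).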

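The solution we have used for years is to build a string and execute it with sp_executesql. Performance is good with dynamic SQL like this, but the resulting code is awful from a readability point of view.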
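For reference, the dynamic SQL approach described above might look roughly like the sketch below. It is an illustration under assumptions, not the code actually in use: the parameter values are made up, the WHERE X filter from the question is omitted, and the sort column is mapped through a whitelist and QUOTENAME so that only known column names reach the string.

```sql
-- Sketch only: Orders, OrderID and Surname come from the question;
-- the parameter values and the whitelist mapping are illustrative assumptions.
DECLARE @SortCol  sysname = N'CustomerName',
        @RowStart int     = 1,
        @RowCount int     = 20;

-- Map the requested sort key to a real column; anything else yields NULL.
DECLARE @OrderBy sysname =
    CASE @SortCol
        WHEN N'OrderID'      THEN N'OrderID'
        WHEN N'CustomerName' THEN N'Surname'
    END;

IF @OrderBy IS NULL
BEGIN
    RAISERROR('Unsupported sort column.', 16, 1);
    RETURN;
END;

-- Only the column name is concatenated; the paging values stay parameters.
DECLARE @sql nvarchar(max) = N'
WITH CTE AS (
    SELECT OrderID,
           ROW_NUMBER() OVER (ORDER BY ' + QUOTENAME(@OrderBy) + N' ASC) AS ROW_ID
    FROM Orders
)
SELECT o.*
FROM CTE
INNER JOIN Orders AS o ON o.OrderID = CTE.OrderID
WHERE CTE.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1
ORDER BY CTE.ROW_ID;';

EXEC sys.sp_executesql
     @sql,
     N'@RowStart int, @RowCount int',
     @RowStart = @RowStart,
     @RowCount = @RowCount;
```

Because each distinct sort column produces a different statement text, each one gets its own cached plan, which is presumably why this variant performs well.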

I have heard of the ROWCOUNT method, but as far as I can tell it is still susceptible to the same problem once you introduce a dynamic ORDER BY.
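For readers unfamiliar with it, the ROWCOUNT method is usually written along the lines of the sketch below. This is my understanding of the technique rather than a quote from any particular source. It assumes OrderID is a unique integer key and a fixed sort on that key, and it relies on the variable keeping the value of the last row read, which is widely used but not something the documentation guarantees.

```sql
-- Sketch only: fixed sort on OrderID; parameter values are illustrative.
DECLARE @RowStart     int = 41,
        @RowCount     int = 20,
        @FirstOrderID int;

-- Step 1: read only as far as the first row of the page and remember its key.
SET ROWCOUNT @RowStart;
SELECT @FirstOrderID = OrderID
FROM Orders
ORDER BY OrderID;

-- Step 2: return one page starting at that key.
SET ROWCOUNT @RowCount;
SELECT *
FROM Orders
WHERE OrderID >= @FirstOrderID
ORDER BY OrderID;

-- Always reset, otherwise later statements in the session stay limited.
SET ROWCOUNT 0;
```

The seek in step 2 depends on knowing the sort column up front, which is why a dynamic ORDER BY brings back the original problem.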

So, at the risk of asking the impossible, are there any other options?

Edit

To make some useful progress here, I have put together queries highlighting the various suggested approaches:

  1. Current dynamic SQL solution (execution time 147 ms)

  2. gbn's solution (execution time 1687 ms)

  3. Anders's solution (execution time 1604 ms)

  4. Muhmud's solution (execution time 46 ms)

6 answers:

Answer 0 (score: 4)

How about this:

WITH data as (
    SELECT OrderID,
            ROW_NUMBER() OVER ( ORDER BY OrderID asc) as OrderID_ROW_ID,
            ROW_NUMBER() OVER ( ORDER BY Surname asc) as Surname_ROW_ID
        FROM Orders --WHERE X   
), CTE AS (               
        SELECT OrderID, OrderID_ROW_ID as ROW_ID
        FROM data
        where @SortCol = 'OrderID'

        union all

        SELECT OrderID, Surname_ROW_ID
        FROM data
        where @SortCol = 'Surname'
)
SELECT Orders.*, ROW_ID FROM CTE
INNER JOIN Orders ON CTE.OrderID = Orders.OrderID       
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1
order by ROW_ID
option (recompile);

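Edit: Using option (recompile) on the example query in the post makes it run a lot faster. However, CASE cannot be used in quite the same way to choose between ascending and descending order.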

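The reason is that a plan is generated for an inappropriate variable value and then cached. Forcing a recompile lets the optimizer use the actual values of the variables.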
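One way to cover sort direction with the same pattern is to add a branch per column and direction instead of a CASE. The following is a minimal sketch, assuming a hypothetical @SortDir parameter that holds 'ASC' or 'DESC'. With option (recompile) the optimizer should be able to discard the branches whose predicate is false for the current values, though that is worth confirming in the actual plan.

```sql
-- Sketch only: @SortDir and the parameter values are illustrative assumptions.
DECLARE @SortCol  varchar(30) = 'Surname',
        @SortDir  varchar(4)  = 'DESC',
        @RowStart int         = 1,
        @RowCount int         = 20;

WITH CTE AS (
    SELECT OrderID, ROW_NUMBER() OVER (ORDER BY OrderID ASC) AS ROW_ID
    FROM Orders
    WHERE @SortCol = 'OrderID' AND @SortDir = 'ASC'

    UNION ALL

    SELECT OrderID, ROW_NUMBER() OVER (ORDER BY OrderID DESC)
    FROM Orders
    WHERE @SortCol = 'OrderID' AND @SortDir = 'DESC'

    UNION ALL

    SELECT OrderID, ROW_NUMBER() OVER (ORDER BY Surname ASC)
    FROM Orders
    WHERE @SortCol = 'Surname' AND @SortDir = 'ASC'

    UNION ALL

    SELECT OrderID, ROW_NUMBER() OVER (ORDER BY Surname DESC)
    FROM Orders
    WHERE @SortCol = 'Surname' AND @SortDir = 'DESC'
)
SELECT o.*, CTE.ROW_ID
FROM CTE
INNER JOIN Orders AS o ON o.OrderID = CTE.OrderID
WHERE CTE.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1
ORDER BY CTE.ROW_ID
OPTION (RECOMPILE);
```

The number of branches grows with columns times directions, so this stays manageable only for a handful of sortable columns.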

Answer 1 (score: 2)

(For the editor)

DECLARE 
      @OrderColumnName SYSNAME
    , @RowStart INT
    , @RowCount INT
    , @TopCount INT

SELECT 
      @OrderColumnName = 'EmployeeID'
    , @RowStart = 5
    , @RowCount = 50
    , @TopCount = @RowStart + @RowCount - 1

@muhmud's solution:

; WITH data AS 
(
    SELECT 
          wo.WorkOutID
        , RowIDByEmployee = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID)
        , RowIDByDateOut = ROW_NUMBER() OVER (ORDER BY wo.DateOut)  
    FROM dbo.WorkOut wo
), CTE AS 
(         
    SELECT
          wo.WorkOutID
        , RowID = RowIDByEmployee 
    FROM data wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT
          wo.WorkOutID
        , RowID = RowIDByDateOut
    FROM data wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID BETWEEN @RowStart AND @RowCount + @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14254, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1295 ms,  elapsed time = 3048 ms.

Solution without the data CTE:

;WITH CTE AS 
(         
    SELECT
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.DateOut) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID BETWEEN @RowStart AND @RowCount + @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14254, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1296 ms,  elapsed time = 3049 ms.

TOP solution:

;WITH CTE AS 
(         
    SELECT TOP (@TopCount)
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT TOP (@TopCount)
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.DateOut) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID > @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14246, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1248 ms,  elapsed time = 2864 ms.


Answer 2 (score: 1)

Try this. It should make use of the indexes you have.

WITH CTE AS (
    SELECT
        Orders.*,
        ROW_NUMBER() OVER (ORDER BY OrderID) AS rnOrderID,
        ROW_NUMBER() OVER (ORDER BY Surname) AS rnSurname
    FROM Orders WHERE X
)
SELECT CTE.*
FROM CTE
WHERE
    CASE @SortCol
        WHEN 'OrderID' THEN rnOrderID
        WHEN 'CustomerName' THEN rnSurname  -- 'CustomerName' sorts by Surname, as in the question
    END BETWEEN @RowStart AND @RowStart + @RowCount - 1;

However, for large data sets (100,000 rows or more) there are other techniques, such as http://www.4guysfromrolla.com/webtech/042606-1.shtml

Answer 3 (score: 1)

I would try something like this:

WITH CTEOrder AS (
  SELECT 
     OrderID,
     ROW_NUMBER() OVER (ORDER BY OrderID ASC) AS ROW_ID
  FROM Orders 
)
, CTECustomerName AS (
  SELECT 
     OrderID,
     ROW_NUMBER() OVER (ORDER BY Surname ASC) AS ROW_ID
  FROM Orders 
)
, CTECombined AS
(
  SELECT 'OrderID' OrderByType, OrderID, Row_ID
    FROM CTEOrder
   WHERE Row_id BETWEEN @RowStart AND @RowStart + @RowCount -1
  UNION
  SELECT 'CustomerName' OrderByType, OrderID, Row_ID
    FROM CTECustomerName
   WHERE row_id BETWEEN @RowStart AND @RowStart + @RowCount -1
)
SELECT Orders.* FROM CTECombined
INNER JOIN Orders ON CTECombined.OrderID = Orders.OrderID
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1
  AND OrderByType = @SortCol;

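I tried this against one of my own tables, which has approx. 4 million records. Obviously it has different field names, so I apologise if I have not "translated" it correctly in the answer and the SQL does not work for you. The idea should be clear, though.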

With the code from your question I get 200,000 logical reads and 6068 ms CPU on my table; with the code above I get 1422 logical reads and 78 ms CPU.

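I did not flush the caches or do anything else a real benchmark requires, but I did try different pages, page sizes and so on, and my results were consistent from the start.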

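If your queries can be ordered by many different fields, this solution may not scale well, because you have to add a CTE for each field. If you are building the SQL in code anyway, you can generate them there. For the example of sorting on two different fields, it worked like a charm for me.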

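Edit: Come to think of it, you may not need a separate CTE for each ORDER BY column; you could probably have a single CTE that does both the ROW_NUMBER() and the UNION. The principle is the same, and I think the optimizer will end up doing the same thing, but I have not benchmarked this yet. I will update the answer if I get time to verify it.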

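Edit 2: As expected, you can do the UNION in a single CTE. I am not going to update the code, but I will provide some benchmarks for it. I used a page size of 10,000 rows to see whether that made a big difference, and it did not. (Both cusimar9's code and mine were run twice: the cold start used the same parameters for both versions of the code, and the second run used different parameters, again the same for both versions):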

cusimar9's code, cold start:

Table 'TestTable'. Scan count 10009, logical reads 43080, physical reads 189, read-ahead reads 12915, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 3037 ms,  elapsed time = 2206 ms.

cusimar9's code, second run, different parameters:

Table 'TestTable'. Scan count 10009, logical reads 43096, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 4132 ms,  elapsed time = 1012 ms.

My suggestion, cold start:

Table 'TestTable'. Scan count 10001, logical reads 31963, physical reads 12, read-ahead reads 6984, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 218 ms,  elapsed time = 1410 ms.

My suggestion, second run:

Table 'TestTable'. Scan count 10001, logical reads 31963, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 218 ms,  elapsed time = 358 ms.

Edit 3: Looking at the code you posted, I noticed this construct:

PagedCTE AS (
    SELECT (SELECT Max(ROW_ID) FROM OrderByCTE) AS TOTAL_ROWS, OrderID 
    FROM OrderByCTE             
    WHERE
        OrderByCTE.SortCol = @SortCol AND OrderByCTE.SortDir = @SortDir AND
        OrderByCTE.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1
)

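I am not 100% sure what this is for, but I guess you want to return the total number of rows so that you can calculate (and show the user) how far into the record set we are. So why not get that number outside all the ORDER BY noise? To stick with the current style, why not make a CTE with SELECT COUNT(*) and then join it in your final select?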
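A rough sketch of that suggestion, with a separate CTE for the count that is cross joined into the final select. The CTE names Total and Paged are made up for the illustration, the sort is fixed on OrderID to keep it short, and any filter applied to the paged rows would have to be repeated in the count.

```sql
-- Sketch only: Total and Paged are hypothetical names; values are illustrative.
DECLARE @RowStart int = 1,
        @RowCount int = 20;

WITH Total AS (
    -- The count is computed once, independent of any ordering.
    SELECT COUNT(*) AS TOTAL_ROWS
    FROM Orders
),
Paged AS (
    SELECT OrderID,
           ROW_NUMBER() OVER (ORDER BY OrderID ASC) AS ROW_ID
    FROM Orders
)
SELECT o.*, p.ROW_ID, t.TOTAL_ROWS
FROM Paged AS p
INNER JOIN Orders AS o ON o.OrderID = p.OrderID
CROSS JOIN Total AS t
WHERE p.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1
ORDER BY p.ROW_ID;
```

Note that a page beyond the end of the data returns no rows at all, so the total is not available in that case.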

Answer 4 (score: 0)

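This is a case where the procedural mindset fails for stored procedures. By parameterizing a property that the optimizer relies on, you stop the optimizer from doing its job.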

In this case I would use 2 stored procedures.
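As a sketch of what that could look like, here is one procedure per sort order, each with a fixed ORDER BY so it gets its own plan. The procedure names are hypothetical, and GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Sketch only: procedure names are illustrative assumptions.
CREATE PROCEDURE dbo.GetOrdersPageByOrderID
    @RowStart int,
    @RowCount int
AS
BEGIN
    SET NOCOUNT ON;

    WITH CTE AS (
        SELECT OrderID,
               ROW_NUMBER() OVER (ORDER BY OrderID ASC) AS ROW_ID
        FROM Orders
    )
    SELECT o.*
    FROM CTE
    INNER JOIN Orders AS o ON o.OrderID = CTE.OrderID
    WHERE CTE.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1
    ORDER BY CTE.ROW_ID;
END;
GO

CREATE PROCEDURE dbo.GetOrdersPageBySurname
    @RowStart int,
    @RowCount int
AS
BEGIN
    SET NOCOUNT ON;

    WITH CTE AS (
        SELECT OrderID,
               ROW_NUMBER() OVER (ORDER BY Surname ASC) AS ROW_ID
        FROM Orders
    )
    SELECT o.*
    FROM CTE
    INNER JOIN Orders AS o ON o.OrderID = CTE.OrderID
    WHERE CTE.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1
    ORDER BY CTE.ROW_ID;
END;
GO
```

The caller then picks the procedure that matches the requested sort, for example EXEC dbo.GetOrdersPageBySurname @RowStart = 1, @RowCount = 20;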

If you really are set on using a parameter, then:

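-- ROW_ID cannot be filtered at the level where it is defined, so wrap it in a derived table.
DECLARE @strSQL nvarchar(1000) =
    N'SELECT OrderID, ROW_ID FROM ('
    + N'SELECT OrderID, ROW_NUMBER() OVER (ORDER BY '
    + QUOTENAME(@SortCol) +
    N' ASC) AS ROW_ID FROM Orders WHERE X) AS t'
    + N' WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount - 1;';

-- EXECUTE (@strSQL) cannot see the outer variables; sp_executesql passes them in.
EXECUTE sp_executesql @strSQL, N'@RowStart int, @RowCount int', @RowStart, @RowCount;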

Answer 5 (score: 0)

Without a CTE: generating the row number when there is a dynamic ORDER BY condition

select TotalCount = COUNT(U.UnitID) OVER() ,

ROW_NUMBER() over(
    order by 
     (CASE @OrderBy WHEN '1' THEN m.Title  END) ASC ,
     (CASE @OrderBy WHEN '2' THEN m.Title  END) DESC,
     (CASE @OrderBy WHEN '3' THEN Stock.Stock  END) DESC,
     (CASE @OrderBy WHEN '4' THEN Stock.Stock   END) DESC

) as RowNumber,

M.Title,U.ColorCode,U.ColorName,U.UnitID, ISNULL(Stock.Stock,0) as Stock

 from tblBuyOnlineMaster M

    inner join BuyOnlineProductUnitIn U on U.BuyOnlineID=M.BuyOnlineID 

    left join 
            ( select IT.BuyOnlineID,IT.UnitID,ISNULL(sum(IT.UnitIn),0)-ISNULL(sum(IT.UnitOut),0) as Stock 
                from [dbo].[BuyOnlineItemTransaction] IT 
                group by IT.BuyOnlineID,IT.UnitID
             ) as Stock

        on U.UnitID=Stock.UnitID


order by 
 (CASE @OrderBy WHEN '1' THEN m.Title  END) ASC ,
 (CASE @OrderBy WHEN '2' THEN m.Title  END) DESC,
 (CASE @OrderBy WHEN '3' THEN Stock.Stock  END) DESC,
 (CASE @OrderBy WHEN '4' THEN Stock.Stock   END) DESC


offset  @offsetCount rows fetch next 6 rows only