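CTE with recursion - row_number() aggregation

Time: 2014-02-05 07:08:46

Tags: sql sql-server tsql recursion

I have a self-referencing table with a parent/child relationship and a date-created column. I want to display each parent record together with all of its descendants, ordered by the most recent "activity" anywhere in that tree. So if row 1, created long ago, gets a new child (or a new child is added to one of its children), I want it at the top of the results.

I can't get this working at the moment.

My table structure is as follows: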

CREATE TABLE [dbo].[Orders](
    [OrderId] [int] NOT NULL,
    [Orders_OrderId] [int] NULL,
    [DateOrdered] datetime)
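The SQLFiddle links may no longer resolve, so here is a small sample data set for the table above. The rows are hypothetical, invented for illustration and not taken from the fiddle. A temp table #Orders stands in for dbo.Orders, and the ids increase with DateOrdered, as an identity column would. Order 1 is the oldest root, but its grandchild (order 6) is the newest order. Tree 1 should therefore sort first, which matches the expectation stated in the question.

```sql
-- Hypothetical sample data; #Orders stands in for dbo.Orders.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;

CREATE TABLE #Orders (
    OrderId        int      NOT NULL PRIMARY KEY,
    Orders_OrderId int      NULL,      -- parent order; NULL marks a root
    DateOrdered    datetime NOT NULL
);

INSERT INTO #Orders (OrderId, Orders_OrderId, DateOrdered)
VALUES (1, NULL, '20140101'),   -- oldest root...
       (2, 1,    '20140105'),
       (3, NULL, '20140110'),
       (4, 3,    '20140120'),
       (5, NULL, '20140201'),
       (6, 2,    '20140204');   -- ...whose grandchild is the newest order

SELECT * FROM #Orders ORDER BY OrderId;
```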

I wrote the following SQL to pull the information:

WITH allOrders AS 
   (SELECT po.orderid, po.Orders_OrderId, po.DateOrdered, 0 as distance,  
   row_number() over (order by DateOrdered desc) as RN1
    FROM orders po WHERE po.Orders_OrderId is null
    UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, b2.DateOrdered, c.distance + 1, 
    c.RN1
    FROM orders b2 
    INNER JOIN allOrders c 
    ON b2.Orders_OrderId = c.orderid
    )

SELECT * from allOrders
where RN1 between 0 and 2
order by rn1 asc, distance asc

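Is there any way to "aggregate" the results of the recursive select so that I can select the max date for the whole "parent" node?

SQLFiddle demo: http://sqlfiddle.com/#!3/ca6cb/11 (record 1 should come first, because it has a more recently updated child)

Update: Thanks to @twrowsell's suggestion, I have the following query working. However, it feels clunky, it has some performance issues, and I don't think I should need 3 CTEs to achieve this. Is there any way to condense it while keeping the "row number" (this is for a paged display to the user)?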

WITH allOrders AS 
  (SELECT po.orderid, po.Orders_OrderId, 0 as distance, po.DateOrdered, po.orderid as [rootId]
    FROM orders po WHERE po.Orders_OrderId is null 
    UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, c.distance + 1, b2.DateOrdered, c.[rootId]
    FROM orders b2     
    INNER JOIN allOrders c 
    ON b2.Orders_OrderId = c.orderid
    ),
    mostRecentOrders as (
    SELECT *,
    MAX(DateOrdered) OVER (PARTITION BY rootId) as [HighestOrderId]
    from allOrders
    ),
    pagedOrders as (
    select *, dense_rank() over (order by [HighestOrderId] desc) as [PagedRowNumber] from mostRecentOrders)

    SELECT  * from pagedOrders
    where PagedRowNumber between 0 and 2
    order by [HighestOrderId] desc

Also, I could use MAX(orderid) instead, since orderid is an identity column and, in my scenario, the creation date can't be updated once a row is created.
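Here is a minimal sketch of that variant. It assumes, as stated above, that OrderId is an identity and DateOrdered never changes, so the largest OrderId in a tree is also its most recent order.

- It keeps the shape of the 3-CTE query above but swaps MAX(DateOrdered) for MAX(OrderId).
- It materializes the ranked rows into a temp table so the page filter can be applied without a third CTE.
- The names #Ranked, @FirstTree and @LastTree, and the sample rows, are hypothetical.

```sql
-- Hypothetical sample data (ids grow with DateOrdered, as with an identity column).
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
IF OBJECT_ID('tempdb..#Ranked') IS NOT NULL DROP TABLE #Ranked;
CREATE TABLE #Orders (OrderId int NOT NULL PRIMARY KEY,
                      Orders_OrderId int NULL,
                      DateOrdered datetime NOT NULL);
INSERT INTO #Orders VALUES (1, NULL, '20140101'), (2, 1, '20140105'),
                           (3, NULL, '20140110'), (4, 3, '20140120'),
                           (5, NULL, '20140201'), (6, 2, '20140204');

DECLARE @FirstTree int = 1, @LastTree int = 2;  -- page window, counted in trees

WITH allOrders AS (
    -- anchor: each root starts its own tree
    SELECT OrderId, Orders_OrderId, DateOrdered, 0 AS distance, OrderId AS rootId
    FROM #Orders
    WHERE Orders_OrderId IS NULL
    UNION ALL
    -- recursive member: children inherit their parent's rootId
    SELECT o.OrderId, o.Orders_OrderId, o.DateOrdered, c.distance + 1, c.rootId
    FROM #Orders o
    INNER JOIN allOrders c ON o.Orders_OrderId = c.OrderId
),
mostRecent AS (
    -- newest OrderId per tree, standing in for MAX(DateOrdered)
    SELECT *, MAX(OrderId) OVER (PARTITION BY rootId) AS LatestOrderId
    FROM allOrders
)
SELECT *, DENSE_RANK() OVER (ORDER BY LatestOrderId DESC) AS PagedRowNumber
INTO #Ranked
FROM mostRecent;

-- The page filter now runs against the materialized ranks.
SELECT *
FROM #Ranked
WHERE PagedRowNumber BETWEEN @FirstTree AND @LastTree
ORDER BY PagedRowNumber, distance, OrderId;
```

Each OrderId belongs to exactly one tree, so LatestOrderId is distinct per tree and DENSE_RANK gives every tree its own number. Ranking on MAX(DateOrdered) instead would let two trees whose newest orders share a timestamp tie.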

Updated SQLFiddle: http://sqlfiddle.com/#!3/ca6cb/41

5 answers:

Answer 0 (score: 2)

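First, you need to store the "root" order ID so that you can tell the different order "trees" apart. Once you have that, you can aggregate and sort the data.

As far as I know, since you can't use DENSE_RANK() in a WHERE clause, you need at least one CTE to build the tree and a second one to do the ranking.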
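To illustrate that restriction: SQL Server rejects a window function in WHERE with "Windowed functions can only appear in the SELECT or ORDER BY clauses". The rank therefore has to be computed one level down, in a CTE or derived table, and filtered one level up. Below is a minimal sketch on a hypothetical table of per-tree maxima; @trees, #TopTrees and the rows are invented here.

```sql
-- Hypothetical per-tree maxima, as a ranking step would see them.
DECLARE @trees TABLE (RootOrderID int PRIMARY KEY, MaxDate datetime NOT NULL);
INSERT INTO @trees VALUES (1, '20140204'), (3, '20140120'), (5, '20140201');

-- Rejected by SQL Server ("Windowed functions can only appear in the
-- SELECT or ORDER BY clauses"):
-- SELECT * FROM @trees
-- WHERE DENSE_RANK() OVER (ORDER BY MaxDate DESC) <= 2;

IF OBJECT_ID('tempdb..#TopTrees') IS NOT NULL DROP TABLE #TopTrees;

-- Accepted: rank in a derived table (a CTE works the same way), filter outside it.
SELECT RootOrderID, MaxDate, TreeRank
INTO #TopTrees
FROM (SELECT RootOrderID, MaxDate,
             DENSE_RANK() OVER (ORDER BY MaxDate DESC) AS TreeRank
      FROM @trees) AS ranked
WHERE TreeRank <= 2;

SELECT * FROM #TopTrees ORDER BY TreeRank;
```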

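The following query uses a temp table to store the tree. The query selects from the tree twice: once for the rows and once for the ranking. If I used a CTE to hold the tree, it would be built twice, because a CTE is essentially just a reusable subquery that is re-evaluated every time it is referenced. Using a temp table ensures the tree is built only once.

Here is the SQL: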

DECLARE @Offset INT = 0;
DECLARE @Fetch INT = 2;

-- Create the Order Trees
WITH OrderTree AS (
  SELECT  po.orderid AS RootOrderID,
          po.orderid,
          po.Orders_OrderId,
          po.DateOrdered,
          0 AS distance
  FROM orders po WHERE po.Orders_OrderId IS NULL
  UNION/**/ALL
  SELECT  parent.RootOrderID,
          child.orderid,
          child.Orders_OrderId,
          child.DateOrdered,
          parent.distance + 1 AS distance
  FROM orders child
  INNER JOIN OrderTree parent
  ON child.Orders_OrderId = parent.orderid
)
SELECT *
INTO #OrderTree
FROM OrderTree;

-- Rank the order trees by MAX(DateOrdered)
WITH
Rankings AS (
    SELECT RootOrderID,
         MAX(DateOrdered) AS MaxDate,
         ROW_NUMBER() OVER(ORDER BY MAX(DateOrdered) DESC, RootOrderID ASC) AS Rank
  FROM #OrderTree
  GROUP BY RootOrderID
)
-- Get the next @Fetch trees, starting at rank @Offset+1
SELECT  TREE.*,
        R.MaxDate,
        R.Rank
FROM Rankings R
INNER JOIN #OrderTree TREE
    ON R.RootOrderID = TREE.RootOrderID
WHERE R.Rank BETWEEN @Offset+1 AND (@Fetch+@Offset)
ORDER BY R.Rank ASC, TREE.distance ASC;

SQLFiddle

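Note: the /**/ between UNION and ALL is a workaround for this issue.

I built my own "orders" table from data in an existing table in my database and benchmarked it against the 3-CTE query from the question. This query comes out slightly ahead on a large data set (117 trees, 37,215 orders in total, maximum depth 11). I ran each query with STATISTICS IO and STATISTICS TIME turned on, clearing the cache and buffers before each run.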
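The answer doesn't show its exact benchmarking script. The following is an assumption, not the author's script: a typical harness along those lines. DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE clear server-wide caches and need elevated permissions, so run this on a test server only.

```sql
-- Test server only: these commands flush server-wide caches.
CHECKPOINT;              -- write dirty pages so the buffer pool can be emptied
DBCC DROPCLEANBUFFERS;   -- start from a cold data cache
DBCC FREEPROCCACHE;      -- drop cached plans so the query is compiled afresh
SET STATISTICS IO ON;    -- reports scan count and logical/physical reads per table
SET STATISTICS TIME ON;  -- reports parse/compile and execution CPU and elapsed time

-- <query under test goes here>

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```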

Here are the results for both queries, along with the results for the recursive CTE they share:

    
╔════════════╦══════════╦════════════╦══════════════╗
║ Query      ║ CPU Time ║ Scan Count ║ Logical Reads║
╠════════════╬══════════╬════════════╬══════════════╣
║ Tree CTE   ║ 24211ms  ║ 4          ║ 1116243      ║
╟────────────╫──────────╫────────────╫──────────────╢
║ 3-CTE      ║ 24789ms  ║ 7          ║ 1192221      ║
║ Temp Table ║ 24384ms  ║ 6          ║ 1116549      ║
╚════════════╩══════════╩════════════╩══════════════╝

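Most of the cost of both queries is the recursive order-tree CTE. Subtracting the shared cost of the recursive CTE gives the following results: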

╔════════════╦══════════╦════════════╦══════════════╗
║ Query      ║ CPU Time ║ Scan Count ║ Logical Reads║
╠════════════╬══════════╬════════════╬══════════════╣
║ 3-CTE      ║ 578ms    ║ 3          ║ 75978        ║
║ Temp Table ║ 173ms    ║ 2          ║ 306          ║
╚════════════╩══════════╩════════════╩══════════════╝

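Based on these results, I would strongly recommend adding a RootOrderID column to your orders table, so you can avoid a potentially very expensive recursive CTE.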
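Here is a minimal sketch of that recommendation. It uses a hypothetical temp table #OrdersWithRoot standing in for dbo.Orders; on the real table, the first step would be ALTER TABLE dbo.Orders ADD RootOrderID int NULL. The sketch does three things:

1. Backfill RootOrderID once with the recursive CTE.
2. Set it for new child rows from the parent at insert time.
3. Run the paging query, which no longer needs recursion.

How new rows get their RootOrderID in practice (insert procedure, trigger, application code) is left open here. A new root would simply use its own OrderId. The sample rows, order 7 and #RootPage are invented for illustration.

```sql
-- Hypothetical stand-in for dbo.Orders after:
--   ALTER TABLE dbo.Orders ADD RootOrderID int NULL;
IF OBJECT_ID('tempdb..#OrdersWithRoot') IS NOT NULL DROP TABLE #OrdersWithRoot;
IF OBJECT_ID('tempdb..#RootPage') IS NOT NULL DROP TABLE #RootPage;
CREATE TABLE #OrdersWithRoot (OrderId int NOT NULL PRIMARY KEY,
                              Orders_OrderId int NULL,
                              DateOrdered datetime NOT NULL,
                              RootOrderID int NULL);
INSERT INTO #OrdersWithRoot (OrderId, Orders_OrderId, DateOrdered)
VALUES (1, NULL, '20140101'), (2, 1, '20140105'), (3, NULL, '20140110'),
       (4, 3, '20140120'), (5, NULL, '20140201'), (6, 2, '20140204');

-- One-time backfill: walk each tree once and record its root.
WITH tree AS (
    SELECT OrderId, OrderId AS RootOrderID
    FROM #OrdersWithRoot
    WHERE Orders_OrderId IS NULL
    UNION ALL
    SELECT o.OrderId, t.RootOrderID
    FROM #OrdersWithRoot o
    INNER JOIN tree t ON o.Orders_OrderId = t.OrderId
)
UPDATE o
SET RootOrderID = t.RootOrderID
FROM #OrdersWithRoot o
INNER JOIN tree t ON t.OrderId = o.OrderId;

-- Afterwards, a new child copies its parent's RootOrderID at insert time.
INSERT INTO #OrdersWithRoot (OrderId, Orders_OrderId, DateOrdered, RootOrderID)
SELECT 7, p.OrderId, '20140205', p.RootOrderID
FROM #OrdersWithRoot p
WHERE p.OrderId = 6;

-- Paging query with no recursion: rank trees by their newest order.
SELECT o.*, r.PagedRowNumber
INTO #RootPage
FROM #OrdersWithRoot o
INNER JOIN (SELECT RootOrderID,
                   ROW_NUMBER() OVER (ORDER BY MAX(DateOrdered) DESC, RootOrderID) AS PagedRowNumber
            FROM #OrdersWithRoot
            GROUP BY RootOrderID) r
    ON r.RootOrderID = o.RootOrderID
WHERE r.PagedRowNumber BETWEEN 1 AND 2;

SELECT * FROM #RootPage ORDER BY PagedRowNumber, DateOrdered;
```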

Answer 1 (score: 1)

Use MAX on DateOrdered with an OVER clause in the outer select:

    WITH allOrders AS 
(
    SELECT po.orderid, po.Orders_OrderId, po.DateOrdered, 0 as distance,  
       row_number() over (order by DateOrdered desc) as RN1
    FROM orders po WHERE po.Orders_OrderId is null
    UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, b2.DateOrdered, c.distance + 1, 
      c.RN1
    FROM orders b2 
    INNER JOIN allOrders c 
    ON b2.Orders_OrderId = c.orderid
    )


    SELECT *,   MAX(DateOrdered) OVER (PARTITION BY Orders_OrderId) from allOrders
    where RN1 between 0 and 2
    order by rn1 asc, distance asc

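Edit: Sorry, I misunderstood your requirement the first time. It looks like you want to partition the results by the RN1 field rather than by Orders_OrderId, so your outer select would be something like: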

 SELECT MAX(DateOrdered) OVER (PARTITION BY RN1 ),*  from allOrders
where RN1 between 0 and 2
order by rn1 asc, distance asc

Answer 2 (score: 1)

Take a look at the following:

;WITH allOrders AS 
   (SELECT po.orderid, po.Orders_OrderId, po.DateOrdered, 0 as distance, po.orderid as [parentOrder]
    FROM orders po WHERE po.Orders_OrderId is null
        UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, b2.DateOrdered, c.distance + 1, c.[parentOrder]
    FROM orders b2 
    INNER JOIN allOrders c ON b2.Orders_OrderId = c.orderid

    )
    SELECT a.OrderId
           ,a.Orders_OrderId
           ,a.DateOrdered
           ,top1.DateOrdered AS HighestDate
           ,a.distance
           ,a.parentOrder
    FROM allOrders a
    -- the two trees with the most recent activity
    INNER JOIN (SELECT TOP 2 parentOrder, MAX(DateOrdered) AS highestdates FROM allOrders GROUP BY parentOrder ORDER BY MAX(DateOrdered) DESC) b ON a.parentOrder = b.parentOrder
    -- latest order in the current row's tree
    OUTER APPLY (SELECT TOP 1 parentOrder, DateOrdered FROM allOrders top1 WHERE a.parentOrder = top1.parentOrder ORDER BY top1.DateOrdered DESC) top1
    -- newest tree first, then each tree from the root down
    ORDER BY b.highestdates DESC, a.parentOrder, a.distance

SQLFiddle

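Answer 3 (score: 1)

I'm having a hard time understanding your exact requirements, including the paging. If you could provide the expected result set for the sample data you gave, it would be easier to check.

Anyway, it seems your main difficulty is this:

"Is there any way to 'aggregate' the results of the recursive select so that I can select the max date for the whole 'parent' node?"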

...this can easily be done with a recursive CTE and APPLY.
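Here is a minimal sketch of one way to do that. It is not necessarily what the linked fiddles contain, and the temp tables and sample rows are hypothetical. The recursive CTE tags every order with its root. CROSS APPLY then computes, for each root, the latest DateOrdered anywhere in its tree.

```sql
-- Hypothetical sample data; #Orders stands in for dbo.Orders.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
IF OBJECT_ID('tempdb..#RootActivity') IS NOT NULL DROP TABLE #RootActivity;
CREATE TABLE #Orders (OrderId int NOT NULL PRIMARY KEY,
                      Orders_OrderId int NULL,
                      DateOrdered datetime NOT NULL);
INSERT INTO #Orders VALUES (1, NULL, '20140101'), (2, 1, '20140105'),
                           (3, NULL, '20140110'), (4, 3, '20140120'),
                           (5, NULL, '20140201'), (6, 2, '20140204');

WITH allOrders AS (
    -- tag every order with the root of its tree
    SELECT OrderId, DateOrdered, OrderId AS rootId
    FROM #Orders
    WHERE Orders_OrderId IS NULL
    UNION ALL
    SELECT o.OrderId, o.DateOrdered, c.rootId
    FROM #Orders o
    INNER JOIN allOrders c ON o.Orders_OrderId = c.OrderId
)
SELECT r.OrderId AS rootId, activity.MaxDate
INTO #RootActivity
FROM #Orders r
CROSS APPLY (
    -- latest DateOrdered anywhere in this root's tree, root included
    SELECT MAX(a.DateOrdered) AS MaxDate
    FROM allOrders a
    WHERE a.rootId = r.OrderId
) AS activity
WHERE r.Orders_OrderId IS NULL;

SELECT * FROM #RootActivity ORDER BY MaxDate DESC;
```

As Answer 0 notes, a CTE referenced inside APPLY may be re-evaluated, so on large trees it is likely cheaper to materialize the tree first.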

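I'm not sure exactly what you want, so I made these two fiddles:

SQL Fiddle 1 - here all children are grouped with their root order, i.e., order 3 appears together with its parent's (order 2) parent (order 1).

SQL Fiddle 2 - here children are grouped with their direct parent, and parents also act as roots, so order 2 does not go to the top together with its parent (order 1).

I think the first one, with some modifications, is what you need.

Again, in questions like this it's very important to provide your expected results; otherwise you'll get a lot of trial-and-error answers.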

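Answer 4 (score: 0)

I was able to get the same result set you described in your updated fiddle. I arrived at my solution by starting from pedro's CROSS APPLY, though, speaking only from my own experience, APPLY performs poorly. It eventually evolved into its current form: a left join from the main table to a subquery that does the paging you asked for.

Please find the fiddle here (SQLFiddle).

The code is also included below: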

WITH allOrders AS (
  --anchor
  SELECT po.orderid
         , po.Orders_OrderId
         , 0 AS distance
         , po.DateOrdered
         , po.orderid AS [rootId]
  FROM orders po
  WHERE po.Orders_OrderId IS NULL

  --recursive
  UNION ALL

  SELECT b2.orderid
         , b2.Orders_OrderId
         , c.distance + 1
         , b2.DateOrdered
         , c.[rootId]
  FROM orders b2

    JOIN allOrders c
      ON b2.Orders_OrderId = c.orderid
)
SELECT a.*
       , b.max_orderdate
       , RN1
FROM allOrders a
LEFT JOIN (SELECT DISTINCT rootid, max(DateOrdered) max_orderdate 
           , row_number() over (order by max(dateordered) desc) as RN1
           FROM allOrders GROUP BY rootid) b
ON a.rootid = b.rootid
where RN1 between 0 and 2
ORDER BY b.max_orderdate DESC, a.rootid, a.orders_orderid, a.orderid