CTE如何真正起作用?

时间:2011-10-18 07:11:32

标签: sql-server common-table-expression recursive-query

我遇到了this CTE solution for concatenating row elements,我认为这很棒,我意识到CTE有多强大。

然而,为了有效地使用这样的工具,我需要知道它是如何在内部工作来构建心理图像,这对初学者来说是必不可少的,就像我一样,在不同场景中使用它。

所以我尝试慢动作上面代码片段的过程,这里是代码

USE [NORTHWIND]
GO
/****** Object:  Table [dbo].[Products2]  Script Date: 10/18/2011 08:55:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID('Products2','U') IS NOT NULL  DROP TABLE [Products2]
CREATE TABLE [dbo].[Products2](
  [ProductID] [int] IDENTITY(1,1) NOT NULL,
  [ProductName] [nvarchar](40) NOT NULL,
  [SupplierID] [int] NULL,
  [CategoryID] [int] NULL,
  [QuantityPerUnit] [nvarchar](20) NULL,
  [UnitPrice] [money] NULL,
  [UnitsInStock] [smallint] NULL,
  [UnitsOnOrder] [smallint] NULL,
  [ReorderLevel] [smallint] NULL,
  [Discontinued] [bit] NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Products2] ON
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (1, N'vcbcbvcbvc', 1, 4, N'10 boxes x 20 bags', 18.0000, 39, 0, 10, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (2, N'Changassad', 1, 1, N'24 - 12 oz bottles', 19.0000, 17, 40, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (3, N'Aniseed Syrup', 1, 2, N'12 - 550 ml bottles', 10.0000, 13, 70, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (4, N'Chef Anton''s Cajun Seasoning', 2, 2, N'48 - 6 oz jars', 22.0000, 53, 0, 0, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (5, N'Chef Anton''s Gumbo Mix', 10, 2, N'36 boxes', 21.3500, 0, 0, 0, 1)
SET IDENTITY_INSERT [dbo].[Products2] OFF
GO
IF OBJECT_ID('DELAY_EXEC','FN') IS NOT NULL  DROP FUNCTION DELAY_EXEC
GO
CREATE FUNCTION DELAY_EXEC() RETURNS DATETIME
AS
BEGIN
  DECLARE @I INT=0
  WHILE @I<99999
  BEGIN
  SELECT @I+=1
  END
  RETURN GETDATE()
END
GO

WITH CTE (EXEC_TIME, CategoryID, product_list, product_name, length)
     AS (SELECT dbo.DELAY_EXEC(),
                CategoryID,
                CAST('' AS VARCHAR(8000)),
                CAST('' AS VARCHAR(8000)),
                0
         FROM   Northwind..Products2
         GROUP  BY CategoryID
         UNION ALL
         SELECT dbo.DELAY_EXEC(),
                p.CategoryID,
                CAST(product_list + CASE
                                      WHEN length = 0 THEN ''
                                      ELSE ', '
                                    END + ProductName AS VARCHAR(8000)),
                CAST(ProductName AS VARCHAR(8000)),
                length + 1
         FROM   CTE c
                INNER JOIN Northwind..Products2 p
                  ON c.CategoryID = p.CategoryID
         WHERE  p.ProductName > c.product_name)
SELECT *
FROM   CTE
ORDER  BY EXEC_TIME  

--SELECT CategoryId, product_list
--  FROM ( SELECT CategoryId, product_list,
--  RANK() OVER ( PARTITION BY CategoryId ORDER BY length DESC )
--   FROM CTE ) D ( CategoryId, product_list, rank )
--   WHERE rank = 1 ;

评论块是连接问题的理想输出,但这不是问题。

我添加了一个EXEC_TIME列,以了解首先添加了哪一行。 由于两个原因,输出看起来并不正确

  1. 我认为会有冗余数据,因为条件p.ProductName > c.product_name在另一个单词中CTE的第一部分空行总是少于Product2表中的值所以每次运行它应该再次带来一组新添加的行。这有什么意义吗?

  2. 数据的层次结构真的很奇怪,最后一项应该是最长的,看看最后一项是什么?包含length=1的项目

  3. 任何救援专家?提前谢谢。

    样本结果

    EXEC_TIME               CategoryID  product_list                                                        product_name                      length
    ----------------------- ----------- ------------------------------------------------------------------- --------------------------------- -----------
    2011-10-18 12:46:14.930 1                                                                                                                 0
    2011-10-18 12:46:14.990 2                                                                                                                 0
    2011-10-18 12:46:15.050 4                                                                                                                 0
    2011-10-18 12:46:15.107 4           vcbcbvcbvc                                                          vcbcbvcbvc                        1
    2011-10-18 12:46:15.167 2           Aniseed Syrup                                                       Aniseed Syrup                     1
    2011-10-18 12:46:15.223 2           Chef Anton's Cajun Seasoning                                        Chef Anton's Cajun Seasoning      1
    2011-10-18 12:46:15.280 2           Chef Anton's Gumbo Mix                                              Chef Anton's Gumbo Mix            1
    2011-10-18 12:46:15.340 2           Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mix                Chef Anton's Gumbo Mix            2
    2011-10-18 12:46:15.400 2           Aniseed Syrup, Chef Anton's Cajun Seasoning                         Chef Anton's Cajun Seasoning      2
    2011-10-18 12:46:15.463 2           Aniseed Syrup, Chef Anton's Gumbo Mix                               Chef Anton's Gumbo Mix            2
    2011-10-18 12:46:15.520 2           Aniseed Syrup, Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mi  Chef Anton's Gumbo Mix            3
    2011-10-18 12:46:15.580 1           Changassad                                                          Changassad                        1
    

2 个答案:

答案 0 :(得分:5)

这是一个有趣的问题,帮助我更好地理解递归CTE。

如果查看执行计划,您将看到使用了一个假脱机,并且它设置了WITH STACK属性。这意味着rows are read in a stack-like manner (Last In First Out)

首先,锚点部分运行

EXEC_TIME               CategoryID  product_list  
----------------------- ----------- --------------
2011-10-18 12:46:14.930 1                         
2011-10-18 12:46:14.990 2                         
2011-10-18 12:46:15.050 4                

然后处理4,因为这是添加的最后一行。 JOIN返回添加到假脱机的1行,然后处理这个新添加的行。在这种情况下,Join不返回任何内容,因此没有任何其他内容添加到假脱机中,并继续处理CategoryID = 2行。

返回3行,添加到假脱机

Aniseed Syrup
Chef Anton's Cajun Seasoning
Chef Anton's Gumbo Mix   

然后以类似的LIFO方式依次处理这些行中的每一行,并且在处理可以移动到兄弟行之前首先处理添加的任何子行。希望你能看到这个递归逻辑如何解释你观察到的结果,但万一你不能进行C#模拟

using System;
using System.Collections.Generic;
using System.Linq;

namespace Foo
{
    internal class Bar
    {
        private static void Main(string[] args)
        {
            var spool = new Stack<Tuple<int, string, string>>();

            //Add anchor elements
            AddRowToSpool(spool, new Tuple<int, string, string>(1, "", ""));
            AddRowToSpool(spool, new Tuple<int, string, string>(2, "", ""));
            AddRowToSpool(spool, new Tuple<int, string, string>(4, "", ""));

            while (spool.Count > 0)
            {
                Tuple<int, string, string> lastRowAdded = spool.Pop();
                AddChildRows(lastRowAdded, spool);
            }

            Console.ReadLine();
        }

    private static void AddRowToSpool(Stack<Tuple<int, string, string>> spool,
                                      Tuple<int, string, string> row)
        {
            Console.WriteLine("CategoryId={0}, product_list = {1}",
                              row.Item1,
                              row.Item3);
            spool.Push(row);
        }

    private static void AddChildRows(Tuple<int, string, string> lastRowAdded,
                                     Stack<Tuple<int, string, string>> spool)
        {
            int categoryId = lastRowAdded.Item1;
            string productName = lastRowAdded.Item2;
            string productList = lastRowAdded.Item3;

            string[] products;

            switch (categoryId)
            {
                case 1:
                    products = new[] {"Changassad"};
                    break;
                case 2:
                    products = new[]
                                   {
                                       "Aniseed Syrup",
                                       "Chef Anton's Cajun Seasoning",
                                       "Chef Anton's Gumbo Mix "
                                   };
                    break;
                case 4:
                    products = new[] {"vcbcbvcbvc"};
                    break;
                default:
                    products = new string[] {};
                    break;
            }


            foreach (string product in products.Where(
                product => string.Compare(productName, product) < 0))
            {
                string product_list = string.Format("{0}{1}{2}",
                                                 productList,
                                                 productList == "" ? "" : ",",
                                                 product);

                AddRowToSpool(spool,
                              new Tuple<int, string, string>
                                  (categoryId, product, product_list));
            }
        }
    }
}

返回

CategoryId=1, product_list =
CategoryId=2, product_list =
CategoryId=4, product_list =
CategoryId=4, product_list = vcbcbvcbvc
CategoryId=2, product_list = Aniseed Syrup
CategoryId=2, product_list = Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Chef Anton's Gumbo Mix
CategoryId=2, product_list = Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=1, product_list = Changassad

答案 1 :(得分:3)

页面Recursive Queries Using Common Table Expressions描述了CTE的逻辑:

  

递归执行的语义如下:

     
      
  1. 将CTE表达式拆分为锚点和递归成员。

  2.   
  3. 运行锚定成员创建第一个调用或基本结果集(T0)。

  4.   
  5. 以Ti作为输入并以Ti + 1作为输出运行递归成员。

  6.   
  7. 重复步骤3,直到返回空集。

  8.   
  9. 返回结果集。这是T0到Tn的UNION ALL。

  10.   

但是,这只是逻辑流程。与往常一样,使用SQL,如果结果“相同”,服务器可以自由地重新排序操作,如果结果“相同”,则可以认为重新排序可以更有效地提供结果。

在决定是否重新排序操作时,通常会考虑具有副作用的函数(导致延迟,然后返回GETDATE())。

可以重新排序查询的一种显而易见的方法是,它可能决定在完全创建结果集Ti+1之前开始处理结果集Ti - 执行此操作可能更有效而不是首先完全构造Ti,因为新行肯定已经在内存中并且最近已被访问过。