Question

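I have a list of notifications (notes) that can be pinged out to users for review. These notes can be linked to each other in a parent/child fashion, where one notification is the child of an earlier one. Not every note has a parent. There are about 22,000 records in total.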

Note  Parent Note   
1     NULL   
2     NULL
3     1
4     NULL  
5     NULL
6     3 
7     4

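I want to build a tree for these notifications, tracing each one back to its root, to show how deep a "child" notification sits and which earlier notifications lead up to it. The desired output for the table above is shown below.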

Note  Parent Note  Level  Full List
1     NULL         1      1
2     NULL         1      2
3     1            2      1/3
4     NULL         1      4
5     NULL         1      5
6     3            3      1/3/6
7     4            2      4/7

Each time a new child notification is opened, the level increases by one and the note is appended to its parent's list.

I tried to do this with a recursive CTE. Here is what the query looks like.

WITH CTE AS (
SELECT Note,
       Parent_Note,
       1 as Level,
       --Cast as nvarchar to keep data types the same
       CAST(Note as nvarchar(MAX)) as Full_List
FROM Notifications
WHERE Parent_Note IS NULL
UNION ALL
SELECT Notifications.Note,
       CTE.Note as Parent_Note,
       Level = CTE.Level + 1,
       CAST(Full_List + '/' + Notifications.Note as nvarchar(MAX)) as Full_List
FROM CTE INNER JOIN Notifications ON CTE.Note = Notifications.Parent_Note)

SELECT * FROM CTE

Unfortunately, this query takes about 15 minutes just to fetch 5 records at the second "notification" level.

However, if I hard-code each level of recursion, the full data set loads in under 30 seconds.

WITH CTE1 AS (
SELECT Note,
       Parent_Note,
       1 as Level,
       --Cast as nvarchar to keep data types the same
       CAST(Note as nvarchar(MAX)) as Full_List
FROM Notifications
WHERE Parent_Note IS NULL
),

CTE2 AS (
SELECT Notifications.Note,
       CTE1.Note as Parent_Note,
       Level = CTE1.Level + 1,
       CAST(Full_List + '/' + Notifications.Note as nvarchar(MAX)) as Full_List
FROM CTE1 INNER JOIN Notifications ON CTE1.Note = Notifications.Parent_Note
),

CTE3 AS (
SELECT Notifications.Note,
       CTE2.Note as Parent_Note,
       Level = CTE2.Level + 1,
       CAST(Full_List + '/' + Notifications.Note as nvarchar(MAX)) as Full_List
FROM CTE2 INNER JOIN Notifications ON CTE2.Note = Notifications.Parent_Note
)

SELECT * FROM CTE1 UNION SELECT * FROM CTE2 UNION SELECT * FROM CTE3

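I don't understand why the recursive query is so slow that it effectively never finishes, while the hard-coded query loads the data in under 30 seconds. I would rather not use the hard-coded query, because I'm not sure how many "levels" will eventually exist, although the current maximum is only six.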

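Any information anyone can share would be greatly appreciated. I may not be able to provide everything needed to answer the question (I'm asking this at work, so I can't share the data or query plans), but I will provide whatever I can.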

Answer 1

I would make sure the table has the following index:

create index ix2 on notifications (parent_note, note);

Answer 2

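If you are getting this kind of performance on only 22K rows, I would say something is seriously wrong with your table schema. Is your table a heap, by any chance?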
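One way to answer the heap question is to look the table up in the sys.indexes catalog view. This is only a sketch: it assumes the question's table lives in the dbo schema as dbo.Notifications, which the question does not state.

```sql
-- Assumption: the question's table is dbo.Notifications (the schema name is a guess).
-- sys.indexes holds one row per index on a table:
--   index_id = 0, type_desc = 'HEAP'      -> the table has no clustered index
--   index_id = 1, type_desc = 'CLUSTERED' -> the table has a clustered index
select i.index_id, i.name, i.type_desc
from sys.indexes i
where i.object_id = object_id(N'dbo.Notifications')
order by i.index_id;
```

If the first row comes back with type_desc = 'HEAP', the table has no clustered index.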

Here is the test setup I used, with 100,000 rows of mock data:

-- Table with clustered PK and parent-child FK
create table dbo.Notes (
  Id int identity(1,1) primary key,
  ParentId int null references dbo.Notes(Id)
);
go
-- Generate data
insert into dbo.Notes (ParentId)
select top (100000) null
from sys.all_objects a, sys.all_objects b;
go
-- Introduce random parent-child hierarchy between rows
update sq set ParentId = sq.NewParent
from (
  select n.*,
    case when left(cast(newid() as binary(16)), 1) < 0xC0 then 1 else 0 end as [HasParent],
    abs(nullif(checksum(newid()) % (n.Id - 1), 0)) as [NewParent]
  from dbo.Notes n
  where n.Id > 1
    and n.ParentId is null
) sq
where sq.NewParent > 0
  and sq.HasParent = 1;
go
-- Create an index on ParentId
create index IX_Notes_ParentId on dbo.Notes (ParentId);
go

Here is the CTE query I tested:

set statistics time on;
go
set statistics io on;
go

with cte as (
  select n.*, cast(n.Id as varchar(max)) as [NotePath]
  from dbo.Notes n where n.ParentId is null
  union all
  select n.*, c.NotePath + '/' + cast(n.Id as varchar(max))
  from dbo.Notes n
    inner join cte c on c.Id = n.ParentId
)
select c.*
from cte c
order by c.Id
option (recompile);
go

set statistics time off;
go
set statistics io off;
go

Here are the timing, CPU, and I/O figures with no index on the ParentId column:

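(100000 row(s) affected)

Table 'Worktable'. Scan count 100002, logical reads 1295309, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Notes'. Scan count 2, logical reads 426, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 1032 ms, elapsed time = 1522 ms.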

And here they are with the index:

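(100000 row(s) affected)

Table 'Worktable'. Scan count 2, logical reads 736852, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Notes'. Scan count 100001, logical reads 200338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 781 ms, elapsed time = 1313 ms.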

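As you can see, the difference between the two is not very significant, so I conclude that what really matters is the clustered index on the Id column.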

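P.S. I also tried the (ParentId, Id) index suggested by others, and it made no statistically significant difference compared to the index on ParentId alone. This is expected behavior, because every nonclustered index always carries the clustered index key as its row reference; apart from some edge cases, there is no need to add the clustered index columns to it explicitly.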
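Applied to the table from the question, the same setup might look like the sketch below. It rests on several assumptions the question does not confirm: that the table is dbo.Notifications, that it is currently a heap, and that Note is unique and declared NOT NULL. The constraint and index names are made up for illustration.

```sql
-- Assumptions: dbo.Notifications is a heap; Note is unique and NOT NULL.
-- PK_Notifications and IX_Notifications_Parent_Note are hypothetical names.

-- Clustered primary key on the column the recursive join looks up (CTE.Note).
alter table dbo.Notifications
  add constraint PK_Notifications primary key clustered (Note);
go

-- Nonclustered index on the parent column; the clustered key (Note)
-- is carried in it automatically, so (Parent_Note) alone is enough.
create index IX_Notifications_Parent_Note
  on dbo.Notifications (Parent_Note);
go
```

Whether this removes the slowdown depends on the actual schema and query plan, which the question could not share, so re-run the recursive CTE afterwards and compare timings.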

Recursive CTE is performing slower than hard-coded recursive operation
