Question

We have a fairly large machine with 100 GB+ of memory and 8+ cores. Server-wide MAXDOP = 8.

T_SEQ_FF rowcount = 61692209, size = 2991152 KB

UPD 1: Table T_SEQ_FF has two indexes:

1) create index idx_1 on T_SEQ_FF (first_num)
2) create index idx_2 on T_SEQ_FF (second_num)

Table T_SEQ_FF holds (first_num, second_num) pairs of numbers, which the following CTE should chain into sequences:

;with first_entity as ( 
    select first_num from  T_SEQ_FF a  where not exists (select 1 from  T_SEQ_FF b  where a.first_num = b.second_num) 
) ,
cte as ( 
select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count 
from  T_SEQ_FF a  inner join first_entity b on a.first_num = b.first_num 
union all 
select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1 
from  T_SEQ_FF a  
inner join cte on a.first_num = cte.second_num 
) 
select * 
from cte 
option (maxrecursion 0);

But when I run this query, I only see a serial query plan with no parallelism. If I remove the second part of the CTE from the query above:

union all select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1 from T_SEQ_FF a inner join cte on a.first_num = cte.second_num

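then I can see the query plan become parallel, using Repartition Streams and Gather Streams.

So I conclude that it is because of the recursive CTE that SQL Server does not use parallelism when processing this query.

I believe that on a large machine with plenty of free resources, parallelism should help the query finish faster.

Currently it runs for about 40-50 minutes.

Could you suggest how to use as many resources as possible to complete the query faster?

A CTE is the only option, because we need to build sequences from the first_num - second_num pairs, and these sequences can be of arbitrary length.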

Answer 1

I would try rewriting the CTE to remove one of the steps, i.e.

;with cte as ( 
select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count 
from  T_SEQ_FF a  where not exists (select 1 from  T_SEQ_FF b  where a.first_num = b.second_num) 
union all 
select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1 
from  T_SEQ_FF a  
inner join cte on a.first_num = cte.second_num 
) 
select * 
from cte 
option (maxrecursion 0);

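If there is only one root element, it would be better to pass it to the query as a variable so that the query optimizer can use the value.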
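
As a rough sketch of the single-root variant: the variable name @root, its bigint type, and the value 12345 are placeholders, not taken from the question. The recompile hint is an addition here, on the assumption that you want the optimizer to see the actual value of a local variable at compile time.

```sql
-- Assumptions: first_num is an integer column; 12345 stands in for the real root value.
declare @root bigint = 12345;

;with cte as (
    -- anchor member: start directly from the known root, no NOT EXISTS subquery
    select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count
    from T_SEQ_FF a
    where a.first_num = @root
    union all
    -- recursive member: unchanged from the original query
    select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1
    from T_SEQ_FF a
    inner join cte on a.first_num = cte.second_num
)
select *
from cte
option (maxrecursion 0, recompile);
```

This only removes the cost of finding the root; the recursive part itself would still be expected to run serially.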

Another thing to try is changing the query so that it gets the root elements without a subquery, i.e. where second_num is null or first_num = second_num.

Answer 2

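I'm not sure whether this is workable, but since many of the more traditional approaches have been ruled out: could you parallelize explicitly by splitting the first_entity set into several parts, running this query for each part in parallel from code, and then merging the result sets at the end?

This is a lot more complex than a pure T-SQL solution, and I don't know whether it would work for your data; data distribution and locking could both be problems.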
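
A minimal sketch of that splitting idea, under several assumptions: first_num is an integer column, the roots table dbo.T_SEQ_ROOTS and the bucket count of 8 are hypothetical, and the client code that opens one session per bucket and merges the results is not shown.

```sql
-- Step 1 (run once): materialize the root elements and assign each to one of 8 buckets.
-- dbo.T_SEQ_ROOTS is a hypothetical helper table.
select distinct a.first_num, abs(a.first_num % 8) as bucket
into dbo.T_SEQ_ROOTS
from T_SEQ_FF a
where not exists (select 1 from T_SEQ_FF b where a.first_num = b.second_num);

-- Step 2 (run in 8 separate sessions, with @bucket = 0 .. 7):
-- each session expands only the sequences that start in its own bucket.
declare @bucket int = 0;

;with cte as (
    select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count
    from T_SEQ_FF a
    inner join dbo.T_SEQ_ROOTS r on a.first_num = r.first_num
    where r.bucket = @bucket
    union all
    select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1
    from T_SEQ_FF a
    inner join cte on a.first_num = cte.second_num
)
select *
from cte
option (maxrecursion 0);
```

Each session still gets a serial plan, so any speed-up comes only from running the sessions concurrently. If the sequence lengths are very uneven, a modulo split may leave some buckets with much more work than others, which is the data-distribution concern mentioned above.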

Answer 3

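I stumbled upon a similar problem, and after carefully analysing the situation, as well as the question "UNION ALL Performance IN SQL Server 2005", it seems to me that referencing the cte inside a UNION ALL query turns off parallelization (most likely it is a defect).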

Why is the (recursive) CTE not parallelized (MAXDOP = 8)?
