Using a recursive CTE to check every row against every other row

Posted: 2015-01-23 09:45:15

Tags: sql tsql recursion sql-server-2012 common-table-expression

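I'm trying to update a table so that any entry whose name duplicates another entry gets flagged. I do some processing to strip common prefixes and suffixes, and then two names can be compared with a fuzzy-matching CLR function. I wrote this as nested cursors, and it currently takes 4 hours to run over all the records, because I have to check every row against every other row. I've read that a recursive CTE can improve performance significantly, but I'm a bit of a SQL noob and can't quite get it working. I think I need to nest one recursive CTE inside another, but I'm not sure how.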
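Nested cursors and a set-based query face the same arithmetic. Checking every row against every other row needs one fuzzy-match call per pair, so the work grows with the square of the row count. Rewriting the loops as a recursive CTE would not reduce that number. Comparing each unordered pair once, instead of both (a, b) and (b, a), halves it. Below is a small Python sketch of the count. The function name is made up for illustration.

```python
def comparison_count(n: int, ordered: bool = False) -> int:
    """Number of name comparisons needed to check n rows against each other.

    ordered=True counts (a, b) and (b, a) separately, which is what a nested
    loop over all rows does when it only skips comparing a row with itself.
    ordered=False counts each unordered pair exactly once.
    """
    pairs = n * (n - 1) // 2  # n * (n - 1) is always even
    return 2 * pairs if ordered else pairs


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, comparison_count(n, ordered=True), comparison_count(n))
```

For a hypothetical 10,000 companies, that is 49,995,000 unordered pairs, or 99,990,000 comparisons if both directions are checked.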

Currently I have something like this:

;WITH AllOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
AS 
(
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, C.Recognized, 1
    FROM Company C
    WHERE DuplicateOfCompanyId IS NULL
    UNION ALL
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, R.Recognized, R.Level + 1
    FROM AllOrgs R INNER JOIN Company C
    ON C.CompanyId = R.CompanyId
), 
DuplicateOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
As 
(
    SELECT * FROM AllOrgs
    WHERE Recognized = 0 -- Recognized is what the companies are marked when we are satisfied they aren't incorrect
)
UPDATE C
SET DuplicateOfCompanyId = A.CompanyId
FROM Company C JOIN DuplicateOrgs A
ON C.CompanyId = A.CompanyId
WHERE master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(A.Name), dbo.fnCleanUpCompanyName(C.Name)) 
    > @CompanyNameMatchValueThreshold
AND A.CompanyRoleID = C.CompanyRoleId -- Role ID must match as duplicates who provide a different function are fine

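But whenever I try to run it, I get "The statement terminated. The maximum recursion 100 has been exhausted before statement completion.", so I'm obviously doing something dumb.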
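The error comes from how a recursive CTE is evaluated. Here is a simplified model as a Python sketch, not SQL Server's actual implementation:

- The anchor member runs once.
- The recursive member then runs repeatedly, each time seeing only the rows the previous step produced.
- Evaluation stops when a step returns no rows.

SQL Server's default limit is 100 recursion levels, which the MAXRECURSION query hint can change. A recursive member that always returns rows therefore runs into the limit. The names `run_recursive_cte`, `MaxRecursionError`, `count_to` and `same_row_again` are invented for this sketch.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, ...]


class MaxRecursionError(RuntimeError):
    """Raised when more recursion levels are needed than the limit allows."""


def run_recursive_cte(
    anchor: List[Row],
    recursive_member: Callable[[List[Row]], List[Row]],
    max_recursion: int = 100,
) -> List[Row]:
    # The anchor rows are evaluated once and start the result.
    result = list(anchor)
    previous = list(anchor)
    levels = 0
    while previous:
        # The recursive member only sees the rows from the previous step.
        current = recursive_member(previous)
        if not current:
            break  # an empty step ends the recursion normally
        levels += 1
        if levels > max_recursion:
            raise MaxRecursionError(
                f"The maximum recursion {max_recursion} has been exhausted"
            )
        result.extend(current)
        previous = current
    return result


def count_to(limit: int) -> Callable[[List[Row]], List[Row]]:
    # Like "SELECT Level + 1 FROM cte WHERE Level < limit": stops by itself.
    return lambda rows: [(level + 1,) for (level,) in rows if level < limit]


def same_row_again(rows: List[Row]) -> List[Row]:
    # Like joining each row back to itself on its own key: never empty.
    return [(level + 1,) for (level,) in rows]


if __name__ == "__main__":
    print(run_recursive_cte([(1,)], count_to(5)))  # [(1,), (2,), (3,), (4,), (5,)]
    try:
        run_recursive_cte([(1,)], same_row_again)
    except MaxRecursionError as exc:
        print("error:", exc)
```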

1 answer:

Answer 0 (score: 0)

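Your recursion never terminates because the recursive member keeps returning the anchor row itself, each time with a new level. Joining AllOrgs back to Company on the same CompanyId simply finds the same company again, so no recursion step ever comes back empty. Here is an example with only one row in Company: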

AllOrgs after the anchor runs:
CompanyId1, CompanyRoleId1, name1, Recognized1, 1

AllOrgs after recursion 1:
CompanyId1, CompanyRoleId1, name1, Recognized1, 1
CompanyId1, CompanyRoleId1, name1, Recognized1, 2

AllOrgs after recursion 2:
CompanyId1, CompanyRoleId1, name1, Recognized1, 1
CompanyId1, CompanyRoleId1, name1, Recognized1, 2
CompanyId1, CompanyRoleId1, name1, Recognized1, 3

...
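The one-row trace above can be reproduced with a short Python model of the question's AllOrgs CTE. The single Company row uses the placeholder values from the trace. The anchor row is assumed to pass the `DuplicateOfCompanyId IS NULL` filter. The function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical one-row Company table: (CompanyId, CompanyRoleId, Name, Recognized)
COMPANY = [("CompanyId1", "CompanyRoleId1", "name1", "Recognized1")]

AllOrgsRow = Tuple[str, str, str, str, int]


def anchor() -> List[AllOrgsRow]:
    # Anchor member: each company with Level 1 (assumed to pass the
    # DuplicateOfCompanyId IS NULL filter).
    return [row + (1,) for row in COMPANY]


def recursive_member(previous: List[AllOrgsRow]) -> List[AllOrgsRow]:
    # FROM AllOrgs R INNER JOIN Company C ON C.CompanyId = R.CompanyId:
    # every row from the previous step matches its own company again.
    return [
        (c[0], c[1], c[2], r[3], r[4] + 1)
        for r in previous
        for c in COMPANY
        if c[0] == r[0]
    ]


def all_orgs_after(recursions: int) -> List[AllOrgsRow]:
    # Accumulate the anchor rows plus the output of each recursion step.
    previous = anchor()
    result = list(previous)
    for _ in range(recursions):
        previous = recursive_member(previous)
        result.extend(previous)
    return result


if __name__ == "__main__":
    for k in range(3):
        print(k, [row[-1] for row in all_orgs_after(k)])
    # 0 [1]
    # 1 [1, 2]
    # 2 [1, 2, 3]
```

Because `recursive_member` never returns an empty list, there is no level at which the recursion would stop on its own.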

Try a self-join instead:

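UPDATE C
SET DuplicateOfCompanyId = Dup.CompanyId
FROM Company C
JOIN Company Dup
    -- compare each pair once; the row with the higher id is marked as the duplicate
    ON C.CompanyId > Dup.CompanyId
    -- duplicates with a different role are fine, so the roles must match
    AND C.CompanyRoleId = Dup.CompanyRoleId
    AND master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(C.Name), dbo.fnCleanUpCompanyName(Dup.Name))
        > @CompanyNameMatchValueThreshold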
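The logic of the self-join can be modelled outside SQL Server with a small Python sketch. The two database functions are not available there, so stand-ins are used:

- `difflib.SequenceMatcher.ratio()` stands in for `master.dbo.fnClrFuzzyMatch`.
- `clean_up_name` is a hypothetical stand-in for `dbo.fnCleanUpCompanyName`. Its suffix list and the sample company names are invented for the example.

The model compares each unordered pair once and treats the higher CompanyId as the duplicate.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A row is (CompanyId, CompanyRoleId, Name).
CompanyRow = Tuple[int, int, str]

# Hypothetical cleanup rules; the real dbo.fnCleanUpCompanyName is not shown.
SUFFIXES = {"ltd", "limited", "inc", "plc"}


def clean_up_name(name: str) -> str:
    """Hypothetical stand-in for dbo.fnCleanUpCompanyName."""
    words = name.lower().replace(".", "").replace(",", "").split()
    while words and words[-1] in SUFFIXES:
        words.pop()  # strip trailing legal-form suffixes
    return " ".join(words)


def fuzzy_match(a: str, b: str) -> float:
    """Stand-in for master.dbo.fnClrFuzzyMatch: similarity from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def find_duplicate_pairs(
    rows: List[CompanyRow], threshold: float
) -> List[Tuple[int, int]]:
    """Return (CompanyId, DuplicateOfCompanyId) candidates, like the self-join."""
    # Clean every name once instead of once per comparison.
    cleaned = [(cid, role, clean_up_name(name)) for cid, role, name in rows]
    pairs = []
    for cid, role, name in cleaned:
        for dup_id, dup_role, dup_name in cleaned:
            # Each unordered pair once; the cheap role check runs first and
            # short-circuits before the fuzzy match is called.
            if cid > dup_id and role == dup_role:
                if fuzzy_match(name, dup_name) > threshold:
                    pairs.append((cid, dup_id))
    return pairs


if __name__ == "__main__":
    sample = [  # hypothetical data
        (1, 10, "Acme Ltd"),
        (2, 10, "ACME Limited"),
        (3, 20, "Acme Inc"),  # same cleaned name, different role
        (4, 10, "Globex plc"),
    ]
    print(find_duplicate_pairs(sample, threshold=0.9))  # [(2, 1)]
```

In this sketch, cleaning each name once before the pair loop means the cleanup runs once per row rather than twice per pair. In the database, the same idea could mean storing the cleaned name once, for example in a separate column, instead of calling `dbo.fnCleanUpCompanyName` on both sides of every comparison.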

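Note: if a company matches more than one other company, the value written to DuplicateOfCompanyId may be arbitrary and inconsistent. UPDATE ... FROM uses only one of the matching rows and does not define which one.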
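If that arbitrary choice matters, one option is to collect all candidate matches first and then resolve them with an explicit rule. The Python sketch below applies the rule "the lowest matching CompanyId wins, and chains of matches are followed to their end". That rule and the function name `choose_canonical` are assumptions for illustration, not part of the answer. In T-SQL, a similar rule could be applied by aggregating the matches with MIN() before updating, rather than letting UPDATE ... FROM pick one.

```python
from typing import Dict, Iterable, Tuple


def choose_canonical(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each duplicate CompanyId to a single, deterministic original id.

    pairs holds (CompanyId, MatchedCompanyId) candidates from the fuzzy match,
    in any order and possibly in both directions.
    """
    best: Dict[int, int] = {}
    for company_id, matched_id in pairs:
        # Only point at lower ids, so the lowest id in a group stays unmarked
        # and (a, b) / (b, a) cannot mark both rows as each other's duplicate.
        if matched_id < company_id:
            best[company_id] = min(matched_id, best.get(company_id, matched_id))

    def root(company_id: int) -> int:
        # Follow chains such as 3 -> 2 -> 1; ids strictly decrease, so this ends.
        while company_id in best:
            company_id = best[company_id]
        return company_id

    return {company_id: root(company_id) for company_id in best}


if __name__ == "__main__":
    candidates = [(2, 1), (1, 2), (3, 2), (5, 4), (4, 5)]
    print(choose_canonical(candidates))  # {2: 1, 3: 1, 5: 4}
```

Following chains means a company can end up marked as a duplicate of a row it never matched directly. In the example, 3 matched only 2 but is mapped to 1. Whether that is acceptable depends on how transitive the fuzzy match is meant to be.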