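I'm trying to update a table so that any entry whose name duplicates another entry gets flagged. I do some processing to strip common prefixes and suffixes, and can then compare two names with a fuzzy-matching CLR function. I wrote this as nested cursors, and it currently takes 4 hours to run over all records because I have to check every row against every other row. I've read that a recursive CTE can improve performance significantly, but I'm something of a SQL noob and can't quite get it working. I think I need to nest one recursive CTE inside another, but I'm not sure how.
Currently I have something like this: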
;WITH AllOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
AS
(
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, C.Recognized, 1
    FROM Company C
    WHERE C.DuplicateOfCompanyId IS NULL
    UNION ALL
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, R.Recognized, R.Level + 1
    FROM AllOrgs R INNER JOIN Company C
        ON C.CompanyId = R.CompanyId
),
DuplicateOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
AS
(
    SELECT * FROM AllOrgs
    WHERE Recognized = 0 -- Recognized is what the companies are marked when we are satisfied they aren't incorrect
)
UPDATE C
SET C.DuplicateOfCompanyId = A.CompanyId
FROM Company C JOIN DuplicateOrgs A
    ON C.CompanyId = A.CompanyId
WHERE master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(A.Name), dbo.fnCleanUpCompanyName(C.Name))
        > @CompanyNameMatchValueThreshold
    AND A.CompanyRoleId = C.CompanyRoleId -- Role ID must match as duplicates who provide a different function are fine
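But whenever I try to run it, I get "The statement terminated. The maximum recursion 100 has been exhausted before statement completion." So I've obviously done something silly.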
Answer 0 (score: 0)
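Your recursion never terminates because the recursive member joins each row back to itself (C.CompanyId = R.CompanyId), so every pass re-inserts the anchor rows with a new Level. Example with only one row in Company:
AllOrgs after the anchor executes:
CompanyId1,CompanyRoleId1,name1,Recognized1,1
AllOrgs after recursion 1:
CompanyId1,CompanyRoleId1,name1,Recognized1,1
CompanyId1,CompanyRoleId1,name1,Recognized1,2
AllOrgs after recursion 2:
CompanyId1,CompanyRoleId1,name1,Recognized1,1
CompanyId1,CompanyRoleId1,name1,Recognized1,2
CompanyId1,CompanyRoleId1,name1,Recognized1,3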
...
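For contrast, a recursive CTE stops only when its recursive member eventually returns no rows. A minimal sketch, unrelated to the Company table (Numbers and N are made-up names used purely for illustration):

```sql
-- The WHERE clause in the recursive member is what ends the recursion:
-- once N reaches 5, the recursive SELECT returns no rows and the CTE is complete.
WITH Numbers (N) AS
(
    SELECT 1                -- anchor member
    UNION ALL
    SELECT N + 1            -- recursive member
    FROM Numbers
    WHERE N < 5             -- termination condition
)
SELECT N FROM Numbers;      -- returns 1, 2, 3, 4, 5
```

The query in the question has no such condition, so each pass always produces another row for every company.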
You don't need recursion here. Try a self-join instead:
UPDATE C
SET DuplicateOfCompanyId = Dup.CompanyId
FROM Company C
JOIN Company Dup ON C.CompanyId <> Dup.CompanyID
AND master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(C.Name), dbo.fnCleanUpCompanyName(Dup.Name)) > @CompanyNameMatchValueThreshold
AND C.CompanyRoleID = Dup.CompanyRoleId
Note: if a company has more than one duplicate, the value written to DuplicateOfCompanyId may be arbitrary and inconsistent.
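One possible way to make the result deterministic, offered as a sketch only: compare each row only against rows with a lower CompanyId and point it at the lowest matching id. This assumes CompanyId is a unique, orderable key and that the lowest id in a group should count as the original; the question states neither. The derived-table alias M and the column OriginalCompanyId are hypothetical names, and @CompanyNameMatchValueThreshold is assumed to be declared earlier in the batch, as in the question.

```sql
-- Assumption: the row with the lowest CompanyId in a group of matches is the "original".
UPDATE C
SET DuplicateOfCompanyId = M.OriginalCompanyId
FROM Company C
JOIN (
    -- For each company, find the lowest-id earlier company that matches it.
    SELECT C2.CompanyId, MIN(Dup.CompanyId) AS OriginalCompanyId
    FROM Company C2
    JOIN Company Dup
        ON Dup.CompanyId < C2.CompanyId            -- each pair is compared once
        AND Dup.CompanyRoleId = C2.CompanyRoleId   -- role must match, as in the question
        AND master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(C2.Name), dbo.fnCleanUpCompanyName(Dup.Name))
            > @CompanyNameMatchValueThreshold
    GROUP BY C2.CompanyId
) M ON M.CompanyId = C.CompanyId;
```

With this shape the lowest-id row of each group is left unmarked, and using < instead of <> roughly halves the number of pairs compared. Fuzzy matching is not necessarily transitive, so rows can still form chains (A points to B, B points to C), and a follow-up pass may be needed if every duplicate must point at a single original.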