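I've been tasked with creating a possible solution for detecting how many duplicate persons exist in a data management system, based on multiple criteria. The problem is that neither the data management system nor the solutions implemented so far detect duplicate persons properly. I'm trying to run a query that self-joins the same table multiple times, and it currently takes 3 hours to complete. I discovered that if I split the query into multiple queries, each one completes in about 11 seconds. I'd like to insert the results of these queries into a single table, or update that table when the original person ID and the duplicate person ID already match a query result.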
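I tried a simple INSERT followed by an UPDATE, but it didn't do what I expected. It reported that it had updated a number of records, but when I spot-checked the data, nothing had changed beyond the original INSERT. I've included some of the code below.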
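I'd like the final "Participant Match" table to list every possible duplicate along with its "match" scores. When I ran the 3-hour query, it gave me all the duplicates with "Match" in the various columns (Name, DOB, ID1, ID2, etc.). I'd like to run the multiple queries instead and update those "Match" columns if the ID pair already exists in the table, or insert the entire row if the ID pair doesn't exist.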
Any ideas would be greatly appreciated.
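The "update if the ID pair exists, otherwise insert" behavior described above could be sketched as a single `MERGE` statement per match criterion. This is only a minimal sketch under several assumptions: that `[Person Ref #]` and `[DUP Person Ref #]` together identify a row in `[Participant Match]`, that any other columns in that table are nullable or have defaults, and that the aliases `tgt`, `src`, `Person_Ref`, and `Dup_Person_Ref` are hypothetical names chosen for illustration. The merged-pair exclusion is left out for brevity.

```sql
-- Sketch only: upsert the SOUNDEX criterion into [Participant Match].
-- Assumes the ID pair identifies a row and other columns accept NULL/defaults.
MERGE [Participant Match] AS tgt
USING (
    -- DISTINCT keeps each ID pair to one source row; MERGE raises an error
    -- if several source rows try to update the same target row.
    SELECT DISTINCT
           list1.Person_Reference__ AS Person_Ref,
           list2.Person_Reference__ AS Dup_Person_Ref
    FROM [Participants] AS list1
    JOIN [Participants] AS list2
      ON  list1.Soundex_First_Name    = list2.Soundex_First_Name
      AND list1.Participant_Last_Name = list2.Participant_Last_Name
      AND list1.Person_Reference__   <> list2.Person_Reference__
    WHERE list1.Open_Case_Reference IS NOT NULL
) AS src
   ON  tgt.[Person Ref #]     = src.Person_Ref
   AND tgt.[DUP Person Ref #] = src.Dup_Person_Ref
WHEN MATCHED THEN
    -- The pair is already listed: only flag this criterion
    UPDATE SET tgt.[Soundex First Name] = 'Match'
WHEN NOT MATCHED BY TARGET THEN
    -- The pair is new: insert it with this criterion flagged
    INSERT ([Person Ref #], [DUP Person Ref #], [Soundex First Name])
    VALUES (src.Person_Ref, src.Dup_Person_Ref, 'Match');
```

Running one such statement per criterion (name, DOB, and so on), each setting its own column, would keep every query small while still accumulating all flags in one table.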
I also tried this pattern:
UPDATE Table SET (…) WHERE Column1 = … IF @@RowCount = 0 INSERT INTO Table …
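For reference, the `UPDATE … IF @@ROWCOUNT = 0 INSERT` pattern above works on one ID pair at a time. A minimal sketch for a single pair follows; the variables `@PersonRef` and `@DupPersonRef`, their `int` type, and their values are hypothetical, and the column names are taken from the `UPDATE` statement in this post.

```sql
-- Sketch only: row-at-a-time upsert for ONE ID pair.
-- @PersonRef / @DupPersonRef and their int type are assumptions.
DECLARE @PersonRef int = 1001, @DupPersonRef int = 2002;

UPDATE [Participant Match]
SET [Soundex First Name] = 'Match'
WHERE [Person Ref #] = @PersonRef
  AND [DUP Person Ref #] = @DupPersonRef;

-- @@ROWCOUNT holds the number of rows affected by the previous statement
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO [Participant Match] ([Person Ref #], [DUP Person Ref #], [Soundex First Name])
    VALUES (@PersonRef, @DupPersonRef, 'Match');
END;
```

Because `@@ROWCOUNT` counts the rows affected by the whole preceding statement, this pattern does not extend to a set-based `UPDATE` over many pairs: the `INSERT` branch runs only when no row at all was updated, so new pairs are skipped as soon as at least one existing pair matches.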
```sql
/* First split query: exact first and last name match */
SELECT list1.Organization_Name
      ,list1.Person_Reference__
      ,list1.Participant_First_Name
      ,list1.Participant_Last_Name
      ,list1.Soundex_First_Name
FROM [Participants] list1, [Participants] list2
WHERE
    /* Only bring in members who are part of open cases */
    list1.Open_Case_Reference IS NOT NULL
    /* Only bring in members who do not "match" themselves */
    AND list1.Person_Reference__ <> list2.Person_Reference__
    AND (
        /* Include members who have a matching first name AND last name */
        (list1.Participant_First_Name = list2.Participant_First_Name
         AND list1.Participant_Last_Name = list2.Participant_Last_Name)
    )
    /* DO NOT include members that have already been merged */
    AND NOT (
        CONCAT(CAST(list1.[Person_Reference__] AS varchar(50)), CAST(list2.[Person_Reference__] AS varchar(50)))
            IN (SELECT CONCAT(CAST([Original_Reference__] AS varchar(50)), CAST([Duplicate_Reference__] AS varchar(50)))
                FROM [Merged Pairs])
        OR CONCAT(CAST(list2.[Person_Reference__] AS varchar(50)), CAST(list1.[Person_Reference__] AS varchar(50)))
            IN (SELECT CONCAT(CAST([Original_Reference__] AS varchar(50)), CAST([Duplicate_Reference__] AS varchar(50)))
                FROM [Merged Pairs])
    );

/* Second split query: flag SOUNDEX first-name matches in [Participant Match] */
UPDATE [Participant Match]
SET [Soundex First Name] = 'Match'
WHERE CONCAT(CAST([Person Ref #] AS varchar(50)), CAST([DUP Person Ref #] AS varchar(50)))
    IN (SELECT CONCAT(CAST(list1.Person_Reference__ AS varchar(50)), CAST(list2.Person_Reference__ AS varchar(50)))
        FROM [Participants] list1, [Participants] list2
        WHERE
            /* Only bring in members who are part of open cases */
            list1.Open_Case_Reference IS NOT NULL
            /* Only bring in members who do not "match" themselves */
            AND list1.Person_Reference__ <> list2.Person_Reference__
            AND (
                /* Include members whose last names match and whose first-name SOUNDEX values match */
                (list1.Soundex_First_Name = list2.Soundex_First_Name
                 AND list1.Participant_Last_Name = list2.Participant_Last_Name)
            )
            AND (CASE
                     WHEN list1.Soundex_First_Name = list2.Soundex_First_Name
                          AND list1.Participant_Last_Name = list2.Participant_Last_Name
                     THEN 'Match'
                     ELSE ''
                 END) LIKE 'Match'
            /* DO NOT include members that have already been merged */
            AND NOT (
                CONCAT(CAST(list1.[Person_Reference__] AS varchar(50)), CAST(list2.[Person_Reference__] AS varchar(50)))
                    IN (SELECT CONCAT(CAST([Original_Reference__] AS varchar(50)), CAST([Duplicate_Reference__] AS varchar(50)))
                        FROM [Merged Pairs])
                OR CONCAT(CAST(list2.[Person_Reference__] AS varchar(50)), CAST(list1.[Person_Reference__] AS varchar(50)))
                    IN (SELECT CONCAT(CAST([Original_Reference__] AS varchar(50)), CAST([Duplicate_Reference__] AS varchar(50)))
                        FROM [Merged Pairs])
            ));
```
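One thing worth noting about the `CONCAT(...) IN (...)` checks above: concatenating two IDs into one string is ambiguous, since the pairs (1, 23) and (12, 3) both become `'123'`, and wrapping the columns in `CAST`/`CONCAT` generally keeps the optimizer from using indexes on the ID columns. A possible alternative, sketched here under the assumption that the reference columns in `[Participants]` and `[Merged Pairs]` have comparable types, is to compare the two columns directly with `NOT EXISTS`. The aliases `mp`, `Person_Ref`, and `Dup_Person_Ref` are hypothetical.

```sql
-- Sketch only: SOUNDEX candidate pairs, excluding already-merged pairs
-- by comparing the ID columns directly instead of concatenated strings.
SELECT DISTINCT
       list1.Person_Reference__ AS Person_Ref,
       list2.Person_Reference__ AS Dup_Person_Ref
FROM [Participants] AS list1
JOIN [Participants] AS list2
  ON  list1.Soundex_First_Name    = list2.Soundex_First_Name
  AND list1.Participant_Last_Name = list2.Participant_Last_Name
  AND list1.Person_Reference__   <> list2.Person_Reference__
WHERE list1.Open_Case_Reference IS NOT NULL
  AND NOT EXISTS (
      SELECT 1
      FROM [Merged Pairs] AS mp
      -- A pair counts as merged in either order (original/duplicate swapped)
      WHERE (mp.[Original_Reference__]  = list1.Person_Reference__
             AND mp.[Duplicate_Reference__] = list2.Person_Reference__)
         OR (mp.[Original_Reference__]  = list2.Person_Reference__
             AND mp.[Duplicate_Reference__] = list1.Person_Reference__)
  );
```

This is not guaranteed to behave identically to the original in every edge case (for example, rows with NULL references are treated differently than in the `CONCAT` version), so it would need checking against real data.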