I am running the following query to identify duplicate records.
SELECT *
FROM unique2 P
WHERE EXISTS (
  SELECT 1
  FROM unique2 C
  WHERE C.surname = P.surname
    AND C.postcode = P.postcode
    AND (((C.forename IS NULL OR P.forename IS NULL)
          AND C.initials = P.initials)
         OR C.forename = P.forename)
    AND (C.sex = P.sex
         OR C.title = P.title)
    AND (C.address1 = P.address1
         OR C.address1 = P.address2
         OR C.address2 = P.address1
         OR instr(C.address1_notrim, P.address1_notrim) > 0
         OR instr(P.address1_notrim, C.address1_notrim) > 0)
    AND C.rowid < P.rowid
);
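However, with this query I cannot identify the ID of the unique record that each duplicate matches. Is there a way to identify the duplicates together with the IDs of the unique records they match (my table has a unique key)?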
Answer 0: (score: 1)
select id
from promolog
where (surname, postcode, dob) in (
select surname, postcode,dob
from (
select surname, postcode, dob, count(1)
from promolog
group by surname,postcode,dob
having count(1) > 1
)
)
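As a rough, runnable sketch of the multi-column IN query above, the following uses Python's standard sqlite3 module with an in-memory SQLite database instead of Oracle. The sample rows are invented for illustration; only the table and column names come from the answer. It assumes an SQLite version with row-value support (3.15 or later), and the redundant wrapper subquery is dropped.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE promolog "
    "(id INTEGER PRIMARY KEY, surname TEXT, postcode TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO promolog VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", "AB1 2CD", "1980-01-01"),
        (2, "Smith", "AB1 2CD", "1980-01-01"),
        (3, "Jones", "EF3 4GH", "1975-05-05"),
        (4, "Smith", "AB1 2CD", "1980-01-01"),
    ],
)

# The column list on the left of IN must be parenthesized (a row value).
dup_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM promolog
        WHERE (surname, postcode, dob) IN (
            SELECT surname, postcode, dob
            FROM promolog
            GROUP BY surname, postcode, dob
            HAVING COUNT(*) > 1
        )
        ORDER BY id
        """
    )
]
print(dup_ids)
```

Note that this lists every row belonging to a duplicated group, but it does not say which row each one is a duplicate of.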
Answer 1: (score: 1)
You can also do this with analytic functions:
select id, num_of_ids, first_id, surname, postcode, dob
from (
select id,
count(*) over (partition by surname, postcode, dob) as num_of_ids,
first_value(id)
over (partition by surname, postcode, dob order by id) as first_id,
surname,
postcode,
dob
from promolog
)
where num_of_ids > 1;
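To see what the analytic version returns, here is a small sketch that runs the same query shape through Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later) in place of Oracle, and the sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE promolog "
    "(id INTEGER PRIMARY KEY, surname TEXT, postcode TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO promolog VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", "AB1 2CD", "1980-01-01"),
        (2, "Smith", "AB1 2CD", "1980-01-01"),
        (3, "Jones", "EF3 4GH", "1975-05-05"),
        (4, "Smith", "AB1 2CD", "1980-01-01"),
    ],
)

# num_of_ids counts the rows in each (surname, postcode, dob) group;
# first_id is the lowest id in that group, used here as the "master" record.
analytic_rows = conn.execute(
    """
    SELECT id, num_of_ids, first_id
    FROM (
        SELECT id,
               COUNT(*) OVER (PARTITION BY surname, postcode, dob) AS num_of_ids,
               FIRST_VALUE(id) OVER (
                   PARTITION BY surname, postcode, dob ORDER BY id
               ) AS first_id
        FROM promolog
    )
    WHERE num_of_ids > 1
    ORDER BY id
    """
).fetchall()

for row in analytic_rows:
    print(row)
```

Each row in a duplicated group comes back with the group size and the lowest ID in that group, so the first record of a group appears too, with first_id equal to its own id.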
Based on your update, I think you can use a self-join, with join conditions as elaborate as you need:
select dup.*, master.id as duplicate_of
from promolog dup
join promolog master
on master.surname = dup.surname
and master.postcode = dup.postcode
and master.dob = dup.dob
... and <address checks etc. > ...
and master.rowid < dup.rowid;
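But maybe I'm still missing something. As the name suggests, exists only tests whether a matching record exists; if you want to retrieve any data from the matching record, then you need to join to it at some point.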
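The self-join idea can be tried out with the sketch below, again using Python's sqlite3 module and invented sample rows rather than Oracle. The address checks elided in the answer are left out, and matching on surname, postcode and dob alone is an assumption made to keep the example short.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE promolog "
    "(id INTEGER PRIMARY KEY, surname TEXT, postcode TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO promolog VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", "AB1 2CD", "1980-01-01"),
        (2, "Smith", "AB1 2CD", "1980-01-01"),
        (3, "Jones", "EF3 4GH", "1975-05-05"),
        (4, "Smith", "AB1 2CD", "1980-01-01"),
    ],
)

join_sql = """
    FROM promolog dup
    JOIN promolog master
      ON master.surname = dup.surname
     AND master.postcode = dup.postcode
     AND master.dob = dup.dob
     AND master.rowid < dup.rowid
"""

# Every (duplicate, earlier matching record) pair.
pairs = conn.execute(
    "SELECT dup.id, master.id AS duplicate_of" + join_sql + "ORDER BY dup.id, master.id"
).fetchall()

# One row per duplicate, keeping only the earliest matching record.
first_match = conn.execute(
    "SELECT dup.id, MIN(master.id) AS duplicate_of" + join_sql
    + "GROUP BY dup.id ORDER BY dup.id"
).fetchall()

print(pairs)
print(first_match)
```

With three matching rows, the plain join reports the last one once for each earlier match, so aggregating with MIN is one way to get a single duplicate_of value per duplicate.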