I am using PostgreSQL. To delete duplicates from a table, I use the following query:
DELETE FROM dups a USING (
SELECT MIN(ctid) as ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid
Reference: https://stackoverflow.com/a/12963112/4940278
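However, another table, ref_table, also references dups.id. Before deleting the duplicates, I need to update that table.
What query would repoint the referencing table away from the duplicate IDs so that no data is lost?
For example:
Table 1, say dups: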
id  key
1   Luna
2   Hermione
3   Luna
Table 2, ref_table:
id  dups_id  data
1   2        Auror
2   1        Researcher
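Now, the de-duplication query will delete the record with id 1 from dups, because it is a duplicate.
However, that record is referenced in ref_table, so I need to update the reference to point at the record that will be kept.
That is, table 1 should become: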
id  key
2   Hermione
3   Luna
and table 2 should become:
id  dups_id  data
1   2        Auror
2   3        Researcher
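A minimal setup that reproduces this example might look as follows. The column types, the primary keys, and the foreign-key constraint are assumptions, since only the data is given above.

```sql
-- Hypothetical schema: types and constraints are assumed, not taken from the question.
CREATE TABLE dups (
    id  integer PRIMARY KEY,
    key text NOT NULL
);

CREATE TABLE ref_table (
    id      integer PRIMARY KEY,
    dups_id integer NOT NULL REFERENCES dups (id),
    data    text
);

-- Sample rows exactly as listed in the example.
INSERT INTO dups (id, key)
VALUES (1, 'Luna'), (2, 'Hermione'), (3, 'Luna');

INSERT INTO ref_table (id, dups_id, data)
VALUES (1, 2, 'Auror'), (2, 1, 'Researcher');
```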
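Answer 0 (score: 1)
Use a CTE to identify the rows in dups that will be kept, then update the referencing rows so that the FK points only at those, and finally delete the rows that are no longer needed.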
with keeper as  -- for every dups row, find the id that will be kept for its key
   ( select id, key
          , max(id) over (partition by key) as mid
       from dups
   )
, u as  -- repoint ref_table rows that reference a row about to be deleted
   ( update ref_table r
        set dups_id = k.mid
       from keeper k
      where r.dups_id = k.id
        and k.id <> k.mid   -- leave rows that already point at a keeper untouched
   )
delete  -- finally, remove the dups rows that are no longer needed
  from dups d
 where d.id not in (select mid from keeper);
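A quick sanity check after running the statement could look like the sketch below. It assumes the table and column names from the question (dups, ref_table, dups_id). Both queries should return no rows if the clean-up worked as intended.

```sql
-- 1. No key should appear more than once in dups any more.
SELECT key, count(*) AS n
FROM dups
GROUP BY key
HAVING count(*) > 1;

-- 2. Every ref_table row should still point at an existing dups row.
SELECT r.*
FROM ref_table r
LEFT JOIN dups d ON d.id = r.dups_id
WHERE d.id IS NULL;
```

Running the clean-up and these checks inside one transaction (BEGIN ... ROLLBACK or COMMIT) makes it possible to back out if either query returns rows.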
Answer 1 (score: 0)
You can use a CTE:
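with d as (
    -- delete the duplicates and return the deleted rows to the outer statement
    DELETE FROM dups a USING
        (SELECT MIN(ctid) as ctid, key
         FROM dups
         GROUP BY key HAVING COUNT(*) > 1
        ) b
    WHERE a.key = b.key AND a.ctid <> b.ctid
    RETURNING *
)
-- use the deleted rows to update the referencing table
update othertable
    set . . .
    from d
    where . . .;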
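One possible way to fill in the elided parts is sketched below, using the table and column names from the question. Note that this sketch swaps in max(id) for MIN(ctid) so that the highest id per key survives, matching the expected result in the question. The aliases keep_id and old_id are names introduced here for illustration. Because all sub-statements of a WITH query see the same snapshot, the surviving id is passed along through RETURNING rather than looked up again in dups.

```sql
WITH d AS (
    -- Delete every duplicate except the row with the highest id per key,
    -- returning both the deleted id and the id that survives.
    DELETE FROM dups a
    USING (SELECT max(id) AS keep_id, key
           FROM dups
           GROUP BY key
           HAVING count(*) > 1) b
    WHERE a.key = b.key
      AND a.id <> b.keep_id
    RETURNING a.id AS old_id, b.keep_id
)
-- Repoint references from each deleted row to its surviving counterpart.
UPDATE ref_table r
SET dups_id = d.keep_id
FROM d
WHERE r.dups_id = d.old_id;
```

It is worth trying this on a copy of the data first, especially if ref_table carries a foreign-key constraint on dups_id.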