I have these two tables and want to delete all authors from ms_author that are not present in author.
author
(1.6M rows)
+-------+-------------+------+-----+-------+
| Field | Type | Null | Key | index |
+-------+-------------+------+-----+-------+
| id | text | NO | PRI | true |
| name | text | YES | | |
+-------+-------------+------+-----+-------+
ms_author
(120M rows)
+-------+-------------+------+-----+-------+
| Field | Type | Null | Key | index |
+-------+-------------+------+-----+-------+
| id | text | NO | PRI | |
| name | text | YES | | true |
+-------+-------------+------+-----+-------+
This is my query:
DELETE
FROM ms_author AS m
WHERE m.name NOT IN
(SELECT a.name
FROM author AS a);
I tried to estimate the query duration: ~130 hours.
Is there a faster way to achieve this?
EDIT:
EXPLAIN VERBOSE output:
Delete on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)
  ->  Seq Scan on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)
        Output: m.ctid
        Filter: (NOT (SubPlan 1))
        SubPlan 1
          ->  Materialize  (cost=0.00..44334.43 rows=1660295 width=15)
                Output: a.name
                ->  Seq Scan on public.author a  (cost=0.00..27925.95 rows=1660295 width=15)
                      Output: a.name
Index on author(name):
create index author_name on author(name);
Index on ms_author(name):
create index ms_author_name on ms_author(name);
Answer 0: (score: 5)
I'm a big fan of the "anti-join". It works efficiently for both large and small data sets:
delete from ms_author ma
where not exists (
select null
from author a
where ma.name = a.name
)
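Before swapping NOT IN for NOT EXISTS, it may be worth checking two things: the plan the rewritten statement gets, and whether either name column holds NULLs (both are nullable in the tables above), because the two forms treat NULL differently. A minimal sketch, assuming PostgreSQL and the table definitions from the question; the column aliases in the second query are made up for illustration:

```sql
-- Plain EXPLAIN (without ANALYZE) only plans the statement; nothing is deleted.
-- With NOT EXISTS the planner is able to choose an anti-join node
-- (for example a Hash Anti Join) instead of the per-row SubPlan filter
-- shown in the question. Which node it picks depends on statistics and settings.
EXPLAIN
DELETE FROM ms_author ma
WHERE NOT EXISTS (
    SELECT 1
    FROM author a
    WHERE ma.name = a.name
);

-- NULL handling differs between the two forms:
--   NOT IN     : if author.name contains at least one NULL, no row qualifies
--                for deletion; ms_author rows whose name is NULL do not
--                qualify either (as long as author is not empty).
--   NOT EXISTS : ms_author rows whose name is NULL match nothing in author,
--                so they ARE deleted.
SELECT
    (SELECT count(*) FROM author    WHERE name IS NULL) AS author_null_names,
    (SELECT count(*) FROM ms_author WHERE name IS NULL) AS ms_author_null_names;
```

If both counts come back as zero, the NOT IN and NOT EXISTS statements should delete the same set of rows.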
Answer 1: (score: 0)
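A DELETE using NOT IN typically ends up being evaluated like a nested-loop anti-join (in the plan above, the materialized subquery is rescanned for each row of ms_author), which leads to poor performance. You can rewrite the query as follows: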
DELETE FROM ms_author AS m
WHERE m.id IN
    (SELECT m2.id
     FROM ms_author AS m2
     LEFT JOIN author AS a ON m2.name = a.name
     WHERE a.name IS NULL);
This approach has the added benefit that you delete rows by the primary key 'id', which should be much faster.
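Whether this form is actually faster is best checked rather than assumed. A rough sketch, assuming PostgreSQL and the tables from the question, that previews how many rows the LEFT JOIN filter selects and shows the plan without deleting anything; the aliases m2 and rows_to_delete are illustrative only:

```sql
-- Preview: number of ms_author rows with no matching name in author.
-- Unmatched rows appear exactly once in a LEFT JOIN, and matched rows
-- always carry a non-NULL a.name, so this count equals the rows to delete.
SELECT count(*) AS rows_to_delete
FROM ms_author AS m2
LEFT JOIN author AS a ON m2.name = a.name
WHERE a.name IS NULL;

-- Plan only (no ANALYZE), so the DELETE itself is not executed.
-- Compare the estimated cost with the plan posted in the question.
EXPLAIN
DELETE FROM ms_author AS m
WHERE m.id IN (
    SELECT m2.id
    FROM ms_author AS m2
    LEFT JOIN author AS a ON m2.name = a.name
    WHERE a.name IS NULL
);
```

Note that this form reads ms_author twice (once in the subquery, once for the delete itself), so comparing its plan against the NOT EXISTS version on the real data is the safer way to choose between them.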