Performance of DELETE with NOT IN (SELECT ...)

Time: 2015-12-14 09:11:07

Tags: sql postgresql sql-delete postgresql-performance

I have these two tables and want to delete every author in ms_author that does not appear in author.

author (1.6M rows)

+-------+-------------+------+-----+-------+
| Field | Type        | Null | Key | index |
+-------+-------------+------+-----+-------+
| id    | text        | NO   | PRI | true  |
| name  | text        | YES  |     |       |
+-------+-------------+------+-----+-------+

ms_author (120M rows)

+-------+-------------+------+-----+-------+
| Field | Type        | Null | Key | index |
+-------+-------------+------+-----+-------+
| id    | text        | NO   | PRI |       |
| name  | text        | YES  |     | true  |
+-------+-------------+------+-----+-------+

This is my query:

DELETE
FROM ms_author AS m
WHERE m.name NOT IN
    (SELECT a.name
     FROM author AS a);

I tried to estimate the query duration: ~130 hours. Is there a faster way to achieve this?

EDIT:

EXPLAIN VERBOSE output

Delete on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)
  ->  Seq Scan on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)
        Output: m.ctid
        Filter: (NOT (SubPlan 1))
        SubPlan 1
          ->  Materialize  (cost=0.00..44334.43 rows=1660295 width=15)
                Output: a.name
                ->  Seq Scan on public.author a  (cost=0.00..27925.95 rows=1660295 width=15)
                      Output: a.name

Index on author(name):

create index author_name on author(name);

Index on ms_author(name):

create index ms_author_name on ms_author(name);

2 answers:

Answer 0 (score: 5)

I am a big fan of the "anti-join". It works efficiently on both large and small data sets:

delete from ms_author ma
where not exists (
  select null
  from author a
  where ma.name = a.name
)
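
Before running the delete for real, it may be worth checking two things; the following is only a sketch against the author and ms_author tables from the question. The first statement asks the planner what it would do with the NOT EXISTS form (a Hash Anti Join or Merge Anti Join is the plan one would hope for). The second counts NULL names, because name is nullable in both tables and NULLs are exactly where NOT IN and NOT EXISTS behave differently.

```sql
-- Plain EXPLAIN only plans the statement; nothing is deleted.
EXPLAIN
DELETE FROM ms_author ma
WHERE NOT EXISTS (
  SELECT 1
  FROM author a
  WHERE ma.name = a.name
);

-- NULL handling differs between the two forms:
--  * a NULL in author.name means NOT IN is never true,
--    so the original query would delete nothing;
--  * a NULL in ms_author.name never equals anything, so NOT EXISTS
--    deletes that row, while NOT IN (over a non-empty author) keeps it.
SELECT
  (SELECT count(*) FROM author    WHERE name IS NULL) AS author_null_names,
  (SELECT count(*) FROM ms_author WHERE name IS NULL) AS ms_author_null_names;
```

If ms_author_null_names is nonzero and those rows should survive, adding AND ma.name IS NOT NULL to the DELETE keeps them.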

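Answer 1 (score: 0)

A DELETE with NOT IN often ends up as a nested-loop-style plan in which the subquery result is rescanned for every row (the Filter: (NOT (SubPlan 1)) over a Materialize node in the plan above), which performs poorly. You can rewrite the query as follows: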

DELETE FROM ms_author AS m
WHERE m.id IN
    (SELECT m.id
     FROM ms_author AS m
     LEFT JOIN author AS a ON m.name = a.name
     WHERE a.name IS NULL);

This approach has the added benefit that the rows are deleted by the primary key 'id', which should be much faster.
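
For comparison, if the query has to stay in its NOT IN form, PostgreSQL can sometimes evaluate the subquery as a hashed SubPlan, provided the planner estimates that the subquery result fits in work_mem. The sketch below only inspects the plan; the 256MB figure is an illustrative assumption, not a recommendation, and should be sized to the data and the available RAM.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction.
-- Assumption: 256MB is a placeholder value for illustration.
SET LOCAL work_mem = '256MB';

-- Plain EXPLAIN plans the DELETE without executing it.
-- Look for "Filter: (NOT (hashed SubPlan 1))" rather than a plain
-- SubPlan over a Materialize node.
EXPLAIN
DELETE FROM ms_author AS m
WHERE m.name NOT IN (SELECT a.name FROM author AS a);

ROLLBACK;
```

If the plan still shows an unhashed SubPlan, the setting was likely too small for the planner's estimate, and one of the anti-join rewrites is the safer route.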