PostgreSQL: slow DELETE FROM ... WHERE EXISTS

Date: 2017-11-20 22:22:14

Tags: sql postgresql sql-delete

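I am having a problem with a slow delete query. I have a schema, say "target", and every table in it has an equivalent table (same columns and primary key) in another schema, say "delta". I now want to delete all rows in the target schema that also appear in the delta schema. I tried the DELETE FROM ... WHERE EXISTS approach, but it seems to be extremely slow. Here is an example query: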

DELETE FROM "target".name2phoneme
WHERE EXISTS(
  SELECT 1 FROM delta.name2phoneme d 
  WHERE name2phoneme.NAME_ID = d.NAME_ID 
  AND name2phoneme.PHONEME_ID = d.PHONEME_ID
);

Here is the layout of both tables (except that the table in the "delta" schema has only the primary key and no foreign keys):

CREATE TABLE name2phoneme
(
  name_id uuid NOT NULL,
  phoneme_id uuid NOT NULL,
  seq_num numeric(3,0),
  CONSTRAINT pk_name2phoneme PRIMARY KEY (name_id, phoneme_id),
  CONSTRAINT fk_name2phoneme_name_id_2_name FOREIGN KEY (name_id)
    REFERENCES name (name_id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE NO ACTION
    DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme FOREIGN KEY (phoneme_id)
    REFERENCES phoneme (phoneme_id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE NO ACTION
    DEFERRABLE INITIALLY DEFERRED
);

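The "target" table initially contains more than 18M rows, while the delta table contains about 3.7M rows (all of which are to be deleted from target).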

Here is the EXPLAIN output for the query above:

"Delete on name2phoneme  (cost=154858.03..1068580.46 rows=6449114 width=12)"
"  ->  Hash Join  (cost=154858.03..1068580.46 rows=6449114 width=12)"
"        Hash Cond: ((name2phoneme.name_id = d.name_id) AND (name2phoneme.phoneme_id = d.phoneme_id))"
"        ->  Seq Scan on name2phoneme  (cost=0.00..331148.16 rows=18062616 width=38)"
"        ->  Hash  (cost=69000.01..69000.01 rows=3763601 width=38)"
"              ->  Seq Scan on name2phoneme d  (cost=0.00..69000.01 rows=3763601 width=38)"

I tried to run EXPLAIN ANALYZE on the query above, but it was still running after 2 hours, so I killed it.

Any ideas on how to optimize this operation?
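A side note on the EXPLAIN ANALYZE attempt: EXPLAIN ANALYZE really executes the statement, so running it on a DELETE removes the rows unless it is wrapped in a transaction that is rolled back. A minimal sketch, assuming the question's table names, looks like this; on 18M rows it will still take as long as the real delete, so trying it on a copy or a subset first is probably more practical.

```sql
BEGIN;

-- EXPLAIN ANALYZE executes the DELETE for real;
-- BUFFERS adds shared/local block counters to every plan node
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM "target".name2phoneme t
WHERE EXISTS (
  SELECT 1
  FROM delta.name2phoneme d
  WHERE t.name_id = d.name_id
    AND t.phoneme_id = d.phoneme_id
);

-- Undo the deletions; only the plan and timing output are kept
ROLLBACK;
```

Any non-deferred triggers that fire during the statement, including foreign-key triggers from tables that reference this one, are listed with their time at the end of the EXPLAIN ANALYZE output, which is worth checking when a delete is unexpectedly slow.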

2 answers:

Answer 0 (score: 1)

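Deleting 3.7 million rows is very time-consuming because of the overhead of finding each row and then logging and deleting it. Just thinking about all the dirty pages, the logging, and the cache misses is mind-boggling, not to mention the updates to the indexes.

For this reason, something like this would be much faster: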

create temporary table temp_n2p as 
    select n2p.*
    from "target".name2phoneme n2p
    where not exists (select 1
                      from delta.name2phoneme d 
                      where n2p.NAME_ID = d.NAME_ID and
                            n2p.PHONEME_ID = d.PHONEME_ID
                     );

truncate table "target".name2phoneme;

insert into "target".name2phoneme
    select *
    from temp_n2p;

You should also drop the indexes before truncating and recreate them afterwards.
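The drop-and-recreate step can be sketched as follows. This is a minimal sketch under a few assumptions. The only index on "target".name2phoneme in the question's DDL is the one backing pk_name2phoneme, since PostgreSQL does not create indexes for foreign-key columns. The two deferred foreign keys are dropped as well, because otherwise every reinserted row queues its own check at commit, whereas re-adding a foreign key validates all rows with a single query. Finally, `name` and `phoneme` resolve through search_path exactly as in the question's DDL; qualify them with their schema if they do not. If other tables reference pk_name2phoneme, both the DROP CONSTRAINT and the TRUNCATE will fail without CASCADE.

```sql
BEGIN;

-- Block concurrent writes so nothing changes between the copy and the truncate
LOCK TABLE "target".name2phoneme IN EXCLUSIVE MODE;

-- Keep only the rows that are NOT in delta (same query as above)
CREATE TEMPORARY TABLE temp_n2p AS
    SELECT n2p.*
    FROM "target".name2phoneme n2p
    WHERE NOT EXISTS (SELECT 1
                      FROM delta.name2phoneme d
                      WHERE n2p.name_id = d.name_id
                        AND n2p.phoneme_id = d.phoneme_id);

-- Drop the PK (and its index) and the FKs so the reload does no per-row work
ALTER TABLE "target".name2phoneme
    DROP CONSTRAINT fk_name2phoneme_name_id_2_name,
    DROP CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme,
    DROP CONSTRAINT pk_name2phoneme;

TRUNCATE TABLE "target".name2phoneme;

INSERT INTO "target".name2phoneme
    SELECT * FROM temp_n2p;

-- Build the primary-key index in a single pass over the reloaded data
ALTER TABLE "target".name2phoneme
    ADD CONSTRAINT pk_name2phoneme PRIMARY KEY (name_id, phoneme_id);

-- Re-adding each foreign key validates all existing rows with one query
ALTER TABLE "target".name2phoneme
    ADD CONSTRAINT fk_name2phoneme_name_id_2_name FOREIGN KEY (name_id)
        REFERENCES name (name_id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED,
    ADD CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme FOREIGN KEY (phoneme_id)
        REFERENCES phoneme (phoneme_id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED;

COMMIT;

-- Refresh planner statistics after the bulk reload
ANALYZE "target".name2phoneme;
```

Because TRUNCATE and ALTER TABLE are transactional in PostgreSQL, an error anywhere before COMMIT leaves the original table untouched.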

Answer 1 (score: 0)

Have you tried either of the following?

DELETE 
FROM "target".name2phoneme t  
     USING delta.name2phoneme d 
WHERE t.NAME_ID = d.NAME_ID 
      AND t.PHONEME_ID = d.PHONEME_ID               
;

Or using WITH, although Postgres materializes CTEs, so I'm not convinced that is a good fit for your needs.

WITH cte AS (
      SELECT t.name_id, t.phoneme_id
      FROM "target".name2phoneme t  
      INNER JOIN delta.name2phoneme d ON t.NAME_ID = d.NAME_ID 
                            AND t.PHONEME_ID = d.PHONEME_ID               
      )
DELETE FROM "target".name2phoneme t
     USING cte d
WHERE t.NAME_ID = d.NAME_ID 
      AND t.PHONEME_ID = d.PHONEME_ID               
;
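The remark about CTE materialization describes PostgreSQL as of 2017. Starting with PostgreSQL 12, a non-recursive, side-effect-free CTE that is referenced only once is inlined into the outer query by default, and the behaviour can be requested explicitly with AS MATERIALIZED or AS NOT MATERIALIZED. A sketch of the CTE variant with inlining requested explicitly (PostgreSQL 12 or later only):

```sql
-- NOT MATERIALIZED asks the planner to fold the CTE into the DELETE
-- instead of computing and storing its result first (PostgreSQL 12+)
WITH cte AS NOT MATERIALIZED (
      SELECT t.name_id, t.phoneme_id
      FROM "target".name2phoneme t
      INNER JOIN delta.name2phoneme d ON t.name_id = d.name_id
                                     AND t.phoneme_id = d.phoneme_id
      )
DELETE FROM "target".name2phoneme t
     USING cte d
WHERE t.name_id = d.name_id
      AND t.phoneme_id = d.phoneme_id;
```

Even when inlined, this query still joins the target table to itself, so the plain DELETE ... USING form remains the simpler choice.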