Postgres uses a poorly performing plan when rerun

Time: 2017-06-20 21:01:48

Tags: postgresql query-optimization

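I'm importing an acyclic graph and flattening the ancestors of each code into an array. This works fine (for a while): about 45 s for 400k codes across roughly 900k edges.

However, after the first successful execution, Postgres decides to stop using the Nested Loop, and the performance of the update query drops drastically: about 2 s per code.

I can work around the problem by running a VACUUM right before the update, but I am curious why the de-optimization happens.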

DROP TABLE IF EXISTS tmp_anc;
DROP TABLE IF EXISTS tmp_rel;
DROP TABLE IF EXISTS tmp_edges;
DROP TABLE IF EXISTS tmp_codes; 

CREATE TABLE tmp_rel (
  from_id BIGINT,
  to_id   BIGINT
);

COPY tmp_rel FROM 'rel.txt' WITH DELIMITER E'\t' CSV HEADER;

CREATE TABLE tmp_edges(
  start_node BIGINT,
  end_node   BIGINT
);

INSERT INTO tmp_edges(start_node, end_node) 
  SELECT from_id AS start_node, to_id AS end_node 
  FROM   tmp_rel;

CREATE INDEX tmp_edges_end ON tmp_edges (end_node);

CREATE TABLE tmp_codes (
  id     BIGINT,
  active SMALLINT
);

COPY tmp_codes FROM 'codes.txt' WITH DELIMITER E'\t' CSV HEADER;

CREATE TABLE tmp_anc(
   code      BIGINT,
   ancestors BIGINT[]
);

INSERT INTO tmp_anc 
  SELECT DISTINCT(id) 
  FROM   tmp_codes 
  WHERE  active = 1;

CREATE INDEX tmp_anc_codes ON tmp_anc (code);

VACUUM; -- Need this for the update to execute optimally

UPDATE tmp_anc sa SET ancestors = (
  WITH RECURSIVE ancestors(code) AS (
    SELECT start_node FROM tmp_edges WHERE end_node = sa.code
  UNION
    SELECT se.start_node
    FROM   tmp_edges se, ancestors a
    WHERE  se.end_node = a.code
  )
  SELECT array_agg(code) FROM ancestors
);

Table stats:

tmp_rel     507 MB  0 bytes
tmp_edges   74 MB   37 MB
tmp_codes   32 MB   0 bytes
tmp_anc     22 MB   8544 kB

EXPLAIN output:

Without VACUUM before the UPDATE:

Update on tmp_anc sa  (cost=10000000000.00..11081583053.74 rows=10 width=46) (actual time=38294.005..38294.005 rows=0 loops=1)
  ->  Seq Scan on tmp_anc sa  (cost=10000000000.00..11081583053.74 rows=10 width=46) (actual time=3300.974..38292.613 rows=10 loops=1)
        SubPlan 2
          ->  Aggregate  (cost=108158305.25..108158305.26 rows=1 width=32) (actual time=3829.253..3829.253 rows=1 loops=10)
                CTE ancestors
                  ->  Recursive Union  (cost=81.97..66015893.05 rows=1872996098 width=8) (actual time=0.037..3827.917 rows=45 loops=10)
                        ->  Bitmap Heap Scan on tmp_edges  (cost=81.97..4913.18 rows=4328 width=8) (actual time=0.022..0.022 rows=2 loops=10)
                              Recheck Cond: (end_node = sa.code)
                              Heap Blocks: exact=12
                              ->  Bitmap Index Scan on tmp_edges_end  (cost=0.00..80.89 rows=4328 width=0) (actual time=0.014..0.014 rows=2 loops=10)
                                    Index Cond: (end_node = sa.code)
                        ->  Merge Join  (cost=4198.89..2855105.79 rows=187299177 width=8) (actual time=163.746..425.295 rows=10 loops=90)
                              Merge Cond: (a.code = se.end_node)
                              ->  Sort  (cost=4198.47..4306.67 rows=43280 width=8) (actual time=0.012..0.016 rows=5 loops=90)
                                    Sort Key: a.code
                                    Sort Method: quicksort  Memory: 25kB
                                    ->  WorkTable Scan on ancestors a  (cost=0.00..865.60 rows=43280 width=8) (actual time=0.000..0.001 rows=5 loops=90)
                              ->  Materialize  (cost=0.42..43367.08 rows=865523 width=16) (actual time=0.010..337.592 rows=537171 loops=90)
                                    ->  Index Scan using tmp_edges_end on edges se  (cost=0.42..41203.27 rows=865523 width=16) (actual time=0.009..247.547 rows=537171 loops=90)
                ->  CTE Scan on ancestors  (cost=0.00..37459921.96 rows=1872996098 width=8) (actual time=1.227..3829.159 rows=45 loops=10)

With VACUUM before the UPDATE:

Update on tmp_anc sa  (cost=0.00..2949980136.43 rows=387059 width=14) (actual time=74701.329..74701.329 rows=0 loops=1)
  ->  Seq Scan on tmp_anc sa  (cost=0.00..2949980136.43 rows=387059 width=14) (actual time=0.336..70324.848 rows=387059 loops=1)
        SubPlan 2
          ->  Aggregate  (cost=7621.50..7621.51 rows=1 width=8) (actual time=0.180..0.180 rows=1 loops=387059)
                CTE ancestors
                  ->  Recursive Union  (cost=0.42..7583.83 rows=1674 width=8) (actual time=0.005..0.162 rows=32 loops=387059)
                        ->  Index Scan using tmp_edges_end on tmp_edges  (cost=0.42..18.93 rows=4 width=8) (actual time=0.004..0.005 rows=2 loops=387059)
                              Index Cond: (end_node = sa.code)
                        ->  Nested Loop  (cost=0.42..753.14 rows=167 width=8) (actual time=0.003..0.019 rows=10 loops=2700448)
                              ->  WorkTable Scan on ancestors a  (cost=0.00..0.80 rows=40 width=8) (actual time=0.000..0.001 rows=5 loops=2700448)
                              ->  Index Scan using tmp_edges_end on tmp_edges se  (cost=0.42..18.77 rows=4 width=16) (actual time=0.003..0.003 rows=2 loops=12559395)
                                    Index Cond: (end_node = a.code)
                ->  CTE Scan on ancestors  (cost=0.00..33.48 rows=1674 width=8) (actual time=0.007..0.173 rows=32 loops=387059)

1 answer:

Answer 0 (score: 0)

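The first execution plan has really bad estimates (the Bitmap Index Scan on tmp_edges_end estimates 4328 rows instead of 2), while the second execution has good estimates and therefore chooses a good plan. So something must have changed the estimates between the two executions you quote above.

Moreover, you say that the first execution of the UPDATE (for which we have no EXPLAIN (ANALYZE) output) was fast.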

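The only good explanation I have for the initial drop in performance is that it takes the autovacuum daemon some time to collect statistics for the new tables. This normally improves query performance, but of course it can also work the other way around.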
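One way to test this explanation is to look at when statistics were last gathered for the tables involved. The following is only a sketch, assuming the tmp_* names from the question and the standard pg_stat_user_tables and pg_class catalogs. If both last_analyze and last_autoanalyze are NULL, the planner has no column statistics for that table yet.

```sql
-- When were the import tables last analyzed, manually or by autovacuum?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('tmp_edges', 'tmp_anc');

-- Which row and page counts does the planner currently assume?
SELECT relname, reltuples, relpages
FROM   pg_class
WHERE  relname IN ('tmp_edges', 'tmp_anc', 'tmp_edges_end');
```

Running these once directly after the import and once before the slow rerun should show whether an autoanalyze happened in between.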

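Also, VACUUM alone usually does not fix performance problems, because it does not refresh the planner's column statistics. Could it be that you used VACUUM (ANALYZE)?

It would be interesting to know how things behave when you collect statistics before the initial UPDATE: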
ANALYZE tmp_edges;
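To see the effect of collecting statistics, the estimate for a single lookup can be compared before and after. This is a sketch under assumptions: the value 42 is a placeholder for any existing code, and tmp_anc is analyzed as well since it is also freshly loaded. The 4328-row estimate in the slow plan is 0.5% of the 865523 rows estimated for tmp_edges, which appears consistent with the planner's default guess for an equality condition on a column without statistics.

```sql
-- Gather planner statistics right after loading and indexing.
ANALYZE tmp_edges;
ANALYZE tmp_anc;

-- Inspect what the planner now believes about the join column.
SELECT attname, n_distinct, null_frac
FROM   pg_stats
WHERE  tablename = 'tmp_edges'
AND    attname   = 'end_node';

-- Check the row estimate for one lookup; 42 is a placeholder code value.
EXPLAIN
SELECT start_node FROM tmp_edges WHERE end_node = 42;
```

If the estimated row count for the index scan drops from thousands to a handful, the planner should again prefer the Nested Loop for the recursive step.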

But when I read your query more closely, I wonder why you use a correlated subquery at all. Maybe something like this would be faster:

UPDATE tmp_anc sa
   SET ancestors = a.codes
   FROM (WITH RECURSIVE ancestors(code, start_node) AS
            (SELECT tmp_anc.code, tmp_edges.start_node
                FROM tmp_edges
                   JOIN tmp_anc ON tmp_edges.end_node = tmp_anc.code
             UNION
             SELECT a.code, se.start_node
                FROM tmp_edges se
                   -- continue from the ancestor found in the previous step
                   JOIN ancestors a ON se.end_node = a.start_node
            )
         SELECT code,
                array_agg(start_node) AS codes
            FROM ancestors
            GROUP BY (code)
        ) a
   WHERE sa.code = a.code;

(This is untested, so there may be errors.)
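A set-based recursive CTE of this kind can be sanity-checked on a tiny graph before running it against the full import. The demo_edges and demo_anc tables below are hypothetical stand-ins, not part of the question. In this sketch the recursive step joins on the ancestor found in the previous iteration (a.start_node), so that each pass climbs one level further up.

```sql
-- Hypothetical miniature graph: 1 -> 2 -> 3 and 1 -> 4 (start_node is the parent).
CREATE TEMP TABLE demo_edges (start_node BIGINT, end_node BIGINT);
INSERT INTO demo_edges VALUES (1, 2), (2, 3), (1, 4);

CREATE TEMP TABLE demo_anc (code BIGINT, ancestors BIGINT[]);
INSERT INTO demo_anc (code) VALUES (1), (2), (3), (4);

WITH RECURSIVE ancestors(code, start_node) AS (
    -- direct parents of every code
    SELECT d.code, e.start_node
    FROM   demo_edges e
    JOIN   demo_anc d ON e.end_node = d.code
  UNION
    -- parents of the ancestors found so far, still tagged with the original code
    SELECT a.code, e.start_node
    FROM   demo_edges e
    JOIN   ancestors a ON e.end_node = a.start_node
)
SELECT code, array_agg(start_node ORDER BY start_node) AS codes
FROM   ancestors
GROUP  BY code
ORDER  BY code;
-- code | codes
--    2 | {1}
--    3 | {1,2}
--    4 | {1}
```

Code 1 has no incoming edge, so it produces no row, and in the UPDATE form its ancestors column would stay NULL.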