Using PostgreSQL 8.4.9, I have a strange problem with the performance of one query. The query selects a set of points within a 3D volume, using a LEFT OUTER JOIN to add a related ID column where that related ID exists. Small changes in the x range can cause PostgreSQL to choose a different query plan, which takes the execution time from 0.01 seconds to 50 seconds. This is the query in question:
SELECT treenode.id AS id,
treenode.parent_id AS parentid,
(treenode.location).x AS x,
(treenode.location).y AS y,
(treenode.location).z AS z,
treenode.confidence AS confidence,
treenode.user_id AS user_id,
treenode.radius AS radius,
((treenode.location).z - 50) AS z_diff,
treenode_class_instance.class_instance_id AS skeleton_id
FROM treenode LEFT OUTER JOIN
(treenode_class_instance INNER JOIN
class_instance ON treenode_class_instance.class_instance_id
= class_instance.id
AND class_instance.class_id = 7828307)
ON (treenode_class_instance.treenode_id = treenode.id
AND treenode_class_instance.relation_id = 7828321)
WHERE treenode.project_id = 4
AND (treenode.location).x >= 8000
AND (treenode.location).x <= (8000 + 4736)
AND (treenode.location).y >= 22244
AND (treenode.location).y <= (22244 + 3248)
AND (treenode.location).z >= 0
AND (treenode.location).z <= 100
ORDER BY parentid DESC, id, z_diff
LIMIT 400;
That query takes nearly a minute and, if I add EXPLAIN to the front of it, seems to be using the following query plan:
Limit (cost=56185.16..56185.17 rows=1 width=89)
-> Sort (cost=56185.16..56185.17 rows=1 width=89)
Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
-> Nested Loop Left Join (cost=6715.16..56185.15 rows=1 width=89)
Join Filter: (treenode_class_instance.treenode_id = treenode.id)
-> Bitmap Heap Scan on treenode (cost=148.55..184.16 rows=1 width=81)
Recheck Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision) AND ((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
-> BitmapAnd (cost=148.55..148.55 rows=9 width=0)
-> Bitmap Index Scan on location_x_index (cost=0.00..67.38 rows=2700 width=0)
Index Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision))
-> Bitmap Index Scan on location_z_index (cost=0.00..80.91 rows=3253 width=0)
Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
-> Hash Join (cost=6566.61..53361.69 rows=211144 width=16)
Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
-> Seq Scan on treenode_class_instance (cost=0.00..25323.79 rows=969285 width=16)
Filter: (relation_id = 7828321)
-> Hash (cost=5723.54..5723.54 rows=51366 width=8)
-> Seq Scan on class_instance (cost=0.00..5723.54 rows=51366 width=8)
Filter: (class_id = 7828307)
(20 rows)
However, if I replace the 8000 in the x range condition with 10644, the query runs in a fraction of a second and uses this query plan:
Limit (cost=58378.94..58378.95 rows=2 width=89)
-> Sort (cost=58378.94..58378.95 rows=2 width=89)
Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
-> Hash Left Join (cost=57263.11..58378.93 rows=2 width=89)
Hash Cond: (treenode.id = treenode_class_instance.treenode_id)
-> Bitmap Heap Scan on treenode (cost=231.12..313.44 rows=2 width=81)
Recheck Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision) AND ((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
-> BitmapAnd (cost=231.12..231.12 rows=21 width=0)
-> Bitmap Index Scan on location_z_index (cost=0.00..80.91 rows=3253 width=0)
Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
-> Bitmap Index Scan on location_x_index (cost=0.00..149.95 rows=6157 width=0)
Index Cond: (((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
-> Hash (cost=53361.69..53361.69 rows=211144 width=16)
-> Hash Join (cost=6566.61..53361.69 rows=211144 width=16)
Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
-> Seq Scan on treenode_class_instance (cost=0.00..25323.79 rows=969285 width=16)
Filter: (relation_id = 7828321)
-> Hash (cost=5723.54..5723.54 rows=51366 width=8)
-> Seq Scan on class_instance (cost=0.00..5723.54 rows=51366 width=8)
Filter: (class_id = 7828307)
(21 rows)
I'm far from being an expert in parsing these query plans, but the obvious difference seems to be that with one x range it uses a Hash Left Join for the LEFT OUTER JOIN (which runs very fast), while with the other range it uses a Nested Loop Left Join (which seems to be very slow). In both cases the query returns about 90 rows. If I run SET ENABLE_NESTLOOP TO FALSE before the slow version of the query, it goes very fast, but I understand that using that setting in general is a bad idea.

Can I, for example, create a particular index to make it more likely that the query planner will choose the clearly more efficient strategy? Could anyone suggest why PostgreSQL's query planner should be choosing such a poor strategy for one of these queries? Below are details of the schema that may be helpful.

The treenode table has 900,000 rows and is defined as follows:

Table "public.treenode"
Column | Type | Modifiers
---------------+--------------------------+------------------------------------------------------
id | bigint | not null default nextval('concept_id_seq'::regclass)
user_id | bigint | not null
creation_time | timestamp with time zone | not null default now()
edition_time | timestamp with time zone | not null default now()
project_id | bigint | not null
location | double3d | not null
parent_id | bigint |
radius | double precision | not null default 0
confidence | integer | not null default 5
Indexes:
"treenode_pkey" PRIMARY KEY, btree (id)
"treenode_id_key" UNIQUE, btree (id)
"location_x_index" btree (((location).x))
"location_y_index" btree (((location).y))
"location_z_index" btree (((location).z))
Foreign-key constraints:
"treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id)
Referenced by:
TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE
TABLE "treenode" CONSTRAINT "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id)
Triggers:
on_edit_treenode BEFORE UPDATE ON treenode FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: location

The double3d composite type is defined as follows:

Composite type "public.double3d"
Column | Type
--------+------------------
x | double precision
y | double precision
z | double precision

The other two tables involved in the join are treenode_class_instance:

Table "public.treenode_class_instance"
Column | Type | Modifiers
-------------------+--------------------------+------------------------------------------------------
id | bigint | not null default nextval('concept_id_seq'::regclass)
user_id | bigint | not null
creation_time | timestamp with time zone | not null default now()
edition_time | timestamp with time zone | not null default now()
project_id | bigint | not null
relation_id | bigint | not null
treenode_id | bigint | not null
class_instance_id | bigint | not null
Indexes:
"treenode_class_instance_pkey" PRIMARY KEY, btree (id)
"treenode_class_instance_id_key" UNIQUE, btree (id)
"idx_class_instance_id" btree (class_instance_id)
Foreign-key constraints:
"treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE
"treenode_class_instance_relation_id_fkey" FOREIGN KEY (relation_id) REFERENCES relation(id)
"treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE
"treenode_class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Triggers:
on_edit_treenode_class_instance BEFORE UPDATE ON treenode_class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: relation_instance
... and class_instance:

Table "public.class_instance"
Column | Type | Modifiers
---------------+--------------------------+------------------------------------------------------
id | bigint | not null default nextval('concept_id_seq'::regclass)
user_id | bigint | not null
creation_time | timestamp with time zone | not null default now()
edition_time | timestamp with time zone | not null default now()
project_id | bigint | not null
class_id | bigint | not null
name | character varying(255) | not null
Indexes:
"class_instance_pkey" PRIMARY KEY, btree (id)
"class_instance_id_key" UNIQUE, btree (id)
Foreign-key constraints:
"class_instance_class_id_fkey" FOREIGN KEY (class_id) REFERENCES class(id)
"class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Referenced by:
TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_a_fkey" FOREIGN KEY (class_instance_a) REFERENCES class_instance(id) ON DELETE CASCADE
TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_b_fkey" FOREIGN KEY (class_instance_b) REFERENCES class_instance(id) ON DELETE CASCADE
TABLE "connector_class_instance" CONSTRAINT "connector_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id)
TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE
Triggers:
on_edit_class_instance BEFORE UPDATE ON class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: concept
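For what it's worth, the ENABLE_NESTLOOP workaround can at least be scoped to a single transaction with SET LOCAL, so it does not leak into the rest of the session. A sketch (the SELECT here is just a stand-in for the slow query):

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- applies only inside this transaction

-- run the slow query here, e.g. the big SELECT from above
SELECT count(*) FROM treenode WHERE project_id = 4;

COMMIT;  -- the setting reverts automatically at COMMIT or ROLLBACK
```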
Answer 0 (score: 44)

If the query planner makes bad decisions, it's mostly one of two things:

1. The statistics are inaccurate.

Do you run ANALYZE enough? Also popular in its combined form, VACUUM ANALYZE. If autovacuum is enabled (which is the default in modern Postgres), ANALYZE is run automatically, but consider whether it runs often enough for your workload. (This advice still applies for Postgres 9.6.)

If your tables are big and the data distribution is irregular, raising default_statistics_target may help. Or rather, just set the statistics target for the relevant columns (those in the WHERE or JOIN clauses of your queries):

ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 400; -- calibrate number

The target can be set in the range 0 to 10000. Run ANALYZE again after that (on the relevant tables).

2. The cost settings for planner estimates are off.

Read the chapter Planner Cost Constants in the manual, and look at the sections on default_statistics_target and random_page_cost on this generally helpful PostgreSQL Wiki page.

There are many other possible reasons, but these are by far the most common ones.
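For the schema in the question, the per-column form might look like this, targeting the selective filter columns from the join (the value 400 is only a starting point to calibrate):

```sql
-- Raise the statistics targets on the columns the planner
-- misestimates, then re-collect statistics:
ALTER TABLE class_instance          ALTER COLUMN class_id    SET STATISTICS 400;
ALTER TABLE treenode_class_instance ALTER COLUMN relation_id SET STATISTICS 400;

ANALYZE class_instance;
ANALYZE treenode_class_instance;
```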
Answer 1 (score: 6)

I suspect this has little to do with bad statistics, unless you consider the combination of database statistics and your custom data type.

My guess is that PostgreSQL is picking a nested loop join because it looks at the predicates (treenode.location).x >= 8000 AND (treenode.location).x <= (8000 + 4736) and does something funky in the arithmetic of its comparison. A nested loop is typically used when there is a small amount of data on the inner side of the join.

But once you switch the constant to 10644, you get a different plan. It's always possible that the plan is complex enough that genetic query optimization (GEQO) is kicking in, and you're seeing the side effects of non-deterministic plan building. There are enough discrepancies in the order of evaluation in the query to make me think that's what's going on.

One option would be to try using a parameterized/prepared statement instead of ad hoc code. Since you're working in a 3-dimensional space, you might also want to consider using PostGIS. While it might be overkill, it may also provide the performance you need to get these queries running properly.

While forcing planner behavior isn't the best choice, sometimes we do end up making better decisions than the software.
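A sketch of the prepared-statement route (the statement name and the reduced column list are illustrative; with a prepared statement the plan is built without knowledge of the specific constants, so it cannot flip between plans as the bounds change):

```sql
PREPARE treenode_box (double precision, double precision,
                      double precision, double precision,
                      double precision, double precision) AS
SELECT t.id, (t.location).x AS x, (t.location).y AS y, (t.location).z AS z
FROM treenode t
WHERE t.project_id = 4
  AND (t.location).x BETWEEN $1 AND $2
  AND (t.location).y BETWEEN $3 AND $4
  AND (t.location).z BETWEEN $5 AND $6;

-- The same bounds as the slow variant of the query:
EXECUTE treenode_box (8000, 12736, 22244, 25492, 0, 100);
```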
Answer 2 (score: 1)

What Erwin said about the statistics. Also:

Sorting on

parentid DESC, id, z

instead of

parentid DESC, id, z_diff

might give the optimizer a bit more room to shuffle. (I don't think it will matter much since it's the last sort term and the sort isn't expensive, but you could give it a try.)
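Since z_diff is just (location).z minus a constant, ordering by the raw column yields an identical row order. A minimal sketch of the reworked sort (column names taken from the question):

```sql
-- z_diff = (location).z - 50 is monotone in z, so this ordering is
-- identical to ORDER BY parent_id DESC, id, z_diff:
SELECT t.id,
       ((t.location).z - 50) AS z_diff   -- still returned to the client
FROM treenode t
ORDER BY t.parent_id DESC, t.id, (t.location).z
LIMIT 400;
```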
Answer 3 (score: 1)

I'm not positive it's the source of your problem, but it looks like some changes were made to the Postgres query planner between versions 8.4.8 and 8.4.9. You could try using an older version and see whether it makes a difference.

http://postgresql.1045698.n5.nabble.com/BUG-6275-Horrible-performance-regression-td4944891.html

Don't forget to re-ANALYZE your tables if you change versions.
Answer 4 (score: 0)

+1 for tuning the statistics target and running ANALYZE, and for PostGIS (for the OP).

Not quite related to the original question, but still, for anyone who lands here looking for how to deal in general with the planner's inaccurate row-count estimates in complex queries, which lead to undesired plans: one option may be to wrap part of the initial query in a function and set its ROWS option to a more or less expected value. I have never done that, but it should work.

There are also row-estimate directives in pg_hint_plan. I would not advise planner hinting in general, but adjusting row estimates is a softer option.

Finally, to enforce a nested loop scan, one can sometimes use a LATERAL JOIN with LIMIT N or OFFSET 0 inside the subquery. That will give you what you want. But note that it is a very rough trick. At some point it will lead to bad performance if conditions change, because of table growth or simply a different data distribution. Still, it may be a good option just as an emergency fix.
Answer 5 (score: 0)

If the plan is bad, you can always resort to the pg_hint_plan extension. It provides Oracle-style hints for PostgreSQL.
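pg_hint_plan hints are written in a leading comment. A hypothetical sketch, assuming the extension is installed and loaded, that forces a hash join between the two tables from the question (the hint takes the table names or aliases used in the statement):

```sql
/*+ HashJoin(treenode treenode_class_instance) */
SELECT treenode.id, treenode_class_instance.class_instance_id
FROM treenode
LEFT OUTER JOIN treenode_class_instance
       ON treenode_class_instance.treenode_id = treenode.id
WHERE treenode.project_id = 4;
```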