Question

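Is there a way to force a specific join order in Postgres?

I have a query that looks like this. I have removed some things that are in the real query, but this simplified version still demonstrates the problem. What is left should not be too cryptic: using a role/task security system, I am trying to determine whether a given user has permission to perform a given task.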

select task.taskid
from userlogin
join userrole using (userloginid)
join roletask using (roleid)
join task using (taskid)
where loginname='foobar'
and taskfunction='plugh'

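But I realized that the program already knows the user's userloginid, so it seemed the query could be made more efficient by skipping the lookup on userlogin and just filling in the userloginid, like this: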

select task.taskid
from userrole
join roletask using (roleid)
join task using (taskid)
where userloginid=42
and taskfunction='plugh'

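When I did that (removed a table from the query and hard-coded the value that used to be retrieved from it), the estimated cost in the explain plan went up, from 140.82 to 192.82! In the original query, Postgres reads userlogin, then userrole, then roletask, then task. In the new query, it decides to read roletask first and then join to userrole, even though this requires a full sequential scan of roletask.

The full explain plans are:

Version 1: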

Hash Join  (cost=12.79..140.82 rows=1 width=8) 
  Hash Cond: (roletask.taskid = task.taskid) 
  ->  Nested Loop  (cost=4.51..129.73 rows=748 width=8) 
        ->  Nested Loop  (cost=4.51..101.09 rows=12 width=8) 
              ->  Index Scan using idx_userlogin_loginname on userlogin  (cost=0.00..8.27 rows=1 width=8) 
                    Index Cond: ((loginname)::text = 'foobar'::text) 
              ->  Bitmap Heap Scan on userrole  (cost=4.51..92.41 rows=33 width=16) 
                    Recheck Cond: (userrole.userloginid = userlogin.userloginid) 
                    ->  Bitmap Index Scan on idx_userrole_login  (cost=0.00..4.50 rows=33 width=0) 
                          Index Cond: (userrole.userloginid = userlogin.userloginid) 
        ->  Index Scan using idx_roletask_role on roletask  (cost=0.00..1.50 rows=71 width=16) 
              Index Cond: (roletask.roleid = userrole.roleid) 
  ->  Hash  (cost=8.27..8.27 rows=1 width=8) 
        ->  Index Scan using idx_task_taskfunction on task  (cost=0.00..8.27 rows=1 width=8) 
              Index Cond: ((taskfunction)::text = 'plugh'::text)

Version 2:

Hash Join  (cost=96.58..192.82 rows=4 width=8) 
  Hash Cond: (roletask.roleid = userrole.roleid) 
  ->  Hash Join  (cost=8.28..104.10 rows=9 width=16) 
        Hash Cond: (roletask.taskid = task.taskid) 
        ->  Seq Scan on roletask  (cost=0.00..78.35 rows=4635 width=16) 
        ->  Hash  (cost=8.27..8.27 rows=1 width=8) 
              ->  Index Scan using idx_task_taskfunction on task  (cost=0.00..8.27 rows=1 width=8) 
                    Index Cond: ((taskfunction)::text = 'plugh'::text) 
  ->  Hash  (cost=87.92..87.92 rows=31 width=8) 
        ->  Bitmap Heap Scan on userrole  (cost=4.49..87.92 rows=31 width=8) 
              Recheck Cond: (userloginid = 42) 
              ->  Bitmap Index Scan on idx_userrole_login  (cost=0.00..4.49 rows=31 width=0) 
                    Index Cond: (userloginid = 42)

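(Yes, I know that in both cases the costs are low and the difference does not look like it matters. But this is after I stripped a bunch of additional work out of the query to simplify what I had to post. The real query still is not outrageous, but I am more interested in the principle.)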

Answer 1

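This page in the documentation describes how to prevent the PostgreSQL optimizer from reordering joined tables, which lets you control the join order yourself: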

http://www.postgresql.org/docs/current/interactive/explicit-joins.html
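
As a rough sketch of the approach that page describes, using the tables from the question: setting join_collapse_limit to 1 tells the planner to keep explicit JOIN clauses in the order they are written. SET LOCAL is used here so the setting only lasts until the end of the transaction; that scoping is my own choice for illustration, not something the question requires.

```sql
BEGIN;

-- With join_collapse_limit = 1 the planner does not reorder explicit JOINs,
-- so userrole is joined to roletask first, and the result is joined to task.
SET LOCAL join_collapse_limit = 1;

EXPLAIN
SELECT task.taskid
FROM userrole
JOIN roletask USING (roleid)
JOIN task USING (taskid)
WHERE userloginid = 42
  AND taskfunction = 'plugh';

COMMIT;
```

The planner still picks the join method (nested loop, hash, merge) and the scan type for each table; only the order of the joins is pinned.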

Answer 2

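Are you sure your table statistics are up to date? When the PostgreSQL cost-based optimizer fails on something this trivial, it is a pretty good sign that the table statistics are seriously wrong. It is better to fix the root cause than to work around it by overriding the built-in optimizer, because the problem will inevitably show up somewhere else as well.

Run ANALYZE on the affected tables and see whether that makes PostgreSQL pick a different plan. If it still chooses something silly, it would be really interesting to see the query plans. The optimizer not doing the right thing is usually considered a bug.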
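
A minimal sketch of that check, assuming the three tables from the simplified query are the affected ones: first look at when statistics were last gathered, then refresh them and compare the plan again.

```sql
-- When were statistics last collected, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('userrole', 'roletask', 'task');

-- Refresh the planner statistics for the tables involved.
ANALYZE userrole;
ANALYZE roletask;
ANALYZE task;

-- Re-check the plan. EXPLAIN ANALYZE actually runs the query, so the
-- output shows real row counts and timings next to the estimates.
EXPLAIN ANALYZE
SELECT task.taskid
FROM userrole
JOIN roletask USING (roleid)
JOIN task USING (taskid)
WHERE userloginid = 42
  AND taskfunction = 'plugh';
```

If the estimated row counts in the output are far from the actual ones, that points at the statistics rather than at the join order itself.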
