Question

I have a simple query (Postgres 9.4):

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    bo_labels L
    LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
    LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
    LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
    party.inn = '?'

The plan is as follows:

QUERY PLAN                                                                                                                                                                                           
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 Aggregate  (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)                                                                                                  
   ->  Nested Loop Left Join  (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)                                                                                    
         Join Filter: ((c.bo_document_fkey)::text = (d.id)::text)                                                                                                                                     
         Rows Removed by Join Filter: 101658634                                                                                                                                                       
         ->  Nested Loop Left Join  (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)                                                                                 
               ->  Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)                                                                                       
                     ->  Index Scan using bo_party_inn_idx on bo_party party  (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)                                            
                           Index Cond: (inn = '2534005760'::text)                                                                                                                                     
                     ->  Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l  (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3) 
                           Index Cond: (bo_party_fkey = (party.id)::text)                                                                                                                             
                           Heap Fetches: 0                                                                                                                                                            
               ->  Index Only Scan using bo_document_pkey on bo_document_base d  (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)                                       
                     Index Cond: (id = (l.bo_doc_base_fkey)::text)                                                                                                                                    
                     Heap Fetches: 1134                                                                                                                                                               
         ->  Seq Scan on bo_contract_hardwood_deal c  (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)                                                       
 Planning time: 13.484 ms                                                                                                                                                                             
 Execution time: 31762.885 ms

http://explain.depesz.com/s/V2wn

What bothers me is the wrong row estimate:

Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)

Because of this estimate, Postgres chooses nested loops and the query runs for about 30 seconds. With SET LOCAL enable_nestloop = OFF; it completes in about one second.
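A minimal sketch of how that setting can be scoped, assuming the query is run inside an explicit transaction. SET LOCAL only lasts until the end of the current transaction, so statements outside it keep the default planner settings.

```sql
BEGIN;

-- Disable nested-loop joins for this transaction only;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_nestloop = off;

SELECT
    COUNT(*)
FROM
    bo_labels L
    LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
    LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
    LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
    party.inn = '?';

COMMIT;
```

This still applies to every statement inside the transaction, so it is a blunt switch rather than a per-query hint.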

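Interestingly, I have default_statistics_target = 10000 (the maximum), and VACUUM VERBOSE ANALYZE was run on all 4 tables just before.

Since Postgres does not gather statistics across tables, this kind of misestimate is very likely to happen for other joins as well.

Without the external extension pg_hint_plan, it is not possible to change enable_nestloop for this query only.

Is there any other way to force a faster plan for this query?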

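Update per comments

I cannot eliminate the joins in the general case. My main question is: is there any way to change the statistics (for example) so that they include the values that break the normal statistical picture? Or is there some other way to make Postgres change the weight of nested loops so that it uses them less often?

Can someone also explain, or point to documentation on, how the Postgres planner takes a nested loop over two inputs of 3 rows (exactly correct) and 1289 rows (really 565, but that error is a different question) and assumes the result will be just 1 row? I am talking about this part of the plan: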

           ->  Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)                                                                                       
                 ->  Index Scan using bo_party_inn_idx on bo_party party  (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)                                            
                       Index Cond: (inn = '2534005760'::text)                                                                                                                                     
                 ->  Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l  (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3) 
                       Index Cond: (bo_party_fkey = (party.id)::text)

At first glance it looks simply wrong. Which statistics are used there? Does Postgres also maintain some statistics for indexes?
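As a starting point for that question, the per-column statistics that ANALYZE stores can be read from the pg_stats view, and the table-level and index-level row and page counts from pg_class. The sketch below only inspects them and assumes the tables live in a schema on the current search_path.

```sql
-- Per-column statistics for the filter column and the two join columns
-- of the misestimated join.
SELECT
    tablename,
    attname,
    null_frac,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM
    pg_stats
WHERE
    (tablename = 'bo_party' AND attname IN ('id', 'inn'))
    OR (tablename = 'bo_labels' AND attname = 'bo_party_fkey');

-- Row and page counts the planner holds for the tables and the indexes
-- that appear in the plan.
SELECT
    relname,
    relkind,
    reltuples,
    relpages
FROM
    pg_class
WHERE
    relname IN (
        'bo_party',
        'bo_labels',
        'bo_party_inn_idx',
        'bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx'
    );
```

Comparing n_distinct for bo_party.id and bo_labels.bo_party_fkey against the real distinct counts is one way to see where the join estimate starts to drift.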

Answer 1

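I do not actually have good sample data to test my answer, but I think it may help.

Based on your join columns, I am assuming the following relationship cardinalities: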

1) bo_party (id 1:N bo_party_fkey) bo_labels 
2) bo_labels (bo_doc_base_fkey N:1 id) bo_document_base 
3) bo_document_base (id 1:N bo_document_fkey) bo_contract_hardwood_deal
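Since these cardinalities are assumptions, they can be checked against the data first. A minimal sketch for assumption 3), using only the column names from the question:

```sql
-- Lists documents that have more than one deal row, if any exist.
-- An empty result would mean the relationship is effectively 1:1.
SELECT
    bo_document_fkey,
    COUNT(*) AS deals
FROM
    bo_contract_hardwood_deal
GROUP BY
    bo_document_fkey
HAVING
    COUNT(*) > 1
ORDER BY
    deals DESC
LIMIT 10;
```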

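You want to count the selected rows. According to the cardinalities in 1) and 2), the table "bo_labels" acts as the linking table of a many-to-many relationship. This means that joining it with "bo_party" and "bo_document_base" will not produce more rows than already exist in the table.

However, after joining "bo_document_base", there is one more join to "bo_contract_hardwood_deal". Its cardinality, described in 3), is one-to-many, so it may produce more rows in the final result.

So, to get the right row count, you can reduce the join structure to "bo_labels" and "bo_contract_hardwood_deal" via: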

4) bo_labels (bo_doc_base_fkey 1:N bo_document_fkey) bo_contract_hardwood_deal

The query could be either of the following:

SELECT COUNT(*)
FROM bo_labels L 
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE 1=1
  and exists
  (
    select 1
    from bo_party party 
    where 1=1
      and party.id = L.bo_party_fkey
      and party.inn = '?'
  )
;

or

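-- GREATEST(..., 1): the original LEFT JOIN still counts a label once when no deal matches it
SELECT sum(GREATEST((select COUNT(*) from bo_contract_hardwood_deal C where C.bo_document_fkey = L.bo_doc_base_fkey), 1))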
FROM bo_labels L 
WHERE 1=1
  and exists
  (
    select 1
    from bo_party party 
    where 1=1
      and party.id = L.bo_party_fkey
      and party.inn = '?'
  )
;

I could not test this with large tables, so I do not know for certain whether it improves on the original query, but I think it may help.
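One way to check is to run a rewrite under EXPLAIN ANALYZE and compare it with the original plan. The sketch below also creates an index on the deal table's join column, on the assumption that none exists yet (the original plan shows a Seq Scan with a Join Filter on that table); the index name is hypothetical.

```sql
-- Hypothetical index name; assumes this column is not indexed yet.
CREATE INDEX bo_contract_hardwood_deal__document_fkey__idx
    ON bo_contract_hardwood_deal (bo_document_fkey);

-- Refresh planner statistics for the table.
ANALYZE bo_contract_hardwood_deal;

-- Compare the timing and row estimates with the original plan.
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE EXISTS (
    SELECT 1
    FROM bo_party party
    WHERE party.id = L.bo_party_fkey
      AND party.inn = '?'
);
```

With such an index, each label can probe the deal table by key instead of rescanning it, which is where the original plan spent almost all of its time.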
