Join

Time: 2015-11-11 15:37:23

Tags: sql performance postgresql

I have a simple query (Postgres 9.4):

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    bo_labels L
    LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
    LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
    LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
    party.inn = '?'

The EXPLAIN ANALYZE output is:

QUERY PLAN                                                                                                                                                                                           
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 Aggregate  (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)                                                                                                  
   ->  Nested Loop Left Join  (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)                                                                                    
         Join Filter: ((c.bo_document_fkey)::text = (d.id)::text)                                                                                                                                     
         Rows Removed by Join Filter: 101658634                                                                                                                                                       
         ->  Nested Loop Left Join  (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)                                                                                 
               ->  Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)                                                                                       
                     ->  Index Scan using bo_party_inn_idx on bo_party party  (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)                                            
                           Index Cond: (inn = '2534005760'::text)                                                                                                                                     
                     ->  Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l  (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3) 
                           Index Cond: (bo_party_fkey = (party.id)::text)                                                                                                                             
                           Heap Fetches: 0                                                                                                                                                            
               ->  Index Only Scan using bo_document_pkey on bo_document_base d  (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)                                       
                     Index Cond: (id = (l.bo_doc_base_fkey)::text)                                                                                                                                    
                     Heap Fetches: 1134                                                                                                                                                               
         ->  Seq Scan on bo_contract_hardwood_deal c  (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)                                                       
 Planning time: 13.484 ms                                                                                                                                                                             
 Execution time: 31762.885 ms

http://explain.depesz.com/s/V2wn

The annoying part is the wrong row estimate:

Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)

Because of that estimate, postgres chooses a nested loop and the query runs for about 30 seconds. With SET LOCAL enable_nestloop = OFF; it finishes in about one second.
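For reference, SET LOCAL only lasts until the end of the current transaction, so the override can at least be confined to this one statement by wrapping it in a transaction. A minimal sketch, reusing the query from above with the '?' placeholder left as in the original:

```sql
BEGIN;

-- SET LOCAL reverts automatically at COMMIT or ROLLBACK,
-- so other queries in the session keep the default planner settings.
SET LOCAL enable_nestloop = off;

SELECT COUNT(*)
FROM bo_labels L
    LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
    LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
    LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE party.inn = '?';

COMMIT;
```

This still disables nested loops for every join in the statement, not only the badly estimated one.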

Interestingly, I have default_statistics_target = 10000 (the maximum), and VACUUM VERBOSE ANALYZE was run on all 4 tables beforehand.

Since postgres does not gather statistics across tables, this kind of misestimate is very likely to happen for other joins as well.

Without the external extension pg_hint_plan, there is no way to change enable_nestloop for this query alone.

Is there any other way I can try to force the query onto a faster plan?

Update in response to comments

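I can't eliminate the joins in the general case. What I'm mainly looking for: is there any possibility of changing the statistics, for example to include the specific values that break the normal statistical picture? Or is there some other way to make postgres change the weight it gives nested loops, so that it uses them less often?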
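One knob that does exist for overriding collected statistics is the per-column n_distinct attribute option (available since PostgreSQL 9.0). The sketch below only shows the mechanics: the value -0.001 is a hypothetical placeholder, not derived from this data, and whether it would correct this particular estimate is untested.

```sql
-- Hypothetical value: a negative n_distinct is a fraction of the row count,
-- so -0.001 means "distinct values = 0.1% of the rows".
-- Choose a value that reflects the real data.
ALTER TABLE bo_labels ALTER COLUMN bo_party_fkey SET (n_distinct = -0.001);

-- The override is only picked up by the next ANALYZE.
ANALYZE bo_labels;

-- To remove the override later:
-- ALTER TABLE bo_labels ALTER COLUMN bo_party_fkey RESET (n_distinct);
```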

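Also, can someone explain, or point to documentation on, how the postgres planner takes the two inputs of the nested loop, estimated at 3 rows (exactly right) and 1289 rows (really 565, but that error is a separate issue), and concludes that the result will be just 1 row??? I'm talking about this part of the plan: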

           ->  Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)                                                                                       
                 ->  Index Scan using bo_party_inn_idx on bo_party party  (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)                                            
                       Index Cond: (inn = '2534005760'::text)                                                                                                                                     
                 ->  Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l  (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3) 
                       Index Cond: (bo_party_fkey = (party.id)::text)

At first glance, the estimate looks wrong from the start. Which statistics are used there? Does postgres also maintain some statistics for indexes?
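The per-column statistics that ANALYZE collects can be inspected in the pg_stats view, and the page and tuple counts kept for each index are in pg_class. A sketch of how one might look at the numbers for the columns and indexes involved in this part of the plan (table, column, and index names are taken from the query and plan above):

```sql
-- Column statistics for the join and filter columns.
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename = 'bo_party'  AND attname IN ('id', 'inn'))
   OR (tablename = 'bo_labels' AND attname = 'bo_party_fkey');

-- Size estimates stored for the two indexes used in the nested loop.
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname IN ('bo_party_inn_idx',
                  'bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx');
```

Comparing n_distinct for party.id and bo_labels.bo_party_fkey with the real distinct counts may show where the join estimate goes wrong.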

1 Answer:

Answer 0 (score: 1)

I don't actually have good sample data to test my answer, but I think it may help.

Based on your join columns, I am assuming the following relationship cardinalities:

1) bo_party (id 1:N bo_party_fkey) bo_labels 
2) bo_labels (bo_doc_base_fkey N:1 id) bo_document_base 
3) bo_document_base (id 1:N bo_document_fkey) bo_contract_hardwood_deal

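You want to count the selected rows. According to the cardinalities in 1) and 2), the table "bo_labels" sits on the "many" side of both relationships, so it acts as a many-to-many link. This means that joining "bo_party" and "bo_document_base" will not produce more rows than already exist in "bo_labels".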

However, after joining "bo_document_base", the further join to "bo_contract_hardwood_deal" has the one-to-many cardinality described in 3) and may produce more rows in the final result.

So, to get the correct row count, you can reduce the join structure to "bo_labels" and "bo_contract_hardwood_deal" via:

4) bo_labels (bo_doc_base_fkey 1:N bo_document_fkey) bo_contract_hardwood_deal

An example query could be either of the following:

SELECT COUNT(*)
FROM bo_labels L 
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE 1=1
  and exists
  (
    select 1
    from bo_party party 
    where 1=1
      and party.id = L.bo_party_fkey
      and party.inn = '?'
  )
;

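-- GREATEST(1, ...) counts a label with no matching contract once, as the LEFT JOIN does;
-- COALESCE returns 0 instead of NULL when no label matches.
SELECT COALESCE(SUM(GREATEST(1, (SELECT COUNT(*) FROM bo_contract_hardwood_deal C WHERE C.bo_document_fkey = L.bo_doc_base_fkey))), 0)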
FROM bo_labels L 
WHERE 1=1
  and exists
  (
    select 1
    from bo_party party 
    where 1=1
      and party.id = L.bo_party_fkey
      and party.inn = '?'
  )
;

I can't test this with large tables, so I don't know for sure whether it performs better than the original query, but I think it may help.
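One way to check would be to run the rewrite under EXPLAIN ANALYZE and compare it with the original plan. The plan in the question shows a Seq Scan on "bo_contract_hardwood_deal" repeated 1694 times, which suggests, though this is only an assumption, that there is no index on bo_document_fkey. The index name below is made up for illustration:

```sql
-- Hypothetical index name; skip this step if an equivalent index already exists.
CREATE INDEX bo_contract_hardwood_deal__bo_document_fkey__idx
    ON bo_contract_hardwood_deal (bo_document_fkey);
ANALYZE bo_contract_hardwood_deal;

-- Compare the timing and row estimates with the original plan.
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE EXISTS
  (
    SELECT 1
    FROM bo_party party
    WHERE party.id = L.bo_party_fkey
      AND party.inn = '?'
  );
```

With such an index in place, even a nested loop could probe "bo_contract_hardwood_deal" by key instead of rescanning the whole table on every iteration.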