I have a simple query (Postgres 9.4):
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
bo_labels L
LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
party.inn = '?'
The EXPLAIN ANALYZE output is as follows:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)
-> Nested Loop Left Join (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)
Join Filter: ((c.bo_document_fkey)::text = (d.id)::text)
Rows Removed by Join Filter: 101658634
-> Nested Loop Left Join (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (party.id)::text)
Heap Fetches: 0
-> Index Only Scan using bo_document_pkey on bo_document_base d (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)
Index Cond: (id = (l.bo_doc_base_fkey)::text)
Heap Fetches: 1134
-> Seq Scan on bo_contract_hardwood_deal c (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)
Planning time: 13.484 ms
Execution time: 31762.885 ms
http://explain.depesz.com/s/V2wn
The annoying part is the wrong row estimate:
Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
Because of it, Postgres chooses nested loops and the query runs for about 30 seconds. With
SET LOCAL enable_nestloop = OFF;
it completes in just a second.
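For reference, a minimal sketch of how that setting can be confined to one statement. SET LOCAL only lasts until the end of the current transaction, so the sketch assumes the query is run inside an explicit BEGIN ... COMMIT. Whether the application can issue statements this way is a separate matter.

```sql
BEGIN;

-- SET LOCAL is reverted automatically at COMMIT or ROLLBACK,
-- so other queries in the session keep the default planner settings.
SET LOCAL enable_nestloop = off;

SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE party.inn = '?';

COMMIT;
```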
What is interesting is that I have default_statistics_target = 10000 (the maximum value), and VACUUM VERBOSE ANALYZE was run on all 4 tables just before.
Since Postgres does not gather statistics across tables, such cases are very likely to happen with other joins as well.
Without the external extension pg_hint_plan, it is not possible to change enable_nestloop for just that one query.
Is there any other way I can try to force a faster plan for this query?
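I cannot eliminate the joins in the general case. My main question: is there any possibility to change the statistics, for example to include particular values that break the normal statistical picture? Or is there some other way to make Postgres change the weight of nested loops so that it uses them less often?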
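As an illustration of the kind of statistics change I mean, here is a minimal sketch of the per-column n_distinct override that Postgres provides. I have not tested whether it fixes this plan. The value 2000 is purely hypothetical and would have to be replaced by the real number of distinct bo_party_fkey values.

```sql
-- Override the distinct-value estimate that ANALYZE would otherwise compute.
-- 2000 is a made-up placeholder, not a measured value.
ALTER TABLE bo_labels ALTER COLUMN bo_party_fkey SET (n_distinct = 2000);

-- The override only takes effect at the next ANALYZE of the table.
ANALYZE bo_labels;

-- To go back to the computed estimate later:
-- ALTER TABLE bo_labels ALTER COLUMN bo_party_fkey RESET (n_distinct);
-- ANALYZE bo_labels;
```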
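Could someone also explain, or point to documentation on, how the Postgres planner, for a nested loop over two inputs estimated at 3 rows (exactly correct) and 1289 rows (really 565, but that error is a separate question), arrives at the assumption that the result will be just 1 row? I am talking about this part of the plan: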
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (party.id)::text)
At first glance it looks wrong from the start. Which statistics are used there? Does Postgres also maintain some statistics for indexes?
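For anyone who wants to look at the inputs to that estimate, the following sketch lists what ANALYZE stored for the columns involved in this join, plus the row and page counts kept for the tables and indexes. It only reads the standard pg_stats view and the pg_class catalog. The table, column, and index names are the ones from the plan above.

```sql
-- Per-column statistics for the filter column and the two join columns.
SELECT tablename,
       attname,
       null_frac,
       n_distinct,
       array_length(most_common_freqs, 1) AS mcv_entries
FROM pg_stats
WHERE (tablename = 'bo_party'  AND attname IN ('id', 'inn'))
   OR (tablename = 'bo_labels' AND attname = 'bo_party_fkey');

-- Row and page estimates stored for the tables and the indexes used in the plan.
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname IN ('bo_party',
                  'bo_labels',
                  'bo_party_inn_idx',
                  'bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx');
```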
Answer 0 (score: 1)
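Actually, I don't have good sample data to test my answer, but I think it might help.
Based on your join columns, I am assuming the following relationship cardinalities: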
1) bo_party (id 1:N bo_party_fkey) bo_labels
2) bo_labels (bo_doc_base_fkey N:1 id) bo_document_base
3) bo_document_base (id 1:N bo_document_fkey) bo_contract_hardwood_deal
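You want to count the selected rows. According to the cardinalities in 1) and 2), the table "bo_labels" is on the "many" side of both relationships, so it acts as a many-to-many link between the other two tables. This means that joining it with "bo_party" and "bo_document_base" will not produce more rows than already exist in "bo_labels".
However, after the join with "bo_document_base" there is one more join, to "bo_contract_hardwood_deal". Its cardinality, described in 3), is one-to-many, so it may produce more rows in the final result.
So, to find the right row count, you can reduce the join structure to just "bo_labels" and "bo_contract_hardwood_deal" via: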
4) bo_labels (bo_doc_base_fkey 1:N bo_document_fkey) bo_contract_hardwood_deal
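Relationship 4) only holds if the assumed cardinalities are true for your data. A rough sketch of how they could be checked, using only the table and column names from the question (I have not run this):

```sql
-- Assumption behind 1): bo_party.id is unique, so each label matches at most one party.
-- Any row returned here is a duplicated id.
SELECT id, COUNT(*) AS occurrences
FROM bo_party
GROUP BY id
HAVING COUNT(*) > 1
LIMIT 10;

-- Assumption behind 2): every non-null bo_labels.bo_doc_base_fkey has a matching
-- bo_document_base row, so skipping the join to bo_document_base loses nothing.
-- A non-zero result means the simplified join could count differently.
SELECT COUNT(*) AS orphan_labels
FROM bo_labels L
WHERE L.bo_doc_base_fkey IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM bo_document_base D
                  WHERE D.id = L.bo_doc_base_fkey);
```

If foreign key and primary key constraints are already declared on these columns, both checks are redundant.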
A sample query that uses relationship 4) could be one of the following:
SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
-- Keep only labels that belong to a party with the requested inn.
WHERE EXISTS (
    SELECT 1
    FROM bo_party party
    WHERE party.id = L.bo_party_fkey
      AND party.inn = '?'
);
or
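-- Each label contributes its number of matching deals, or 1 when it has none,
-- which mirrors how COUNT(*) over a LEFT JOIN still counts unmatched labels.
-- COALESCE returns 0 instead of NULL when no label qualifies.
SELECT COALESCE(SUM(GREATEST(1, (SELECT COUNT(*)
                                 FROM bo_contract_hardwood_deal C
                                 WHERE C.bo_document_fkey = L.bo_doc_base_fkey))), 0)
FROM bo_labels L
WHERE EXISTS (
    SELECT 1
    FROM bo_party party
    WHERE party.id = L.bo_party_fkey
      AND party.inn = '?'
);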
I could not test this with large tables, so I don't know for certain whether it performs better than the original query, but I think it might help.
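One way to check on the real data would be to run each variant under EXPLAIN (ANALYZE, BUFFERS) and compare the execution time and the estimated versus actual row counts with the plan in the question. A sketch for the first rewritten query:

```sql
-- ANALYZE actually executes the query; BUFFERS adds shared-buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE EXISTS (
    SELECT 1
    FROM bo_party party
    WHERE party.id = L.bo_party_fkey
      AND party.inn = '?'
);
```

The resulting count should also be compared with the count from the original query, since the rewrite relies on the assumed cardinalities.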