I first made sure the planner had up-to-date statistics:
my_db=> vacuum analyze;
VACUUM
Time: 1401.958 ms
When I select only foos.bar_id, the query performs well, using an Index Only Scan on that column:
my_db=> EXPLAIN ANALYZE SELECT foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
QUERY PLAN
Nested Loop (cost=0.43..16203.46 rows=353198 width=4) (actual time=0.045..114.746 rows=196205 loops=1)
-> Seq Scan on bar_ids (cost=0.00..16.71 rows=871 width=4) (actual time=0.005..0.195 rows=871 loops=1)
-> Index Only Scan using index_foos_on_bar_id on foos (cost=0.43..14.80 rows=378 width=4) (actual time=0.003..0.055 rows=225 loops=871)
Index Cond: (bar_id = bar_ids.id)
Heap Fetches: 0
Planning time: 0.209 ms
Execution time: 144.364 ms
(7 rows)
Time: 145.620 ms
However, adding foos.id to the select list makes the planner choose a much slower Seq Scan:
my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
QUERY PLAN
Hash Join (cost=27.60..221339.63 rows=353198 width=8) (actual time=294.091..3341.926 rows=196205 loops=1)
Hash Cond: (foos.bar_id = bar_ids.id)
-> Seq Scan on foos (cost=0.00..182314.70 rows=7093070 width=8) (actual time=0.004..1855.900 rows=7111807 loops=1)
-> Hash (cost=16.71..16.71 rows=871 width=4) (actual time=0.454..0.454 rows=866 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 39kB
-> Seq Scan on bar_ids (cost=0.00..16.71 rows=871 width=4) (actual time=0.002..0.222 rows=871 loops=1)
Planning time: 0.237 ms
Execution time: 3371.622 ms
(8 rows)
Time: 3373.150 ms
Disabling Seq Scan forces an Index Scan on the same index, which is an order of magnitude faster than the Seq Scan:
my_db=> set enable_seqscan=false;
SET
Time: 0.801 ms
my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
QUERY PLAN
Nested Loop (cost=10000000000.43..10000439554.99 rows=353198 width=8) (actual time=0.028..171.632 rows=196205 loops=1)
-> Seq Scan on bar_ids (cost=10000000000.00..10000000016.71 rows=871 width=4) (actual time=0.005..0.212 rows=871 loops=1)
-> Index Scan using index_foos_on_bar_id on foos (cost=0.43..500.86 rows=378 width=8) (actual time=0.003..0.118 rows=225 loops=871)
Index Cond: (bar_id = bar_ids.id)
Planning time: 0.186 ms
Execution time: 201.958 ms
(6 rows)
Time: 203.185 ms
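Other answers say that bad plans are caused by bad statistics. My statistics are up to date. What gives?
bar_ids is a temporary table, which may be related to the insane cost estimate in the last query ("Seq Scan on bar_ids (cost=10000000000.00..10000000016.71"), but explicitly running ANALYZE bar_ids does not change the query plan.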
Answer 0 (score: 2)
Following up here on my comments on the OP.
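In the case of the first query, when you select only foos.bar_id, the executor is able to satisfy it with an Index Only Scan, which is great. Adding another column that is not covered by the index to the select list, however, means that continuing to use this index leads to the typical "double read" situation: we first read the index page and then read the table page to fetch the remaining column values. That implies potentially quite a lot of random IO.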
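One way to see the "covered by the index" point in practice is to build an index that contains both selected columns, so that an Index Only Scan becomes possible for the second query as well. The following is a minimal sketch, not something from the original question: the index name index_foos_on_bar_id_and_id is hypothetical, and whether the planner actually picks the new index depends on the cost settings and on how much of the table is marked all-visible.

```sql
-- Hypothetical composite index covering both selected columns.
-- The name is an assumption made for illustration only.
CREATE INDEX index_foos_on_bar_id_and_id ON foos (bar_id, id);

-- Refresh statistics and the visibility map, which index-only scans rely on.
VACUUM ANALYZE foos;

-- Re-check the plan for the query that previously chose a Seq Scan.
EXPLAIN ANALYZE
SELECT foos.id, foos.bar_id
FROM foos
INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
```

The trade-off is an extra index to store and maintain on every write to foos, so this is only worth it if the query is hot enough.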
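The random_page_cost setting should be set relative to seq_page_cost to represent (loosely) how many times more expensive random IO is than sequential IO (with seq_page_cost=1, random IO is costed at random_page_cost times a sequential read). With modern drives, random IO is not that expensive, so lowering random_page_cost can make index scans more favorable. Finding the "best" value for this is tricky, but starting around 2 is a decent rule of thumb.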
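As a rough sketch of how to try this out, the setting can be changed for the current session only and the plan re-checked. The value 2 is just the starting point suggested above, not a tuned recommendation, and the query is the one from the question.

```sql
-- Undo the earlier experiment so the planner is free to choose again.
RESET enable_seqscan;

-- Inspect the current values (PostgreSQL ships with 4 and 1 by default).
SHOW random_page_cost;
SHOW seq_page_cost;

-- Lower the random IO cost estimate for this session only.
SET random_page_cost = 2;

-- See whether the planner now prefers the index scan.
EXPLAIN ANALYZE
SELECT foos.id, foos.bar_id
FROM foos
INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;

-- Return to the configured value when done experimenting.
RESET random_page_cost;
```

If a lower value consistently produces better plans, it can be made persistent, for example with ALTER DATABASE my_db SET random_page_cost = 2; or in postgresql.conf. It is worth testing against a representative workload first, since the setting affects every query.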