已修改:已添加Explain Analyze
我有下表(例如简化):
CREATE TABLE public.streamscombined
(
eventtype text COLLATE pg_catalog."default",
payload jsonb,
clienttime bigint, //as millis from epoch
)
clienttime + eventtype
索引修剪大量行时正确使用索引
正确执行以下格式的查询会使用带有clienttime
的索引来过滤大量文档。例如:
explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= 1522550900000 order by clienttime;
=&GT;
Index Scan using "clienttime/type" on streamscombined (cost=0.56..1781593.82 rows=1135725 width=583)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
解释分析
Index Scan using "clienttime/type" on streamscombined (cost=0.56..1711616.01 rows=1079021 width=592) (actual time=1.369..13069.861 rows=1074896 loops=1)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
Planning time: 0.208 ms
Execution time: 13369.330 ms
结果:流式传输结果我看到数据在100毫秒内传入。
当索引修剪较少的行时忽略索引
然而,如果在放宽clienttime
条件时完全失控,例如(增加3小时):
explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= (1522540900000 + (3*3600*1000)) order by clienttime;
=&GT;
Gather Merge (cost=2897003.10..3192254.78 rows=2530552 width=583)
Workers Planned: 2
-> Sort (cost=2896003.07..2899166.26 rows=1265276 width=583)
Sort Key: clienttime
-> Parallel Seq Scan on streamscombined (cost=0.00..2110404.89 rows=1265276 width=583)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
解释分析
Gather Merge (cost=2918263.39..3193771.83 rows=2361336 width=592) (actual time=72505.138..75142.127 rows=2852704 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=2917263.37..2920215.04 rows=1180668 width=592) (actual time=70764.052..71430.200 rows=950901 loops=3)
Sort Key: clienttime
Sort Method: external merge Disk: 722336kB
-> Parallel Seq Scan on streamscombined (cost=0.00..2176719.08 rows=1180668 width=592) (actual time=0.451..57458.888 rows=950901 loops=3)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
Rows Removed by Filter: 7736119
Planning time: 0.109 ms
Execution time: 76164.816 ms
结果:流媒体搜索结果我等待&gt; 5分钟没有任何结果。
这可能是因为PG认为索引不会修剪结果集那么多,所以它会使用不同的策略。
然而,这是关键,它完全似乎忽略了我想按clienttime
订购的事实,索引正在免费提供给我。
有没有办法强制PG使用独立于clienttime
条件的实际值的索引?
答案 0 :(得分:0)
排序结果很便宜,索引扫描很昂贵,因为它可以查找很多磁盘。
较低的ramdom_page_cost设置会降低索引扫描的成本估算值,从而导致索引扫描用于更大的结果集。