强制PG使用带有时间范围的索引。适用于小型结果集,不适用于较大的集合

时间:2018-04-01 15:16:49

标签: postgresql

已修改:已添加Explain Analyze

我有下表(例如简化):

CREATE TABLE public.streamscombined
(
    eventtype text COLLATE pg_catalog."default",
    payload jsonb,
    clienttime bigint, //as millis from epoch
)

clienttime + eventtype

上的b树复合索引

索引修剪大量行时正确使用索引

正确执行以下格式的查询会使用带有clienttime的索引来过滤大量文档。例如:

explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= 1522550900000 order by clienttime;

=&GT;

Index Scan using "clienttime/type" on streamscombined  (cost=0.56..1781593.82 rows=1135725 width=583)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))

解释分析

Index Scan using "clienttime/type" on streamscombined (cost=0.56..1711616.01 rows=1079021 width=592) (actual time=1.369..13069.861 rows=1074896 loops=1) Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text)) Planning time: 0.208 ms Execution time: 13369.330 ms

结果:流式传输结果我看到数据在100毫秒内传入。

当索引修剪较少的行时忽略索引

然而,如果在放宽clienttime条件时完全失控,例如(增加3小时):

explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= (1522540900000 + (3*3600*1000)) order by clienttime;

=&GT;

Gather Merge  (cost=2897003.10..3192254.78 rows=2530552 width=583)
Workers Planned: 2
->  Sort  (cost=2896003.07..2899166.26 rows=1265276 width=583)
Sort Key: clienttime
->  Parallel Seq Scan on streamscombined  (cost=0.00..2110404.89 rows=1265276 width=583)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))

解释分析

Gather Merge (cost=2918263.39..3193771.83 rows=2361336 width=592) (actual time=72505.138..75142.127 rows=2852704 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=2917263.37..2920215.04 rows=1180668 width=592) (actual time=70764.052..71430.200 rows=950901 loops=3) Sort Key: clienttime Sort Method: external merge Disk: 722336kB -> Parallel Seq Scan on streamscombined (cost=0.00..2176719.08 rows=1180668 width=592) (actual time=0.451..57458.888 rows=950901 loops=3) Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text)) Rows Removed by Filter: 7736119 Planning time: 0.109 ms Execution time: 76164.816 ms

结果:流媒体搜索结果我等待&gt; 5分钟没有任何结果。

这可能是因为PG认为索引不会修剪结果集那么多,所以它会使用不同的策略。

然而,这是关键,它完全似乎忽略了我想按clienttime订购的事实,索引正在免费提供给我。

有没有办法强制PG使用独立于clienttime条件的实际值的索引?

1 个答案:

答案 0 :(得分:0)

排序结果很便宜,索引扫描很昂贵,因为它可以查找很多磁盘。

较低的ramdom_page_cost设置会降低索引扫描的成本估算值,从而导致索引扫描用于更大的结果集。