Question

Edit: added EXPLAIN ANALYZE output.

I have the following table (simplified for the example):

CREATE TABLE public.streamscombined
(
    eventtype text COLLATE pg_catalog."default",
    payload jsonb,
    clienttime bigint  -- milliseconds since the epoch
);

There is a composite b-tree index on clienttime + eventtype.
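For reference, the index could be created roughly as follows. The column order (clienttime, eventtype) is an assumption based on the order of the conditions in the plans' Index Cond lines; the name "clienttime/type" is taken from those plans.

```sql
-- Composite b-tree index with clienttime as the leading column: a range
-- condition on clienttime bounds the scan, and entries are returned in
-- clienttime order, which is what ORDER BY clienttime needs.
CREATE INDEX "clienttime/type"
    ON public.streamscombined USING btree (clienttime, eventtype);
```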

Index used correctly when it prunes many rows

Queries of the following form correctly use the index on clienttime to filter out a large number of rows. For example:

explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= 1522540900000 order by clienttime;

=>

Index Scan using "clienttime/type" on streamscombined  (cost=0.56..1781593.82 rows=1135725 width=583)
  Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))

EXPLAIN ANALYZE:

Index Scan using "clienttime/type" on streamscombined  (cost=0.56..1711616.01 rows=1079021 width=592) (actual time=1.369..13069.861 rows=1074896 loops=1)
  Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
Planning time: 0.208 ms
Execution time: 13369.330 ms

Result: when streaming the results, I see data arriving within 100 ms.

Index ignored when it prunes fewer rows

However, things go completely off the rails when I relax the clienttime condition, e.g. by adding 3 hours:

explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= (1522540900000 + (3*3600*1000)) order by clienttime;

=>

Gather Merge  (cost=2897003.10..3192254.78 rows=2530552 width=583)
  Workers Planned: 2
  ->  Sort  (cost=2896003.07..2899166.26 rows=1265276 width=583)
        Sort Key: clienttime
        ->  Parallel Seq Scan on streamscombined  (cost=0.00..2110404.89 rows=1265276 width=583)
              Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))

EXPLAIN ANALYZE:

Gather Merge  (cost=2918263.39..3193771.83 rows=2361336 width=592) (actual time=72505.138..75142.127 rows=2852704 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Sort  (cost=2917263.37..2920215.04 rows=1180668 width=592) (actual time=70764.052..71430.200 rows=950901 loops=3)
        Sort Key: clienttime
        Sort Method: external merge  Disk: 722336kB
        ->  Parallel Seq Scan on streamscombined  (cost=0.00..2176719.08 rows=1180668 width=592) (actual time=0.451..57458.888 rows=950901 loops=3)
              Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
              Rows Removed by Filter: 7736119
Planning time: 0.109 ms
Execution time: 76164.816 ms

Result: when streaming the results, I waited more than 5 minutes without getting any results.

This is probably because PG thinks the index won't prune the result set as much, so it picks a different strategy.
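One way to see what the planner is basing that decision on is to look at its column statistics. A minimal sketch, assuming the table lives in the public schema: pg_stats is a standard system view, and its correlation column measures how closely the physical row order follows the column's sort order. A correlation near 0 for clienttime makes the planner expect many random heap reads from an index scan, which pushes it toward a sequential scan plus sort.

```sql
-- Refresh statistics first if the table has changed a lot since the last ANALYZE.
ANALYZE public.streamscombined;

-- Per-column statistics the planner uses for row estimates and index-scan costing.
SELECT attname, n_distinct, correlation, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'streamscombined'
  AND attname IN ('clienttime', 'eventtype');
```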

However, and this is the key point, it seems to completely ignore the fact that I want to order by clienttime, which the index gives me for free.

Is there a way to force PG to use the index regardless of the actual value in the clienttime condition?

Answer 1

Sorting the results is cheap; an index scan is expensive because it can cause a lot of disk seeks.
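Whether the index scan really is slower for the 3-hour range can be checked directly. A minimal sketch: inside one transaction, discourage sequential and bitmap scans with the standard planner settings enable_seqscan and enable_bitmapscan, then compare the EXPLAIN ANALYZE timing with the plan shown in the question. These settings are intended for experiments, not as a permanent fix.

```sql
BEGIN;

-- SET LOCAL values revert automatically at COMMIT or ROLLBACK.
-- Turning these off does not forbid the scan types outright; the planner
-- simply avoids them whenever another plan is available.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_bitmapscan = off;  -- a bitmap scan would not return rows in index order

-- BUFFERS reports shared buffer hits vs. reads, i.e. how much actually came from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM streamscombined
WHERE eventtype = 'typeA'
  AND clienttime <= (1522540900000 + (3*3600*1000))
ORDER BY clienttime;

ROLLBACK;
```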

A lower random_page_cost setting lowers the cost estimate for index scans, so the planner will choose an index scan for larger result sets.
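A minimal sketch of trying this for one session. The value 1.1 is only an illustrative choice (often suggested for SSD storage or mostly cached data), not one tuned for this table; the default random_page_cost is 4.0, relative to seq_page_cost = 1.0. The database name mydb in the last line is hypothetical.

```sql
-- Applies to the current session only.
SET random_page_cost = 1.1;

-- Re-check which plan is chosen for the wider time range.
EXPLAIN ANALYZE
SELECT *
FROM streamscombined
WHERE eventtype = 'typeA'
  AND clienttime <= (1522540900000 + (3*3600*1000))
ORDER BY clienttime;

-- To make it the default for new sessions on one database (hypothetical name "mydb"):
-- ALTER DATABASE mydb SET random_page_cost = 1.1;
```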

Force PG to use index with time range. Works for small result sets, not for larger sets
