Question

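I am seeing very strange behaviour from a Postgres query. I have a table "p_MyTable" which is a partition of another table. "p_MyTable" has about 600 million records and has the following index: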

CREATE INDEX idx_p_MyTable_id ON MySchema.p_MyTable USING btree(IndColumn);

When I run the following query, it returns its result almost immediately. Here are the query and its explain plan.

    explain select max(IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

"Result  (cost=4.85..4.86 rows=1 width=0)"
"  InitPlan 1 (returns $0)"
"    ->  Limit  (cost=0.57..4.85 rows=1 width=8)"
"          ->  Index Scan Backward using idx_p_MyTable_id on p_MyTable ms  (cost=0.57..727648341.49 rows=169996075 width=8)"
"                Index Cond: (IndColumn IS NOT NULL)"
"                Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_96 = 'EFG'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"

The aggregate function "min" behaves the same way. But when I try other functions, the explain plan changes and the query no longer finishes in a short time.

    explain select count(IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

"Aggregate  (cost=53319339.50..53319339.51 rows=1 width=8)"
"  ->  Bitmap Heap Scan on p_MyTable ms  (cost=11209851.27..52894349.31 rows=169996075 width=8)"
"        Recheck Cond: (ent_attr_96 = 'EFG'::text)"
"        Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
"        ->  Bitmap Index Scan on p_MyTable_comp  (cost=0.00..11167352.25 rows=509988224 width=0)"
"              Index Cond: (ent_attr_96 = 'EFG'::text)"

    explain select distinct (IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

"HashAggregate  (cost=53319339.50..53319339.71 rows=21 width=8)"
"  Group Key: IndColumn"
"  ->  Bitmap Heap Scan on p_MyTable ms  (cost=11209851.27..52894349.31 rows=169996075 width=8)"
"        Recheck Cond: (ent_attr_96 = 'EFG'::text)"
"        Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
"        ->  Bitmap Index Scan on p_MyTable_comp  (cost=0.00..11167352.25 rows=509988224 width=0)"
"              Index Cond: (ent_attr_96 = 'EFG'::text)"

    explain select avg (IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

"Aggregate  (cost=53319339.50..53319339.51 rows=1 width=8)"
"  ->  Bitmap Heap Scan on p_MyTable ms  (cost=11209851.27..52894349.31 rows=169996075 width=8)"
"        Recheck Cond: (ent_attr_96 = 'EFG'::text)"
"        Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
"        ->  Bitmap Index Scan on p_MyTable_comp  (cost=0.00..11167352.25 rows=509988224 width=0)"
"              Index Cond: (ent_attr_96 = 'EFG'::text)"

Please explain why the plan changes completely. I assumed that if max/min use the index properly, the other functions would as well.

Thanks in advance.

Answer 1

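Indexes are (mostly) used for "where" conditions, or to help with sorting ("order by"). (Indexes can be used for a number of other things too, but for this case it helps to limit the discussion to those two.)

The "max(IndColumn)" is misleading, because the query is effectively rewritten as: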

    select IndColumn
    from MySchema.p_MyTable ms
    where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date
    AND IndColumn is not null
    ORDER by IndColumn desc
    LIMIT 1

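So the index is used for the "AND IndColumn is not null ORDER by IndColumn desc" part. The scan walks the index from the highest value downward and stops at the first row that passes the filter, which is why it returns so quickly. count, avg and distinct need every matching row, so that shortcut is not available to them.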
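The same rewrite accounts for "min" being just as fast. As a sketch using the table and column names from the question, min(IndColumn) corresponds to walking the same index forward instead of backward:

```sql
-- Sketch of the form the planner effectively uses for min(IndColumn).
-- Same filter as in the question; only the sort direction differs from max.
SELECT IndColumn
FROM MySchema.p_MyTable ms
WHERE ent_attr_97 = 'ABC' AND ent_attr_96 = 'EFG' AND ent_attr_98 = 'HIJ'
  AND ent_date_2::date <= '2017-06-01'::date
  AND IndColumn IS NOT NULL
ORDER BY IndColumn ASC   -- forward scan of idx_p_MyTable_id
LIMIT 1;                 -- stop at the first row that passes the filter
```

Note that this is only quick when a matching row sits near that end of the index. If the filter rejected most rows, the scan could run for a long time before finding one.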

For your other queries you would want an index on the columns in the "where" clause, something like

CREATE INDEX idx_p_foo ON MySchema.p_MyTable USING btree(ent_attr_96, ent_attr_97, ent_attr_98);

might help, assuming you have a number of queries that use those 3 columns.
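One way to check whether such an index is picked up is to run the slow query under EXPLAIN again after creating it. A minimal sketch, assuming the index idx_p_foo above has been built:

```sql
-- Refresh planner statistics so the new index is costed with current data.
ANALYZE MySchema.p_MyTable;

-- Plain EXPLAIN only plans the query; it does not run it.
EXPLAIN
SELECT count(IndColumn)
FROM MySchema.p_MyTable ms
WHERE ent_attr_97 = 'ABC' AND ent_attr_96 = 'EFG' AND ent_attr_98 = 'HIJ'
  AND ent_date_2::date <= '2017-06-01'::date;

-- Using EXPLAIN (ANALYZE, BUFFERS) instead would execute the query and report
-- actual timings and I/O, which can take a long time on a table of this size.
```

If the plan lists all three ent_attr columns under "Index Cond" rather than under "Filter", the index is doing the filtering. The estimates in the question suggest that roughly 170 million rows match, so even with a better index the aggregate would probably still have a lot of rows to process.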

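You could also add the date, though I am not sure how much that would help, since it is only an end date with no start, so I am not sure whether the query planner would use the date from the index. If it does, the question becomes how many rows the date filter removes, and whether that is worth the increase in index size.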
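One detail affects whether the date can be used from an index at all: the query filters on the expression ent_date_2::date, not on the bare column, and a plain index on ent_date_2 does not match a predicate on a cast expression. Two possible approaches are sketched below. Both assume ent_date_2 is a timestamp without time zone, which the question does not state, and the index name idx_p_foo_date is made up for illustration.

```sql
-- Option A: index the same expression the query filters on.
-- Assumes ent_date_2 is "timestamp without time zone"; an indexed expression
-- must be immutable, which the cast from timestamptz to date is not.
CREATE INDEX idx_p_foo_date ON MySchema.p_MyTable
    USING btree (ent_attr_96, ent_attr_97, ent_attr_98, (ent_date_2::date));

-- Option B: drop the cast so that a plain index column on ent_date_2 can match.
-- For a timestamp column, "ent_date_2::date <= '2017-06-01'" selects the same
-- rows as "ent_date_2 < '2017-06-02'".
SELECT count(IndColumn)
FROM MySchema.p_MyTable ms
WHERE ent_attr_97 = 'ABC' AND ent_attr_96 = 'EFG' AND ent_attr_98 = 'HIJ'
  AND ent_date_2 < DATE '2017-06-02';
```

Whether either option pays off still depends on how many rows the date condition removes.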
