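I am seeing very strange behavior from a Postgres query. I have a table "p_MyTable" that is a partition of another table. "p_MyTable" has about 600 million records and has this index: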
CREATE INDEX idx_p_MyTable_id ON MySchema.p_MyTable USING btree(IndColumn);
When I execute the following query, it runs immediately and returns the result very quickly. Here are the query and its explain plan.
explain select max(IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
and ent_date_2::date <= '2017-06-01 00:00:00'::date
"Result (cost=4.85..4.86 rows=1 width=0)"
" InitPlan 1 (returns $0)"
" -> Limit (cost=0.57..4.85 rows=1 width=8)"
" -> Index Scan Backward using idx_p_MyTable_id on p_MyTable ms (cost=0.57..727648341.49 rows=169996075 width=8)"
" Index Cond: (IndColumn IS NOT NULL)"
" Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_96 = 'EFG'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
The aggregate function "min" behaves the same way. But when I try other functions, the explain plan changes and the query no longer finishes in a short time.
explain select count(IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
and ent_date_2::date <= '2017-06-01 00:00:00'::date
"Aggregate (cost=53319339.50..53319339.51 rows=1 width=8)"
" -> Bitmap Heap Scan on p_MyTable ms (cost=11209851.27..52894349.31 rows=169996075 width=8)"
" Recheck Cond: (ent_attr_96 = 'EFG'::text)"
" Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
" -> Bitmap Index Scan on p_MyTable_comp (cost=0.00..11167352.25 rows=509988224 width=0)"
" Index Cond: (ent_attr_96 = 'EFG'::text)"
explain select distinct (IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
and ent_date_2::date <= '2017-06-01 00:00:00'::date
"HashAggregate (cost=53319339.50..53319339.71 rows=21 width=8)"
" Group Key: IndColumn"
" -> Bitmap Heap Scan on p_MyTable ms (cost=11209851.27..52894349.31 rows=169996075 width=8)"
" Recheck Cond: (ent_attr_96 = 'EFG'::text)"
" Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
" -> Bitmap Index Scan on p_MyTable_comp (cost=0.00..11167352.25 rows=509988224 width=0)"
" Index Cond: (ent_attr_96 = 'EFG'::text)"
explain select avg (IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
and ent_date_2::date <= '2017-06-01 00:00:00'::date
"Aggregate (cost=53319339.50..53319339.51 rows=1 width=8)"
" -> Bitmap Heap Scan on p_MyTable ms (cost=11209851.27..52894349.31 rows=169996075 width=8)"
" Recheck Cond: (ent_attr_96 = 'EFG'::text)"
" Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
" -> Bitmap Index Scan on p_MyTable_comp (cost=0.00..11167352.25 rows=509988224 width=0)"
" Index Cond: (ent_attr_96 = 'EFG'::text)"
Please explain why the plan changes completely. I assumed that if max / min use the index correctly, the other functions would use it too.
Thanks in advance.
Answer 0 (score: 0)
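Indexes are (mostly) used for "where" conditions, or to help with sorting ("order by"). (Good indexes can be used for many other things, but for this case it helps to limit the discussion to these two uses.)
"max(IndColumn)" is misleading, because the query is rewritten as: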
select IndColumn
from MySchema.p_MyTable ms
where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
and ent_date_2::date <= '2017-06-01 00:00:00'::date
AND IndColumn is not null
ORDER by IndColumn desc
LIMIT 1
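So the index is used for "AND IndColumn is not null ORDER by IndColumn desc". Because of the LIMIT 1, the scan walks the index from the highest value downward and stops at the first row that passes the filter. count, avg and distinct have to read every matching row, so they cannot take this shortcut.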
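One way to check the rewrite described above is to ask the planner for both forms and compare the output. This is only a sketch: it reuses the table, columns and literals from the question, and the exact plan depends on the PostgreSQL version and the table statistics. Plain EXPLAIN does not execute the query, so both statements are cheap to run.

```sql
-- Aggregate form, as in the question.
EXPLAIN
SELECT max(IndColumn)
FROM MySchema.p_MyTable ms
WHERE ent_attr_97 = 'ABC' AND ent_attr_96 = 'EFG' AND ent_attr_98 = 'HIJ'
  AND ent_date_2::date <= '2017-06-01 00:00:00'::date;

-- Hand-written form of the same request: sort descending, keep one row.
EXPLAIN
SELECT IndColumn
FROM MySchema.p_MyTable ms
WHERE ent_attr_97 = 'ABC' AND ent_attr_96 = 'EFG' AND ent_attr_98 = 'HIJ'
  AND ent_date_2::date <= '2017-06-01 00:00:00'::date
  AND IndColumn IS NOT NULL
ORDER BY IndColumn DESC
LIMIT 1;
```

If the rewrite applies, both plans should show a Limit node over an "Index Scan Backward using idx_p_MyTable_id", like the plan for max in the question.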
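For your other queries, you need an index on the columns in the "where" clause. Something like
CREATE INDEX idx_p_foo ON MySchema.p_MyTable USING btree(ent_attr_96, ent_attr_97, ent_attr_98);
might help, assuming you have many queries that use these 3 columns.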
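You could also add the date. I'm not sure how much that would help, though, since there is only an end date and no start date, so I'm not sure whether the planner will use the date in the index. If it does, the question is how many rows the date filter removes, and whether that is worth the increase in index size.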
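If you want to try the variant with the date, a sketch could look like the following. The index name idx_p_foo_date is made up for illustration. The query filters on ent_date_2::date, so the index stores the same expression. This assumes ent_date_2 is a timestamp without time zone (or already a date): PostgreSQL only accepts immutable expressions in an index, and the cast from timestamp with time zone or from text to date is not immutable, so in those cases the CREATE INDEX would be rejected.

```sql
-- Hypothetical index: the three equality columns first, then the date expression.
-- The extra parentheses around the cast are required for an expression column.
CREATE INDEX idx_p_foo_date ON MySchema.p_MyTable
    USING btree (ent_attr_96, ent_attr_97, ent_attr_98, (ent_date_2::date));

-- Check whether the planner picks it up for one of the slow queries.
EXPLAIN
SELECT count(IndColumn)
FROM MySchema.p_MyTable ms
WHERE ent_attr_97 = 'ABC' AND ent_attr_96 = 'EFG' AND ent_attr_98 = 'HIJ'
  AND ent_date_2::date <= '2017-06-01 00:00:00'::date;
```

Keep in mind that the plans in the question estimate about 170 million matching rows out of roughly 600 million, so even with a matching index these queries still have to read a large part of the table, and the planner may reasonably prefer a bitmap or sequential scan.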