Question

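I have a table with about 10 million rows and an index on a date field. When I try to extract the unique values of the indexed field, Postgres runs a sequential scan even though the result set has only 26 items. Why would the optimizer choose this plan, and what can I do to avoid it?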

explain select "labelDate" from pages group by "labelDate";
                              QUERY PLAN
-----------------------------------------------------------------------
 HashAggregate  (cost=524616.78..524617.04 rows=26 width=4)
   Group Key: "labelDate"
   ->  Seq Scan on pages  (cost=0.00..499082.42 rows=10213742 width=4)
(3 rows)

Answer 1

I think the problem is that the query planner wants to read the whole table: your query has a GROUP BY clause even though it uses no aggregate function, and grouping has to look at every row to find every distinct value. It is therefore similar to the "Why is count(*) so slow?" issue, which comes up in many forms in PostgreSQL questions.

In your case the query is a bit odd, since it groups without aggregating anything. What you are asking for is answered more simply by this query:

SELECT DISTINCT "labelDate" FROM pages;
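PostgreSQL usually plans a single-column SELECT DISTINCT much like the equivalent GROUP BY, so the rewrite above may still show a sequential scan. A commonly used workaround (the "loose index scan" pattern described on the PostgreSQL wiki) is a recursive CTE that asks the "labelDate" index for the next larger value once per distinct date, so the work tracks the 26 distinct values rather than the ~10 million rows. To stay self-contained, this sketch runs the same SQL through Python's built-in sqlite3 module on a small synthetic pages table; the data, the index name pages_labeldate_idx and the helper functions are hypothetical, and the sketch only checks that the rewrite returns the same dates as GROUP BY. It does not show which plan PostgreSQL would pick.

```python
import sqlite3
from datetime import date, timedelta

# Emulated "loose index scan": seed with the smallest "labelDate", then let each
# recursive step fetch the next larger value with one ORDER BY ... LIMIT 1
# lookup, which an index on "labelDate" can answer without reading every row.
# The recursion ends when that lookup finds nothing and yields NULL.
LOOSE_SCAN_SQL = """
WITH RECURSIVE t(d) AS (
    SELECT min("labelDate") FROM pages
    UNION ALL
    SELECT (SELECT "labelDate" FROM pages
            WHERE "labelDate" > t.d
            ORDER BY "labelDate"
            LIMIT 1)
    FROM t
    WHERE t.d IS NOT NULL
)
SELECT d FROM t WHERE d IS NOT NULL
"""


def build_demo_db(rows_per_date: int = 1000, n_dates: int = 26) -> sqlite3.Connection:
    """Hypothetical stand-in for the real table: n_dates distinct ISO dates,
    rows_per_date rows each, plus an index on "labelDate"."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE pages (id INTEGER PRIMARY KEY, "labelDate" TEXT)')
    conn.execute('CREATE INDEX pages_labeldate_idx ON pages ("labelDate")')
    start = date(2020, 1, 1)
    rows = [
        ((start + timedelta(days=i % n_dates)).isoformat(),)
        for i in range(rows_per_date * n_dates)
    ]
    conn.executemany('INSERT INTO pages ("labelDate") VALUES (?)', rows)
    return conn


def distinct_dates_loose(conn: sqlite3.Connection) -> list:
    """Distinct non-NULL dates via the recursive CTE."""
    return [row[0] for row in conn.execute(LOOSE_SCAN_SQL)]


def distinct_dates_group_by(conn: sqlite3.Connection) -> list:
    """The original GROUP BY query, with ORDER BY added for comparison."""
    sql = 'SELECT "labelDate" FROM pages GROUP BY "labelDate" ORDER BY "labelDate"'
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_demo_db()
    loose = distinct_dates_loose(conn)
    print(len(loose), min(loose), max(loose))  # prints: 26 2020-01-01 2020-01-26
    assert sorted(loose) == distinct_dates_group_by(conn)
```

Unlike GROUP BY, the recursive query drops rows whose "labelDate" is NULL instead of returning NULL as a group of its own. On the real table, EXPLAIN ANALYZE on the CTE is the way to confirm that each step is actually answered from the index.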
