Question

PostgreSQL 9.2

I have the following table (tbl):

-------------------------------------------------------------
| id   |  mailing_id  |  recipient_id  |  delivery_state_id |
-------------------------------------------------------------
| PK   |   integer    |     integer    |       integer      |
-------------------------------------------------------------

I have also created the following index:

CREATE INDEX idx_name
  ON tbl
  USING btree
  (recipient_id);

Because indexes in PostgreSQL have a default sort order, I expected the query

SELECT DISTINCT recipient_id
FROM tbl

to avoid the sort step. But running

EXPLAIN ANALYZE SELECT DISTINCT recipient_id
FROM tbl

tells me that it does not:

 Unique  (cost=1401370.66..1442288.31 rows=145798 width=4) (actual time=9377.410..11388.869 rows=1037472 loops=1) 
   ->  Sort  (cost=1401370.66..1421829.48 rows=8183530 width=4) (actual time=9377.408..10849.160 rows=8183160 loops=1) 
         Sort Key: recipient_id 
         Sort Method: external merge  Disk: 111968kB 
         ->  Seq Scan on tbl  (cost=0.00..126072.30 rows=8183530 width=4) (actual time=0.008..1073.771 rows=8183160 loops=1) 
 Total runtime: 11448.373 ms

As you can see, the sort is still performed.

Question: How can I create the index so that the sort step is avoided?

Answer 1

Alas, this is too long for a comment.

This surprises me; I would have expected Postgres to be smarter than that. What happens with this version?

SELECT recipient_id 
FROM tbl
GROUP BY recipient_id;

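Which version of Postgres are you using? Postgres introduced index-only scans in version 9.2, so on an earlier release the planner would not have that option, which might explain the under-use of the index. I can say that in 9.3 an index scan is used for DISTINCT.

Here is the EXPLAIN output from 9.3 for a similar query (select distinct totalprice from orders):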

Unique  (cost=0.42..5505.62 rows=2794 width=8)
  ->  Index Only Scan using idx_orders_totalprice on orders  (cost=0.42..5023.16 rows=192983 width=8)
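
To check this on the table from the question, something like the following sketch could be tried. It assumes the table tbl and the index idx_name from the question. The VACUUM step is my assumption about one possible cause (index-only scans rely on the visibility map, which VACUUM maintains); the plan in the question does not confirm it.

```sql
-- Confirm the server version (index-only scans need 9.2 or later).
SELECT version();

-- Assumption: the visibility map or statistics may be stale.
-- VACUUM ANALYZE refreshes both for this table.
VACUUM ANALYZE tbl;

-- Compare the plans of the two equivalent formulations.
EXPLAIN ANALYZE
SELECT DISTINCT recipient_id
FROM tbl;

EXPLAIN ANALYZE
SELECT recipient_id
FROM tbl
GROUP BY recipient_id;
```

If both plans still show a Seq Scan feeding a Sort, the planner may simply be estimating that path as cheaper than walking the whole index, in which case the index definition itself is probably not the problem.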

Answer 2

Make sure the ORDER BY clause in your query matches your index exactly, including NULLS LAST (or NULLS FIRST).
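
As an illustration of what "matches exactly" means, here is a small sketch. The index name idx_recipient_desc and the DESC NULLS LAST ordering are hypothetical choices made for the example; they are not taken from the question.

```sql
-- Hypothetical index with an explicit, non-default ordering.
CREATE INDEX idx_recipient_desc
  ON tbl
  USING btree
  (recipient_id DESC NULLS LAST);

-- Matches the index ordering exactly, so the planner can read
-- the rows in index order instead of sorting them.
SELECT DISTINCT recipient_id
FROM tbl
ORDER BY recipient_id DESC NULLS LAST;

-- Does not match: DESC alone implies NULLS FIRST, so the index
-- ordering differs and a separate sort may still be needed.
SELECT DISTINCT recipient_id
FROM tbl
ORDER BY recipient_id DESC;
```

The index from the question uses the default ordering (ASC NULLS LAST), which corresponds to a plain ORDER BY recipient_id. Whether the planner actually takes the index path still depends on its cost estimates.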
