Question

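I've set out to optimize a fairly large query that has 3 nested subqueries (like Russian dolls). The query itself is generated by South from a Django project, and I freely admit that I'm no expert in SQL optimization. My strategy so far is to start with the innermost query and work outward.

So the first and innermost query is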

SELECT
  DISTINCT ON (quote_id) quote_id,
  MAX(created_at) AS max_created_at
FROM billing_pricequotestatus
GROUP BY quote_id, created_at
ORDER BY quote_id, created_at DESC;

The EXPLAIN ANALYZE output for the above is

 Unique  (cost=535905.10..610867.38 rows=3331657 width=12) (actual time=4364.469..7587.242 rows=1462625 loops=1)
   ->  GroupAggregate  (cost=535905.10..602538.24 rows=3331657 width=12) (actual time=4364.467..6996.550 rows=3331656 loops=1)
         Group Key: quote_id, created_at
         ->  Sort  (cost=535905.10..544234.24 rows=3331657 width=12) (actual time=4364.460..5574.351 rows=3331657 loops=1)
               Sort Key: quote_id, created_at
               Sort Method: external merge  Disk: 84648kB
               ->  Seq Scan on billing_pricequotestatus  (cost=0.00..61080.57 rows=3331657 width=12) (actual time=0.013..854.722 rows=3331657 loops=1)
 Planning time: 0.107 ms
 Execution time: 7759.317 ms
(9 rows)

The table structure is

                                    Table "public.billing_pricequotestatus"
   Column   |           Type           |                               Modifiers
------------+--------------------------+-----------------------------------------------------------------------
 id         | integer                  | not null default nextval('billing_pricequotestatus_id_seq'::regclass)
 created_at | timestamp with time zone | not null
 updated_at | timestamp with time zone | not null
 notes      | text                     | not null
 name       | character varying(20)    | not null
 quote_id   | integer                  | not null
Indexes:
    "billing_pricequotestatus_pkey" PRIMARY KEY, btree (id)
    "billing_pricequotestatus_quote_id" btree (quote_id)
    "status_timestamp_idx" btree (quote_id, created_at)
Foreign-key constraints:
    "quote_id_refs_id_2b0d5331de8d31b7" FOREIGN KEY (quote_id) REFERENCES billing_pricequote(id) DEFERRABLE INITIALLY DEFERRED

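I've tried http://explain.depesz.com/, but I'm not entirely sure how to derive next steps from the report. I also found a post suggesting that the ORDER BY clause can be removed if the SELECT already returns the rows in order, which I think may be the case here? I'm not sure how to tell.

If I remove the ORDER BY clause, that shaves off ~3410 ms, but I feel this should still be faster (a plain SELECT with no aggregate functions, DISTINCT, or sorting seems to be my baseline time). I've seen several other SO posts about tables 10x this size that get 3-5x the performance I'm getting, with appropriate indexes. I know it's never quite an apples-to-apples comparison, but I'm hoping for some insight.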

Answer 1

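Confusingly, you build a distinct list of (quote_id, created_at) pairs with the GROUP BY, and then apply MAX plus DISTINCT ON on top of it?

This should return the same result:

SELECT
  quote_id,
  MAX(created_at) AS max_created_at
FROM billing_pricequotestatus
GROUP BY quote_id;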
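To see why grouping by quote_id alone is enough, here is a minimal sketch that runs both groupings against an in-memory SQLite table. The sample rows are hypothetical, created_at is stored as ISO text rather than a timestamp, and SQLite stands in for PostgreSQL only to illustrate the grouping behaviour, not the performance.

```python
import sqlite3

# Hypothetical sample rows (quote_id, created_at); not taken from the real table.
rows = [
    (1, "2016-01-01 10:00:00"),
    (1, "2016-01-02 09:30:00"),
    (2, "2016-01-01 08:00:00"),
    (2, "2016-01-03 12:00:00"),
    (2, "2016-01-02 18:45:00"),
    (3, "2016-01-05 07:15:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing_pricequotestatus ("
    "quote_id INTEGER NOT NULL, created_at TEXT NOT NULL)"
)
conn.executemany("INSERT INTO billing_pricequotestatus VALUES (?, ?)", rows)

# Grouping as in the question: one group per (quote_id, created_at) pair,
# so MAX(created_at) only echoes the created_at of its own group.
per_pair = conn.execute(
    "SELECT quote_id, created_at, MAX(created_at) "
    "FROM billing_pricequotestatus "
    "GROUP BY quote_id, created_at"
).fetchall()

# Grouping as in the answer: one group per quote_id,
# so MAX(created_at) really picks the latest status timestamp.
per_quote = conn.execute(
    "SELECT quote_id, MAX(created_at) AS max_created_at "
    "FROM billing_pricequotestatus "
    "GROUP BY quote_id "
    "ORDER BY quote_id"
).fetchall()

print(len(per_pair), "groups when grouping by (quote_id, created_at)")
print(len(per_quote), "groups when grouping by quote_id only")
```

This matches the plan in the question: the GroupAggregate node emits 3331656 rows, nearly one per input row, and the Unique node then has to reduce them to 1462625.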

Answer 2

It looks like a mix of two different solutions to the same problem: getting the greatest created_at for each distinct quote_id.

1)

SELECT
  quote_id,
  MAX(created_at) AS max_created_at
FROM billing_pricequotestatus
GROUP BY quote_id

2)

SELECT
  distinct on (quote_id) quote_id,
  created_at
FROM billing_pricequotestatus
ORDER BY quote_id, created_at DESC
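As a rough illustration of why the two forms agree, the sketch below emulates each one in plain Python on hypothetical rows. The function names max_per_group and distinct_on_first are made up for this example, and the DISTINCT ON emulation (sort, then keep the first row per key) is an approximation of PostgreSQL's behaviour rather than its actual implementation.

```python
from itertools import groupby

# Hypothetical (quote_id, created_at) rows; ISO strings sort chronologically.
rows = [
    (2, "2016-01-02 18:45:00"),
    (1, "2016-01-01 10:00:00"),
    (2, "2016-01-03 12:00:00"),
    (3, "2016-01-05 07:15:00"),
    (1, "2016-01-02 09:30:00"),
]


def max_per_group(rows):
    """Emulates form 1: GROUP BY quote_id with MAX(created_at)."""
    latest = {}
    for quote_id, created_at in rows:
        if quote_id not in latest or created_at > latest[quote_id]:
            latest[quote_id] = created_at
    return sorted(latest.items())


def distinct_on_first(rows):
    """Emulates form 2: DISTINCT ON (quote_id) ... ORDER BY quote_id, created_at DESC."""
    # Two stable sorts give quote_id ascending, created_at descending.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ordered = sorted(ordered, key=lambda r: r[0])
    # DISTINCT ON keeps the first row of each run of equal quote_id values.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


form_1 = max_per_group(rows)
form_2 = distinct_on_first(rows)
print(form_1 == form_2)
```

Either form alone answers the question; combining them, as the generated query does, adds a grouping step that does no useful work.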

There may be something wrong with whatever generates the query.

Query optimization possibilities?