Question

I'm a bit confused.

Here is my (simplified) query:

SELECT *
from (SELECT documents.*,
             (SELECT max(date)
              FROM registrations
              WHERE registrations.document_id = documents.id) AS register_date
      FROM documents) AS dcmnts
ORDER BY register_date
LIMIT 20;

Here is my EXPLAIN ANALYSE output:

Limit  (cost=46697025.51..46697025.56 rows=20 width=193) (actual time=80329.201..80329.206 rows=20 loops=1)
  ->  Sort  (cost=46697025.51..46724804.61 rows=11111641 width=193) (actual time=80329.199..80329.202 rows=20 loops=1)
        Sort Key: ((SubPlan 1))
        Sort Method: top-N heapsort  Memory: 29kB
        ->  Seq Scan on documents  (cost=0.00..46401348.74 rows=11111641 width=193) (actual time=0.061..73275.304 rows=11114254 loops=1)
              SubPlan 1
                ->  Aggregate  (cost=3.95..4.05 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=11114254)
                      ->  Index Scan using registrations_document_id_index on registrations  (cost=0.43..3.95 rows=2 width=4) (actual time=0.004..0.004 rows=1 loops=11114254)
                            Index Cond: (document_id = documents.id)
Planning Time: 0.334 ms
Execution Time: 80329.287 ms

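The query takes 1m 20s to execute. Is there any way to optimize it? Both tables have a lot of rows (documents: 11114642; registrations: 13176070).

In the real, full query I have many more filters and it takes up to 4 seconds to execute, which is still too slow. The subquery ORDER BY seems to be the bottleneck here, and I can't figure out a way to optimize it.

I tried setting indexes on the date / document_id columns.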

Answer 1

Don't use a scalar subquery:

SELECT documents.*,
       reg.register_date
FROM documents
JOIN (
  SELECT document_id, max(date) as register_date
  FROM registrations
  GROUP BY document_id
) reg on reg.document_id = documents.id
ORDER BY register_date
LIMIT 20;
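
One thing worth checking before swapping the two forms: the inner JOIN drops documents that have no registrations, while the scalar subquery keeps them with a NULL register_date. Below is a minimal sketch of that difference, using Python's sqlite3 as a stand-in for PostgreSQL. The three-row toy data and the title column are made up for illustration. NULL ordering also differs between the two engines, so only the non-NULL rows are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE registrations (document_id INTEGER, date TEXT);
    -- Hypothetical toy data: document 3 has no registrations at all.
    INSERT INTO documents VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO registrations VALUES
        (1, '2020-01-05'), (1, '2020-03-01'),
        (2, '2020-02-10');
""")

# Form used in the question: one scalar subquery per document row.
scalar = conn.execute("""
    SELECT d.id,
           (SELECT max(r.date)
            FROM registrations r
            WHERE r.document_id = d.id) AS register_date
    FROM documents d
    ORDER BY register_date
""").fetchall()

# Form suggested in this answer: aggregate once, then join.
joined = conn.execute("""
    SELECT d.id, reg.register_date
    FROM documents d
    JOIN (SELECT document_id, max(date) AS register_date
          FROM registrations
          GROUP BY document_id) reg
      ON reg.document_id = d.id
    ORDER BY reg.register_date
""").fetchall()

print("scalar:", scalar)
print("joined:", joined)
```

If documents without registrations have to appear in the result, a LEFT JOIN in place of the JOIN keeps them.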

Answer 2

Try un-nesting the query:

SELECT documents.id,
       documents.other_attr,
       max(registrations.date) AS register_date
FROM documents
JOIN registrations ON registrations.document_id = documents.id
GROUP BY documents.id, documents.other_attr
ORDER BY register_date
LIMIT 20;

The query should be supported at least by an index on:

registrations(document_id, date)
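
To illustrate why a composite index with document_id first helps the per-document max(date) lookup, here is a small sketch using Python's sqlite3. SQLite's planner is not PostgreSQL's, so this only shows the general idea; on the real database, EXPLAIN is the thing to check. The index name reg_doc_date and the literal 42 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (document_id INTEGER, date TEXT);
    -- Equality column first, then the column being aggregated.
    CREATE INDEX reg_doc_date ON registrations (document_id, date);
""")

# Ask SQLite how it would run the per-document lookup.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT max(date) FROM registrations WHERE document_id = 42
""").fetchall()

# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The plan text should mention reg_doc_date, meaning the lookup is answered from the index instead of scanning the table.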

Answer 3

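In the real, full query I have many more filters and it takes up to 4 seconds to execute, which is still too slow.

Then ask about that query. What can we say about a query we can't see? Clearly that other query is not just this one with extra filtering applied after all the work is done, because it is so much faster than this one (by more than a hot cache could explain). It is doing something different, and it has to be optimized differently.

The subquery ORDER BY seems to be the bottleneck here, and I can't figure out a way to optimize it.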

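The time reported for the Sort node includes the time of all the work that precedes it, so the sort itself took about 80329.202 - 73275.304 ≈ 7054 ms, or roughly 7 seconds. That may be long, but it is only a small fraction of the total. (This interpretation is not very obvious from the output itself; it comes from experience.)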
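
The same arithmetic can be written out as a short Python sketch. The figures are copied from the EXPLAIN ANALYSE output in the question. Because "actual time" on a node with loops > 1 is a per-loop average, the SubPlan total is that average times the loop count, and since 0.005 ms is a rounded figure the product is only a rough estimate.

```python
# Figures copied from the EXPLAIN ANALYSE output in the question (milliseconds).
sort_end_ms = 80329.202          # end time of the Sort node
seq_scan_end_ms = 73275.304      # end time of the Seq Scan feeding the sort
subplan_ms_per_loop = 0.005      # Aggregate in SubPlan 1, averaged per loop (rounded)
subplan_loops = 11_114_254       # one loop per row of documents

# Time spent in the sort itself, excluding its input.
sort_only_ms = sort_end_ms - seq_scan_end_ms

# Rough total spent in the correlated subquery across all loops.
subplan_total_ms = subplan_ms_per_loop * subplan_loops

print(f"sort alone:    {sort_only_ms:.0f} ms")
print(f"subplan total: {subplan_total_ms:.0f} ms")
```

On these numbers, most of the 80 seconds goes into running the subquery once per document row, not into sorting.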

For the query you did show us, you can make it very fast by using a rather convoluted construction, but it is only probabilistically correct.

-- Returns the 20 documents with the most recent registrations (date desc).
with t as (
    -- the 200 most recent registrations; 200 is a guess
    select date, document_id from registrations
    order by date desc, document_id desc limit 200),
t2 as (
    -- keep only the latest registration per document
    select distinct on (document_id) document_id, date from t
    order by document_id, date desc),
t3 as (
    select document_id, date from t2 order by date desc limit 20)
SELECT documents.*,
   t3.date as register_date
FROM documents join t3 on t3.document_id = documents.id
order by register_date;

It will be supported efficiently by:

create index on registrations (date, document_id);
create index on documents(id);

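The idea here is that the 200 most recent registrations will contain at least 20 distinct document_id values. Of course there is no way to know for sure that this is true, so you may have to increase 200 to 20000 (which should still be very fast compared to what you are currently doing), or even more, to be reasonably sure of getting the right answer. This also assumes that every distinct document_id matches exactly one documents.id.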
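
One way to take the guesswork out of the window size is to check the result and widen the window when it holds too few distinct documents. Here is a minimal sketch of that loop in Python with sqlite3 standing in for PostgreSQL. The function name top_recent_documents, the factor of 10, and the toy data are all made up for illustration, and GROUP BY replaces DISTINCT ON, which SQLite does not have.

```python
import sqlite3

def top_recent_documents(conn, want=20, window=200):
    """Return up to `want` (document_id, latest_date) pairs, most recent first."""
    total = conn.execute("SELECT count(*) FROM registrations").fetchone()[0]
    while True:
        rows = conn.execute(
            """
            SELECT document_id, max(date) AS register_date
            FROM (SELECT document_id, date
                  FROM registrations
                  ORDER BY date DESC, document_id DESC
                  LIMIT ?) AS recent
            GROUP BY document_id
            ORDER BY register_date DESC, document_id DESC
            LIMIT ?
            """,
            (window, want),
        ).fetchall()
        # Stop once the window holds enough distinct documents,
        # or once it already covers the whole table.
        if len(rows) >= want or window >= total:
            return rows
        window *= 10  # widen the window and try again

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (document_id INTEGER, date INTEGER)")
# Hypothetical toy data: 50 documents with 30 registrations each;
# plain integers stand in for dates.
conn.executemany(
    "INSERT INTO registrations VALUES (?, ?)",
    [(doc, doc * 100 + n) for doc in range(1, 51) for n in range(30)],
)

# Start with a deliberately small window so the loop has to widen it.
result = top_recent_documents(conn, want=20, window=25)
print(result[:3])
```

Once the window contains at least 20 distinct documents, any document outside it can only have older registrations, so the answer is exact apart from ties on date.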

Ordering by a subquery result is too slow
