Question

I have a small problem with a query.

SELECT DISTINCT ON ("reporting_processedamazonsnapshot"."offer_id") *
FROM "reporting_processedamazonsnapshot" INNER JOIN 
     "offers_boooffer"
        ON ("reporting_processedamazonsnapshot"."offer_id" =
            "offers_boooffer"."id") INNER JOIN
     "offers_offersettings"
        ON ("offers_boooffer"."id" = "offers_offersettings"."offer_id")
WHERE "offers_offersettings"."account_id" = 20
ORDER BY "reporting_processedamazonsnapshot"."offer_id" ASC,
         "reporting_processedamazonsnapshot"."scraping_date" DESC

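I have an index named latest_scraping on (offer_id ASC, scraping_date DESC), but for some reason PostgreSQL still sorts the rows after using the index, which causes a huge performance problem.

I don't understand why it doesn't use the already sorted data instead of sorting again. Is my index wrong? Or should I write the query differently?

Here is the EXPLAIN output with actual timings: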

'Unique  (cost=21260.47..21263.06 rows=519 width=1288) (actual time=38053.685..38177.348 rows=1783 loops=1)'
'  ->  Sort  (cost=21260.47..21261.76 rows=519 width=1288) (actual time=38053.683..38161.478 rows=153095 loops=1)'
'        Sort Key: reporting_processedamazonsnapshot.offer_id, reporting_processedamazonsnapshot.scraping_date DESC'
'        Sort Method: external merge  Disk: 162088kB'
'        ->  Nested Loop  (cost=41.90..21237.06 rows=519 width=1288) (actual time=70.874..36148.348 rows=153095 loops=1)'
'              ->  Nested Loop  (cost=41.47..17547.90 rows=1627 width=8) (actual time=54.287..126.740 rows=1784 loops=1)'
'                    ->  Bitmap Heap Scan on offers_offersettings  (cost=41.04..4823.48 rows=1627 width=4) (actual time=52.532..84.102 rows=1784 loops=1)'
'                          Recheck Cond: (account_id = 20)'
'                          Heap Blocks: exact=38'
'                          ->  Bitmap Index Scan on offers_offersettings_account_id_fff7a8c0  (cost=0.00..40.63 rows=1627 width=0) (actual time=49.886..49.886 rows=4132 loops=1)'
'                                Index Cond: (account_id = 20)'
'                    ->  Index Only Scan using offers_boooffer_pkey on offers_boooffer  (cost=0.43..7.81 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1784)'
'                          Index Cond: (id = offers_offersettings.offer_id)'
'                          Heap Fetches: 1784'
'              ->  Index Scan using latest_scraping on reporting_processedamazonsnapshot  (cost=0.43..1.69 rows=58 width=1288) (actual time=0.526..20.146 rows=86 loops=1784)'
'                    Index Cond: (offer_id = offers_boooffer.id)'
'Planning time: 187.133 ms'
'Execution time: 38195.266 ms'

Answer 1

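To use the index to avoid the sort, PostgreSQL would first have to scan all of "reporting_processedamazonsnapshot" in index order, then join all of "offers_boooffer" using a nested loop join (which preserves the order), and then join all of "offers_offersettings", again using a nested loop join.

Finally, all rows that do not satisfy the condition "offers_offersettings"."account_id" = 20 would be discarded.

PostgreSQL thinks (correctly, in my opinion) that it is more efficient to use the condition to reduce the number of rows as much as possible, then join the tables with the most efficient join method, and only then sort for the DISTINCT ON clause.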

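I wonder whether a rewritten query, still filtering on "offers_offersettings"."account_id" = 20, would be faster.

The execution plan would be similar, except that fewer rows would be scanned from the index, which would cut the execution time where it matters most.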
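
One way to read fewer index rows per offer is a LATERAL subquery that fetches only the newest snapshot for each offer. A minimal sketch, assuming PostgreSQL 9.3 or later and the latest_scraping index from the question; the aliases ofs, bo, r and q are made up for illustration, and this is not necessarily the exact rewrite the answer had in mind:

```sql
-- Sketch only, untested against the asker's schema.
-- Relies on the latest_scraping index (offer_id ASC, scraping_date DESC)
-- so that each LATERAL probe can stop after the first index entry.
SELECT DISTINCT ON (q.offer_id) *
FROM offers_offersettings AS ofs
   JOIN offers_boooffer AS bo
      ON bo.id = ofs.offer_id
   CROSS JOIN LATERAL (
      -- Newest snapshot for this one offer
      SELECT *
      FROM reporting_processedamazonsnapshot AS r
      WHERE r.offer_id = bo.id
      ORDER BY r.scraping_date DESC
      LIMIT 1
   ) AS q
WHERE ofs.account_id = 20
ORDER BY q.offer_id ASC, q.scraping_date DESC;
```

The outer DISTINCT ON is kept so the result still has one row per offer even if several settings rows match the same offer. The final sort then handles roughly one row per offer instead of the 153095 rows shown in the plan.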

If you want to speed up the sort, increase work_mem to about 500MB for that query (if you can afford it).
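
The sort in the plan spills to disk (Sort Method: external merge, Disk: 162088kB), and the memory available to a sort is governed by the work_mem setting. A sketch of raising it for a single transaction only, assuming the server has enough free RAM for a 500MB sort:

```sql
BEGIN;
-- SET LOCAL reverts automatically at the end of this transaction,
-- so other queries and sessions keep the default work_mem.
SET LOCAL work_mem = '500MB';
-- ... run the DISTINCT ON query from the question here ...
COMMIT;
```

Scoping the change to one transaction avoids giving every sort in every session the larger allowance.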

PostgreSQL query optimization with DISTINCT + ORDER BY
