Postgres: why does a LEFT JOIN affect the query plan?

Asked: 2017-11-22 10:31:59

Tags: sql postgresql sql-execution-plan

I have PostgreSQL 9.5.9 and two tables, table1 and table2:

   Column   |              Type              |                 Modifiers                 
------------+--------------------------------+-------------------------------------------
 id         | integer                        | not null
 status     | integer                        | not null
 table2_id  | integer                        | 
 start_date | timestamp(0) without time zone | default NULL::timestamp without time zone
Indexes:
    "table1_pkey" PRIMARY KEY, btree (id)
    "table1_start_date" btree (start_date)
    "table1_table2" btree (table2_id)
Foreign-key constraints:
    "fk_t1_t2" FOREIGN KEY (table2_id) REFERENCES table2(id)


 Column |          Type           |            Modifiers            
--------+-------------------------+---------------------------------
 id     | integer                 | not null
 name   | character varying(2000) | default NULL::character varying
Indexes:
    "table2_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "table1" CONSTRAINT "fk_t1_t2" FOREIGN KEY (table2_id) REFERENCES table2(id)

table2 contains only 3 rows; table1 contains about 400,000 rows, of which only about half have a value in the table2_id column.

When I select rows from table1 ordered by the start_date column, the query is fast enough, because the table1_start_date index is used effectively:

SELECT t1.*
FROM table1 AS t1 
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;

EXPLAIN ANALYZE result:

 Limit  (cost=7797.40..7798.70 rows=25 width=20) (actual time=40.994..41.006 rows=25 loops=1)
   ->  Index Scan Backward using table1_start_date on table1 t1  (cost=0.42..20439.74 rows=393216 width=20) (actual time=0.078..36.251 rows=150025 loops=1)
 Planning time: 0.097 ms
 Execution time: 41.033 ms

But when I add a LEFT JOIN to fetch values from table2, the query becomes much slower:

SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;

EXPLAIN ANALYZE result:

 Limit  (cost=33690.80..33696.42 rows=25 width=540) (actual time=191.282..191.320 rows=25 loops=1)
   ->  Nested Loop Left Join  (cost=0.57..88317.50 rows=393216 width=540) (actual time=0.028..184.537 rows=150025 loops=1)
         ->  Index Scan Backward using table1_start_date on table1 t1  (cost=0.42..20439.74 rows=393216 width=20) (actual time=0.018..35.196 rows=150025 loops=1)
         ->  Index Scan using table2_pkey on table2 t2  (cost=0.14..0.16 rows=1 width=520) (actual time=0.000..0.001 rows=1 loops=150025)
               Index Cond: (id = t1.table2_id)
 Planning time: 0.210 ms
 Execution time: 191.357 ms

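Why does the query time increase from about 41 ms to 191 ms? As I understand it, the LEFT JOIN does not affect which rows of table1 are returned, because table2.id is a primary key and each table1 row matches at most one table2 row. So we could first select the 25 rows from table1 (LIMIT 25) and only then join the rows from table2. The execution time should not increase significantly. There are no tricky conditions that would prevent the indexes from being used.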

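I don't fully understand the EXPLAIN ANALYZE output for the second query, but it looks as if the Postgres planner decided to "join, then filter" rather than "filter, then join". That makes the query too slow. What is the problem?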

1 answer:

Answer 0: (score: 2)

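It simply doesn't know that the limit should apply to table1 rather than to the result of the join, so it fetches the minimum number of rows required, which is 150025, and then performs 150025 loops on table2. If you do a subselect on table1 with the limit and join table2 to that subselect, you should get what you want: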

SELECT t1.*, t2.*
FROM (SELECT *
        FROM table1
       ORDER BY start_date DESC
       LIMIT 25 OFFSET 150000) AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id;
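
If you want to compare the two plans yourself, here is a minimal reproduction sketch. It rebuilds the tables from the question as temporary tables and fills them with made-up data. The generated values (the status of 0, the names 'a' to 'c', one row per minute starting at 2017-01-01, every second row left without a table2_id) are assumptions for illustration only, not the asker's data, so the costs and timings you see will differ from those above.

```sql
-- Reproduction sketch with synthetic data (assumed distribution, not the real data).
CREATE TEMP TABLE table2 (
    id   integer PRIMARY KEY,
    name varchar(2000) DEFAULT NULL
);

CREATE TEMP TABLE table1 (
    id         integer PRIMARY KEY,
    status     integer NOT NULL,
    table2_id  integer REFERENCES table2 (id),
    start_date timestamp(0) DEFAULT NULL
);

-- table2 has only 3 rows, as in the question.
INSERT INTO table2 (id, name)
VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- About 400,000 rows; odd-numbered rows keep table2_id NULL.
INSERT INTO table1 (id, status, table2_id, start_date)
SELECT g,
       0,
       CASE WHEN g % 2 = 0 THEN (g % 3) + 1 END,
       timestamp '2017-01-01' + g * interval '1 minute'
FROM generate_series(1, 400000) AS g;

CREATE INDEX table1_start_date ON table1 (start_date);
CREATE INDEX table1_table2 ON table1 (table2_id);

-- Refresh planner statistics so the estimates reflect the new data.
ANALYZE table1;
ANALYZE table2;

-- Plan 1: join first, then LIMIT/OFFSET over the joined rows.
EXPLAIN ANALYZE
SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;

-- Plan 2: LIMIT/OFFSET inside the subselect, then join only the 25 rows.
EXPLAIN ANALYZE
SELECT t1.*, t2.*
FROM (SELECT *
        FROM table1
       ORDER BY start_date DESC
       LIMIT 25 OFFSET 150000) AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id;
```

The thing to look at is the loops count on the table2 node in each plan. Note that the outer query of the subselect form has no ORDER BY of its own, so if the final row order matters, add ORDER BY t1.start_date DESC to the outer query as well.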