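Manually splitting a query into two cuts execution time by 90%. How can I force the query optimizer to do the right thing?

Time: 2016-08-22 21:55:53

Tags: sql postgresql postgresql-9.4

I have two queries that return exactly the same results.

Method 1: do everything in a single query (takes 235 seconds)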

with stage1 as(
   ... BLAH BLAH BLAH...
)
SELECT ... FROM stage1 ....

Method 2: save the intermediate data in mytemptable (about 18 seconds in total)

-- query 2a
CREATE TABLE mytemptable AS 
...BLAH BLAH BLAH...                   -- takes about 2 seconds

-- query 2b
SELECT ... FROM mytemptable ...        -- takes about 16 seconds

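Method 2 appears to be WAY faster because it uses a nested loop with a seq scan and an index scan instead of a merge join. Is there a sane way to force the faster query plan for query 1? (I don't need to keep the temp table...)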
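Since the intermediate table is not needed afterwards, Method 2 could also be run inside one transaction with a temporary table that is dropped at commit. The following is a minimal sketch, using the stage-1 query shown in full further down. The explicit ANALYZE is an addition that is not part of the original queries; it is included on the assumption that the planner needs statistics for mytemptable, which autovacuum does not collect for temporary tables.

```sql
BEGIN;

-- Stage 1: materialize the intermediate rows.
-- ON COMMIT DROP removes the temporary table when the transaction ends.
CREATE TEMP TABLE mytemptable ON COMMIT DROP AS
SELECT DISTINCT permno, d.date, d.date/100 AS yyyymm
FROM eb2.msk_bt_datesandstuff
JOIN mycrsp.trading_dates d
  ON (rdq_i <= d.i AND d.i <= rdq_i + 2)
  OR (fdate_i <= d.i AND d.i <= fdate_i + 2)
WHERE repm_shares_pp > 0;

-- Assumption (not in the original): gather statistics so the planner
-- sees the real row count of the intermediate table.
ANALYZE mytemptable;

-- Stage 2: join the intermediate rows against dsf and aggregate.
SELECT yyyymm, AVG(ret)
FROM mytemptable t
JOIN q_stock.dsf dsf ON dsf.permno = t.permno AND dsf.date = t.date
GROUP BY yyyymm
ORDER BY yyyymm;

COMMIT;
```

This still runs two statements, so it is a tidier form of Method 2 rather than a way to change the plan chosen for Method 1.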

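- Update 1 -

The right plan here is a nested loop; the merge join is extremely slow.

By adding set local enable_mergejoin=false; I can get sane behavior from a single query, but I'd like to know whether there is a better way.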
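For concreteness, the workaround might look like the sketch below, which wraps the Method 1 query (given in full further down) in a transaction. SET LOCAL only lasts until the end of the current transaction, so it has to be issued inside an explicit BEGIN ... COMMIT block; issued outside a transaction block it has no effect on later statements.

```sql
BEGIN;

-- Discourage merge joins for the rest of this transaction only;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_mergejoin = off;

WITH stage1 AS (
    SELECT DISTINCT permno, d.date, d.date/100 AS yyyymm
    FROM eb2.msk_bt_datesandstuff
    JOIN mycrsp.trading_dates d
      ON (rdq_i <= d.i AND d.i <= rdq_i + 2)
      OR (fdate_i <= d.i AND d.i <= fdate_i + 2)
    WHERE repm_shares_pp > 0
)
SELECT yyyymm, AVG(ret)
FROM stage1 t
JOIN q_stock.dsf dsf ON dsf.permno = t.permno AND dsf.date = t.date
GROUP BY yyyymm
ORDER BY yyyymm;

COMMIT;
```

Note that enable_mergejoin does not forbid merge joins outright; it makes the planner treat them as very expensive, so another join method is chosen whenever one is available.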

Full source of the original query:

Method 1

EXPLAIN analyze
with stage1 as(
SELECT DISTINCT permno, d.date, d.date/100 as yyyymm
    FROM eb2.msk_bt_datesandstuff
    JOIN mycrsp.trading_dates d ON (rdq_i <= d.i and d.i <= rdq_i + 2) OR (fdate_i <= d.i and d.i <= fdate_i + 2)
    WHERE repm_shares_pp > 0
)

SELECT yyyymm, AVG(ret)
FROM stage1 t
JOIN q_stock.dsf dsf ON dsf.permno = t.permno AND dsf.date = t.date
GROUP BY yyyymm
ORDER BY yyyymm

Note that q_stock.dsf has a primary key on (permno, date), and I have run analyze q_stock.dsf;

EXPLAIN ANALYZE output for query 1

Sort  (cost=66540412.41..66540412.91 rows=200 width=12) (actual time=235793.281..235793.310 rows=135 loops=1)
  Sort Key: t.yyyymm
  Sort Method: quicksort  Memory: 31kB
  CTE stage1
    ->  Unique  (cost=45289896.52..47173580.31 rows=39876540 width=12) (actual time=1612.561..1914.188 rows=163223 loops=1)
          ->  Sort  (cost=45289896.52..45760817.47 rows=188368379 width=12) (actual time=1612.559..1817.441 rows=349028 loops=1)
                Sort Key: msk_bt_datesandstuff.permno, d.date, ((d.date / 100))
                Sort Method: external merge  Disk: 8856kB
                ->  Nested Loop  (cost=31.39..9742058.04 rows=188368379 width=12) (actual time=3.102..894.148 rows=349028 loops=1)
                      ->  Seq Scan on msk_bt_datesandstuff  (cost=0.00..2080.69 rows=67809 width=16) (actual time=2.179..50.339 rows=67815 loops=1)
                            Filter: (repm_shares_pp > 0::double precision)
                      ->  Bitmap Heap Scan on trading_dates d  (cost=31.39..108.91 rows=2778 width=8) (actual time=0.008..0.009 rows=5 loops=67815)
                            Recheck Cond: (((msk_bt_datesandstuff.rdq_i <= i) AND (i <= (msk_bt_datesandstuff.rdq_i + 2))) OR ((msk_bt_datesandstuff.fdate_i <= i) AND (i <= (msk_bt_datesandstuff.fdate_i + 2))))
                            Heap Blocks: exact=72014
                            ->  BitmapOr  (cost=31.39..31.39 rows=2941 width=0) (actual time=0.007..0.007 rows=0 loops=67815)
                                  ->  Bitmap Index Scan on trading_dates_i_idx  (cost=0.00..15.00 rows=1471 width=0) (actual time=0.003..0.003 rows=3 loops=67815)
                                        Index Cond: ((msk_bt_datesandstuff.rdq_i <= i) AND (i <= (msk_bt_datesandstuff.rdq_i + 2)))
                                  ->  Bitmap Index Scan on trading_dates_i_idx  (cost=0.00..15.00 rows=1471 width=0) (actual time=0.003..0.003 rows=3 loops=67815)
                                        Index Cond: ((msk_bt_datesandstuff.fdate_i <= i) AND (i <= (msk_bt_datesandstuff.fdate_i + 2)))
  ->  HashAggregate  (cost=19366821.95..19366824.45 rows=200 width=12) (actual time=235793.000..235793.120 rows=135 loops=1)
        Group Key: t.yyyymm
        ->  Merge Join  (cost=17586928.66..19060390.61 rows=61286269 width=12) (actual time=171508.569..235548.161 rows=163180 loops=1)
              Merge Cond: ((t.permno = dsf.permno) AND (t.date = dsf.date))
              ->  Sort  (cost=7194721.95..7294413.30 rows=39876540 width=16) (actual time=2206.763..2333.899 rows=163223 loops=1)
                    Sort Key: t.permno, t.date
                    Sort Method: external sort  Disk: 4784kB
                    ->  CTE Scan on stage1 t  (cost=0.00..797530.80 rows=39876540 width=16) (actual time=1612.565..1996.972 rows=163223 loops=1)
              ->  Materialize  (cost=10392206.71..10595963.09 rows=40751276 width=24) (actual time=169281.705..211131.879 rows=28454152 loops=1)
                    ->  Sort  (cost=10392206.71..10494084.90 rows=40751276 width=24) (actual time=169281.700..196431.128 rows=28454152 loops=1)
                          Sort Key: dsf.permno, dsf.date
                          Sort Method: external merge  Disk: 942416kB
                          ->  Seq Scan on dsf  (cost=0.00..2734006.76 rows=40751276 width=24) (actual time=2.293..73874.003 rows=28460488 loops=1)
Planning time: 0.597 ms
Execution time: 235941.636 ms

EXPLAIN ANALYZE output for query 2b (the intermediate table appears as deleteme2 in this plan)

Sort  (cost=1558249.38..1558249.72 rows=135 width=12) (actual time=16634.455..16634.464 rows=135 loops=1)
  Sort Key: t.yyyymm
  Sort Method: quicksort  Memory: 31kB
  ->  HashAggregate  (cost=1558242.92..1558244.61 rows=135 width=12) (actual time=16634.322..16634.357 rows=135 loops=1)
        Group Key: t.yyyymm
        ->  Nested Loop  (cost=0.56..1556982.65 rows=252054 width=12) (actual time=0.348..16425.602 rows=163180 loops=1)
              ->  Seq Scan on deleteme2 t  (cost=0.00..2515.23 rows=163223 width=16) (actual time=0.021..60.422 rows=163223 loops=1)
              ->  Index Scan using dsf_pkey on dsf  (cost=0.56..9.50 rows=2 width=24) (actual time=0.096..0.098 rows=1 loops=163223)
                    Index Cond: ((permno = t.permno) AND (date = t.date))
Planning time: 3.336 ms
Execution time: 16634.577 ms

0 answers