I have two queries that return exactly the same results.
Method 1: do everything in a single query (takes 235 seconds)
with stage1 as(
... BLAH BLAH BLAH...
)
SELECT ... FROM stage1 ....
Method 2: save the intermediate data in mytemptable (about 18 seconds in total)
-- query 2a
CREATE TABLE mytemptable AS
...BLAH BLAH BLAH... -- takes about 2 seconds
-- query 2b
SELECT ... FROM mytemptable ... -- takes about 16 seconds
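Method 2 is WAY faster, apparently because its plan uses a nested loop with seq scans and index scans instead of a merge join. Is there a sane way to force the faster query plan for query 1? (I don't need to keep the temp table...)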
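Because the intermediate table does not need to be kept, method 2 can also run inside one transaction using a temporary table. The following is a minimal sketch, assuming PostgreSQL and reusing the stage1 query from Update 1. The name stage1_tmp is hypothetical. ON COMMIT DROP removes the table when the transaction ends. The explicit ANALYZE is there because autovacuum never processes temporary tables, so without it the planner has no statistics for the join columns.

```sql
BEGIN;

-- Hypothetical temporary table: visible only to this session and
-- dropped automatically when the transaction commits.
CREATE TEMP TABLE stage1_tmp ON COMMIT DROP AS
SELECT DISTINCT permno, d.date, d.date/100 AS yyyymm
FROM eb2.msk_bt_datesandstuff
JOIN mycrsp.trading_dates d
  ON (rdq_i <= d.i AND d.i <= rdq_i + 2)
  OR (fdate_i <= d.i AND d.i <= fdate_i + 2)
WHERE repm_shares_pp > 0;

-- Autovacuum skips temporary tables, so gather statistics explicitly.
ANALYZE stage1_tmp;

SELECT yyyymm, AVG(ret)
FROM stage1_tmp t
JOIN q_stock.dsf dsf ON dsf.permno = t.permno AND dsf.date = t.date
GROUP BY yyyymm
ORDER BY yyyymm;

COMMIT;
```

Whether this reproduces the fast plan depends on the planner seeing a realistic row count for the intermediate result. That count is the main difference between the two plans in Update 1: the CTE was estimated at 39876540 rows but actually produced 163223.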
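- Update 1 -
The nested loop is the right plan here; the merge join is extremely slow.
Adding set local enable_mergejoin=false; gives me sane behavior from a single query, but I'm wondering whether there is a better way.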
EXPLAIN analyze
with stage1 as(
SELECT DISTINCT permno, d.date, d.date/100 as yyyymm
FROM eb2.msk_bt_datesandstuff
JOIN mycrsp.trading_dates d ON (rdq_i <= d.i and d.i <= rdq_i + 2) OR (fdate_i <= d.i and d.i <= fdate_i + 2)
WHERE repm_shares_pp > 0
)
SELECT yyyymm, AVG(ret)
FROM stage1 t
JOIN q_stock.dsf dsf ON dsf.permno = t.permno AND dsf.date = t.date
GROUP BY yyyymm
ORDER BY yyyymm
Note that q_stock.dsf has a primary key on (permno, date), and I have already run analyze q_stock.dsf;

EXPLAIN ANALYZE output for method 1 (single query):
Sort (cost=66540412.41..66540412.91 rows=200 width=12) (actual time=235793.281..235793.310 rows=135 loops=1)
  Sort Key: t.yyyymm
  Sort Method: quicksort Memory: 31kB
  CTE stage1
    ->  Unique (cost=45289896.52..47173580.31 rows=39876540 width=12) (actual time=1612.561..1914.188 rows=163223 loops=1)
          ->  Sort (cost=45289896.52..45760817.47 rows=188368379 width=12) (actual time=1612.559..1817.441 rows=349028 loops=1)
                Sort Key: msk_bt_datesandstuff.permno, d.date, ((d.date / 100))
                Sort Method: external merge Disk: 8856kB
                ->  Nested Loop (cost=31.39..9742058.04 rows=188368379 width=12) (actual time=3.102..894.148 rows=349028 loops=1)
                      ->  Seq Scan on msk_bt_datesandstuff (cost=0.00..2080.69 rows=67809 width=16) (actual time=2.179..50.339 rows=67815 loops=1)
                            Filter: (repm_shares_pp > 0::double precision)
                      ->  Bitmap Heap Scan on trading_dates d (cost=31.39..108.91 rows=2778 width=8) (actual time=0.008..0.009 rows=5 loops=67815)
                            Recheck Cond: (((msk_bt_datesandstuff.rdq_i <= i) AND (i <= (msk_bt_datesandstuff.rdq_i + 2))) OR ((msk_bt_datesandstuff.fdate_i <= i) AND (i <= (msk_bt_datesandstuff.fdate_i + 2))))
                            Heap Blocks: exact=72014
                            ->  BitmapOr (cost=31.39..31.39 rows=2941 width=0) (actual time=0.007..0.007 rows=0 loops=67815)
                                  ->  Bitmap Index Scan on trading_dates_i_idx (cost=0.00..15.00 rows=1471 width=0) (actual time=0.003..0.003 rows=3 loops=67815)
                                        Index Cond: ((msk_bt_datesandstuff.rdq_i <= i) AND (i <= (msk_bt_datesandstuff.rdq_i + 2)))
                                  ->  Bitmap Index Scan on trading_dates_i_idx (cost=0.00..15.00 rows=1471 width=0) (actual time=0.003..0.003 rows=3 loops=67815)
                                        Index Cond: ((msk_bt_datesandstuff.fdate_i <= i) AND (i <= (msk_bt_datesandstuff.fdate_i + 2)))
  ->  HashAggregate (cost=19366821.95..19366824.45 rows=200 width=12) (actual time=235793.000..235793.120 rows=135 loops=1)
        Group Key: t.yyyymm
        ->  Merge Join (cost=17586928.66..19060390.61 rows=61286269 width=12) (actual time=171508.569..235548.161 rows=163180 loops=1)
              Merge Cond: ((t.permno = dsf.permno) AND (t.date = dsf.date))
              ->  Sort (cost=7194721.95..7294413.30 rows=39876540 width=16) (actual time=2206.763..2333.899 rows=163223 loops=1)
                    Sort Key: t.permno, t.date
                    Sort Method: external sort Disk: 4784kB
                    ->  CTE Scan on stage1 t (cost=0.00..797530.80 rows=39876540 width=16) (actual time=1612.565..1996.972 rows=163223 loops=1)
              ->  Materialize (cost=10392206.71..10595963.09 rows=40751276 width=24) (actual time=169281.705..211131.879 rows=28454152 loops=1)
                    ->  Sort (cost=10392206.71..10494084.90 rows=40751276 width=24) (actual time=169281.700..196431.128 rows=28454152 loops=1)
                          Sort Key: dsf.permno, dsf.date
                          Sort Method: external merge Disk: 942416kB
                          ->  Seq Scan on dsf (cost=0.00..2734006.76 rows=40751276 width=24) (actual time=2.293..73874.003 rows=28460488 loops=1)
Planning time: 0.597 ms
Execution time: 235941.636 ms

EXPLAIN ANALYZE output for query 2b (method 2); the saved intermediate table appears as deleteme2:
Sort (cost=1558249.38..1558249.72 rows=135 width=12) (actual time=16634.455..16634.464 rows=135 loops=1)
  Sort Key: t.yyyymm
  Sort Method: quicksort Memory: 31kB
  ->  HashAggregate (cost=1558242.92..1558244.61 rows=135 width=12) (actual time=16634.322..16634.357 rows=135 loops=1)
        Group Key: t.yyyymm
        ->  Nested Loop (cost=0.56..1556982.65 rows=252054 width=12) (actual time=0.348..16425.602 rows=163180 loops=1)
              ->  Seq Scan on deleteme2 t (cost=0.00..2515.23 rows=163223 width=16) (actual time=0.021..60.422 rows=163223 loops=1)
              ->  Index Scan using dsf_pkey on dsf (cost=0.56..9.50 rows=2 width=24) (actual time=0.096..0.098 rows=1 loops=163223)
                    Index Cond: ((permno = t.permno) AND (date = t.date))
Planning time: 3.336 ms
Execution time: 16634.577 ms
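The set local enable_mergejoin=false workaround from Update 1 only lasts until the current transaction ends. Issued outside a transaction block, PostgreSQL emits a warning and otherwise ignores it. Below is a minimal sketch, assuming PostgreSQL, that scopes the setting to this one query. The query itself is the one from Update 1.

```sql
BEGIN;

-- Discourage merge joins for this transaction only; the session's
-- normal setting comes back automatically at COMMIT or ROLLBACK.
SET LOCAL enable_mergejoin = false;

WITH stage1 AS (
    SELECT DISTINCT permno, d.date, d.date/100 AS yyyymm
    FROM eb2.msk_bt_datesandstuff
    JOIN mycrsp.trading_dates d
      ON (rdq_i <= d.i AND d.i <= rdq_i + 2)
      OR (fdate_i <= d.i AND d.i <= fdate_i + 2)
    WHERE repm_shares_pp > 0
)
SELECT yyyymm, AVG(ret)
FROM stage1 t
JOIN q_stock.dsf dsf ON dsf.permno = t.permno AND dsf.date = t.date
GROUP BY yyyymm
ORDER BY yyyymm;

COMMIT;
```

Because the setting reverts when the transaction ends, other queries in the same session keep the default planner behavior.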