How can I determine why Postgres chose a particular query plan?

Time: 2016-03-27 03:58:46

Tags: sql postgresql optimization

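I have a relatively modest query in Postgres that joins two tables, but its performance differs dramatically between my development environment and my production environment.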

Here is the query:

select count(seat_id) as avail, ev.event_name, price_code, 
   (case when substring(section_name, 4, 1) = 'A' then substring(section_name, 1, 3)
         when row_name < '9999' then section_name 
         else section_name || 'C' 
   end) as section_name_full, class_name
from tm_availseats3_exp seats join tm_event_map ev on ev.event_name = seats.event_name
where event_sub_type = 'General'
group by ev.event_name, price_code, section_name_full, class_name, row_name

The data and indexes are identical in both environments. I ran the query with EXPLAIN ANALYZE in both environments and got the following results.

This one is fast:

HashAggregate  (cost=29061.69..29229.88 rows=7475 width=41) (actual time=662.006..682.448 rows=17444 loops=1)
  Group Key: ev.event_name, seats.price_code, CASE WHEN ("substring"((seats.section_name)::text, 4, 1) = 'A'::text) THEN "substring"((seats.section_name)::text, 1, 3) WHEN ((seats.row_name)::text < '9999'::text) THEN (seats.section_name)::text ELSE ((seats.section_name)::text || 'C'::text) END, seats.class_name, seats.row_name
  ->  Nested Loop  (cost=1090.79..28949.57 rows=7475 width=41) (actual time=2.267..488.597 rows=110977 loops=1)
    ->  HashAggregate  (cost=784.42..784.44 rows=1 width=51) (actual time=2.076..2.163 rows=61 loops=1)
          Group Key: ev_1.event_name, ev.event_name, ev_1.event_date, ev.event_name_long, ev.event_time, ev.event_day, CASE WHEN ("substring"((ev.event_name)::text, 1, 4) = 'EUCB'::text) THEN 'General'::text ELSE 'Premium'::text END
          ->  Nested Loop  (cost=558.96..784.41 rows=1 width=51) (actual time=0.997..1.967 rows=61 loops=1)
                ->  HashAggregate  (cost=558.68..558.78 rows=10 width=12) (actual time=0.953..1.021 rows=61 loops=1)
                      Group Key: ev_1.event_name, ev_1.event_date
                      ->  Seq Scan on tm_evnt3 ev_1  (cost=0.00..558.63 rows=10 width=12) (actual time=0.035..0.876 rows=61 loops=1)
                            Filter: ("substring"((event_name)::text, 1, 4) = 'EUCB'::text)
                            Rows Removed by Filter: 1981
                ->  Index Scan using idx_tm_evnt3__event_date on tm_evnt3 ev  (cost=0.28..22.54 rows=1 width=43) (actual time=0.006..0.011 rows=1 loops=61)
                      Index Cond: (event_date = ev_1.event_date)
                      Filter: (("substring"((event_name)::text, 1, 4) <> 'PARK'::text) AND ("substring"((event_name)::text, 1, 5) <> 'PROMO'::text) AND ("substring"((event_name)::text, length((event_name)::text), 1) <> 'P'::text) AND (CASE WHEN ("substring"((event_name)::text, 1, 4) = 'EUCB'::text) THEN 'General'::text ELSE 'Premium'::text END = 'General'::text))
                      Rows Removed by Filter: 5
    ->  Bitmap Heap Scan on tm_availseats3_exp seats  (cost=306.36..27996.93 rows=7475 width=41) (actual time=0.194..2.352 rows=1819 loops=61)
          Recheck Cond: ((event_name)::text = (ev.event_name)::text)
          Heap Blocks: exact=12875
          ->  Bitmap Index Scan on tm_availseats3_exp_on_event  (cost=0.00..304.50 rows=7475 width=0) (actual time=0.168..0.168 rows=1819 loops=61)
                Index Cond: ((event_name)::text = (ev.event_name)::text)
Planning time: 0.498 ms
Execution time: 700.538 ms

This one is really slow:

HashAggregate  (cost=1083030.39..1083267.27 rows=10528 width=41) (actual time=107897.847..107918.705 rows=17444 loops=1)
  Group Key: ev.event_name, seats.price_code, CASE WHEN ("substring"((seats.section_name)::text, 4, 1) = 'A'::text) THEN "substring"((seats.section_name)::text, 1, 3) WHEN ((seats.row_name)::text < '9999'::text) THEN (seats.section_name)::text ELSE ((seats.section_name)::text || 'C'::text) END, seats.class_name, seats.row_name
  ->  Hash Join  (cost=795.21..1082872.47 rows=10528 width=41) (actual time=47773.210..107704.968 rows=110977 loops=1)
    Hash Cond: ((seats.event_name)::text = (ev.event_name)::text)
    ->  Seq Scan on tm_availseats3_exp seats  (cost=0.00..1052862.73 rows=7727373 width=41) (actual time=3352.769..103536.131 rows=3609106 loops=1)
    ->  Hash  (cost=795.20..795.20 rows=1 width=8) (actual time=2.364..2.364 rows=61 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 3kB
          ->  Subquery Scan on ev  (cost=795.18..795.20 rows=1 width=8) (actual time=2.107..2.292 rows=61 loops=1)
                ->  HashAggregate  (cost=795.18..795.19 rows=1 width=51) (actual time=2.104..2.169 rows=61 loops=1)
                      Group Key: ev_2.event_name, ev_1.event_name, ev_2.event_date, ev_1.event_name_long, ev_1.event_time, ev_1.event_day, CASE WHEN ("substring"((ev_1.event_name)::text, 1, 4) = 'EUCB'::text) THEN 'General'::text ELSE 'Premium'::text END
                      ->  Nested Loop  (cost=568.96..795.16 rows=1 width=51) (actual time=0.998..1.987 rows=61 loops=1)
                            ->  HashAggregate  (cost=568.68..568.78 rows=10 width=12) (actual time=0.942..1.018 rows=61 loops=1)
                                  Group Key: ev_2.event_name, ev_2.event_date
                                  ->  Seq Scan on tm_evnt3 ev_2  (cost=0.00..568.63 rows=10 width=12) (actual time=0.039..0.864 rows=61 loops=1)
                                        Filter: ("substring"((event_name)::text, 1, 4) = 'EUCB'::text)
                                        Rows Removed by Filter: 1981
                            ->  Index Scan using idx_tm_evnt3__event_date on tm_evnt3 ev_1  (cost=0.28..22.62 rows=1 width=43) (actual time=0.006..0.011 rows=1 loops=61)
                                  Index Cond: (event_date = ev_2.event_date)
                                  Filter: (("substring"((event_name)::text, 1, 4) <> 'PARK'::text) AND ("substring"((event_name)::text, 1, 5) <> 'PROMO'::text) AND ("substring"((event_name)::text, length((event_name)::text), 1) <> 'P'::text) AND (CASE WHEN ("substring"((event_name)::text, 1, 4) = 'EUCB'::text) THEN 'General'::text ELSE 'Premium'::text END = 'General'::text))
                                  Rows Removed by Filter: 5
Planning time: 0.482 ms
Execution time: 107936.927 ms

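It is clear to me that the problem is that the second execution plan starts with a Seq Scan on the larger of the two tables involved, but I don't understand why the planner did not come up with the same plan in both environments.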

Is the Postgres query planner deterministic? Is there a way to give it hints about which query plan it should use?

1 Answer:

Answer 0: (score: 1)

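As Ildar Musin commented, the right approach is to make sure the statistics are up to date in every database. I had assumed this happened automatically, but in this case it had not.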
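One way to check whether the statistics are stale is to look at when each table was last analyzed. The query below is only a sketch and was not part of the original post. It reads the standard pg_stat_user_tables view. The table names come from the plans above, and treating tm_evnt3 as the table behind tm_event_map is an assumption based on those plans.

```sql
-- Show when each table involved in the query was last vacuumed and analyzed.
-- Assumption: tm_evnt3 is the base table behind tm_event_map (inferred from the plans).
SELECT relname,
       n_live_tup,        -- the statistics collector's estimate of live rows
       last_analyze,      -- last manual ANALYZE
       last_autoanalyze,  -- last ANALYZE run by autovacuum
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('tm_availseats3_exp', 'tm_evnt3');
```

In the slow plan, the Seq Scan on tm_availseats3_exp estimates rows=7727373 while the actual count is rows=3609106, which is consistent with statistics that no longer match the table.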

Running VACUUM ANALYZE brought the performance of the slow query very close to that of the fast one.
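For reference, the fix could look roughly like the following. This is a sketch under assumptions, not the exact commands that were run: the post does not say whether VACUUM ANALYZE was run on individual tables or on the whole database, and tm_evnt3 is assumed to be the table behind tm_event_map.

```sql
-- Refresh planner statistics (and reclaim dead rows) for the tables in the query.
-- VACUUM cannot run inside a transaction block, so run these as separate statements.
VACUUM ANALYZE tm_availseats3_exp;
VACUUM ANALYZE tm_evnt3;  -- assumption: base table behind tm_event_map

-- Re-run the original query with EXPLAIN ANALYZE to see whether the plan changed.
EXPLAIN ANALYZE
select count(seat_id) as avail, ev.event_name, price_code,
   (case when substring(section_name, 4, 1) = 'A' then substring(section_name, 1, 3)
         when row_name < '9999' then section_name
         else section_name || 'C'
    end) as section_name_full, class_name
from tm_availseats3_exp seats join tm_event_map ev on ev.event_name = seats.event_name
where event_sub_type = 'General'
group by ev.event_name, price_code, section_name_full, class_name, row_name;
```

If the estimated row counts in the new plan are close to the actual ones, the planner is working from current statistics again.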