Postgres picks a bad query plan

Asked: 2016-01-20 08:24:18

Tags: performance postgresql

This query runs very slowly on Postgres:

SELECT 
  class_service.name AS "classServiceName", 
  market.name AS "marketName", 
  market_pricing.day_x AS "dayX", 
  station_1.iata AS "odDestination", 
  coalesce(market_pricing.availability, -1) AS "marketAvailability",         
  station_2.iata AS "odOrigin"
FROM market_pricing 
JOIN class_service ON class_service.id = market_pricing.class_service_id 
JOIN market ON market.id = market_pricing.market_id 
JOIN od ON market.id = od.market_id 
JOIN train_stop AS train_stop_1 ON train_stop_1.id = od.stop_destination_id 
JOIN station AS station_1 ON train_stop_1.station_id = station_1.id 
JOIN train_stop AS train_stop_2 ON train_stop_2.id = od.stop_origin_id 
JOIN station AS station_2 ON train_stop_2.station_id = station_2.id 
JOIN train ON train.id = market.train_id
WHERE train.departure_date IN ('2016-01-16') 
AND train.train_number IN (2967)

Basically I'm just joining a bunch of tables, with conditions on one of them. The query returns a small number of rows (~2000) because the conditions are very selective.

When I run EXPLAIN on Postgres, I get this plan:

Hash Join  (cost=29575.77..905867.89 rows=849 width=32)
   Hash Cond: (market.train_id = train.id)
   ->  Hash Join  (cost=29567.45..810779.82 rows=25352335 width=36)
         Hash Cond: (market_pricing.market_id = market.id)
         ->  Hash Join  (cost=1.99..232335.84 rows=6578983 width=14)
               Hash Cond: (market_pricing.class_service_id = class_service.id)
               ->  Seq Scan on market_pricing  (cost=0.00..141872.83 rows=6578983 width=16)
               ->  Hash  (cost=1.44..1.44 rows=44 width=6)
                     ->  Seq Scan on class_service  (cost=0.00..1.44 rows=44 width=6)
         ->  Hash  (cost=27373.77..27373.77 rows=107895 width=34)
               ->  Hash Join  (cost=12462.88..27373.77 rows=107895 width=34)
                     Hash Cond: (train_stop_2.station_id = station_2.id)
                     ->  Hash Join  (cost=12459.97..25887.30 rows=107895 width=34)
                           Hash Cond: (train_stop_1.station_id = station_1.id)
                           ->  Hash Join  (cost=12457.06..24400.84 rows=107895 width=34)
                                 Hash Cond: (od.market_id = market.id)
                                 ->  Hash Join  (cost=11596.08..21228.71 rows=109529 width=12)
                                       Hash Cond: (od.stop_origin_id = train_stop_2.id)
                                       ->  Hash Join  (cost=5798.04..11642.00 rows=109529 width=12)
                                             Hash Cond: (od.stop_destination_id = train_stop_1.id)
                                             ->  Seq Scan on od  (cost=0.00..2055.29 rows=109529 width=12)
                                             ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8)
                                                   ->  Seq Scan on train_stop train_stop_1  (cost=0.00..3005.24 rows=170224 width=8)
                                       ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8)
                                             ->  Seq Scan on train_stop train_stop_2  (cost=0.00..3005.24 rows=170224 width=8)
                                 ->  Hash  (cost=510.99..510.99 rows=27999 width=22)
                                       ->  Seq Scan on market  (cost=0.00..510.99 rows=27999 width=22)
                           ->  Hash  (cost=1.85..1.85 rows=85 width=8)
                                 ->  Seq Scan on station station_1  (cost=0.00..1.85 rows=85 width=8)
                     ->  Hash  (cost=1.85..1.85 rows=85 width=8)
                           ->  Seq Scan on station station_2  (cost=0.00..1.85 rows=85 width=8)
   ->  Hash  (cost=8.31..8.31 rows=1 width=4)
         ->  Index Scan using train_unique on train  (cost=0.29..8.31 rows=1 width=4)
               Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))

I'm no expert in query planning, but I think the expensive part is that Postgres hashes an entire table (~2 million rows) just to join it against a single row on the other side, when a nested loop would be much faster in this case. The statistics used in the query plan are quite accurate. What is the reason for this behavior?

EDIT

EXPLAIN ANALYZE

Hash Join  (cost=29575.77..905867.89 rows=849 width=32) (actual time=919.433..20674.305 rows=2028 loops=1)
   Hash Cond: (market.train_id = train.id)
   ->  Hash Join  (cost=29567.45..810779.82 rows=25352335 width=36) (actual time=861.335..17606.129 rows=24711872 loops=1)
         Hash Cond: (market_pricing.market_id = market.id)
         ->  Hash Join  (cost=1.99..232335.84 rows=6578983 width=14) (actual time=0.085..5699.519 rows=6845943 loops=1)
               Hash Cond: (market_pricing.class_service_id = class_service.id)
               ->  Seq Scan on market_pricing  (cost=0.00..141872.83 rows=6578983 width=16) (actual time=0.020..2463.255 rows=6845943 loops=1)
               ->  Hash  (cost=1.44..1.44 rows=44 width=6) (actual time=0.045..0.045 rows=44 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 2kB
                     ->  Seq Scan on class_service  (cost=0.00..1.44 rows=44 width=6) (actual time=0.016..0.032 rows=44 loops=1)
         ->  Hash  (cost=27373.77..27373.77 rows=107895 width=34) (actual time=861.166..861.166 rows=107132 loops=1)
               Buckets: 8192  Batches: 2  Memory Usage: 3549kB
               ->  Hash Join  (cost=12462.88..27373.77 rows=107895 width=34) (actual time=217.318..814.250 rows=107132 loops=1)
                     Hash Cond: (train_stop_2.station_id = station_2.id)
                     ->  Hash Join  (cost=12459.97..25887.30 rows=107895 width=34) (actual time=217.237..776.679 rows=107132 loops=1)
                           Hash Cond: (train_stop_1.station_id = station_1.id)
                           ->  Hash Join  (cost=12457.06..24400.84 rows=107895 width=34) (actual time=217.162..739.602 rows=107132 loops=1)
                                 Hash Cond: (od.market_id = market.id)
                                 ->  Hash Join  (cost=11596.08..21228.71 rows=109529 width=12) (actual time=188.590..578.450 rows=107132 loops=1)
                                       Hash Cond: (od.stop_origin_id = train_stop_2.id)
                                       ->  Hash Join  (cost=5798.04..11642.00 rows=109529 width=12) (actual time=106.059..312.845 rows=107132 loops=1)
                                             Hash Cond: (od.stop_destination_id = train_stop_1.id)
                                             ->  Seq Scan on od  (cost=0.00..2055.29 rows=109529 width=12) (actual time=0.006..41.699 rows=107132 loops=1)
                                             ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8) (actual time=105.850..105.850 rows=171096 loops=1)
                                                   Buckets: 16384  Batches: 2  Memory Usage: 3357kB
                                                   ->  Seq Scan on train_stop train_stop_1  (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.005..45.071 rows=171096 loops=1)
                                       ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8) (actual time=82.340..82.340 rows=171096 loops=1)
                                             Buckets: 16384  Batches: 2  Memory Usage: 3357kB
                                             ->  Seq Scan on train_stop train_stop_2  (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.007..37.142 rows=171096 loops=1)
                                 ->  Hash  (cost=510.99..510.99 rows=27999 width=22) (actual time=28.538..28.538 rows=29839 loops=1)
                                       Buckets: 4096  Batches: 1  Memory Usage: 1606kB
                                       ->  Seq Scan on market  (cost=0.00..510.99 rows=27999 width=22) (actual time=0.004..16.594 rows=29839 loops=1)
                           ->  Hash  (cost=1.85..1.85 rows=85 width=8) (actual time=0.054..0.054 rows=85 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 4kB
                                 ->  Seq Scan on station station_1  (cost=0.00..1.85 rows=85 width=8) (actual time=0.003..0.026 rows=85 loops=1)
                     ->  Hash  (cost=1.85..1.85 rows=85 width=8) (actual time=0.063..0.063 rows=85 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 4kB
                           ->  Seq Scan on station station_2  (cost=0.00..1.85 rows=85 width=8) (actual time=0.006..0.032 rows=85 loops=1)
   ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.094..0.094 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Index Scan using train_unique on train  (cost=0.29..8.31 rows=1 width=4) (actual time=0.087..0.090 rows=1 loops=1)
               Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
 Planning time: 12.338 ms
 Execution time: 20676.057 ms

EDIT 2

I noticed that changing the join order fixes it, but I don't understand why. I thought Postgres reordered joins internally to pick the best order.
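For reference, the reordering idea is to put the highly selective `train` filter at the front of the FROM list so the planner starts from one row instead of from `market_pricing`. This is only a sketch of the principle, not necessarily the exact rewrite the author used:

```sql
-- Sketch: lead with the selective table so that, even if the planner
-- keeps the written join order, the one matching train row is joined
-- first and every later join stays small (~2000 rows at most).
SELECT
  class_service.name AS "classServiceName",
  market.name        AS "marketName",
  market_pricing.day_x AS "dayX",
  station_1.iata     AS "odDestination",
  coalesce(market_pricing.availability, -1) AS "marketAvailability",
  station_2.iata     AS "odOrigin"
FROM train
JOIN market         ON market.id = train.id  -- careful: real condition is market.train_id = train.id
JOIN market_pricing ON market_pricing.market_id = market.id
JOIN class_service  ON class_service.id = market_pricing.class_service_id
JOIN od             ON od.market_id = market.id
JOIN train_stop AS train_stop_1 ON train_stop_1.id = od.stop_destination_id
JOIN station    AS station_1    ON station_1.id = train_stop_1.station_id
JOIN train_stop AS train_stop_2 ON train_stop_2.id = od.stop_origin_id
JOIN station    AS station_2    ON station_2.id = train_stop_2.station_id
WHERE train.departure_date = '2016-01-16'
  AND train.train_number = 2967;
```

(The join condition on `market` should of course read `market.train_id = train.id`, as in the original query; only the ordering of the FROM clause changes.)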

1 Answer:

Answer 0 (score: 0)

I finally figured it out. I had to raise the join_collapse_limit parameter so that Postgres was able to reorder the joins.
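The mechanics behind this: `join_collapse_limit` defaults to 8, and this query joins 9 relations (market_pricing, class_service, market, od, two train_stop aliases, two station aliases, and train). Once the FROM list exceeds the limit, the planner no longer searches all join orders and largely follows the order in which the joins are written, which is why `train` ended up joined last. A minimal fix, assuming a session-level setting is acceptable:

```sql
-- Default is 8; this query has 9 relations, so any value >= 9
-- lets the planner consider reordering the full join tree again.
SET join_collapse_limit = 12;

-- Then re-run EXPLAIN ANALYZE on the original query; the plan
-- should now start from the single matching train row.
```

Note that raising the limit increases planning time (the search space grows combinatorially with the number of relations), so very large values are usually not advisable.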