This query runs very slowly on postgres:
SELECT
    class_service.name AS "classServiceName",
    market.name AS "marketName",
    market_pricing.day_x AS "dayX",
    station_1.iata AS "odDestination",
    coalesce(market_pricing.availability, -1) AS "marketAvailability",
    station_2.iata AS "odOrigin"
FROM market_pricing
JOIN class_service ON class_service.id = market_pricing.class_service_id
JOIN market ON market.id = market_pricing.market_id
JOIN od ON market.id = od.market_id
JOIN train_stop AS train_stop_1 ON train_stop_1.id = od.stop_destination_id
JOIN station AS station_1 ON train_stop_1.station_id = station_1.id
JOIN train_stop AS train_stop_2 ON train_stop_2.id = od.stop_origin_id
JOIN station AS station_2 ON train_stop_2.station_id = station_2.id
JOIN train ON train.id = market.train_id
WHERE train.departure_date IN ('2016-01-16')
  AND train.train_number IN (2967)
Basically I am just joining a bunch of tables, with conditions on one of them. The query returns a small number of rows (~2000) because the conditions are very selective.
When I run EXPLAIN on postgres, I get this plan:
Hash Join  (cost=29575.77..905867.89 rows=849 width=32)
  Hash Cond: (market.train_id = train.id)
  ->  Hash Join  (cost=29567.45..810779.82 rows=25352335 width=36)
        Hash Cond: (market_pricing.market_id = market.id)
        ->  Hash Join  (cost=1.99..232335.84 rows=6578983 width=14)
              Hash Cond: (market_pricing.class_service_id = class_service.id)
              ->  Seq Scan on market_pricing  (cost=0.00..141872.83 rows=6578983 width=16)
              ->  Hash  (cost=1.44..1.44 rows=44 width=6)
                    ->  Seq Scan on class_service  (cost=0.00..1.44 rows=44 width=6)
        ->  Hash  (cost=27373.77..27373.77 rows=107895 width=34)
              ->  Hash Join  (cost=12462.88..27373.77 rows=107895 width=34)
                    Hash Cond: (train_stop_2.station_id = station_2.id)
                    ->  Hash Join  (cost=12459.97..25887.30 rows=107895 width=34)
                          Hash Cond: (train_stop_1.station_id = station_1.id)
                          ->  Hash Join  (cost=12457.06..24400.84 rows=107895 width=34)
                                Hash Cond: (od.market_id = market.id)
                                ->  Hash Join  (cost=11596.08..21228.71 rows=109529 width=12)
                                      Hash Cond: (od.stop_origin_id = train_stop_2.id)
                                      ->  Hash Join  (cost=5798.04..11642.00 rows=109529 width=12)
                                            Hash Cond: (od.stop_destination_id = train_stop_1.id)
                                            ->  Seq Scan on od  (cost=0.00..2055.29 rows=109529 width=12)
                                            ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8)
                                                  ->  Seq Scan on train_stop train_stop_1  (cost=0.00..3005.24 rows=170224 width=8)
                                      ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8)
                                            ->  Seq Scan on train_stop train_stop_2  (cost=0.00..3005.24 rows=170224 width=8)
                                ->  Hash  (cost=510.99..510.99 rows=27999 width=22)
                                      ->  Seq Scan on market  (cost=0.00..510.99 rows=27999 width=22)
                          ->  Hash  (cost=1.85..1.85 rows=85 width=8)
                                ->  Seq Scan on station station_1  (cost=0.00..1.85 rows=85 width=8)
                    ->  Hash  (cost=1.85..1.85 rows=85 width=8)
                          ->  Seq Scan on station station_2  (cost=0.00..1.85 rows=85 width=8)
  ->  Hash  (cost=8.31..8.31 rows=1 width=4)
        ->  Index Scan using train_unique on train  (cost=0.29..8.31 rows=1 width=4)
              Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
I am no expert in query planning, but I think the expensive part is that postgres hashes an entire table (~2 million rows) just to join a single row against it on the right-hand side, when a nested loop would be much faster in this case. The statistics used in the query plan are quite accurate. What is the reason for this behavior?
Edit
EXPLAIN ANALYZE
Hash Join  (cost=29575.77..905867.89 rows=849 width=32) (actual time=919.433..20674.305 rows=2028 loops=1)
  Hash Cond: (market.train_id = train.id)
  ->  Hash Join  (cost=29567.45..810779.82 rows=25352335 width=36) (actual time=861.335..17606.129 rows=24711872 loops=1)
        Hash Cond: (market_pricing.market_id = market.id)
        ->  Hash Join  (cost=1.99..232335.84 rows=6578983 width=14) (actual time=0.085..5699.519 rows=6845943 loops=1)
              Hash Cond: (market_pricing.class_service_id = class_service.id)
              ->  Seq Scan on market_pricing  (cost=0.00..141872.83 rows=6578983 width=16) (actual time=0.020..2463.255 rows=6845943 loops=1)
              ->  Hash  (cost=1.44..1.44 rows=44 width=6) (actual time=0.045..0.045 rows=44 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 2kB
                    ->  Seq Scan on class_service  (cost=0.00..1.44 rows=44 width=6) (actual time=0.016..0.032 rows=44 loops=1)
        ->  Hash  (cost=27373.77..27373.77 rows=107895 width=34) (actual time=861.166..861.166 rows=107132 loops=1)
              Buckets: 8192  Batches: 2  Memory Usage: 3549kB
              ->  Hash Join  (cost=12462.88..27373.77 rows=107895 width=34) (actual time=217.318..814.250 rows=107132 loops=1)
                    Hash Cond: (train_stop_2.station_id = station_2.id)
                    ->  Hash Join  (cost=12459.97..25887.30 rows=107895 width=34) (actual time=217.237..776.679 rows=107132 loops=1)
                          Hash Cond: (train_stop_1.station_id = station_1.id)
                          ->  Hash Join  (cost=12457.06..24400.84 rows=107895 width=34) (actual time=217.162..739.602 rows=107132 loops=1)
                                Hash Cond: (od.market_id = market.id)
                                ->  Hash Join  (cost=11596.08..21228.71 rows=109529 width=12) (actual time=188.590..578.450 rows=107132 loops=1)
                                      Hash Cond: (od.stop_origin_id = train_stop_2.id)
                                      ->  Hash Join  (cost=5798.04..11642.00 rows=109529 width=12) (actual time=106.059..312.845 rows=107132 loops=1)
                                            Hash Cond: (od.stop_destination_id = train_stop_1.id)
                                            ->  Seq Scan on od  (cost=0.00..2055.29 rows=109529 width=12) (actual time=0.006..41.699 rows=107132 loops=1)
                                            ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8) (actual time=105.850..105.850 rows=171096 loops=1)
                                                  Buckets: 16384  Batches: 2  Memory Usage: 3357kB
                                                  ->  Seq Scan on train_stop train_stop_1  (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.005..45.071 rows=171096 loops=1)
                                      ->  Hash  (cost=3005.24..3005.24 rows=170224 width=8) (actual time=82.340..82.340 rows=171096 loops=1)
                                            Buckets: 16384  Batches: 2  Memory Usage: 3357kB
                                            ->  Seq Scan on train_stop train_stop_2  (cost=0.00..3005.24 rows=170224 width=8) (actual time=0.007..37.142 rows=171096 loops=1)
                                ->  Hash  (cost=510.99..510.99 rows=27999 width=22) (actual time=28.538..28.538 rows=29839 loops=1)
                                      Buckets: 4096  Batches: 1  Memory Usage: 1606kB
                                      ->  Seq Scan on market  (cost=0.00..510.99 rows=27999 width=22) (actual time=0.004..16.594 rows=29839 loops=1)
                          ->  Hash  (cost=1.85..1.85 rows=85 width=8) (actual time=0.054..0.054 rows=85 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 4kB
                                ->  Seq Scan on station station_1  (cost=0.00..1.85 rows=85 width=8) (actual time=0.003..0.026 rows=85 loops=1)
                    ->  Hash  (cost=1.85..1.85 rows=85 width=8) (actual time=0.063..0.063 rows=85 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 4kB
                          ->  Seq Scan on station station_2  (cost=0.00..1.85 rows=85 width=8) (actual time=0.006..0.032 rows=85 loops=1)
  ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.094..0.094 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 1kB
        ->  Index Scan using train_unique on train  (cost=0.29..8.31 rows=1 width=4) (actual time=0.087..0.090 rows=1 loops=1)
              Index Cond: ((departure_date = '2016-01-16'::date) AND (train_number = 2967))
Planning time: 12.338 ms
Execution time: 20676.057 ms
Edit 2
I noticed that changing the join order fixes it. But I don't understand why. I thought postgres internally reorders the joins to pick the best order.
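A sketch of such a reordering (the SELECT list is abbreviated; tables and join conditions are the same as above). The idea is to list the selective train filter first, so that if the planner keeps the written join order, the single matching train row is joined early:

```sql
-- Sketch: start from train so the one row matching the WHERE
-- conditions drives the rest of the joins instead of being
-- joined last against a huge intermediate result.
SELECT class_service.name, market.name, market_pricing.day_x
FROM train
JOIN market ON market.train_id = train.id
JOIN market_pricing ON market_pricing.market_id = market.id
JOIN class_service ON class_service.id = market_pricing.class_service_id
JOIN od ON od.market_id = market.id
JOIN train_stop AS train_stop_1 ON train_stop_1.id = od.stop_destination_id
JOIN station AS station_1 ON train_stop_1.station_id = station_1.id
JOIN train_stop AS train_stop_2 ON train_stop_2.id = od.stop_origin_id
JOIN station AS station_2 ON train_stop_2.station_id = station_2.id
WHERE train.departure_date = '2016-01-16'
  AND train.train_number = 2967;
```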
Answer 0 (score: 0)
I finally figured it out. I had to raise the join_collapse_limit parameter so that postgres would actually reorder the joins. The query joins nine relations, but join_collapse_limit defaults to 8; once a query exceeds that limit, the planner keeps the joins in the order they are written instead of searching for a better one.
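A minimal sketch of the fix, assuming a session-level setting is enough (it can also be set per role or in postgresql.conf):

```sql
-- The query joins 9 relations; join_collapse_limit defaults to 8,
-- so the planner was keeping the written JOIN order. Raising the
-- limit above 9 lets it search all join orders for this query.
SET join_collapse_limit = 12;
SET from_collapse_limit = 12;  -- related limit for flattened subqueries
-- then re-run the query (or EXPLAIN ANALYZE it) in the same session
```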