我在heroku上运行Postgres 9.56的数据库。 我正在使用不同的参数值运行以下SQL,但是在性能上会产生非常不同的结果。
查询1
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
WHERE DATE_PART('year', t.departure)= '2017'
AND t.trip_status = 'BOOKABLE'
AND t.route_id = '278'
AND s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)
查询2
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
WHERE DATE_PART('year', t.departure)= '2017'
AND t.trip_status = 'BOOKABLE'
AND t.route_id = '150'
AND s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)
只有差异才是t.route_id值。
所以,我尝试运行 explain 并给我非常不同的结果。
查询1
"GroupAggregate (cost=279335.17..279335.19 rows=1 width=298)"
" Group Key: (date_trunc('MONTH'::text, t.departure))"
" -> Sort (cost=279335.17..279335.17 rows=1 width=298)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" -> Nested Loop (cost=0.00..279335.16 rows=1 width=298)"
" Join Filter: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_trip t (cost=0.00..5951.88 rows=1 width=12)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '278'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
" -> Seq Scan on tk_seat s (cost=0.00..271738.35 rows=131594 width=298)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
对于查询2
"Sort (cost=278183.94..278183.95 rows=1 width=298)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" -> HashAggregate (cost=278183.92..278183.93 rows=1 width=298)"
" Group Key: date_trunc('MONTH'::text, t.departure)"
" -> Hash Join (cost=5951.97..278183.88 rows=7 width=298)"
" Hash Cond: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_seat s (cost=0.00..271738.35 rows=131594 width=298)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" -> Hash (cost=5951.88..5951.88 rows=7 width=12)"
" -> Seq Scan on tk_trip t (cost=0.00..5951.88 rows=7 width=12)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '150'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
我的问题是为什么以及如何使它变得相同?因为第一个查询给我的表现非常糟糕
查询1分析
"GroupAggregate (cost=274051.28..274051.31 rows=1 width=8) (actual time=904682.606..904684.283 rows=7 loops=1)"
" Group Key: (date_trunc('MONTH'::text, t.departure))"
" -> Sort (cost=274051.28..274051.29 rows=1 width=8) (actual time=904682.432..904682.917 rows=13520 loops=1)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" Sort Method: quicksort Memory: 1018kB"
" -> Nested Loop (cost=0.42..274051.27 rows=1 width=8) (actual time=1133.925..904676.254 rows=13520 loops=1)"
" Join Filter: (s.trip_id = t.trip_id)"
" Rows Removed by Join Filter: 42505528"
" -> Index Scan using tk_trip_route_id_idx on tk_trip t (cost=0.42..651.34 rows=1 width=12) (actual time=0.020..2.720 rows=338 loops=1)"
" Index Cond: (route_id = '278'::bigint)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
" Rows Removed by Filter: 28"
" -> Seq Scan on tk_seat s (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.071..2662.102 rows=125796 loops=338)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" Rows Removed by Filter: 6782294"
"Planning time: 1.172 ms"
"Execution time: 904684.570 ms"
查询2分析
"Sort (cost=275018.88..275018.89 rows=1 width=8) (actual time=2153.843..2153.843 rows=9 loops=1)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" Sort Method: quicksort Memory: 25kB"
" -> HashAggregate (cost=275018.86..275018.87 rows=1 width=8) (actual time=2153.833..2153.834 rows=9 loops=1)"
" Group Key: date_trunc('MONTH'::text, t.departure)"
" -> Hash Join (cost=2797.67..275018.82 rows=7 width=8) (actual time=2.472..2147.093 rows=36565 loops=1)"
" Hash Cond: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_seat s (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.127..2116.153 rows=125796 loops=1)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" Rows Removed by Filter: 6782294"
" -> Hash (cost=2797.58..2797.58 rows=7 width=12) (actual time=1.853..1.853 rows=1430 loops=1)"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 78kB"
" -> Bitmap Heap Scan on tk_trip t (cost=32.21..2797.58 rows=7 width=12) (actual time=0.176..1.559 rows=1430 loops=1)"
" Recheck Cond: (route_id = '150'::bigint)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
" Rows Removed by Filter: 33"
" Heap Blocks: exact=333"
" -> Bitmap Index Scan on tk_trip_route_id_idx (cost=0.00..32.21 rows=1572 width=0) (actual time=0.131..0.131 rows=1463 loops=1)"
" Index Cond: (route_id = '150'::bigint)"
"Planning time: 0.211 ms"
"Execution time: 2153.972 ms"
答案 0 :(得分:2)
如果您暗示postgres不使用嵌套循环,您可以 - 可能 - 使它们相同:
SET enable_nestloop = 'off';
您可以通过将其设置为服务器,角色,内部函数定义或服务器配置来使其永久化:
ALTER DATABASE postgres
SET enable_nestloop = 'off';
ALTER ROLE lkaminski
SET enable_nestloop = 'off';
CREATE FUNCTION add(integer, integer) RETURNS integer
AS 'select $1 + $2;'
LANGUAGE SQL
SET enable_nestloop = 'off'
IMMUTABLE
RETURNS NULL ON NULL INPUT;
至于为什么 - 你改变搜索条件和计划者估计从tk_trip
他将得到1行而不是7,所以它改变了计划,因为看起来嵌套循环会更好。有时这是错误的,你可能会得到更慢的执行时间。但是如果你“强迫”它不使用嵌套循环,那么对于不同的参数,使用第二个计划而不是第一个计划(使用嵌套循环)可能会更慢。
您可以通过增加每列收集的统计信息来使计划程序估算更准确。 可能帮助。
ALTER TABLE tk_trip ALTER COLUMN route_id SET STATISTICS 1000;
作为旁注 - 您的LEFT JOIN
实际上是INNER JOIN
,因为您已将该表的条件放在WHERE而不是ON
中。如果将它们移动到ON
,你应该得到不同的计划(和结果) - 假设你想要LEFT JOIN。