For the following query:
SELECT *
FROM "routes_trackpoint"
WHERE "routes_trackpoint"."track_id" = 593
ORDER BY "routes_trackpoint"."id" ASC
LIMIT 1;
Postgres chooses a query plan that walks the primary-key index on "id" in sorted order and filters every row it reads to find an entry with the right track_id:
Limit (cost=0.43..511.22 rows=1 width=65) (actual time=4797.964..4797.966 rows=1 loops=1)
Buffers: shared hit=3388505
-> Index Scan using routes_trackpoint_pkey on routes_trackpoint (cost=0.43..719699.79 rows=1409 width=65) (actual time=4797.958..4797.958 rows=1 loops=1)
Filter: (track_id = 75934)
Rows Removed by Filter: 13005436
Buffers: shared hit=3388505
Total runtime: 4798.019 ms
(7 rows)
With index scans disabled (SET enable_indexscan=OFF;), the query plan is better and the response is much faster:
Limit (cost=6242.46..6242.46 rows=1 width=65) (actual time=77.584..77.586 rows=1 loops=1)
Buffers: shared hit=1075 read=6
-> Sort (cost=6242.46..6246.64 rows=1674 width=65) (actual time=77.577..77.577 rows=1 loops=1)
Sort Key: id
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1075 read=6
-> Bitmap Heap Scan on routes_trackpoint (cost=53.41..6234.09 rows=1674 width=65) (actual time=70.384..74.782 rows=1454 loops=1)
Recheck Cond: (track_id = 75934)
Buffers: shared hit=1075 read=6
-> Bitmap Index Scan on routes_trackpoint_track_id (cost=0.00..52.99 rows=1674 width=0) (actual time=70.206..70.206 rows=1454 loops=1)
Index Cond: (track_id = 75934)
Buffers: shared hit=2 read=6
Total runtime: 77.655 ms
(13 rows)
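If disabling index scans is kept as a stopgap, it can be limited to a single transaction rather than the whole session. The following is only a sketch of that idea using SET LOCAL, which reverts at COMMIT or ROLLBACK; it reuses the query from the top of the question and is not a fix for the planner's estimate itself.

```sql
BEGIN;

-- SET LOCAL applies only until the end of this transaction,
-- so other queries in the session keep the default planner settings.
SET LOCAL enable_indexscan = off;

SELECT *
FROM "routes_trackpoint"
WHERE "routes_trackpoint"."track_id" = 593
ORDER BY "routes_trackpoint"."id" ASC
LIMIT 1;

COMMIT;
```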
How can I get Postgres to choose the better plan automatically?
I tried the following:
ALTER TABLE routes_trackpoint ALTER COLUMN track_id SET STATISTICS 5000;
ALTER TABLE routes_trackpoint ALTER COLUMN id SET STATISTICS 5000;
ANALYZE routes_trackpoint;
But the query plan stayed the same.
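To confirm that the larger statistics target actually took effect after ANALYZE, the collected statistics can be inspected in the pg_stats view. This is a diagnostic sketch only; it assumes the table lives in the "public" schema, as the table definition in this question shows.

```sql
-- Inspect what the planner knows about the two columns involved.
-- n_distinct:  estimated number of distinct values (negative = fraction of rows)
-- correlation: how closely the physical row order follows the column order
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'routes_trackpoint'
  AND attname IN ('id', 'track_id');
```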
The table definition is:
watchdog2=# \d routes_trackpoint
Table "public.routes_trackpoint"
Column | Type | Modifiers
----------+--------------------------+----------------------------------------------------------------
id | integer | not null default nextval('routes_trackpoint_id_seq'::regclass)
track_id | integer | not null
position | geometry(Point,4326) | not null
speed | double precision | not null
bearing | double precision | not null
valid | boolean | not null
created | timestamp with time zone | not null
Indexes:
"routes_trackpoint_pkey" PRIMARY KEY, btree (id)
"routes_trackpoint_position_id" gist ("position")
"routes_trackpoint_track_id" btree (track_id)
Foreign-key constraints:
"track_id_refs_id_d59447ae" FOREIGN KEY (track_id) REFERENCES routes_track(id) DEFERRABLE INITIALLY DEFERRED
PS: As a workaround, we made Postgres sort by "created" instead, which also gets it to use the index on "track_id".
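For reference, the workaround from the PS would look roughly like this. It is a sketch based on the original query; it returns the same row only under the assumption that "created" increases together with "id" within a track. Since the table definition shows no index on "created", the planner has no ordered index to walk and has to filter by "track_id" first.

```sql
-- Same query, ordered by "created" instead of "id".
-- Assumption: within one track, the row with the lowest "id"
-- is also the row with the earliest "created" timestamp.
SELECT *
FROM "routes_trackpoint"
WHERE "routes_trackpoint"."track_id" = 593
ORDER BY "routes_trackpoint"."created" ASC
LIMIT 1;
```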
Answer 0 (score: 1)
Avoid LIMIT where possible.
Plan #1: use NOT EXISTS() to get the first row:
EXPLAIN ANALYZE
SELECT * FROM routes_trackpoint tp
WHERE tp.track_id = 593
AND NOT EXISTS (
SELECT * FROM routes_trackpoint nx
WHERE nx.track_id = tp.track_id AND nx.id < tp.id
);
Plan #2: use row_number() OVER some_window to get the first row of the group:
EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
, row_number() OVER (partition BY track_id ORDER BY id) rn
FROM routes_trackpoint tp2
) omg ON omg.id = tp.id
WHERE tp.track_id = 593
AND omg.rn = 1
;
Or, better, move the WHERE clause into the subquery:
EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
, row_number() OVER (partition BY track_id ORDER BY id) rn
FROM routes_trackpoint tp2
WHERE tp2.track_id = 593
) omg ON omg.id = tp.id
WHERE 1=1
-- AND tp.track_id = 593
AND omg.rn = 1
;
Plan #3: use the Postgres-specific DISTINCT ON() construct (thanks to @a_horse_with_no_name):
-- EXPLAIN ANALYZE
SELECT DISTINCT ON (track_id) track_id, id
FROM routes_trackpoint tp2
WHERE tp2.track_id = 593
-- order by track_id, created desc
order by track_id, id
;
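Note that the DISTINCT ON query above returns only track_id and id. If the whole row is needed, as in the original SELECT *, a variant along these lines should work; treat it as a sketch and check the resulting plan with EXPLAIN ANALYZE, since it has not been run against this table.

```sql
-- DISTINCT ON (track_id) keeps the first row per track_id
-- according to the ORDER BY, i.e. the row with the lowest id.
SELECT DISTINCT ON (tp.track_id) tp.*
FROM routes_trackpoint tp
WHERE tp.track_id = 593
ORDER BY tp.track_id, tp.id;
```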