Question

对于以下查询：

SELECT *
FROM "routes_trackpoint"
WHERE "routes_trackpoint"."track_id" = 593
ORDER BY "routes_trackpoint"."id" ASC
LIMIT 1;

Postgres正在选择一个查询计划，该计划读取“id”索引中的所有行以执行排序，并执行手动过滤以获取具有正确轨道ID的条目：

Limit  (cost=0.43..511.22 rows=1 width=65) (actual time=4797.964..4797.966 rows=1 loops=1)
   Buffers: shared hit=3388505
   ->  Index Scan using routes_trackpoint_pkey on routes_trackpoint  (cost=0.43..719699.79 rows=1409 width=65) (actual time=4797.958..4797.958 rows=1 loops=1)
         Filter: (track_id = 75934)
         Rows Removed by Filter: 13005436
         Buffers: shared hit=3388505
 Total runtime: 4798.019 ms
(7 rows)

禁用索引扫描，查询计划（SET enable_indexscan=OFF;）更好，响应更快。

Limit  (cost=6242.46..6242.46 rows=1 width=65) (actual time=77.584..77.586 rows=1 loops=1)
   Buffers: shared hit=1075 read=6
   ->  Sort  (cost=6242.46..6246.64 rows=1674 width=65) (actual time=77.577..77.577 rows=1 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=1075 read=6
         ->  Bitmap Heap Scan on routes_trackpoint  (cost=53.41..6234.09 rows=1674 width=65) (actual time=70.384..74.782 rows=1454 loops=1)
               Recheck Cond: (track_id = 75934)
               Buffers: shared hit=1075 read=6
               ->  Bitmap Index Scan on routes_trackpoint_track_id  (cost=0.00..52.99 rows=1674 width=0) (actual time=70.206..70.206 rows=1454 loops=1)
                     Index Cond: (track_id = 75934)
                     Buffers: shared hit=2 read=6
 Total runtime: 77.655 ms
(13 rows)

如何让Postgres自动选择更好的计划？

我尝试了以下内容：

ALTER TABLE routes_trackpoint ALTER COLUMN track_id SET STATISTICS 5000;
ALTER TABLE routes_trackpoint ALTER COLUMN id SET STATISTICS 5000;
ANALYZE routes_trackpoint;

但查询计划保持不变。

表定义是：

watchdog2=# \d routes_trackpoint
                                   Table "public.routes_trackpoint"
  Column  |           Type           |                           Modifiers                            
----------+--------------------------+----------------------------------------------------------------
 id       | integer                  | not null default nextval('routes_trackpoint_id_seq'::regclass)
 track_id | integer                  | not null
 position | geometry(Point,4326)     | not null
 speed    | double precision         | not null
 bearing  | double precision         | not null
 valid    | boolean                  | not null
 created  | timestamp with time zone | not null
Indexes:
    "routes_trackpoint_pkey" PRIMARY KEY, btree (id)
    "routes_trackpoint_position_id" gist ("position")
    "routes_trackpoint_track_id" btree (track_id)
Foreign-key constraints:
    "track_id_refs_id_d59447ae" FOREIGN KEY (track_id) REFERENCES routes_track(id) DEFERRABLE INITIALLY DEFERRED

PS：我们强迫postgres按“创建”排序，这也帮助他在“track_id”上使用索引。

Answer 1

尽可能避免LIMIT。计划＃1：使用NOT EXISTS()获取第一个

EXPLAIN ANALYZE
SELECT * FROM routes_trackpoint tp
WHERE tp.track_id = 593
AND NOT EXISTS (
        SELECT * FROM routes_trackpoint nx
        WHERE nx.track_id = tp.track_id AND nx.id < tp.id
        );

计划＃2：使用row_number（）OVER some_window 来获取该组的第一个。

EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
        , row_number() OVER (partition BY track_id ORDER BY id) rn
        FROM routes_trackpoint tp2
        ) omg ON omg.id = tp.id
WHERE tp.track_id = 593
AND omg.rn = 1
        ;

或者更好 - 将WHERE子句移动到子查询：

EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
        , row_number() OVER (partition BY track_id ORDER BY id) rn
        FROM routes_trackpoint tp2
        WHERE tp2.track_id = 593
        ) omg ON omg.id = tp.id
WHERE 1=1
-- AND tp.track_id = 593
AND omg.rn = 1
        ;

计划＃3使用postgres特定的DISTINCT ON()结构（感谢@a_horse_with_no_name）：

-- EXPLAIN ANALYZE
SELECT DISTINCT ON (track_id) track_id, id
FROM routes_trackpoint tp2
WHERE tp2.track_id = 593
-- order by track_id, created desc
order by track_id, id
        ;

如何调整postgres中的index_scan成本？

1 个答案: