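We have a PostgreSQL 11 database in which some tables hold a very large number of rows, so we use PostgreSQL declarative partitioning to keep queries fast.
Today, while working with a database function, I noticed some strange behavior of the PostgreSQL query planner.
In this case we have two tables, track.track and sensor.location.
The function should return all locations for a given track.
The relation between track.track and sensor.location is given by user_vehicle_id and a time range.
sensor.location is partitioned by month on the time column.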
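For readers who want to reproduce the setup, a monthly range-partitioned table could be declared roughly as follows. This is only a sketch: the real column list is not shown in this post, so everything except user_vehicle_id and time is omitted, the primary key is an assumption, and the sensor schema is assumed to exist already. The partition name follows the location_pYYYY_MM pattern used by the actual partitions.

```sql
-- Hypothetical minimal definition; the real table has more columns.
CREATE TABLE sensor.location (
    user_vehicle_id bigint NOT NULL,
    "time"          timestamp without time zone NOT NULL,
    -- Assumed key: in PostgreSQL 11 a primary key on a partitioned
    -- table must include the partition key column ("time").
    PRIMARY KEY (user_vehicle_id, "time")
) PARTITION BY RANGE ("time");

-- One partition per month; the upper bound is exclusive.
CREATE TABLE sensor.location_p2016_04
    PARTITION OF sensor.location
    FOR VALUES FROM ('2016-04-01') TO ('2016-05-01');
```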
A query for this problem might look like this:
WITH single_track AS (
    SELECT
        start_time, stop_time, user_vehicle_id
    FROM
        track.track
    WHERE
        id = 1350000744800)
SELECT *
FROM
    sensor.location as l, single_track as t
WHERE
    l.time >= t.start_time AND
    l.time <= t.stop_time AND
    l.user_vehicle_id = t.user_vehicle_id
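I expected the query planner to look only at the location partitions that match the given time range from start_time to stop_time.
Instead, it does a bitmap heap or index scan on all partitions: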
Nested Loop (cost=8.59..9308018.00 rows=722021 width=106) (actual time=1.796..2.296 rows=1025 loops=1)
  CTE single_track
    -> Index Scan using track_pkey on track (cost=0.42..8.44 rows=1 width=24) (actual time=0.023..0.024 rows=1 loops=1)
         Index Cond: (id = '1350000744800'::bigint)
  -> CTE Scan on single_track t (cost=0.00..0.02 rows=1 width=24) (actual time=0.027..0.029 rows=1 loops=1)
  -> Append (cost=0.15..9286171.84 rows=2183770 width=82) (actual time=1.750..1.998 rows=1025 loops=1)
       -> Index Scan using location_p2011_01_pkey on location_p2011_01 l (cost=0.15..8.83 rows=1 width=136) (never executed)
            Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
       -> Seq Scan on location_p2011_02 l_1 (cost=0.00..7.71 rows=1 width=82) (never executed)
            Filter: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (t.user_vehicle_id = user_vehicle_id))
       -> Bitmap Heap Scan on location_p2011_03 l_2 (cost=643.94..3370.03 rows=2087 width=114) (never executed)
            Recheck Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
       ...
       -> Index Scan using location_p2020_10_pkey on location_p2020_10 l_117 (cost=0.15..8.83 rows=1 width=136) (never executed)
            Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
       -> Index Scan using location_p2020_11_pkey on location_p2020_11 l_118 (cost=0.15..8.83 rows=1 width=136) (never executed)
            Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
       -> Index Scan using location_p2020_12_pkey on location_p2020_12 l_119 (cost=0.15..8.83 rows=1 width=136) (never executed)
            Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
Planning Time: 11.046 ms
Execution Time: 4.144 ms
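A hedged aside on reading a plan like this one: the PostgreSQL 11 documentation describes partition pruning that happens during execution, for example when the comparison values come from a parameterized nested loop join, and it states that subplans pruned every time are reported as (never executed). The sketch below shows one way to check for that. enable_partition_pruning and the EXPLAIN options are standard PostgreSQL; the assumption that the elided part of the plan above contains the one partition that was actually scanned is not confirmed by this post.

```sql
-- Partition pruning is governed by this setting (on by default in PostgreSQL 11).
SHOW enable_partition_pruning;

-- Re-run the query and look for Append children that do NOT carry
-- "(never executed)"; those are the partitions that were really scanned.
EXPLAIN (ANALYZE, BUFFERS)
WITH single_track AS (
    SELECT start_time, stop_time, user_vehicle_id
    FROM track.track
    WHERE id = 1350000744800)
SELECT *
FROM sensor.location AS l, single_track AS t
WHERE l.time >= t.start_time
  AND l.time <= t.stop_time
  AND l.user_vehicle_id = t.user_vehicle_id;
```

If only one child shows actual rows and timings, the remaining partitions still appear in the plan text but were skipped at run time.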
While playing around, I found that the same query with the times passed explicitly:
EXPLAIN ANALYSE
WITH single_track AS (
    SELECT
        start_time,
        stop_time,
        user_vehicle_id
    FROM
        track.track
    WHERE
        id = 1350000744800)
SELECT *
FROM
    sensor.location as l, single_track as t
WHERE
    l.time >= '2016-04-12 18:04:59' AND
    l.time <= '2016-04-12 18:22:49' AND
    l.user_vehicle_id = t.user_vehicle_id
produces the expected behavior:
Nested Loop (cost=9.00..2111.73 rows=141 width=102) (actual time=0.085..2.408 rows=1025 loops=1)
  CTE single_track
    -> Index Scan using track_pkey on track (cost=0.42..8.44 rows=1 width=24) (actual time=0.017..0.018 rows=1 loops=1)
         Index Cond: (id = '1350000744800'::bigint)
  -> CTE Scan on single_track t (cost=0.00..0.02 rows=1 width=24) (actual time=0.021..0.022 rows=1 loops=1)
  -> Append (cost=0.56..2099.99 rows=328 width=78) (actual time=0.060..2.081 rows=1025 loops=1)
       -> Index Scan using location_p2016_04_pkey on location_p2016_04 l (cost=0.56..2098.35 rows=328 width=78) (actual time=0.058..1.994 rows=1025 loops=1)
            Index Cond: (("time" >= '2016-04-12 18:04:59'::timestamp without time zone) AND ("time" <= '2016-04-12 18:22:49'::timestamp without time zone) AND (user_vehicle_id = t.user_vehicle_id))
Planning Time: 4.709 ms
Execution Time: 2.494 ms
Can anyone explain this behavior and help me solve the problem?
Answer 0 (score: 0)
This seems to be too complicated for the PostgreSQL executor.
I suggest trying
SELECT *
FROM
    sensor.location as l
WHERE
    l.time >= (SELECT start_time FROM track.track WHERE id = 1350000744800) AND
    l.time <= (SELECT stop_time FROM track.track WHERE id = 1350000744800) AND
    l.user_vehicle_id = (SELECT user_vehicle_id FROM track.track WHERE id = 1350000744800)
Then PostgreSQL at least knows that there is only a single value.
If that still doesn't work, split the query into two parts: first run a separate query against track.track to get the values, then pass them to the query on sensor.location as constants.
Answer 1 (score: 0)
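I also tried the following, which is probably close to what Laurenz Albe suggested. Since EXPLAIN ANALYSE does not show the query plans of statements inside PL/pgSQL functions, I could not confirm that this leads to the correct behavior.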
CREATE OR REPLACE FUNCTION location_from_track_id(
    _track_id bigint)
RETURNS SETOF sensor.location
LANGUAGE 'plpgsql'
AS
$BODY$
DECLARE
    _user_vehicle_id bigint;
    _start_time timestamp without time zone;
    _stop_time timestamp without time zone;
BEGIN
    -- Step 1: fetch the vehicle and time range of the track into variables.
    SELECT
        user_vehicle_id,
        start_time,
        stop_time
    INTO
        _user_vehicle_id,
        _start_time,
        _stop_time
    FROM
        track.track
    WHERE id=_track_id;
    -- Step 2: query the partitioned table using those variables as parameters.
    RETURN QUERY
    SELECT *
    FROM sensor.location
    WHERE
        time BETWEEN _start_time AND _stop_time AND
        user_vehicle_id = _user_vehicle_id;
END;
$BODY$;
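One possible way to see the plans of the statements that run inside the function is the auto_explain contrib module with nested-statement logging. The following is a sketch under assumptions: the module must be installed on the server, LOAD normally requires superuser privileges, and the track id is simply the one used in the question.

```sql
-- Load auto_explain for the current session only.
LOAD 'auto_explain';

-- Log the plan of every statement, including actual run-time figures.
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;

-- Also log statements executed inside functions such as location_from_track_id.
SET auto_explain.log_nested_statements = on;

-- Optional: show LOG-level messages in the client instead of only the server log.
SET client_min_messages = log;

-- Calling the function now logs the plans of both statements in its body.
SELECT count(*) FROM location_from_track_id(1350000744800);
```

In the logged plan for the second statement, the Append node should show which location partitions were scanned and which were pruned.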