Question

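We have a Postgres 11 database in which some tables have a very large number of rows, so we use Postgres declarative partitioning to keep queries fast.

Today, while working on a database function, I noticed some strange behavior from the Postgres query planner:

In this case we have two tables, track.track and sensor.location. The function should return all locations for a given track.

The relationship between track.track and sensor.location is given by user_vehicle_id and a time range.

sensor.location is partitioned by month on the column time.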
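For illustration, the partitioning setup might look roughly like the sketch below. This is not the actual DDL: the column types are inferred from the plans and the function later in this thread, and the primary key column order is an assumption.

```sql
-- Hypothetical schema sketch; only the names that appear in the plans
-- (sensor.location, "time", user_vehicle_id, location_p2016_04) are real.
CREATE SCHEMA IF NOT EXISTS sensor;

CREATE TABLE sensor.location (
    user_vehicle_id bigint NOT NULL,
    "time"          timestamp without time zone NOT NULL,
    -- ... further sensor columns omitted ...
    PRIMARY KEY (user_vehicle_id, "time")  -- must include the partition key
) PARTITION BY RANGE ("time");

-- One partition per month; the upper bound is exclusive.
CREATE TABLE sensor.location_p2016_04
    PARTITION OF sensor.location
    FOR VALUES FROM ('2016-04-01') TO ('2016-05-01');
```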

A query for this problem might look like this:

WITH single_track AS (
    SELECT 
        start_time, stop_time, user_vehicle_id 
    FROM 
        track.track 
    WHERE 
        id = 1350000744800)

SELECT * 
FROM 
    sensor.location as l, single_track as t
WHERE 
    l.time >= t.start_time AND 
    l.time <= t.stop_time  AND
    l.user_vehicle_id = t.user_vehicle_id

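I would expect the query planner to look only at the location partitions that match the time range given by start_time and stop_time.

Instead, the plan contains a bitmap heap or index scan for every partition: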

Nested Loop  (cost=8.59..9308018.00 rows=722021 width=106) (actual time=1.796..2.296 rows=1025 loops=1)
  CTE single_track
    ->  Index Scan using track_pkey on track  (cost=0.42..8.44 rows=1 width=24) (actual time=0.023..0.024 rows=1 loops=1)
          Index Cond: (id = '1350000744800'::bigint)
  ->  CTE Scan on single_track t  (cost=0.00..0.02 rows=1 width=24) (actual time=0.027..0.029 rows=1 loops=1)
  ->  Append  (cost=0.15..9286171.84 rows=2183770 width=82) (actual time=1.750..1.998 rows=1025 loops=1)
        ->  Index Scan using location_p2011_01_pkey on location_p2011_01 l  (cost=0.15..8.83 rows=1 width=136) (never executed)
              Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
        ->  Seq Scan on location_p2011_02 l_1  (cost=0.00..7.71 rows=1 width=82) (never executed)
              Filter: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (t.user_vehicle_id = user_vehicle_id))
        ->  Bitmap Heap Scan on location_p2011_03 l_2  (cost=643.94..3370.03 rows=2087 width=114) (never executed)
              Recheck Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))

        ...

        ->  Index Scan using location_p2020_10_pkey on location_p2020_10 l_117  (cost=0.15..8.83 rows=1 width=136) (never executed)
              Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
        ->  Index Scan using location_p2020_11_pkey on location_p2020_11 l_118  (cost=0.15..8.83 rows=1 width=136) (never executed)
              Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
        ->  Index Scan using location_p2020_12_pkey on location_p2020_12 l_119  (cost=0.15..8.83 rows=1 width=136) (never executed)
              Index Cond: (("time" >= t.start_time) AND ("time" <= t.stop_time) AND (user_vehicle_id = t.user_vehicle_id))
Planning Time: 11.046 ms
Execution Time: 4.144 ms

While experimenting, I found that the same query with the times passed explicitly:

EXPLAIN ANALYSE
WITH single_track AS (
    SELECT 
        start_time, 
        stop_time, 
        user_vehicle_id 
    FROM 
        track.track 
    WHERE 
        id = 1350000744800)

SELECT * 
FROM 
    sensor.location as l, single_track as t
WHERE 
    l.time >= '2016-04-12 18:04:59' AND 
    l.time <= '2016-04-12 18:22:49'  AND
    l.user_vehicle_id = t.user_vehicle_id

produces the expected behavior:

Nested Loop  (cost=9.00..2111.73 rows=141 width=102) (actual time=0.085..2.408 rows=1025 loops=1)
  CTE single_track
    ->  Index Scan using track_pkey on track  (cost=0.42..8.44 rows=1 width=24) (actual time=0.017..0.018 rows=1 loops=1)
          Index Cond: (id = '1350000744800'::bigint)
  ->  CTE Scan on single_track t  (cost=0.00..0.02 rows=1 width=24) (actual time=0.021..0.022 rows=1 loops=1)
  ->  Append  (cost=0.56..2099.99 rows=328 width=78) (actual time=0.060..2.081 rows=1025 loops=1)
        ->  Index Scan using location_p2016_04_pkey on location_p2016_04 l  (cost=0.56..2098.35 rows=328 width=78) (actual time=0.058..1.994 rows=1025 loops=1)
              Index Cond: (("time" >= '2016-04-12 18:04:59'::timestamp without time zone) AND ("time" <= '2016-04-12 18:22:49'::timestamp without time zone) AND (user_vehicle_id = t.user_vehicle_id))
Planning Time: 4.709 ms
Execution Time: 2.494 ms

Can anyone explain this behavior and help me fix the problem?

Answer 1

That seems to be too complicated for the PostgreSQL executor.

I suggest trying

SELECT *
FROM sensor.location AS l
WHERE l.time >= (SELECT start_time FROM track.track WHERE id = 1350000744800)
  AND l.time <= (SELECT stop_time FROM track.track WHERE id = 1350000744800)
  AND l.user_vehicle_id = (SELECT user_vehicle_id FROM track.track WHERE id = 1350000744800)

Then PostgreSQL at least knows that there is only a single value.

If that still doesn't work, split the query into two parts:

First, get the values from track.track.
Then construct a query with the results and execute it.
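One way to try the two-step approach by hand is from psql, using its \gset meta-command to store the first query's result in variables. This is only a sketch: it assumes you are working in psql (the meta-command is not SQL and will not work from other clients), and \gset fails unless the query returns exactly one row.

```sql
-- Step 1 (psql only): store the track's values in psql variables named
-- after the result columns.
SELECT start_time, stop_time, user_vehicle_id
FROM track.track
WHERE id = 1350000744800
\gset

-- Step 2: psql substitutes the variables as quoted literals, so the
-- planner sees constants and can prune partitions at plan time.
EXPLAIN (ANALYZE)
SELECT *
FROM sensor.location AS l
WHERE l.time >= :'start_time'
  AND l.time <= :'stop_time'
  AND l.user_vehicle_id = :'user_vehicle_id';
```

In application code the same idea applies: read the three values first, then issue the second query with them.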

Answer 2

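I also tried the following, which is probably close to Laurenz Albe's suggestion. Since EXPLAIN ANALYSE does not show the query plans of statements inside a PL/pgSQL function, I cannot confirm that this leads to the correct behavior.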

CREATE OR REPLACE FUNCTION location_from_track_id(
    _track_id bigint)
    RETURNS SETOF sensor.location 
    LANGUAGE 'plpgsql'

AS 
$BODY$
DECLARE 
    _user_vehicle_id bigint;
    _start_time timestamp without time zone;
    _stop_time timestamp without time zone;

BEGIN

SELECT 
    user_vehicle_id, 
    start_time, 
    stop_time 
INTO 
    _user_vehicle_id,
    _start_time,
    _stop_time
FROM 
    track.track
WHERE id=_track_id;

RETURN QUERY
SELECT *
FROM sensor.location 
WHERE
    time BETWEEN _start_time AND _stop_time AND 
    user_vehicle_id = _user_vehicle_id;

END;
$BODY$;
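To look at the plan of the query inside the function, one option is the auto_explain contrib module with nested-statement logging turned on. The sketch below assumes the module is installed on the server and that your role is allowed to LOAD it (typically a superuser); the plans go to the server log, and raising client_min_messages should also echo them into the session.

```sql
-- Assumption: auto_explain is available and this role may LOAD it.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_analyze = on;            -- include actual rows and timings
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions
SET client_min_messages = log;                -- show LOG-level output in this session too

-- Run the function; the plan of its RETURN QUERY statement is logged.
SELECT count(*) FROM location_from_track_id(1350000744800);
```

In the logged plan, check which location_p* partitions appear under the Append node and whether any of them are marked (never executed).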

How does the Postgres query planner work on partitioned tables?
