PostGIS ST_DWithin query runs slowly

Time: 2019-11-11 17:46:08

Tags: postgresql postgis

I have real-time bus position data, plus the times and locations of bus stops. The data is split across two tables:

vehicle_position_trips (about 30 million records; real-time position data, including latitude/longitude coordinates)

stop_time_locations (about 1 4 million records; bus stop locations, including latitude/longitude coordinates and stop times)

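I am trying to find all vehicle positions on a particular bus route that lie within 15 m of a bus stop, but the query does not use the indexes and runs slowly.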

I am using ST_DWithin in the following query:

SELECT T.trip_id, T.ROUTEMAJOR
FROM vehicle_position_trips as T
INNER JOIN stop_time_locations as X
ON ST_DWithin(T.geog, X.geog, 25)
WHERE T.trip_id = X.trip_id AND T.trip_id IS NOT NULL AND T.ROUTEMAJOR = 4

Using the point coordinates in each table, I created a geography column and a spatial index on each table:

ALTER TABLE vehicle_position_trips ADD COLUMN geog geography(POINT,4326);
UPDATE vehicle_position_trips SET geog = ST_GeogFromText('SRID=4326;POINT(' || longitude || ' ' || latitude || ')');
CREATE INDEX vehicle_position_geog_idx ON vehicle_position_trips USING gist(geog);
ALTER TABLE stop_time_locations ADD COLUMN geog geography(POINT,4326);
UPDATE stop_time_locations SET geog = ST_GeogFromText('SRID=4326;POINT(' || stop_lon || ' ' || stop_lat || ')');
CREATE INDEX stop_times_geog_idx ON stop_time_locations USING gist(geog);
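
As a side note, the same geography values can be built without string concatenation. The sketch below assumes the longitude/latitude columns are numeric or double precision (the question does not state their types) and uses the standard PostGIS functions ST_MakePoint and ST_SetSRID. It is an alternative way to populate the columns, not a fix for the slow query.

```sql
-- Assumes longitude/latitude and stop_lon/stop_lat are numeric or
-- double precision columns; text columns would need an explicit cast.
UPDATE vehicle_position_trips
SET geog = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography;

UPDATE stop_time_locations
SET geog = ST_SetSRID(ST_MakePoint(stop_lon, stop_lat), 4326)::geography;

-- Refresh planner statistics after the bulk updates.
ANALYZE vehicle_position_trips;
ANALYZE stop_time_locations;
```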

I also created an index on trip_id for both tables:

CREATE INDEX positions_trip_id_idx ON vehicle_position_trips(trip_id);
CREATE INDEX stops_trip_id_idx ON stop_time_locations(trip_id);

This is the output I get from EXPLAIN ANALYZE:

Gather  (cost=125964.43..445659.67 rows=1 width=12) (actual time=1978.543..2610.686 rows=5859 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Hash Join  (cost=124964.43..444659.57 rows=1 width=12) (actual time=1971.782..2506.291 rows=1953 loops=3)
        Hash Cond: (t.trip_id = x.trip_id)
        Join Filter: ((t.geog && _st_expand(x.geog, '15'::double precision)) AND (x.geog && _st_expand(t.geog, '15'::double precision)) AND _st_dwithin(t.geog, x.geog, '15'::double precision, true))
        Rows Removed by Join Filter: 670155
        ->  Parallel Seq Scan on vehicle_position_trips t  (cost=0.00..257701.18 rows=6961 width=40) (actual time=205.792..880.096 rows=14423 loops=3)
              Filter: ((trip_id IS NOT NULL) AND (routemajor = 10))
              Rows Removed by Filter: 3360183
        ->  Parallel Hash  (cost=93174.97..93174.97 rows=1564997 width=40) (actual time=1012.117..1012.117 rows=1249653 loops=3)
              Buckets: 65536  Batches: 128  Memory Usage: 2816kB
.97 rows=1564997 width=40) (actual time=50.692..314.688 rows=1249653 loops=3)
Planning Time: 15.059 ms
Execution Time: 2611.505 ms
(15 rows)

What can I do to make this run faster, and why isn't it using any of the indexes?
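
Reading the plan above: the planner joins the tables on trip_id with a hash join and evaluates the distance test only as a join filter, so neither GiST index is consulted. About a third of the runtime goes to the sequential scan that discards roughly 10 million rows on routemajor, and the hash is split into 128 batches. A minimal sketch of things that might be worth trying, assuming the route filter is selective. The index name and the work_mem value are hypothetical, and none of this has been run against the tables in question.

```sql
-- Hypothetical index name. Lets the planner locate one route's rows
-- without scanning the whole positions table.
CREATE INDEX positions_route_trip_idx
    ON vehicle_position_trips (routemajor, trip_id);

ANALYZE vehicle_position_trips;

-- The plan shows "Batches: 128"; a larger per-session work_mem may let
-- the hash stay in memory. The value here is a guess, not a recommendation.
SET work_mem = '256MB';

-- Same predicates as the original query. The trip_id equality sits in the
-- ON clause, and "trip_id IS NOT NULL" is dropped because an inner
-- equality join already excludes NULL trip_ids.
EXPLAIN ANALYZE
SELECT T.trip_id, T.routemajor
FROM vehicle_position_trips AS T
INNER JOIN stop_time_locations AS X
    ON T.trip_id = X.trip_id
   AND ST_DWithin(T.geog, X.geog, 25)
WHERE T.routemajor = 4;
```

Comparing the new EXPLAIN ANALYZE output with the one above would show whether the sequential scan on vehicle_position_trips is replaced by an index scan and whether the hash batches drop.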

0 answers:

No answers