I have real-time bus position data and bus stop time and location data. They are split across two tables:
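vehicle_position_trips (approx. 10 million records; real-time position data with latitude/longitude coordinates)
stop_time_locations (approx. 4 million records; bus stop locations with latitude/longitude coordinates and stop times)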
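I am trying to find all vehicle positions for a particular bus route that are within 15 m of a bus stop, but the query does not use the indexes and runs slowly.
I am using ST_DWithin with the following query: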
SELECT T.trip_id, T.ROUTEMAJOR
FROM vehicle_position_trips as T
INNER JOIN stop_time_locations as X
ON ST_DWithin(T.geog, X.geog, 15)
WHERE T.trip_id = X.trip_id AND T.trip_id IS NOT NULL AND T.ROUTEMAJOR = 4
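For comparison with the plan further down, the same join can be written with both join conditions in the ON clause. This is only a sketch: it uses the 15 m radius from the description, and it drops the IS NOT NULL check on the assumption that it is redundant, since an equality join on trip_id can never match a NULL.

```sql
-- Sketch: equivalent formulation with both join conditions in ON.
-- The trip_id equality already excludes NULL trip_ids, so no IS NOT NULL test.
SELECT T.trip_id, T.routemajor
FROM vehicle_position_trips AS T
INNER JOIN stop_time_locations AS X
    ON T.trip_id = X.trip_id
   AND ST_DWithin(T.geog, X.geog, 15)   -- distance in metres for geography
WHERE T.routemajor = 4;
```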
Using the point coordinates in each table, I created a geography column and a spatial index on each table:
ALTER TABLE vehicle_position_trips ADD COLUMN geog geography(POINT,4326);
UPDATE vehicle_position_trips SET geog = ST_GeogFromText('SRID=4326;POINT(' || longitude || ' ' || latitude || ')');
CREATE INDEX vehicle_position_geog_idx ON vehicle_position_trips USING gist(geog);
ALTER TABLE stop_time_locations ADD COLUMN geog geography(POINT,4326);
UPDATE stop_time_locations SET geog = ST_GeogFromText('SRID=4326;POINT(' || stop_lon || ' ' || stop_lat || ')');
CREATE INDEX stop_times_geog_idx ON stop_time_locations USING gist(geog);
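As an aside, the same geog values could probably be built without string concatenation. The following is a minimal sketch, assuming the longitude/latitude columns are numeric types (text columns would need an explicit cast first):

```sql
-- Sketch: build the geography from numeric coordinates instead of WKT text.
-- ST_MakePoint takes (x, y), i.e. (longitude, latitude).
UPDATE vehicle_position_trips
SET geog = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography;

UPDATE stop_time_locations
SET geog = ST_SetSRID(ST_MakePoint(stop_lon, stop_lat), 4326)::geography;
```

This should produce the same points as the ST_GeogFromText version and only changes how they are constructed.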
I also created an index on trip_id for both tables:
CREATE INDEX positions_trip_id_idx ON vehicle_position_trips(trip_id);
CREATE INDEX stops_trip_id_idx ON stop_time_locations(trip_id);
Here is the output I get from EXPLAIN ANALYZE:
- Gather (cost=125964.43..445659.67 rows=1 width=12) (actual time=1978.543..2610.686 rows=5859 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=124964.43..444659.57 rows=1 width=12) (actual time=1971.782..2506.291 rows=1953 loops=3)
Hash Cond: (t.trip_id = x.trip_id)
Join Filter: ((t.geog && _st_expand(x.geog, '15'::double precision)) AND (x.geog && _st_expand(t.geog, '15'::double precision)) AND _st_dwithin(t.geog, x.geog, '15'::double precision, true))
Rows Removed by Join Filter: 670155
-> Parallel Seq Scan on vehicle_position_trips t (cost=0.00..257701.18 rows=6961 width=40) (actual time=205.792..880.096 rows=14423 loops=3)
Filter: ((trip_id IS NOT NULL) AND (routemajor = 10))
Rows Removed by Filter: 3360183
-> Parallel Hash (cost=93174.97..93174.97 rows=1564997 width=40) (actual time=1012.117..1012.117 rows=1249653 loops=3)
Buckets: 65536 Batches: 128 Memory Usage: 2816kB
.97 rows=1564997 width=40) (actual time=50.692..314.688 rows=1249653 loops=3)
Planning Time: 15.059 ms
Execution Time: 2611.505 ms
(15 rows)
What can I do to make it run faster, and why isn't it using any of the indexes?
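A few things in the plan may be worth checking. What follows is a sketch under assumptions, not a tested fix.

The planner chose a hash join on trip_id and applied ST_DWithin as a join filter. A GiST index would normally only come into play in a nested-loop plan that probes one table once per row of the other.

The Parallel Seq Scan on vehicle_position_trips discards about 3.36 million rows per worker on the routemajor filter, so a B-tree index covering routemajor might avoid the full scan. The hash spilling into 128 batches suggests work_mem may be low for this join.

The index name and the work_mem value below are hypothetical.

```sql
-- Hypothetical index name; routemajor is the filter column shown in the plan.
CREATE INDEX positions_routemajor_trip_idx
    ON vehicle_position_trips (routemajor, trip_id);

-- Refresh planner statistics so row estimates reflect the current data.
ANALYZE vehicle_position_trips;
ANALYZE stop_time_locations;

-- Session-only setting; 64MB is an arbitrary illustrative value, not a recommendation.
SET work_mem = '64MB';
```

Re-running EXPLAIN ANALYZE on the query afterwards would show whether the planner picks up the new index or still prefers the sequential scan.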