我有一个PostgreSQL数据库(使用PostGIS)和GTFS数据(https://developers.google.com/transit/gtfs/reference)来自几个运输机构。我已根据邻近度确定了所有可能的转移位置,并使用这些数据填充了表格。我现在想要查找该地区各点之间的旅行时间,最多允许2次转移。我创建了一个视图,它连接我的所有表格,使我的查询更容易阅读旅行时间。以下是我的观点:
CREATE OR REPLACE VIEW trip_planning_data_view AS
SELECT b.agency_id, i.agency_name, h.route_id, h.route_long_name, h.route_short_name, h.route_type,
e.trip_headsign, e.direction_id, a.stop_id AS stop_id_a, c.stop_name AS origin_stop_name, a.arrival_time AS origin_arrival_time,
b.stop_id AS stop_id_b, d.stop_name AS destination_stop_name, b.arrival_time AS destination_arrival_time,
b.arrival_time - a.arrival_time AS travel_time,
g.agency_id_b AS transfer_agency_id, g.stop_id_b AS transfer_stop_id, g.distance_meters AS transfer_distance_meters,
(round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval AS transfer_time,
b.arrival_time + ((round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval) AS transfer_arrival_time
FROM stop_time a
JOIN stop_time b ON a.agency_id = b.agency_id AND a.trip_id = b.trip_id AND a.stop_id <> b.stop_id AND a.stop_sequence < b.stop_sequence AND a.arrival_time < b.arrival_time
JOIN stop c ON a.agency_id = c.agency_id AND a.stop_id = c.stop_id
JOIN stop d ON b.agency_id = d.agency_id AND b.stop_id = d.stop_id
JOIN trip e ON a.agency_id = e.agency_id AND a.trip_id = e.trip_id
JOIN calendar f ON e.agency_id = f.agency_id AND e.service_id = f.service_id
LEFT JOIN stop_transfers g ON b.agency_id = g.agency_id_a AND b.stop_id = g.stop_id_a
JOIN route h ON e.agency_id = h.agency_id AND e.route_id = h.route_id
JOIN agency i ON h.agency_id = i.agency_id
WHERE f.monday = true
ORDER BY a.stop_id, b.arrival_time - a.arrival_time;
(我只对周一旅行感兴趣,我不知道为什么,但视图中的ORDER BY子句带来了巨大的性能提升。)
这些表符合GTFS文件结构,并添加了stop_transfers表,其中包含可以进行传输的代理和停止ID以及停靠点之间的距离。
使用1次转移查询此视图非常快(通常不到1秒),但是对2次转移的查询需要非常长的时间(几分钟)。以下是2次转移行程查询的示例:
select *
from trip_planning_data_view t0
join trip_planning_data_view t1 on t0.transfer_agency_id = t1.agency_id and t0.transfer_stop_id = t1.stop_id_a
join trip_planning_data_view t2 on t1.transfer_agency_id = t2.agency_id and t1.transfer_stop_id = t2.stop_id_a
where t0.agency_id = '1A'
and t0.stop_id_a = 's101'
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval )
and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time) and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time)
and t2.agency_id = '1A'
and t2.stop_id_b = 's247'
and t2.origin_arrival_time between (t1.origin_arrival_time + t1.travel_time + t1.transfer_time) and (t1.origin_arrival_time + '30 minutes'::interval + t1.travel_time + t1.transfer_time)
以下是查询计划:
Nested Loop (cost=168984.47..203333.30 rows=1 width=651)
-> Nested Loop (cost=168984.47..203324.90 rows=1 width=686)
Join Filter: (((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
-> Nested Loop (cost=0.00..117.22 rows=1 width=252)
Join Filter: ((a.agency_id)::text = (h.agency_id)::text)
-> Nested Loop (cost=0.00..108.94 rows=1 width=216)
-> Nested Loop (cost=0.00..100.65 rows=1 width=220)
Join Filter: ((a.agency_id)::text = (e.agency_id)::text)
-> Nested Loop (cost=0.00..91.92 rows=1 width=198)
-> Nested Loop (cost=0.00..83.50 rows=1 width=161)
Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
-> Nested Loop Left Join (cost=0.00..42.66 rows=1 width=112)
-> Nested Loop (cost=0.00..34.29 rows=1 width=90)
-> Index Scan using st_a_s_idx on stop_time b (cost=0.00..25.88 rows=1 width=53)
Index Cond: (((agency_id)::text = '1A'::text) AND ((stop_id)::text = 's247'::text))
-> Index Scan using a_stop_idx on stop d (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
-> Index Scan using stop_transfers_as_a_idx on stop_transfers g (cost=0.00..8.35 rows=1 width=36)
Index Cond: (((b.agency_id)::text = (agency_id_a)::text) AND ((b.stop_id)::text = (stop_id_a)::text))
-> Index Scan using st_a_t_idx on stop_time a (cost=0.00..40.78 rows=3 width=53)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
-> Index Scan using a_stop_idx on stop c (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
-> Index Scan using trip_id_idx on trip e (cost=0.00..8.71 rows=1 width=80)
Index Cond: ((trip_id)::text = (a.trip_id)::text)
-> Index Scan using a_s_idx on calendar f (cost=0.00..8.28 rows=1 width=20)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
Filter: monday
-> Index Scan using route_id_idx on route h (cost=0.00..8.27 rows=1 width=41)
Index Cond: ((route_id)::text = (e.route_id)::text)
-> Nested Loop (cost=168984.47..203207.60 rows=1 width=434)
-> Nested Loop (cost=168984.47..203199.32 rows=1 width=477)
-> Nested Loop (cost=168984.47..203191.04 rows=1 width=520)
Join Filter: (((g.agency_id_b)::text = (b.agency_id)::text) AND ((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
-> Nested Loop (cost=168933.50..178461.70 rows=1 width=260)
-> Nested Loop (cost=168933.50..178453.41 rows=1 width=264)
-> Nested Loop (cost=168933.50..178444.99 rows=1 width=227)
Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
-> Nested Loop (cost=168933.50..178404.16 rows=1 width=236)
Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
-> Nested Loop (cost=168933.50..178387.59 rows=2 width=200)
Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
-> Nested Loop (cost=168933.50..177724.50 rows=76 width=120)
-> Merge Join (cost=168933.50..170942.05 rows=869 width=89)
Merge Cond: (((b.agency_id)::text = (g.agency_id_a)::text) AND ((b.stop_id)::text = (g.stop_id_a)::text))
-> Sort (cost=144224.83..144325.07 rows=40096 width=53)
Sort Key: b.agency_id, b.stop_id
-> Bitmap Heap Scan on stop_time b (cost=1068.60..141159.25 rows=40096 width=53)
Recheck Cond: ((agency_id)::text = '1A'::text)
-> Bitmap Index Scan on st_a_s_idx (cost=0.00..1058.58 rows=40096 width=0)
Index Cond: ((agency_id)::text = '1A'::text)
-> Sort (cost=24708.45..25274.92 rows=226587 width=36)
Sort Key: g.agency_id_a, g.stop_id_a
-> Seq Scan on stop_transfers g (cost=0.00..4553.87 rows=226587 width=36)
-> Index Scan using a_stop_idx on stop d (cost=0.00..7.79 rows=1 width=44)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
-> Index Scan using trip_id_idx on trip e (cost=0.00..8.71 rows=1 width=80)
Index Cond: ((trip_id)::text = (b.trip_id)::text)
-> Index Scan using route_id_idx on route h (cost=0.00..8.27 rows=1 width=41)
Index Cond: ((route_id)::text = (e.route_id)::text)
-> Index Scan using st_a_t_idx on stop_time a (cost=0.00..40.80 rows=1 width=53)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
Filter: ((arrival_time >= '08:11:00'::interval) AND (arrival_time <= '08:30:00'::interval) AND ((stop_id)::text = 's101'::text))
-> Index Scan using a_stop_idx on stop c (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
-> Index Scan using a_s_idx on calendar f (cost=0.00..8.28 rows=1 width=20)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
Filter: monday
-> Nested Loop (cost=50.97..24729.27 rows=1 width=260)
-> Nested Loop (cost=50.97..24720.85 rows=1 width=223)
Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
-> Nested Loop (cost=50.97..24680.01 rows=1 width=232)
-> Nested Loop (cost=50.97..24663.42 rows=2 width=236)
Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
-> Nested Loop (cost=50.97..24447.16 rows=29 width=200)
Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
-> Nested Loop (cost=50.97..15148.58 rows=1096 width=120)
-> Nested Loop (cost=50.97..12475.29 rows=59 width=80)
-> Bitmap Heap Scan on stop_transfers g (cost=50.97..2141.81 rows=1375 width=36)
Recheck Cond: ((agency_id_b)::text = '1A'::text)
-> Bitmap Index Scan on stop_transfers_as_b_idx (cost=0.00..50.63 rows=1375 width=0)
Index Cond: ((agency_id_b)::text = '1A'::text)
-> Index Scan using a_stop_idx on stop d (cost=0.00..7.50 rows=1 width=44)
Index Cond: (((agency_id)::text = (g.agency_id_a)::text) AND ((stop_id)::text = (g.stop_id_a)::text))
-> Index Scan using st_a_s_idx on stop_time b (cost=0.00..45.22 rows=6 width=53)
Index Cond: (((agency_id)::text = (d.agency_id)::text) AND ((stop_id)::text = (d.stop_id)::text))
-> Index Scan using trip_id_idx on trip e (cost=0.00..8.47 rows=1 width=80)
Index Cond: ((trip_id)::text = (b.trip_id)::text)
-> Index Scan using route_id_idx on route h (cost=0.00..7.44 rows=1 width=41)
Index Cond: ((route_id)::text = (e.route_id)::text)
-> Index Scan using a_s_idx on calendar f (cost=0.00..8.28 rows=1 width=20)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
Filter: monday
-> Index Scan using st_a_t_idx on stop_time a (cost=0.00..40.78 rows=3 width=53)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
-> Index Scan using a_stop_idx on stop c (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
-> Index Scan using agency_id_idx on agency i (cost=0.00..8.27 rows=1 width=31)
Index Cond: ((agency_id)::text = (a.agency_id)::text)
-> Index Scan using agency_id_idx on agency i (cost=0.00..8.27 rows=1 width=31)
Index Cond: ((agency_id)::text = (a.agency_id)::text)
-> Index Scan using agency_id_idx on agency i (cost=0.00..8.27 rows=1 width=31)
Index Cond: ((agency_id)::text = (a.agency_id)::text)
查询计划似乎正在使用索引。任何建议,以优化这个或更好的方法将不胜感激。提前谢谢。
答案 0 :(得分:0)
我认为使用像OpenTripPlanner(http://www.opentripplanner.org/)这样的工具可能会更好,这是一个与GTFS配合使用的开源传输路由引擎。它可用于快速有效地回答各种路由查询的问题,包括&#34;两站之间允许N次传输的最快时间&#34;。
或者,如果代理机构与Google共享数据(可能性很好 - http://www.google.com/landing/transit/cities/index.html),那么您可以使用Google Directions API(https://developers.google.com/maps/documentation/directions/)查询两个输入位置的传输路由。