How to quickly find travel times and transfer times between transit stops using GTFS data in PostgreSQL

Date: 2014-04-02 19:10:15

Tags: sql postgresql query-optimization gtfs

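I have a PostgreSQL database (with PostGIS) loaded with GTFS data (https://developers.google.com/transit/gtfs/reference) from several transit agencies. I have identified all possible transfer locations based on proximity and populated a table with that data. I now want to find travel times between points in the region, allowing at most 2 transfers. I created a view that joins all of my tables so that my travel-time queries are easier to read. Here is my view: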

CREATE OR REPLACE VIEW trip_planning_data_view AS 
 SELECT b.agency_id, i.agency_name, h.route_id, h.route_long_name, h.route_short_name, h.route_type, 
    e.trip_headsign, e.direction_id, a.stop_id AS stop_id_a, c.stop_name AS origin_stop_name, a.arrival_time AS origin_arrival_time, 
    b.stop_id AS stop_id_b, d.stop_name AS destination_stop_name, b.arrival_time AS destination_arrival_time, 
    b.arrival_time - a.arrival_time AS travel_time, 
    g.agency_id_b AS transfer_agency_id, g.stop_id_b AS transfer_stop_id, g.distance_meters AS transfer_distance_meters, 
    (round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval AS transfer_time, 
    b.arrival_time + ((round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval) AS transfer_arrival_time
   FROM stop_time a
   JOIN stop_time b ON a.agency_id = b.agency_id AND a.trip_id = b.trip_id AND a.stop_id <> b.stop_id AND a.stop_sequence < b.stop_sequence AND a.arrival_time < b.arrival_time
   JOIN stop c ON a.agency_id = c.agency_id AND a.stop_id = c.stop_id
   JOIN stop d ON b.agency_id = d.agency_id AND b.stop_id = d.stop_id
   JOIN trip e ON a.agency_id = e.agency_id AND a.trip_id = e.trip_id
   JOIN calendar f ON e.agency_id = f.agency_id AND e.service_id = f.service_id
   LEFT JOIN stop_transfers g ON b.agency_id = g.agency_id_a AND b.stop_id = g.stop_id_a
   JOIN route h ON e.agency_id = h.agency_id AND e.route_id = h.route_id
   JOIN agency i ON h.agency_id = i.agency_id
  WHERE f.monday = true
  ORDER BY a.stop_id, b.arrival_time - a.arrival_time;

(I am only interested in Monday trips. I don't know why, but the ORDER BY clause in the view gives a huge performance boost.)
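
The transfer_time column in the view assumes a walking speed of 60 meters per minute: it divides distance_meters by 60, rounds to whole minutes, and builds an interval by casting a string. As a small sketch (the 300-meter distance is a made-up value, not from my data), the same interval can also be produced by multiplying the minute count by interval '1 minute', which avoids the string round trip:

```sql
-- Hypothetical transfer distance of 300 meters, walked at the view's assumed 60 m/min.
SELECT
    -- The view's form: build the text '5 Minutes' and cast it to interval.
    (round(300 / 60::double precision)::character varying || ' Minutes')::interval AS via_string,
    -- Arithmetic form: a number of minutes multiplied by a one-minute interval.
    round(300 / 60::double precision) * interval '1 minute' AS via_multiplication;
```

Both columns should evaluate to 00:05:00. Whether the arithmetic form changes the query plan at all is something I have not measured.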

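The tables follow the GTFS file structure, with the addition of the stop_transfers table, which holds the agency and stop IDs between which a transfer can be made, along with the distance between the stops.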
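
The definition of stop_transfers is not shown here, so the following is only a sketch of what such a table and its proximity-based load might look like. The column names come from the view and the index names from the query plan; the column types, the index column lists, the use of the GTFS stop_lat / stop_lon columns on stop, and the 400-meter cutoff are all assumptions.

```sql
-- Assumed shape of the transfer table (types are guesses).
CREATE TABLE stop_transfers (
    agency_id_a     character varying NOT NULL,
    stop_id_a       character varying NOT NULL,
    agency_id_b     character varying NOT NULL,
    stop_id_b       character varying NOT NULL,
    distance_meters double precision  NOT NULL
);

-- Pair every stop with every other stop within walking distance.
-- Casting to geography makes ST_DWithin and ST_Distance work in meters.
INSERT INTO stop_transfers (agency_id_a, stop_id_a, agency_id_b, stop_id_b, distance_meters)
SELECT s1.agency_id, s1.stop_id, s2.agency_id, s2.stop_id,
       ST_Distance(ST_SetSRID(ST_MakePoint(s1.stop_lon, s1.stop_lat), 4326)::geography,
                   ST_SetSRID(ST_MakePoint(s2.stop_lon, s2.stop_lat), 4326)::geography)
FROM stop s1
JOIN stop s2
  ON (s1.agency_id, s1.stop_id) <> (s2.agency_id, s2.stop_id)   -- skip self-pairs
 AND ST_DWithin(ST_SetSRID(ST_MakePoint(s1.stop_lon, s1.stop_lat), 4326)::geography,
                ST_SetSRID(ST_MakePoint(s2.stop_lon, s2.stop_lat), 4326)::geography,
                400);                                            -- arbitrary 400 m cutoff

-- Index names as they appear in the plan; the column lists are inferred from its Index Cond lines.
CREATE INDEX stop_transfers_as_a_idx ON stop_transfers (agency_id_a, stop_id_a);
CREATE INDEX stop_transfers_as_b_idx ON stop_transfers (agency_id_b, stop_id_b);
```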

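Querying this view for trips with 1 transfer is very fast (usually under 1 second), but queries with 2 transfers take a very long time (several minutes). Here is an example query for a 2-transfer trip: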

select *
from trip_planning_data_view t0 
join trip_planning_data_view t1 on t0.transfer_agency_id = t1.agency_id and t0.transfer_stop_id = t1.stop_id_a 
join trip_planning_data_view t2 on t1.transfer_agency_id = t2.agency_id and t1.transfer_stop_id = t2.stop_id_a 
where t0.agency_id = '1A' 
and t0.stop_id_a = 's101' 
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval )
and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time) and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time) 
and t2.agency_id = '1A' 
and t2.stop_id_b = 's247' 
and t2.origin_arrival_time between (t1.origin_arrival_time + t1.travel_time + t1.transfer_time) and (t1.origin_arrival_time + '30 minutes'::interval + t1.travel_time + t1.transfer_time) 
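
For comparison, the fast 1-transfer case has the same shape with the t2 leg removed. The version below is a reconstruction obtained by moving the destination filter from t2 to t1 in the query above, so treat it as illustrative rather than the exact statement I ran:

```sql
-- One-transfer trip from stop s101 to stop s247 (both agency 1A),
-- leaving the origin between 08:00 and 08:30.
select *
from trip_planning_data_view t0
join trip_planning_data_view t1 on t0.transfer_agency_id = t1.agency_id and t0.transfer_stop_id = t1.stop_id_a
where t0.agency_id = '1A'
and t0.stop_id_a = 's101'
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval)
and t1.agency_id = '1A'
and t1.stop_id_b = 's247'
-- The second leg must start within 30 minutes of arriving at the transfer stop.
and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time) and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time);
```

Since origin_arrival_time + travel_time + transfer_time is exactly what the view exposes as transfer_arrival_time, the lower bound in the last condition could equally be written as t0.transfer_arrival_time.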

Here is the query plan for the 2-transfer query:

Nested Loop  (cost=168984.47..203333.30 rows=1 width=651)
  ->  Nested Loop  (cost=168984.47..203324.90 rows=1 width=686)
        Join Filter: (((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
        ->  Nested Loop  (cost=0.00..117.22 rows=1 width=252)
              Join Filter: ((a.agency_id)::text = (h.agency_id)::text)
              ->  Nested Loop  (cost=0.00..108.94 rows=1 width=216)
                    ->  Nested Loop  (cost=0.00..100.65 rows=1 width=220)
                          Join Filter: ((a.agency_id)::text = (e.agency_id)::text)
                          ->  Nested Loop  (cost=0.00..91.92 rows=1 width=198)
                                ->  Nested Loop  (cost=0.00..83.50 rows=1 width=161)
                                      Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                      ->  Nested Loop Left Join  (cost=0.00..42.66 rows=1 width=112)
                                            ->  Nested Loop  (cost=0.00..34.29 rows=1 width=90)
                                                  ->  Index Scan using st_a_s_idx on stop_time b  (cost=0.00..25.88 rows=1 width=53)
                                                        Index Cond: (((agency_id)::text = '1A'::text) AND ((stop_id)::text = 's247'::text))
                                                  ->  Index Scan using a_stop_idx on stop d  (cost=0.00..8.40 rows=1 width=44)
                                                        Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
                                            ->  Index Scan using stop_transfers_as_a_idx on stop_transfers g  (cost=0.00..8.35 rows=1 width=36)
                                                  Index Cond: (((b.agency_id)::text = (agency_id_a)::text) AND ((b.stop_id)::text = (stop_id_a)::text))
                                      ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.78 rows=3 width=53)
                                            Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                          ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.71 rows=1 width=80)
                                Index Cond: ((trip_id)::text = (a.trip_id)::text)
                    ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                          Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                          Filter: monday
              ->  Index Scan using route_id_idx on route h  (cost=0.00..8.27 rows=1 width=41)
                    Index Cond: ((route_id)::text = (e.route_id)::text)
        ->  Nested Loop  (cost=168984.47..203207.60 rows=1 width=434)
              ->  Nested Loop  (cost=168984.47..203199.32 rows=1 width=477)
                    ->  Nested Loop  (cost=168984.47..203191.04 rows=1 width=520)
                          Join Filter: (((g.agency_id_b)::text = (b.agency_id)::text) AND ((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
                          ->  Nested Loop  (cost=168933.50..178461.70 rows=1 width=260)
                                ->  Nested Loop  (cost=168933.50..178453.41 rows=1 width=264)
                                      ->  Nested Loop  (cost=168933.50..178444.99 rows=1 width=227)
                                            Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                            ->  Nested Loop  (cost=168933.50..178404.16 rows=1 width=236)
                                                  Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
                                                  ->  Nested Loop  (cost=168933.50..178387.59 rows=2 width=200)
                                                        Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
                                                        ->  Nested Loop  (cost=168933.50..177724.50 rows=76 width=120)
                                                              ->  Merge Join  (cost=168933.50..170942.05 rows=869 width=89)
                                                                    Merge Cond: (((b.agency_id)::text = (g.agency_id_a)::text) AND ((b.stop_id)::text = (g.stop_id_a)::text))
                                                                    ->  Sort  (cost=144224.83..144325.07 rows=40096 width=53)
                                                                          Sort Key: b.agency_id, b.stop_id
                                                                          ->  Bitmap Heap Scan on stop_time b  (cost=1068.60..141159.25 rows=40096 width=53)
                                                                                Recheck Cond: ((agency_id)::text = '1A'::text)
                                                                                ->  Bitmap Index Scan on st_a_s_idx  (cost=0.00..1058.58 rows=40096 width=0)
                                                                                      Index Cond: ((agency_id)::text = '1A'::text)
                                                                    ->  Sort  (cost=24708.45..25274.92 rows=226587 width=36)
                                                                          Sort Key: g.agency_id_a, g.stop_id_a
                                                                          ->  Seq Scan on stop_transfers g  (cost=0.00..4553.87 rows=226587 width=36)
                                                              ->  Index Scan using a_stop_idx on stop d  (cost=0.00..7.79 rows=1 width=44)
                                                                    Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
                                                        ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.71 rows=1 width=80)
                                                              Index Cond: ((trip_id)::text = (b.trip_id)::text)
                                                  ->  Index Scan using route_id_idx on route h  (cost=0.00..8.27 rows=1 width=41)
                                                        Index Cond: ((route_id)::text = (e.route_id)::text)
                                            ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.80 rows=1 width=53)
                                                  Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                                  Filter: ((arrival_time >= '08:11:00'::interval) AND (arrival_time <= '08:30:00'::interval) AND ((stop_id)::text = 's101'::text))
                                      ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                            Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                                ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                                      Filter: monday
                          ->  Nested Loop  (cost=50.97..24729.27 rows=1 width=260)
                                ->  Nested Loop  (cost=50.97..24720.85 rows=1 width=223)
                                      Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                      ->  Nested Loop  (cost=50.97..24680.01 rows=1 width=232)
                                            ->  Nested Loop  (cost=50.97..24663.42 rows=2 width=236)
                                                  Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
                                                  ->  Nested Loop  (cost=50.97..24447.16 rows=29 width=200)
                                                        Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
                                                        ->  Nested Loop  (cost=50.97..15148.58 rows=1096 width=120)
                                                              ->  Nested Loop  (cost=50.97..12475.29 rows=59 width=80)
                                                                    ->  Bitmap Heap Scan on stop_transfers g  (cost=50.97..2141.81 rows=1375 width=36)
                                                                          Recheck Cond: ((agency_id_b)::text = '1A'::text)
                                                                          ->  Bitmap Index Scan on stop_transfers_as_b_idx  (cost=0.00..50.63 rows=1375 width=0)
                                                                                Index Cond: ((agency_id_b)::text = '1A'::text)
                                                                    ->  Index Scan using a_stop_idx on stop d  (cost=0.00..7.50 rows=1 width=44)
                                                                          Index Cond: (((agency_id)::text = (g.agency_id_a)::text) AND ((stop_id)::text = (g.stop_id_a)::text))
                                                              ->  Index Scan using st_a_s_idx on stop_time b  (cost=0.00..45.22 rows=6 width=53)
                                                                    Index Cond: (((agency_id)::text = (d.agency_id)::text) AND ((stop_id)::text = (d.stop_id)::text))
                                                        ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.47 rows=1 width=80)
                                                              Index Cond: ((trip_id)::text = (b.trip_id)::text)
                                                  ->  Index Scan using route_id_idx on route h  (cost=0.00..7.44 rows=1 width=41)
                                                        Index Cond: ((route_id)::text = (e.route_id)::text)
                                            ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                                                  Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                                                  Filter: monday
                                      ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.78 rows=3 width=53)
                                            Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                    ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
                          Index Cond: ((agency_id)::text = (a.agency_id)::text)
              ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
                    Index Cond: ((agency_id)::text = (a.agency_id)::text)
  ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
        Index Cond: ((agency_id)::text = (a.agency_id)::text)

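The query plan appears to be using the indexes. Any suggestions for optimizing this, or for a better approach, would be appreciated. Thanks in advance.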
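
Note that the plan above contains only the planner's estimates, and most join nodes are estimated at rows=1. A possible next diagnostic step, sketched here on the first leg only, is to run the statement under EXPLAIN (ANALYZE, BUFFERS) so that each node also reports its actual row count and time:

```sql
-- Executes the statement for real and reports actual rows, timings and buffer usage per node.
EXPLAIN (ANALYZE, BUFFERS)
select *
from trip_planning_data_view t0
where t0.agency_id = '1A'
and t0.stop_id_a = 's101'
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval);
```

The same prefix can be placed in front of the full 2-transfer query. Nodes where the estimated rows and the actual rows differ by orders of magnitude would be the first places to look.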

1 Answer:

Answer 0: (score: 0)

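I think you might be better off using a tool like OpenTripPlanner (http://www.opentripplanner.org/), an open-source transit routing engine that works with GTFS. It can quickly and efficiently answer a variety of routing queries, including "the fastest time between two stops allowing N transfers".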

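Alternatively, if the agencies share their data with Google (there is a good chance they do: http://www.google.com/landing/transit/cities/index.html), you can use the Google Directions API (https://developers.google.com/maps/documentation/directions/) to query transit routes between two input locations.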