使用R从GTFS数据创建iGraph图

时间:2016-12-12 09:42:02

标签: r graph igraph gtfs

我的目标是将GTFS停止和行程信息转换为一个图形,其中顶点是停靠点(来自GTFS' stops.txt),边缘是行程(来自GTFS' stop_times.txt)。第一步很明显:

> library(igraph)

#Reading in GTFS files
> stops<-read.csv("stops.txt")
> stop_times<-read.csv("stop_times.txt")

我的第一直觉只是使用来自iGraph的graph_from_data_frame函数,但是有一个严重的缺点:stop_times DF并没有真正构建到所需的方案中。它的计划如下:

>head(stop_times)
  trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151  F04272     06:20:00       06:20:00            10                   0
2 A895151  F04184     06:22:00       06:22:00            20                 648
3 A895151  F04319     06:24:00       06:24:00            30                1224
4 A895151  F04369     06:27:00       06:27:00            40                2779
5 A895151  008264     06:31:00       06:31:00            50                5620
6 A895151  F01520     06:33:00       06:33:00            60                6691

这意味着它包含stop_ids,其中包含相应停靠点的到达和离开时间,而我希望每行获得start_stop_id,end_stop_id,start_time,end_time(实际上,不是&#34;停止&#34;但是&#34;过渡&#34;从停止转换。但是这种转换似乎具有挑战性,因为我应该在stop_times中迭代行并决定它们是否在同一个trip_id中,如果是,则计算起始端数据,如果不是这样,插入NULL或找到另一个解决方案来分开行程......这对我来说非常混乱。

有没有优雅的方法将所有这两个数据框组合成所需的图形?

1 个答案:

答案 0 :(得分:2)

&#39;来自&#39;和&#39;到&#39;可以通过“转移”来生成来自下一行的值,直到当前的&#39;行。停止信息可以简单地加入

让我用一个例子来解释,并使用library(data.table)

## here I"m using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")

#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops

#setDT(dt_stop_times)
#setDT(dt_stops)


## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]

## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)

## create a new column by shifting the stop_id of the following row up 
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]

## you will have NAs at this point because the last stop doesn't go anywhere.

## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"), 
                     arrival_time_stop_to = shift(arrival_time, type = "lead"),
                     departure_time_stop_to = shift(departure_time, type = "lead")),
              by = .(trip_id)]

## now you have your 'from' and 'to' columns from which you can make your igraph

## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]

#                           trip_id stop_id                                                  stop_name_from arrival_time stop_id_to
# 1:          1.T0.3-86-A-mjp-1.7.R    4174                                    71-RMIT/Plenty Rd (Bundoora)     25:42:00       4485
# 2:          1.T0.3-86-A-mjp-1.7.R    4485                            70-Janefield Dr/Plenty Rd (Bundoora)     25:43:00       4486
# 3:          1.T0.3-86-A-mjp-1.7.R    4486                              69-Taunton Dr/Plenty Rd (Bundoora)     25:44:00       4487
# 4:          1.T0.3-86-A-mjp-1.7.R    4487                           68-Greenhills Rd/Plenty Rd (Bundoora)     25:45:00       4488
# 5:          1.T0.3-86-A-mjp-1.7.R    4488                      67-Bundoora Square SC/Plenty Rd (Bundoora)     25:46:00       4489
# ---                                                                                                                         
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H   17871           7-Queen Victoria Market/Elizabeth St (Melbourne City)     23:25:00      17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H   17873       5-Melbourne Central Station/Elizabeth St (Melbourne City)     23:27:00      17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H   17875              3-Bourke Street Mall/Elizabeth St (Melbourne City)     23:30:00      17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H   17876                      2-Collins St/Elizabeth St (Melbourne City)     23:31:00      17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H   17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City)     23:32:00         NA
#          arrival_time_stop_to
# 1:                   25:43:00
# 2:                   25:44:00
# 3:                   25:45:00
# 4:                   25:46:00
# 5:                   25:47:00
# ---                     
# 9415793:             23:27:00
# 9415794:             23:30:00
# 9415795:             23:31:00
# 9415796:             23:32:00
# 9415797:                   NA

现在,要使用graph_from_data_frame{igraph},您只需:

# get a df with nodes
  nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]

# links beetween stops
  links <- dt_stop_times[,.(stop_id, stop_id_to, trip_id)]

# create graph
  g <- graph_from_data_frame(links , directed=TRUE, vertices=nodes)

请注意,在GTFS.zip文件中,您可能有多种传输模式(火车,公共汽车,地铁等),并且由于服务频率的变化,某些停靠点的连接速度要高于其他停靠点。我还不清楚在从GTFS.zip构建图表时应该如何考虑这两点。可能前进的方向是根据每个边缘的频率对每个边缘进行加权,并构建一个多层网络,在每个传输模式中将一些停靠点视为相互依赖的层。