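My goal is to convert GTFS stop and trip information into a graph in which the vertices are stops (from GTFS's stops.txt) and the edges are trips (from GTFS's stop_times.txt). The first step is straightforward: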
> library(igraph)
#Reading in GTFS files
> stops<-read.csv("stops.txt")
> stop_times<-read.csv("stop_times.txt")
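My first instinct was simply to use the graph_from_data_frame function from igraph, but there is a serious drawback: the stop_times data frame is not structured in the schema that function needs. It looks like this: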
>head(stop_times)
trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151 F04272 06:20:00 06:20:00 10 0
2 A895151 F04184 06:22:00 06:22:00 20 648
3 A895151 F04319 06:24:00 06:24:00 30 1224
4 A895151 F04369 06:27:00 06:27:00 40 2779
5 A895151 008264 06:31:00 06:31:00 50 5620
6 A895151 F01520 06:33:00 06:33:00 60 6691
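That is, each row holds a stop_id with the arrival and departure times at that stop, whereas I would like each row to hold start_stop_id, end_stop_id, start_time and end_time (in effect, not "stops" but "transitions" from one stop to the next). This conversion seems challenging, though: I would have to iterate over the rows of stop_times, decide whether consecutive rows belong to the same trip_id, compute the start and end data if they do, and otherwise insert NULL or find some other way to keep trips apart... which seems very messy to me.
Is there an elegant way to combine these two data frames into the desired graph?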
Answer 0 (score: 2)
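The 'from' and 'to' columns can be generated by shifting the value from the following row up into the current row, within each trip. The stop information can then simply be joined on.
Let me explain with an example, using library(data.table):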
## here I'm using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")
#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops
#setDT(dt_stop_times)
#setDT(dt_stops)
## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]
## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)
## create a new column by shifting the stop_id of the following row up
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]
## you will have NAs at this point because the last stop doesn't go anywhere.
## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
arrival_time_stop_to = shift(arrival_time, type = "lead"),
departure_time_stop_to = shift(departure_time, type = "lead")),
by = .(trip_id)]
## now you have your 'from' and 'to' columns from which you can make your igraph
## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]
# trip_id stop_id stop_name_from arrival_time stop_id_to
# 1: 1.T0.3-86-A-mjp-1.7.R 4174 71-RMIT/Plenty Rd (Bundoora) 25:42:00 4485
# 2: 1.T0.3-86-A-mjp-1.7.R 4485 70-Janefield Dr/Plenty Rd (Bundoora) 25:43:00 4486
# 3: 1.T0.3-86-A-mjp-1.7.R 4486 69-Taunton Dr/Plenty Rd (Bundoora) 25:44:00 4487
# 4: 1.T0.3-86-A-mjp-1.7.R 4487 68-Greenhills Rd/Plenty Rd (Bundoora) 25:45:00 4488
# 5: 1.T0.3-86-A-mjp-1.7.R 4488 67-Bundoora Square SC/Plenty Rd (Bundoora) 25:46:00 4489
# ---
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H 17871 7-Queen Victoria Market/Elizabeth St (Melbourne City) 23:25:00 17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H 17873 5-Melbourne Central Station/Elizabeth St (Melbourne City) 23:27:00 17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H 17875 3-Bourke Street Mall/Elizabeth St (Melbourne City) 23:30:00 17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H 17876 2-Collins St/Elizabeth St (Melbourne City) 23:31:00 17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H 17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City) 23:32:00 NA
# arrival_time_stop_to
# 1: 25:43:00
# 2: 25:44:00
# 3: 25:45:00
# 4: 25:46:00
# 5: 25:47:00
# ---
# 9415793: 23:27:00
# 9415794: 23:30:00
# 9415795: 23:31:00
# 9415796: 23:32:00
# 9415797: NA
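The same shift can be tried on a small scale. The following is a minimal sketch, not part of the original answer: it uses the six rows shown in the question plus two rows of a second trip that are made up for illustration (trip_id "B000001" is hypothetical), and it assumes data.table is installed. The column names from, to, start_time and end_time are my own choice, mirroring the schema the question asks for.

```r
library(data.table)

# Six rows from the question plus a hypothetical second trip (B000001)
dt <- data.table(
  trip_id        = c(rep("A895151", 6), rep("B000001", 2)),
  stop_id        = c("F04272", "F04184", "F04319", "F04369", "008264", "F01520",
                     "F04184", "F04272"),
  arrival_time   = c("06:20:00", "06:22:00", "06:24:00", "06:27:00", "06:31:00",
                     "06:33:00", "07:00:00", "07:02:00"),
  departure_time = c("06:20:00", "06:22:00", "06:24:00", "06:27:00", "06:31:00",
                     "06:33:00", "07:00:00", "07:02:00"),
  stop_sequence  = c(10, 20, 30, 40, 50, 60, 10, 20)
)

# Order the stops within each trip before shifting
setorder(dt, trip_id, stop_sequence)

# Lead within each trip, so the last stop of one trip never links to the next trip
dt[, `:=`(stop_id_to      = shift(stop_id, type = "lead"),
          arrival_time_to = shift(arrival_time, type = "lead")),
   by = trip_id]

# Drop the last stop of each trip, which has no successor
edges <- dt[!is.na(stop_id_to),
            .(from = stop_id, to = stop_id_to, trip_id,
              start_time = departure_time, end_time = arrival_time_to)]
print(edges)
```

Because the shift is grouped by trip_id, no explicit loop or "same trip?" check is needed: the NA that appears at the end of each trip marks the boundary.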
Now, to use graph_from_data_frame() from igraph, you only need:
# get a df with nodes
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]
# links between stops (drop the last stop of each trip, whose stop_id_to is NA)
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]
# create graph
g <- graph_from_data_frame(links , directed=TRUE, vertices=nodes)
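Before building the graph it is worth checking that the edge table has no missing endpoints and that every stop in it also appears in the vertex table, since graph_from_data_frame() stops with an error when the edge list names a vertex that the vertices argument does not list. Here is a minimal sketch with made-up data (the stop ids S1 to S3, the coordinates and trip T1 are all hypothetical):

```r
library(igraph)

# Hypothetical toy data: three stops and the rows of one trip
nodes <- data.frame(stop_id  = c("S1", "S2", "S3"),
                    stop_lon = c(144.96, 144.97, 144.98),
                    stop_lat = c(-37.81, -37.80, -37.79))
links <- data.frame(stop_id    = c("S1", "S2", "S3"),
                    stop_id_to = c("S2", "S3", NA),
                    trip_id    = "T1")

# Keep only rows that describe a real transition between two stops
links <- links[!is.na(links$stop_id_to), ]

# Every stop in the edge list must appear in the vertex table
stopifnot(all(c(links$stop_id, links$stop_id_to) %in% nodes$stop_id))

g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)

vcount(g)      # number of stops
ecount(g)      # number of stop-to-stop transitions
E(g)$trip_id   # columns after the first two become edge attributes
V(g)$stop_lon  # columns after the first in `vertices` become vertex attributes
```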
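Note that a GTFS.zip file may contain several transport modes (train, bus, metro, etc.), and that, because service frequency varies, some stops are more strongly connected than others. It is still unclear to me how these two points should be taken into account when building a graph from GTFS.zip. A possible way forward would be to weight each edge by its frequency and to build a multilayer network, with each transport mode treated as a layer and some stops treated as interdependent across layers.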
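As a rough illustration of the frequency idea only, the sketch below collapses the per-trip edges into one edge per stop pair and counts how many trips use it. The data are made up (stops S1 to S3, trips T1 to T3), and the choice of 1 / n_trips as the weight is an assumption of mine, not something the feed or igraph prescribes; it simply makes frequently served connections "cheaper" for shortest-path style algorithms.

```r
library(data.table)
library(igraph)

# Hypothetical edge list: one row per trip and pair of consecutive stops
links <- data.table(
  stop_id    = c("S1", "S2", "S1", "S2", "S1"),
  stop_id_to = c("S2", "S3", "S2", "S3", "S2"),
  trip_id    = c("T1", "T1", "T2", "T2", "T3")
)

# Collapse parallel edges: count how many trips serve each stop pair
weighted <- links[, .(n_trips = .N), by = .(stop_id, stop_id_to)]

# One possible weighting (an assumption): frequent connections cost less
weighted[, weight := 1 / n_trips]

# An edge attribute named "weight" is what makes igraph treat the graph as weighted
g <- graph_from_data_frame(weighted, directed = TRUE)

E(g)$n_trips
E(g)$weight
```

Splitting the network by transport mode would additionally need the route_type column from routes.txt, reached through trips.txt; that step is not shown here.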