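I am trying to read a GTFS file and compare a field in each row with the same field in the next row. The file should be read line by line, and whenever the trip_id of the current row matches the trip_id of the previous row, the two stop_id values should be appended to a list as a pair. When stop_sequence equals 1 (the start of a new trip), the code should skip to the next row without creating a pair. The result is an edge list to be analyzed in a graph application (using graph theory).
Sample file contents: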
"trip_id","arrival_time","departure_time","stop_id","stop_sequence"
"1156-10-0","07:00:00","07:00:00",940003729,1
"1156-10-0","07:01:30","07:01:30",940003730,2
"1156-10-0","07:03:00","07:03:00",940003731,3
"1156-10-1","07:04:30","07:04:30",940003767,1
"1156-10-1","07:06:00","07:06:00",940003886,2
"1156-10-1","07:07:30","07:07:30",940004427,3
The result should be:
940003729, 940003730
940003730, 940003731
-- jump to next trip_id --
940003767, 940003886
940003886, 940004427
Part of my code:
def read_file():
    path = "file directory"
    data = open(path, "r")
    result = data.readline()
    search_comma = result.split(',')
    trip_id = search_comma[0]
    stop_id = search_comma[3]
    stop_sequence = search_comma[4]
    data.close()
    return trip_id, int(stop_id), int(stop_sequence)

old_trip, old_stop, old_sequence = read_file()
edge_list = []
for line in read_file():
    new_trip, new_stop, new_sequence = read_file()
    if old_trip == new_trip and new_sequence != 1:
        edge_list.append()
    next(read_file())
print(edge_list)
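Answer 0: (score: 0)
I would use the csv module to read the file, and then itertools.groupby() to group the rows by trip. Note that groupby() only groups adjacent rows, so the rows of each trip must be contiguous in the file, as they are in the sample.
Something like this should do the trick: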
import csv
import itertools
from operator import itemgetter

with open('/path/to/file', newline='') as f:
    reader = csv.DictReader(f)
    # group the rows by their trip_id
    for key, group in itertools.groupby(reader, key=itemgetter('trip_id')):
        print('trip_id:', key)
        stop_ids = [row['stop_id'] for row in group]
        # process the stop_ids in pairs
        for start, end in zip(stop_ids, stop_ids[1:]):
            print(start, end)
The output for the sample data is:
trip_id: 1156-10-0
940003729 940003730
940003730 940003731
trip_id: 1156-10-1
940003767 940003886
940003886 940004427
I'm sure you will be able to build the edge list from this example.
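As a minimal sketch of that last step, the grouped pairs can be collected into a list instead of printed. The names `build_edge_list` and `SAMPLE` are made up for illustration, and the code assumes that the rows of each trip are contiguous and already ordered by stop_sequence, as in the sample data:

```python
import csv
import io
import itertools
from operator import itemgetter

# Sample rows copied from the question (hypothetical constant name).
SAMPLE = '''"trip_id","arrival_time","departure_time","stop_id","stop_sequence"
"1156-10-0","07:00:00","07:00:00",940003729,1
"1156-10-0","07:01:30","07:01:30",940003730,2
"1156-10-0","07:03:00","07:03:00",940003731,3
"1156-10-1","07:04:30","07:04:30",940003767,1
"1156-10-1","07:06:00","07:06:00",940003886,2
"1156-10-1","07:07:30","07:07:30",940004427,3
'''


def build_edge_list(lines):
    """Return (from_stop, to_stop) pairs for consecutive stops of each trip.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    Assumption: rows of one trip are adjacent and sorted by stop_sequence.
    """
    reader = csv.DictReader(lines)
    edges = []
    for _trip_id, group in itertools.groupby(reader, key=itemgetter('trip_id')):
        stop_ids = [int(row['stop_id']) for row in group]
        # pair every stop with the stop that follows it within the same trip
        edges.extend(zip(stop_ids, stop_ids[1:]))
    return edges


for start, end in build_edge_list(io.StringIO(SAMPLE)):
    print(start, end)
```

For the sample rows this yields the four pairs listed as the expected result in the question. With a real file, an object opened with `open(path, newline='')` could be passed in place of the `io.StringIO` wrapper.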
Answer 1: (score: 0)
I solved the problem in the following way:
def gtfs_to_edge_list():
    rsource = "file.txt"
    wsource = "file2.txt"
    with open(rsource, "r") as data, open(wsource, "w") as target:
        # read every line of the file into a list
        lines = [line.strip() for line in data]
        # walk the list in adjacent pairs: each line together with the line after it
        for line1, line2 in zip(lines, lines[1:]):
            fields1 = line1.split(',')
            fields2 = line2.split(',')
            # first column (position 0) is trip_id
            trip_old = fields1[0]
            trip_new = fields2[0]
            # fourth column (position 3) is stop_id
            stop_old = fields1[3]
            stop_new = fields2[3]
            # compare the trip_id of line 1 with the trip_id of line 2
            if trip_old == trip_new:
                # same trip: write the stop_id pair to the target file
                target.write(stop_old + ',' + stop_new + '\n')
    # both files are closed automatically when the with block ends
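The same adjacent-row comparison can also be written without holding the whole file in memory, by remembering only the previous row. The following is a rough sketch rather than a tested replacement for the function above: `iter_edges` and `SAMPLE` are hypothetical names, and the csv module is swapped in for the manual `split(',')` so that quoted fields are unquoted correctly.

```python
import csv
import io

# Sample rows copied from the question (hypothetical constant name).
SAMPLE = '''"trip_id","arrival_time","departure_time","stop_id","stop_sequence"
"1156-10-0","07:00:00","07:00:00",940003729,1
"1156-10-0","07:01:30","07:01:30",940003730,2
"1156-10-0","07:03:00","07:03:00",940003731,3
"1156-10-1","07:04:30","07:04:30",940003767,1
"1156-10-1","07:06:00","07:06:00",940003886,2
"1156-10-1","07:07:30","07:07:30",940004427,3
'''


def iter_edges(lines):
    """Yield (from_stop, to_stop) for each pair of adjacent rows sharing a trip_id.

    Assumption: rows of one trip are adjacent and sorted by stop_sequence.
    """
    reader = csv.DictReader(lines)
    prev = None
    for row in reader:
        # only emit an edge when the previous row belongs to the same trip
        if prev is not None and prev['trip_id'] == row['trip_id']:
            yield prev['stop_id'], row['stop_id']
        prev = row


for start, end in iter_edges(io.StringIO(SAMPLE)):
    print(start + ',' + end)
```

Because a new trip always has a different trip_id from the row before it, the first stop of each trip is skipped as the start of a pair without checking stop_sequence explicitly.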