Compare a new field with the old field and append to a list

Time: 2015-02-22 20:02:01

Tags: python-3.x

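I am trying to read a GTFS file and compare the value of a field with the value of the same field on the next line. The code should read the file line by line and, when the trip_id of the current line is the same as that of the previous line, append the stop_id values to a list. When stop_sequence equals 1, the code should skip to the next line. The result is an edge list to be analyzed in a graph application (using graph theory).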

Sample file contents:

"trip_id","arrival_time","departure_time","stop_id","stop_sequence"
"1156-10-0","07:00:00","07:00:00",940003729,1
"1156-10-0","07:01:30","07:01:30",940003730,2
"1156-10-0","07:03:00","07:03:00",940003731,3
"1156-10-1","07:04:30","07:04:30",940003767,1
"1156-10-1","07:06:00","07:06:00",940003886,2
"1156-10-1","07:07:30","07:07:30",940004427,3

The result should be:

940003729, 940003730
940003730, 940003731
-- jump to next trip_id --
940003767, 940003886
940003886, 940004427

Part of my code:

def read_file():
    path = "file directory"
    data = open(path, "r")
    result = data.readline()
    search_comma = result.split(',')
    trip_id = search_comma[0]
    stop_id = search_comma[3]
    stop_sequence = search_comma[4]
    data.close()
    return trip_id, int(stop_id), int(stop_sequence)


old_trip, old_stop, old_sequence = read_file()


edge_list = []
for line in read_file():
    new_trip, new_stop, new_sequence = read_file()
    if old_trip == new_trip and new_sequence != 1:
        edge_list.append()
    next(read_file())

print(edge_list)

2 Answers:

Answer 0 (score: 0)

I would use the csv module to read the file, and then use itertools.groupby() to group the rows by trip.

Something like this should do the trick:

import csv
import itertools
from operator import itemgetter

with open('/path/to/file', newline='') as f:
    reader = csv.DictReader(f)
    # group consecutive rows by their trip_id
    for key, group in itertools.groupby(reader, key=itemgetter('trip_id')):
        print('trip_id:', key)
        stop_ids = [row['stop_id'] for row in group]
        # process the stop_ids in pairs
        for start, end in zip(stop_ids, stop_ids[1:]):
            print(start, end)

The output for the sample data is:

trip_id: 1156-10-0
940003729 940003730
940003730 940003731
trip_id: 1156-10-1
940003767 940003886
940003886 940004427

I'm sure you will be able to build the edge list from this example.
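As a minimal sketch of that last step, the pairs can be collected into a list instead of printed. The function name `build_edge_list` and the `SAMPLE` constant are made up for illustration, and the sketch assumes the rows of each trip are contiguous and already ordered by stop_sequence, as in the sample file.

```python
import csv
import io
import itertools
from operator import itemgetter

# Sample data copied from the question (hypothetical constant name).
SAMPLE = (
    '"trip_id","arrival_time","departure_time","stop_id","stop_sequence"\n'
    '"1156-10-0","07:00:00","07:00:00",940003729,1\n'
    '"1156-10-0","07:01:30","07:01:30",940003730,2\n'
    '"1156-10-0","07:03:00","07:03:00",940003731,3\n'
    '"1156-10-1","07:04:30","07:04:30",940003767,1\n'
    '"1156-10-1","07:06:00","07:06:00",940003886,2\n'
    '"1156-10-1","07:07:30","07:07:30",940004427,3\n'
)


def build_edge_list(lines):
    """Return (from_stop, to_stop) pairs for consecutive stops of each trip.

    `lines` is any iterable of CSV lines, for example an open file.
    """
    reader = csv.DictReader(lines)
    edges = []
    # groupby only merges adjacent rows, so each trip's rows must be contiguous
    for _trip_id, group in itertools.groupby(reader, key=itemgetter("trip_id")):
        stop_ids = [int(row["stop_id"]) for row in group]
        # pair every stop with the stop that follows it within the same trip
        edges.extend(zip(stop_ids, stop_ids[1:]))
    return edges


print(build_edge_list(io.StringIO(SAMPLE)))
```

With a real file, `build_edge_list(f)` can be called inside the same `with open(...) as f:` block; a trip with a single stop simply contributes no edges.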

Answer 1 (score: 0)

I solved the problem the following way:

def gtfs_to_edge_list():
    rsource = "file.txt"
    wsource = "file2.txt"
    with open(rsource, "r") as data, open(wsource, "w") as target:
        # read all lines of the file into a list
        lines = [line.strip() for line in data]

        # walk over the list in pairs: each line together with the line after it
        for line1, line2 in zip(lines, lines[1:]):

            # select the first column (trip_id, position 0) of both lines
            trip_old = line1.split(',')[0]
            trip_new = line2.split(',')[0]

            # select the fourth column (stop_id, position 3) of both lines
            stop_old = line1.split(',')[3]
            stop_new = line2.split(',')[3]

            # if both lines belong to the same trip, write the two stop_ids
            # to the target file; the header never matches a trip_id, so it is skipped
            if trip_old == trip_new:
                target.write(stop_old + ',' + stop_new + '\n')
    # the with statement closes both files, so no explicit close() is needed
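
The same comparison can also be done in a single pass that remembers only the previous row, which avoids loading the whole file into a list. This is a sketch under assumptions, not part of the solution above: the generator name `iter_edges` and the `ROWS` list are hypothetical, and csv.reader is swapped in for the manual split(',') so that quoted fields are parsed properly.

```python
import csv

# Sample rows from the question, as a list of lines (hypothetical name).
ROWS = [
    '"trip_id","arrival_time","departure_time","stop_id","stop_sequence"',
    '"1156-10-0","07:00:00","07:00:00",940003729,1',
    '"1156-10-0","07:01:30","07:01:30",940003730,2',
    '"1156-10-0","07:03:00","07:03:00",940003731,3',
    '"1156-10-1","07:04:30","07:04:30",940003767,1',
    '"1156-10-1","07:06:00","07:06:00",940003886,2',
    '"1156-10-1","07:07:30","07:07:30",940004427,3',
]


def iter_edges(lines):
    """Yield (previous_stop, current_stop) for consecutive rows of the same trip."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header line
    prev_trip = prev_stop = None
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or malformed lines
        trip_id, stop_id = row[0], row[3]
        # an edge exists only when this row continues the previous row's trip
        if trip_id == prev_trip:
            yield prev_stop, stop_id
        prev_trip, prev_stop = trip_id, stop_id


for start, end in iter_edges(ROWS):
    print(start, end, sep=",")
```

Passing an open file object instead of `ROWS` works the same way, since csv.reader accepts any iterable of lines; the stop_ids are yielded as strings here.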