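Iterate over two files, compare matching strings in lines, and merge the matching lines

Time: 2016-01-23 04:22:57

Tags: python regex python-3.x

I have two files containing lists of organisms. The first file contains a list of 'Family Genus' pairs, so it has two columns. The second file contains 'Genus species', also in two columns. The two files agree on the genus of every listed species. I want to merge the two lists on the Genus column of each file so that I can add the family name to 'Genus species'. The output should therefore contain 'Family Genus species'. Since the names are separated by a single space, I split on that space to get the columns. Here is my code so far: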

with open('FAMILY_GENUS.TXT') as f1, open('GENUS_SPECIES.TXT') as f2:
    for line1 in f1:
        line1 = line1.strip()
        c1 = line1.split(' ')
        print(line1, end=' ')
        for line2 in f2:
            line2 = line2.strip()
            c2 = line2.split(' ')
            if line1[1] == line2[0]:
                print(line2[1], end=' ')
        print()

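The resulting output consists of only two lines instead of the whole record. What am I missing?

Also, how can I save the result to a file instead of just printing it to the screen?

2 Answers:

Answer 0: (score: 3)

Here is another solution.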

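genera = {}

# Build a genus -> family lookup from the first file.
with open('fg', 'r') as f1:
    for line in f1:
        family, genus = line.strip().split(" ")
        genera[genus] = family

# Look up the family for each genus in the second file and print the merged row.
with open('gs', 'r') as f2:
    for line in f2:
        genus, species = line.strip().split(" ")
        print(genera[genus], genus, species)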
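
The question also asks how to save the result to a file rather than printing it. A minimal sketch of one way to do that with the same dictionary lookup follows. The function name merge_to_file and the output path are hypothetical, not part of the original answer, and the sketch assumes that lines without exactly two columns, or with a genus missing from the first file, should simply be skipped.

```python
def merge_to_file(family_genus_path, genus_species_path, out_path):
    """Write 'Family Genus species' lines to out_path; return the line count."""
    genera = {}
    with open(family_genus_path) as f1:
        for line in f1:
            parts = line.split()
            if len(parts) == 2:  # skip blank or malformed lines (assumption)
                family, genus = parts
                genera[genus] = family

    written = 0
    with open(genus_species_path) as f2, open(out_path, 'w') as out:
        for line in f2:
            parts = line.split()
            if len(parts) != 2:
                continue
            genus, species = parts
            family = genera.get(genus)  # None if the genus has no family entry
            if family is None:
                continue
            # print() accepts a file= argument, so the same call can write to a file.
            print(family, genus, species, file=out)
            written += 1
    return written
```

It could be called as merge_to_file('fg', 'gs', 'merged.txt'), where 'merged.txt' is a placeholder name. Using dict.get instead of genera[genus] avoids a KeyError when the second file mentions a genus the first one lacks.
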

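Answer 1: (score: 0)

I would process the files first and build a mapping from each genus to its family and to the (possibly multiple) species it contains. Then I would use that mapping to match them up and print them out.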

genuses = {}

# Map all genuses to a family
with open('FAMILY_GENUS.TXT') as f1:
    for line in f1:
        family, genus = line.strip().split()
        genuses.setdefault(genus, {})['family'] = family

# Map all species to a genus
with open('GENUS_SPECIES.TXT') as f2:
    for line in f2:
        genus, species = line.strip().split()
        genuses.setdefault(genus, {}).setdefault('species', []).append(species)

# Go through each genus and create a specie string for
# each specie it contains.
species_strings = []
for genus, d in genuses.items():
    family = d.get('family')
    species = d.get('species')
    if family and species:
        for specie in species:
            s = '{0} {1} {2}'.format(family, genus, specie)
            species_strings.append(s)

# Sort the strings to make the output pretty and print them out.
species_strings.sort()
for s in species_strings:
    print(s)
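
As for what the nested loop in the question is missing, two things stand out in the posted code. A file object can only be iterated once, so after the first pass of the outer loop f2 is exhausted and the inner loop never runs again. The comparison also uses line1[1] and line2[0], which are single characters of the raw strings, rather than the split columns c1 and c2. Below is a minimal sketch of the nested-loop approach with both points addressed. The function name merge_nested and the sample data are hypothetical, and io.StringIO stands in for the real files.

```python
import io


def merge_nested(family_genus_file, genus_species_file):
    """Nested-loop merge returning 'Family Genus species' strings."""
    merged = []
    # Read the second file once into a list; a file object can only be
    # iterated a single time unless it is rewound with seek(0).
    species_rows = [line.split() for line in genus_species_file if line.strip()]
    for line in family_genus_file:
        if not line.strip():
            continue  # skip blank lines
        c1 = line.split()
        for c2 in species_rows:
            # Compare the split columns (c1, c2), not characters of the raw line.
            if c1[1] == c2[0]:
                merged.append(' '.join([c1[0], c1[1], c2[1]]))
    return merged


# Hypothetical sample data standing in for the two input files.
f1 = io.StringIO("Felidae Panthera\nCanidae Canis\n")
f2 = io.StringIO("Panthera leo\nCanis lupus\nPanthera tigris\n")
result = merge_nested(f1, f2)
for row in result:
    print(row)
```

This keeps the shape of the original code, but it compares every pair of lines, so the dictionary-based approaches above are likely to scale better on large files.
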