Using Python to search the contents of one file for the contents of a second file

Time: 2015-02-01 22:10:08

Tags: python

I have the following code, which compares the items in the first column of input file 1 with the contents of input file 2:

import os

newfile2=[]
outfile=open("outFile.txt","w")
infile1=open("infile1.txt", "r")
infile2=open("infile2.txt","r")
for file1 in infile1:
    #print file1
    file1=str(file1).strip().split("\t")
    print file1[0]
    for file2 in infile2:
        if file2 == file1[0]:
            outfile.write(file2.replace(file2,file1[1]))
        else:
            outfile.write(file2)

Input file 1:

Modex_xxR_SL1344_3920   Modex_sseE_SL1344_3920
Modex_seA_hemN  Modex_polA_SGR222_3950
Modex_GF2333_3962_SL1344_3966   Modex_ertd_wedS

Input file 2:

Sardes_xxR_SL1344_4567  
Modex_seA_hemN
MOdex_uui_gytI

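Because the item in column 1, row 2 of input file 1 matches the item in row 2 of input file 2, the column-2 item from that row of input file 1 should replace the input file 2 item in the output file, as shown below (desired output):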

Sardes_xxR_SL1344_4567  
Modex_polA_SGR222_3950
MOdex_uui_gytI

So far my code only outputs the items from input file 1. Could someone help me fix this code? Thanks.
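Two details of the loop above explain why nothing is ever replaced. A file object is an iterator, so the inner `for file2 in infile2` loop consumes all of `infile2` on the first pass of the outer loop and sees nothing on later passes. Each line read from a file also keeps its trailing `"\n"`, so `file2 == file1[0]` is never true for a stripped key. A minimal sketch reproducing both effects, with `io.StringIO` standing in for the real files (an assumption for illustration; the sample data is taken from the question):

```python
import io

# In-memory stand-ins for infile1.txt and infile2.txt (sample data from the question)
infile1 = io.StringIO(
    "Modex_xxR_SL1344_3920\tModex_sseE_SL1344_3920\n"
    "Modex_seA_hemN\tModex_polA_SGR222_3950\n"
    "Modex_GF2333_3962_SL1344_3966\tModex_ertd_wedS\n"
)
infile2 = io.StringIO("Sardes_xxR_SL1344_4567\nModex_seA_hemN\nMOdex_uui_gytI\n")

# Same nested-loop shape as the question: count how many lines of infile2
# the inner loop sees on each pass of the outer loop
passes = []
for file1 in infile1:
    seen = [file2 for file2 in infile2]
    passes.append(len(seen))
print(passes)  # [3, 0, 0]: infile2 is exhausted after the first pass

# Lines read from a file keep their trailing newline
infile2.seek(0)
second = infile2.readlines()[1]  # 'Modex_seA_hemN\n'
print(second == "Modex_seA_hemN")          # False
print(second.strip() == "Modex_seA_hemN")  # True
```

Stripping each line and reading `infile1` only once into a lookup table avoids both problems.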

1 answer:

Answer 0 (score: 2)

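It looks like you have a TSV file, so let's treat it as one. We'll build a TSV reader, csv.reader(fileobj, delimiter="\t"), that iterates over infile1 and builds a translation dictionary from it. The dictionary maps the first column of each row (the key) to the second column (the value).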
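To see what the reader hands to the dictionary, here is a small sketch on an in-memory copy of input file 1 (assumed to be tab-separated, with a trailing blank line added on purpose). `csv.reader` yields each row as a list of strings and yields an empty list for a blank line, which `dict()` would reject with a `ValueError`. The `len(row) >= 2` filter here is one way to guard against that:

```python
import csv
import io

# Tab-separated sample mirroring input file 1, plus a trailing blank line
infile1 = io.StringIO(
    "Modex_xxR_SL1344_3920\tModex_sseE_SL1344_3920\n"
    "Modex_seA_hemN\tModex_polA_SGR222_3950\n"
    "Modex_GF2333_3962_SL1344_3966\tModex_ertd_wedS\n"
    "\n"
)

rows = list(csv.reader(infile1, delimiter="\t"))
print(rows[1])   # ['Modex_seA_hemN', 'Modex_polA_SGR222_3950']
print(rows[-1])  # []: the blank line becomes an empty row

# Keep only rows with at least two columns, then map column 1 -> column 2
trans_dict = {row[0]: row[1] for row in rows if len(row) >= 2}
print(trans_dict["Modex_seA_hemN"])  # Modex_polA_SGR222_3950
```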

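Then, with dict.get, we can translate each line from infile2 if it exists in our translation dictionary, or simply write the line itself when no translation is available.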
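The lookup step on its own, as a sketch using the question's sample lines (the one-entry dictionary is trimmed for illustration). `dict.get(key, default)` returns `default` when `key` is missing, so passing the stripped line as both arguments leaves untranslated lines unchanged:

```python
# A one-entry translation table (sample data from the question)
trans_dict = {"Modex_seA_hemN": "Modex_polA_SGR222_3950"}

# Lines as they would be read from infile2, newlines and stray spaces included
lines = ["Sardes_xxR_SL1344_4567  \n", "Modex_seA_hemN\n", "MOdex_uui_gytI\n"]

# strip() removes the newline and surrounding spaces before the lookup;
# dict.get falls back to the item itself when there is no translation
translated = [trans_dict.get(line.strip(), line.strip()) for line in lines]
print(translated)
# ['Sardes_xxR_SL1344_4567', 'Modex_polA_SGR222_3950', 'MOdex_uui_gytI']
```

Note that the lookup is an exact, case-sensitive string match.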

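import csv

with open("infile1.txt", "r") as infile1, \
     open("infile2.txt", "r") as infile2, \
     open("outfile.txt", "w") as outfile:
    # Build the translation dict: column 1 -> column 2.
    # Rows with fewer than two columns (e.g. a trailing blank line) are skipped,
    # since dict() would raise ValueError on them.
    trans_dict = {row[0]: row[1]
                  for row in csv.reader(infile1, delimiter="\t")
                  if len(row) >= 2}

    for line in infile2:
        item = line.strip()
        # Write the translation if there is one, otherwise the item unchanged
        outfile.write(trans_dict.get(item, item) + "\n")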

Result:

# contents of outfile.txt
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI

Edit, based on your comment:

import csv

with open("infile1.txt", "r") as infile1:
    # Build the translation dict: column 1 -> column 2 (skip blank/short rows)
    trans_dict = {row[0]: row[1]
                  for row in csv.reader(infile1, delimiter="\t")
                  if len(row) >= 2}

with open("infile2.txt", "r") as infile2, \
     open("outfile.txt", "w") as outfile:
    # Treat the file to translate as a TSV file instead of flat text
    reader = csv.reader(infile2, delimiter="\t")
    for row in reader:
        # Map each column through trans_dict and write the whole row back,
        # tab-delimited, with a trailing newline
        outfile.write("\t".join(trans_dict.get(col, col) for col in row) + "\n")
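As an alternative to joining with `"\t".join(...)`, `csv.writer` can write the translated rows and will also quote any field that itself contains a tab or a quote character. Here is a sketch with in-memory files (`io.StringIO`) and a made-up two-column input, both assumptions for illustration. `lineterminator="\n"` is set because the writer's default is `"\r\n"`:

```python
import csv
import io

trans_dict = {"Modex_seA_hemN": "Modex_polA_SGR222_3950"}

# Hypothetical multi-column input standing in for infile2.txt
infile2 = io.StringIO(
    "Sardes_xxR_SL1344_4567\tModex_seA_hemN\n"
    "MOdex_uui_gytI\tModex_seA_hemN\n"
)
outfile = io.StringIO()

reader = csv.reader(infile2, delimiter="\t")
writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
for row in reader:
    # Translate every column that has an entry, keep the others unchanged
    writer.writerow([trans_dict.get(col, col) for col in row])

print(outfile.getvalue())
```

When writing to a real file in Python 3, open it with `newline=""` so the csv module controls line endings.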