I have the following code, which compares the items in the first column of input file 1 with the contents of input file 2:
import os
newfile2=[]
outfile=open("outFile.txt","w")
infile1=open("infile1.txt", "r")
infile2=open("infile2.txt","r")
for file1 in infile1:
    #print file1
    file1=str(file1).strip().split("\t")
    print file1[0]
    for file2 in infile2:
        if file2 == file1[0]:
            outfile.write(file2.replace(file2,file1[1]))
        else:
            outfile.write(file2)
Input file 1:
Modex_xxR_SL1344_3920 Modex_sseE_SL1344_3920
Modex_seA_hemN Modex_polA_SGR222_3950
Modex_GF2333_3962_SL1344_3966 Modex_ertd_wedS
Input file 2:
Sardes_xxR_SL1344_4567
Modex_seA_hemN
MOdex_uui_gytI
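Since an item in input file 1 (column 1, row 2) matches an item in input file 2 (row 2), the column 2 item from input file 1 should replace the input file 2 item in the output file, as shown below (desired output):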
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI
So far, my code only outputs the items from input file 1. Can someone help me modify this code? Thanks.
Answer 0 (score: 2)
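A side note on why the nested loops in the question probably do not do what was intended: a file object is a one-shot iterator, so the inner `for file2 in infile2` loop only has lines to read during the first pass of the outer loop, and every line it yields still ends in "\n", so it never equals the stripped key. Here is a minimal sketch of both effects, using in-memory `io.StringIO` objects as stand-ins for the two files (an assumption, made only so the snippet runs on its own; `lines_seen` and `matches` are illustrative names):

```python
import io

# In-memory stand-ins for infile1.txt and infile2.txt (sample data from the question,
# assuming the two columns of infile1 are tab-separated).
infile1 = io.StringIO("Modex_xxR_SL1344_3920\tModex_sseE_SL1344_3920\n"
                      "Modex_seA_hemN\tModex_polA_SGR222_3950\n")
infile2 = io.StringIO("Sardes_xxR_SL1344_4567\nModex_seA_hemN\n")

lines_seen = []   # number of infile2 lines each outer pass gets to read
matches = 0       # number of times the raw comparison succeeds
for row in infile1:
    key = row.strip().split("\t")[0]
    count = 0
    for line in infile2:      # consumes infile2; nothing is left for later passes
        count += 1
        if line == key:       # line still carries its trailing "\n"
            matches += 1
    lines_seen.append(count)

print(lines_seen)
print(matches)
```

Re-opening or rewinding infile2 on every pass would work around the first problem, but building a lookup table once, as described next, avoids the nested loop altogether.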
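It looks like you have a tsv file, so let's treat it as one. We'll build a tsv reader, csv.reader(fileobj, delimiter="\t"), which iterates over infile1, and build a translation dictionary from it. The dictionary will have the first column of each row as its keys and the second column as its values.
Then, using dict.get, we can translate each line from infile2 if it exists in our translation dictionary, or simply write the line itself when no translation is available.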
import csv

with open("infile1.txt", 'r') as infile1,\
        open('infile2.txt', 'r') as infile2,\
        open('outfile.txt', 'w') as outfile:
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))
    for line in infile2:
        outfile.write(trans_dict.get(line.strip(),line.strip()) + "\n")
Result:
# contents of outfile.txt
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI
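To see the dictionary lookup in isolation, here is a small sketch that runs the same two steps on the sample data from the question without touching the disk. The `io.StringIO` objects are stand-ins for the real files, and the sketch assumes the two columns of infile1 are separated by a tab; `translated` is just an illustrative name.

```python
import csv
import io

# In-memory stand-ins for infile1.txt and infile2.txt (assumed tab-delimited).
infile1 = io.StringIO(
    "Modex_xxR_SL1344_3920\tModex_sseE_SL1344_3920\n"
    "Modex_seA_hemN\tModex_polA_SGR222_3950\n"
    "Modex_GF2333_3962_SL1344_3966\tModex_ertd_wedS\n"
)
infile2 = io.StringIO("Sardes_xxR_SL1344_4567\nModex_seA_hemN\nMOdex_uui_gytI\n")

# Each two-column row becomes one key/value pair of the dictionary.
trans_dict = dict(csv.reader(infile1, delimiter="\t"))

# dict.get(key, default) returns the translation if there is one,
# otherwise the default, which here is the untouched line.
translated = [trans_dict.get(line.strip(), line.strip()) for line in infile2]
print(translated)
```

Only the second line has a key in the dictionary, so it is the only one replaced; the other two fall through to the default.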
Edit, based on your comment:
import csv

with open("infile1.txt", 'r') as infile1:
    # build our translation dict
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))

# open the file to translate and our output file
with open("infile2.txt", 'r') as infile2,\
        open("outfile.txt", 'w') as outfile:
    # treat our file to translate like a tsv file instead of flat text
    reader = csv.reader(infile2, delimiter="\t")
    for line in reader:
        # map each column through trans_dict, writing the whole row
        # back re-tab-delimited with a trailing newline
        outfile.write("\t".join(trans_dict.get(col, col) for col in line) + "\n")
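One fragile spot worth knowing about: dict(...) raises ValueError if any row of infile1 does not have exactly two fields, for example a blank line at the end of the file. Below is a hedged sketch of a more forgiving variant. `translate_rows` is a hypothetical helper name, not something from the csv module, and the multi-column sample data is made up for illustration.

```python
import csv
import io

def translate_rows(mapping_file, data_file, out_file):
    """Hypothetical helper: replace every column of data_file found in the mapping."""
    trans_dict = {}
    for row in csv.reader(mapping_file, delimiter="\t"):
        if len(row) >= 2:              # skip blank or single-column rows
            trans_dict[row[0]] = row[1]
    for row in csv.reader(data_file, delimiter="\t"):
        # translate each column, then re-join the row with tabs
        out_file.write("\t".join(trans_dict.get(col, col) for col in row) + "\n")

# Made-up sample data; note the blank line at the end of the mapping.
mapping = io.StringIO("Modex_seA_hemN\tModex_polA_SGR222_3950\n\n")
data = io.StringIO("Sardes_xxR_SL1344_4567\tModex_seA_hemN\nMOdex_uui_gytI\n")
out = io.StringIO()
translate_rows(mapping, data, out)
result = out.getvalue()
print(result)
```

Because the helper takes already-open file objects, it can be called with the real files inside the same with blocks used above.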