I have two files (the fields on each line are separated by a space):
file1.txt
OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome
file2.txt
UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007
UniRef90_2 OTU0002 OTU0003 OTU0005
UniRef90_3 OTU0004 OTU0006 OTU0007
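In the second file, I want to replace each OTUXXXX with its value from the first file, while keeping UniRef90_X at the start of each line. The first line of the second file should look like this: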
UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
So far, I have built a dictionary from the second file, with UniRef90_X as the key and the list of OTUXXXX entries as the value.
f1=open("file1.txt", "r")
f2=open("file2.txt", "r")
dict={}
for i in f2:
    i=i.split(" ")
    dict[i[0]]=i[1:]
    for j in f1:
        j=j.split(" ")
        if j[0] in dict.values():
            dico[i[0]]=j[1:]
But I don't know how to replace each OTUXXXX with the corresponding value from the first file. Any ideas?
Answer 0: (score: 1)
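First, do not name your variables exactly like classes. Ever. Use something like d2 instead of dict.
Then, replace the [1] with [1:], so that every OTU ID on the line is kept rather than only the first one.
Then, after loading the first file into a dictionary just as you did with the second one (let's call it d1), you can combine the values of the two dictionaries.
Finally, turn the result back into a string and write it to a file.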
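The steps above could be sketched roughly as follows. This is only an illustration and not necessarily the answerer's own code: it assumes d1 maps each OTU ID to its taxonomy string and d2 maps each UniRef90 ID to its list of OTU IDs. The helper names `load_d1`, `load_d2`, and `combine` are hypothetical, and `io.StringIO` stands in for the real files, using sample lines from the question.

```python
import io

def load_d1(lines):
    """Map each OTU ID to its taxonomy string (file1 layout)."""
    d1 = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first space only: the taxonomy itself may contain spaces.
        otu, _, taxonomy = line.partition(" ")
        d1[otu] = taxonomy
    return d1

def load_d2(lines):
    """Map each UniRef90 ID to its list of OTU IDs (file2 layout)."""
    d2 = {}
    for line in lines:
        fields = line.split()
        if fields:
            d2[fields[0]] = fields[1:]
    return d2

def combine(d1, d2):
    """Build one output line per UniRef90 record, looking OTU IDs up in d1."""
    out = []
    for uniref, otus in d2.items():
        # OTU IDs missing from d1 are silently skipped (an assumption).
        parts = [f"{d1[otu]} (#{otu})" for otu in otus if otu in d1]
        out.append(" ".join([uniref] + parts))
    return out

# Sample data from the question.
file1 = io.StringIO(
    "OTU0001 Archaea\n"
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n"
    "OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote\n"
    "OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;"
    "Halobacteriaceae;uncultured;marine metagenome\n"
)
file2 = io.StringIO("UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007\n")

merged = combine(load_d1(file1), load_d2(file2))
print("\n".join(merged))
```

To write the result to a file instead of printing it, the joined string can be passed to the `write` method of a file opened in `"w"` mode.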
Answer 1: (score: 1)
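I would suggest putting the first file into a dictionary. That way, as you read file2, you can look up the IDs you captured from file1.
The way your loops are set up, you read the first record from file2 and enter it into the hash. The test against the dictionary's values will never match anything from file1. You then read all of file1 and do your work there. The next time you read a record from file2, file1 has already been exhausted by that first pass, so the inner loop does nothing.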
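Both problems can be reproduced in a small, self-contained sketch. Here `io.StringIO` stands in for a real file, and the variable names are made up for the illustration.

```python
import io

# io.StringIO behaves like an open text file when iterated over.
f1 = io.StringIO(
    "OTU0001 Archaea\n"
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n"
)

first_pass = [line.split(" ", 1)[0] for line in f1]
# The file iterator is now at end of file, so a second loop yields nothing.
second_pass = [line.split(" ", 1)[0] for line in f1]

print(first_pass)   # ['OTU0001', 'OTU0004']
print(second_pass)  # []

# The membership test from the question cannot succeed either: the
# dictionary's values are lists, and a string never equals a list.
d2 = {"UniRef90_1": ["OTU0001", "OTU0004"]}
direct_hit = "OTU0001" in d2.values()
nested_hit = any("OTU0001" in otus for otus in d2.values())

print(direct_hit)  # False
print(nested_hit)  # True
```

Reopening the file or calling `f1.seek(0)` before each inner loop would make the second pass work, but reading file1 into a dictionary once avoids rescanning it for every record.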
Here is a way to read file1 into a dictionary and then print each match as it is found in file2.
file1 = {}  # maps each OTU ID to its taxonomy string
with open('file1.txt', 'r') as fin:
    for line in fin:
        # strip the trailing newline
        line = line.rstrip()
        # split only once: the first part goes into _id, the rest into data
        _id, data = line.split(' ', 1)
        # data is a single string that may contain spaces,
        # because the line was split only once (above)
        file1[_id] = data

with open('file2.txt', 'r') as fin:
    for line in fin:
        uniref, *ids = line.split()  # ids is a list (because of the leading *)
        print(uniref, end='')
        for _id in ids:
            if _id in file1:
                # the empty first argument yields a single separating space
                print('', file1[_id], '(#' + _id + ')', end='')
        print()
The printed output is:
UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
UniRef90_2 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon (#OTU0002) Archaea;Altiarchaeales;uncultured euryarchaeote (#OTU0003) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)
UniRef90_3 Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured (#OTU0006) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)