I am trying to iterate through multiple lists and find out whether an item is present in the other lists. Here is my code:
with open("Busca1.txt", "r") as f, open("CELLO1.txt","r") as f1, open("PSORT.txt","r") as f2, open("results","w+") as of :
    file_in = f.readlines()
    file_in1 = f1.readlines()
    file_in2 = f2.readlines()
    file_in3 = f3.readlines()
    for line in file_in:
        temp = line.split()
        ID_busca = temp[1]
    for line in file_in1:
        temp2 = line.split()
        ID_cello = temp2[1]
    for line in file_in2:
        temp3 = line.split()
        ID_psort = temp3[1]
        all = [i for i in ID_busca if i in ID_cello + ID_psort]
        print all
This is what I get:
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
['Y', 'P', '_', '2', '0', '7', '6', '8', '4', '.', '1']
So each ID is split into individual characters, and the same item appears to be printed several times.
Here are samples of the files:
Busca1.txt:
C:extracellular YP_207690.1
C:plasma YP_207698.1
C:extracellular YP_207699.1
C:extracellular YP_207700.1
C:extracellular YP_207701.1
C:extracellular YP_207704.1
C:extracellular YP_207706.1
C:extracellular YP_207716.1
C:extracellular YP_207717.1
C:extracellular YP_207719.1
C:plasma YP_207722.1
C:plasma YP_207728.1
C:plasma YP_207729.1
C:extracellular YP_207731.1
CELLO1.txt:
OuterMembrane YP_008914846.1 opacity
Periplasmic YP_008914847.1 hypothetical
OuterMembrane YP_008914851.1 opacity
OuterMembrane YP_008914852.1 opacity
OuterMembrane YP_008914853.1 opacity
OuterMembrane YP_008914854.1 opacity
OuterMembrane YP_008914855.1 opacity
OuterMembrane YP_008914857.1 opacity
OuterMembrane YP_008914859.1 opacity
OuterMembrane YP_008914860.1 opacity
Periplasmic YP_008994831.1 hypothetical
Periplasmic YP_009115479.1 DNA
Extracellular YP_009115480.1 bacterioferritin-associated
OuterMembrane YP_009115486.1 pilus
InnerMembrane YP_009115487.1 hypothetical
InnerMembrane YP_009115488.1 membrane
Periplasmic YP_009115490.1 pilin
Periplasmic YP_009179204.1 hypothetical
Periplasmic YP_207190.2 leucine--tRNA
PSORT.txt:
SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914847.1 hypothetical protein NGO0146a [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914848.1 hypothetical protein NGO0250a [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914849.1 hypothetical protein NGO0590a [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914851.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914852.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914853.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914854.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914855.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914857.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914859.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008914860.1 opacity protein [Neisseria gonorrhoeae FA 1090]
SeqID: YP_008994831.1 hypothetical protein NGO1621a [Neisseria gonorrhoeae FA 1090]
SeqID: YP_009115480.1 bacterioferritin-associated ferredoxin [Neisseria gonorrhoeae FA 1090
Can anyone help me get the code to do what I need? I want to print a YP_********* code if it is present in all of the lists.
Thanks
Answer 0 (score: 0)
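You need to fix your logic. I'm not sure how you were trying to make this work. First, you reference a file, f3, that does not exist; you do have an alias, of, that you perhaps intended to use?
Second, the output you give is not from the code you posted. Please make sure your problem statement is exact.
As for the intended operation, look at the form of your loop: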
for line in file_in:
    temp = line.split()
    ID_busca = temp[1]
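You step through each line of the file, split it, extract the ID number, and then overwrite the previous ID number with the latest one. When you exit this loop, ID_busca is a simple string holding only the last ID number. By the time you reach the end of the second loop, you have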
ID_busca = "YP_207731.1"
ID_cello = "YP_207190.2"
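The overwriting described above can be reproduced in isolation. The sketch below is written for Python 3 (the question's code uses the Python 2 print statement) and uses the last three sample lines of Busca1.txt from the question; the names sample_lines, last_id and all_ids are illustrative and do not appear in the original post.

```python
# Last three sample lines of Busca1.txt, as shown in the question.
sample_lines = [
    "C:plasma YP_207728.1",
    "C:plasma YP_207729.1",
    "C:extracellular YP_207731.1",
]

# Assigning inside the loop keeps only the most recent ID.
last_id = None
for line in sample_lines:
    last_id = line.split()[1]

# Appending inside the loop keeps every ID.
all_ids = []
for line in sample_lines:
    all_ids.append(line.split()[1])

print(last_id)  # YP_207731.1
print(all_ids)  # ['YP_207728.1', 'YP_207729.1', 'YP_207731.1']
```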
Now you iterate through PSORT, extracting each ID in turn. Let's look at the first one:
ID_psort = "YP_008914846.1"
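Now your list comprehension steps through each character of ID_busca to see whether that character appears in the concatenation of the other two strings.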
all = [i for i in ID_busca if i in "YP_207190.2YP_008914846.1"]
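To check what that comprehension produces for the three strings above, here is a small Python 3 sketch. The lowercase variable names are my own, and the result is called matching_chars rather than all so that it does not shadow the built-in all().

```python
id_busca = "YP_207731.1"
id_cello = "YP_207190.2"
id_psort = "YP_008914846.1"

# Concatenating two strings gives one longer string, not a list of IDs.
combined = id_cello + id_psort

# Iterating over a string yields single characters, and `in` on a string
# is a substring test, so this keeps each character of id_busca that
# occurs anywhere in `combined`.
matching_chars = [ch for ch in id_busca if ch in combined]

print(matching_chars)
# ['Y', 'P', '_', '2', '0', '7', '7', '1', '.', '1']
```

Only the '3' is dropped, because it is the one character of ID_busca that occurs in neither of the other two IDs. No whole ID is ever compared with another whole ID.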
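Learn to write a small amount of code at a time. Don't try to search the IDs until you know you have a list of IDs to search. Use print statements. Don't write any more code until everything you have written so far is tested and works as expected. The code you posted contains at least four errors.
If you want lists of ID numbers, then make lists of ID numbers: search for online tutorials on lists and on the append and extend methods. Also look up sets; I suspect the easiest approach is to build three sets of ID numbers and simply take their intersection.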
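As a rough illustration of the set-intersection idea, here is a minimal Python 3 sketch. It assumes, as in the sample files, that the ID is always the second whitespace-separated field on a line. The function names extract_ids, ids_in_all and ids_in_all_files are hypothetical helpers invented for this example, and the in-memory demo uses only a few CELLO1.txt and PSORT.txt lines copied from the question.

```python
def extract_ids(lines, column=1):
    """Return the set of fields found at `column` in each line.

    Lines with too few fields (for example blank lines) are skipped.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) > column:
            ids.add(fields[column])
    return ids


def ids_in_all(id_sets):
    """Return a sorted list of the IDs present in every set."""
    id_sets = list(id_sets)
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


def ids_in_all_files(*paths):
    """Read each file and return the IDs common to all of them."""
    id_sets = []
    for path in paths:
        with open(path) as handle:
            id_sets.append(extract_ids(handle))
    return ids_in_all(id_sets)


# Demo on a few lines copied from the question's samples.
cello_lines = [
    "OuterMembrane YP_008914846.1 opacity",
    "Periplasmic YP_008914847.1 hypothetical",
    "OuterMembrane YP_008914851.1 opacity",
]
psort_lines = [
    "SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]",
    "SeqID: YP_008914847.1 hypothetical protein NGO0146a [Neisseria gonorrhoeae FA 1090]",
    "SeqID: YP_008914848.1 hypothetical protein NGO0250a [Neisseria gonorrhoeae FA 1090]",
]

shared = ids_in_all([extract_ids(cello_lines), extract_ids(psort_lines)])
print(shared)  # ['YP_008914846.1', 'YP_008914847.1']
```

For the three files in the question, the call would be ids_in_all_files("Busca1.txt", "CELLO1.txt", "PSORT.txt"). Note that the Busca1.txt sample shown shares no IDs with the other two samples, so the result for those excerpts alone would be an empty list.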