I am using a for loop to look up a list of protein IDs in the NCBI protein database and convert each ID into its description. Here is an example:
import pandas as pd
from Bio import Entrez
from Bio import SeqIO

df = pd.read_csv('ID.txt', header=None)
df.columns = ['protein_ID']  # give the single column the header 'protein_ID'
lists = df.protein_ID.tolist()  # convert the column into a list of protein IDs

description = ''
for num, line in enumerate(lists):
    handle = Entrez.efetch(db="protein", id=line, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    description += record.description  # concatenates everything into one string
description
It returns one huge string:
'hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]'
What I want is a list of strings with new line breaks, like this:
[
'hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]',
'ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]',
'hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]',
'phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]'
]
How can I achieve this? Thank you very much!
Answer 0 (score: 0)
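"What I want is a list of strings"
Start with an empty list instead of an empty string, and append each description to it: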
description = []
for num, line in enumerate(lists):
    # .... (the efetch / SeqIO.read lines from the question go here)
    description.append(record.description)
"with new line breaks"
A list is not printed this way by default; use pprint:
import pprint
# your original code here
pprint.pprint(description)
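To see both pieces together without network access, here is a minimal offline sketch. It is not the Entrez code itself: `FAKE_RECORDS`, `lookup_description`, and `collect_descriptions` are hypothetical names, the IDs are placeholders, and the lookup merely stands in for the `Entrez.efetch` / `SeqIO.read` pair. The descriptions are the four from the question. One thing to be aware of: with the default `width=80`, `pprint` may wrap strings longer than the width across several lines, so the sketch passes a larger `width`, and also shows `"\n".join(...)` as a plain alternative.

```python
import pprint

# Hypothetical stand-in for the NCBI lookup: placeholder IDs mapped to the
# descriptions shown in the question. A real script would call
# Entrez.efetch / SeqIO.read here instead.
FAKE_RECORDS = {
    "ID_1": "hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]",
    "ID_2": "ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]",
    "ID_3": "hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]",
    "ID_4": "phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]",
}


def lookup_description(protein_id):
    """Return the description for one ID (offline placeholder)."""
    return FAKE_RECORDS[protein_id]


def collect_descriptions(protein_ids, lookup=lookup_description):
    """Build a list with one description per ID instead of one long string."""
    descriptions = []
    for protein_id in protein_ids:
        descriptions.append(lookup(protein_id))
    return descriptions


descriptions = collect_descriptions(["ID_1", "ID_2", "ID_3", "ID_4"])

# A width larger than the longest item keeps each string in one piece.
pprint.pprint(descriptions, width=120)

# Alternative: print the bare descriptions, one per line, without list syntax.
print("\n".join(descriptions))
```

If the goal is only to display or save one description per line, the `"\n".join(...)` form is probably the simpler choice; `pprint` is more useful when you want to keep the list syntax visible.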