Question

I am trying to use Biopython (Entrez) with a search term to return accession numbers (rather than GI numbers*).

Here is a small excerpt of my code:

from Bio import Entrez

Entrez.email = 'myemailaddress'
search_phrase = '(Escherichia coli[organism]) AND (complete genome[keyword])'
handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=100, rettype='acc', retmode='text')
result = Entrez.read(handle)
handle.close()
gi_numbers = result['IdList']
print(gi_numbers)

'745369752', '910228862', '187736741', '802098270', '802098269', '802098267', '387610477', '544579032', '544574430', '215485161', '749295052', '387823261', '387605479', '641687520', '641682562', '594009615', '557270520', '313848522', '309700213', '284919779', '215263233', '544345556', '544340954', '144661', '51773702', '202957457', '202957451', '172051323'

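I'm sure I could convert from GI to accession, but it would be nice to avoid the extra step. What piece of magic am I missing?

Thanks in advance.

*Especially since NCBI is phasing out GI numbers.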

Answer 1

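Looking through the docs for esearch on NCBI's website, there are only two rettype values available: uilist, which is the default XML format you are currently getting (Entrez.read() parses it into a dict), and count, which displays only the Count value (look at the complete contents of result, it's in there). I'm unclear on the exact meaning of Count, as it doesn't represent the total number of items in IdList...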
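On the meaning of Count: as far as I can tell from how E-utilities behaves, Count is the total number of records matching the query, while IdList holds only up to retmax of them, which would explain the mismatch. A minimal sketch of paging through every hit under that assumption; `page_offsets` and `search_all_ids` are illustrative names, not part of Biopython:

```python
def page_offsets(total, page_size):
    """Return the retstart offsets needed to cover `total` hits."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total, page_size))


def search_all_ids(term, db="nuccore", page_size=100):
    """Collect every ID matching `term` by paging with retstart/retmax."""
    # Imported lazily; the caller is expected to set Entrez.email first.
    from Bio import Entrez

    # Ask only for the hit count (retmax=0 leaves IdList empty).
    handle = Entrez.esearch(db=db, term=term, retmax=0)
    total = int(Entrez.read(handle)["Count"])
    handle.close()

    ids = []
    for start in page_offsets(total, page_size):
        handle = Entrez.esearch(db=db, term=term,
                                retstart=start, retmax=page_size)
        ids.extend(Entrez.read(handle)["IdList"])
        handle.close()
    return ids
```

With a Count of 250 and a page size of 100, this would issue three searches starting at offsets 0, 100 and 200.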

At any rate, Entrez.esearch() will take any value of rettype and retmode you like, but it only returns the uilist or the count, in xml or json mode. No accession IDs, no nothin'.
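For the json mode, Entrez.read() is of no use because it only parses XML, so the body has to go through the standard json module instead. A sketch, assuming the response has its usual shape with an `esearchresult` object holding `count` and `idlist`; `parse_esearch_json` is an illustrative helper, not a Biopython function:

```python
import json


def parse_esearch_json(body):
    """Return (count, id_list) from an ESearch JSON response body.

    `body` may be str or bytes; json.loads accepts both.
    Assumes the top-level key is "esearchresult".
    """
    result = json.loads(body)["esearchresult"]
    return int(result["count"]), list(result["idlist"])


# Intended use (needs network access and Entrez.email set):
# handle = Entrez.esearch(db="nuccore", term=search_phrase, retmode="json")
# count, ids = parse_esearch_json(handle.read())
# handle.close()
```

Either way, the IDs that come back are the same ones the XML mode returns, so this does not get around the second query.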

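Entrez.efetch() will pass you back all sorts of cool stuff, depending on which database you're querying. The downside, of course, is that you need to query by one or more IDs rather than by a search string, so to get your accession IDs you need to run two queries: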

# Step 1: search for matching records; esearch puts GI numbers in IdList.
search_phrase = "(Escherichia coli[organism]) AND (complete genome[keyword])"
handle = Entrez.esearch(db="nuccore", term=search_phrase, retmax=100)
result = Entrez.read(handle)
handle.close()
# Step 2: fetch the accession for each GI, one per line of plain text.
fetch_handle = Entrez.efetch(db="nuccore", id=result["IdList"], rettype="acc", retmode="text")
acc_ids = [line.strip() for line in fetch_handle]
fetch_handle.close()
print(acc_ids)

which gives

['HF572917.2', 'NZ_HF572917.1', 'NC_010558.1', 'NZ_HG941720.1', 'NZ_HG941719.1', 'NZ_HG941718.1', 'NC_017633.1', 'NC_022371.1', 'NC_022370.1', 'NC_011601.1', 'NZ_HG738867.1', 'NC_012892.2', 'NC_017626.1', 'HG941719.1', 'HG941718.1', 'HG941720.1', 'HG738867.1', 'AM946981.2', 'FN649414.1', 'FN554766.1', 'FM180568.1', 'HG428756.1', 'HG428755.1', 'M37402.1', 'AJ304858.2', 'FM206294.1', 'FM206293.1', 'AM886293.1']
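The two queries can be wrapped in one helper so the extra step is at least hidden from the calling code. This is only a sketch: `chunked`, `parse_accessions` and `search_accessions` are hypothetical names, and the batch size of 200 is an assumption chosen to keep each efetch request small, not a documented limit.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parse_accessions(lines):
    """Strip whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def search_accessions(term, db="nuccore", retmax=100, batch_size=200):
    """Run esearch, then efetch with rettype='acc', and return accessions."""
    # Imported lazily; the caller is expected to set Entrez.email first.
    from Bio import Entrez

    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    id_list = Entrez.read(handle)["IdList"]
    handle.close()

    accessions = []
    for batch in chunked(id_list, batch_size):
        fetch_handle = Entrez.efetch(db=db, id=batch,
                                     rettype="acc", retmode="text")
        accessions.extend(parse_accessions(fetch_handle))
        fetch_handle.close()
    return accessions
```

Calling `search_accessions(search_phrase)` after setting Entrez.email should then return a list like the one above in a single call, though it still costs two round trips to NCBI per batch.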

So, I'm not terribly sure I've answered your question satisfactorily, but unfortunately I think the answer is "there is no magic."
