Biopython and retrieving the full name of a journal

Time: 2017-10-04 12:20:07

Tags: python-3.x biopython pubmed

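I am using Biopython with Python 3.x to run searches against the PubMed database. The search results come back correctly, but next I need to extract the journal names (the full names, not just the abbreviations) for all of the search results. Currently I am using the following code: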

from Bio import Entrez
from Bio import Medline

Entrez.email = "my_email@gmail.com"
handle = Entrez.esearch(db="pubmed", term="search_term", retmax=20)
record = Entrez.read(handle)
handle.close()

idlist = record["IdList"]

records = list(records)

for record in records:
    print("source:", record.get("SO", "?"))

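So this works fine, but record.get("SO", "?") returns only the abbreviation of the journal (for example, N Engl J Med rather than New England Journal of Medicine). From my experience with manual PubMed searches, you can search using either the abbreviation or the full name and PubMed handles both the same way, so I wondered whether there is also some parameter for getting the full name?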

1 answer:

Answer 0 (score: 2)

  

"So this works fine, but record.get("SO", "?") returns only the abbreviation of the journal"

No, it doesn't. It won't even run, because of this line:

records = list(records)

records is not defined at that point. Even if you fix that, all you get from:

idlist = record["IdList"]

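is a list of ID strings, for example ['17510654', '2246389'], which are meant to be passed back through an Entrez.efetch() call to get the actual data. So when you call record.get("SO", "?") on one of these strings, your code blows up again, because a str has no get() method.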

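First, the "SO" (Source) field is defined to include the journal title abbreviation (TA) as part of what it returns. You probably want "JT" (Journal Title) instead, as defined in MEDLINE/PubMed Data Element (Field) Descriptions. But neither of those fields plays any part in the lookup as written, since the code never fetches the records.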
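For the MEDLINE text route that the question was aiming for, a minimal sketch might look like the following. It assumes the records are fetched with rettype="medline" and retmode="text" and parsed with Medline.parse, and that each record carries the "JT" and "TA" fields. The helper names journal_names and fetch_medline_records are made up for illustration and are not part of Biopython.

```python
def journal_names(record):
    """Return (full journal title, abbreviation) from a MEDLINE record.

    Assumes the record is dict-like with "JT" (Journal Title) and
    "TA" (Journal Title Abbreviation) keys; "?" stands in for a missing field.
    """
    return record.get("JT", "?"), record.get("TA", "?")


def fetch_medline_records(idlist, email):
    """Fetch the given PubMed IDs in MEDLINE text format (needs network access)."""
    # Imported here so journal_names() can be used without Biopython installed.
    from Bio import Entrez, Medline

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
    try:
        return list(Medline.parse(handle))
    finally:
        handle.close()


# Example usage (hypothetical IDs and email address):
# for rec in fetch_medline_records(["17510654", "2246389"], "my_email@gmail.com"):
#     full, abbrev = journal_names(rec)
#     print("journal:", full, "| abbreviation:", abbrev)
```

Using get() with a default keeps the loop from failing on a record that happens to lack one of the fields.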

Here is a rework of your code that gets the article title and the title of the journal it appears in:

from Bio import Entrez

Entrez.email = "my_email@gmail.com"  # change this to be your email address
handle = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmax=20)
record = Entrez.read(handle)
handle.close()

for identifier in record['IdList']:
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)
    article = result['PubmedArticle'][0]['MedlineCitation']['Article']

    print('"{}" in "{}"'.format(article['ArticleTitle'], article['Journal']['Title']))

Output

> python3 test.py
"Of wombats and whales: telomere tales in Madrid. Conference on telomeres and telomerase." in "EMBO reports"
"Spontaneous proliferations in Australian marsupials--a survey and review. 1. Macropods, koalas, wombats, possums and gliders." in "Journal of comparative pathology"
>

Details can be found in the documentation: MEDLINE PubMed XML Element Descriptions
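The rework above issues one efetch request per ID. As a possible variation, all IDs can be sent in a single request and the resulting 'PubmedArticle' list iterated. This is a sketch, assuming every returned entry has the same MedlineCitation/Article/Journal layout used above. The function names titles_from_result and fetch_titles are hypothetical.

```python
def titles_from_result(result):
    """Return (article title, journal title) pairs from a parsed efetch XML result.

    Assumes each entry under "PubmedArticle" has the
    MedlineCitation -> Article -> Journal -> Title layout.
    """
    pairs = []
    for entry in result["PubmedArticle"]:
        article = entry["MedlineCitation"]["Article"]
        pairs.append((str(article["ArticleTitle"]), str(article["Journal"]["Title"])))
    return pairs


def fetch_titles(idlist, email):
    """Fetch all IDs in one efetch call (needs network access)."""
    # Imported here so titles_from_result() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=",".join(idlist), retmode="xml")
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return titles_from_result(result)


# Example usage (hypothetical email address):
# for title, journal in fetch_titles(record["IdList"], "my_email@gmail.com"):
#     print('"{}" in "{}"'.format(title, journal))
```

Batching reduces the number of requests sent to NCBI, which matters more as retmax grows.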