From PubMed

Time: 2017-08-31 10:49:19

Tags: python biopython

I am having trouble getting the abstracts from the following query:

Entrez.email = "anonymous@gmail.com"
esearch_query = Entrez.esearch(db="pubmed", term="cancer AND food", retmode="xml")
esearch_result = Entrez.read(esearch_query)

# Now we need to get all papers from our search using the IDList
for iden in esearch_result['IdList'][-1]:
    pubmed_entry = Entrez.efetch(db="pubmed", id=iden, retmode="xml")
    result = Entrez.read(pubmed_entry)
    print result

The output looks like this (only one entry is shown as an example).

  

{u'PubmedArticle': [{u'MedlineCitation': DictElement({u'DateCompleted': {u'Month': '01', u'Day': '10', u'Year': '1976'},
  u'OtherID': [], u'DateRevised': {u'Month': '03', u'Day': '22', u'Year': '2017'},
  u'MeshHeadingList': [
    {u'QualifierName': [], u'DescriptorName': StringElement('Binding Sites', attributes={u'UI': u'D001665', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Cobalt', attributes={u'UI': u'D003035', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Hemoglobins', attributes={u'UI': u'D006454', u'MajorTopicYN': u'Y'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Humans', attributes={u'UI': u'D006801', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Hydrogen-Ion Concentration', attributes={u'UI': u'D006863', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Iron', attributes={u'UI': u'D007501', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Ligands', attributes={u'UI': u'D008024', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Mathematics', attributes={u'UI': u'D008433', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'Y'})], u'DescriptorName': StringElement('Oxygen', attributes={u'UI': u'D010100', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Protein Binding', attributes={u'UI': u'D011485', u'MajorTopicYN': u'N'})},
    {u'QualifierName': [], u'DescriptorName': StringElement('Spectrum Analysis', attributes={u'UI': u'D013057', u'MajorTopicYN': u'N'})}],
  u'OtherAbstract': [], u'CitationSubset': ['IM'],
  u'ChemicalList': [
    {u'NameOfSubstance': StringElement('Hemoglobins', attributes={u'UI': u'D006454'}), u'RegistryNumber': '0'},
    {u'NameOfSubstance': StringElement('Ligands', attributes={u'UI': u'D008024'}), u'RegistryNumber': '0'},
    {u'NameOfSubstance': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108'}), u'RegistryNumber': '0'},
    {u'NameOfSubstance': StringElement('Cobalt', attributes={u'UI': u'D003035'}), u'RegistryNumber': '3G0H8C9362'},
    {u'NameOfSubstance': StringElement('Iron', attributes={u'UI': u'D007501'}), u'RegistryNumber': 'E1UOL152H7'},
    {u'NameOfSubstance': StringElement('Oxygen', attributes={u'UI': u'D010100'}), u'RegistryNumber': 'S88TT14065'}],
  u'KeywordList': [], u'DateCreated': {u'Month': '01', u'Day': '10', u'Year': '1976'},
  u'SpaceFlightMission': [], u'GeneralNote': [],
  u'Article': DictElement({u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '1424-31'},
    u'AuthorList': ListElement([
      DictElement({u'LastName': 'Chow', u'Initials': 'YW', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'Y W'}, attributes={u'ValidYN': u'Y'}),
      DictElement({u'LastName': 'Pietranico', u'Initials': 'R', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'R'}, attributes={u'ValidYN': u'Y'}),
      DictElement({u'LastName': 'Mukerji', u'Initials': 'A', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'A'}, attributes={u'ValidYN': u'Y'})],
      attributes={u'CompleteYN': u'Y'}),
    u'Language': ['eng'],
    u'PublicationTypeList': [StringElement('Journal Article', attributes={u'UI': u'D016428'}), StringElement("Research Support, U.S. Gov't, Non-P.H.S.", attributes={u'UI': u'D013486'})],
    u'Journal': {u'ISSN': StringElement('0006-291X', attributes={u'IssnType': u'Print'}), u'ISOAbbreviation': 'Biochem. Biophys. Res. Commun.',
      u'JournalIssue': DictElement({u'Volume': '66', u'Issue': '4', u'PubDate': {u'Month': 'Oct', u'Day': '27', u'Year': '1975'}}, attributes={u'CitedMedium': u'Print'}),
      u'Title': 'Biochemical and biophysical research communications'},
    u'ArticleTitle': 'Studies of oxygen binding energy to hemoglobin molecule.', u'ELocationID': []},
    attributes={u'PubModel': u'Print'}),
  u'PMID': StringElement('6', attributes={u'Version': u'1'}),
  u'MedlineJournalInfo': {u'MedlineTA': 'Biochem Biophys Res Commun', u'Country': 'United States', u'NlmUniqueID': '0372516', u'ISSNLinking': '0006-291X'}},
  attributes={u'Status': u'MEDLINE', u'Owner': u'NLM'}),
 u'PubmedData': {u'ArticleIdList': [StringElement('6', attributes={u'IdType': u'pubmed'}), StringElement('0006-291X(75)90518-5', attributes={u'IdType': u'pii'})],
  u'PublicationStatus': 'ppublish',
  u'History': [
    DictElement({u'Month': '10', u'Day': '27', u'Year': '1975'}, attributes={u'PubStatus': u'pubmed'}),
    DictElement({u'Minute': '1', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'medline'}),
    DictElement({u'Minute': '0', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'entrez'})]}}],
 u'PubmedBookArticle': []}

How can I get the abstract? The end goal is to store some of the fields (e.g. title, abstract, ...) in an SQL database.

Thanks, David

1 answer:

Answer 0 (score: 3)

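What may be working against you is that MEDLINE/PubMed records from before 1975 usually have no abstract, and your example sits right on the cusp of 1975. Using your code with a different query, I found two article IDs, one with an abstract and one without: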

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"

esearch_query = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmode="xml")
esearch_result = Entrez.read(esearch_query)

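# Loop over every ID in the list. Note that esearch_result['IdList'][-1] is a
# single ID string, so looping over it would yield its individual characters.
for identifier in esearch_result['IdList']: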
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)

    article = result['PubmedArticle'][0]['MedlineCitation']['Article']

    if 'Abstract' in article:
        print(article['Abstract']['AbstractText'])

Truncated output:

  

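['This report catalogues all spontaneous proliferative lesions of macropods, koalas, wombats, possums and gliders held in the Taronga Zoo pathology register over a comparable period. Proliferative lesions were present in 14 macropods, 26 koalas, 2 wombats and 22 possums and gliders. The majority of tumours recorded in macropods were singular ...']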
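
The abstract prints as a list because AbstractText holds one entry per abstract section, and a structured abstract can have several. Below is a minimal sketch of turning a parsed record into a plain title and abstract string. The function name extract_title_and_abstract and the sample dictionaries are hypothetical; the dictionaries are hand-made stand-ins for what Entrez.read returns, so the sketch runs without network access. It assumes the record contains at least one PubmedArticle entry.

```python
def extract_title_and_abstract(record):
    """Return (title, abstract) for the first article; abstract is None if absent."""
    article = record["PubmedArticle"][0]["MedlineCitation"]["Article"]
    title = str(article["ArticleTitle"])
    abstract = None
    if "Abstract" in article:
        # Join all sections so a structured abstract becomes one string.
        sections = article["Abstract"]["AbstractText"]
        abstract = " ".join(str(section) for section in sections)
    return title, abstract


# Hand-made stand-ins for parsed efetch records (placeholder values only).
with_abstract = {
    "PubmedArticle": [{"MedlineCitation": {"Article": {
        "ArticleTitle": "Placeholder title",
        "Abstract": {"AbstractText": ["Background text.", "Results text."]},
    }}}]
}
without_abstract = {
    "PubmedArticle": [{"MedlineCitation": {"Article": {
        "ArticleTitle": "Another placeholder title",
    }}}]
}

print(extract_title_and_abstract(with_abstract))
print(extract_title_and_abstract(without_abstract))
```

Returning None for a missing abstract, rather than an empty string, keeps "no abstract on record" distinguishable from an empty one when the values are stored later.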

Details can be found in the documentation: MEDLINE PubMed XML Element Descriptions
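
For the stated goal of putting title and abstract into an SQL database, one possible sketch uses the standard-library sqlite3 module. The table name articles, its columns, the helper save_articles, and the sample rows are all assumptions made for illustration, not part of the original code. A missing abstract is stored as NULL.

```python
import sqlite3


def save_articles(connection, rows):
    """Store (pmid, title, abstract) tuples; abstract may be None."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "pmid TEXT PRIMARY KEY, title TEXT NOT NULL, abstract TEXT)"
    )
    # Parameter placeholders let sqlite3 handle quoting of titles and abstracts.
    connection.executemany(
        "INSERT OR REPLACE INTO articles (pmid, title, abstract) VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()


# In-memory database and placeholder rows, so the sketch runs anywhere.
connection = sqlite3.connect(":memory:")
rows = [
    ("1", "Placeholder title without abstract", None),
    ("2", "Placeholder title with abstract", "Placeholder abstract."),
]
save_articles(connection, rows)

stored = connection.execute(
    "SELECT pmid, abstract FROM articles ORDER BY pmid"
).fetchall()
print(stored)
```

When the values come from Entrez.read, converting them with str() before inserting is a cautious choice, since the parser returns its own string subclasses rather than plain strings.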