Is there a way to get the abstracts for a given list of PubMed IDs?

Time: 2017-11-29 18:07:33

Tags: biopython pubmed

I have a list of PMIDs, and I want to fetch the abstracts for both of them with a single URL:

    pmids = [17284678, 9997]
    abstract_dict = {}
    # Both IDs are passed, comma-separated, in the id parameter
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
           "db=pubmed&id=17284678,9997&retmode=text&rettype=xml")

The result I need is in this format:

    abstract_dict={"pmid1":"abstract1","pmid2":"abstract2"}

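I can get this format by requesting each ID in turn and updating the dictionary, but to save time I would like to pass all the IDs in one URL, process the response, and extract only the abstract part.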

2 Answers:

Answer 0 (score: 3)

With BioPython, you can pass the comma-joined list of PubMed IDs to Entrez.efetch, which performs a single URL lookup:

from Bio import Entrez

Entrez.email = 'your_email@provider.com'

pmids = [17284678,9997]
handle = Entrez.efetch(db="pubmed", id=','.join(map(str, pmids)),
                       rettype="xml", retmode="text")
records = Entrez.read(handle)
abstracts = [pubmed_article['MedlineCitation']['Article']['Abstract']['AbstractText'][0]
             for pubmed_article in records['PubmedArticle']]


abstract_dict = dict(zip(pmids, abstracts))

The result is:

{9997: 'Electron paramagnetic resonance and magnetic susceptibility studies of Chromatium flavocytochrome C552 and its diheme flavin-free subunit at temperatures below 45 degrees K are reported. The results show that in the intact protein and the subunit the two low-spin (S = 1/2) heme irons are distinguishable, giving rise to separate EPR signals. In the intact protein only, one of the heme irons exists in two different low spin environments in the pH range 5.5 to 10.5, while the other remains in a constant environment. Factors influencing the variable heme iron environment also influence flavin reactivity, indicating the existence of a mechanism for heme-flavin interaction.',
 17284678: 'Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resistance and known to differ between virulent and attenuated strains of the parasite. The chromosome--which appears to be representative of the genome--is gene-dense and rich in simple-sequence repeats, many of which appear to give rise to repetitive amino acid tracts in the predicted proteins. Most striking is the segmentation of the chromosome into repeat-rich regions peppered with transposon-like elements and telomere-like repeats, alternating with repeat-free regions. Predicted genes differ in character between the two types of segment, and the repeat-rich regions appear to be associated with strain-to-strain variation.'}
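
To make the "single URL lookup" concrete, here is a minimal sketch of how the equivalent efetch URL from the question could be assembled by hand with the standard library. The helper name build_efetch_url and the constant EFETCH_BASE are hypothetical, introduced only for illustration; the parameters are the ones already used in the question and in the Entrez.efetch call above. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL in the question
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Return one efetch URL covering every PMID in pmids (hypothetical helper)."""
    params = {
        "db": "pubmed",
        # All IDs go into a single comma-separated id parameter
        "id": ",".join(map(str, pmids)),
        "rettype": "xml",
        "retmode": "text",
    }
    # safe="," keeps the commas readable instead of percent-encoding them
    return EFETCH_BASE + "?" + urlencode(params, safe=",")


url = build_efetch_url([17284678, 9997])
print(url)
```

In practice Entrez.efetch is the more convenient route, since it also passes along the email address you set; the sketch is only meant to show that one request carries all the IDs.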

Edit

For PMIDs that have no corresponding abstract, be careful with the fix you suggested:

abstracts = [pubmed_article['MedlineCitation']['Article']['Abstract']['AbstractText'][0]
             for pubmed_article in records['PubmedArticle']
             if 'Abstract' in pubmed_article['MedlineCitation']['Article'].keys()]

Suppose you have the list of PubMed IDs pmids = [1, 2, 3], but PMID 2 has no abstract, so that abstracts = ['abstract of 1', 'abstract of 3'].

This causes a problem in the final step, where I zip the two lists together into a dict:

>>> abstract_dict = dict(zip(pmids, abstracts))
>>> print(abstract_dict)
{1: 'abstract of 1', 
 2: 'abstract of 3'}

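Note that the abstracts are now out of sync with their corresponding PubMed IDs: the PMIDs without an abstract were never filtered out of pmids, and zip truncates to the shortest list.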

Instead, do:

abstract_dict = {}
without_abstract = []

for pubmed_article in records['PubmedArticle']:
    pmid = int(str(pubmed_article['MedlineCitation']['PMID']))
    article = pubmed_article['MedlineCitation']['Article']
    if 'Abstract' in article:
        abstract = article['Abstract']['AbstractText'][0]
        abstract_dict[pmid] = abstract
    else:
        without_abstract.append(pmid)

print(abstract_dict)
print(without_abstract)
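
One more caveat: AbstractText is a list, and the loop above keeps only its first element. For structured abstracts, which PubMed splits into labeled sections, that would drop everything after the first section. A possible variant, sketched under assumptions, joins all the parts instead. The function name build_abstract_dict and the sample_records data are hypothetical; the plain dict stands in for what Entrez.read(handle) returns, on the assumption that the parsed result supports the same key and index access.

```python
def build_abstract_dict(records):
    """Map each PMID to its full abstract; also collect PMIDs lacking one."""
    abstract_dict = {}
    without_abstract = []
    for pubmed_article in records['PubmedArticle']:
        citation = pubmed_article['MedlineCitation']
        pmid = int(str(citation['PMID']))
        article = citation['Article']
        if 'Abstract' in article:
            # Join every section rather than taking only AbstractText[0]
            parts = article['Abstract']['AbstractText']
            abstract_dict[pmid] = ' '.join(str(part) for part in parts)
        else:
            without_abstract.append(pmid)
    return abstract_dict, without_abstract


# Hypothetical stand-in for the structure returned by Entrez.read(handle)
sample_records = {
    'PubmedArticle': [
        {'MedlineCitation': {
            'PMID': '1',
            'Article': {'Abstract': {'AbstractText': ['Background text.',
                                                      'Results text.']}}}},
        {'MedlineCitation': {'PMID': '2', 'Article': {}}},
    ]
}

abstract_dict, without_abstract = build_abstract_dict(sample_records)
print(abstract_dict)
print(without_abstract)
```

Because the PMID is read from each record rather than zipped in from the input list, the keys stay in sync with the abstracts even when some articles have none.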

Answer 1 (score: 1)

  let ctx = canvas.getContext('2d')
  let lineChart = new Chart(ctx, {
    type: 'line',
    data: {
      labels: ['0', '1'], // I DON'T WANT THE X AXES TO REPRESENT ANYTHING
      datasets: [{
        label: 'For',
        backgroundColor: '#85CE36',
        fill: false,
        showLine: showLine, // this is true
        borderColor: '#85CE36',
        data: userGoals // The array as described
      }, {
        label: 'Against',
        backgroundColor: 'red',
        fill: false,
        showLine: showLine, // this is true
        borderColor: 'red',
        data: opponentGoals // The array as described
      }
      ]
    },
    options: {
      responsive: true,
      title: {
        display: true,
        text: nameOfStat
      },
      tooltips: {
        mode: 'index',
        intersect: false
      },
      hover: {
        mode: 'nearest',
        intersect: true
      },
      elements: {
        point: {
          pointStyle: style
        }
      },
      scales: {
        xAxes: [{
          display: false // Not really what I am looking for... This just hides the X axes but that's it
        }]
      }
    }
  })