Does Bio.Entrez's efetch() retrieve all the metadata of a PubMed article?

Posted: 2016-06-27 04:42:01

Tags: python biopython ncbi pubmed

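I wonder whether Bio.Entrez's efetch() retrieves all the metadata of a PubMed article when given a PMID as input. By "all the metadata" I mean: does PubMed hold any metadata that efetch() does not retrieve?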

For example, I see that for PMID 23954024 the abstract retrieved by efetch() contains less information than the abstract on the PubMed website (http://www.ncbi.nlm.nih.gov/pubmed/23954024):

efetch():

"AbstractText": [
    "Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.", 
    "A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.", 
    "One private physiotherapy clinic, northern England.", 
    "Twenty-four participants with rotator cuff tendinopathy.", 
    "The intervention was a programme of self-managed loaded exercise. The control group received usual physiotherapy treatment.", 
    "Baseline assessment comprised the Shoulder Pain and Disability Index (SPADI) and the Short-Form 36, repeated three months post randomisation.", 
    "The recruitment target was met and the majority of participants (98%) were willing to be randomised. 100% retention was attained with all participants completing the SPADI at three months. Exercise adherence rates were excellent (90%). The mean change in SPADI score was -23.7 (95% CI -14.4 to -33.3) points for the self-managed exercise group and -19.0 (95% CI -6.0 to -31.9) points for the usual physiotherapy treatment group. The difference in three month SPADI scores was 0.1 (95% CI -16.6 to 16.9) points in favour of the usual physiotherapy treatment group.", 
    "In keeping with previous research which indicates the need for further evaluation of self-managed loaded exercise for rotator cuff tendinopathy, these methods and the preliminary evaluation of outcome offer a foundation and stimulus to conduct a substantive study."
], 

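http://www.ncbi.nlm.nih.gov/pubmed/23954024:

Abstract

OBJECTIVES:
Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.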

DESIGN:
A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.

SETTING:
One private physiotherapy clinic, northern England.

PARTICIPANTS:
Twenty-four participants with rotator cuff tendinopathy.

INTERVENTIONS:
The intervention was a programme of self-managed loaded exercise. The control group received usual physiotherapy treatment.

MAIN OUTCOMES:
Baseline assessment comprised the Shoulder Pain and Disability Index (SPADI) and the Short-Form 36, repeated three months post randomisation.

RESULTS:
The recruitment target was met and the majority of participants (98%) were willing to be randomised. 100% retention was attained with all participants completing the SPADI at three months. Exercise adherence rates were excellent (90%). The mean change in SPADI score was -23.7 (95% CI -14.4 to -33.3) points for the self-managed exercise group and -19.0 (95% CI -6.0 to -31.9) points for the usual physiotherapy treatment group. The difference in three month SPADI scores was 0.1 (95% CI -16.6 to 16.9) points in favour of the usual physiotherapy treatment group.

CONCLUSIONS:
In keeping with previous research which indicates the need for further evaluation of self-managed loaded exercise for rotator cuff tendinopathy, these methods and the preliminary evaluation of outcome offer a foundation and stimulus to conduct a substantive study.

(The OBJECTIVES, DESIGN, SETTING, etc. headings are missing from the abstract returned by efetch().)

What other metadata does efetch() miss, and is there a way to retrieve the missing information programmatically?

2 answers:

Answer 0 (score: 2)

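To expand on xbello's answer: no, the information is not missing, but it is somewhat hidden. Each AbstractText item carries its section heading in a Label attribute: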

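from Bio import Entrez

Entrez.email = "Your.Name.Here@example.org"
# For PubMed, XML output is requested with retmode="xml"
handle = Entrez.efetch(db="pubmed", id="23954024", retmode="xml")
records = Entrez.read(handle)
handle.close()

# Recent Biopython versions return a dict holding "PubmedArticle" and
# "PubmedBookArticle" lists rather than a plain list of articles
for record in records["PubmedArticle"]:
    m = record['MedlineCitation']['Article']['Abstract']['AbstractText']
    for subsection in m:
        # The heading (OBJECTIVES, DESIGN, ...) is stored as an XML attribute
        print(subsection.attributes.get('Label'))
        print(subsection)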

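Truncated output:

OBJECTIVES
Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.
DESIGN
A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.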
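If the goal is to get the abstract back into the layout the PubMed website shows, the labelled sections can be joined with their headings. This is a minimal sketch: format_abstract is a hypothetical helper (not part of Biopython), and it assumes each item behaves like the StringElement objects that Entrez.read() returns, i.e. a str carrying an .attributes dict. Items without a Label, as in unstructured abstracts, are emitted as bare text.

```python
from typing import Iterable, List


def format_abstract(abstract_texts: Iterable[str]) -> str:
    """Join AbstractText items into paragraphs headed by their Label.

    Hypothetical helper: each item is assumed to be a str, optionally
    carrying an .attributes dict like Biopython's StringElement.
    """
    paragraphs: List[str] = []
    for section in abstract_texts:
        # Plain str objects have no .attributes, so fall back to an empty dict
        label = getattr(section, "attributes", {}).get("Label")
        text = str(section)
        paragraphs.append(f"{label}:\n{text}" if label else text)
    return "\n\n".join(paragraphs)
```

With the records parsed above, format_abstract(record['MedlineCitation']['Article']['Abstract']['AbstractText']) should produce the OBJECTIVES:, DESIGN:, ... layout. Records without an abstract have no 'Abstract' key, so real code would need to guard that lookup.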

Answer 1 (score: 1)

The information is not missing:

from Bio import Entrez
Entrez.email = "sample@sample.org"

# For PubMed, XML output is requested with retmode="xml"
handle = Entrez.efetch(db="pubmed", id="23954024", retmode="xml")

print(handle.read())

Partial output:

<Abstract>
 <AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">Rotator cuff tendinopathy is a common source of shoulder pain characterised by persistent and/or recurrent problems for a proportion of sufferers. The aim of this study was to pilot the methods proposed to conduct a substantive study to evaluate the effectiveness of a self-managed loaded exercise programme versus usual physiotherapy treatment for rotator cuff tendinopathy.</AbstractText>
 <AbstractText Label="DESIGN" NlmCategory="METHODS">A single-centre pragmatic unblinded parallel group pilot randomised controlled trial.</AbstractText>
 <AbstractText Label="SETTING" NlmCategory="METHODS">One private physiotherapy clinic, northern England.</AbstractText>
[...]
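The same information can also be pulled out of the raw XML without Entrez.read(). The sketch below is only an illustration using the standard library's xml.etree.ElementTree; abstract_sections and element_paths are hypothetical helper names. The second function lists every distinct element path in a record, which is one way to check for yourself which metadata fields efetch() actually returns.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple, Union

Section = Tuple[Optional[str], Optional[str], str]


def abstract_sections(pubmed_xml: Union[str, bytes]) -> List[Section]:
    """Return (Label, NlmCategory, text) for every AbstractText element.

    Label and NlmCategory are None when the attribute is absent,
    e.g. for unstructured abstracts.
    """
    root = ET.fromstring(pubmed_xml)
    sections: List[Section] = []
    for node in root.iter("AbstractText"):
        # itertext() also collects text nested in inline markup such as <i>
        text = "".join(node.itertext()).strip()
        sections.append((node.get("Label"), node.get("NlmCategory"), text))
    return sections


def element_paths(pubmed_xml: Union[str, bytes]) -> List[str]:
    """List each distinct element path (e.g. '/Abstract/AbstractText') once,
    in the order it is first encountered."""
    root = ET.fromstring(pubmed_xml)
    seen: Dict[str, None] = {}  # a dict keeps insertion order and drops duplicates

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}"
        seen.setdefault(path, None)
        for child in node:
            walk(child, path)

    walk(root, "")
    return list(seen)


if __name__ == "__main__":
    # Two sections copied from the efetch() output above, wrapped so they parse
    sample = (
        "<Abstract>"
        '<AbstractText Label="DESIGN" NlmCategory="METHODS">A single-centre pragmatic '
        "unblinded parallel group pilot randomised controlled trial.</AbstractText>"
        '<AbstractText Label="SETTING" NlmCategory="METHODS">One private physiotherapy '
        "clinic, northern England.</AbstractText>"
        "</Abstract>"
    )
    for label, category, text in abstract_sections(sample):
        print(f"{label} [{category}]: {text}")
    print(element_paths(sample))
```

With a live request, the input would be the result of handle.read() from the code above; depending on the Biopython version this may be bytes rather than str, and ET.fromstring accepts either. On a full record, element_paths should list top-level branches such as MedlineCitation and PubmedData.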