How can I get rid of bold tags in an XML document in Python 3 without deleting the enclosed text?

Time: 2019-03-20 17:40:01

Tags: python xml elementtree

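I am trying to remove the bold tags (<b> Some text in bold here </b>) from this XML document, while keeping the text they enclose intact. The bold tags appear around the following words: Objectives, Design, Setting, Participants, Interventions, Main outcome measures, Results, Conclusions, and Trial registration.

Here is my Python code: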

from urllib.request import urlopen
import xml.etree.ElementTree as etree

# NCBI E-utilities efetch endpoint; the PubMed ID is appended to it
urlHead = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=abstract&id='
pmid = "28420629"
completeUrl = urlHead + pmid
response = urlopen(completeUrl)
tree = etree.parse(response)
studyAbstractParts = tree.findall('.//AbstractText')
for studyAbstractPart in studyAbstractParts:
    print(studyAbstractPart.text)

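The problem with this code is that it finds all the text under the "AbstractText" tags, but stops at (or ignores) the bold tags and the text after them. In principle, I need all the text between the "<AbstractText> </AbstractText>" tags, but the bold formatting <b> </b> is just a nasty obstacle to that.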

1 answer:

Answer 0 (score: 1)

You can use the itertext() method to get all the text in <AbstractText> and its subelements.

studyAbstractParts = tree.findall('.//AbstractText')
for studyAbstractPart in studyAbstractParts:
    for t in studyAbstractPart.itertext():
        print(t)

Output:

Objectives
 To determine whether preoperative dexamethasone reduces postoperative vomiting in patients undergoing elective bowel surgery and whether it is associated with other measurable benefits during recovery from surgery, including quicker return to oral diet and reduced length of stay.
Design
 Pragmatic two arm parallel group randomised trial with blinded postoperative care and outcome assessment.
Setting
 45 UK hospitals.
Participants
 1350 patients aged 18 or over undergoing elective open or laparoscopic bowel surgery for malignant or benign pathology.
Interventions
 Addition of a single dose of 8 mg intravenous dexamethasone at induction of anaesthesia compared with standard care.
Main outcome measures
 Primary outcome: reported vomiting within 24 hours reported by patient or clinician.
vomiting with 72 and 120 hours reported by patient or clinician; use of antiemetics and postoperative nausea and vomiting at 24, 72, and 120 hours rated by patient; fatigue and quality of life at 120 hours or discharge and at 30 days; time to return to fluid and food intake; length of hospital stay; adverse events.
Results
 1350 participants were recruited and randomly allocated to additional dexamethasone (n=674) or standard care (n=676) at induction of anaesthesia. Vomiting within 24 hours of surgery occurred in 172 (25.5%) participants in the dexamethasone arm and 223 (33.0%) allocated standard care (number needed to treat (NNT) 13, 95% confidence interval 5 to 22; P=0.003). Additional postoperative antiemetics were given (on demand) to 265 (39.3%) participants allocated dexamethasone and 351 (51.9%) allocated standard care (NNT 8, 5 to 11; P<0.001). Reduction in on demand antiemetics remained up to 72 hours. There was no increase in complications.
Conclusions
 Addition of a single dose of 8 mg intravenous dexamethasone at induction of anaesthesia significantly reduces both the incidence of postoperative nausea and vomiting at 24 hours and the need for rescue antiemetics for up to 72 hours in patients undergoing large and small bowel surgery, with no increase in adverse events.
Trial registration
 EudraCT (2010-022894-32) and ISRCTN (ISRCTN21973627).
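
If each label should stay on the same line as its paragraph, the pieces yielded by itertext() can be joined into one string per <AbstractText>. The following is a minimal sketch of that idea, not part of the original answer: the SAMPLE string and the abstract_paragraphs helper are hypothetical and only mimic the structure of the PubMed record, so the example runs without a network request. It also shows why .text alone was not enough: the text after a closing </b> is stored in that element's tail, not in the parent's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the structure of the PubMed abstract;
# the real document is fetched from the efetch URL in the question.
SAMPLE = (
    "<Abstract>"
    "<AbstractText><b>Setting</b> 45 UK hospitals.</AbstractText>"
    "<AbstractText><b>Design</b> Pragmatic trial.</AbstractText>"
    "</Abstract>"
)


def abstract_paragraphs(root):
    """Return one string per <AbstractText>, with inline tags such as <b> dropped."""
    return ["".join(part.itertext()).strip() for part in root.iter("AbstractText")]


root = ET.fromstring(SAMPLE)
first = root.find("AbstractText")

# The element starts directly with <b>, so it has no leading text of its own.
print(first.text)
# The text that follows </b> lives in the tail of the <b> element.
print(first.find("b").tail)

for paragraph in abstract_paragraphs(root):
    print(paragraph)
```

With a tree parsed from the real response, abstract_paragraphs(tree.getroot()) should work the same way, since iter() searches the whole subtree for AbstractText elements.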