I want to use Python to extract some IDs (doi, pmcid and pmid) from the record tag of an .xml file:
XML file:
<pmcids status="ok">
<request idtype="doi" dois="" versions="yes" showaiid="no">
<warning>no e-mail provided</warning>
<warning>no tool provided</warning>
<echo>ids=10.1371%2Fjournal.pone.0054577</echo>
</request>
<record requested-id="10.1371/JOURNAL.PONE.0054577" pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577">
<versions><version pmcid="PMC3557238.1" current="true"/>
</versions>
</record>
</pmcids>
I tried the following Python code:
import xml.etree.cElementTree as etree
xmlDoc = open('garbage_collector/tmp.xml', 'r')
xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)
for ingredient in xmlDocTree.iter('record'):
    print ingredient[0].text
I want to output the pmcid, doi and pmid as strings.
Answer 0 (score: 0):
If you can use BeautifulSoup, you can do this:
from bs4 import BeautifulSoup
soup = BeautifulSoup(input_xml, 'html.parser')  # name a parser explicitly so bs4 does not have to guess one
t = soup.find('record')
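where input_xml is the XML to inspect, as a string.
We use the find() function to locate the record tag and store it in the variable t. The attributes of the <record> tag can now be accessed by indexing t.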
print(t['pmcid'])
print(t['doi'])
print(t['pmid'])
This prints:
PMC3557238
10.1371/journal.pone.0054577
23382917
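If you would rather stay with the standard library, the same attributes can be read with ElementTree. In the code from the question, ingredient[0] is the first child of <record> (the <versions> element), so its .text does not contain the IDs. They are stored as attributes on <record> itself. The following is a minimal sketch, assuming the XML is already in a string; the names input_xml and ids are only illustrative, and the sample is trimmed from the file in the question.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the question, trimmed to the <record> part.
input_xml = """<pmcids status="ok">
<record requested-id="10.1371/JOURNAL.PONE.0054577" pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577">
<versions><version pmcid="PMC3557238.1" current="true"/>
</versions>
</record>
</pmcids>"""

root = ET.fromstring(input_xml)

# Collect the three attributes of every <record> element as plain strings.
# Element.get() returns None if an attribute is missing.
ids = []
for record in root.iter('record'):
    ids.append({name: record.get(name) for name in ('pmcid', 'doi', 'pmid')})

for entry in ids:
    print(entry['pmcid'], entry['doi'], entry['pmid'])
```

To read from a file instead, ET.parse(path).getroot() can replace the ET.fromstring() call.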