I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MedlineCitationSet PUBLIC "-//NLM//DTD Medline Citation, 1st January, 2014//EN"
"http://www.nlm.nih.gov/databases/dtd/nlmmedlinecitationset_140101.dtd">
<MedlineCitationSet>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">15326085</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<JournalIssue CitedMedium="Internet">
<Volume>44</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2004</Year>
<Month>Oct</Month>
</PubDate>
</JournalIssue>
<Title>Hypertension</Title>
<ISOAbbreviation>Hypertension</ISOAbbreviation>
</Journal>
<ArticleTitle>Arterial pressure lowering effect of chronic atenolol therapy in hypertension and vasoconstrictor sympathetic drive.</ArticleTitle>
<Pagination>
<MedlinePgn>454-8</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Burns</LastName>
<ForeName>Joanna</ForeName>
<Initials>J</Initials>
<Affiliation>Department of Cardiology, Leeds Teaching Hospitals NHS Trust, Leeds, UK. burnsjoanna1@hotmail.com</Affiliation>
</Author>
<Author ValidYN="Y">
<LastName>Mary</LastName>
<ForeName>David A S G</ForeName>
<Initials>DA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Mackintosh</LastName>
<ForeName>Alan F</ForeName>
<Initials>AF</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Ball</LastName>
<ForeName>Stephen G</ForeName>
<Initials>SG</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Greenwood</LastName>
<ForeName>John P</ForeName>
<Initials>JP</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<ArticleDate DateType="Electronic">
<Year>2004</Year>
<Month>08</Month>
<Day>23</Day>
</ArticleDate>
</Article>
</MedlineCitation>
<MedlineCitation Owner="NLM" Status="In-Data-Review">
<PMID Version="1">24096967</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<JournalIssue CitedMedium="Internet">
<Volume>31</Volume>
<Issue>3</Issue>
<PubDate>
<Year>2014</Year>
<Month>Mar</Month>
</PubDate>
</JournalIssue>
<Title>Pharmaceutical research</Title>
<ISOAbbreviation>Pharm. Res.</ISOAbbreviation>
</Journal>
<ArticleTitle>Semi-mechanistic Modelling of the Analgesic Effect of Gabapentin in the Formalin-Induced Rat Model of Experimental Pain.</ArticleTitle>
<Pagination>
<MedlinePgn>593-606</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Taneja</LastName>
<ForeName>A</ForeName>
<Initials>A</Initials>
<Affiliation>Division of Pharmacology, Leiden Academic Centre for Drug Research, POBox 9502, 2300 RA, Leiden, The Netherlands.</Affiliation>
</Author>
<Author ValidYN="Y">
<LastName>Troconiz</LastName>
<ForeName>I F</ForeName>
<Initials>IF</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Danhof</LastName>
<ForeName>M</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Della Pasqua</LastName>
<ForeName>O</ForeName>
<Initials>O</Initials>
</Author>
<Author ValidYN="Y">
<CollectiveName>neuropathic pain project of the PKPD modelling platform</CollectiveName>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2013</Year>
<Month>10</Month>
<Day>05</Day>
</ArticleDate>
</Article>
</MedlineCitation>
</MedlineCitationSet>
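Note that it contains two entries, PMID 15326085 and 24096967. What I want to do is parse the XML file and extract each author's last name or, where there is none, the collective name. The expected result: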
15326085 Burns,Mary,Mackintosh,Ball,Greenwood
24096967 Taneja,Troconiz,Danhof,Della Pasqua, neuropathic pain project of the PKPD modelling platform
But why does this code fail to capture the collective name in the second entry?
#!/usr/bin/env python
import xml.etree.ElementTree as ET

def parse_xml(xmlfile):
    """docstring for parse_xml"""
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    for medcit in root.findall('MedlineCitation'):
        pmid = medcit.find('PMID').text
        authors = medcit.find('Article/AuthorList/')
        lnlist = []
        for auth in authors:
            lastname = auth.find('LastName').text.encode('utf8')
            colcname = auth.find('CollectiveName').text
            if lastname is not None:
                lnlist.append(lastname)
            elif colcname is not None:
                lnlist.append(colcname)
        print pmid, ",".join(lnlist)

parse_xml('myfile.xml')
The output of the code above is:
Traceback (most recent call last):
  File "test.py", line 70, in <module>
    parse_xml(fvar)
  File "test.py", line 49, in parse_xml
    colcname = auth.find('CollectiveName').text
AttributeError: 'NoneType' object has no attribute 'text'
Answer 0 (score: 1)
find() returns None when an element has no matching child, so only read .text once you know the node was found:
for auth in authors:
    lastname = auth.find('LastName')
    if lastname is not None:
        lnlist.append(lastname.text.encode('utf8'))
    else:
        colcname = auth.find('CollectiveName')
        if colcname is not None:
            lnlist.append(colcname.text)
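For completeness, here is a self-contained sketch of the same idea. It assumes Python 3 (so print is a function and no .encode('utf8') is needed) and only the standard-library xml.etree.ElementTree. The function name extract_authors and the trimmed SAMPLE string are illustrative and not part of the question. Instead of find(...).text it uses Element.findtext, which returns None rather than raising when the child element is absent.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the file in the question:
# one citation with a personal author and a collective author.
SAMPLE = """
<MedlineCitationSet>
  <MedlineCitation Owner="NLM" Status="In-Data-Review">
    <PMID Version="1">24096967</PMID>
    <Article PubModel="Print-Electronic">
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y"><LastName>Taneja</LastName></Author>
        <Author ValidYN="Y">
          <CollectiveName>neuropathic pain project of the PKPD modelling platform</CollectiveName>
        </Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>
"""


def extract_authors(root):
    """Return a list of (pmid, [names]) tuples, one per MedlineCitation."""
    rows = []
    for medcit in root.findall('MedlineCitation'):
        pmid = medcit.findtext('PMID')
        names = []
        for auth in medcit.findall('Article/AuthorList/Author'):
            # findtext gives None when the child is missing, so nothing
            # is dereferenced on a None object.
            name = auth.findtext('LastName')
            if name is None:
                name = auth.findtext('CollectiveName')
            if name is not None:
                names.append(name)
        rows.append((pmid, names))
    return rows


for pmid, names in extract_authors(ET.fromstring(SAMPLE.strip())):
    print(pmid, ",".join(names))
```

To run it against the real file, pass ET.parse('myfile.xml').getroot() instead of the parsed SAMPLE string.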