Question

I have the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MedlineCitationSet PUBLIC "-//NLM//DTD Medline Citation, 1st January, 2014//EN"
                                    "http://www.nlm.nih.gov/databases/dtd/nlmmedlinecitationset_140101.dtd">
<MedlineCitationSet>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">15326085</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<JournalIssue CitedMedium="Internet">
<Volume>44</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2004</Year>
<Month>Oct</Month>
</PubDate>
</JournalIssue>
<Title>Hypertension</Title>
<ISOAbbreviation>Hypertension</ISOAbbreviation>
</Journal>
<ArticleTitle>Arterial pressure lowering effect of chronic atenolol therapy in hypertension and vasoconstrictor sympathetic drive.</ArticleTitle>
<Pagination>
<MedlinePgn>454-8</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Burns</LastName>
<ForeName>Joanna</ForeName>
<Initials>J</Initials>
<Affiliation>Department of Cardiology, Leeds Teaching Hospitals NHS Trust, Leeds, UK. burnsjoanna1@hotmail.com</Affiliation>
</Author>
<Author ValidYN="Y">
<LastName>Mary</LastName>
<ForeName>David A S G</ForeName>
<Initials>DA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Mackintosh</LastName>
<ForeName>Alan F</ForeName>
<Initials>AF</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Ball</LastName>
<ForeName>Stephen G</ForeName>
<Initials>SG</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Greenwood</LastName>
<ForeName>John P</ForeName>
<Initials>JP</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<ArticleDate DateType="Electronic">
<Year>2004</Year>
<Month>08</Month>
<Day>23</Day>
</ArticleDate>
</Article>

</MedlineCitation>
<MedlineCitation Owner="NLM" Status="In-Data-Review">
<PMID Version="1">24096967</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<JournalIssue CitedMedium="Internet">
<Volume>31</Volume>
<Issue>3</Issue>
<PubDate>
<Year>2014</Year>
<Month>Mar</Month>
</PubDate>
</JournalIssue>
<Title>Pharmaceutical research</Title>
<ISOAbbreviation>Pharm. Res.</ISOAbbreviation>
</Journal>
<ArticleTitle>Semi-mechanistic Modelling of the Analgesic Effect of Gabapentin in the Formalin-Induced Rat Model of Experimental Pain.</ArticleTitle>
<Pagination>
<MedlinePgn>593-606</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Taneja</LastName>
<ForeName>A</ForeName>
<Initials>A</Initials>
<Affiliation>Division of Pharmacology, Leiden Academic Centre for Drug Research, POBox 9502, 2300 RA, Leiden, The Netherlands.</Affiliation>
</Author>
<Author ValidYN="Y">
<LastName>Troconiz</LastName>
<ForeName>I F</ForeName>
<Initials>IF</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Danhof</LastName>
<ForeName>M</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Della Pasqua</LastName>
<ForeName>O</ForeName>
<Initials>O</Initials>
</Author>
<Author ValidYN="Y">
<CollectiveName>neuropathic pain project of the PKPD modelling platform</CollectiveName>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2013</Year>
<Month>10</Month>
<Day>05</Day>
</ArticleDate>
</Article>
</MedlineCitation>

</MedlineCitationSet>

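Note that it contains two entries, PMID 15326085 and 24096967. What I want to do is parse the XML file and extract each author's last name or collective name. The desired result: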

15326085 Burns,Mary,Mackintosh,Ball,Greenwood
24096967 Taneja,Troconiz,Danhof,Della Pasqua, neuropathic pain project of the PKPD modelling platform

But why does this code fail to capture the collective name in the second entry?

#!/usr/bin/env python
import xml.etree.ElementTree as ET
def parse_xml(xmlfile):
    """docstring for parse_xml"""
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    for medcit in root.findall('MedlineCitation'):
        pmid = medcit.find('PMID').text
        authors = medcit.find('Article/AuthorList/')

        lnlist = []
        for auth in authors:
            lastname = auth.find('LastName').text.encode('utf8')
            colcname = auth.find('CollectiveName').text

            if lastname is not None:
                lnlist.append(lastname)
            elif colcname is not None:
                lnlist.append(colcname)

        print pmid, ",".join(lnlist)

parse_xml('myfile.xml')

The code above produces this output:

Traceback (most recent call last):
  File "test.py", line 70, in <module>
    parse_xml(fvar)
  File "test.py", line 49, in parse_xml
    colcname = auth.find('CollectiveName').text
AttributeError: 'NoneType' object has no attribute 'text'

Answer 1

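Only read .text after checking that the node was found. find() returns None when there is no matching child, so auth.find('CollectiveName').text raises AttributeError for the very first author, who has a LastName but no CollectiveName: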

for auth in authors:
    lastname = auth.find('LastName')
    if lastname is not None:
        lnlist.append(lastname.text.encode('utf8'))
    else:
        colcname = auth.find('CollectiveName')
        if colcname is not None:
            lnlist.append(colcname.text)
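
For reference, here is a minimal, self-contained sketch of the same idea, written for Python 3 rather than the Python 2 used in the question. It uses findtext(), which returns None instead of raising when the child element is missing. The SAMPLE_XML string and the author_names helper are hypothetical names introduced only for this illustration; the sample is a trimmed-down version of the file in the question, not the full data.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same structure as the file in the question
# (hypothetical test data, only a few authors kept).
SAMPLE_XML = """<MedlineCitationSet>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">15326085</PMID>
<Article PubModel="Print-Electronic">
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Burns</LastName></Author>
<Author ValidYN="Y"><LastName>Mary</LastName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<MedlineCitation Owner="NLM" Status="In-Data-Review">
<PMID Version="1">24096967</PMID>
<Article PubModel="Print-Electronic">
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Taneja</LastName></Author>
<Author ValidYN="Y"><CollectiveName>neuropathic pain project of the PKPD modelling platform</CollectiveName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</MedlineCitationSet>"""


def author_names(root):
    """Map each PMID to its authors' last names or collective names."""
    result = {}
    for medcit in root.findall('MedlineCitation'):
        pmid = medcit.findtext('PMID')
        names = []
        for auth in medcit.findall('Article/AuthorList/Author'):
            # findtext() returns None (no exception) when the child is missing.
            name = auth.findtext('LastName')
            if name is None:
                # Fall back to the collective name for group authors.
                name = auth.findtext('CollectiveName')
            if name is not None:
                names.append(name)
        result[pmid] = names
    return result


root = ET.fromstring(SAMPLE_XML)
for pmid, names in author_names(root).items():
    print(pmid, ",".join(names))
```

To run this against the real file, replace ET.fromstring(SAMPLE_XML) with ET.parse('myfile.xml').getroot(). In Python 3 the .encode('utf8') call is not needed, since element text is already a str.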
