Extracting multi-level XML using Perl

Time: 2010-06-22 13:58:54

Tags: xml perl

I have an XML file like the following:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2010//EN" "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_100101.dtd">
<PubmedArticleSet>
<PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
        <PMID>20555148</PMID>
        <DateCreated>
            <Year>2010</Year>
            <Month>6</Month>
            <Day>17</Day>
        </DateCreated>
        <Article PubModel="Print-Electronic">
        <Journal>
            <ISSN IssnType="Electronic">1875-8908</ISSN>
            <JournalIssue CitedMedium="Internet">
                <PubDate>
                    <Year>2010</Year>
                    <Month>Jun</Month>
                    <Day>16</Day>
                </PubDate>
            </JournalIssue>
            <Title>Journal of Alzheimer's disease : JAD</Title>
        </Journal>
        <ArticleTitle>CSF Neurofilament Proteins Levels are Elevated in Sporadic Creutzfeldt-Jakob Disease.</ArticleTitle>
        <Pagination>
            <MedlinePgn/>
        </Pagination>
        <Abstract>
            <AbstractText>In this study we investigated the cerebrospinal fluid (CSF) levels of neurofilament light (NFL) and heavy chain (NFHp35), total tau (t-tau), and glial fibrillary acidic protein (GFAP) to detect disease specific profiles in sporadic Creutzfeldt Jakob disease (sCJD) patients and Alzheimer's disease (AD) patients. CSF levels of NFL, NFHp35, t-tau, and GFAP of 23 sCJD patients and 55 AD patients were analyzed and compared to non-demented controls. Median NFL, NFHp35, GFAP, and t-tau levels were significantly increased in sCJD patients and AD patients versus controls (p &lt; 0.0001 in all). NFL, NFHp35, and t-tau levels were significantly increased in sCJD patients versus AD patients (p &lt; 0.005), but GFAP concentrations did not differ between sCJD and AD. The results suggest that neuroaxonal damage, reflected by higher CSF levels of NFL, NFHp35, and t-tau, is more pronounced in the pathophysiology of sCJD than in AD. The comparable CSF GFAP concentrations suggest that astroglial damage or astrocytosis is equally pronounced in the pathophysiology of AD and sCJD. Prospective studies are needed to determine whether NFL and NFHp35 may be additional tools in the differential diagnosis of rapidly progressive dementias.</AbstractText>
        </Abstract>
        <Affiliation>Department of Neurology, Radboud University Nijmegen Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Alzheimer Centre Nijmegen, The Netherlands.</Affiliation>
        <AuthorList>
            <Author>
                <LastName>van Eijk</LastName>
                <ForeName>Jeroen J J</ForeName>
                <Initials>JJ</Initials>
            </Author>
            <Author>
                <LastName>van Everbroeck</LastName>
                <ForeName>Bart</ForeName>
                <Initials>B</Initials>
            </Author>
            <Author>
                <LastName>Abdo</LastName>
                <ForeName>W Farid</ForeName>
                <Initials>WF</Initials>
            </Author>
            <Author>
                <LastName>Kremer</LastName>
                <ForeName>Berry P H</ForeName>
                <Initials>BP</Initials>
            </Author>
            <Author>
                <LastName>Verbeek</LastName>
                <ForeName>Marcel M</ForeName>
                <Initials>MM</Initials>
            </Author>
        </AuthorList>
        <Language>ENG</Language>
        <PublicationTypeList>
            <PublicationType>JOURNAL ARTICLE</PublicationType>
        </PublicationTypeList>
        <ArticleDate DateType="Electronic">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>16</Day>
        </ArticleDate>
    </Article>
    <MedlineJournalInfo>
        <MedlineTA>J Alzheimers Dis</MedlineTA>
        <NlmUniqueID>9814863</NlmUniqueID>
        <ISSNLinking>1387-2877</ISSNLinking>
    </MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
    <History>
        <PubMedPubDate PubStatus="entrez">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="pubmed">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="medline">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
    </History>
    <PublicationStatus>aheadofprint</PublicationStatus>
    <ArticleIdList>
        <ArticleId IdType="pii">720R60380216K661</ArticleId>
        <ArticleId IdType="doi">10.3233/JAD-2010-090649</ArticleId>
        <ArticleId IdType="pubmed">20555148</ArticleId>
    </ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>

How can I extract the AbstractText using Perl? Thanks.

2 answers:

Answer 0 (score: 6)

Here is a quick and dirty example using XML::Twig.

use 5.012;
use warnings;
use XML::Twig;

XML::Twig->new(
    twig_handlers => {
        AbstractText => sub { say $_->text },
    },
)->parsefile( 'your_data.xml' );
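If the file holds many PubmedArticle records, it may help to pair each abstract with its PMID and to free each record once it has been handled. The following is only a sketch under that assumption: the inline XML is a trimmed, abbreviated stand-in for the question's file, and the `%abstract_for` hash is a name made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed stand-in for the question's file (abbreviated sample data).
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
      <PMID>20555148</PMID>
      <Article PubModel="Print-Electronic">
        <Abstract>
          <AbstractText>In this study we investigated ...</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

my %abstract_for;    # hypothetical lookup: PMID => abstract text

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <PubmedArticle> element has been fully parsed.
        PubmedArticle => sub {
            my ( $t, $article ) = @_;
            my $pmid     = $article->first_descendant('PMID');
            my $abstract = $article->first_descendant('AbstractText');
            $abstract_for{ $pmid->text } = $abstract->text
                if $pmid && $abstract;
            $t->purge;    # discard what has been parsed so far
        },
    },
);
$twig->parse($xml);    # for a file, call ->parsefile('your_data.xml') instead

print "$_: $abstract_for{$_}\n" for sort keys %abstract_for;
```

Handling the whole PubmedArticle rather than AbstractText alone keeps the PMID and the abstract together, and purging after each record keeps memory use roughly constant on large files.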

Answer 1 (score: 2)

Use an XML parser library. For small inputs you can use XML::Simple; for very large files, use XML::Twig or XML::Parser.

An example using XML::Simple:
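use strict;
use warnings;
use XML::Simple;
# Perl does not expand "~" in file names, so spell out the home directory.
my $xml = XMLin("$ENV{HOME}/junk/a.xml");
my $AbstractText = $xml->{PubmedArticle}->{MedlineCitation}->{Article}->{Abstract}->{AbstractText};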
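One caveat with the hash path above: XML::Simple returns a hash for a single PubmedArticle but an array reference when there are several, so the same path stops working once the file grows. A minimal sketch of one way around that, assuming the element layout from the question, is to pass `ForceArray` so PubmedArticle is always an array. The two records and their PMIDs below are invented sample data.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Two made-up records that follow the question's element layout.
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><Abstract><AbstractText>First abstract.</AbstractText></Abstract></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><Abstract><AbstractText>Second abstract.</AbstractText></Abstract></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

# ForceArray makes PubmedArticle an array ref even when only one is present;
# KeyAttr => [] switches off XML::Simple's folding of arrays into hashes.
my $data = XMLin( $xml, ForceArray => ['PubmedArticle'], KeyAttr => [] );

my @abstracts;    # list of [ PMID, abstract text ] pairs
for my $article ( @{ $data->{PubmedArticle} } ) {
    my $citation = $article->{MedlineCitation};
    push @abstracts,
        [ $citation->{PMID}, $citation->{Article}{Abstract}{AbstractText} ];
}

printf "%s: %s\n", @$_ for @abstracts;
```

With these options the loop behaves the same for one record or many, at the cost of always dereferencing an array.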