Parsing XML using Perl XML::LibXML

Time: 2013-10-28 03:21:30

Tags: xml perl xml-libxml

I have a web service that returns XML in the following format. I am using XML::LibXML to parse the output.

<QueryResponse xmlns="http://www.exchangenetwork.net/schema/node/2" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <LastSet>true</LastSet>
    <Results>
        <SRS:SubstanceInformation xsi:schemaLocation="http://www.exchangenetwork.net/schema/SRS/3 http://www.exchangenetwork.net/schema/SRS/3" xmlns:SRS="http://www.exchangenetwork.net/schema/SRS/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <SRS:ChemicalSubstance>
                <SRS:ChemicalSubstanceIdentification>
                    <SRS:EPAChemicalInternalNumber>76109</SRS:EPAChemicalInternalNumber>
                    <SRS:CASRegistryNumber>1000-82-4</SRS:CASRegistryNumber>
                    <SRS:ChemicalSubstanceSystematicName>Urea, N-(hydroxymethyl)-</SRS:ChemicalSubstanceSystematicName>
                    <SRS:EPAChemicalRegistryName>Methylolurea</SRS:EPAChemicalRegistryName>
                    <SRS:EPAChemicalIdentifier/>
                    <SRS:ChemicalSubstanceDefinitionText/>
                    <SRS:ChemicalSubstanceCommentText/>
                    <SRS:MolecularFormulaCode>C2H6N2O2</SRS:MolecularFormulaCode>
                    <SRS:ChemicalSubstanceFormulaWeightQuantity>90.08</SRS:ChemicalSubstanceFormulaWeightQuantity>
                    <SRS:ChemicalSubstanceLinearStructureCode>O=C(NCO)N</SRS:ChemicalSubstanceLinearStructureCode>
                    <SRS:InternationalChemicalIdentifier/>
                    <SRS:FormerCASRegistryNumberList/>
                    <SRS:IncorrectlyUsedCASRegistryNumberList>
                        <SRS:CASRegistryNumber>50-00-0</SRS:CASRegistryNumber>
                    </SRS:IncorrectlyUsedCASRegistryNumberList>
                    <SRS:ClassificationList/>
                    <SRS:TechnicalPointOfContact/>
                    <SRS:SubstanceRequestor/>
                    <SRS:SubstanceCreateDate>2006-10-13 14:30:12.0</SRS:SubstanceCreateDate>
                    <SRS:SubstanceLastUpdateDate>2010-01-20 12:29:21.0</SRS:SubstanceLastUpdateDate>
                    <SRS:SubstanceStatus>A</SRS:SubstanceStatus>
                </SRS:ChemicalSubstanceIdentification>
                <SRS:ChemicalSubstanceSynonymList>
                    <SRS:ChemicalSubstanceSynonym>
                        <SRS:ChemicalSubstanceSynonymName>Urea, (hydroxymethyl)-</SRS:ChemicalSubstanceSynonymName>
                        <SRS:ChemicalSynonymStatusName>Reviewed</SRS:ChemicalSynonymStatusName>
                        <SRS:ChemicalSynonymSourceName>Chemical Update System (CUS) 1986</SRS:ChemicalSynonymSourceName>
                        <SRS:RegulationReasonText/>
                        <SRS:CharacteristicList/>
                        <SRS:AlternateIdentifierList/>
                    </SRS:ChemicalSubstanceSynonym>
                </SRS:ChemicalSubstanceSynonymList>
            </SRS:ChemicalSubstance>
        </SRS:SubstanceInformation>
    </Results>
    <RowCount>1</RowCount>
    <RowId>0</RowId>
</QueryResponse>

I can't figure out how to reach the ChemicalSubstanceIdentification node in the XML. My code is:

use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->load_xml(location => 'output.xml');

my $doc = XML::LibXML::XPathContext->new($parser);
$doc->registerNs('SRS', 'http://www.exchangenetwork.net/schema/SRS/3');
my $chemIdent = $doc->findnodes('/QueryResponse/Results/SRS:SubstanceInformation/SRS:ChemicalSubstance/SRS:ChemicalSubstanceIdentification');

Something is wrong with what I'm doing, but I can't see what. Any help is appreciated. Thanks!

1 answer:

Answer 0 (score: 3)

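The first few elements in your XPath are in the XML document's default namespace, http://www.exchangenetwork.net/schema/node/2, which the root element declares with xmlns="...". In XPath 1.0 an unprefixed name such as QueryResponse only matches elements that are in no namespace at all, so you have to register a prefix for that namespace and use it on the QueryResponse and Results elements for the XPath to work.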
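A minimal sketch of that fix, assuming the response shown in the question. The prefix n is an arbitrary, hypothetical name: XPath only requires that it be bound to the node/2 URI in the XPathContext, not that it appear in the document. The embedded XML is a trimmed copy of the response so the script runs on its own; with the real file you would use load_xml(location => 'output.xml') as in the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URIs taken from the response in the question.
my $NODE_NS = 'http://www.exchangenetwork.net/schema/node/2';
my $SRS_NS  = 'http://www.exchangenetwork.net/schema/SRS/3';

# Returns the ChemicalSubstanceIdentification nodes of a parsed document.
# 'n' is an arbitrary (hypothetical) prefix bound to the default namespace
# of QueryResponse/Results; the document itself never uses that prefix.
sub chem_ident_nodes {
    my ($dom) = @_;
    my $xpc = XML::LibXML::XPathContext->new($dom);
    $xpc->registerNs('n',   $NODE_NS);
    $xpc->registerNs('SRS', $SRS_NS);
    return $xpc->findnodes(
        '/n:QueryResponse/n:Results/SRS:SubstanceInformation'
      . '/SRS:ChemicalSubstance/SRS:ChemicalSubstanceIdentification'
    );
}

# Trimmed copy of the response, so the sketch runs without output.xml.
my $xml = <<'XML';
<QueryResponse xmlns="http://www.exchangenetwork.net/schema/node/2">
  <Results>
    <SRS:SubstanceInformation xmlns:SRS="http://www.exchangenetwork.net/schema/SRS/3">
      <SRS:ChemicalSubstance>
        <SRS:ChemicalSubstanceIdentification>
          <SRS:EPAChemicalInternalNumber>76109</SRS:EPAChemicalInternalNumber>
          <SRS:CASRegistryNumber>1000-82-4</SRS:CASRegistryNumber>
        </SRS:ChemicalSubstanceIdentification>
      </SRS:ChemicalSubstance>
    </SRS:SubstanceInformation>
  </Results>
</QueryResponse>
XML

my $dom   = XML::LibXML->load_xml(string => $xml);
my @ident = chem_ident_nodes($dom);    # list context: a plain list of nodes
print scalar(@ident), " node(s) found\n";
```

The question's original expression, with only SRS registered, selects nothing against the same document, because its unprefixed QueryResponse step looks for an element in no namespace.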

Alternatively, if there is only one SRS:SubstanceInformation in Results, you can skip QueryResponse and Results entirely by starting the XPath with //:

//SRS:SubstanceInformation/SRS:ChemicalSubstance/SRS:ChemicalSubstanceIdentification
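A sketch of the // variant, again run against a trimmed copy of the response from the question. Only the SRS prefix needs registering. Collecting the child fields into the %fields hash with an 'SRS:*' query is an illustrative addition, not part of the original answer.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the response; for the real file use
# XML::LibXML->load_xml(location => 'output.xml') instead.
my $xml = <<'XML';
<QueryResponse xmlns="http://www.exchangenetwork.net/schema/node/2">
  <Results>
    <SRS:SubstanceInformation xmlns:SRS="http://www.exchangenetwork.net/schema/SRS/3">
      <SRS:ChemicalSubstance>
        <SRS:ChemicalSubstanceIdentification>
          <SRS:EPAChemicalInternalNumber>76109</SRS:EPAChemicalInternalNumber>
          <SRS:CASRegistryNumber>1000-82-4</SRS:CASRegistryNumber>
          <SRS:EPAChemicalRegistryName>Methylolurea</SRS:EPAChemicalRegistryName>
          <SRS:MolecularFormulaCode>C2H6N2O2</SRS:MolecularFormulaCode>
        </SRS:ChemicalSubstanceIdentification>
      </SRS:ChemicalSubstance>
    </SRS:SubstanceInformation>
  </Results>
</QueryResponse>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('SRS', 'http://www.exchangenetwork.net/schema/SRS/3');

# '//' searches the whole document, so the default-namespaced QueryResponse
# and Results elements never have to be named in the expression.
my %fields;
for my $ident ($xpc->findnodes(
        '//SRS:SubstanceInformation/SRS:ChemicalSubstance/SRS:ChemicalSubstanceIdentification')) {
    # Relative query: every SRS child element of this identification node.
    # Assumes a single result; a second match would overwrite the hash entries.
    for my $child ($xpc->findnodes('SRS:*', $ident)) {
        $fields{ $child->localname } = $child->textContent;
    }
}
print "$_: $fields{$_}\n" for sort keys %fields;
```

The trade-off is that // no longer checks where the match sits: it finds SubstanceInformation elements anywhere in the response, whereas a fully qualified path with a registered prefix for the node/2 namespace only matches under QueryResponse/Results.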