Parsing nested XML (with namespaces) in R

Time: 2017-08-11 11:31:00

Tags: r xml xpath xml-parsing nested

I am trying to parse an XML response from a web API.

For simple XML such as the following, I can use xpathSApply and easily get the relevant data.

Here is example.xml:

<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
    <PLANT>
        <COMMON>Bloodroot</COMMON>
        <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
        <ZONE>4</ZONE>
        <LIGHT>Mostly Shady</LIGHT>
        <PRICE>$2.44</PRICE>
        <AVAILABILITY>031599</AVAILABILITY>
    </PLANT>
    <PLANT>
        <COMMON>Columbine</COMMON>
        <BOTANICAL>Aquilegia canadensis</BOTANICAL>
        <ZONE>3</ZONE>
        <LIGHT>Mostly Shady</LIGHT>
        <PRICE>$9.37</PRICE>
        <AVAILABILITY>030699</AVAILABILITY>
    </PLANT>
</CATALOG>

> library(XML)
> doc <- xmlTreeParse("example.xml", useInternalNodes = TRUE)
> rootNode <- xmlRoot(doc)
> xpathSApply(rootNode, "//COMMON", xmlValue)
[1] "Bloodroot" "Columbine"

> getNodeSet(doc,"//PLANT")
[[1]]
<PLANT>
  <COMMON>Bloodroot</COMMON>
  <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
  <ZONE>4</ZONE>
  <LIGHT>Mostly Shady</LIGHT>
  <PRICE>$2.44</PRICE>
  <AVAILABILITY>031599</AVAILABILITY>
</PLANT> 

[[2]]
<PLANT>
  <COMMON>Columbine</COMMON>
  <BOTANICAL>Aquilegia canadensis</BOTANICAL>
  <ZONE>3</ZONE>
  <LIGHT>Mostly Shady</LIGHT>
  <PRICE>$9.37</PRICE>
  <AVAILABILITY>030699</AVAILABILITY>
</PLANT> 

attr(,"class")
[1] "XMLNodeSet"

> xmlSApply(getNodeSet(rootNode, "//PRICE"), xmlValue) # returns all PRICE values in the XML as a character vector
[1] "$2.44" "$9.37"
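When several fields are needed per record, one way to keep them aligned is to visit each PLANT node once and evaluate paths relative to it. A minimal sketch, assuming the XML package and an abbreviated inline copy of example.xml (only COMMON, ZONE and PRICE kept); the column names `common`, `zone` and `price` are our own choice:

```r
library(XML)

# Abbreviated copy of example.xml, inlined so the sketch runs on its own
xml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
  <PLANT>
    <COMMON>Bloodroot</COMMON>
    <ZONE>4</ZONE>
    <PRICE>$2.44</PRICE>
  </PLANT>
  <PLANT>
    <COMMON>Columbine</COMMON>
    <ZONE>3</ZONE>
    <PRICE>$9.37</PRICE>
  </PLANT>
</CATALOG>'

doc <- xmlParse(xml_txt, asText = TRUE)

# Evaluate "./..." paths with each PLANT as the context node, so values
# from the same plant always end up on the same row
plant_nodes <- getNodeSet(doc, "//PLANT")
plants <- do.call(rbind, lapply(plant_nodes, function(p) {
  data.frame(common = xpathSApply(p, "./COMMON", xmlValue),
             zone   = as.integer(xpathSApply(p, "./ZONE", xmlValue)),
             price  = xpathSApply(p, "./PRICE", xmlValue),
             stringsAsFactors = FALSE)
}))
print(plants)
```

This sketch assumes every PLANT has exactly one of each field; a missing element would make `data.frame()` fail and would need explicit handling.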

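However, the same commands do not work on the following XML, which contains namespace declarations. Is there any way I can get the data in the nodes/tags?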

Here is example1.xml:

<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"><s:Body><GetByFilterTradeResponse xmlns="http://entrader.contigoenergy.com/Contigo.Entrader.Service"><GetByFilterTradeResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<CATALOG>
    <CATEGORY>
        <FAMILY>
            <PLANT>
                <COMMON>Bloodroot</COMMON>
                <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
                <ZONE>4</ZONE>
                <DETAILS>
                    <PRICEINBULK>2.3</PRICEINBULK>
                    <MINVOLUME>100</MINVOLUME>
                </DETAILS>
                <LIGHT>Mostly Shady</LIGHT>
                <PRICE>$2.44</PRICE>
                <AVAILABILITY>031599</AVAILABILITY>
            </PLANT>
            <PLANT>
                <COMMON>Columbine</COMMON>
                <BOTANICAL>Aquilegia canadensis</BOTANICAL>
                <ZONE>3</ZONE>
                <DETAILS>
                    <PRICEINBULK>9.00</PRICEINBULK>
                    <MINVOLUME>100</MINVOLUME>
                </DETAILS>
                <LIGHT>Mostly Shady</LIGHT>
                <PRICE>$9.37</PRICE>
                <AVAILABILITY>030699</AVAILABILITY>
            </PLANT>
        </FAMILY>
    </CATEGORY> 
</CATALOG>
</GetByFilterTradeResult></GetByFilterTradeResponse></s:Body></s:Envelope>

The following commands do not extract any node values from the XML above:

> doc <- xmlTreeParse("example1.xml", useInternalNodes = TRUE)
> rootNode <- xmlRoot(doc)
> xpathSApply(rootNode, "//COMMON", xmlValue)
list()

> getNodeSet(doc,"//PLANT")
list()
attr(,"class")
[1] "XMLNodeSet"

> xmlSApply(getNodeSet(rootNode, "//PRICE"), xmlValue)
list()
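The empty results are most likely caused by the default namespace: the `xmlns="http://entrader.contigoenergy.com/Contigo.Entrader.Service"` declaration on `GetByFilterTradeResponse` places `CATALOG` and every element below it in that namespace, while in XPath 1.0 an unprefixed name such as `//COMMON` only matches elements in no namespace. A minimal sketch, assuming the XML package and an abbreviated inline copy of example1.xml, binds a prefix of our own choosing (`d` is hypothetical; any name works) to that URI through the `namespaces` argument of `xpathSApply` and uses it at every step of the path, including the nested `DETAILS/PRICEINBULK` values:

```r
library(XML)

# Abbreviated copy of example1.xml (same envelope and namespace declarations,
# fewer fields per PLANT), inlined so the sketch runs on its own
xml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body>
<GetByFilterTradeResponse xmlns="http://entrader.contigoenergy.com/Contigo.Entrader.Service">
<GetByFilterTradeResult xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<CATALOG><CATEGORY><FAMILY>
<PLANT><COMMON>Bloodroot</COMMON><DETAILS><PRICEINBULK>2.3</PRICEINBULK><MINVOLUME>100</MINVOLUME></DETAILS><PRICE>$2.44</PRICE></PLANT>
<PLANT><COMMON>Columbine</COMMON><DETAILS><PRICEINBULK>9.00</PRICEINBULK><MINVOLUME>100</MINVOLUME></DETAILS><PRICE>$9.37</PRICE></PLANT>
</FAMILY></CATEGORY></CATALOG>
</GetByFilterTradeResult></GetByFilterTradeResponse></s:Body></s:Envelope>'

doc <- xmlParse(xml_txt, asText = TRUE)

# Hypothetical prefix "d" bound to the default namespace URI declared on
# GetByFilterTradeResponse; CATALOG and its descendants inherit it
ns <- c(d = "http://entrader.contigoenergy.com/Contigo.Entrader.Service")

common <- xpathSApply(doc, "//d:COMMON", xmlValue, namespaces = ns)
price  <- xpathSApply(doc, "//d:PRICE", xmlValue, namespaces = ns)

# Nested values: every element step in the path needs the prefix
bulk <- xpathSApply(doc, "//d:PLANT/d:DETAILS/d:PRICEINBULK", xmlValue,
                    namespaces = ns)

plants <- data.frame(common = common, price = price,
                     price_in_bulk = as.numeric(bulk),
                     stringsAsFactors = FALSE)
print(plants)
```

If hard-coding the URI is undesirable, a namespace-agnostic path such as `//*[local-name()='COMMON']` also matches these elements, at the cost of matching `COMMON` elements from any namespace.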
