XML to data frame

Time: 2014-12-02 09:57:42

Tags: xml r dataframe

Does anyone know how to convert the following XML into an R data frame?

    <?xml version="1.0"?>
    <soap:Envelope>
      <soap:Body>
       <getCampaignsResponse>
                <getCampaignsResult>
                    <campaign>
                        <categoryBids>
                                <categoryBid>
                                    <campaignCategoryUID>1234</campaignCategoryUID>
                                    <campaignID>1211</campaignID>
                                    <categoryID>1254</categoryID>
                                    <selected>true</selected>
                                    <bidInformation>
                                      <biddingStrategy>Cpc</biddingStrategy>
                                      <cpcBid>
                                        <cpc>0.5</cpc>
                                      </cpcBid>
                                      <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                                <categoryBid>
                                      <campaignCategoryUID>5487</campaignCategoryUID>
                                      <campaignID>3244</campaignID>
                                      <categoryID>1234</categoryID>
                                      <selected>true</selected>
                                      <bidInformation>
                                        <biddingStrategy>Cpc</biddingStrategy>
                                        <cpcBid>
                                          <cpc>0.2</cpc>
                                        </cpcBid>
                                        <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                      </categoryBids>
                  </campaign>
              </getCampaignsResult>
          </getCampaignsResponse>
      </soap:Body>
  </soap:Envelope>

The class of the XML object is:

> str(data)  
Classes 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr> 

The data frame should contain the following columns:
campaignCategoryUID
campaignID
categoryID
biddingStrategy
cpc

I could not get useful results with either xmlToDataFrame or xmlToList. Any help is much appreciated!

1 answer:

Answer 0: (score: 1)

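You will have to extract the nodes manually with xpathSApply, and you may need to change how you parse the response, since it does not contain any namespace definitions: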

library(XML)

xml <- '<?xml version="1.0"?>
    <soap:Envelope>
      <soap:Body>
       <getCampaignsResponse>
                <getCampaignsResult>
                    <campaign>
                        <categoryBids>
                                <categoryBid>
                                    <campaignCategoryUID>1234</campaignCategoryUID>
                                    <campaignID>1211</campaignID>
                                    <categoryID>1254</categoryID>
                                    <selected>true</selected>
                                    <bidInformation>
                                      <biddingStrategy>Cpc</biddingStrategy>
                                      <cpcBid>
                                        <cpc>0.5</cpc>
                                      </cpcBid>
                                      <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                                <categoryBid>
                                      <campaignCategoryUID>5487</campaignCategoryUID>
                                      <campaignID>3244</campaignID>
                                      <categoryID>1234</categoryID>
                                      <selected>true</selected>
                                      <bidInformation>
                                        <biddingStrategy>Cpc</biddingStrategy>
                                        <cpcBid>
                                          <cpc>0.2</cpc>
                                        </cpcBid>
                                        <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                      </categoryBids>
                  </campaign>
              </getCampaignsResult>
          </getCampaignsResponse>
      </soap:Body>
  </soap:Envelope>'

doc <- xmlRoot(xmlTreeParse(xml, useInternalNodes = TRUE))

data <- data.frame(campaignCategoryUID=xpathSApply(doc, "//campaignCategoryUID", xmlValue),
                   campaignID=xpathSApply(doc, "//campaignID", xmlValue),
                   categoryID=xpathSApply(doc, "//categoryID", xmlValue),
                   biddingStrategy=xpathSApply(doc, "//biddingStrategy", xmlValue),
                   cpc=xpathSApply(doc, "//cpc", xmlValue))

data

##   campaignCategoryUID campaignID categoryID biddingStrategy cpc
## 1                1234       1211       1254             Cpc 0.5
## 2                5487       3244       1234             Cpc 0.2
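On the namespace remark above: if the real response does declare its namespaces, a plain path such as //campaignID matches nothing for elements that sit in a default namespace. The sketch below shows two possible ways around that. The payload namespace URI (urn:example:campaigns), the prefix c, and the variable names are all made up for illustration; the actual service will use its own URI.

```r
library(XML)

# Hypothetical response fragment; the payload namespace URI is invented.
ns_xml <- '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getCampaignsResponse xmlns="urn:example:campaigns">
      <campaignID>1211</campaignID>
      <campaignID>3244</campaignID>
    </getCampaignsResponse>
  </soap:Body>
</soap:Envelope>'

ns_doc <- xmlParse(ns_xml, asText = TRUE)

# Option 1: bind a prefix of our own choosing ("c") to the default namespace URI.
ids_prefixed <- xpathSApply(ns_doc, "//c:campaignID", xmlValue,
                            namespaces = c(c = "urn:example:campaigns"))

# Option 2: ignore namespaces and match on the local element name only.
ids_local <- xpathSApply(ns_doc, "//*[local-name()='campaignID']", xmlValue)
```

The local-name() form is more verbose but does not depend on knowing the URI in advance.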

You can also do the extraction in a more functional style:

nodes <- c("campaignCategoryUID", "campaignID", "categoryID", "biddingStrategy", "cpc")
data <- rbind.data.frame(sapply(nodes, function(x) xpathSApply(doc, sprintf("//%s", x), xmlValue)))

This works as long as you do not need to handle edge cases (i.e., as long as every extraction is uniform and no "errors" can occur).
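If edge cases do matter, for example a categoryBid without a cpc element, the column-wise approach breaks down: data.frame() either stops because the columns have different lengths or recycles the shorter one, which misaligns the values. A minimal sketch of one alternative is to walk the categoryBid nodes one at a time and fill in NA for anything missing. The trimmed input and the names bids_xml, first_value and bids are hypothetical, not part of the original question.

```r
library(XML)

# Hypothetical input: the second categoryBid has no <cpc> element.
bids_xml <- '<categoryBids>
  <categoryBid>
    <campaignCategoryUID>1234</campaignCategoryUID>
    <campaignID>1211</campaignID>
    <categoryID>1254</categoryID>
    <bidInformation>
      <biddingStrategy>Cpc</biddingStrategy>
      <cpcBid><cpc>0.5</cpc></cpcBid>
    </bidInformation>
  </categoryBid>
  <categoryBid>
    <campaignCategoryUID>5487</campaignCategoryUID>
    <campaignID>3244</campaignID>
    <categoryID>1234</categoryID>
    <bidInformation>
      <biddingStrategy>Cpa</biddingStrategy>
    </bidInformation>
  </categoryBid>
</categoryBids>'

bids_doc <- xmlParse(bids_xml, asText = TRUE)
fields <- c("campaignCategoryUID", "campaignID", "categoryID",
            "biddingStrategy", "cpc")

# Text of the first matching descendant, or NA if the element is absent.
first_value <- function(node, name) {
  hits <- getNodeSet(node, paste0(".//", name))
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# Build one row per <categoryBid>, so values from different bids cannot shift.
rows <- lapply(getNodeSet(bids_doc, "//categoryBid"), function(bid) {
  as.data.frame(
    lapply(setNames(fields, fields), function(f) first_value(bid, f)),
    stringsAsFactors = FALSE
  )
})
bids <- do.call(rbind, rows)

# Everything is extracted as text; convert numeric columns explicitly.
bids$cpc <- as.numeric(bids$cpc)
```

The leading "." in ".//" keeps each lookup relative to the current categoryBid node; a bare "//" would search the whole document again.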