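Parsing XML in R when entities have a varying number of same-named child nodes

Time: 2017-03-15 19:43:54

Tags: r xml

I want to parse the XML file below in R using the following code: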

library(XML)

Fun2 <- function(xdata) {
    dumFun <- function(x){
        xname <- xmlName(x)
        xattrs <- xmlAttrs(x)
        c(sapply(xmlChildren(x), xmlValue), name = xname, xattrs)
    }
    dum <- xmlParse(xdata)
    as.data.frame(t(xpathSApply(dum, "//*/name", dumFun)), stringsAsFactors = FALSE)
}

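What I want to add is another column holding the entity IDs from the XML (52 and 53). The problem is that there are only 2 ID values, while the "name" tag has 6 values. Any help is appreciated.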

<?xml version='1.0' encoding='UTF-8'?>
<gwl>
  <version>20161109152411</version>
  <entities>
    <entity id="52" version="1234">
      <names>
        <name type="primary">Carl A.</name>
        <name type="alt">David A.</name>
        <name type="alt">Daniel A.</name>
      </names>
    </entity>

    <entity id="53" version="12346">
      <names>
        <name type="primary">Carl B.</name>
        <name type="alt">David B.</name>
        <name type="alt">Daniel B.</name>
      </names>
    </entity>
  </entities>
</gwl>

The desired output is as follows:

-----------------------------------
|Column1      | Column2  | Column3|
-----------------------------------
|52           | Carl A.  | primary|
-----------------------------------
|52           | David A. | alt    |
-----------------------------------
|52           | Daniel A.| alt    |
-----------------------------------
|53           | Carl B.  | primary|
-----------------------------------
|53           | David B. | alt    |
-----------------------------------
|53           | Daniel B.| alt    |
-----------------------------------

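1 answer:

Answer 0 (score: 1)

Edit: updated to match the edited desired output.

Get the ID values, then loop over the node set for each ID and extract the value (xmlValue) and attributes of each name node. Because the ID is repeated once per name node inside the loop, the 2 IDs line up with the 6 names. Finally, combine everything with rbind and convert the result to a data frame.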

df1 <- do.call( 'rbind', lapply( xmlSApply(doc["//entity"], function(x) xmlGetAttr(x, "id")), 
                                 function(x) {
                                   t( xmlSApply( doc[ paste("//entity[@id=", x, "]//name", sep = "") ], 
                                                 function( y ) c(x, xmlValue(y), xmlAttrs(y)) ))
                                 }))

colnames( df1 ) <- c( 'Column1', 'Column2', 'Column3' )
df1 <- data.frame( df1, stringsAsFactors = FALSE )
df1
#   Column1   Column2 Column3
# 1      52   Carl A. primary
# 2      52  David A.     alt
# 3      52 Daniel A.     alt
# 4      53   Carl B. primary
# 5      53  David B.     alt
# 6      53 Daniel B.     alt 

Data:

library(XML)
doc <- xmlParse('<gwl>
                    <version>20161109152411</version>
                    <entities>
                    <entity id="52" version="1234">
                    <names>
                    <name type="primary">Carl A.</name>
                    <name type="alt">David A.</name>
                    <name type="alt">Daniel A.</name>
                    </names>
                    </entity>
                    <entity id="53" version="12346">
                    <names>
                    <name type="primary">Carl B.</name>
                    <name type="alt">David B.</name>
                    <name type="alt">Daniel B.</name>
                    </names>
                    </entity>
                    </entities>
                    </gwl>')
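
As an alternative to looping over the IDs, the same table can probably be built in a single pass over the name nodes, with each node looking up the ID of the entity that contains it. The following is only a sketch, not the answer's method: it assumes every name node sits exactly two levels below its entity (name inside names inside entity), as in the sample data, and the variable names `name_nodes` and `df2` are made up for illustration.

```r
library(XML)

# Inline copy of the sample data so the sketch runs on its own.
doc <- xmlParse('<gwl>
  <version>20161109152411</version>
  <entities>
    <entity id="52" version="1234">
      <names>
        <name type="primary">Carl A.</name>
        <name type="alt">David A.</name>
        <name type="alt">Daniel A.</name>
      </names>
    </entity>
    <entity id="53" version="12346">
      <names>
        <name type="primary">Carl B.</name>
        <name type="alt">David B.</name>
        <name type="alt">Daniel B.</name>
      </names>
    </entity>
  </entities>
</gwl>', asText = TRUE)

# All name nodes, in document order.
name_nodes <- getNodeSet(doc, "//entity//name")

# One row per name node. The entity ID is read from the grandparent node
# (assumption: name -> names -> entity, as in the sample data).
df2 <- data.frame(
  Column1 = sapply(name_nodes, function(n) xmlGetAttr(xmlParent(xmlParent(n)), "id")),
  Column2 = sapply(name_nodes, xmlValue),
  Column3 = sapply(name_nodes, xmlGetAttr, "type"),
  stringsAsFactors = FALSE
)
df2
```

Since every row starts from a name node, the number of names per entity does not have to be the same. If the nesting depth differs in the real file, the `xmlParent(xmlParent(n))` lookup would need to be adapted.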