Recursively extracting XML attributes

Time: 2017-11-01 19:36:02

Tags: r xml2

I have an XML document:

library("xml2")

xml_ex <- '
<Member name="ONE">
    <Member name="A"/>
    <Member name="B">
        <Member name="1"/>
        <Member name="2"/>
    </Member>
    <Member name="C"/>
</Member>'

ex <- read_xml(xml_ex)

How can I extract the name attribute from each Member element while preserving the hierarchy? E.g.:

structure(
  list(
    ONE = structure(
      list(
        A = "", 
        B = structure(
          list(
            `1` = "",
            `2` = ""
            ), 
            .Names = c("1", "2")
        ), 
        C = ""),
        .Names = c("A", "B", "C")
    )
  ),
  .Names = "ONE"
)
## $ONE
## $ONE$A
## [1] ""
## 
## $ONE$B
## $ONE$B$`1`
## [1] ""
## 
## $ONE$B$`2`
## [1] ""
## 
## $ONE$C
## [1] ""

Edit: changed the target output

1 answer:

Answer 0 (score: 0)

I've arrived at the solution below, but it's a bit clunky.

takeTheChildren <- function(x, search) {
  # extracting the nth node (search) from the nodeset x
  lapply(search, xml2::xml_child, x = x)
}

hierBuilder <- function(nodes) {

  if (!requireNamespace("xml2", quietly = TRUE)) {
    stop("`xml2` needed for this function to work. Please install it.", call. = FALSE)
  }

  # if we reach the leaf level of any of the node sets,
  # just return an empty string
  if (length(nodes) == 0L) {
    return("")
  }

  # extract the names of each of the current top level nodes
  names(nodes) <- sapply(nodes, xml2::xml_attr, attr = 'name')

  # count the children each of the current top level node has, make a sequence
  seq_ix <- lapply(nodes, function(node) {
    # seq_along() is explicit about indexing by length, including the empty case
    seq_along(xml2::xml_children(node))
  })

  # make a list of individual child nodes under each of the current top level
  # nodes, while preserving the hierarchy
  children <- mapply(takeTheChildren, x = nodes, search = seq_ix, SIMPLIFY = FALSE)

  # recurse on the current node's children
  return(lapply(children, hierBuilder))
}

One annoying requirement is that the initial xml_document or xml_nodeset has to be passed in wrapped in a list for the recursion to work:

hierBuilder(list(ex))
## $ONE
## $ONE$A
## [1] ""
## 
## $ONE$B
## $ONE$B$`1`
## [1] ""
## 
## $ONE$B$`2`
## [1] ""
## 
## $ONE$C
## [1] ""
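
A possibly less clunky variant, offered as a sketch rather than a drop-in replacement: if the function recurses on a single node instead of a list of nodes, the list() wrapper is no longer needed. The helper name name_tree is hypothetical (it is not part of xml2); the sketch relies only on xml2::xml_children() and xml2::xml_attr(), and assumes every Member carries a name attribute.

```r
library("xml2")

# Hypothetical helper (not part of xml2): build a nested list for one node,
# named after the "name" attribute of its children.
name_tree <- function(node) {
  kids <- xml_children(node)

  # leaf node: no child elements, so return an empty string
  if (length(kids) == 0L) {
    return("")
  }

  # recurse on each child, then name the results after the children
  out <- lapply(kids, name_tree)
  names(out) <- xml_attr(kids, "name")
  out
}

xml_ex <- '
<Member name="ONE">
    <Member name="A"/>
    <Member name="B">
        <Member name="1"/>
        <Member name="2"/>
    </Member>
    <Member name="C"/>
</Member>'

ex <- read_xml(xml_ex)

# wrap the root once so that its own name becomes the top-level key
res <- setNames(list(name_tree(ex)), xml_attr(ex, "name"))
str(res)
```

The root node is handled outside the recursion because name_tree() only names a node's children, so the top-level "ONE" key has to be attached by the caller.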