Recursively assign unique names to nodes in a data.tree object

Posted: 2016-02-08 20:07:09

Tags: r tree

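I'm working with a list of lists, collected from a chunk of XML, that I would like to represent as an object defined by R's data.tree package. The example below seems to work: I can extract elements from the data.tree representation of the list of lists. However, I can't figure out how to use any of the text formatting or visualization options (e.g. igraph), because the "child" elements of each list are not uniquely labeled.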
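The underlying difficulty already shows up in plain R, before data.tree is involved: when siblings share a name, lookup by name only reaches the first of them. A minimal illustration with two made-up `RuleRule` siblings shaped like the ones in the example further down:

```r
# Two siblings that share the name "RuleRule", as in the question's list
kids <- list(RuleRule = list(RefId = "49446"),
             RuleRule = list(RefId = "15976"))

kids[["RuleRule"]]$RefId    # "49446": lookup by name only finds the first match
kids$RuleRule$RefId         # same result with `$`
anyDuplicated(names(kids))  # 2: position of the first repeated name
```

This is presumably why output that identifies nodes by name has trouble telling such siblings apart.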

Ideally, I would like to recursively rename the "children" with sequence numbers. For example, convert this:

    Children
    |-- RuleRule
    |-- RuleRule
    |-- RuleRule

to this:

    Children
    |-- RuleRule_01
    |-- RuleRule_02
    |-- RuleRule_03
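For the naming scheme itself, one way to get the zero-padded suffixes shown above is to number each name within its own group of siblings. A minimal base-R sketch; `numberSiblings` is a hypothetical helper (not part of data.tree), and the two-digit default width is an assumption taken from the example:

```r
# Sketch: number each sibling name within its own group and zero-pad it.
# `numberSiblings` is a hypothetical helper, not a data.tree function.
numberSiblings <- function(nms, width = 2) {
  # guard: paste0() would turn zero-length input into "_"
  if (length(nms) == 0) return(nms)
  # running count per distinct name: c("A", "B", "A") -> 1, 1, 2
  idx <- ave(seq_along(nms), nms, FUN = seq_along)
  paste0(nms, "_", formatC(idx, width = width, flag = "0"))
}

numberSiblings(c("RuleRule", "RuleRule", "RuleRule"))
# [1] "RuleRule_01" "RuleRule_02" "RuleRule_03"
```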

Or, even better, rename the "children" according to an attribute such as `RefId`:

    Children
    |-- RuleRule_15976
    |-- RuleRule_49444
    |-- RuleRule_15748
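Because the data is still an ordinary nested list at this point, one option is to rename before building the data.tree object. A minimal sketch, assuming every node is a list with an optional `Children` list and an optional `RefId` string, as in the question's example; `renameByRefId` is a hypothetical helper:

```r
# Sketch: rename children as <name>_<RefId> before calling FromListExplicit().
# Assumes nodes are lists with an optional `Children` list and an optional
# `RefId` string; `renameByRefId` is a hypothetical helper.
renameByRefId <- function(node) {
  if (!is.list(node)) return(node)
  kids <- node[["Children"]]
  if (length(kids) == 0) return(node)  # leaf: nothing to rename
  newNames <- names(kids)
  for (i in seq_along(kids)) {
    refId <- if (is.list(kids[[i]])) kids[[i]][["RefId"]] else NULL
    # children without a RefId keep their original name
    if (!is.null(refId)) newNames[i] <- paste0(newNames[i], "_", refId)
    # descend into the child and store the renamed subtree
    kids[[i]] <- renameByRefId(kids[[i]])
  }
  names(kids) <- newNames
  node$Children <- kids
  node
}

x <- list(Children = list(RuleOperator = list(Children = list(
  RuleRule = list(Children = NULL, RefId = "15976"),
  RuleRule = list(Children = NULL, RefId = "49444"),
  RuleRule = list(Children = NULL, RefId = "15748")), Type = "product")))

y <- renameByRefId(x)
names(y$Children$RuleOperator$Children)
# [1] "RuleRule_15976" "RuleRule_49444" "RuleRule_15748"
```

This only yields unique names if `RefId` is unique among siblings; two siblings with the same `RefId` would still collide.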

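A similar question covers almost what I'm looking for. I'm not sure whether data.tree's functionality would simplify renaming the child elements, or whether this should be done before initializing the data.tree object. data.tree's tree-traversal functions seem like the right path, especially since the kind of data I'll be working with can have multiple sets of children at any level.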

A self-contained example:

library(data.tree)

# a typical list
l <- structure(list(RuleStart = structure(list(Children = structure(list(
RuleOperator = structure(list(Children = structure(list(RuleRule = structure(list(
    Children = NULL, RefId = "49446"), .Names = c("Children", 
"RefId")), RuleRule = structure(list(Children = NULL, RefId = "15976"), .Names = c("Children", 
"RefId")), RuleRule = structure(list(Children = NULL, RefId = "49444"), .Names = c("Children", 
"RefId")), RuleRule = structure(list(Children = NULL, RefId = "15748"), .Names = c("Children", 
"RefId")), RuleRule = structure(list(Children = NULL, RefId = "49440"), .Names = c("Children", 
"RefId")), RuleRule = structure(list(Children = NULL, RefId = "15746"), .Names = c("Children", 
"RefId")), RuleRule = structure(list(Children = NULL, RefId = "49449"), .Names = c("Children", 
"RefId"))), .Names = c("RuleRule", "RuleRule", "RuleRule", 
"RuleRule", "RuleRule", "RuleRule", "RuleRule")), Type = "product"), .Names = c("Children", 
"Type"))), .Names = "RuleOperator")), .Names = "Children")), .Names = "RuleStart")

# convert XML list into data.tree object
n <- FromListExplicit(l$RuleStart, nameName=NULL, childrenName='Children')

# check
print(n, 'RefId')
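To see where the duplicate sibling names occur before converting, a small recursive check over the same list structure can report the path to every level whose children share a name. A sketch using a hypothetical `findDuplicateSiblings` helper that assumes the `Children` layout of the example above:

```r
# Sketch: list the path to every node whose children share a name.
# Assumes the question's `Children` layout; hypothetical helper.
findDuplicateSiblings <- function(node, path = "root") {
  if (!is.list(node)) return(character(0))
  kids <- node[["Children"]]
  if (length(kids) == 0) return(character(0))  # leaf
  nms <- names(kids)
  hits <- if (anyDuplicated(nms) > 0) path else character(0)
  for (i in seq_along(kids)) {
    # extend the path with the child's name and search its subtree
    hits <- c(hits, findDuplicateSiblings(kids[[i]], paste(path, nms[i], sep = "/")))
  }
  hits
}

x <- list(Children = list(RuleOperator = list(Children = list(
  RuleRule = list(Children = NULL, RefId = "49446"),
  RuleRule = list(Children = NULL, RefId = "15976")), Type = "product")))

findDuplicateSiblings(x)
# [1] "root/RuleOperator"
```

Applied to the question's `l$RuleStart`, it should likewise flag the `RuleOperator` level, whose seven children are all named `RuleRule`.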

1 answer:

Answer (score: 1)

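Thanks to the author of data.tree for the suggestion. The following function recursively renames the elements of the list. It seems to work, but comments or better solutions are welcome.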

makeNamesUnique <- function(l) {
  # names of all children at this level; several types may be present
  l.names <- names(l$Children)
  tab <- table(l.names)
  t.names <- names(tab)

  # iterate over types
  for(this.type in seq_along(t.names)) {
    # get an index to every child of this type (duplicates share a name)
    idx <- which(l.names == t.names[this.type])
    for(this.element in seq_along(idx)) {
      # re-name this child with a sequence number, e.g. RuleRule_1
      names(l$Children)[idx[this.element]] <- paste0(t.names[this.type], '_', this.element)
      # a terminal leaf has no Children and is done; a branch is fixed
      # recursively and spliced back into the tree
      l.sub <- l$Children[[idx[this.element]]]
      if(!is.null(l.sub$Children)) {
        l$Children[[idx[this.element]]] <- makeNamesUnique(l.sub)
      }
    }
  }

  return(l)
}
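The same renaming can be written more compactly by numbering all siblings at once with `ave()` instead of looping over a frequency table. A sketch of that variant, assuming the same `Children` layout; `makeNamesUniqueCompact` and its `width` argument are hypothetical, with the default `width = 1` reproducing the `_1, _2, ...` numbering above and `width = 2` giving the `_01` style from the question:

```r
# Sketch: recursively append a per-name sequence number to every child.
# Assumes nodes are lists whose children live in a named `Children` list.
makeNamesUniqueCompact <- function(node, width = 1) {
  if (!is.list(node)) return(node)
  kids <- node[["Children"]]
  # leaves (and unnamed children) are returned unchanged
  if (length(kids) == 0 || is.null(names(kids))) return(node)
  nms <- names(kids)
  # running count of each name among its siblings: 1, 2, ... per name
  idx <- ave(seq_along(nms), nms, FUN = seq_along)
  names(kids) <- paste0(nms, "_", formatC(idx, width = width, flag = "0"))
  # recurse into every child; lapply() keeps the new names
  node$Children <- lapply(kids, makeNamesUniqueCompact, width = width)
  node
}

# Small list shaped like the question's data
x <- list(Children = list(RuleOperator = list(Children = list(
  RuleRule = list(Children = NULL, RefId = "49446"),
  RuleRule = list(Children = NULL, RefId = "15976"),
  RuleRule = list(Children = NULL, RefId = "49444")), Type = "product")))

y <- makeNamesUniqueCompact(x, width = 2)
names(y$Children)
# [1] "RuleOperator_01"
names(y$Children$RuleOperator_01$Children)
# [1] "RuleRule_01" "RuleRule_02" "RuleRule_03"
```

Either version's result could then, presumably, be passed to `FromListExplicit()` exactly as in the question; that step is not exercised here.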