data.tree nodes via Ids

Time: 2017-04-14 20:42:52

Tags: r tree hierarchical-data

My data is linked through an Id/ParentId system, and I have managed to add the correct integer levels. However, I would like to write a function that automatically nests my 5-level hierarchy into a pathString for data.tree.

Structure:

Id                 Name               ParentId           ParentName    Level
701F0000006Iw8E    'Paid Media'       NA                 NA            1
701F0000006IS1t    'Bing ABC'         701F0000006Iw8Y    'Bing'        3
701F0000006IS28    'Bing DEF'         701F0000006Iw8Y    'Bing'        3
701F0000006IS23    'Bing GHI'         701F0000006Iw8Y    'Bing'        3
701F0000006Imq9    'Bing JKL'         701F0000006Iw8Y    'Bing'        3
701F0000006IS1y    'Bing MNO'         701F0000006Iw8Y    'Bing'        3
701F0000006Iw8Y    'Bing'             701F0000006Iw8E    'Paid Media'  2
701F0000006IvcW    'Google'           701F0000006Iw8E    'Paid Media'  2
7012A000006rhY8    'Adwords ABC'      701F0000006IvcW    'Google'      3
701F0000006IS1j    'Adwords DEF'      701F0000006IvcW    'Google'      3
701F0000006IS1o    'Adwords GHI'      701F0000006IvcW    'Google'      3
701F0000006IS1Z    'Adwords JKL'      701F0000006IvcW    'Google'      3
701F0000006Ieci    'Adwords MNO'      701F0000006IvcW    'Google'      3

Currently, the problem is that my pathString only captures a single level, because it is built like this:

dat$pathString <- paste(dat$ParentId, 
      dat$Id, 
      sep = "/")

For example:

 "NA/701F0000000SOEq"

Really, to populate the whole tree correctly, I need to identify all subsequent parents in the string.

Ideally, a single expression would work the same way at all levels, but I understand if each level needs to be handled separately.

Full Id/ParentId dataset: Dropbox Link

1 Answer:

Answer 0 (score: 3)

Although your question asks for a path string, the tree can be built directly from the data frame format.

library(data.tree)
dat <- read.table(text="
Id                 Name               ParentId           ParentName    Level
701F0000006Iw8E    'Paid Media'       NA                 NA            1
701F0000006IS1t    'Bing ABC'         701F0000006Iw8Y    'Bing'        3
701F0000006IS28    'Bing DEF'         701F0000006Iw8Y    'Bing'        3
701F0000006IS23    'Bing GHI'         701F0000006Iw8Y    'Bing'        3
701F0000006Imq9    'Bing JKL'         701F0000006Iw8Y    'Bing'        3
701F0000006IS1y    'Bing MNO'         701F0000006Iw8Y    'Bing'        3
701F0000006Iw8Y    'Bing'             701F0000006Iw8E    'Paid Media'  2
701F0000006IvcW    'Google'           701F0000006Iw8E    'Paid Media'  2
7012A000006rhY8    'Adwords ABC'      701F0000006IvcW    'Google'      3
701F0000006IS1j    'Adwords DEF'      701F0000006IvcW    'Google'      3
701F0000006IS1o    'Adwords GHI'      701F0000006IvcW    'Google'      3
701F0000006IS1Z    'Adwords JKL'      701F0000006IvcW    'Google'      3
701F0000006Ieci    'Adwords MNO'      701F0000006IvcW    'Google'      3
", header = TRUE, stringsAsFactors = FALSE)

# network build does not want a root node as a row, so adjust
# the given root to link to "tree_root"
dat$ParentId[is.na(dat$ParentId)] <- "tree_root"

# build the tree using the network layout - pairs of node ids
# in the first two columns. Remaining columns are node attributes
# (the root's NA ParentId was replaced above, so no row filtering is needed)
dat_network <- dat[, c("Id", "ParentId", "Name")]
dat_tree <- FromDataFrameNetwork(dat_network, check = "check")

print(dat_tree, 'Name')

# 1  tree_root                              
# 2   °--701F0000006Iw8E          Paid Media
# 3       ¦--701F0000006Iw8Y            Bing
# 4       ¦   ¦--701F0000006IS1t    Bing ABC
# 5       ¦   ¦--701F0000006IS28    Bing DEF
# 6       ¦   ¦--701F0000006IS23    Bing GHI
# 7       ¦   ¦--701F0000006Imq9    Bing JKL
# 8       ¦   °--701F0000006IS1y    Bing MNO
# 9       °--701F0000006IvcW          Google
# 10          ¦--7012A000006rhY8 Adwords ABC
# 11          ¦--701F0000006IS1j Adwords DEF
# 12          ¦--701F0000006IS1o Adwords GHI
# 13          ¦--701F0000006IS1Z Adwords JKL
# 14          °--701F0000006Ieci Adwords MNO
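
If a pathString column is still wanted, as the question asks, one possible approach is to walk up the ParentId links one level per pass until every row has reached a root. The sketch below uses base R only. build_path_strings is a hypothetical helper (not a data.tree function), the data frame is a five-row subset of the sample data, and Ids are assumed to be unique.

```r
# Five rows taken from the sample data in the question.
dat <- data.frame(
  Id = c("701F0000006Iw8E", "701F0000006Iw8Y", "701F0000006IvcW",
         "701F0000006IS1t", "7012A000006rhY8"),
  ParentId = c(NA, "701F0000006Iw8E", "701F0000006Iw8E",
               "701F0000006Iw8Y", "701F0000006IvcW"),
  Name = c("Paid Media", "Bing", "Google", "Bing ABC", "Adwords ABC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: prepends one ancestor per pass, so the same
# expression handles any number of levels.
build_path_strings <- function(id, parent_id, sep = "/") {
  stopifnot(!anyDuplicated(id))   # assumption: Ids are unique
  path <- id                      # every path starts as the node's own Id
  ancestor <- parent_id           # next ancestor still to be prepended
  # A valid tree needs fewer than length(id) passes; more implies a cycle.
  for (pass in seq_along(id)) {
    pending <- !is.na(ancestor)
    if (!any(pending)) break
    path[pending] <- paste(ancestor[pending], path[pending], sep = sep)
    # Step up one level. match() yields NA for roots and for parents
    # that are not listed in id, which ends the walk for that row.
    ancestor <- parent_id[match(ancestor, id)]
  }
  if (any(!is.na(ancestor))) stop("cycle detected in the Id/ParentId links")
  path
}

dat$pathString <- build_path_strings(dat$Id, dat$ParentId)
print(dat[, c("Name", "pathString")])
```

Under these assumptions every path starts at the root Id, so data.tree::as.Node(dat) should be able to read the pathString column directly. If the full data has several roots, a common prefix would first have to be pasted in front of every path.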