R: recursively merging hierarchical data

Time: 2017-03-04 15:59:48

Tags: r

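I have a data frame containing child and parent fields for accounts from a GnuCash MySQL database. I want to store the account hierarchy in a data frame. In the past I used recursive joins in MySQL, but that gets cumbersome as the hierarchy deepens, and you also have to know how many levels the tree has. I am hoping there is a simpler way in R to build the hierarchy, with or without knowing the maximum depth.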

Sample data:

account_id <- c(1:11)
account_name <- c('root_account','dining', 'food', 'discretionary_expense',
                  'expenses', 'base_salary_wife', 'base_salary_husband',
                  'base_salary', 'salary', 'taxable_income',
                  'income')
account_parentid <- c(NA,3,4,5,1,8,8,9,10,11,1)
test.data <- data.frame(account_id, account_name, account_parentid)

Desired output:

 account_id          account_name account_parentid lvl2_parentid lvl3_parentid lvl4_parentid lvls
1           1          root_account               NA            NA            NA            NA   NA
2           2                dining                3             4             5            NA    4
3           3                  food                4             5            NA            NA    3
4           4 discretionary_expense                5            NA            NA            NA    2
5           5              expenses                1            NA            NA            NA    1
6           6      base_salary_wife                8             9            10            11    5
7           7   base_salary_husband                8             9            10            11    5
8           8           base_salary                9            10            11            NA    4
9           9                salary               10            11            NA            NA    3
10         10        taxable_income               11            NA            NA            NA    2
11         11                income                1            NA            NA            NA    1

1 answer:

Answer 0 (score: 3)

You can use the data.tree package to work with hierarchical data:

Set up the test data:

account_id <- c(1:11)
account_name <- c('root_account','dining', 'food', 'discretionary_expense',
                  'expenses', 'base_salary_wife', 'base_salary_husband',
                  'base_salary', 'salary', 'taxable_income',
                  'income')
account_parentid <- c(NA,3,4,5,1,8,8,9,10,11,1)
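# Column order differs from the question: FromDataFrameNetwork uses the first
# two columns as the edge list, so the id and parent id have to come first.
test.data <- data.frame(account_id, account_parentid, account_name, stringsAsFactors = FALSE)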

Convert to a data.tree structure:

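library(data.tree)
# Drop row 1: the root has no parent (NA), so it does not describe an edge.
tree1 <- FromDataFrameNetwork(test.data[-1,])
# The root node has no row of its own in the edge list, so set its name by hand.
tree1$account_name <- 'root_account'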

Display:

ToDataFrameTree(tree1, account = 'name', 'account_name', 'pathString')

This prints the following:

               levelName account          account_name    pathString
1  1                           1          root_account             1
2   ¦--5                       5              expenses           1/5
3   ¦   °--4                   4 discretionary_expense         1/5/4
4   ¦       °--3               3                  food       1/5/4/3
5   ¦           °--2           2                dining     1/5/4/3/2
6   °--11                     11                income          1/11
7       °--10                 10        taxable_income       1/11/10
8           °--9               9                salary     1/11/10/9
9               °--8           8           base_salary   1/11/10/9/8
10                  ¦--6       6      base_salary_wife 1/11/10/9/8/6
11                  °--7       7   base_salary_husband 1/11/10/9/8/7
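
The tree view above does not produce the wide layout shown in the question. As a rough sketch of how one might get there without knowing the depth in advance, the base-R loop below walks up one level per iteration using `match()`. It is not part of the original answer and does not use data.tree. The names `wide`, `current`, `nxt` and `lvl` are made up for illustration. It also assumes the parent links contain no cycles, and it keeps the root id (1) in the ancestor columns, so it yields one more `lvlN_parentid` column than the question's desired output.

```r
# Rebuild the id columns of the sample data from the question.
account_id <- 1:11
account_parentid <- c(NA, 3, 4, 5, 1, 8, 8, 9, 10, 11, 1)
test.data <- data.frame(account_id, account_parentid)

wide <- test.data
current <- wide$account_parentid  # level-1 ancestor (direct parent) of each row
lvl <- 1

repeat {
  # The next ancestor is the parent of the current ancestor.
  # match() returns NA for NA input, so finished chains stay NA.
  nxt <- wide$account_parentid[match(current, wide$account_id)]
  # Stop once nobody has a further ancestor; the second test is only a
  # safety net against cyclic parent links (assumed not to occur).
  if (all(is.na(nxt)) || lvl >= nrow(wide)) break
  lvl <- lvl + 1
  wide[[paste0("lvl", lvl, "_parentid")]] <- nxt
  current <- nxt
}

# Depth of each account = number of non-NA ancestor columns (root gets 0).
wide$lvls <- rowSums(!is.na(wide[, -1, drop = FALSE]))

print(wide)
```

Dropping the root id from the higher-level columns, or reporting the root's depth as NA as in the question, would be a small post-processing step on `wide`.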

Not part of the question, but where this gets really interesting is when you want to summarize the hierarchy and so on. See the data.tree vignettes.
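
As one illustration of what summarizing the hierarchy can mean, the sketch below rolls leaf balances up to every ancestor account. It uses plain base R rather than the data.tree API, and the `balance` values are hypothetical numbers invented for this example, not data from the question. It again assumes the parent links contain no cycles.

```r
account_id <- 1:11
account_parentid <- c(NA, 3, 4, 5, 1, 8, 8, 9, 10, 11, 1)
# Hypothetical balances (not from the question), set only on leaf accounts:
# dining (2), base_salary_wife (6), base_salary_husband (7).
balance <- c(0, 120, 0, 0, 0, 3000, 3500, 0, 0, 0, 0)

total <- balance              # each account starts with its own balance
current <- account_parentid   # ancestor currently being credited, per account

# Each pass credits every account's own balance to one ancestor,
# then moves one level further up the tree.
while (!all(is.na(current))) {
  has_anc <- !is.na(current)
  add <- tapply(balance[has_anc], current[has_anc], sum)
  idx <- match(as.integer(names(add)), account_id)
  total[idx] <- total[idx] + as.vector(add)
  current <- account_parentid[match(current, account_id)]
}

print(data.frame(account_id, balance, total))
```

With these made-up numbers, the root account ends up with the sum of all leaf balances, and each intermediate account holds the sum of the leaves beneath it.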