r - Hierarchical data frame from child/parent relations

Time: 2015-10-11 19:43:16

Tags: r hierarchy hierarchical-data

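I have a child-parent data frame that I want to convert into a complete hierarchical list containing all levels plus a level number. The sample data below has three levels, but there could be more. What function can I use to transform the data?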

Source:

data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
  "airplane", "helicopter", "Ford", "BMW", "Airbus"), parent = c(NA, NA, NA, 
  "land", "land", "water", "air", "air", "air", "car", "car", "airplane"))

         name   parent
1        land     <NA>
2       water     <NA>
3         air     <NA>
4         car     land
5     bicycle     land
6        boat    water
7     balloon      air
8    airplane      air
9  helicopter      air
10       Ford      car
11        BMW      car
12     Airbus airplane

Target:

data.frame(level1 = c("land", "water", "air", "land", "land", "water", "air", 
  "air", "air", "land", "land", "air"), level2 = c(NA, NA, NA, "car", "bicycle", 
  "boat", "balloon", "airplane", "helicopter", "car", "car", "airplane"),
  level3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "Ford", "BMW", "Airbus"), 
  level_number = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3))

   level1     level2 level3 level_number
1    land       <NA>   <NA>            1
2   water       <NA>   <NA>            1
3     air       <NA>   <NA>            1
4    land        car   <NA>            2
5    land    bicycle   <NA>            2
6   water       boat   <NA>            2
7     air    balloon   <NA>            2
8     air   airplane   <NA>            2
9     air helicopter   <NA>            2
10   land        car   Ford            3
11   land        car    BMW            3
12    air   airplane Airbus            3

3 answers:

Answer 0 (score: 6)

Using data.table, you can do the following:

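library(data.table)
l <- list() # initialize an empty list that collects one data.table per level
setDT(dat) # dat is the source data frame from the question; convert it by reference
setkey(dat, parent) # key dat by parent so it can be joined on that column
current_lvl <- dat[is.na(parent), .(level_number = 1), keyby=.(level1 = name)]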

Now current_lvl looks as follows (keyed by level1):

   level1 level_number
1:    air            1
2:   land            1
3:  water            1

The trick now is to join dat with current_lvl and modify the result appropriately:

ind <- 1 # index of the current level; the while loop below derives it from length(l)
current_lvl <- current_lvl[dat][ # join the data.tables
  !is.na(level_number)][ # exclude non-child rows
  , level_number := level_number + 1] # increment level_number
setnames(current_lvl, "name", paste0("level", ind + 1)) # rename column
setkeyv(current_lvl, paste0("level", ind + 1)) # set key

This gives you (keyed by level2):

   level1 level_number     level2
1:    air            2   airplane
2:    air            2    balloon
3:   land            2    bicycle
4:  water            2       boat
5:   land            2        car
6:    air            2 helicopter

Put this into a while loop, starting again from the initial level-1 current_lvl, as follows:

while(nrow(current_lvl) > 0){
  ind <- length(l) + 1
  l[[ind]] <- current_lvl
  current_lvl <- current_lvl[dat][!is.na(level_number)][,level_number := level_number + 1]
  if(nrow(current_lvl) == 0L){
    break
  }
  setnames(current_lvl, "name", paste0("level",ind+1))
  setkeyv(current_lvl, paste0("level",ind+1))
}

You can inspect l to see the intermediate results. Combining them with rbindlist gives what you need:

res <- rbindlist(l, fill=TRUE)
setcolorder(res, sort(names(res)))
res

Result:

> res
    level_number level1     level2 level3
 1:            1    air         NA     NA
 2:            1   land         NA     NA
 3:            1  water         NA     NA
 4:            2    air   airplane     NA
 5:            2    air    balloon     NA
 6:            2   land    bicycle     NA
 7:            2  water       boat     NA
 8:            2   land        car     NA
 9:            2    air helicopter     NA
10:            3    air   airplane Airbus
11:            3   land        car    BMW
12:            3   land        car   Ford
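
For comparison, the same table can be sketched in base R by following each node's parent chain up to its top-level ancestor. This is only an illustration, not part of the answer: the helper name expand_hierarchy is hypothetical, and it assumes at least one row, unique node names, and NA as the parent of top-level records.

```r
# Hypothetical helper (not from the answer): expand child/parent pairs into
# level1..levelN columns plus level_number, using only base R.
expand_hierarchy <- function(dat) {
  name   <- as.character(dat$name)
  parent <- as.character(dat$parent)

  # For every node, build the path from its top-level ancestor down to itself.
  paths <- lapply(seq_along(name), function(i) {
    path <- name[i]
    p <- parent[i]
    while (!is.na(p)) {
      if (p %in% path) stop("cycle detected at '", p, "'")
      path <- c(p, path)
      p <- parent[match(p, name)]  # NA once a top-level node is reached
    }
    path
  })

  depth <- lengths(paths)
  max_depth <- max(depth)

  # Pad each path with NA so that every row has max_depth level columns.
  padded <- lapply(paths, function(p) c(p, rep(NA_character_, max_depth - length(p))))
  out <- as.data.frame(matrix(unlist(padded), ncol = max_depth, byrow = TRUE),
                       stringsAsFactors = FALSE)
  names(out) <- paste0("level", seq_len(max_depth))
  out$level_number <- depth
  out
}

dat <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)
res <- expand_hierarchy(dat)
res
```

Unlike the keyed data.table result, this sketch keeps the rows in the order of the input data frame. It walks the chain once per node, so it is meant for small inputs rather than large tables.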

Answer 1 (score: 5)

Using the data.tree package, you can do the following:

library(data.tree)
df <- data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon", "airplane", "helicopter", "Ford", "BMW", "Airbus"), 
                 parent = c("root", "root", "root", "land", "land", "water", "air", "air", "air", "car", "car", "airplane"))

Note that I replaced NA with "root", which makes the conversion to a data.tree much easier. Namely:

tree <- FromDataFrameNetwork(df)

Getting the desired format then becomes trivial, because we can use the hierarchy infrastructure in data.tree.

Answer 2 (score: 1)

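Do not use "root" as the parent value for top-level records. The solution using the data.tree package is great, but in newer versions "root" is a reserved node name. Although it is automatically replaced with "root2", the call to FromDataFrameNetwork(df) then does not return the desired tree.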
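
One way to follow this advice is to fill the NA parents with a placeholder that is neither "root" nor the name of an existing node. The sketch below uses base R only; the helper set_top_parent and the placeholder "vehicles" are hypothetical names chosen for illustration, and the resulting df is what would then be passed to FromDataFrameNetwork(df).

```r
# Hypothetical helper: replace NA parents with a placeholder top node name.
set_top_parent <- function(df, placeholder = "vehicles") {
  # The placeholder must not collide with a real node name.
  stopifnot(!placeholder %in% df$name)
  # "root" is the name this answer reports as reserved in newer data.tree versions.
  stopifnot(!identical(placeholder, "root"))
  df$parent <- as.character(df$parent)
  df$parent[is.na(df$parent)] <- placeholder
  df
}

df <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)
df <- set_top_parent(df, "vehicles")
```

With a single placeholder parent, all top-level records hang under one common node, which is the same shape the "root" value was meant to produce.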