I am looking for a way to determine the parent of each subgroup in the data.table below.
    Group SubGroup Level Parent
 1:     A       A1     0     NA
 2:     A       A2     1     A1
 3:     A       A3     1     A1
 4:     A       A4     2     A3
 5:     A       A5     3     A4
 6:     A       A6     3     A4
 7:     A       A7     3     A4
 8:     A       A8     2     A3
 9:     A       A9     2     A3
10:     A      A10     2     A3
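This is the calculation I am currently using, but I wonder whether there is a better way. My actual dataset contains multiple groups, so I would also like to add a by= argument to the calculation. It can be assumed that the parent of a row is the closest preceding row (a smaller row index than the current row) whose Level is lower than the current row's Level.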
library(data.table)

tmp = data.table(Group = "A", SubGroup = paste0("A", 1:10),
                 Level = c(0, 1, 1, 2, 3, 3, 3, 2, 2, 2))
# for each row x, pick the last earlier row whose Level is below Level[x]
tmp[, Parent := sapply(1:nrow(tmp), function(x)
  tmp[, SubGroup[(suppressWarnings(max(which(Level[1:x] < Level[x]))))]])]
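For reference, the same row-wise rule can be applied per group by doing the lookup inside `j` with `by = Group`. The following is only a minimal sketch of that idea, not a faster method: the table `tmp2` and its second group "B" are made up for illustration, and the explicit `length(idx)` check is swapped in for the `suppressWarnings(max(...))` trick used above.

```r
library(data.table)

# Hypothetical two-group data; Group "B" is invented for this example.
tmp2 = data.table(Group = rep(c("A", "B"), each = 4),
                  SubGroup = c(paste0("A", 1:4), paste0("B", 1:4)),
                  Level = c(0, 1, 1, 2, 0, 1, 2, 1))

# Within each Group, take the nearest earlier row with a strictly lower Level.
# Rows with no such earlier row (the roots) get NA.
tmp2[, Parent := sapply(seq_len(.N), function(x) {
  idx <- which(Level[seq_len(x)] < Level[x])
  if (length(idx) > 0) SubGroup[max(idx)] else NA_character_
}), by = Group]

print(tmp2)
```

Because `Level` and `SubGroup` are evaluated per group here, a row can never pick up a parent from a different Group. The cost is still one scan per row, so this is likely to be slow on large groups.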
Answer 0 (score: 3)
dt = data.table(Group = "A", SubGroup = paste0("A", 1:11),
                Level = c(0, 1, 1, 2, 3, 3, 3, 2, 2, 2, 3))
# need another grouping layer, to satisfy the row requirements
dt[, rowGroup := cumsum(c(0, diff(Level) != 0)), by = Group]
# get the parent for each Level and rowGroup
parents = dt[, .(Level = Level[.N] + 1, Parent = SubGroup[.N]), by = .(Group, rowGroup)]
setkey(parents, Group, Level, rowGroup)
setkey(dt, Group, Level, rowGroup)
# rolling merge that matches to previous rowGroup
parents[dt, roll = TRUE][order(Group, rowGroup)]
#     Group rowGroup Level Parent SubGroup
#  1:     A        0     0     NA       A1
#  2:     A        1     1     A1       A2
#  3:     A        1     1     A1       A3
#  4:     A        2     2     A3       A4
#  5:     A        3     3     A4       A5
#  6:     A        3     3     A4       A6
#  7:     A        3     3     A4       A7
#  8:     A        4     2     A3       A8
#  9:     A        4     2     A3       A9
# 10:     A        4     2     A3      A10
# 11:     A        5     3    A10      A11
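
If the goal is to keep `Parent` as a column on the original table rather than produce a new joined table, one possible variation is to run the same rolling join with `on=` instead of `setkey`, so the rows of `dt` are not reordered. This is a sketch under two assumptions: that the installed data.table version supports `roll` together with `on=`, and that each row of `dt` matches at most one row of `parents`, so the join result comes back in `dt`'s row order. The name `joined` is introduced here for illustration only.

```r
library(data.table)

# Rebuild the example table from scratch so its original row order is intact.
dt = data.table(Group = "A", SubGroup = paste0("A", 1:11),
                Level = c(0, 1, 1, 2, 3, 3, 3, 2, 2, 2, 3))

# Runs of consecutive equal Levels within each Group.
dt[, rowGroup := cumsum(c(0, diff(Level) != 0)), by = Group]

# The last SubGroup of each run is a candidate parent for the next Level down.
parents = dt[, .(Level = Level[.N] + 1, Parent = SubGroup[.N]),
             by = .(Group, rowGroup)]

# Exact match on Group and Level, rolling back to the previous rowGroup.
# on= leaves dt unsorted, so the result rows line up with dt's rows.
joined = parents[dt, on = c("Group", "Level", "rowGroup"), roll = TRUE]

# Copy the matched parents back onto dt by reference.
dt[, Parent := joined$Parent]

print(dt)
```

The roll applies only to the last column listed in `on`, which is why `rowGroup` has to come last.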