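I often analyze data in an "org tree" format to understand how frequently activity occurs under specific leaders within an organization. I need to produce a wide hierarchy from two columns of data: employee name and supervisor name.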
----------
df <- data.frame("Employee"=c("Bill","James","Amy","Jen","Henry"),
"Supervisor"=c("Jen","Jen","Steve","Amy","Amy"))
df
# Employee Supervisor
# 1 Bill Jen
# 2 James Jen
# 3 Amy Steve
# 4 Jen Amy
# 5 Henry Amy
and end with a wide data frame that spells out the org chart, starting from the CEO (or the most senior employee):
# Employee H1 H2 H3
# 1 Bill Steve Amy Jen
# 2 James Steve Amy Jen
# 3 Amy Steve NA NA
# 4 Jen Steve Amy NA
# 5 Henry Steve Amy NA
After a lot of research, the data.tree package seems to offer the most help. How can I do this?
Answer 0 (score: 2)
Try this:
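library(data.table)
setDT(df)  # convert to a data.table by reference
setnames(df, 'Supervisor', 'Supervisor.1')
j = 1
# Keep going while any level-j supervisor is also listed as an employee,
# i.e. while someone still has a manager one level further up
while (df[, any(get(paste0('Supervisor.', j)) %in% Employee)]) {
  # Self-join: look up each level-j supervisor as an Employee and
  # store that person's own supervisor as level j+1
  df[df, on = paste0('Supervisor.', j, '==Employee'),
     paste0('Supervisor.', j + 1) := i.Supervisor.1]
  j = j + 1
}
df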
# Employee Supervisor.1 Supervisor.2 Supervisor.3
# 1: Bill Jen Amy Steve
# 2: James Jen Amy Steve
# 3: Amy Steve NA NA
# 4: Jen Amy Steve NA
# 5: Henry Amy Steve NA
To reorder within each row so that the most senior supervisor comes first:
df = cbind(df[, 1], t(apply(df[, -1], 1, function(r) c(rev(r[!is.na(r)]), r[is.na(r)]))))
df
# Employee V1 V2 V3
# 1: Bill Steve Amy Jen
# 2: James Steve Amy Jen
# 3: Amy Steve NA NA
# 4: Jen Steve Amy NA
# 5: Henry Steve Amy NA
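If you would rather not depend on data.table, the same walk up the reporting chain can be sketched in base R. This is only a sketch and not part of the answer above: `org_wide` and `org` are hypothetical names, and it assumes a non-empty table in which each employee appears once. It follows each supervisor upward with match() and names the columns H1, H2, ... as in the question.

```r
# Hypothetical helper: build the wide org table with base R only
org_wide <- function(df) {
  emp <- as.character(df$Employee)
  sup <- as.character(df$Supervisor)
  # For each employee, collect the supervisors from nearest to most senior
  chains <- lapply(seq_along(emp), function(i) {
    chain <- character(0)
    cur <- sup[i]
    # Stop at NA (no recorded supervisor) or when a name repeats (cycle guard)
    while (!is.na(cur) && !(cur %in% chain)) {
      chain <- c(chain, cur)
      cur <- sup[match(cur, emp)]  # NA when cur is not listed as an employee
    }
    rev(chain)  # most senior person first
  })
  depth <- max(lengths(chains))
  # Pad every chain with NA so that all rows have the same width
  padded <- lapply(chains, function(ch) c(ch, rep(NA_character_, depth - length(ch))))
  mat <- matrix(unlist(padded, use.names = FALSE), nrow = length(emp), byrow = TRUE,
                dimnames = list(NULL, paste0("H", seq_len(depth))))
  data.frame(Employee = emp, mat, stringsAsFactors = FALSE)
}

# Same input as in the question, under a fresh name
org <- data.frame(Employee   = c("Bill", "James", "Amy", "Jen", "Henry"),
                  Supervisor = c("Jen", "Jen", "Steve", "Amy", "Amy"),
                  stringsAsFactors = FALSE)
org_wide(org)
```

For the question's data this should give the same rows as the reordered table above, with the columns named H1 to H3. Unlike the while loop, the cycle guard keeps it from running forever if two people are accidentally listed as each other's supervisor.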
Answer 1 (score: 1)
If you are not set on that exact output but want to work with the hierarchy, data.tree is a good choice. Here are some examples:
library(data.tree)
df <- data.frame("Employee"=c("Bill","James","Amy","Jen","Henry"),
"Supervisor"=c("Jen","Jen","Steve","Amy","Amy"))
dt <- FromDataFrameNetwork(df)
#here's your org chart:
print(dt)
Let's find Jen's subordinates (the leaves below her node) and their level in the hierarchy:
Get(FindNode(dt, 'Jen')$leaves, 'level')
This returns:
 Bill James
    4     4
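If you do want the wide layout from the question, one possible route is through the tree's path strings. Treat the following as an illustration under assumptions: it hard-codes the vector that dt$Get('pathString') is expected to return for this tree (names from the root down to each node, separated by "/"), so that it runs without data.tree installed. The names `paths`, `chains` and `wide` are made up for the example.

```r
# Assumed result of dt$Get("pathString") for the tree above,
# hard-coded so the sketch runs on its own
paths <- c(Steve = "Steve",
           Amy   = "Steve/Amy",
           Jen   = "Steve/Amy/Jen",
           Bill  = "Steve/Amy/Jen/Bill",
           James = "Steve/Amy/Jen/James",
           Henry = "Steve/Amy/Henry")

parts <- strsplit(paths, "/", fixed = TRUE)
parts <- parts[lengths(parts) > 1]                   # drop the root, it has no supervisor
chains <- lapply(parts, function(p) p[-length(p)])   # remove the employee's own name
depth <- max(lengths(chains))
# Pad with NA so that every employee has the same number of columns
padded <- lapply(chains, function(ch) c(ch, rep(NA_character_, depth - length(ch))))
wide <- data.frame(
  Employee = names(parts),
  matrix(unlist(padded, use.names = FALSE), ncol = depth, byrow = TRUE,
         dimnames = list(NULL, paste0("H", seq_len(depth)))),
  stringsAsFactors = FALSE
)
wide
```

The rows come out in tree order (Amy, Jen, Bill, James, Henry) rather than in the order of the original data frame, so reorder them if that matters.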
For fun, let's add a personnel budget:
dt$Set(salary = c(100000, 80000, 60000, 40000, 35000, 70000))
Print the salaries together with the aggregated salaries of each node's subordinates:
print(dt, 'salary', sal_subordinates = function(node) Aggregate(node, 'salary', sum))
This prints:
          levelName salary sal_subordinates
1 Steve             100000            80000
2  °--Amy            80000           130000
3      ¦--Jen        60000            75000
4      ¦   ¦--Bill   40000            40000
5      ¦   °--James  35000            35000
6      °--Henry      70000            70000
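Note that in the output above sal_subordinates for Steve is 80000, which is Amy's salary alone, so the column appears to sum direct reports only rather than the whole subtree. As a rough cross-check, the same figures can be reproduced in base R. The data frame `org` and the helper `subtree_total` are hypothetical names introduced here, and the NA supervisor for Steve is an assumption to mark the root.

```r
# Same people and salaries as in the tree, as a flat table
org <- data.frame(
  Employee   = c("Steve", "Amy", "Jen", "Bill", "James", "Henry"),
  Supervisor = c(NA, "Steve", "Amy", "Jen", "Jen", "Amy"),
  salary     = c(100000, 80000, 60000, 40000, 35000, 70000),
  stringsAsFactors = FALSE
)

# Salaries summed by supervisor = total salary of each person's direct reports
direct <- tapply(org$salary, org$Supervisor, sum)
direct
#    Amy    Jen  Steve
# 130000  75000  80000

# Full subtree total: own salary plus everyone further down the chain
subtree_total <- function(name) {
  reports <- org$Employee[!is.na(org$Supervisor) & org$Supervisor == name]
  org$salary[org$Employee == name] +
    sum(vapply(reports, subtree_total, numeric(1)))
}
subtree_total("Steve")  # 385000, the sum of all six salaries
```

The `direct` values match the sal_subordinates column for Steve, Amy and Jen. If you want a truly cumulative budget per leader, something along the lines of `subtree_total` is closer to that.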
The data.tree vignettes have more examples of working with hierarchical data and aggregation.