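I'm used to solving hierarchical joins in SQL, but I'd like to know whether the same thing can be done in Python, perhaps with Pandas. Which approach is more efficient?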
CSV data:
```
emp_id,fn,ln,mgr_id
1,Matthew,Reichek,NULL
2,John,Cottone,3
3,Chris,Winter,1
4,Sergey,Bobkov,2
5,Andrey,Botelli,2
6,Karen,Goetz,7
7,Tri,Pham,3
8,Drew,Thompson,7
9,BD,Alabi,7
10,Sreedhar,Kavali,7
```
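To work with this data in Pandas, it first has to be loaded into a DataFrame. Below is a minimal sketch that reads the CSV above from an in-memory string (the name `CSV_TEXT` is just for illustration; in practice you would pass a file path to `pd.read_csv`). By default `read_csv` treats the literal string `NULL` as a missing value, so `mgr_id` comes out as a float column with `NaN` for the top-level boss.

```python
import io

import pandas as pd

# The question's CSV, pasted verbatim (CSV_TEXT is an illustrative name).
CSV_TEXT = """emp_id,fn,ln,mgr_id
1,Matthew,Reichek,NULL
2,John,Cottone,3
3,Chris,Winter,1
4,Sergey,Bobkov,2
5,Andrey,Botelli,2
6,Karen,Goetz,7
7,Tri,Pham,3
8,Drew,Thompson,7
9,BD,Alabi,7
10,Sreedhar,Kavali,7
"""

# "NULL" is in read_csv's default NA strings, so it becomes NaN and
# mgr_id is parsed as float64 instead of int64.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.dtypes)
```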
I want to find each employee's level (the boss is level 1, their direct reports are level 2, and so on).
My recursive SQL query is:
```sql
with recursive cte as
(
select employee_id, first_name, last_name, manager_id, 1 as level
from icqa.employee
where manager_id is null
union
select e.employee_id, e.first_name, e.last_name, e.manager_id, cte.level + 1
from icqa.employee e
inner join cte
on e.manager_id = cte.employee_id
where e.manager_id is not null
)
select * from cte
```
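For comparison, the recursive CTE can be translated fairly directly into Pandas: the anchor member becomes a filter on missing `mgr_id`, and each pass of a loop plays the role of the recursive member's inner join. The sketch below is one possible translation, not the approach from the answer; the function name `levels_like_cte` is hypothetical. Each pass is one vectorized merge, so the number of Python-level iterations equals the depth of the hierarchy rather than the number of employees. Like the recursion it mirrors, it assumes the data contains no management cycles (a cycle would make the loop run forever).

```python
import io

import pandas as pd

CSV_TEXT = """emp_id,fn,ln,mgr_id
1,Matthew,Reichek,NULL
2,John,Cottone,3
3,Chris,Winter,1
4,Sergey,Bobkov,2
5,Andrey,Botelli,2
6,Karen,Goetz,7
7,Tri,Pham,3
8,Drew,Thompson,7
9,BD,Alabi,7
10,Sreedhar,Kavali,7
"""


def levels_like_cte(df: pd.DataFrame) -> pd.DataFrame:
    # Anchor member: employees without a manager are level 1.
    frontier = df.loc[df["mgr_id"].isna(), ["emp_id"]].assign(level=1)
    parts = [frontier]
    while True:
        # Recursive member: join employees to the previous level on
        # employee.mgr_id = previous.emp_id, like the CTE's inner join.
        children = df.merge(frontier, left_on="mgr_id", right_on="emp_id",
                            suffixes=("", "_mgr"))
        if children.empty:
            break
        frontier = pd.DataFrame({"emp_id": children["emp_id"],
                                 "level": children["level"] + 1})
        parts.append(frontier)
    # UNION of all levels, then attach the level to every employee row.
    levels = pd.concat(parts, ignore_index=True)
    return df.merge(levels, on="emp_id")


df = pd.read_csv(io.StringIO(CSV_TEXT))
out = levels_like_cte(df)
print(out.sort_values("level"))
```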
Answer:
You can build a dictionary mapping emp_id to mgr_id and then write a recursive function such as:
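```python
import numpy as np

# Map each employee id to its manager id (NaN for the top boss).
idmap = dict(zip(df['emp_id'], df['mgr_id']))
def depth(id_):
    # NaN means we walked past the top of the hierarchy.
    if np.isnan(id_):
        return 1
    return depth(idmap[id_]) + 1
```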
This computes the depth for a given id.
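To make the semantics concrete, here is the same function run on a hand-written `idmap` that covers only Drew's chain of managers from the question (a subset chosen for illustration). Because `NaN` (no manager) counts as 1, `depth` should be called on an employee's `mgr_id`: that returns the employee's level, whereas calling it on the `emp_id` itself returns one more than the level.

```python
import numpy as np

# Illustrative subset of the question's data: emp_id -> mgr_id (NaN = no manager).
idmap = {1: np.nan, 3: 1.0, 7: 3.0, 8: 7.0}


def depth(id_):
    # Reaching NaN means we walked past the top boss.
    if np.isnan(id_):
        return 1
    # One more level for each manager hop.
    return depth(idmap[id_]) + 1


# Drew (emp_id 8) reports to 7 -> 3 -> 1, so his level is 4.
print(depth(idmap[8]))
# Passing the emp_id instead gives level + 1.
print(depth(8))
```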
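To make this more efficient (so the depth of the same id is not recomputed over and over), you can use memoization, which the @functools.lru_cache decorator below takes care of: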
```python
import functools

import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'emp_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'fn': ['Matthew', 'John', 'Chris', 'Sergey', 'Andrey', 'Karen', 'Tri', 'Drew', 'BD', 'Sreedhar'],
    'ln': ['Reichek', 'Cottone', 'Winter', 'Bobkov', 'Botelli', 'Goetz', 'Pham', 'Thompson', 'Alabi', 'Kavali'],
    'mgr_id': [nan, 3.0, 1.0, 2.0, 2.0, 7.0, 3.0, 7.0, 7.0, 7.0]})

def make_depth(df):
    # Map each employee id to its manager id (NaN for the top boss).
    idmap = dict(zip(df['emp_id'], df['mgr_id']))

    # lru_cache memoizes depth(), so each id's depth is computed only once.
    @functools.lru_cache()
    def depth(id_):
        # NaN means we walked past the top of the hierarchy.
        if np.isnan(id_):
            return 1
        return depth(idmap[id_]) + 1
    return depth

# Applying depth to each row's mgr_id gives that employee's level.
df['depth'] = df['mgr_id'].apply(make_depth(df))
print(df.sort_values(by='depth'))
```
This yields:
```
   emp_id        fn        ln  mgr_id  depth
0       1   Matthew   Reichek     NaN      1
2       3     Chris    Winter     1.0      2
1       2      John   Cottone     3.0      3
6       7       Tri      Pham     3.0      3
3       4    Sergey    Bobkov     2.0      4
4       5    Andrey   Botelli     2.0      4
5       6     Karen     Goetz     7.0      4
7       8      Drew  Thompson     7.0      4
8       9        BD     Alabi     7.0      4
9      10  Sreedhar    Kavali     7.0      4
```
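One limitation of the recursive version: it makes one nested Python call per level, so a hierarchy deeper than the interpreter's recursion limit (1000 by default in CPython) would raise `RecursionError`, and a management cycle in the data would recurse until it hit that limit. Below is a sketch of an iterative variant that keeps the same memoization idea but walks up the chain with a loop and reports cycles explicitly. The name `levels_iterative` is hypothetical, and it assumes every non-missing `mgr_id` refers to an existing `emp_id` (otherwise the lookup raises `KeyError`).

```python
import pandas as pd


def levels_iterative(df: pd.DataFrame) -> pd.Series:
    # emp_id -> mgr_id as plain ints, with None meaning "no manager".
    parent = {int(e): (None if pd.isna(m) else int(m))
              for e, m in zip(df["emp_id"], df["mgr_id"])}
    level = {}  # memo: emp_id -> level (top boss is 1)
    for emp in parent:
        path, seen = [], set()
        node = emp
        # Walk up until we reach the top or an employee whose level is known.
        while node is not None and node not in level:
            if node in seen:
                raise ValueError(f"management cycle involving emp_id {node}")
            seen.add(node)
            path.append(node)
            node = parent[node]
        base = 0 if node is None else level[node]
        # Assign levels on the way back down the collected path.
        for n in reversed(path):
            base += 1
            level[n] = base
    return df["emp_id"].map(level)


# emp_id and mgr_id columns from the question's data.
df = pd.DataFrame({"emp_id": list(range(1, 11)),
                   "mgr_id": [float("nan"), 3, 1, 2, 2, 7, 3, 7, 7, 7]})
df["level"] = levels_iterative(df)
print(df)
```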