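How can I elegantly port the recursive SQL query below to Pandas Python code? Somehow I don't see a straightforward way to do it without writing my own recursive function.
Python sample code: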
import pandas as pd

data = {
    'ID': [1,2,3,4,5,6,7,8],
    'Name': ['Keith','Josh','Robin','Raja','Tridip','Arijit','Amit','Dev'],
    'MgrID': [0,1,1,2,0,5,5,6]
}
df = pd.DataFrame.from_dict(data)
df.set_index('ID', inplace=True, drop=False, append=False)
# Look up the manager row of every employee that has a manager
# (.loc replaces the removed .ix indexer)
df.loc[df.query('MgrID > 0')['MgrID']]
Trying to get this:
nLevel ID Name
================================
1 6 Arijit
2 8 Dev
1 1 Keith
2 2 Josh
2 3 Robin
3 4 Raja
1 5 Tridip
2 7 Amit
Recursive SQL query:
;WITH Employee (ID, Name, MgrID) AS
(
    SELECT 1, 'Keith', NULL UNION ALL
    SELECT 2, 'Josh', 1 UNION ALL
    SELECT 3, 'Robin', 1 UNION ALL
    SELECT 4, 'Raja', 2 UNION ALL
    SELECT 5, 'Tridip', NULL UNION ALL
    SELECT 6, 'Arijit', NULL UNION ALL
    SELECT 7, 'Amit', 5 UNION ALL
    SELECT 8, 'Dev', 6
)
,Hierarchy AS
(
    -- Anchor
    SELECT ID
          ,Name
          ,MgrID
          ,nLevel = 1
          ,Family = ROW_NUMBER() OVER (ORDER BY Name)
    FROM Employee
    WHERE MgrID IS NULL
    UNION ALL
    -- Recursive query
    SELECT E.ID
          ,E.Name
          ,E.MgrID
          ,H.nLevel+1
          ,Family
    FROM Employee E
    JOIN Hierarchy H ON E.MgrID = H.ID
)
SELECT nLevel
      ,ID
      ,space(nLevel+(CASE WHEN nLevel > 1 THEN nLevel ELSE 0 END))+Name Name
FROM Hierarchy
ORDER BY Family, nLevel
Answer 0 (score: 1)
First, you need to correct the typo in the MgrID list in your Python code:
[0,1,1,2,0,0,5,6]
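For concreteness, here is a minimal sketch of the question's frame rebuilt with that corrected list. It assumes, as the question's code does, that MgrID 0 stands in for the NULL that marks a top-level employee in the SQL; the variable name roots is only for illustration.

```python
import pandas as pd

# Same data as in the question, with Arijit's MgrID changed from 5 to 0.
# Assumption: 0 plays the role of SQL NULL (no manager).
data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Name': ['Keith', 'Josh', 'Robin', 'Raja', 'Tridip', 'Arijit', 'Amit', 'Dev'],
    'MgrID': [0, 1, 1, 2, 0, 0, 5, 6],
}
df = pd.DataFrame(data).set_index('ID', drop=False)

# The top-level rows should now match the anchor part of the SQL query.
roots = df.loc[df['MgrID'] == 0, 'Name'].tolist()
print(roots)  # ['Keith', 'Tridip', 'Arijit']
```

With the original list, Arijit would be treated as reporting to Tridip and would not show up as a level 1 row.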
Second, if this job is done recursively in SQL, why would you expect Python/Pandas to do it without a recursive approach? It isn't too hard:
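def nlevel(id, mgr_dict=df.MgrID, _cache={0: 0}):
    # Level 0 is the "no manager" sentinel; everyone else is 1 + their manager's level.
    if id not in _cache:
        # Remember the result so each ID is resolved only once
        _cache[id] = 1 + nlevel(mgr_dict[id], mgr_dict)
    return _cache[id]

df['nLevel'] = df.ID.map(nlevel)
print(df[['nLevel', 'ID', 'Name']])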
The output (the nLevel column) is then what you need, apart from the ordering, which I couldn't work out from your SQL:
nLevel ID Name
ID
1 1 1 Keith
2 2 2 Josh
3 2 3 Robin
4 3 4 Raja
5 1 5 Tridip
6 1 6 Arijit
7 2 7 Amit
8 2 8 Dev
[8 rows x 3 columns]
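As for the ordering: the SQL sorts by Family (the rank of each top-level employee by Name) and then by nLevel. One possible way to reproduce that, offered as a sketch rather than a definitive port, is to mimic the recursive CTE with a loop of merges: start from the anchor rows and repeatedly join employees onto the previous level. The names level, levels, parents and hierarchy are made up for this example, MgrID 0 is assumed to mean "no manager", and ID is added as a tie-breaker within a level because the SQL leaves that order unspecified.

```python
import pandas as pd

# Corrected data from the answer above; MgrID 0 is assumed to mean "no manager".
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Name': ['Keith', 'Josh', 'Robin', 'Raja', 'Tridip', 'Arijit', 'Amit', 'Dev'],
    'MgrID': [0, 1, 1, 2, 0, 0, 5, 6],
})

# Anchor: top-level employees get nLevel 1 and a Family number in Name order,
# like ROW_NUMBER() OVER (ORDER BY Name) in the SQL.
level = df[df['MgrID'] == 0].sort_values('Name').copy()
level['nLevel'] = 1
level['Family'] = range(1, len(level) + 1)

levels = []
while not level.empty:
    levels.append(level)
    # Recursive step: join employees onto the previous level via MgrID,
    # inheriting Family and taking the manager's nLevel plus one.
    parents = level[['ID', 'nLevel', 'Family']].rename(columns={'ID': 'MgrID'})
    level = df.merge(parents, on='MgrID')
    level['nLevel'] += 1

# ORDER BY Family, nLevel (ID is an added tie-breaker, not part of the SQL).
hierarchy = (pd.concat(levels, ignore_index=True)
             .sort_values(['Family', 'nLevel', 'ID'])
             .reset_index(drop=True))
print(hierarchy[['nLevel', 'ID', 'Name']].to_string(index=False))
```

On this data the rows come out in ID order 6, 8, 1, 2, 3, 4, 5, 7, which is the order of the listing in the question. The loop runs once per hierarchy level, so it avoids Python recursion at the cost of one merge per level.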