I have a dataframe that looks like this:
name parent_id id
languages 0 1
cyrillic script 1 2
latin script 1 3
bulgarian 2 4
russian 2 5
czech 3 6
polish 3 7
I use this command to get the parent name from the parent ID:
df['parent_name'] = df['parent_id'].map(df.set_index('id')['name'])
print(df)
name parent_id id parent_name
russian 2 5 cyrillic script
czech 3 6 latin script
polish 3 7 latin script
However, I would also like to recursively get the list of all ancestors of each node, for example:
name parent_id id path
languages 0 1 []
...
russian 2 5 ['languages', 'cyrillic script']
czech 3 6 ['languages', 'latin script']
polish 3 7 ['languages', 'latin script']
The order of the ancestors in the list does not matter to me.
Is this possible?
Answer 0 (score: 1)
I suggest a recursive function that builds the path of ids, and then applying it to the id column of the dataframe.
import pandas as pd

df = pd.DataFrame({'name': ['languages',
                            'cyrillic script',
                            'latin script',
                            'bulgarian',
                            'russian',
                            'czech',
                            'polish'],
                   'parent_id': [0, 1, 1, 2, 2, 3, 3],
                   'id': [1, 2, 3, 4, 5, 6, 7]})
dict_id = df.set_index('id').parent_id.to_dict()
dict_name = df.set_index('id').name.to_dict()
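def get_parent_id(anc):
    # accept a bare id on the first call and wrap it in a list
    anc = [anc] if not isinstance(anc, list) else anc
    if anc[-1] == 0:
        # 0 marks "no parent", so the chain is complete
        return anc
    else:
        # recurse on the parent of the last id and append the result
        parent = get_parent_id([dict_id[anc[-1]]])
        anc += parent
        return anc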
df['path_id'] = df.id.apply(get_parent_id)  # includes the node's own id and the root marker 0
# map ids to names, dropping the node itself and the root marker 0
df['path'] = df.apply(lambda x: [dict_name[id_] for id_ in x.path_id
                                 if not (id_ == x.id or id_ == 0)], axis=1)
Out[237]:
name parent_id id path_id path
0 languages 0 1 [1, 0] []
1 cyrillic script 1 2 [2, 1, 0] [languages]
2 latin script 1 3 [3, 1, 0] [languages]
3 bulgarian 2 4 [4, 2, 1, 0] [cyrillic script, languages]
4 russian 2 5 [5, 2, 1, 0] [cyrillic script, languages]
5 czech 3 6 [6, 3, 1, 0] [latin script, languages]
6 polish 3 7 [7, 3, 1, 0] [latin script, languages]
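The paths above list the nearest ancestor first, while the example in the question lists the root first. If the root-first order is wanted, one possible variation is an iterative walk up the parent chain, reversed at the end. This is only a sketch under a few assumptions: `ancestor_names`, `parent_of` and `name_of` are hypothetical names introduced here, 0 is taken to mean "no parent" as in the sample data, and the `seen` set is an added guard so that malformed data containing a cycle does not loop forever.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['languages', 'cyrillic script', 'latin script',
             'bulgarian', 'russian', 'czech', 'polish'],
    'parent_id': [0, 1, 1, 2, 2, 3, 3],
    'id': [1, 2, 3, 4, 5, 6, 7],
})

# Lookup tables: id -> parent id, and id -> name
parent_of = df.set_index('id')['parent_id'].to_dict()
name_of = df.set_index('id')['name'].to_dict()


def ancestor_names(node_id):
    """Return the names of all ancestors of node_id, root first (hypothetical helper)."""
    names = []
    seen = {node_id}
    current = parent_of.get(node_id, 0)
    # Stop at the root marker 0, at an unknown id, or when a cycle is detected
    while current in name_of and current not in seen:
        names.append(name_of[current])
        seen.add(current)
        current = parent_of.get(current, 0)
    # The walk collects nearest-first, so reverse to get root-first
    return names[::-1]


df['path'] = df['id'].map(ancestor_names)
print(df)
```

Because the loop does not recurse, it is also not limited by Python's recursion depth on very deep hierarchies.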