Getting a node's ancestors in a Pandas DataFrame

Time: 2019-11-12 15:14:32

Tags: python pandas numpy

I have a DataFrame that looks like this:

name               parent_id       id
languages                  0        1
cyrillic script            1        2       
latin script               1        3
bulgarian                  2        4
russian                    2        5
czech                      3        6
polish                     3        7

I use this command to look up each parent's name from its parent ID:

df['parent_name'] = df['parent_id'].map(df.set_index('id')['name'])
print(df)

name               parent_id       id            parent_name
russian                    2        5            cyrillic script
czech                      3        6            latin script
polish                     3        7            latin script

However, I would also like to recursively get the list of all ancestors of each node, for example:

name               parent_id       id            path
languages                  0        1            []
...
russian                    2        5            ['languages', 'cyrillic script']
czech                      3        6            ['languages', 'latin script']
polish                     3        7            ['languages', 'latin script']

The order of the ancestors within the list does not matter to me.

Is this possible?

1 answer:

Answer 0: (score: 1)

I suggest a recursive function that builds the path of ids, and then applying it to the id column of the DataFrame.

import pandas as pd

df = pd.DataFrame({'name': ['languages',
'cyrillic script',
'latin script',
'bulgarian',
'russian',
'czech',
'polish',],
'parent_id': [0,    1,  1,  2,  2,  3,  3,],
'id': [1,   2,  3,  4,  5,  6,  7]})

# Lookup tables: id -> parent_id and id -> name
dict_id = df.set_index('id').parent_id.to_dict()
dict_name = df.set_index('id').name.to_dict()

def get_parent_id(anc):

    anc = [anc] if not isinstance(anc, list) else anc

    if anc[-1] == 0:
        return anc

    else:
        parent = get_parent_id([dict_id[anc[-1]]])
        anc += parent
        return anc

df['path_id'] = df.id.apply(get_parent_id)  # each path starts with the node's own id and ends with 0
# map the ids to names, dropping the node itself and the 0 sentinel
df['path'] = df.apply(lambda x: [dict_name[id_] for id_ in x.path_id
                                 if not (id_ == x.id or id_ == 0)], axis=1)
print(df)

              name  parent_id  id       path_id                          path
0        languages          0   1        [1, 0]                            []
1  cyrillic script          1   2     [2, 1, 0]                   [languages]
2     latin script          1   3     [3, 1, 0]                   [languages]
3        bulgarian          2   4  [4, 2, 1, 0]  [cyrillic script, languages]
4          russian          2   5  [5, 2, 1, 0]  [cyrillic script, languages]
5            czech          3   6  [6, 3, 1, 0]     [latin script, languages]
6           polish          3   7  [7, 3, 1, 0]     [latin script, languages]
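
The answer above lists ancestors from nearest to farthest, while the example in the question lists them root first. As a minimal sketch of an alternative, assuming that parent_id 0 always marks the root and that every non-zero parent_id appears in the id column, the same lookup can be done with a loop instead of recursion. The helper name ancestor_names and the cycle check are illustrative additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['languages', 'cyrillic script', 'latin script',
             'bulgarian', 'russian', 'czech', 'polish'],
    'parent_id': [0, 1, 1, 2, 2, 3, 3],
    'id': [1, 2, 3, 4, 5, 6, 7],
})

# Lookup tables: id -> parent_id and id -> name
parent_of = df.set_index('id')['parent_id'].to_dict()
name_of = df.set_index('id')['name'].to_dict()


def ancestor_names(node_id):
    """Return the names of node_id's ancestors, root first (hypothetical helper)."""
    names = []
    seen = {node_id}  # ids visited so far, used to guard against cyclic data
    current = parent_of.get(node_id, 0)
    # Walk upwards until the 0 sentinel (assumed to mean "no parent") is reached
    while current != 0 and current in name_of:
        if current in seen:
            raise ValueError(f"cycle detected at id {current}")
        seen.add(current)
        names.append(name_of[current])
        current = parent_of.get(current, 0)
    return names[::-1]  # reverse so that the root comes first


df['path'] = df['id'].map(ancestor_names)
print(df[['name', 'path']])
```

Because the loop never recurses, it should not run into Python's recursion limit on very deep hierarchies, and the seen set turns a malformed (cyclic) table into an explicit error rather than an endless loop.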