Question

I'm having trouble with hierarchical data. My data looks like this.

   id       performance_rating     parent_id     level
   111           8                   null         0 
   122           3                   null         0
   123           9                   null         0
   254           5                   111          1
   265           8                   111          1
   298           7                   122          1
   220           6                   123          1
   305           5                   298          2
   395           8                   220          2
   ...           ...                 ...         ...
   654           4                   562          5

id is the person's unique identifier. performance_rating is their score out of 10. parent_id is the id of the person who works directly above the corresponding id.

I need to find the average rating of each tree (the trees rooted at 111, 122 and 123).

What I tried was splitting the dataframe by level, then merging the pieces and using groupby. But that gets very long.

Answer 1

There are a few different ways to do this; here is one admittedly ugly solution.

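We use a for loop with an inner while loop to "back-level" each row of the dataframe, i.e. to re-attach it directly to its top-level ancestor. This requires setting 'id' as the index first and then sorting by 'level' in descending order. It also requires that there are no duplicate IDs. Here goes: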

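# Index by id so that df.loc[some_id, ...] looks a person up directly (ids must be unique).
df = df.set_index('id')
# Process the deepest rows first, so each ancestor is still unmodified when it is read.
df = df.sort_values(by='level', ascending=False)

for i in df.index:
    # Move row i up one level at a time until it hangs directly under its root.
    while df.loc[i, 'level'] > 1:
        old_pid = df.loc[i, 'parent_id']
        df.loc[i, 'parent_id'] = df.loc[old_pid, 'parent_id']
        old_level = df.loc[i,'level']
        df.loc[i, 'level'] = old_level - 1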

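This way, no matter how many levels there are, everything ends up at level 1 of the hierarchy, with parent_id pointing at the root of its tree, and then we can do: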

grouped = df.groupby('parent_id').mean()

(or whatever variation you need). Hope that helps!
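One thing to watch with the groupby above: the level 0 rows have a null parent_id, and groupby drops null keys by default, so each root's own rating is left out of its tree's mean. If you want it included, another way (not the method above, just a sketch) is to resolve every id to its root first and group on that. The sample frame below is typed in from the rows shown in the question with the elided rows left out. The names root_id, parent_of and find_root are made up for illustration, and the sketch assumes every parent_id appears as an id and that there are no cycles.

```python
import pandas as pd

# Sample rows copied from the question (the elided "..." rows are left out).
df = pd.DataFrame({
    "id": [111, 122, 123, 254, 265, 298, 220, 305, 395],
    "performance_rating": [8, 3, 9, 5, 8, 7, 6, 5, 8],
    "parent_id": [None, None, None, 111, 111, 122, 123, 298, 220],
    "level": [0, 0, 0, 1, 1, 1, 1, 2, 2],
})

# Map each non-root id to its parent id (roots have a null parent_id and are skipped).
parent_of = {
    int(child): int(parent)
    for child, parent in zip(df["id"], df["parent_id"])
    if pd.notna(parent)
}

def find_root(node_id):
    """Follow parent links upward until reaching an id that has no parent.

    Assumes the hierarchy has no cycles; otherwise this loop would not end.
    """
    while node_id in parent_of:
        node_id = parent_of[node_id]
    return node_id

# Hypothetical helper column: the top-level ancestor of every row (a root maps to itself).
df["root_id"] = df["id"].map(find_root)

# Mean rating per tree, counting the root's own rating.
tree_means = df.groupby("root_id")["performance_rating"].mean()

# Mean rating per tree without the root, comparable to grouping on the flattened parent_id.
reports_only = df[df["level"] > 0].groupby("root_id")["performance_rating"].mean()

print(tree_means)
print(reports_only)
```

This does not need the level column to find the roots, and it leaves parent_id untouched, so the original hierarchy is still there if you need it afterwards.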

How to group by hierarchical data in pandas / sql?
