Pandas groupby sorted by category

Time: 2020-03-31 14:11:41

Tags: python pandas pandas-groupby

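I need to sort the result of a second Pandas groupby by category.

The first groupby builds a list from another column; the second groupby produces the result I actually need. The problem is that the second groupby does not honor the ordered categorical index of the original DataFrame.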

import pandas as pd
import numpy  as np
import numpy.ma as ma
from   pathlib import Path

fr   = Path('../data/rules-1.xlsx')
df   = pd.read_excel(fr, sheet_name='MS')
from pandas.api.types import CategoricalDtype

print('Before:')
display(df)
ms_cat         = ['Parent-C', 'Parent-A', 'Parent-B']
df['ParentMS'] = df['ParentMS'].astype(CategoricalDtype(ms_cat, ordered=True))
df             = df.reset_index()
df             = df.set_index('ParentMS')
df             = df.sort_index()
print('After:')
display(df)

df_g           = df.groupby(['ParentMS', 'Milestone'])['Tasks'].apply(list)
df_g           = df_g.groupby('ParentMS')

# Category sort is not honored after the second groupby()
for name, group in df_g:
    print(name, group)

This is the input file:
[enter image description here][1]


  [1]: https://i.stack.imgur.com/KZnZD.png

1 Answer:

Answer 0 (score: 0)

Combining the two `df_g` lines worked for me. I can't explain why, but it does work.

df_g = df.groupby(['ParentMS', 'Milestone'])['RN'].apply(list).groupby('ParentMS')
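
One possible explanation, offered as an assumption rather than something confirmed by the original post: depending on the pandas version, the categorical dtype of `ParentMS` may not survive `apply(list)`, in which case the second groupby falls back to alphabetical order. The sketch below uses a small made-up DataFrame (its rows and task names are hypothetical and only stand in for the spreadsheet) and re-asserts the dtype before the second groupby, so the category order is kept either way.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the 'MS' sheet; not the asker's real data.
df = pd.DataFrame({
    "ParentMS":  ["Parent-A", "Parent-B", "Parent-C", "Parent-A", "Parent-C"],
    "Milestone": ["M1", "M2", "M3", "M1", "M4"],
    "Tasks":     ["t1", "t2", "t3", "t4", "t5"],
})

# Desired (non-alphabetical) order of the parent milestones.
ms_cat = ["Parent-C", "Parent-A", "Parent-B"]
parent_dtype = CategoricalDtype(ms_cat, ordered=True)
df["ParentMS"] = df["ParentMS"].astype(parent_dtype)

# First groupby: collect the tasks of each (ParentMS, Milestone) pair.
lists = (
    df.groupby(["ParentMS", "Milestone"], observed=True)["Tasks"]
      .apply(list)
      .reset_index()
)

# Re-assert the dtype in case it was lost along the way
# (a no-op when the column is already categorical).
lists["ParentMS"] = lists["ParentMS"].astype(parent_dtype)

# Second groupby: a categorical key is iterated in category order.
order = []
for name, group in lists.groupby("ParentMS", observed=True):
    order.append(name)
    print(name, group[["Milestone", "Tasks"]].to_dict("records"))
```

Here `order` comes out as Parent-C, Parent-A, Parent-B, matching `ms_cat`. Passing `observed=True` keeps the result limited to combinations that actually occur in the data.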