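I need to sort the result of a second pandas groupby by category.
The first groupby builds a list from another column, and the second groupby gives the result I need. The problem is that the second groupby does not honor the DataFrame's original sorted categorical index.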
import pandas as pd
import numpy as np
import numpy.ma as ma
from pathlib import Path
fr = Path('../data/rules-1.xlsx')
df = pd.read_excel(fr, sheet_name='MS')
from pandas.api.types import CategoricalDtype
print('Before:')
display(df)
ms_cat = ['Parent-C', 'Parent-A', 'Parent-B']
# `ordered=True` is an argument of CategoricalDtype, not of astype()
df['ParentMS'] = df['ParentMS'].astype(CategoricalDtype(ms_cat, ordered=True))
df = df.reset_index()
df = df.set_index('ParentMS')
df = df.sort_index()
print('After:')
display(df)
df_g = df.groupby(['ParentMS', 'Milestone'])['Tasks'].apply(list)
df_g = df_g.groupby('ParentMS')
# Category sort is not honored after the second groupby()
for name, group in df_g:
    print(name, group)
This is the input file:
[enter image description here][1]
[1]: https://i.stack.imgur.com/KZnZD.png
Answer 0 (score: 0)
Combining the two `df_g` lines into one worked for me. I can't explain why, but it does work.
df_g = df.groupby(['ParentMS', 'Milestone'])['RN'].apply(list).groupby('ParentMS')
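If the iteration order of the second groupby still looks unreliable, one alternative is to walk the parents in the declared category order explicitly. The following is only a minimal sketch under stated assumptions: the inline DataFrame is a hypothetical stand-in for the 'MS' sheet (the real rows are only visible in the screenshot), and `ordered_groups` is a name introduced here for illustration. It uses the `Tasks` column from the question.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the 'MS' sheet; the real data is not available here.
df = pd.DataFrame({
    "ParentMS": ["Parent-A", "Parent-B", "Parent-C", "Parent-A", "Parent-C"],
    "Milestone": ["M1", "M2", "M3", "M1", "M4"],
    "Tasks": ["t1", "t2", "t3", "t4", "t5"],
})

ms_cat = ["Parent-C", "Parent-A", "Parent-B"]
# `ordered=True` belongs to CategoricalDtype, not to astype().
df["ParentMS"] = df["ParentMS"].astype(CategoricalDtype(ms_cat, ordered=True))

# One list of tasks per observed (ParentMS, Milestone) pair.
tasks = df.groupby(["ParentMS", "Milestone"], observed=True)["Tasks"].apply(list)

# Walk the parents in the declared category order instead of relying on
# the iteration order of a second groupby.
present = set(tasks.index.get_level_values("ParentMS"))
ordered_groups = {
    name: tasks.xs(name, level="ParentMS")
    for name in ms_cat
    if name in present
}

for name, group in ordered_groups.items():
    print(name, group.to_dict())
```

Because the loop is driven by `ms_cat` itself, the output order does not depend on how pandas sorts the group keys.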