Question

I have the following df:

amount    id    year_month
20        10    201903
20        10    201903
50        20    201903
10        20    201903
 5        30    201903
 5        40    201903
30        50    201904
10        60    201904
10        70    201904
 5        80    201904

I want to group by id and year_month and first get the sum of amount:

df_1 = df.groupby(['id', 'year_month'], as_index=False)['amount'].sum()

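Then I divide each summed amount by the total amount of its year_month group, obtained with groupby and transform, to get a percentage:

df_1['pct'] = df_1['amount'].div(df_1.groupby('year_month')['amount'].transform('sum')).mul(100).round(2)

amount    id    year_month    pct
40        10    201903        36.36
60        20    201903        54.55
 5        30    201903         4.55
 5        40    201903         4.55
30        50    201904        54.55
10        60    201904        18.18
10        70    201904        18.18
 5        80    201904         9.09

Next, I want to sort the rows within each year_month (e.g. 201903) by pct in descending order, and then calculate the percentage of ids in each year_month whose cumulative sum of pct is less than or equal to 80. What is the best way to do that? The result should use the year_month values as its column headers.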

Answer 1

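By default, groupby sorts by the grouping columns, so sort_values should be omitted. Then use a custom lambda function that takes the cumulative sum, compare it with Series.le, use mean to get the share of True values, and finally convert the Series to a DataFrame with Series.to_frame and transpose it.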
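The steps described above can be sketched roughly as follows. This is a minimal illustration built on the sample data from the question, not the answer's original code. The helper name `share_within_limit` and its `limit` parameter are hypothetical, and the descending sort inside the helper is an assumption added here so that the cumulative sum follows the order the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "amount": [20, 20, 50, 10, 5, 5, 30, 10, 10, 5],
    "id": [10, 10, 20, 20, 30, 40, 50, 60, 70, 80],
    "year_month": [201903, 201903, 201903, 201903, 201903, 201903,
                   201904, 201904, 201904, 201904],
})

# Sum amount per (id, year_month), as in the question.
df_1 = df.groupby(["id", "year_month"], as_index=False)["amount"].sum()

# Share of each id's amount within its year_month, in percent.
df_1["pct"] = (
    df_1["amount"]
    .div(df_1.groupby("year_month")["amount"].transform("sum"))
    .mul(100)
    .round(2)
)


def share_within_limit(pct, limit=80):
    """Hypothetical helper: percentage of rows whose running pct total is <= limit."""
    # Assumption: sort descending first, as the question requests.
    running = pct.sort_values(ascending=False).cumsum()
    # The mean of a boolean Series is the fraction of True values.
    return running.le(limit).mean() * 100


# One value per year_month, then transpose so year_month becomes the header.
result = (
    df_1.groupby("year_month")["pct"]
    .apply(share_within_limit)
    .to_frame()
    .T
)
print(result)
```

With the sample data, only the id with the largest share in 201903 stays within the limit of 80, while two ids do in 201904, so the single-row result carries one value per year_month column.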

pandas: sort the values in each group after groupby sum and get the percentage of values after using cumsum

1 answer: