How to group by and perform custom calculations

Time: 2019-02-12 09:46:42

Tags: python pandas pandas-groupby

I have a DataFrame, df, as follows:

Stud_id card    Nation  Gender  Age  Code   Amount  yearmonth
111     1       India   M      Adult 543    100     201601
111     1       India   M      Adult 543    100     201601
111     1       India   M      Adult 543    150     201602
111     1       India   M      Adult 612    100     201602
111     1       India   M      Adult 715    200     201603
222     2       India   M      Adult 715    200     201601
222     2       India   M      Adult 543    100     201604
222     2       India   M      Adult 543    100     201603
333     3       India   M      Adult 543    100     201601
333     3       India   M      Adult 543    100     201601
333     4       India   M      Adult 543    150     201602
333     4       India   M      Adult 612    100     201607

Now I want two DataFrames, as shown below:

df_1

card    Code    Total_Amount    Avg_Amount
1       543     350             175
2       543     200             100
3       543     200             200
4       543     150             150
1       612     100             100
4       612     100             100
1       715     200             200
2       715     200             200

Logic for df_1:

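  1. Total_Amount: for each unique card and unique Code, take the sum of Amount (e.g. card 1, Code 543 = 350).
  2. Avg_Amount: for each unique card and unique Code, divide Total_Amount by the number of unique yearmonth values (e.g. Total_Amount = 350 and the number of unique yearmonth values is 2, so 350 / 2 = 175).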
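The two rules can be checked by hand for a single group. This is a minimal sketch, assuming the three sample rows for card 1 and Code 543 are typed in directly; the names `group`, `total_amount`, `n_months` and `avg_amount` are illustrative and do not come from the question.

```python
import pandas as pd

# Rows from the sample data where card == 1 and Code == 543
group = pd.DataFrame({
    "Amount": [100, 100, 150],
    "yearmonth": [201601, 201601, 201602],
})

total_amount = group["Amount"].sum()        # rule 1: sum of Amount
n_months = group["yearmonth"].nunique()     # distinct yearmonth values in the group
avg_amount = total_amount / n_months        # rule 2: total / distinct months

print(total_amount, n_months, avg_amount)
```

Note that the duplicate 201601 row counts toward the total but only once toward the number of months.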

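df_2

Code    Avg_Amount
543     156.25
612     100
715     200

Logic for df_2:

  1. Avg_Amount: for each Code in df_1, take the sum of Avg_Amount and divide it by the number of rows for that Code (e.g. for Code 543 the sum of Avg_Amount is 175 + 100 + 200 + 150 = 625; dividing by the number of rows, 4, gives 625 / 4 = 156.25).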

1 answer:

Answer 0: (score: 2)

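import pandas as pd

# Per (card, Code): total Amount, and total divided by the number of distinct yearmonth values.
# The columns are selected with a list ([[...]]); selecting them with a bare tuple fails on pandas 2.0 and later.
df1 = df.groupby(['card','Code'])[['yearmonth','Amount']].apply(lambda x: [sum(x.Amount),sum(x.Amount)/len(set(x.yearmonth))]).apply(pd.Series).reset_index()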

df1.columns= ['card','Code','Total_Amount','Avg_Amount']

Output

   card  Code  Total_Amount  Avg_Amount
0     1   543           350       175.0
1     1   612           100       100.0
2     1   715           200       200.0
3     2   543           200       100.0
4     2   715           200       200.0
5     3   543           200       200.0
6     4   543           150       150.0
7     4   612           100       100.0
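The same table can also be produced without a row-wise lambda. The following is a sketch of an alternative, not the answer author's method. It assumes pandas 0.25 or later (for named aggregation), rebuilds only the four columns the calculation needs, and uses a temporary helper column `n_months` that is not part of the question.

```python
import pandas as pd

# Sample data from the question (only the columns the calculation uses)
df = pd.DataFrame({
    "card":      [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "Code":      [543, 543, 543, 612, 715, 715, 543, 543, 543, 543, 543, 612],
    "Amount":    [100, 100, 150, 100, 200, 200, 100, 100, 100, 100, 150, 100],
    "yearmonth": [201601, 201601, 201602, 201602, 201603, 201601,
                  201604, 201603, 201601, 201601, 201602, 201607],
})

# One pass over the groups: total Amount and count of distinct yearmonth values
df1 = (
    df.groupby(["card", "Code"])
      .agg(Total_Amount=("Amount", "sum"), n_months=("yearmonth", "nunique"))
      .reset_index()
)

# Avg_Amount = Total_Amount / number of distinct months, then drop the helper column
df1["Avg_Amount"] = df1["Total_Amount"] / df1["n_months"]
df1 = df1.drop(columns="n_months")

print(df1)
```

Built-in aggregations such as `sum` and `nunique` generally run faster on large frames than a Python lambda applied per group, and `Total_Amount` keeps its integer dtype.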

For the second one:

df2 = df1.groupby('Code')['Avg_Amount'].apply(lambda x: sum(x)/len(x)).reset_index(name='Avg_Amount')

Output

   Code  Avg_Amount
0   543      156.25
1   612      100.00
2   715      200.00
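Since sum divided by row count is simply the mean, the second step can probably be written with the built-in `mean` instead of a lambda. This is a sketch under that assumption; `df1` is retyped here from the first output above so that the block runs on its own.

```python
import pandas as pd

# df1 as printed in the first output above
df1 = pd.DataFrame({
    "card": [1, 1, 1, 2, 2, 3, 4, 4],
    "Code": [543, 612, 715, 543, 715, 543, 543, 612],
    "Total_Amount": [350, 100, 200, 200, 200, 200, 150, 100],
    "Avg_Amount": [175.0, 100.0, 200.0, 100.0, 200.0, 200.0, 150.0, 100.0],
})

# mean() computes sum / count per group, the same arithmetic as sum(x)/len(x)
df2 = df1.groupby("Code", as_index=False)["Avg_Amount"].mean()

print(df2)
```

The two forms would differ if `Avg_Amount` contained missing values: `mean` skips NaN by default, whereas `sum(x)/len(x)` would return NaN for that group.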