I have a dataframe, df, as follows:
Stud_id card Nation Gender Age Code Amount yearmonth
111 1 India M Adult 543 100 201601
111 1 India M Adult 543 100 201601
111 1 India M Adult 543 150 201602
111 1 India M Adult 612 100 201602
111 1 India M Adult 715 200 201603
222 2 India M Adult 715 200 201601
222 2 India M Adult 543 100 201604
222 2 India M Adult 543 100 201603
333 3 India M Adult 543 100 201601
333 3 India M Adult 543 100 201601
333 4 India M Adult 543 150 201602
333 4 India M Adult 612 100 201607
Now I want two dataframes, as shown below:
df_1:
card Code Total_Amount Avg_Amount
1 543 350 175
2 543 200 100
3 543 200 200
4 543 150 150
1 612 100 100
4 612 100 100
1 715 200 200
2 715 200 200
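Logic for df_1:
Total_Amount: for each unique combination of card and Code, take the sum of Amount (e.g. card: 1, Code: 543 → 350).
Avg_Amount: divide that Total_Amount by the number of unique yearmonth values for the same card and Code (e.g. Total_Amount = 350 and the number of unique yearmonth values is 2, so 350 / 2 = 175).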
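A minimal sketch of the df_1 logic, assuming pandas 0.25 or newer (named aggregation). It parses the sample table above with read_csv, then uses groupby with the built-in "sum" and "nunique" aggregations instead of a Python lambda. The intermediate column name n_months is hypothetical and is dropped at the end; the final sort only reproduces the row order shown in the question.

```python
import io

import pandas as pd

# The sample data from the question, parsed from whitespace-separated text.
data = """Stud_id card Nation Gender Age Code Amount yearmonth
111 1 India M Adult 543 100 201601
111 1 India M Adult 543 100 201601
111 1 India M Adult 543 150 201602
111 1 India M Adult 612 100 201602
111 1 India M Adult 715 200 201603
222 2 India M Adult 715 200 201601
222 2 India M Adult 543 100 201604
222 2 India M Adult 543 100 201603
333 3 India M Adult 543 100 201601
333 3 India M Adult 543 100 201601
333 4 India M Adult 543 150 201602
333 4 India M Adult 612 100 201607"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Per (card, Code): sum of Amount and number of distinct yearmonth values.
# n_months is a hypothetical helper column, removed below.
df_1 = (
    df.groupby(["card", "Code"])
      .agg(Total_Amount=("Amount", "sum"), n_months=("yearmonth", "nunique"))
      .reset_index()
)

# Avg_Amount = Total_Amount / number of distinct months for that card and Code.
df_1["Avg_Amount"] = df_1["Total_Amount"] / df_1["n_months"]
df_1 = df_1.drop(columns="n_months")

# Order rows by Code, then card, to match the layout in the question.
df_1 = df_1.sort_values(["Code", "card"], ignore_index=True)
print(df_1)
```

Using "nunique" on yearmonth is what makes card 3 / Code 543 come out as 200 rather than 100: both of its rows fall in 201601, so they count as one month.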
df_2:
Code Avg_Amount
543 156.25
612 100
715 200
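Logic for df_2:
For each Code, take the sum of Avg_Amount in df_1 (e.g. for Code: 543 the sum of Avg_Amount is 175 + 100 + 200 + 150 = 625), then divide it by the number of rows for that Code (4). So for Code 543: 625 / 4 = 156.25.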
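The df_2 step can be sketched the same way. This is a minimal example that rebuilds df_1 from the table above (rather than computing it from df) and then divides the per-Code sum of Avg_Amount by the per-Code row count, which is exactly the group mean.

```python
import pandas as pd

# df_1 as shown in the question (card, Code, Total_Amount, Avg_Amount).
df_1 = pd.DataFrame({
    "card": [1, 2, 3, 4, 1, 4, 1, 2],
    "Code": [543, 543, 543, 543, 612, 612, 715, 715],
    "Total_Amount": [350, 200, 200, 150, 100, 100, 200, 200],
    "Avg_Amount": [175, 100, 200, 150, 100, 100, 200, 200],
})

# For each Code: sum of Avg_Amount divided by the number of rows for that Code.
grouped = df_1.groupby("Code")["Avg_Amount"]
df_2 = (grouped.sum() / grouped.size()).reset_index(name="Avg_Amount")
print(df_2)
```

In practice `df_1.groupby("Code")["Avg_Amount"].mean()` gives the same numbers; the explicit sum/size form only mirrors the wording of the requirement.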
Answer 0 (score: 2)
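import pandas as pd
# Per (card, Code): [sum of Amount, sum of Amount / number of distinct yearmonths]
df1 = df.groupby(['card','Code'])[['yearmonth','Amount']].apply(lambda x: [x.Amount.sum(), x.Amount.sum()/x.yearmonth.nunique()]).apply(pd.Series).reset_index()
df1.columns = ['card','Code','Total_Amount','Avg_Amount']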
Output
card Code Total_Amount Avg_Amount
0 1 543 350 175.0
1 1 612 100 100.0
2 1 715 200 200.0
3 2 543 200 100.0
4 2 715 200 200.0
5 3 543 200 200.0
6 4 543 150 150.0
7 4 612 100 100.0
For the second one:
# Mean of the per-card Avg_Amount values within each Code (same as sum/len)
df2 = df1.groupby('Code')['Avg_Amount'].mean().reset_index(name='Avg_Amount')
Output
Code Avg_Amount
0 543 156.25
1 612 100.00
2 715 200.00