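I have already derived the grouping I want, but I would like to add a percentage column computed against each month's total, i.e. regardless of which string is in originating_system_id.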
d = [('Total_RFQ_For_Month', 'size')]
df_RFQ_Channel = df.groupby(['Year_Month','originating_system_id'])['state'].agg(d)
#df_RFQ_Channel['RFQ_Pcent_For_Month'] = ?
display(df_RFQ_Channel)
Year_Month originating_system_id Total_RFQ_For_Month RFQ_Pcent_For_Month
2017-11 BBT 59 7.90%
EUCR 33 4.42%
MAXL 6 0.80%
MXUS 649 86.88%
2017-12 BBT 36 73.47%
EUCR 7 14.29%
MAXL 6 12.24%
2018-01 BBT 88 9.52%
EUCR 26 2.81%
MAXL 4 0.43%
MXUS 800 86.58%
VOIX 6 0.65%
Examples:
7.90% is BBT's Total_RFQ_For_Month (59) divided by the sum of all for 2017-11 (747)
2.81% is EUCR's Total_RFQ_For_Month (26) divided by the sum of all for 2018-01 (924).
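Answer 0 (score: 3)
Use transform on the grouped Series. It returns a Series with the same size as the original DataFrame, so the Total_RFQ_For_Month column can be divided by it directly: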
# turn the MultiIndex levels of the grouped frame into ordinary columns
df = df_RFQ_Channel.reset_index()
s = df.groupby('Year_Month')['Total_RFQ_For_Month'].transform('sum')
df['RFQ_Pcent_For_Month'] = df['Total_RFQ_For_Month'].div(s).mul(100).round(2)
print (df)
Year_Month originating_system_id Total_RFQ_For_Month RFQ_Pcent_For_Month
0 2017-11 BBT 59 7.90
1 2017-11 EUCR 33 4.42
2 2017-11 MAXL 6 0.80
3 2017-11 MXUS 649 86.88
4 2017-12 BBT 36 73.47
5 2017-12 EUCR 7 14.29
6 2017-12 MAXL 6 12.24
7 2018-01 BBT 88 9.52
8 2018-01 EUCR 26 2.81
9 2018-01 MAXL 4 0.43
10 2018-01 MXUS 800 86.58
11 2018-01 VOIX 6 0.65
As a percentage string:
df['RFQ_Pcent_For_Month'] = (df['Total_RFQ_For_Month'].div(s)
.mul(100)
.round(2)
.astype(str)
.add('%'))
print (df)
Year_Month originating_system_id Total_RFQ_For_Month RFQ_Pcent_For_Month
0 2017-11 BBT 59 7.9%
1 2017-11 EUCR 33 4.42%
2 2017-11 MAXL 6 0.8%
3 2017-11 MXUS 649 86.88%
4 2017-12 BBT 36 73.47%
5 2017-12 EUCR 7 14.29%
6 2017-12 MAXL 6 12.24%
7 2018-01 BBT 88 9.52%
8 2018-01 EUCR 26 2.81%
9 2018-01 MAXL 4 0.43%
10 2018-01 MXUS 800 86.58%
11 2018-01 VOIX 6 0.65%
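Note that astype(str) drops trailing zeros, which is why 7.90 shows up as 7.9% above. If the two-decimal form from the question is wanted, one option is to format each value explicitly. A minimal sketch, assuming a small hand-built frame (the name demo is hypothetical) holding only the 2017-11 counts from the question:

```python
import pandas as pd

# Hypothetical sample: only the 2017-11 rows from the question.
demo = pd.DataFrame({
    'Year_Month': ['2017-11'] * 4,
    'originating_system_id': ['BBT', 'EUCR', 'MAXL', 'MXUS'],
    'Total_RFQ_For_Month': [59, 33, 6, 649],
})

# Per-month total, broadcast back onto every row.
month_total = demo.groupby('Year_Month')['Total_RFQ_For_Month'].transform('sum')

# Format with exactly two decimals so trailing zeros are kept.
demo['RFQ_Pcent_For_Month'] = (demo['Total_RFQ_For_Month']
                               .div(month_total)
                               .mul(100)
                               .map('{:.2f}%'.format))
print(demo)
```

The column is still a string column here, so keep a numeric copy if it needs to be sorted or summed later.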
Detail:
print (s)
0 747
1 747
2 747
3 747
4 49
5 49
6 49
7 924
8 924
9 924
10 924
11 924
Name: Total_RFQ_For_Month, dtype: int64
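If the MultiIndex should be kept rather than reset, the same idea should also work by grouping on the index level. This is a sketch only, assuming a stand-in for the asker's grouped frame rebuilt by hand with two of the three months:

```python
import pandas as pd

# Hypothetical stand-in for the grouped frame from the question (two months only).
df_RFQ_Channel = pd.DataFrame(
    {'Total_RFQ_For_Month': [59, 33, 6, 649, 36, 7, 6]},
    index=pd.MultiIndex.from_tuples(
        [('2017-11', 'BBT'), ('2017-11', 'EUCR'), ('2017-11', 'MAXL'),
         ('2017-11', 'MXUS'), ('2017-12', 'BBT'), ('2017-12', 'EUCR'),
         ('2017-12', 'MAXL')],
        names=['Year_Month', 'originating_system_id']),
)

# Group on the index level instead of a column; the result keeps the MultiIndex.
totals = (df_RFQ_Channel.groupby(level='Year_Month')['Total_RFQ_For_Month']
          .transform('sum'))
df_RFQ_Channel['RFQ_Pcent_For_Month'] = (
    df_RFQ_Channel['Total_RFQ_For_Month'].div(totals).mul(100).round(2))
print(df_RFQ_Channel)
```

Because totals shares the index of df_RFQ_Channel, the division aligns row by row without any reset_index step.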
Answer 1 (score: 1)
Steps to recreate your df:
import pandas as pd

# only two months
df = pd.DataFrame(
    [['2017-11', 'BBT', 59],
     ['2017-11', 'EUCR', 33],
     ['2017-11', 'MAXL', 6],
     ['2017-11', 'MXUS', 649],
     ['2017-12', 'BBT', 36],
     ['2017-12', 'EUCR', 7],
     ['2017-12', 'MAXL', 88]],
    columns=['Year_Month', 'originating_system_id', 'Total_RFQ_For_Month'])
# Same as your DF
gp1 = df.groupby(['Year_Month','originating_system_id']).sum()
gp2=gp1.reset_index()
gp3 = df[['Year_Month','Total_RFQ_For_Month']].groupby(['Year_Month']).sum().rename(columns={'Total_RFQ_For_Month':
'RFQ_For_Month_Sum'})
gp2=gp2.merge(gp3, on='Year_Month')
gp2['RFQ_Pcent_For_Month']=((gp2['Total_RFQ_For_Month']*100)/gp2['RFQ_For_Month_Sum']).round(3).astype(str).add('%')
# drop the helper column; the positional axis argument is no longer accepted in pandas 2.x
gp2 = gp2.drop(columns=['RFQ_For_Month_Sum'])
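Whichever approach is used, a quick sanity check is that the shares within each month add up to 100. A sketch under the assumption that the percentages are still numeric (not yet converted to strings), using the same two-month sample; the names check, share and per_month are hypothetical:

```python
import pandas as pd

# Hypothetical copy of the two-month sample used in this answer.
check = pd.DataFrame({
    'Year_Month': ['2017-11'] * 4 + ['2017-12'] * 3,
    'originating_system_id': ['BBT', 'EUCR', 'MAXL', 'MXUS', 'BBT', 'EUCR', 'MAXL'],
    'Total_RFQ_For_Month': [59, 33, 6, 649, 36, 7, 88],
})

# Unrounded percentage of each row within its month.
month_total = check.groupby('Year_Month')['Total_RFQ_For_Month'].transform('sum')
share = check['Total_RFQ_For_Month'] / month_total * 100

# Each month's shares should sum to 100 (up to floating-point error).
per_month = share.groupby(check['Year_Month']).sum()
print(per_month)
```

With rounded percentages the sums can differ from 100 by a few hundredths, so the check is best done before rounding.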