我有一个数据框-df,如下所示:
df = pd.DataFrame({"Customer_no": ['1', '1', '1', '2', '2', '6', '7','8','9','10'],
"Card_no": ['111', '222', '333', '444', '555', '666', '777','888','999','000'],
"Card_name":['AAA','AAA','BBB','CCC','AAA','DDD','EEE','BBB','CCC','CCC'],
"Group_code":['123','123','456','678','123','434','678','365','678','987'],
"Amount":['100','240','450','212','432','123','543','567','232','453'],
"Category" :['Electrical','Electrical','Hardware','House','Electrical','Car','House','Toy','House','Bike123']})
现在,我需要按客户编号分组并获得总金额,前1,2、3个类别及其相对于总额的百分比。 注意:在我的Toy数据集中,我只有2个类别,在我的原始数据中,我还有更多类别,我需要选择前5个类别。 我的数据框应如下所示:
df = pd.DataFrame({"Customer_no": ['1', '1', '1', '2', '2', '6', '7','8','9','10'],
"Card_no": ['111', '222', '333', '444', '555', '666', '777','888','999','000'],
"Card_name":['AAA','AAA','BBB','CCC','AAA','DDD','EEE','BBB','CCC','CCC'],
"Group_code":['123','123','456','678','123','434','678','365','678','987'],
"Amount":['100','240','450','212','432','123','543','567','232','453'],
"Category" :['Electrical','Electrical','Hardware','House','Electrical','Car','House','Toy','House','Bike123'],
"Total amount" :['790','790','790','644','644','123','543','567','232','453'],
"Top-1 Category":['Hardware','Hardware','Hardware','Electrical','Electrical','Car','House','Toy',
'House','Bike123'],
"Top-1 Category %":['57','57','57','67','67','100','100','100','100','100'],
"Top-2 Category":['Electrical','Electrical','Electrical','House','House','','','','',''] ,
"Top-2 Category %":['43','43','43','33','33','0','0','0','0','0']})
请求您的帮助以实现它。
注意: 1)通过对每个客户将所有类别分组并汇总每个客户明智的金额来选择最高类别。哪个类别的前1个类别的数量更多,下一个相似的是前2个类别,依此类推 2)前1个类别的百分比:每个类别的总金额除以总金额并乘以100。这是针对每个客户的。对于前2个类别也是如此。
答案 0 :(得分:0)
尝试一下:
#Your data
df = pd.DataFrame({"Customer_no": ['1', '1', '1', '2', '2', '6', '7','8','9','10'],
"Card_no": ['111', '222', '333', '444', '555', '666', '777','888','999','000'],
"Card_name":['AAA','AAA','BBB','CCC','AAA','DDD','EEE','BBB','CCC','CCC'],
"Group_code":['123','123','456','678','123','434','678','365','678','987'],
"Amount":['100','240','450','212','432','123','543','567','232','453'],
"Category" :['Electrical','Electrical','Hardware','House','Electrical','Car','House','Toy','House','Bike123']})
#Make some columns numerical
for i in ["Customer_no","Card_no","Group_code","Amount"]:
df[i] = pd.to_numeric(df[i])
#Total sum
Total_amount = pd.DataFrame(df.groupby(["Customer_no"]).Amount.sum()).reset_index().rename(columns={'Amount':'Total amount'})
#Add some nesessery colums and grouping
Top_1_Category = pd.DataFrame(df.groupby(['Customer_no',"Category"]).Amount.sum()).reset_index().rename(columns={'Amount':'group_sum'})
df = df.merge(Total_amount,how='left',on='Customer_no')
df = df.merge(Top_1_Category,how='left',on=['Category','Customer_no'])
group_top_1 = df[['Customer_no','Category','group_sum']].loc[df.groupby('Customer_no').group_sum.agg('idxmax')].rename(columns={'Category':'Top-1 Category','group_sum':'group_sum_1'})
df = df.merge(group_top_1,how='left',on='Customer_no')
#Make columns 'Top-1 Category %'
df['Top-1 Category %'] = round(100*df['group_sum_1']/df['Total amount'],0)
#drop unnecessary columns
df.drop(['group_sum','group_sum_1'],axis=1,inplace=True)
您可以添加Top-2
类似的列