新手试图打破我对擅长的瘾。我有一个付费发票数据集,其中包含供应商和国家/地区以及金额。我想知道每个供应商,他们拥有最大发票金额的国家以及他们在该国家/地区的总业务的百分比。使用这个数据集我希望结果是:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Company' : ['bar','foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
'Country' : ['two','one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
'Amount' : [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
'Pct' : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})
CoCntry = df.groupby(['Company', 'Country'])
CoCntry.aggregate(np.sum)
查看了多个示例,包括:Extract row with max value和Getting max value using groupby
2:Python : Getting the Row which has the max value in groups using groupby我已经创建了一个按国家/地区汇总发票数据的DataFrameGroupBy。我正在努力寻找如何找到最大行。之后我必须弄清楚如何计算百分比。建议欢迎。
答案 0 :(得分:2)
您可以使用transform
将第一级Series
的每组求和值返回Pct
Company
。然后使用idxmax
按每个组的最大值过滤Dataframe
,并使用Amount
Series
对Pct
列进行最后划分:
g = CoCntry.groupby(level='Company')['Amount']
Pct = g.transform('sum')
print (Pct)
Company Country
bar one 25
three 25
two 25
foo one 28
three 28
two 28
Name: Amount, dtype: int64
CoCntry = CoCntry.loc[g.idxmax()]
print (CoCntry)
Amount Pct
Company Country
bar one 11 0
foo two 11 0
CoCntry.Pct = CoCntry.Amount.div(Pct)
print (CoCntry.reset_index())
Company Country Amount Pct
0 bar one 11 0.440000
1 foo two 11 0.392857
类似的另一种解决方案:
CoCntry = df.groupby(['Company', 'Country']).Amount.sum()
print (CoCntry)
Company Country
bar one 11
three 4
two 10
foo one 9
three 8
two 11
Name: Amount, dtype: int64
g = CoCntry.groupby(level='Company')
Pct = g.sum()
print (Pct)
Company
bar 25
foo 28
Name: Amount, dtype: int64
maxCoCntry = CoCntry.loc[g.idxmax()].to_frame()
maxCoCntry['Pct'] = maxCoCntry.Amount.div(Pct, level=0)
print (maxCoCntry.reset_index())
Company Country Amount Pct
0 bar one 11 0.440000
1 foo two 11 0.392857
答案 1 :(得分:2)
设置
df = pd.DataFrame({'Company' : ['bar','foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
'Country' : ['two','one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
'Amount' : [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
})
解决方案
# sum total invoice per country per company
comp_by_country = df.groupby(['Company', 'Country']).Amount.sum()
# sum total invoice per company
comp_totals = df.groupby('Company').Amount.sum()
# percent of per company per country invoice relative to company
comp_by_country_pct = comp_by_country.div(comp_totals).rename('Pct')
回答OP问题
哪个'Country'
的{1}}总发票总额最高,以及该公司总业务的百分比。
'Company'