Finding the max row in a DataFrameGroupBy

Time: 2017-01-22 17:14:08

Tags: pandas

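Newbie here, trying to break my addiction to Excel. I have a dataset of paid invoices that includes the vendor, the country, and the amount. For each vendor, I want to know the country where they have the largest invoice total and what percentage of their total business that country represents. Using this dataset, I would like the result to be: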

(image: desired output)

import pandas as pd
import numpy as np
df = pd.DataFrame({'Company' : ['bar','foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country' : ['two','one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount' : [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
    'Pct' : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})
CoCntry = df.groupby(['Company', 'Country'])
CoCntry.aggregate(np.sum)

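I have looked at multiple examples, including:

Extract row with max value
Getting max value using groupby
Python : Getting the Row which has the max value in groups using groupby

I have gotten as far as creating a DataFrameGroupBy that summarizes the invoice data by country. I am struggling with how to find the max row. After that I still have to figure out how to calculate the percentage. Advice welcome.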

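2 answers:

Answer 0 (score: 2)

You can use transform to return a Series, Pct, that holds the summed Amount for each group of the first index level, Company. Then use idxmax to filter the DataFrame down to the row with the maximum value in each group, and finally divide the Amount column by the Series Pct: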

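# CoCntry must be the aggregated DataFrame here (assumed), not the GroupBy object from the question
CoCntry = df.groupby(['Company', 'Country']).sum()
g = CoCntry.groupby(level='Company')['Amount']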
Pct = g.transform('sum')
print (Pct)
Company  Country
bar      one        25
         three      25
         two        25
foo      one        28
         three      28
         two        28
Name: Amount, dtype: int64

CoCntry  = CoCntry.loc[g.idxmax()]
print (CoCntry)
                 Amount  Pct
Company Country             
bar     one          11    0
foo     two          11    0

CoCntry.Pct = CoCntry.Amount.div(Pct)
print (CoCntry.reset_index())
  Company Country  Amount       Pct
0     bar     one      11  0.440000
1     foo     two      11  0.392857
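For anyone who wants to run the transform + idxmax approach from a clean session, here is a minimal self-contained sketch. It assumes the question's data without the placeholder Pct column. The names `totals`, `company_total`, `top` and `result` are mine, not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country': ['two', 'one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount': [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
})

# One row per (Company, Country); the result is a DataFrame with a MultiIndex,
# not a GroupBy object.
totals = df.groupby(['Company', 'Country'])[['Amount']].sum()

g = totals.groupby(level='Company')['Amount']

# transform('sum') keeps the (Company, Country) index and repeats each
# company's total on every one of its rows.
company_total = g.transform('sum')

# idxmax gives the (Company, Country) label of each company's largest row.
top = totals.loc[g.idxmax().tolist()].copy()
top['Pct'] = top['Amount'] / company_total.reindex(top.index)

result = top.reset_index()
print(result)
```

Working on a `.copy()` of the filtered rows and assigning with `top['Pct'] = ...` avoids writing to a column through attribute access, which is easy to get wrong when the column does not exist yet.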

Another, similar solution:

CoCntry = df.groupby(['Company', 'Country']).Amount.sum()
print (CoCntry)
Company  Country
bar      one        11
         three       4
         two        10
foo      one         9
         three       8
         two        11
Name: Amount, dtype: int64

g =  CoCntry.groupby(level='Company')
Pct = g.sum()
print (Pct)
Company
bar    25
foo    28
Name: Amount, dtype: int64

maxCoCntry  = CoCntry.loc[g.idxmax()].to_frame()
maxCoCntry['Pct'] = maxCoCntry.Amount.div(Pct, level=0)
print (maxCoCntry.reset_index())

  Company Country  Amount       Pct
0     bar     one      11  0.440000
1     foo     two      11  0.392857
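The `level=0` argument in the division is what lets a Series indexed by (Company, Country) be divided by a Series indexed by Company alone. A small sketch of just that step, using the two rows and totals from the output above (the names `top_amount`, `company_total` and `share` are hypothetical):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('foo', 'two')], names=['Company', 'Country'])
top_amount = pd.Series([11, 11], index=idx, name='Amount')

# Per-company totals, indexed by Company only.
company_total = pd.Series(
    [25, 28], index=pd.Index(['bar', 'foo'], name='Company'), name='Amount')

# level=0 matches the divisor against the first level of the MultiIndex,
# so each row is divided by its own company's total.
share = top_amount.div(company_total, level=0)
print(share)
```

The result keeps the two-level index of `top_amount`, which is why it can be assigned straight back as a new column.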

Answer 1 (score: 2)

Setup

df = pd.DataFrame({'Company' : ['bar','foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country' : ['two','one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount' : [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
    })

Solution

# sum total invoice per country per company
comp_by_country = df.groupby(['Company', 'Country']).Amount.sum()

# sum total invoice per company
comp_totals = df.groupby('Company').Amount.sum()

# percent of per company per country invoice relative to company
comp_by_country_pct = comp_by_country.div(comp_totals).rename('Pct')

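Answering the OP's question
Which 'Country' has the highest total invoice amount for each 'Company', and what percentage of that company's total business does it represent?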
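One possible way to carry out this last step is sketched below. It is not necessarily the answerer's own code. It rebuilds the three Series from the solution, passing `level='Company'` to `div` to make the alignment explicit, then picks each company's top country. The names `top_idx` and `answer` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country': ['two', 'one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount': [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
})

comp_by_country = df.groupby(['Company', 'Country']).Amount.sum()
comp_totals = df.groupby('Company').Amount.sum()
comp_by_country_pct = comp_by_country.div(comp_totals, level='Company').rename('Pct')

# (Company, Country) label of the largest total within each company.
top_idx = comp_by_country.groupby(level='Company').idxmax().tolist()

# Put the amount and its share side by side for those rows only.
answer = pd.concat(
    [comp_by_country.loc[top_idx], comp_by_country_pct.loc[top_idx]],
    axis=1,
).reset_index()
print(answer)
```

Because the percentage is computed for every (Company, Country) pair first, the same `comp_by_country_pct` Series can also be used to inspect the other countries' shares.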