Question

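I have a large DataFrame with columns named Customer, currency and amount_in_euros. The currency column contains values such as EUR, GBR, etc., and amount_in_euros contains float values. For each customer, I want to compute the sum per currency (EUR, GBR, etc.) and put the currency with the largest sum in a new column. How can I achieve this in pandas?

Input: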

Customer  currency   amount_in_euros
1           EUR      10
1           GBR      6
1           GBR      18
1           EUR      2
1           EUR      3
2           IND      12 
.
.
.

Output:

Customer  currency   amount_in_euros   max
1           EUR      10                GBR
1           GBR      6                 GBR
1           GBR      18                GBR
1           EUR      2                 GBR
1           EUR      3                 GBR 
2           IND      12                IND
. 
. 
.

So far I have tried:

import pandas as pd

df=pd.read_csv('analysis.csv')
res=pd.DataFrame()
for u,v in df.groupby(['Customer']):
   temp= v[['currency','amount_in_euros']].groupby(['currency'])['amount_in_euros'].sum().reset_index().sort_values('amount_in_euros',ascending=False)
   v['max']=temp['currency'].iloc[0]
   res=res.append(v)

The code above works for me, but it takes a long time because of the append operation. Please help me solve this. Thanks in advance.

Answer 1

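Use:

First aggregate sum by Customer and currency.
Sort with sort_values so the largest total comes last within each Customer.
Keep only that max row per Customer with drop_duplicates.
Create a Series indexed by Customer with set_index.
Finally, map it onto the Customer column.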

df1 = df.groupby(['Customer', 'currency'], as_index=False)['amount_in_euros'].sum()
s = (df1.sort_values(['Customer','amount_in_euros'])
        .drop_duplicates('Customer', keep='last')
        .set_index('Customer')['currency'])

df['max'] = df['Customer'].map(s)
print (df)
   Customer currency  amount_in_euros  max
0         1      EUR               10  GBR
1         1      GBR                6  GBR
2         1      GBR               18  GBR
3         1      EUR                2  GBR
4         1      EUR                3  GBR
5         2      IND               12  IND
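
As a variation on the steps above, the sort_values / drop_duplicates pair can be swapped for idxmax. This is only a sketch, not part of the original answer: the sample frame is rebuilt inline from the question's data, and the names totals, best and top_currency are made up for illustration. If two currencies tie for a customer, idxmax keeps the first one it meets, which may differ from the keep='last' behaviour above.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: same columns as analysis.csv).
df = pd.DataFrame({
    "Customer": [1, 1, 1, 1, 1, 2],
    "currency": ["EUR", "GBR", "GBR", "EUR", "EUR", "IND"],
    "amount_in_euros": [10, 6, 18, 2, 3, 12],
})

# Total amount per (Customer, currency) pair.
totals = df.groupby(["Customer", "currency"], as_index=False)["amount_in_euros"].sum()

# For each customer, idxmax gives the row label of the largest total.
best = totals.loc[totals.groupby("Customer")["amount_in_euros"].idxmax()]

# Series: Customer -> currency with the largest total.
top_currency = best.set_index("Customer")["currency"]

# Broadcast the result back to every row of the original frame.
df["max"] = df["Customer"].map(top_currency)
print(df)
```

Either way there is no Python-level loop and no repeated append, which is what made the original attempt slow.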

Edit:

A similar solution that puts the first, second and third largest currencies in new columns:

print (df)
   Customer currency  amount_in_euros
0         1      EUR               10
1         1      GBR                6
2         1      GBR               18
3         1      EUR                2
4         1      USD                1
5         1      USD                2
6         1      EUR                3
7         2      IND               12
8         2      USD                2

df1 = df.groupby(['Customer', 'currency'], as_index=False)['amount_in_euros'].sum()
df2 = df1.sort_values(['Customer','amount_in_euros'])
df2 = (df2.set_index(['Customer', 
                      df2.groupby(['Customer']).cumcount(ascending=False)])['currency']
          .unstack()
          .add_prefix('max_'))

print (df2)
         max_0 max_1 max_2
Customer                  
1          GBR   EUR   USD
2          IND   USD  None

df = df.join(df2, on='Customer')

print (df)
   Customer currency  amount_in_euros max_0 max_1 max_2
0         1      EUR               10   GBR   EUR   USD
1         1      GBR                6   GBR   EUR   USD
2         1      GBR               18   GBR   EUR   USD
3         1      EUR                2   GBR   EUR   USD
4         1      USD                1   GBR   EUR   USD
5         1      USD                2   GBR   EUR   USD
6         1      EUR                3   GBR   EUR   USD
7         2      IND               12   IND   USD  None
8         2      USD                2   IND   USD  None
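
If only the top n currencies are wanted, the same idea can be wrapped in a small helper. The sketch below is an assumption-laden illustration rather than the answer's own code: top_currencies and its parameter n are hypothetical names, and it uses pivot instead of set_index / unstack to build the wide table.

```python
import pandas as pd

def top_currencies(df, n=3):
    """Hypothetical helper: add max_0 .. max_{n-1} columns per Customer."""
    totals = df.groupby(["Customer", "currency"], as_index=False)["amount_in_euros"].sum()
    # Largest total first within each customer.
    totals = totals.sort_values(["Customer", "amount_in_euros"], ascending=[True, False])
    # Position of each currency within its customer: 0 = largest total.
    totals["rank"] = totals.groupby("Customer").cumcount()
    totals = totals[totals["rank"] < n]
    # One row per customer, one column per rank.
    wide = totals.pivot(index="Customer", columns="rank", values="currency").add_prefix("max_")
    return df.join(wide, on="Customer")

# Sample data rebuilt from the edit above.
df = pd.DataFrame({
    "Customer": [1, 1, 1, 1, 1, 1, 1, 2, 2],
    "currency": ["EUR", "GBR", "GBR", "EUR", "USD", "USD", "EUR", "IND", "USD"],
    "amount_in_euros": [10, 6, 18, 2, 1, 2, 3, 12, 2],
})

out = top_currencies(df, n=2)
print(out)
```

Customers with fewer than n currencies get missing values in the trailing columns, as in the max_2 column shown above.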

How to get the max value based on another Series in pandas groupby
