Suppose I have a DataFrame like the following:
Bank Name    House    This Wk
Barc         Germany  100
Barc         UK       300
Barc         UK       500
JPM          Japan    200
JPM          NYC      100
BOA          LA       900
BOA          LA       50
BOA          LA       50
DB           Italy    45
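For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the variable name df and the default integer index are assumptions, chosen to match the row labels that appear in the answers' output.

```python
import pandas as pd

# Sample data from the question; the default RangeIndex (0..8) is assumed.
df = pd.DataFrame({
    "Bank Name": ["Barc", "Barc", "Barc", "JPM", "JPM", "BOA", "BOA", "BOA", "DB"],
    "House": ["Germany", "UK", "UK", "Japan", "NYC", "LA", "LA", "LA", "Italy"],
    "This Wk": [100, 300, 500, 200, 100, 900, 50, 50, 45],
})
```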
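I want to group by bank name and output the total value together with the house that has the largest value.
For example, with the sample above, the result would be: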
Bank Name    Total    House    This Wk
Barc         900      UK       500
JPM          300      Japan    200
BOA          1000     LA       900
DB           45       Italy    45
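Basically, Total is the sum per Bank Name, but the output should also show the House that contributed the most to that total, with the amount it contributed in This Wk.
How can I do this?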
Answer 0 (score: 5)
In [121]: df.groupby('Bank Name', group_keys=False) \
...: .apply(lambda x: x.nlargest(1, 'This Wk').assign(Total=x['This Wk'].sum())) \
...: [['Bank Name','Total','House','This Wk']]
...:
Out[121]:
  Bank Name  Total  House  This Wk
5       BOA   1000     LA      900
2      Barc    900     UK      500
8        DB     45  Italy       45
3       JPM    300  Japan      200
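The same result can probably be reached without apply, by combining idxmax (to find the top row per bank) with transform('sum') (to get each bank's total). The following is a sketch under that assumption, not part of the original answer; the names top_idx, totals and result are made up for illustration.

```python
import pandas as pd

# Sample data from the question (default integer index assumed).
df = pd.DataFrame({
    "Bank Name": ["Barc", "Barc", "Barc", "JPM", "JPM", "BOA", "BOA", "BOA", "DB"],
    "House": ["Germany", "UK", "UK", "Japan", "NYC", "LA", "LA", "LA", "Italy"],
    "This Wk": [100, 300, 500, 200, 100, 900, 50, 50, 45],
})

# Row label of the largest 'This Wk' within each bank (first occurrence on ties).
top_idx = df.groupby("Bank Name")["This Wk"].idxmax()

# Group total broadcast back to every row, aligned on the original index.
totals = df.groupby("Bank Name")["This Wk"].transform("sum")

# Keep only the top row per bank and attach its group total.
result = (
    df.loc[top_idx]
      .assign(Total=totals)
      [["Bank Name", "Total", "House", "This Wk"]]
)
print(result)
```

Because assign aligns totals on the index, each selected row picks up the total of its own bank, and the original row labels are preserved as in the output above.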
Answer 1 (score: 3)
You can use df.groupby and then dfGroupBy.agg with a list of functions:
In [732]: out = df.groupby('Bank Name')['This Wk'].agg(['sum', 'idxmax', 'max'])\
.rename(columns={'sum' : 'Total', 'idxmax' : 'House', 'max' : 'This Wk'})\
.reset_index()
In [734]: out['House'] = df.loc[out['House'], 'House'].values; out
Out[734]:
  Bank Name  Total  House  This Wk
0       BOA   1000     LA      900
1      Barc    900     UK      500
2        DB     45  Italy       45
3       JPM    300  Japan      200
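On pandas versions that support named aggregation (0.25 and later, as far as I know), the rename step can likely be folded into agg itself. This is a hedged variant of the approach above rather than the answerer's own code; top_row is a hypothetical helper column name, and the `**{...}` form is only needed because 'This Wk' contains a space.

```python
import pandas as pd

# Sample data from the question (default integer index assumed).
df = pd.DataFrame({
    "Bank Name": ["Barc", "Barc", "Barc", "JPM", "JPM", "BOA", "BOA", "BOA", "DB"],
    "House": ["Germany", "UK", "UK", "Japan", "NYC", "LA", "LA", "LA", "Italy"],
    "This Wk": [100, 300, 500, 200, 100, 900, 50, 50, 45],
})

# Named aggregation: each keyword becomes an output column.
out = (
    df.groupby("Bank Name")["This Wk"]
      .agg(Total="sum", top_row="idxmax", **{"This Wk": "max"})
      .reset_index()
)

# Translate the row label of each maximum into the matching House value.
out["House"] = df.loc[out["top_row"], "House"].to_numpy()

# Drop the helper column and put the columns in the requested order.
out = out[["Bank Name", "Total", "House", "This Wk"]]
print(out)
```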
Answer 2 (score: 0)
Another way, using apply, is:
In [17]: (df.groupby('Bank Name', sort=False)
.apply(lambda x: pd.Series(
[x['This Wk'].sum(),
x.loc[x['This Wk'].idxmax(), 'House'],
x['This Wk'].max()],
index=['Total', 'House', 'This Wk']))
.reset_index())
Out[17]:
  Bank Name  Total  House  This Wk
0      Barc    900     UK      500
1       JPM    300  Japan      200
2       BOA   1000     LA      900
3        DB     45  Italy       45