Question

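I have three columns: country, type (the investment type), and amount. I want to find the country with the largest investment amount for each investment type, so the expected list of countries is "can, gb, ind".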

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"country": ["ind", "usa", "gb", "ind", "gb", "usa", "can", "can", "usa", "ind", "gb", "can"], \
    "type":["deposit", "bonds", "cash", "cash", "bonds", "deposit", "bonds", "deposit", "deposit", "bonds", "cash", "deposit"], \
    "amount": [1000, 120, 90, 200, 150, 300, 100, 400, 250, 300, 250, 5000]})

    print(df)
    print(df.groupby("type")["amount"].max())
    ## How to get the corresponding country for the max amount of each investment type?

        amount country     type
    0     1000     ind  deposit
    1      120     usa    bonds
    2       90      gb     cash
    3      200     ind     cash
    4      150      gb    bonds
    5      300     usa  deposit
    6      100     can    bonds
    7      400     can  deposit
    8      250     usa  deposit
    9      300     ind    bonds
    10     250      gb     cash
    11    5000     can  deposit
    type
    bonds       300
    cash        250
    deposit    5000
    Name: amount, dtype: int64

I can group by investment type and compute the maximum, but how do I extract the corresponding country name?

Answer 1

You can use drop_duplicates:

    df.sort_values(['type','amount']).drop_duplicates('type',keep='last')
    Out[285]: 
        amount country     type
    9      300     ind    bonds
    10     250      gb     cash
    11    5000     can  deposit
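For anyone who wants to run this end to end, here is a minimal self-contained sketch of the drop_duplicates approach using the data from the question. The names `top` and `countries` are just illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["ind", "usa", "gb", "ind", "gb", "usa",
                "can", "can", "usa", "ind", "gb", "can"],
    "type": ["deposit", "bonds", "cash", "cash", "bonds", "deposit",
             "bonds", "deposit", "deposit", "bonds", "cash", "deposit"],
    "amount": [1000, 120, 90, 200, 150, 300, 100, 400, 250, 300, 250, 5000],
})

# Sort ascending by type, then amount, so the largest amount of each
# type ends up as the last row of its group.
ordered = df.sort_values(["type", "amount"])

# Keep only that last row per type.
top = ordered.drop_duplicates("type", keep="last")

countries = top["country"].tolist()
print(countries)  # ['ind', 'gb', 'can'] (ordered by type: bonds, cash, deposit)
```

The result is ordered by type name, so the countries come out as ind, gb, can rather than in the order written in the question.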

Or just use idxmax:

    df.loc[df.groupby('type')['amount'].idxmax()]
    Out[287]: 
        amount country     type
    9      300     ind    bonds
    10     250      gb     cash
    11    5000     can  deposit
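A slightly expanded sketch of the idxmax approach, again on the question's data. One thing to be aware of: idxmax returns only the first row label when several rows tie for the maximum, so the second half shows one possible way to keep all tied rows with transform("max"). The variable names (`by_type`, `all_winners`) are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["ind", "usa", "gb", "ind", "gb", "usa",
                "can", "can", "usa", "ind", "gb", "can"],
    "type": ["deposit", "bonds", "cash", "cash", "bonds", "deposit",
             "bonds", "deposit", "deposit", "bonds", "cash", "deposit"],
    "amount": [1000, 120, 90, 200, 150, 300, 100, 400, 250, 300, 250, 5000],
})

# Row labels of the maximum amount within each type.
idx = df.groupby("type")["amount"].idxmax()

# Look those rows up and build a type -> country mapping.
by_type = df.loc[idx].set_index("type")["country"].to_dict()
print(by_type)  # {'bonds': 'ind', 'cash': 'gb', 'deposit': 'can'}

# Alternative that keeps every row tied for the maximum:
# compare each amount with its group's maximum.
group_max = df.groupby("type")["amount"].transform("max")
all_winners = df[df["amount"] == group_max]
print(all_winners["country"].tolist())  # ['ind', 'gb', 'can']
```

With this data there are no ties, so both variants select the same three rows.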

Answer 2

You need to add type to the groupby clause:

    df.groupby(['country','type'])['amount'].max().reset_index()

Output:

      country     type  amount
    0     can    bonds     100
    1     can  deposit    5000
    2      gb    bonds     150
    3      gb     cash     250
    4     ind    bonds     300
    5     ind     cash     200
    6     ind  deposit    1000
    7     usa    bonds     120
    8     usa  deposit     300
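Note that grouping by both country and type yields the maximum for every (country, type) pair, which is nine rows here rather than one row per type. If the goal is still a single country per type, one possible follow-up (my own sketch, not part of this answer) is to pick the largest row per type from that intermediate result. The names `per_pair` and `best` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["ind", "usa", "gb", "ind", "gb", "usa",
                "can", "can", "usa", "ind", "gb", "can"],
    "type": ["deposit", "bonds", "cash", "cash", "bonds", "deposit",
             "bonds", "deposit", "deposit", "bonds", "cash", "deposit"],
    "amount": [1000, 120, 90, 200, 150, 300, 100, 400, 250, 300, 250, 5000],
})

# Maximum amount for each (country, type) pair, as in the answer above.
per_pair = df.groupby(["country", "type"])["amount"].max().reset_index()

# Reduce further: within each type, keep the pair with the largest amount.
best = per_pair.loc[per_pair.groupby("type")["amount"].idxmax()]

print(best["country"].tolist())  # ['ind', 'gb', 'can']
```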

Find the max of a specific column but return another column in pandas
