I want to get, for each group, the entire row (all columns) where a given column has its minimum value.
Example:
import pandas as pd

df = pd.DataFrame({'asset_symbol': ['100', '100', '100', '1015', '1015'],
                   'percent_thresh': [0.75, 0.85, 0.95, 0.75, 0.85],
                   'rank': [7.0, 7.0, 4.0, 2.0, 3.0]})
+--------------+----------------+------+
| asset_symbol | percent_thresh | rank |
+--------------+----------------+------+
| 100 | 0.75 | 7 |
+--------------+----------------+------+
| 100 | 0.85 | 7 |
+--------------+----------------+------+
| 100 | 0.95 | 4 |
+--------------+----------------+------+
| 1015 | 0.75 | 2 |
+--------------+----------------+------+
| 1015 | 0.85 | 3 |
+--------------+----------------+------+
Desired output:
+--------------+----------------+------+
| asset_symbol | percent_thresh | rank |
+--------------+----------------+------+
| 100 | 0.95 | 4 |
+--------------+----------------+------+
| 1015 | 0.75 | 2 |
+--------------+----------------+------+
My attempt:
def min_row(df, column):
    # Return the row holding the smallest value of `column`
    return df.loc[df[column].idxmin()]

df.groupby('asset_symbol').apply(min_row, 'rank')
but I generally try to avoid apply.
Answer 0 (score: 2)
IIUC,
df.loc[df.groupby('asset_symbol')['rank'].idxmin()]
Output:
asset_symbol percent_thresh rank
2 100 0.95 4.0
3 1015 0.75 2.0
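Why this works: groupby(...)['rank'].idxmin() returns, for each group, the index label of the row with the smallest rank (the first one if several rows tie), and df.loc then selects those whole rows by label. A minimal self-contained sketch, assuming the example data above and a unique index (df.loc looks rows up by label, so duplicate labels could return extra rows):

```python
import pandas as pd

df = pd.DataFrame({'asset_symbol': ['100', '100', '100', '1015', '1015'],
                   'percent_thresh': [0.75, 0.85, 0.95, 0.75, 0.85],
                   'rank': [7.0, 7.0, 4.0, 2.0, 3.0]})

# One index label per group: the row where 'rank' is smallest.
# idxmin picks the first such label if there is a tie.
labels = df.groupby('asset_symbol')['rank'].idxmin()
print(labels.tolist())  # [2, 3]

# Select the full rows (all columns) by those labels.
result = df.loc[labels]
print(result)
```

Because the rows are selected by label, the result keeps the original index values (2 and 3), which is what the output above shows.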
Answer 1 (score: 1)
Let's use sort_values + drop_duplicates:
df.sort_values('rank').drop_duplicates('asset_symbol')
asset_symbol percent_thresh rank
3 1015 0.75 2.0
2 100 0.95 4.0
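This version returns rows ordered by rank rather than by group (1015 comes before 100 above). Also, sort_values uses quicksort by default, which is not stable, so when two rows in a group tie on the minimum rank it is not guaranteed which one drop_duplicates keeps. A minimal sketch, assuming you want the same rows as the idxmin answer: sort with a stable algorithm (kind='mergesort') so the earliest tied row comes first, then optionally restore the original row order with sort_index():

```python
import pandas as pd

df = pd.DataFrame({'asset_symbol': ['100', '100', '100', '1015', '1015'],
                   'percent_thresh': [0.75, 0.85, 0.95, 0.75, 0.85],
                   'rank': [7.0, 7.0, 4.0, 2.0, 3.0]})

# A stable sort keeps tied rows in their original order, so
# drop_duplicates(keep='first') retains the earliest row with the minimum rank.
deduped = (df.sort_values('rank', kind='mergesort')
             .drop_duplicates('asset_symbol', keep='first'))
print(deduped.index.tolist())  # [3, 2]  (ordered by rank)

# Optional: put the surviving rows back in their original order.
result = deduped.sort_index()
print(result.index.tolist())  # [2, 3]

# Cross-check against the idxmin answer.
expected = df.loc[df.groupby('asset_symbol')['rank'].idxmin()]
print(result.equals(expected))  # True
```

The sort-based approach avoids groupby entirely, while idxmin makes the tie-breaking rule (first occurrence) explicit; for this data both select rows 2 and 3.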