Question

In this data frame:

region    area     other
alabama   99151.5  0.564506436
alabama   99151.5  0.193809515
arkansas  165927   0.878569179
arkansas  165927   0.00946268
arkansas  165927   0.075263353
colorado  408747   0.62052038
colorado  408747   0.723038731
georgia   117363   0.970624899
georgia   117363   0.534441671
idaho     198303   0.378282313
idaho     198303   0.836349349

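I want to keep the rows for the top 2 regions by area, but I can't use the pandas nlargest command because it does not let me keep the duplicates in the region column. How can I do this?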

- Edit:

Expected output:

region    area     other
colorado  408747   0.62052038
colorado  408747   0.723038731
idaho     198303   0.378282313
idaho     198303   0.836349349

Answer 1

You may need head after sort_values and groupby:

df.sort_values(['area','other']).groupby('area').head(2)
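
Note that groupby('area').head(2) keeps up to two rows for each distinct area value, whereas the expected output in the question keeps every row that belongs to the two largest areas. A minimal sketch of one way to get that result follows. It assumes each region has a single area value, so the distinct area values stand in for regions; if two regions shared the same area, both would be kept. The names top_areas and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "region": ["alabama", "alabama", "arkansas", "arkansas", "arkansas",
               "colorado", "colorado", "georgia", "georgia", "idaho", "idaho"],
    "area": [99151.5, 99151.5, 165927, 165927, 165927,
             408747, 408747, 117363, 117363, 198303, 198303],
    "other": [0.564506436, 0.193809515, 0.878569179, 0.00946268, 0.075263353,
              0.62052038, 0.723038731, 0.970624899, 0.534441671,
              0.378282313, 0.836349349],
})

# Assumption: one area value per region, so the two largest distinct
# area values identify the two largest regions.
top_areas = df["area"].drop_duplicates().nlargest(2)

# Keep every row whose area is one of those values; the boolean mask
# preserves the original row order and all duplicate region rows.
result = df[df["area"].isin(top_areas)]
print(result)
```

Filtering with isin rather than calling nlargest on the whole frame is what lets all rows of a region survive, since nlargest(2, 'area') on the frame itself would stop after the first two rows with the largest area.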

Keep the N largest rows even if there are duplicates in the pandas data frame
