Question

I have a DataFrame named parking with several columns, in this case 'Registration State', 'Violation Code' and 'Summons Number'.

For each Registration State, I want the 3 Violation Codes with the highest number of rows. The best I have managed is:

parking_state_group = parking.groupby(['Registration State', 'Violation Code'])['Summons Number'].count()
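Since the goal is the number of rows, it may be worth knowing that count() skips missing values in 'Summons Number', whereas size() counts every row of a group. A minimal sketch on invented data (the states, codes, and numbers are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `parking` frame.
parking = pd.DataFrame({
    "Registration State": ["NY", "NY", "NY", "NJ", "NJ"],
    "Violation Code":     [14,   14,   21,   38,   38],
    "Summons Number":     [101,  np.nan, 103, 104, 105],
})

keys = ["Registration State", "Violation Code"]

# count() tallies only the non-null values of the selected column.
by_count = parking.groupby(keys)["Summons Number"].count()

# size() tallies rows per group, regardless of missing values.
by_size = parking.groupby(keys).size()

print(by_count)
print(by_size)
```

If 'Summons Number' is never missing, the two give the same numbers.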

When printed (i.e. print(parking_state_group.reset_index())), it looks like:

     Registration State  Violation Code  Summons Number
0                    99               0              14
1                    99               6               1
2                    99              10               6
3                    99              13               2
4                    99              14              75
...                 ...             ...             ...
1811                 WY              37               3
1812                 WY              38               4
1813                 WY              40               4
1814                 WY              46               1
1815                 WY              68               1

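This at least gives me the number of rows for each Violation Code in each state ('Summons Number' is effectively an ID field for each row). What I want is to return only the 3 Violation Codes with the highest counts for each state, so something like: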

     Registration State  Violation Code  Summons Number
0                    99              14              75
1                    99              31              61
2                    99              87              55
...                 ...             ...             ...
1812                 WY              38               4
1813                 WY              40               4
1811                 WY              37               3

I tried .nlargest(), but that doesn't seem to give the largest .count() values, only the largest values in a single column, which isn't what I want.
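nlargest can give the wanted result if it is applied to the per-state counts rather than to the raw 'Summons Number' column. A sketch, assuming the counts are a Series indexed by ('Registration State', 'Violation Code') like parking_state_group; the values are taken from the printed excerpt in the question:

```python
import pandas as pd

# Hypothetical counts shaped like `parking_state_group` in the question.
counts = pd.Series(
    [14, 1, 6, 75, 3, 4, 4, 1],
    index=pd.MultiIndex.from_tuples(
        [("99", 0), ("99", 6), ("99", 10), ("99", 14),
         ("WY", 37), ("WY", 38), ("WY", 40), ("WY", 46)],
        names=["Registration State", "Violation Code"],
    ),
    name="Summons Number",
)

# Run Series.nlargest inside each state; group_keys=False stops pandas from
# prepending the state a second time to the result's index.
top3 = counts.groupby(level="Registration State", group_keys=False).apply(
    lambda s: s.nlargest(3)
)

print(top3.reset_index())
```

Passing keep='all' to nlargest would also keep codes tied with the third-highest count instead of cutting them off.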

Answer 1

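Sort the counts in descending order, keep the first 3 rows of each state with groupby(...).head(3), then flatten the result and order it by state:

top3 = parking_state_group.sort_values(ascending=False).groupby(level='Registration State').head(3)
top3.reset_index().sort_values(['Registration State', 'Summons Number'], ascending=[True, False])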
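A self-contained sketch of the sort-then-head(n) approach, starting from raw rows instead of precomputed counts. The function name top_n_per_group and the toy data are hypothetical; it counts rows with size() and uses a stable sort so that tied counts keep their existing order before head(n) cuts them off.

```python
import pandas as pd


def top_n_per_group(df: pd.DataFrame, group_col: str, key_col: str, n: int = 3) -> pd.DataFrame:
    """Return the n most frequent `key_col` values within each `group_col` value."""
    # Number of rows per (group, key) pair.
    counts = df.groupby([group_col, key_col]).size().rename("count")
    # Largest counts first; mergesort is stable, so ties keep their current order.
    ranked = counts.sort_values(ascending=False, kind="mergesort")
    # head(n) keeps the first n rows of each group and leaves the index untouched.
    top = ranked.groupby(level=group_col).head(n)
    # Back to a flat frame, grouped by state with the biggest counts first.
    return (
        top.reset_index()
        .sort_values([group_col, "count"], ascending=[True, False])
        .reset_index(drop=True)
    )


# Hypothetical raw rows: one row per ticket.
parking = pd.DataFrame({
    "Registration State": ["NY"] * 10 + ["NJ"] * 3,
    "Violation Code":     [14, 14, 14, 14, 21, 21, 21, 38, 38, 40, 7, 7, 38],
    "Summons Number":     range(1, 14),
})

result = top_n_per_group(parking, "Registration State", "Violation Code")
print(result)
```

Using ['Summons Number'].count() instead of size() would match the question's counting exactly, ignoring rows with a missing ID.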

Group a DataFrame by a column, then get the top 3 .count() values of another column?
