Question

I want to see the two highest values for each year.

df = pd.DataFrame({'year':[2018, 2018, 2018, 2017, 2017, 2006],'value':[1,2,3,4,5,6], 'title':['a', 'b', 'c', 'd','e','f'], 'smth1':[6,6,4,5,6,4], 
'smth2':[9,8,7,6,5,2], 'smth3': [2,2,3,3,4,4]})

To avoid losing columns, I started from the idea in this question: pandas nlargest lost one column.

That approach works well when one or two columns are added to set_index:

df_top = df.set_index('title').groupby('year')['value'].nlargest(2).reset_index()

But I want to see more of the data, so I tried:

df_top = df.set_index('title','smth1', 'smth2').groupby('year')['value'].nlargest(2).reset_index()

As a result, I get a "level_1" column after the year column instead of "smth1".

And if I write:

df_top = df.set_index('title','smth1', 'smth2', 'smth3').groupby('year')['value'].nlargest(2).reset_index()

I get: ValueError: For argument "inplace" expected type bool, received type str.

Answer 1

IIUC, you can sort by value and take the last two rows of each year group, which keeps every column:

df.sort_values('value').groupby('year').tail(2)
Out[148]: 
   year  value title  smth1  smth2  smth3
1  2018      2     b      6      8      2
2  2018      3     c      4      7      3
3  2017      4     d      5      6      3
4  2017      5     e      6      5      4
5  2006      6     f      4      2      4
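
As for why the set_index attempts in the question misbehave: set_index expects several key columns as one list. Strings passed as separate positional arguments appear to be read as the drop, append and inplace parameters instead, which would explain both the unnamed "level_1" column (a truthy append keeps the old integer index as an extra level) and the "inplace" ValueError. Recent pandas releases may reject the extra positional arguments outright. A minimal sketch of the nlargest route with the keys passed as a list, assuming the same df as in the question (the names keep and df_top are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2018, 2018, 2018, 2017, 2017, 2006],
    'value': [1, 2, 3, 4, 5, 6],
    'title': ['a', 'b', 'c', 'd', 'e', 'f'],
    'smth1': [6, 6, 4, 5, 6, 4],
    'smth2': [9, 8, 7, 6, 5, 2],
    'smth3': [2, 2, 3, 3, 4, 4],
})

# Pass all columns to carry along as ONE list; separate positional
# strings are not treated as additional keys.
keep = ['title', 'smth1', 'smth2', 'smth3']

df_top = (
    df.set_index(keep)            # the extra columns travel in the index
      .groupby('year')['value']
      .nlargest(2)                # top two values within each year
      .reset_index()              # index levels become columns again
)
print(df_top)
```

Unlike the sort_values/tail version, this returns the groups ordered by year and moves value to the last column, so pick whichever layout suits you.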
