Question

I have a pandas DataFrame like this:

  character  count
0         a    104
1         b     30
2         c    210
3         d     40
4         e    189
5         f     20
6         g     10
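
For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is a minimal sketch; the variable name `df` is an assumption, chosen to match the name used in the answer.

```python
import pandas as pd

# Rebuild the example frame shown above; the default RangeIndex gives labels 0..6.
df = pd.DataFrame({
    "character": ["a", "b", "c", "d", "e", "f", "g"],
    "count": [104, 30, 210, 40, 189, 20, 10],
})
print(df)
```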

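I want the DataFrame to keep only the top 3 characters by count, with all remaining characters merged into a single "others" row, so that the table becomes: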

  character  count
0         c    210
1         e    189
2         a    104
3    others    100

How can I do this?

Thanks.

Answer 1

We can use the DataFrame.nlargest() method:

In [31]: new = df.nlargest(3, columns='count')

In [32]: new = pd.concat(
    ...:         [new,
    ...:          pd.DataFrame({'character':['others'],
    ...:                        'count':df.drop(new.index)['count'].sum()})
    ...:         ], ignore_index=True)
    ...:

In [33]: new
Out[33]:
  character  count
0         c    210
1         e    189
2         a    104
3    others    100

Or a somewhat less idiomatic solution (note that it keeps the original index labels):

In [16]: new = df.nlargest(3, columns='count')

In [17]: new.loc[len(new)] = ['others', df.drop(new.index)['count'].sum()]

In [18]: new
Out[18]:
  character  count
2         c    210
4         e    189
0         a    104
3    others    100
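
If this is needed in more than one place, the first approach can be wrapped in a small helper. The following is only a sketch: the function name `top_n_with_others` and its parameters are hypothetical, not part of pandas. It relies on `DataFrame.nlargest`, `DataFrame.drop`, and `pd.concat`, as above.

```python
import pandas as pd


def top_n_with_others(df, n=3, label_col="character", value_col="count",
                      others_label="others"):
    """Keep the n rows with the largest value_col and sum the rest into one row.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    # Rows with the n largest values, sorted in descending order.
    top = df.nlargest(n, columns=value_col)
    # Sum of everything that did not make it into the top n.
    rest_total = df.drop(top.index)[value_col].sum()
    others = pd.DataFrame({label_col: [others_label], value_col: [rest_total]})
    # ignore_index=True renumbers the result 0..n.
    return pd.concat([top, others], ignore_index=True)


df = pd.DataFrame({
    "character": list("abcdefg"),
    "count": [104, 30, 210, 40, 189, 20, 10],
})
result = top_n_with_others(df, n=3)
print(result)
```

When `n` is at least the number of rows, this sketch still appends an "others" row with a total of 0; drop that row afterwards if it is not wanted.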
