I have a pandas DataFrame like this:
  character  count
0         a    104
1         b     30
2         c    210
3         d     40
4         e    189
5         f     20
6         g     10
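I want the DataFrame to keep only the 3 characters with the highest counts and merge all remaining characters into a single "others" row, so that the table becomes: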
  character  count
0         c    210
1         e    189
2         a    104
3    others    100
How can I do this?
Thanks.
Answer 0 (score: 6)
We can use the DataFrame.nlargest() method:
In [31]: new = df.nlargest(3, columns='count')

In [32]: new = pd.concat(
    ...:     [new,
    ...:      pd.DataFrame({'character': ['others'],
    ...:                    'count': df.drop(new.index)['count'].sum()})
    ...:     ], ignore_index=True)
    ...:
In [33]: new
Out[33]:
  character  count
0         c    210
1         e    189
2         a    104
3    others    100
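For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same approach. The helper name top_n_with_others and its parameters are my own, purely for illustration; they are not part of pandas. The sample data is rebuilt from the question.

```python
import pandas as pd


def top_n_with_others(df, n=3, label_col="character", value_col="count",
                      other_label="others"):
    """Keep the n rows with the largest value_col and sum the rest into one row.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Rows with the n largest values, still carrying their original index labels.
    top = df.nlargest(n, columns=value_col)
    # Everything that is not in the top n, summed into a single number.
    rest_total = df.drop(top.index)[value_col].sum()
    others = pd.DataFrame({label_col: [other_label], value_col: [rest_total]})
    # ignore_index=True renumbers the result 0..n.
    return pd.concat([top, others], ignore_index=True)


# Sample data from the question.
df = pd.DataFrame({
    "character": list("abcdefg"),
    "count": [104, 30, 210, 40, 189, 20, 10],
})

result = top_n_with_others(df)
print(result)
```

With the sample data the "others" row holds 30 + 40 + 20 + 10 = 100. Note that this sketch always appends an "others" row, so if n is at least the number of rows, that row will have a count of 0.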
Or a slightly less idiomatic solution:
In [16]: new = df.nlargest(3, columns='count')

In [17]: new.loc[len(new)] = ['others', df.drop(new.index)['count'].sum()]

In [18]: new
Out[18]:
  character  count
2         c    210
4         e    189
0         a    104
3    others    100
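One caveat with this second variant: new.loc[len(new)] writes to the label 3, and the top rows keep their original labels (2, 4, 0 here). If one of the top rows happened to carry the label 3, that row would be overwritten instead of a new row being appended. A possible way around this, sketched below under the assumption that the original index labels do not need to be preserved, is to compute the remainder first and then reset the index before appending. The variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "character": list("abcdefg"),
    "count": [104, 30, 210, 40, 189, 20, 10],
})

top = df.nlargest(3, columns="count")
# Compute the remainder while `top` still has the original index labels,
# because df.drop() relies on them.
others_total = df.drop(top.index)["count"].sum()

# Renumber the rows 0..2 so that label len(safe) == 3 is guaranteed to be unused.
safe = top.reset_index(drop=True)
safe.loc[len(safe)] = ["others", others_total]
print(safe)
```

The result has the same rows as above, but indexed 0 to 3.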