My DataFrame count looks like this:
             name  value
date
2016-05-01  kelly     20
2016-05-05   john     12
2016-05-05  sarah     25
2016-05-05 george      3
2016-05-05    tom     40
2016-05-07   kara     24
2016-05-07   jane     90
2016-05-07  sally     39
2016-05-07    sam     28
Ideally, I'd like to get the top 3 rows (by value) for each date. I'm expecting something like this:
            name  value
date
2016-05-01 kelly     20
2016-05-05  john     12
2016-05-05 sarah     25
2016-05-05   tom     40
2016-05-07  jane     90
2016-05-07 sally     39
2016-05-07   sam     28
But I could also make do with this:
            name  value
date
2016-05-05   tom     40
2016-05-07  jane     90
2016-05-07 sally     39
I tried df.nlargest(3, 'value'), but I got this weird result:
            name  value
date
2016-05-01 kelly     20
2016-05-01 kelly     20
2016-05-01 kelly     20
2016-05-05   tom     40
2016-05-05   tom     40
2016-05-05   tom     40
2016-05-05 sarah     25
2016-05-05 sarah     25
2016-05-05 sarah     25
2016-05-07  kara     24
2016-05-07  kara     24
...
2016-05-07 sally     39
2016-05-07 sally     39
2016-05-07  jane     90
2016-05-07  jane     90
2016-05-07  jane     90
I also tried running it per day, but I ran into the same problem (each name repeated 3 times).
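The per-day code isn't shown above, so the exact cause of the repeats can't be confirmed from here. A minimal sketch of one plausible mechanism, assuming the frame is built as shown with `date` as a non-unique index: a label lookup with `.loc` returns every row carrying that label, so selecting the same date more than once multiplies that day's rows.

```python
import io

import pandas as pd

# The question's `count` data; 'date' becomes a non-unique index
DATA = """date name value
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
"""
count = pd.read_csv(io.StringIO(DATA), sep=r"\s+", index_col="date")

# Several rows share each date, so the index labels are not unique
print(count.index.is_unique)  # False

# Looking up a label returns *every* row with that label. Asking for
# 2016-05-07 twice (e.g. because two of that day's rows were picked as
# winners) returns all four 2016-05-07 rows, twice over.
rows = count.loc[["2016-05-07", "2016-05-07"]]
print(len(rows))  # 8
```

If this is the cause, giving the frame a unique index first (for example with reset_index()) keeps each row selectable exactly once.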
Answer 0 (score: 2)
First of all, this will do the job:
df.sort_values('value', ascending=False).groupby(level=0).head(3).sort_index()
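A self-contained sketch of the one-liner above on the question's data. Two things here are assumptions rather than part of the answer: kind="mergesort" is added to sort_index so that the stable sort keeps each day's rows in descending-value order, and the second approach (ranking rows within each date with GroupBy.rank and keeping ranks 1 to 3) is a different technique swapped in for comparison.

```python
import io

import pandas as pd

# The question's data, with the non-unique 'date' column as the index
DATA = """date name value
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+", index_col="date")

# Answer 0: sort all rows by value (largest first), keep the first 3 rows of
# each date group, then put the dates back in order. mergesort is stable,
# so rows that share a date keep their descending-value order.
top3 = (
    df.sort_values("value", ascending=False)
    .groupby(level=0)
    .head(3)
    .sort_index(kind="mergesort")
)
print(top3)

# Alternative (not from the answer): rank rows within each date, largest
# value = rank 1, and keep ranks 1-3. method="first" breaks ties by row order.
rank = df.groupby(level=0)["value"].rank(method="first", ascending=False)
top3_by_rank = df[(rank <= 3).to_numpy()]  # positional mask, no label alignment
print(top3_by_rank)
```

Both select the same seven rows. The rank-based filter never reorders the frame, so it returns them in their original order, which matches the question's expected output exactly.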
Answer 1 (score: 0)
Use sort_values() in descending mode, take the first n results with a [:n] slice, then call sort_index() to put the days back in order. Note that this returns the top n rows overall, not the top n per date.
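The steps above can be wrapped in a small helper. This is a minimal sketch: top_n_overall is a hypothetical name, .iloc[:n] is the explicit positional form of the [:n] slice, and the comparison with DataFrame.nlargest assumes a recent pandas release whose nlargest copes with the repeated date labels (the sketch checks that rather than taking it for granted).

```python
import io

import pandas as pd

DATA = """date name value
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+", index_col="date")


def top_n_overall(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: the n largest rows by 'value', in date order."""
    # Sort descending, take the first n rows by position (same as [:n]),
    # then sort by date; mergesort is stable, so same-day rows keep value order.
    return (
        frame.sort_values("value", ascending=False)
        .iloc[:n]
        .sort_index(kind="mergesort")
    )


by_slice = top_n_overall(df, 3)
# nlargest returns the same rows ordered by value; re-sort by date to compare
by_nlargest = df.nlargest(3, "value").sort_index(kind="mergesort")
print(by_slice)
print(by_slice.equals(by_nlargest))
```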
import io

import pandas as pd

# Python 3: io.StringIO replaces Python 2's cStringIO module
df = pd.read_csv(io.StringIO('''
date name value
2016-05-01 kelly 20
2016-05-05 john 12
2016-05-05 sarah 25
2016-05-05 george 3
2016-05-05 tom 40
2016-05-07 kara 24
2016-05-07 jane 90
2016-05-07 sally 39
2016-05-07 sam 28
'''), sep=r'\s+', index_col=0)  # r'\s+' splits on runs of spaces (' *' can match an empty string); 'date' becomes the index

print('Original DataFrame:')
print(df)
print()

# Largest values first, keep the first 3 rows, then restore date order
df_top3 = df.sort_values('value', ascending=False)[:3].sort_index()
print('Top 3 Largest value DataFrame:')
print(df_top3)
print()
Original DataFrame:
             name  value
date
2016-05-01  kelly     20
2016-05-05   john     12
2016-05-05  sarah     25
2016-05-05 george      3
2016-05-05    tom     40
2016-05-07   kara     24
2016-05-07   jane     90
2016-05-07  sally     39
2016-05-07    sam     28
Top 3 Largest value DataFrame:
            name  value
date
2016-05-05   tom     40
2016-05-07  jane     90
2016-05-07 sally     39