pandas nlargest returns more than n rows

Time: 2016-06-03 23:40:11

Tags: python pandas dataframe

My DataFrame looks like this:

            name      value 
 date   
 2016-05-01 kelly      20  
 2016-05-05 john       12  
 2016-05-05 sarah      25  
 2016-05-05 george     3  
 2016-05-05 tom        40  
 2016-05-07 kara       24  
 2016-05-07 jane       90  
 2016-05-07 sally      39  
 2016-05-07 sam        28  

Preferably, I would like to get the top 3 rows (by value) for each date. I am expecting something like this:

            name      value 
 date   
 2016-05-01 kelly      20  
 2016-05-05 john       12  
 2016-05-05 sarah      25  
 2016-05-05 tom        40  
 2016-05-07 jane       90  
 2016-05-07 sally      39  
 2016-05-07 sam        28  

But I could also live with this:

            name      value 
 date   
 2016-05-05 tom        40  
 2016-05-07 jane       90  
 2016-05-07 sally      39  

I tried df.nlargest(3, 'value'), but I got this weird result:

            name      value 
 date   
 2016-05-01 kelly      20  
 2016-05-01 kelly      20  
 2016-05-01 kelly      20  
 2016-05-05 tom        40  
 2016-05-05 tom        40  
 2016-05-05 tom        40  
 2016-05-05 sarah      25  
 2016-05-05 sarah      25  
 2016-05-05 sarah      25  
 2016-05-07 kara       24  
 2016-05-07 kara       24  
 ...
 2016-05-07 sally      39  
 2016-05-07 sally      39  
 2016-05-07 jane       90  
 2016-05-07 jane       90  
 2016-05-07 jane       90  

I also tried running it day by day, but I got the same problem (each name repeated 3 times).

2 answers:

Answer 0 (score: 2)

To begin with, this will do the job:

df.sort_values('value', ascending=False).groupby(level=0).head(3).sort_index()
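
As a minimal, self-contained sketch of how that one-liner behaves, the sample data from the question can be rebuilt by hand (the construction below is an assumption for illustration; the question does not show how the DataFrame was created). Sorting by value in descending order first means that head(3) within each date group keeps the three largest values for that date. The kind="mergesort" argument is an addition here so that rows within a date stay in descending value order after the index is re-sorted.

```python
import pandas as pd

# Rebuild the question's sample data; the date column is a non-unique index.
df = pd.DataFrame(
    {
        "name": ["kelly", "john", "sarah", "george", "tom",
                 "kara", "jane", "sally", "sam"],
        "value": [20, 12, 25, 3, 40, 24, 90, 39, 28],
    },
    index=pd.Index(
        ["2016-05-01"] + ["2016-05-05"] * 4 + ["2016-05-07"] * 4,
        name="date",
    ),
)

top3_per_date = (
    df.sort_values("value", ascending=False)  # largest values first
    .groupby(level=0)                         # group on the date index
    .head(3)                                  # first 3 rows of each group
    .sort_index(kind="mergesort")             # stable sort back into date order
)
print(top3_per_date)
```

Dates with fewer than three rows, such as 2016-05-01 here, simply keep all of their rows.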

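Answer 1 (score: 0)

Use sort_values() in descending mode, then take the first n results in a slice ([:n]), then use sort_index() to keep the days in order. Note that this returns the overall top 3 rows (the second result the question says is acceptable), not the top 3 per date.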

import io

import pandas as pd

# Build the sample DataFrame from whitespace-separated text,
# using the first column (date) as the index.
df = pd.read_csv(io.StringIO('''
date   name      value
2016-05-01 kelly      20
2016-05-05 john       12
2016-05-05 sarah      25
2016-05-05 george     3
2016-05-05 tom        40
2016-05-07 kara       24
2016-05-07 jane       90
2016-05-07 sally      39
2016-05-07 sam        28
'''), sep=r'\s+', index_col=0)

print('Original DataFrame:')
print(df)
print()

# Sort by value in descending order, keep the first 3 rows,
# then sort by the date index again.
df_top3 = df.sort_values('value', ascending=False)[:3].sort_index()
print('Top 3 Largest value DataFrame:')
print(df_top3)
print()

Output:

Original DataFrame:
              name  value
date                     
2016-05-01   kelly     20
2016-05-05    john     12
2016-05-05   sarah     25
2016-05-05  george      3
2016-05-05     tom     40
2016-05-07    kara     24
2016-05-07    jane     90
2016-05-07   sally     39
2016-05-07     sam     28

Top 3 Largest value DataFrame:
             name  value
date                    
2016-05-05    tom     40
2016-05-07   jane     90
2016-05-07  sally     39
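
The slice above yields the overall top 3. For the per-date result using nlargest itself, one possible sketch (not taken from either answer) is to move the dates out of the index first, so that every row has a unique label, and then call nlargest on each date group. The names flat, idx, and top3 are hypothetical, and the sample data is rebuilt by hand as an assumption.

```python
import pandas as pd

# Sample data from the question, with date as the (non-unique) index.
df = pd.DataFrame(
    {
        "name": ["kelly", "john", "sarah", "george", "tom",
                 "kara", "jane", "sally", "sam"],
        "value": [20, 12, 25, 3, 40, 24, 90, 39, 28],
    },
    index=pd.Index(
        ["2016-05-01"] + ["2016-05-05"] * 4 + ["2016-05-07"] * 4,
        name="date",
    ),
)

# Turn the date index into a column so each row gets a unique integer label.
flat = df.reset_index()

# Per-date nlargest returns a Series indexed by (date, original row label);
# the last index level holds the labels of the rows to keep.
idx = flat.groupby("date")["value"].nlargest(3).index.get_level_values(-1)

# Select those rows and restore date as the index.
top3 = flat.loc[idx].set_index("date")
print(top3)
```

Working on unique row labels avoids relying on how nlargest treats a duplicated index, which is where the question's repeated rows appeared.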