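A more efficient way to find the top values in a pandas dataframe column

Time: 2016-04-28 14:07:02

Tags: python pandas dataframe

I have a df with two columns, x and y. Column y is a count for the x values, and each x value has a different number of counts. How can I get the top two y counts for each x without iterating over the rows?

Example df: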

df = pd.DataFrame({"x": [101, 101, 101, 101, 201, 201, 201, 405, 405], "y": [1, 2, 3, 4, 1, 2, 3, 1, 2]})

     x  y
0  101  1
1  101  2
2  101  3
3  101  4
4  201  1
5  201  2
6  201  3
7  405  1
8  405  2

Desired result:

x      y
101    3
101    4
201    2
201    3
405    1
405    2

2 answers:

Answer 0: (score: 1)

You can do this:


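In [35]: df.loc[df.groupby(['x'])['y'].apply(lambda x: x.iloc[-2:]).index.get_level_values(1)]

Out[35]:
     x  y
2  101  3
3  101  4
5  201  2
6  201  3
7  405  1
8  405  2

So here we group by the 'x' column and return the last 2 values of each group, which assumes that the df is already sorted as you've shown. This produces a Series with a multi-index, and we can use get_level_values to take the second-level values and index back into the original df.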
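The steps above can be written out as a small script. This is a sketch that assumes the example df from the question, already sorted by x and y. The names last_two, result and via_tail are only illustrative, and the groupby(...).tail(2) line is an alternative that is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"x": [101, 101, 101, 101, 201, 201, 201, 405, 405],
                   "y": [1, 2, 3, 4, 1, 2, 3, 1, 2]})

# Last two y values of each x group. The result is a Series whose
# MultiIndex is (x, original row label).
last_two = df.groupby("x")["y"].apply(lambda s: s.iloc[-2:])

# Level 1 of that MultiIndex holds the original row labels,
# so it can be used to select the matching rows from df.
result = df.loc[last_two.index.get_level_values(1)]
print(result)

# Alternative (not from the original answer): GroupBy.tail keeps the
# last n rows of each group with their original index.
via_tail = df.groupby("x").tail(2)
print(via_tail)
```

Both variants rely on the rows already being ordered within each group. For unsorted data, sort first or use an approach that does not depend on row order.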

EDIT

To answer your comment, you can use groupby again, together with transform and rank, to reset the values to 1 and 2.
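The code for this edit did not survive in the text, so the following is only a guess at what it might look like. It assumes the comment asked for the two selected y values in each group to be renumbered 1 and 2. The names top_two, ranks and reranked are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [101, 101, 101, 101, 201, 201, 201, 405, 405],
                   "y": [1, 2, 3, 4, 1, 2, 3, 1, 2]})

# Top two rows per x, selected as in the answer above (assumes sorted df).
idx = df.groupby("x")["y"].apply(lambda s: s.iloc[-2:]).index.get_level_values(1)
top_two = df.loc[idx]

# Rank y within each x group; method="first" breaks ties by order of
# appearance so the ranks are always 1, 2, ...
ranks = top_two.groupby("x")["y"].transform(lambda s: s.rank(method="first")).astype(int)

# assign returns a new frame, so top_two itself is left unchanged.
reranked = top_two.assign(y=ranks)
print(reranked)
```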

Answer 1: (score: 0)

If your dataframe is not sorted, here is a solution:


In [1]: df.groupby('x')['y'].nlargest(2)

Out[1]:
x
101  3    4
     2    3
201  6    3
     5    2
405  8    2
     7    1
dtype: int64

Unfortunately, nlargest cannot be applied to a grouped dataframe, so some reformatting is needed.
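The answer does not show the reformatting step, so here is one possible way to do it, offered as a sketch and not as the answerer's own code. It shuffles the example df to mimic unsorted input, and the names shuffled, top and tidy are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [101, 101, 101, 101, 201, 201, 201, 405, 405],
                   "y": [1, 2, 3, 4, 1, 2, 3, 1, 2]})

# Shuffle the rows to mimic an unsorted dataframe.
shuffled = df.sample(frac=1, random_state=0)

# Two largest y values per x; the Series index is (x, original row label).
top = shuffled.groupby("x")["y"].nlargest(2)

# Drop the original row labels, turn x back into a column, and order
# the rows by x and y to match the layout of the desired result.
tidy = (top.reset_index(level=1, drop=True)
           .reset_index()
           .sort_values(["x", "y"])
           .reset_index(drop=True))
print(tidy)
```

Because nlargest picks values by size, this variant does not depend on the row order of the input.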