Pandas groupby indexing: discard the whole group if a condition is not met

Time: 2019-03-13 23:19:22

Tags: pandas pandas-groupby

I have a pandas DataFrame like this:

import pandas as pd
df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})


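The idea is to return only the groups that satisfy a given condition on column beta, and to discard the whole group otherwise.

The result I want is: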

  alpha  beta
2     b     3
3     b     4

However, something like

df.groupby('alpha').apply(lambda x: x.beta>1)

does not work.

2 answers:

Answer 0 (score: 2)

You can use groupby.filter, for example:


print(df.groupby('alpha').filter(lambda x: (x.beta > 1).all()))
  alpha  beta
2     b     3
3     b     4

For the lambda, I understood from your expected output that you want all the values in a group to be greater than 1, hence all().
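To make the role of the predicate more explicit, here is a minimal sketch on the same frame. The function passed to filter should return a single boolean per group: with all() a group survives only if every row passes, whereas any() would keep it as soon as one row passes. The names keep_all and keep_any are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Keep a group only if every beta in it is > 1.
# Group 'a' contains beta == 1, so the whole group is dropped.
keep_all = df.groupby('alpha').filter(lambda g: (g['beta'] > 1).all())

# Keep a group if at least one beta in it is > 1.
# Both groups qualify here, so nothing is dropped.
keep_any = df.groupby('alpha').filter(lambda g: (g['beta'] > 1).any())

print(keep_all)
print(keep_any)
```

The surviving rows keep their original index labels, which is why the expected output starts at index 2.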

Answer 1 (score: 2)

Try it without groupby, using isin:

df.loc[~df.alpha.isin(df.loc[df.beta<=1,'alpha'])]
Out[316]: 
  alpha  beta
2     b     3
3     b     4
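In case the one-liner is hard to read, it can be unpacked into three steps. This is only a sketch of the same logic; the intermediate names bad_labels and in_bad_group are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Step 1: alpha labels of every row that violates the condition beta > 1.
bad_labels = df.loc[df['beta'] <= 1, 'alpha']

# Step 2: flag every row whose alpha is one of those labels.
in_bad_group = df['alpha'].isin(bad_labels)

# Step 3: keep only the rows from groups without any violation.
result = df.loc[~in_bad_group]
print(result)
```

One caveat, assuming beta may contain NaN: a NaN is neither > 1 nor <= 1, so this version would not flag such a row as a violation, while the (x.beta > 1).all() predicate would drop its group.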

If you do want to group, you can use transform, since it is more efficient than passing a lambda:

df[df.beta.gt(1).groupby(df.alpha).transform('all')]
Out[317]: 
  alpha  beta
2     b     3
3     b     4
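The transform version can be read as building a row-level mask and then broadcasting the per-group verdict back onto every row. A minimal sketch of those two steps follows; row_ok and group_ok are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Row-level condition: True where beta > 1.
row_ok = df['beta'].gt(1)

# Group-level verdict broadcast back to each row:
# True only for rows whose entire alpha group passes.
group_ok = row_ok.groupby(df['alpha']).transform('all')

# Boolean indexing with a mask aligned to df's index.
result = df[group_ok]
print(result)
```

Because the mask has the same index as df, it can also be combined with other row-level conditions using & or | before indexing.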

Timing

%timeit df.groupby('alpha').filter(lambda x: (x.beta >1).all())
100 loops, best of 3: 2.53 ms per loop
%timeit df.loc[~df.alpha.isin(df.loc[df.beta<=1,'alpha'])]
1000 loops, best of 3: 874 µs per loop
%timeit df[df.beta.gt(1).groupby(df.alpha).transform('all')]
100 loops, best of 3: 2.04 ms per loop
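These timings were taken on the 4-row frame, so they mostly measure fixed overhead, and the ranking may well differ on larger data. Below is a rough sketch for rerunning the comparison on a bigger frame and checking that the three approaches agree. The frame size, the number of groups, the random seed and the helper function names are all assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 20,000 rows spread over 200 groups.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'alpha': rng.integers(0, 200, size=20_000),
    'beta': rng.integers(0, 500, size=20_000),
})


def with_filter(df):
    # groupby.filter with a Python lambda evaluated once per group.
    return df.groupby('alpha').filter(lambda g: (g['beta'] > 1).all())


def with_isin(df):
    # No groupby: drop every row whose alpha appears among the violating rows.
    return df.loc[~df['alpha'].isin(df.loc[df['beta'] <= 1, 'alpha'])]


def with_transform(df):
    # Row mask reduced per group with 'all' and broadcast back to the rows.
    return df[df['beta'].gt(1).groupby(df['alpha']).transform('all')]


# All three approaches should select exactly the same rows.
expected = with_filter(big)
assert expected.equals(with_isin(big))
assert expected.equals(with_transform(big))

# Average wall-clock time per call; the numbers depend on machine and pandas version.
for fn in (with_filter, with_isin, with_transform):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{fn.__name__}: {seconds * 1e3:.2f} ms per call')
```

No output is shown here on purpose, since the results vary between setups.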