Question

I have a pandas DataFrame like this:

import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

The idea is to return only the groups that satisfy a given condition on column beta, and to discard the whole group otherwise.

The result I want is:

  alpha  beta
2     b     3
3     b     4

However, for example,

df.groupby('alpha').apply(lambda x: x.beta>1)

does not work.

Answer 1

You can use groupby.filter, for example:


print(df.groupby('alpha').filter(lambda x: (x.beta > 1).all()))
  alpha  beta
2     b     3
3     b     4

Judging by your expected output, I understand that every value in a group should be greater than 1, hence the all inside the lambda.
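As a minimal sketch of how the filter call behaves: the function passed to filter should return a single boolean per group, and the choice between all and any changes which groups survive. The variable names all_gt1 and any_gt1 are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Keep a group only if every beta value in it is greater than 1.
all_gt1 = df.groupby('alpha').filter(lambda g: (g['beta'] > 1).all())

# Keep a group if at least one beta value in it is greater than 1.
any_gt1 = df.groupby('alpha').filter(lambda g: (g['beta'] > 1).any())

print(all_gt1)  # only group 'b' remains
print(any_gt1)  # both groups remain, because group 'a' contains the value 2
```

With all, group a is dropped because it contains the value 1; with any, it is kept.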

Answer 2

Try it without groupby, using isin:

df.loc[~df.alpha.isin(df.loc[df.beta<=1,'alpha'])]
Out[316]: 
  alpha  beta
2     b     3
3     b     4
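The one-liner above can be read in three steps, sketched below. The intermediate names bad_groups and keep are hypothetical and only there to make each step visible.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Step 1: collect the labels of groups that contain at least one failing row.
bad_groups = df.loc[df['beta'] <= 1, 'alpha'].unique()

# Step 2: mark every row whose group label is not among the failing ones.
keep = ~df['alpha'].isin(bad_groups)

# Step 3: select the surviving rows.
result = df.loc[keep]
print(result)
```

One difference to keep in mind: if beta contains NaN, the comparison beta <= 1 is False for that row, so this version keeps the group, whereas a check of the form (beta > 1).all() would discard it.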

If you do want to group, you can use transform, since it is more efficient than passing a lambda:

df[df.beta.gt(1).groupby(df.alpha).transform('all')]
Out[317]: 
  alpha  beta
2     b     3
3     b     4
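To make the transform version easier to follow, here is a small sketch that exposes the intermediate boolean mask. The name mask is illustrative; transform('all') broadcasts each group's result back to every row of that group, so the mask lines up with the original index.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Row-wise condition, then reduced per group with 'all' and
# broadcast back to the shape of the original column.
mask = df['beta'].gt(1).groupby(df['alpha']).transform('all')
print(mask)

# Boolean indexing with the broadcast mask keeps whole groups only.
result = df[mask]
print(result)
```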

Timings

%timeit df.groupby('alpha').filter(lambda x: (x.beta >1).all())
100 loops, best of 3: 2.53 ms per loop
%timeit df.loc[~df.alpha.isin(df.loc[df.beta<=1,'alpha'])]
1000 loops, best of 3: 874 µs per loop
%timeit df[df.beta.gt(1).groupby(df.alpha).transform('all')]
100 loops, best of 3: 2.04 ms per loop
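The timings above are for the four-row frame, so the ranking may well change with more rows or more groups. Below is a rough sketch for repeating the comparison on a larger frame outside IPython; the sizes, the random data, and the helper names via_filter, via_isin and via_transform are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 20,000 rows spread over 200 groups.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'alpha': rng.integers(0, 200, 20_000),
    'beta': rng.integers(0, 1000, 20_000),
})

def via_filter(d):
    return d.groupby('alpha').filter(lambda g: (g['beta'] > 1).all())

def via_isin(d):
    return d.loc[~d['alpha'].isin(d.loc[d['beta'] <= 1, 'alpha'])]

def via_transform(d):
    return d[d['beta'].gt(1).groupby(d['alpha']).transform('all')]

# Time each approach a few times; absolute numbers depend on the machine.
for fn in (via_filter, via_isin, via_transform):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f'{fn.__name__}: {seconds / 3:.4f} s per call')
```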

Pandas groupby indexing: discard the whole group if the condition is not met

2 Answers: