I have a pandas DataFrame that looks like this:
df = pd.DataFrame({'alpha':['a','a','b','b'],'beta':[1,2,3,4]})
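The idea is to return only the groups that satisfy a certain condition on column beta, and otherwise drop the whole group.
The result I want is only the rows of group b (beta values 3 and 4).
However, something like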
df.groupby('alpha').apply(lambda x: x.beta>1)
does not work.
Answer 0: (score: 2)
You can use groupby.filter, for example:
print(df.groupby('alpha').filter(lambda x: (x.beta > 1).all()))
  alpha  beta
2     b     3
3     b     4
From your expected output, I understood that every value in a group should be greater than 1, hence the all in the lambda.
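To make the role of all more concrete, here is a minimal sketch that contrasts it with any on the same data. The helper name all_above and its threshold parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Hypothetical helper: True only if every beta in the group exceeds the threshold.
def all_above(group, threshold=1):
    return bool((group['beta'] > threshold).all())

# Keeps a group only when the predicate returns True for it.
kept_all = df.groupby('alpha').filter(all_above)

# With any, a single qualifying row is enough to keep the whole group.
kept_any = df.groupby('alpha').filter(lambda g: (g['beta'] > 1).any())

print(kept_all)
print(kept_any)
```

Group a contains the value 1, so it is dropped under all but kept under any, because its other value (2) passes the condition.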
Answer 1: (score: 2)
Try it without groupby, using isin:
df.loc[~df.alpha.isin(df.loc[df.beta<=1,'alpha'])]
Out[316]:
  alpha  beta
2     b     3
3     b     4
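The one-liner above can be read as three steps. The following sketch spells them out; the intermediate names violates and bad_groups are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Step 1: mark the rows that break the condition (beta is not greater than 1).
violates = df['beta'] <= 1

# Step 2: collect the group labels that own at least one such row.
bad_groups = df.loc[violates, 'alpha'].unique()

# Step 3: keep only the rows whose group label is not in that set.
result = df.loc[~df['alpha'].isin(bad_groups)]

print(result)
```

One caveat worth keeping in mind: a missing value in beta compares as False for both `> 1` and `<= 1`, so this version would keep a group containing NaN, whereas the `(x.beta > 1).all()` version would drop it.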
If you do want to group, you can use transform, since it is more efficient than passing a lambda:
df[df.beta.gt(1).groupby(df.alpha).transform('all')]
Out[317]:
  alpha  beta
2     b     3
3     b     4
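As a sketch of what the transform expression does internally, the boolean mask can be built and inspected before it is used for indexing. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'alpha': ['a', 'a', 'b', 'b'], 'beta': [1, 2, 3, 4]})

# Row-level condition: is this row's beta greater than 1?
row_ok = df['beta'].gt(1)

# Broadcast the per-group "all" back to every row of that group,
# so the mask has the same index as df.
mask = row_ok.groupby(df['alpha']).transform('all')

print(mask)
print(df[mask])
```

Because the mask is aligned with the original index, it can also be combined with other row-level conditions using `&` or `|` before indexing.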
Timings
%timeit df.groupby('alpha').filter(lambda x: (x.beta >1).all())
100 loops, best of 3: 2.53 ms per loop
%timeit df.loc[~df.alpha.isin(df.loc[df.beta<=1,'alpha'])]
1000 loops, best of 3: 874 µs per loop
%timeit df[df.beta.gt(1).groupby(df.alpha).transform('all')]
100 loops, best of 3: 2.04 ms per loop
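The numbers above come from the four-row frame, where fixed overhead dominates. A rough way to repeat the comparison outside IPython on a larger frame is sketched below. The frame size, group count, and random data are arbitrary assumptions, and the results will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the sizes are assumptions chosen only for illustration.
rng = np.random.default_rng(0)
n_rows, n_groups = 10_000, 200
big = pd.DataFrame({
    'alpha': rng.integers(0, n_groups, size=n_rows).astype(str),
    'beta': rng.integers(0, 100, size=n_rows),
})

candidates = {
    'filter': lambda: big.groupby('alpha').filter(lambda g: (g['beta'] > 1).all()),
    'isin': lambda: big.loc[~big['alpha'].isin(big.loc[big['beta'] <= 1, 'alpha'])],
    'transform': lambda: big[big['beta'].gt(1).groupby(big['alpha']).transform('all')],
}

# Run each approach once to check that they agree on this data.
results = {name: fn() for name, fn in candidates.items()}

# Best of 3 repeats, 3 calls per repeat, reported per call.
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{name}: {best * 1e3:.2f} ms per call')
```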