I want to select rows based on a condition evaluated per groupby group.
import pandas as pd
import numpy as np
dftest = pd.DataFrame({'A': ['Feb', np.nan, 'Air', 'Flow', 'Feb',
                             'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
                       'B': ['s', 's', 't', 's', 't', 's', 't', 't', 't', 't'],
                       'C': [5, 4, 3, 2, 1, 7, 6, 5, 4, 3],
                       'D': [4, np.nan, 3, np.nan, 2,
                             np.nan, 2, 3, np.nan, 7]})

def filcols3(df, dd):
    if df.iloc[0]['D'] == dd:
        return df

dd = 4
grp = dftest.groupby('B').apply(filcols3, dd)
The result in grp is:
        A  B  C    D
B
s 0   Feb  s  5  4.0
  1   NaN  s  4  NaN
  3  Flow  s  2  NaN
  5  Beta  s  7  NaN
This is what I want.
If I use the following code instead (part 2):
def filcols3(df, dd):
    if df.iloc[0]['D'] <= dd:
        return df

dd = 3
the result is:
      A    B    C    D
0   NaN  NaN  NaN  NaN
1   NaN  NaN  NaN  NaN
2   Air    t  3.0  3.0
3   NaN  NaN  NaN  NaN
4   Feb    t  1.0  2.0
5   NaN  NaN  NaN  NaN
6   Cat    t  6.0  2.0
7   Feb    t  5.0  3.0
8  Beta    t  4.0  NaN
9   Air    t  3.0  7.0
This result surprised me. What I meant to get was:
      A  B  C    D
2   Air  t  3  3.0
4   Feb  t  1  2.0
6   Cat  t  6  2.0
7   Feb  t  5  3.0
8  Beta  t  4  NaN
9   Air  t  3  7.0
What is wrong with the code in part 2, and how can I get the final result I want?
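Answer 0 (score: 3)
The behaviour of apply is a bit unintuitive here. However, if you want to keep or drop entire groups based on a condition evaluated per group, you can use GroupBy.transform to build a boolean mask and filter df (the question's dftest) with it: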
df[df.groupby('B')['D'].transform('first') <= 3]
      A  B  C    D
2   Air  t  3  3.0
4   Feb  t  1  2.0
6   Cat  t  6  2.0
7   Feb  t  5  3.0
8  Beta  t  4  NaN
9   Air  t  3  7.0
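To make the mask visible, here is a minimal, self-contained sketch of the transform('first') approach, rebuilt on the question's dftest. The intermediate names first_d, mask and kept are introduced only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as dftest in the question.
dftest = pd.DataFrame({
    'A': ['Feb', np.nan, 'Air', 'Flow', 'Feb', 'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
    'B': ['s', 's', 't', 's', 't', 's', 't', 't', 't', 't'],
    'C': [5, 4, 3, 2, 1, 7, 6, 5, 4, 3],
    'D': [4, np.nan, 3, np.nan, 2, np.nan, 2, 3, np.nan, 7],
})

# Broadcast each group's first non-null D value to every row of that group.
first_d = dftest.groupby('B')['D'].transform('first')

# Boolean Series aligned with dftest's index: True for rows whose group passes.
mask = first_d <= 3

# Ordinary boolean indexing keeps the original index and dtypes.
kept = dftest[mask]
print(kept)
```

One caveat worth keeping in mind: 'first' skips missing values by default, so for a group whose first row has NaN in D it would use the next non-null value, whereas iloc[0] or x.values[0] looks at the literal first row. With this data both groups start with a non-null D, so the two readings agree.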
Or, staying closer to your code (testing the literal first row of each group):
df[df.groupby('B')['D'].transform(lambda x: x.values[0] <= 3)]
      A  B  C    D
2   Air  t  3  3.0
4   Feb  t  1  2.0
6   Cat  t  6  2.0
7   Feb  t  5  3.0
8  Beta  t  4  NaN
9   Air  t  3  7.0
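As a side check on the question's part 2 function, the sketch below calls it on each group by hand rather than through apply. It does not try to explain apply's internals. It only shows that the function hands back a DataFrame for one group and nothing (None) for the other, which is the mixed input apply then has to combine. The name returned is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same data as dftest in the question.
dftest = pd.DataFrame({
    'A': ['Feb', np.nan, 'Air', 'Flow', 'Feb', 'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
    'B': ['s', 's', 't', 's', 't', 's', 't', 't', 't', 't'],
    'C': [5, 4, 3, 2, 1, 7, 6, 5, 4, 3],
    'D': [4, np.nan, 3, np.nan, 2, np.nan, 2, 3, np.nan, 7],
})

def filcols3(df, dd):
    # The question's part 2 function: there is no return when the test fails,
    # so Python returns None implicitly for that group.
    if df.iloc[0]['D'] <= dd:
        return df

# Call the function on each group directly and record the type it returns.
returned = {key: type(filcols3(group, 3)).__name__
            for key, group in dftest.groupby('B')}
print(returned)  # {'s': 'NoneType', 't': 'DataFrame'}
```

Returning a boolean per group and letting pandas do the selection, as in the transform version above, avoids handing apply a mix of frames and None.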
Answer 1 (score: 3)
You can use filter:
dftest.groupby('B').filter(lambda x : any(x['D'].head(1)<=3))
Out[538]:
      A  B  C    D
2   Air  t  3  3.0
4   Feb  t  1  2.0
6   Cat  t  6  2.0
7   Feb  t  5  3.0
8  Beta  t  4  NaN
9   Air  t  3  7.0
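The same filter call can be written with a named predicate that mirrors the shape of the question's filcols3, which may be easier to read than the lambda. This is a sketch only: first_d_at_most is a hypothetical helper name, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Same data as dftest in the question.
dftest = pd.DataFrame({
    'A': ['Feb', np.nan, 'Air', 'Flow', 'Feb', 'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
    'B': ['s', 's', 't', 's', 't', 's', 't', 't', 't', 't'],
    'C': [5, 4, 3, 2, 1, 7, 6, 5, 4, 3],
    'D': [4, np.nan, 3, np.nan, 2, np.nan, 2, 3, np.nan, 7],
})

def first_d_at_most(group, dd):
    """Hypothetical helper: True when the group's first row has D <= dd."""
    # A comparison against NaN is False, so a group starting with NaN is dropped.
    return bool(group['D'].iloc[0] <= dd)

# filter expects one True/False per group and keeps every row of passing groups.
kept = dftest.groupby('B').filter(lambda g: first_d_at_most(g, 3))
print(kept)
```

The difference from the question's function is that the predicate always returns a boolean, so every group gets an explicit keep or drop decision.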
Or, without groupby, using drop_duplicates:
s=df.drop_duplicates('B').D<=3
df[df.B.isin(df.loc[s.index,'B'][s])]
Out[550]:
      A  B  C    D
2   Air  t  3  3.0
4   Feb  t  1  2.0
6   Cat  t  6  2.0
7   Feb  t  5  3.0
8  Beta  t  4  NaN
9   Air  t  3  7.0
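The drop_duplicates version packs several steps into two lines. Unrolled, it might look like the sketch below. The names first_rows, passing and kept are introduced for illustration, and the frame is called dftest here as in the question.

```python
import numpy as np
import pandas as pd

# Same data as dftest in the question.
dftest = pd.DataFrame({
    'A': ['Feb', np.nan, 'Air', 'Flow', 'Feb', 'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
    'B': ['s', 's', 't', 's', 't', 's', 't', 't', 't', 't'],
    'C': [5, 4, 3, 2, 1, 7, 6, 5, 4, 3],
    'D': [4, np.nan, 3, np.nan, 2, np.nan, 2, 3, np.nan, 7],
})

# drop_duplicates keeps the first occurrence of each B value,
# i.e. the first row of each group (index 0 for 's', index 2 for 't').
first_rows = dftest.drop_duplicates('B')

# B values whose first row satisfies the condition on D.
passing = first_rows.loc[first_rows['D'] <= 3, 'B']

# Keep every row whose B value is among the passing ones.
kept = dftest[dftest['B'].isin(passing)]
print(kept)
```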